--- | Log | opened Wed May 04 00:00:04 2022 |
01:01 | -!- | redgloboli [~redglobol@0002ba80.user.oftc.net] has quit [Quit: ...enter the matrix...] |
01:05 | -!- | redgloboli [~redglobol@redgloboli.de] has joined #xen |
01:05 | -!- | redgloboli is "redgloboli" on #xen #virt #kvm #bitlbee #alpine-linux |
01:11 | -!- | weylkesi1 [~weylkesiq@7YZAABBRE.tor-irc.dnsbl.oftc.net] has joined #xen |
01:11 | -!- | weylkesi1 is "weylkesiq" on #xen |
01:13 | -!- | weylkesiq [~weylkesiq@0BGAAADY9.tor-irc.dnsbl.oftc.net] has quit [Ping timeout: 480 seconds] |
01:28 | -!- | jgross_ is now known as jgross |
01:41 | -!- | ChmEarl [~prymar56@0002b86c.user.oftc.net] has quit [Quit: Leaving] |
03:38 | -!- | Maxi[m] [~M189934ma@0002d8c3.user.oftc.net] has quit [Quit: Bridge terminating on SIGTERM] |
03:38 | -!- | Rhys- is now known as Rhys |
03:51 | -!- | Maxi[m] [~M189934ma@2001:470:1af1:101::4435] has joined #xen |
03:51 | -!- | Maxi[m] is "org.matrix:189934" on #xen #debian-xen #debian-kde |
07:55 | -!- | seba [~seba@kratzbaum.someserver.de] has quit [Ping timeout: 480 seconds] |
09:24 | -!- | ChmEarl [~prymar56@098-147-150-167.res.spectrum.com] has joined #xen |
09:24 | -!- | ChmEarl is "Mark Pryor" on #xen ##xen-packaging #mock #packaging #virt |
10:32 | -!- | seba [~seba@kratzbaum.someserver.de] has joined #xen |
10:32 | -!- | seba is "Sebastian" on #xen @#openstack-meetings |
11:03 | -!- | neilthereildeil [~oftc-webi@pool-71-191-164-234.washdc.fios.verizon.net] has joined #xen |
11:03 | -!- | neilthereildeil is "OFTC WebIRC Client" on #xen |
11:03 | <neilthereildeil> | hey guys |
11:04 | <neilthereildeil> | im seeing an issue where the xen 4.16.1 server hangs under heavy load |
11:04 | <neilthereildeil> | i am creating and destroying many VMs a lot |
11:09 | <julieng> | neilthereildeil: If the hang is not forever, then it may be related to https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=d0887cc6b16e72829ac7e117bd65697463aabfe7. The patch is missing in 4.16. |
11:13 | <royger> | neilthereildeil: you might want to enable the watchdog |
11:14 | <neilthereildeil> | im pasting the kernel buffer output and i ll explain it to u guys |
11:15 | <neilthereildeil> | nah its too large |
11:16 | <neilthereildeil> | im getting an NMI |
11:16 | <julieng> | You could use pastebin.com for large output. |
11:16 | <neilthereildeil> | this is over 512K |
11:16 | <neilthereildeil> | https://pastebin.com/LX5zq1d3 |
11:18 | <neilthereildeil> | i have a warning |
11:18 | <neilthereildeil> | May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 at arch/x86/xen/multicalls.c:102 |
11:18 | <neilthereildeil> | theres a ------------[ cut here ]------------ |
11:18 | <neilthereildeil> | May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x16a/0x1a0 |
11:19 | <neilthereildeil> | and then i see another error |
11:19 | <neilthereildeil> | May 3 16:41:52 server kernel: [38330.274913] INFO: NMI handler (ghes_notify_nmi) took too long to run: 37.432 msecs |
11:19 | <neilthereildeil> | everytime this server hangs, i see an NMI |
11:22 | <neilthereildeil> | royger: whats the watchdog and how will it help me? |
11:22 | <julieng> | Interesting, I saw this message an hour ago on 5.10. Looking at the code, the warning should be followed by an error message looking like "X of X multicall(s) failed". If you have it, can you post it? |
11:23 | <royger> | hm, also `xl dmesg` (or serial) might contain some more information about what failed |
11:25 | <royger> | neilthereildeil: watchdog detects if Xen gets stuck (ie: a deadlock for example or an operation taking too long). It's enabled in the Xen command line, see https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html watchdog option |
11:25 | <neilthereildeil> | yea im running kernel 5.10 also |
11:25 | <neilthereildeil> | May 3 May 4 09:36:19 server kernel: [ 0.000000] Linux version 5.10.0-13-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.106-1 (2022-03-17) |
11:26 | <neilthereildeil> | royger: i cannot xl dmesg because the physical server is hung |
11:26 | <royger> | neilthereildeil: I guess you also don't have a serial console attached to the server? |
11:27 | <neilthereildeil> | julieng: the only reference to multicalls i see is "May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x16a/0x1a0" |
11:27 | <neilthereildeil> | nothing about multicalls failing |
11:27 | <neilthereildeil> | royger: i could work on attaching serial to the server |
11:28 | <royger> | neilthereildeil: without a serial console attached the watchdog is not going to help, since if it triggers, the information about what triggered it will be lost |
11:29 | <royger> | to debug this you likely want serial attached plus a debug build of Xen so it's more verbose |
11:29 | <neilthereildeil> | yea im already running debug build |
11:30 | <neilthereildeil> | so im looking at this dmesg log i pasted |
11:30 | <neilthereildeil> | i see a lot of stacks dumped |
11:31 | <neilthereildeil> | and only 1 warning |
11:31 | <neilthereildeil> | is it a problem if stacks are dumped, or only if theres a warning? |
11:36 | <royger> | there's an operation inside of the multicall that has failed, but your trace doesn't contain which one it is. So it's hard to know what's going on |
11:37 | <neilthereildeil> | multicall is the term for hypercall in xen, right? |
11:37 | <royger> | we could likely get more output from the serial |
11:37 | <royger> | multicalls are multiple hypercalls batched into a single hypercall |
11:43 | <neilthereildeil> | so it looks like an NMI was sent from CPU 42->41, and CPU41 was originally executing xen_mc_flush, but a little bit later when the warning was printed, CPU41 was executing __xen_mc_entry |
11:43 | <neilthereildeil> | is my analysis correct? |
11:44 | <neilthereildeil> | is there anything else that you all see in this log that i dont see? |
11:54 | <neilthereildeil> | also, can someone please explain why there are 2 callstacks separated by lines 41-43? |
11:56 | <neilthereildeil> | it seems like the first callstack has 3 more functions that were called? xen_unpin_page->__xen_mc_entry->xen_mc_flush? |
11:56 | <neilthereildeil> | why does it print a similar callstack twice? |
13:46 | -!- | Rhys [Rhys@help.lux.melted.me] has quit [Quit: R.I.P] |
13:46 | -!- | Rhys [~Rhys@help.lux.melted.me] has joined #xen |
13:46 | -!- | Rhys is "Rhys" on #xen #virt #alpine-linux |
14:13 | -!- | neilthereildeil [~oftc-webi@pool-71-191-164-234.washdc.fios.verizon.net] has quit [Quit: Page closed] |
15:39 | <ClyneS> | I still have not heard anything from my submission to the ML re: my issue with 5.15.29+ and the xen-netback reverts |
16:35 | -!- | myx_ [~quassel@0002d8c3.user.oftc.net] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
17:11 | -!- | myx_ [~quassel@2a02:2455:5a0:e0a:df21:d763:d1c1:4615] has joined #xen |
17:11 | -!- | myx_ is "myx" on #debian-xen #xen |
22:08 | -!- | jgross_ [~juergen_g@2a01:41e1:2f5f:e400:365:7035:3e12:c85] has joined #xen |
22:08 | -!- | jgross_ is "realname" on #xen |
22:14 | -!- | jgross [~juergen_g@2a01:41e1:2f21:bb00:5fc4:ca52:df97:420a] has quit [Ping timeout: 480 seconds] |
--- | Log | closed Thu May 05 00:00:06 2022 |