#xen IRC Logs for 2022-05-04

--- Log opened Wed May 04 00:00:04 2022
01:01-!-redgloboli [] has quit [Quit: ...enter the matrix...]
01:05-!-redgloboli [] has joined #xen
01:05-!-redgloboli is "redgloboli" on #xen #virt #kvm #bitlbee #alpine-linux
01:11-!-weylkesi1 [] has joined #xen
01:11-!-weylkesi1 is "weylkesiq" on #xen
01:13-!-weylkesiq [] has quit [Ping timeout: 480 seconds]
01:28-!-jgross_ is now known as jgross
01:41-!-ChmEarl [] has quit [Quit: Leaving]
03:38-!-Maxi[m] [] has quit [Quit: Bridge terminating on SIGTERM]
03:38-!-Rhys- is now known as Rhys
03:51-!-Maxi[m] [~M189934ma@2001:470:1af1:101::4435] has joined #xen
03:51-!-Maxi[m] is "org.matrix:189934" on #xen #debian-xen #debian-kde
07:55-!-seba [] has quit [Ping timeout: 480 seconds]
09:24-!-ChmEarl [] has joined #xen
09:24-!-ChmEarl is "Mark Pryor" on #xen ##xen-packaging #mock #packaging #virt
10:32-!-seba [] has joined #xen
10:32-!-seba is "Sebastian" on #xen @#openstack-meetings
11:03-!-neilthereildeil [] has joined #xen
11:03-!-neilthereildeil is "OFTC WebIRC Client" on #xen
11:03<neilthereildeil>hey guys
11:04<neilthereildeil>im seeing an issue where the xen 4.16.1 server hangs under heavy load
11:04<neilthereildeil>i am creating and destroying many VMs a lot
11:09<julieng>neilthereildeil: If the hang is not forever, then it may be related to;a=commit;h=d0887cc6b16e72829ac7e117bd65697463aabfe7. The patch is missing in 4.16.
11:13<royger>neilthereildeil: you might want to enable the watchdog
11:14<neilthereildeil>im pasting the kernel buffer output and i'll explain it to u guys
11:15<neilthereildeil>nah its too large
11:16<neilthereildeil>im getting an NMI
11:16<julieng>You could use for large output.
11:16<neilthereildeil>this is over 512K
11:18<neilthereildeil>i have a warning
11:18<neilthereildeil>May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 at arch/x86/xen/multicalls.c:102
11:18<neilthereildeil>theres a ------------[ cut here ]------------
11:18<neilthereildeil>May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x16a/0x1a0
11:19<neilthereildeil>and then i see another error
11:19<neilthereildeil>May 3 16:41:52 server kernel: [38330.274913] INFO: NMI handler (ghes_notify_nmi) took too long to run: 37.432 msecs
11:19<neilthereildeil>everytime this server hangs, i see an NMI
11:22<neilthereildeil>royger: whats the watchdog and how will it help me?
11:22<julieng>Interesting, I saw this message an hour ago on 5.10. Looking at the code, the warning should be followed by an error message looking like "X of X multicall(s) failed". If you have it, can you post it?
11:23<royger>hm, also `xl dmesg` (or serial) might contain some more information about what failed
11:25<royger>neilthereildeil: watchdog detects if Xen gets stuck (ie: a deadlock for example or an operation taking too long). It's enabled in the Xen command line, see watchdog option
11:25<neilthereildeil>yea im running kernel 5.10 also
11:25<neilthereildeil>May 3 May 4 09:36:19 server kernel: [ 0.000000] Linux version 5.10.0-13-amd64 ( (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.106-1 (2022-03-17)
11:26<neilthereildeil>royger: i cannot xl dmesg because the physical server is hung
11:26<royger>neilthereildeil: I guess you also don't have a serial console attached to the server?
11:27<neilthereildeil>julieng: the only reference to multicalls i see is "May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x16a/0x1a0"
11:27<neilthereildeil>nothing about multicalls failing
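The search described above (looking through a kernel log of more than 512K for the warning, the NMI line, and the "X of X multicall(s) failed" message julieng asks about) can be sketched as a small filter. This is only an illustration: the regular expressions are assumptions derived from the lines quoted in this log, the function name `scan` is hypothetical, and real kernel output may differ in detail.

```python
import re

# Patterns modelled on the lines quoted in this channel (assumed formats).
PATTERNS = {
    "warning": re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at (\S+)"),
    "nmi": re.compile(r"NMI handler \((\w+)\) took too long to run: ([\d.]+) msecs"),
    "multicall_failed": re.compile(r"(\d+) of (\d+) multicall\(s\) failed"),
}


def scan(lines):
    """Return (kind, captured_fields) for every line matching a pattern."""
    hits = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((kind, match.groups()))
                break  # one classification per line is enough
    return hits


# The two sample lines are copied from the messages above.
sample = [
    "May 3 16:41:52 server kernel: [38330.274859] WARNING: CPU: 41 PID: 1232789 "
    "at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x16a/0x1a0",
    "May 3 16:41:52 server kernel: [38330.274913] INFO: NMI handler "
    "(ghes_notify_nmi) took too long to run: 37.432 msecs",
]

for kind, fields in scan(sample):
    print(kind, fields)
```

On a real log the list would come from reading the saved file line by line. If no "multicall_failed" hit appears, as in the sample, that matches what neilthereildeil reports.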
11:27<neilthereildeil>royger: i could work on attaching serial to the server
11:28<royger>neilthereildeil: without a serial attached watchdog is not going to help, since in case it triggers the information about what triggered the watchdog will be lost
11:29<royger>to debug this you likely want serial attached plus a debug build of Xen so it's more verbose
11:29<neilthereildeil>yea im already running debug build
11:30<neilthereildeil>so im looking at this dmesg log i pasted
11:30<neilthereildeil>i see a lot of stacks dumped
11:31<neilthereildeil>and only 1 warning
11:31<neilthereildeil>is it a problem if stacks are dumped, or only if theres a warning?
11:36<royger>there's an operation inside of the multicall that has failed, but your trace doesn't contain which one it is. So it's hard to know what's going on
11:37<neilthereildeil>multicall is the term for hypercall in xen, right?
11:37<royger>we could likely get more output from the serial
11:37<royger>multicalls are multiple hypercalls batched into a single hypercall
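royger's description of a multicall, together with the per-batch failure message julieng mentions, can be illustrated with a toy model. This is a plain Python sketch and not the kernel or Xen implementation: the class names, the `queue`/`flush` methods, and the operation labels are hypothetical, and the return codes are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MulticallEntry:
    """One queued operation; stands in for a single hypercall."""
    name: str
    op: Callable[[], int]  # returns 0 on success, a negative value on failure
    result: int = 0


@dataclass
class MulticallBatch:
    """Collects entries and issues them together in one flush."""
    entries: List[MulticallEntry] = field(default_factory=list)

    def queue(self, name: str, op: Callable[[], int]) -> None:
        self.entries.append(MulticallEntry(name, op))

    def flush(self) -> List[str]:
        """Run all queued entries in one pass and report any failures."""
        for entry in self.entries:
            entry.result = entry.op()
        failed = [e for e in self.entries if e.result < 0]
        messages = []
        if failed:
            messages.append(
                f"{len(failed)} of {len(self.entries)} multicall(s) failed"
            )
            messages.extend(f"  {e.name} -> {e.result}" for e in failed)
        self.entries.clear()  # the batch is empty again after a flush
        return messages


batch = MulticallBatch()
batch.queue("mmu_update", lambda: 0)           # illustrative label only
batch.queue("update_va_mapping", lambda: -22)  # made-up failure code
batch.queue("mmuext_op", lambda: 0)
for line in batch.flush():
    print(line)
```

The point of the sketch is that a failure is only visible per entry after the whole batch has run, which is why a trace without the per-entry report does not say which operation failed.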
11:43<neilthereildeil>so it looks like an NMI was sent from CPU 42->41, and CPU41 was originally executing xen_mc_flush, but a little bit later when the warning was printed, CPU41 was executing __xen_mc_entry
11:43<neilthereildeil>is my analysis correct?
11:44<neilthereildeil>is there anything else that you all see in this log that i dont see?
11:54<neilthereildeil>also, can someone please explain why there are 2 callstacks separated by lines 41-43?
11:56<neilthereildeil>it seems like the first callstack has 3 more functions that were called? xen_unpin_page->__xen_mc_entry->xen_mc_flush?
11:56<neilthereildeil>why does it print a similar callstack twice?
13:46-!-Rhys [] has quit [Quit: R.I.P]
13:46-!-Rhys [] has joined #xen
13:46-!-Rhys is "Rhys" on #xen #virt #alpine-linux
14:13-!-neilthereildeil [] has quit [Quit: Page closed]
15:39<ClyneS>I still have not heard anything from my submission to the ML re: my issue with 5.15.29+ and the xen-netback reverts
16:35-!-myx_ [] has quit [Quit: - Chat comfortably. Anywhere.]
17:11-!-myx_ [~quassel@2a02:2455:5a0:e0a:df21:d763:d1c1:4615] has joined #xen
17:11-!-myx_ is "myx" on #debian-xen #xen
22:08-!-jgross_ [~juergen_g@2a01:41e1:2f5f:e400:365:7035:3e12:c85] has joined #xen
22:08-!-jgross_ is "realname" on #xen
22:14-!-jgross [~juergen_g@2a01:41e1:2f21:bb00:5fc4:ca52:df97:420a] has quit [Ping timeout: 480 seconds]
--- Log closed Thu May 05 00:00:06 2022