#uml IRC Logs for 2008-05-27

--- Log opened Tue May 27 00:00:20 2008
00:54-!-balbir [~balbir@122.167.176.181] has quit [Ping timeout: 480 seconds]
01:46-!-cheako [~mmestnik@pyrrha.visi.com] has quit [Ping timeout: 480 seconds]
02:11-!-balbir [~balbir@59.145.136.1] has joined #uml
04:06-!-balbir [~balbir@59.145.136.1] has quit [Ping timeout: 480 seconds]
04:18-!-balbir [~balbir@59.145.136.1] has joined #uml
05:11-!-balbir [~balbir@59.145.136.1] has quit [Ping timeout: 480 seconds]
06:28-!-balbir [~balbir@59.145.136.1] has joined #uml
07:11-!-balbir [~balbir@59.145.136.1] has quit [Ping timeout: 480 seconds]
07:16-!-GoNoGo [~GoNoGo@pc114.pallas.cines.fr] has joined #uml
08:39-!-GoNoGo [~GoNoGo@pc114.pallas.cines.fr] has quit [Quit: Ex-Chat]
08:40-!-dang [~dang@75.38.192.168] has quit [Quit: Leaving.]
09:02-!-dang [~dang@aa-redwall.ghs.com] has joined #uml
09:07-!-balbir [~balbir@122.167.176.181] has joined #uml
10:03-!-jdike [~jdike@pool-96-237-183-188.bstnma.fios.verizon.net] has joined #uml
10:03<jdike:#uml>Hi guys
10:17<peterz:#uml>hey Jeff
10:28-!-JochenA [jochen@chinaloca.ozark.de] has joined #uml
10:36<JochenA:#uml>One of my uml guests just froze. I was removing a package (debian testing) and it stopped while 'Processing triggers for man-db ...'. It's not using any CPU, just frozen. Since it is still there should I get some debug data and if so how? Kernel is 2.6.25.4um running on Debian 2.6.18 with debian packaged skas patch applied.
10:37<jdike:#uml>can you attach gdb to it?
10:38<JochenA:#uml>I first thought it ran out of memory, but in the past that gave some output on the console.
10:38<JochenA:#uml>How would I do that?
10:38<jdike:#uml>gdb linux lowest-uml-pid
10:39<JochenA:#uml>jdike: I'll give it a try...
10:39<JochenA:#uml>it is a stripped binary btw
10:39<jdike:#uml>that's not useful
10:40<jdike:#uml>gdb won't give you any useful information in that case
10:40<jdike:#uml>try stracing it and see what that says
10:41<JochenA:#uml>jdike: It is not stripped, just remembered that I did that in the past, but not anymore.
10:41<JochenA:#uml>sorry
10:42<JochenA:#uml>gdb is attached...
10:42<jdike:#uml>bt
10:43<JochenA:#uml>jdike: so what would you like me to do?
10:45<JochenA:#uml>the last line shows '0xb7f2c410 in __kernel_vsyscall ()'. Is that where it is actually stuck right now?
10:54<jdike:#uml>paste it to pastebin or something
11:10<JochenA:#uml>it doesn't really show anything yet. http://pastebin.com/d179ca64e
11:10<JochenA:#uml>There is nothing I should do to probe something?
11:12<jdike:#uml>yeah
11:12<jdike:#uml>bt
11:12<jdike:#uml>like I said above
11:14<JochenA:#uml>http://pastebin.com/d73534e21
11:15<jdike:#uml>OK
11:15<jdike:#uml>more or less what I expected
11:15<jdike:#uml>does it ping?
11:16<JochenA:#uml>no
11:17<jdike:#uml>give it a sysrq t
11:19<JochenA:#uml>The command isn't returning.
11:25<jdike:#uml>interesting
11:25<jdike:#uml>so interrupts totally aren't working
11:25<jdike:#uml>can you strace it, then ping it?
11:25<JochenA:#uml>I tried to attach with strace -p PID but: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
11:25<JochenA:#uml>I'm logged in as root. The guest is running as user uml
11:26<jdike:#uml>detach gdb from it
11:26<JochenA:#uml>if there is something more elaborate you'd like to try I could let you login as uml
11:27<JochenA:#uml>that returned the mconsole: ERR Sysrq not compiled in
11:27<jdike:#uml>at this point, what I want to see is how it reacts to an interrupt
11:27<JochenA:#uml>strace worked now and outputs quite frantically...
11:28<jdike:#uml>strace -o log-file
11:29<JochenA:#uml>it is in an endless loop it seems. And it looks suspicious considering that I was just running ntpdate before: http://pastebin.com/dc3abefc
11:29<JochenA:#uml>ok, it pings
11:30<JochenA:#uml>so not as frozen as I thought, I'll try ssh
11:30<jdike:#uml>har
11:30<jdike:#uml>I have fixes for that
11:30<JochenA:#uml>I can ssh into it! But the console is absolutely unresponsive
11:30<jdike:#uml>hmm
11:31<jdike:#uml>the ssh connection is normal, though?
11:31<JochenA:#uml>what do you mean by normal? I just executed ps aux and I get a nice list. the dpkg process (which is hanging at the other login) is <defunct>
11:32<jdike:#uml>what's its parent?
11:32<JochenA:#uml>it was spawned by aptitude
11:33<jdike:#uml>what's aptitude's state?
11:33<JochenA:#uml>I just tried mail. it prompted for subject, the body, then the CC but instead of exiting it is just waiting now.
11:33<JochenA:#uml>root 3343 0.0 6.8 75040 26840 pts/3 Sl+ 15:47 0:00 aptitude
11:34<jdike:#uml>strace it
11:35<JochenA:#uml>http://pastebin.com/d27696caf
11:36<JochenA:#uml>the output loops through the same as before, but occasionally the output I just pasted flies by.
11:37<jdike:#uml>strace mail and/or aptitude
11:37<JochenA:#uml>I tried to ^C mail, which doesn't kill it. But I could ^Z it into the background
11:37<JochenA:#uml>I'll strace it...
11:40<JochenA:#uml>No strace installed, and guess what dpkg is doing upon installing it...
11:41<jdike:#uml>yeah
11:42<JochenA:#uml>http://pastebin.com/d52fa8783
11:42<jdike:#uml>do you have a similar system around from which you can just copy the binary?
11:42<JochenA:#uml>it unpacked strace but didn't set it up, but that was enough to make it available.
11:43<jdike:#uml>what about 3439?
11:44<JochenA:#uml>restart_syscall(<... resuming interrupted call ...>
11:45<JochenA:#uml>do you think it has to do with uml or is it something debian related. It is running testing not stable.
11:45<jdike:#uml>is that it
11:45<jdike:#uml>?
11:45<JochenA:#uml>yes
11:46<jdike:#uml>what does kill -ALRM 3434 do?
11:47<JochenA:#uml>that killed it and the mail just got delivered
11:47<jdike:#uml>so timers seem confused still
11:48<jdike:#uml>what does time sleep 1 do?
11:48<JochenA:#uml>it's not returning. But I got all the dpkg killed
11:51-!-ram [~ram@pool-71-245-96-80.nycmny.fios.verizon.net] has quit [Ping timeout: 480 seconds]
11:51<JochenA:#uml>for the timeline: I ran aptitude, spawning dpkg ->defunct. Ran apt-get (install strace) which spawned dpkg -> defunct. Killed apt-get which returned that terminal but dpkg still defunct. Killing aptitude got rid of all processes including the dpkg.
11:51<JochenA:#uml>I ^C time. Returned all 0.0000s
11:53<JochenA:#uml>system clock is halted
11:53<JochenA:#uml>date outputs the same time which is two hours behind.
11:54<JochenA:#uml>doesn't change anymore
11:54<JochenA:#uml>it seems my playing around with ntpdate wasn't appreciated.
11:55<JochenA:#uml>I started ntpdate sequentially (manually) dozens of times to see whether it chooses an IPv6 or IPv4 address for my ntpd (running on the host).
11:56<jdike:#uml>playing with ntp on the host?
11:56<JochenA:#uml>no, ntpdate on the guest.
11:56<jdike:#uml>OK
11:56<JochenA:#uml>ntpd is running on the host.
11:56<JochenA:#uml>Has been doing so for years without problems, though.
11:57<jdike:#uml>this was correlated with playing with ntp on the guest?
11:57<JochenA:#uml>that's what I did before this happened. I installed ntpdate to test that and then wanted to purge it again, when dpkg just hung.
11:57<jdike:#uml>OK
11:58<jdike:#uml>I have a fix for this (and other related problems)
11:59<JochenA:#uml>good. Since I don't intend to run ntpdate again it won't be a problem. Just wanted to give you a chance to see if there is a bug for you.
11:59<jdike:#uml>there is, and it took me a while to track down, but it is fixed
12:00<JochenA:#uml>This uml is my toybox to run debian testing and to run desktop programs on that computer (dedicated server which I've never seen face to face).
12:09<JochenA:#uml>I'm surprised that the system works at all with a stalled clock! I'm shutting it down now.
12:12<JochenA:#uml>Ok, shutting down is just too much for it. Need to kill it.
12:12<jdike:#uml>hehe
12:13<jdike:#uml>in the first case I saw, everything stopped working when the clock stopped
12:14<JochenA:#uml>I'm not sure because I didn't write it down, but it seems that the host didn't see any CPU time used up. Especially when I closed everything in the desktop session, I would have thought it would create some load.
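
[Editor's sketch, not part of the log] The symptom described above (date printing the same time on every call, and sleep never returning) could be checked from inside a guest with something like the following. This is a minimal sketch under stated assumptions: the function name clock_advances and its parameters are hypothetical, and it busy-waits instead of sleeping because, as seen in the log, a timer-based sleep may never return once the clock has stalled. It only observes the symptom and says nothing about the underlying UML bug or jdike's fix.

```python
import time
from typing import Callable


def clock_advances(read_clock: Callable[[], float] = time.time,
                   spin_iterations: int = 5_000_000) -> bool:
    """Return True if the clock reading changes across a busy loop.

    read_clock and spin_iterations are illustrative parameters; the
    default loop count is an arbitrary guess at "long enough".
    """
    before = read_clock()
    # Busy-wait rather than sleep: with broken timers, sleep may hang.
    counter = 0
    for _ in range(spin_iterations):
        counter += 1
    after = read_clock()
    return after != before


if __name__ == "__main__":
    if clock_advances():
        print("clock advances")
    else:
        print("clock appears stalled")
```

The clock reader is passed in as a parameter so the check can be exercised with a fake clock, without needing a guest that is actually stuck.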
12:34-!-ram [~ram@bi01p1.co.us.ibm.com] has joined #uml
12:36-!-cheako [~mmestnik@pyrrha.visi.com] has joined #uml
13:07-!-hfb [~hfb@pool-71-118-254-245.lsanca.dsl-w.verizon.net] has joined #uml
13:49-!-kos_tom [~thomas@humanoidz.org] has joined #uml
14:05-!-Netsplit charon.oftc.net <-> reticulum.oftc.net quits: Hunger, SNy, desaster
14:07-!-Netsplit over, joins: SNy, Hunger, desaster
15:32<jetlag:#uml>anybody awake?
15:38<caker:#uml>jdike doesn't sleep. He waits.
15:45*jdike:#uml is jetlagged
15:48<jetlag:#uml>I'm trying to run a nested guest and I'm not having much success
15:49<jdike:#uml>you won't
15:49<jdike:#uml>it needs a bit of work before that's possible
15:50<jetlag:#uml>oh
16:18-!-aindilis [andrewdo@75.146.96.195] has quit [Read error: Connection reset by peer]
16:18-!-aindilis [andrewdo@75.146.96.195] has joined #uml
17:10-!-dang [~dang@aa-redwall.ghs.com] has quit [Quit: Leaving.]
17:22-!-cheako [~mmestnik@pyrrha.visi.com] has left #uml []
18:47-!-kos_tom [~thomas@humanoidz.org] has quit [Quit: I like core dumps]
19:01-!-hfb [~hfb@pool-71-118-254-245.lsanca.dsl-w.verizon.net] has quit [Quit: Leaving]
19:04-!-dang [~dang@75.38.192.168] has joined #uml
19:39-!-jdike [~jdike@pool-96-237-183-188.bstnma.fios.verizon.net] has quit [Quit: Leaving]
22:40-!-ram [~ram@bi01p1.co.us.ibm.com] has quit [Ping timeout: 480 seconds]
23:59-!-VS_ChanLog [~stats@ns.theshore.net] has left #uml [Rotating Logs]
23:59-!-VS_ChanLog [~stats@ns.theshore.net] has joined #uml
--- Log closed Wed May 28 00:00:18 2008