#uml IRC Logs for 2007-02-24

---Logopened Sat Feb 24 00:00:59 2007
03:08|-|pgstudy [] has joined #uml
03:33|-|Netsplit <-> quits: tchan
03:33|-|weasel [] has quit [Remote host closed the connection]
03:34|-|Netsplit over, joins: tchan
03:45|-|weasel [] has joined #uml
04:04|-|flatronf700B [~flatronf7@] has quit [Ping timeout: 480 seconds]
04:14|-|tyler [~tyler@] has joined #uml
04:19|-|tyler [~tyler@] has quit [Remote host closed the connection]
04:19|-|tyler [~tyler@] has joined #uml
04:42|-|kokoko1 [~Slacker@] has quit [Ping timeout: 480 seconds]
06:48|-|baroni [] has joined #uml
07:34|-|pgstudy [] has quit [Remote host closed the connection]
08:36|-|orionrobots [~danny@] has joined #uml
08:37<orionrobots>Hi all, I have some interesting process behaviour. The process is inside a UML process, and may or may not be related to that - my suspicion is that it is. Given that there are some kernel hacking gurus here - you may understand what is going on anyway.
08:38<orionrobots>The process is simple - "cat /var/run/". I found this when apache would not stop properly or start, leaving indefinite runs of the init.d scripts which had to be killed to gain shell interaction again..
08:38<orionrobots>I was able to actually kill the apache processes - but this one, and another identical one will not even respond to a "kill -9".
08:38<DeHackEd>tried to strace it?
08:38<orionrobots>I attempted to attach gdb to it - gdb hung too..
08:39<orionrobots>Hmm - strace as root returns "attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted"
08:40<orionrobots>Attaching to the /other/ cat process hung strace too.. No ctrl-c response..
08:40<DeHackEd>you can't trace a program already being traced/debugged or a kernel process
08:40<orionrobots>Doh.. That should probably be obvious.
08:40<DeHackEd>is there a hanging gdb process?
08:41<DeHackEd>then it's holding the ptrace lock
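The point about only one tracer being allowed per process can be checked from /proc: on Linux, /proc/<pid>/status carries a TracerPid field that is 0 when nothing is attached. A minimal sketch, assuming the status text has already been read (for example from the host's /proc, or from the guest via the mconsole proc command mentioned later in this log). The function name and the sample text are made up for illustration, not taken from this session.

```python
from typing import Optional


def tracer_pid(status_text: str) -> Optional[int]:
    """Return the PID of the process tracing this one.

    Returns None when the TracerPid field is 0 (not traced) or missing.
    """
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            pid = int(line.split(":", 1)[1].strip())
            return pid if pid != 0 else None
    return None


# Hypothetical excerpt of /proc/<pid>/status for a process with a debugger attached.
SAMPLE_STATUS = "Name:\tcat\nState:\tD (disk sleep)\nPid:\t1234\nTracerPid:\t1300\n"

if __name__ == "__main__":
    print(tracer_pid(SAMPLE_STATUS))
```

If this returns a PID, a second strace or gdb attach to the same process would be expected to fail with "Operation not permitted", which matches the error quoted above.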
08:42<orionrobots>Hmm - and the gdb on the first process 7347 will not respond to a ctrl-c at all...
08:42<DeHackEd>I've seen apache start up really slowly due to /dev/random blocking, but this takes the cake.
08:42<orionrobots>An attempt to ssh in (the console and other ssh session being tied up by hung gdb and strace) does not seem to be presenting me with a bash prompt.. B*gger.
08:43<orionrobots>I may have to try an mconsole halt. I could at this point try to strace/or get bt from the uml kernel.
08:45<orionrobots>Stracing the uml kernel shows a very busy output.. Would you be able to make anything of a pastebin'd chunk of that? I can't say it looks helpful - mostly about gettimeofday, setitimer, sigprocmask, sigreturn and nanosleep - a repeating block.
08:46<orionrobots>In this state - it is not actually consuming all that much, if any cpu cycles on the host - so it sounds like an io lockup maybe - possibly down to the two cat processes both being "cat /var/run/" - the same file..
08:50<orionrobots>I will chuck a link to a bt and strace chunk from the kernel on pastebin once it gets there..
08:56<orionrobots>Hmm- eventually I see the "soft lockup detected" message. Still not sure how relevant those are..
09:14<DeHackEd>pastebin doesn't freaking work.
09:14<orionrobots>Hold on -- just setting up my own pastebin on orionrobots...
09:15<DeHackEd>does 'ps' show the process in an S or D state ?
09:16<orionrobots>The kernel no. The process inside - I was using -ef (force of habit)- no state shown. I can't get an interactive shell inside the vm to try it another way though.
09:18<DeHackEd>sysrq t ?
09:18<orionrobots>How can I send that to the vm (sorry - bit of a newbie here)?
09:20<orionrobots>Backtrace -
09:20<orionrobots>Oh sorry- got it - mconsole..
09:21<orionrobots>Hmm - "ERR Sysrq not compiled in"
09:25<DeHackEd>permission denied to view url
09:26<orionrobots>Oh - hold on then - should be anon viewable.. Let me sort it out..
09:27<orionrobots>Try again now..
09:29<orionrobots>Hmm - maybe I can get something from the proc filesystem. Mconsole allows listing contents of proc files..
09:31<DeHackEd>I'm no dev, but everything looks typical compared to my system
09:32<orionrobots>Basically - it is just idling. So the process may be hung waiting for an IO lock...
09:33<DeHackEd>what you'd want is to see the stack trace for that thread, and I don't know how to do that.
09:33<orionrobots>Hmm - well each thread in uml kernel maps to a thread on the host - does it not? However getting the right one could be interesting..
09:33<DeHackEd>I don't think it uses pthreads to do it though.
09:35<orionrobots>Just stfw - I can see it is the glibc threading API...
09:39<orionrobots>Hmm - if those processes are linked in any way - couldn't I list them on the host, and try to pick on those with an abnormal status?
09:40<orionrobots>S is an uninterruptible sleep - sounds likely..
09:40<DeHackEd>they wouldn't show an abnormal status because it's the UML kernel that considers them in an abnormal status. from the host's perspective they're just being traced by the UML kernel and are currently stopped in debugging.
09:40<DeHackEd>D is uninterruptible. That's bad. S is okay.
09:41<orionrobots>And T would be the UML trace then on most of the others?
09:41<DeHackEd>T means paused/debugged. those would be the userspace threads inside the UML
09:42<orionrobots>And the proc/<pid> files - I don't seem to see much of use there. I am trying to look at the host proc system as an example to use with the mconsole proc command..
09:43<orionrobots>proc/<pid>/status - That might do it..
09:44<orionrobots>And bingo - 7374 - the process that is hung - status is D - disk sleep..
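The check that found the hung process, reading the State line of proc/<pid>/status, can be sketched in a few lines. This is an illustration under assumptions: the status text is passed in as a string (however it was fetched), the function name and sample are hypothetical, and the meanings table only covers the common single-letter codes discussed above.

```python
from typing import Optional

# Common state letters as discussed in the channel; not an exhaustive list.
STATE_MEANINGS = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "disk sleep (uninterruptible)",
    "T": "stopped or being traced",
    "Z": "zombie",
}


def process_state(status_text: str) -> Optional[str]:
    """Return the one-letter state code from a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split(":", 1)[1].split()
            return fields[0] if fields else None
    return None


# Hypothetical status text for a process stuck in uninterruptible sleep.
SAMPLE_STATUS = "Name:\tcat\nState:\tD (disk sleep)\nPid:\t1234\n"

if __name__ == "__main__":
    code = process_state(SAMPLE_STATUS)
    print(code, STATE_MEANINGS.get(code, "unknown"))
```

A process reporting D does not act on signals until the kernel operation it is blocked in completes, which is consistent with "kill -9" having no visible effect earlier in this log.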
09:46<orionrobots>Shame we cannot do an ls as well as more. /proc/<pid>/fd may show me what files it has in use..
09:50<orionrobots>Hmm - the gdb is S - sleeping. I am not sure how I can kill it without an interactive console to send a -9 to it..
09:52<orionrobots>Any ideas? I seem to remember mconsole allowing you to spawn a "sh -c" command, but the current help does not show such a thing here..
10:03<orionrobots>Ah well - I think I may have to halt it, reboot and carry on - hope it doesn't happen again.. Not sure what else to do now - too much time wasted..
10:20<DeHackEd>I think you need a patch to allow mconsole to do that
10:20<orionrobots>Ah. I had to cut my losses and halt it - but next time I will make sure I have more consoles ready to go if I get into a state like that again - to make sure I don't lose all interaction..
10:21<DeHackEd>how much RAM space did you allocate the UML?
10:21<orionrobots>256MB + 512MB swap.
10:21<DeHackEd>I assume you have more than 512 MB of RAM in your system
10:21<orionrobots>1GB in the host.
10:22<orionrobots>Top (on the guest) gives me this: Mem 255928k total, 229312k used, 26616k free, 7828k buffers
10:22<orionrobots>And Swap: 524280k total, 0k used, 524280k free, 106256k cached
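The top figures above can be worked through to see how much of the "used" memory is cache. This is only a sketch of the arithmetic, assuming the "cached" value printed on the Swap line is page cache (as in the usual procps top layout) and that buffers and cache are reclaimable. The variable names are made up for illustration.

```python
# Figures quoted from the guest's top output, in kB.
total_k = 255928
used_k = 229312
free_k = 26616
buffers_k = 7828
cached_k = 106256  # assumption: page cache, even though top prints it on the Swap line

# Memory held by processes and the kernel once buffers and cache are excluded.
used_excluding_cache_k = used_k - buffers_k - cached_k

# Memory that is free or could be reclaimed from buffers and cache.
available_k = free_k + buffers_k + cached_k

if __name__ == "__main__":
    print(used_excluding_cache_k, available_k)
```

Under those assumptions roughly half of the guest's RAM was still available and no swap was in use, so the figures do not by themselves point to memory exhaustion.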
10:22<orionrobots>Mostly using it to test LAMP stuff.
10:26|-|baroni [] has quit [Quit: :wq]
12:05|-|orionrobots [~danny@] has left #uml []
12:22|-|baroni [] has joined #uml
12:29|-|baroni [] has quit [Remote host closed the connection]
12:47|-|baroni [] has joined #uml
13:06|-|HuK0B [~HuK0B@] has joined #uml
13:38|-|tyler [~tyler@] has quit [Ping timeout: 480 seconds]
14:03|-|tyler [~tyler@] has joined #uml
14:06|-|tyler [~tyler@] has quit [Remote host closed the connection]
14:11|-|tyler [~tyler@] has joined #uml
14:17|-|ousado__ [] has joined #uml
14:24|-|ousado_ [] has quit [Ping timeout: 480 seconds]
14:40|-|ram [] has joined #uml
15:36|-|tyler [~tyler@] has quit [Read error: Connection reset by peer]
15:44|-|tyler [~tyler@] has joined #uml
15:48|-|tyler [~tyler@] has quit [Read error: Connection reset by peer]
16:04|-|tyler [~tyler@] has joined #uml
16:26|-|tyler [~tyler@] has quit [Read error: Connection reset by peer]
16:27|-|tyler [~tyler@] has joined #uml
17:01|-|HuK0B [~HuK0B@] has quit [Read error: Connection reset by peer]
17:06|-|ram [] has quit [Read error: Operation timed out]
18:05|-|tyler [~tyler@] has quit [Ping timeout: 480 seconds]
19:42|-|ousado_ [] has joined #uml
19:49|-|ousado__ [] has quit [Ping timeout: 480 seconds]
20:03|-|mjf [] has quit [Quit: leaving]
21:08|-|ram [] has joined #uml
21:43|-|ousado__ [] has joined #uml
21:50|-|ousado_ [] has quit [Ping timeout: 480 seconds]
22:58|-|VS_ChanLog [] has left #uml [Rotating Logs]
22:58|-|VS_ChanLog [] has joined #uml
23:01|-|baroni [] has quit [Quit: :wq]
23:04|-|ram [] has quit [Read error: Operation timed out]
23:30|-|flatronf700B [~flatronf7@] has joined #uml
---Logclosed Sun Feb 25 00:00:50 2007