#uml IRC Logs for 2007-02-19

--- Log opened Mon Feb 19 00:00:38 2007
00:09|-|cmantito [] has joined #uml
01:41|-|Netsplit <-> quits: mode2, fo0bar, baroni, Nem^
01:42|-|Netsplit over, joins: Nem^, baroni, fo0bar, mode2
01:42|-|fo0bar [] has quit [Remote host closed the connection]
01:43|-|fo0bar [] has joined #uml
02:12|-|motp [~motp@] has joined #uml
02:37|-|motp [~motp@] has quit [Quit: Leaving]
03:16|-|tyler [~tyler@] has joined #uml
03:29|-|baroni [] has quit [Remote host closed the connection]
04:38|-|Netsplit <-> quits: peterz
04:40|-|Netsplit over, joins: peterz
04:46|-|ousado__ [] has joined #uml
04:52|-|ousado_ [] has quit [Ping timeout: 480 seconds]
05:01|-|tyler [~tyler@] has quit [Ping timeout: 480 seconds]
05:03|-|tyler [~tyler@] has joined #uml
05:48|-|pgstudy [] has joined #uml
06:00|-|kokoko1 [~Slacker@] has joined #uml
06:18|-|mode2_ [] has joined #uml
06:25|-|mode2 [] has quit [Ping timeout: 480 seconds]
06:49|-|alb [] has quit [Read error: Connection reset by peer]
06:53|-|albertito [] has joined #uml
06:57|-|tyler [~tyler@] has quit [Read error: Connection reset by peer]
07:05|-|tyler [~tyler@] has joined #uml
07:12|-|tyler [~tyler@] has quit [Read error: Connection reset by peer]
07:17|-|tyler [~tyler@] has joined #uml
07:45|-|baroni [] has joined #uml
09:07|-|silug_ [~steve@] has quit [Read error: Connection reset by peer]
10:01|-|DeHackEd [] has joined #uml
10:25|-|jdike [] has joined #uml
10:25<jdike>Hi guys
10:28<DeHackEd>I'm having one of those kernel-wont-go-after-mounting-rootfs hanging problems. I don't think it's due to too new/old a system... attempts to debug are not going so hot, gdb can't seem to properly set a breakpoint
10:30<DeHackEd>host + SKAS3 (using skas0 for debugging) + supermount, guest 2.6.20 vanilla
10:31<jdike>Can you paste the boot log someplace?
10:34<jdike>It's sucking up CPU time at the hang?
10:34<DeHackEd>it is
10:35<jdike>I need access to a system on which this happens to figure out what's happening
10:35<jdike>caker said he could provide this
10:36<DeHackEd>apparently I also have one.
10:36<jdike>I don't see that here, and I'm running more or less the same versions of everything
10:36<caker>good morning
10:36<caker>another data point: I'm seeing this happen with 2.6.18-um through 2.6.20-um on a 2.6.16 *host*, which is interesting
10:36<DeHackEd>Host is Gentoo with all the fixings. Guest is fedora core 6. I also tried specifying init=busybox which is statically linked without nptl.
10:36<caker>jdike: I'll set up a machine right now for ya
10:37<DeHackEd>you can use mine. no root, but no serious security limitations in place either.
10:37<jdike>I don't think it's nptl
10:37<DeHackEd>elimination of tls was something that worked in the past. had to try it.
10:37<jdike>I briefly looked at this, and init is segfaulting strangely early on
10:38[~]caker can never remember adduser or useradd and which one creates home dirs and whatnot
10:38<caker>for the record, it's adduser
10:38<jdike>one exists and one doesn't, so I use the one that exists
10:38<jdike>and if both exist, I am screwed
10:38<DeHackEd>I made a little patch to print system calls made by user space, one line printf addition. I did get some output. The system call it died on is #91 (munmap) after some mprotects
10:39<jdike>init is reading a 0 from its stack and dereferencing it
10:40<jdike>I didn't figure out what that 0 was really supposed to be
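
The syscall numbers discussed in this log can be decoded with a small lookup. This is a minimal sketch that assumes the 32-bit x86 (i386) syscall table, which is consistent with #91 being munmap; the table subset, the sample trace, and the name `decode_trace` are illustrative and are not taken from DeHackEd's patch.

```python
# Subset of the i386 (32-bit x86) Linux syscall table; illustrative only.
I386_SYSCALLS = {
    3: "read",
    5: "open",
    6: "close",
    91: "munmap",
    125: "mprotect",
    192: "mmap2",
    243: "set_thread_area",
}


def decode_trace(numbers):
    """Map raw syscall numbers to names, keeping unknown ones visible."""
    return [I386_SYSCALLS.get(n, f"unknown({n})") for n in numbers]


if __name__ == "__main__":
    # Hypothetical trace shaped like the one described in the log.
    print(decode_trace([5, 3, 192, 6, 125, 125, 91]))
```

Unknown numbers are kept in the output rather than dropped, so a gap in the table shows up as `unknown(N)` instead of silently shortening the trace.
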
10:40|-|richardw [] has joined #uml
10:41<caker>jdike: for debugging this -- is space available inside the UML's filesystem important? Otherwise, I'll just copy the template FS with little free space without resizing
10:41[~]caker installs emacs for you
10:41<jdike>init is dying so early, you could probably make an fs with init + a few libraries and that would do
10:41<caker>x11-common required ?? geesh
10:41<jdike>it'll feel just like home
10:42<jdike>I was debugging from wstearns box, using xemacs over the internet with good interactivity
10:42<jdike>not too different from the local lan
10:43<caker>uh, so how does one exit emacs? :)
10:43[~]caker n00b
10:44[~]jdike has to have the fingers pretend to do it, and watch them, to answer such questions
10:44<jdike>the fingers know, the brain has forgotten
10:48|-|baroni [] has quit [Remote host closed the connection]
10:53<jdike>(mis)behaves as advertised
10:53<DeHackEd>so progress is being made?
10:53<jdike>no symbols
10:53<jdike>where's the source tree?
10:54<caker>on another box -- just grab vanilla and recompile?
10:54<jdike>so it doesn't depend on the configuration at all?
10:55<caker>I really don't change many config options between versions, but you can snarf the config from that existing kernel to make sure
10:55<jdike>100%[====================================>] 43,375,937 10.96M/s ETA 00:00
10:55<jdike>11M/s, me likee
10:59<caker>jdike: also, make -j8 :)
11:00<jdike>hehe, OK
11:00<jdike>I usually do -j 4
11:00<DeHackEd>should I be envious of your many CPUs and high speed internet?
11:01<caker>weird ..take a look at vmstat 1 on that box
11:01<caker>seems like all the UMLs are in D+ state
11:01<caker>but IO is fine
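
A "D" in the `ps` STAT column marks uninterruptible sleep, and a trailing "+" only means the process is in the foreground process group. The sketch below shows one way to list such processes; it assumes text shaped like the output of `ps -eo pid,stat,comm`, and the sample rows and the function name `uninterruptible` are made up for illustration.

```python
def uninterruptible(ps_output):
    """Return (pid, command) for each process whose state starts with 'D'."""
    stuck = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.strip().split(None, 2)
        if len(fields) < 3:
            continue  # ignore malformed or truncated rows
        pid, stat, command = fields
        if stat.startswith("D"):
            stuck.append((int(pid), command.strip()))
    return stuck


# Made-up sample in the layout of `ps -eo pid,stat,comm`.
SAMPLE = """\
  PID STAT COMMAND
 4321 D+   linux
 4322 S    linux
 4400 Ds   pdflush
"""

if __name__ == "__main__":
    for pid, command in uninterruptible(SAMPLE):
        print(pid, command)
```
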
11:02|-|silug_ [~steve@] has joined #uml
11:02<caker>DeHackEd: hehe
11:03<jdike>can you tell if the UMLs are OK?
11:03<caker>I don't think so -- people coming into #linode saying no
11:04<jdike>there's nothing in dmesg
11:05[~]DeHackEd tries a 2.6.17.x guest on the same machine.
11:05<jdike>very little I/O activity
11:06<DeHackEd>100%[++++++++++++++++++==================>] 41,285,784 1.12M/s ETA 00:00
11:08<DeHackEd>arch/um/os-Linux/sys-i386/registers.c:137: error: 'JB_PC' undeclared (first use in this function) somebody hates me.
11:09<jdike>that's been fixed more recently
11:09<jdike>UML used jumpbuf access macros which were later removed
11:11<jdike>caker, I see fewer D's
11:11<jdike>did things just clear up?
11:12<caker>ok, the BBU failed on the RAID card and it turned off cache
11:13<jdike>why is dmesg so short?
11:13<jdike>only 1-2 lines
11:14<jdike>UML compiles right quick
11:16<caker>I cleared dmesg ...
11:17<ousado__>hi, I'm testing a new kernel feature (kevent by Evgeniy Polyakov) in UML, but it's not included into kernel yet, so I have to use direct syscalls in the test-apps. gcc complains about various conflicting includes, are there any general hints for compiling user-space apps that are using direct syscalls in UML? (missing ARCH=um is not the problem)
11:21<jdike>i.e. using syscallX() in a userspace program?
11:22<jdike>that's not UML-related at all
11:22<jdike>syscallX is going away, or has gone away
11:22<jdike>I had to remove them from UML
11:23<jdike>use the syscall() function instead
11:24<ousado__>ah, ok, that's what I'm doing ..
11:29<jdike>so, what's gcc complaining about?
11:31<ousado__>It's complaining about conflicting definitions in standard includes vs. build specific includes, which it doesn't outside UML
11:33|-|HuK0B^OUT [~HuK0B@] has joined #uml
11:34<jdike>it's got to be a distro-related thing
11:34<jdike>it can't have anything to do with UML itself
11:35<jdike>if you just want the thing to run, build on the host and copy it into the UML
11:35<HuK0B^OUT>skas-fremap-2.6.18-v9-pre9 --> what is this
11:36<jdike>a UML speedup that BB has been working on
11:39|-|ram [] has joined #uml
11:42<DeHackEd>sysrq t doesn't list init or busybox... that might add to the "hosed stack" theory..
11:44<jdike>I'm seeing init branching to a bogus address
11:45<jdike>caker, how about quilt?
11:48<ousado__>jdike: thanks anyway
11:50<DeHackEd>anything I can do to assist?
11:52<jdike>I'm about to single-step init to get an instruction trace of it
11:52<jdike>to find out where it was just before it branched to oblivion
11:59<caker>jdike: done (quilt)
11:59<caker>well, in 5 seconds, anyway...
11:59<jdike>-bash: quilt: command not found
11:59<jdike>heh, OK
12:00<caker>jdike: want some pizza? My treat :)
12:00[~]caker wants pizza, too
12:01<jdike>might be a bit cold when it gets here :-)
12:01<caker>there's this thing called delivery -- they make it near you then drive it over
12:01<caker>FREE PIZZA FOR #UML
12:05<jdike>caker, weren't you complaining about needing X for xemacs? Did you end up just installing plain emacs?
12:06<caker>jdike: I did just Ubuntu's emacs package -- and it wanted x11-common
12:06<caker>need xemacs?
12:06[~]jdike copies his .emacs over
12:06<jdike>I'd like to see if I get decent response with it
12:08<caker>jdike: ok, xemacs21 installing, eta 3min (slow mirror)
12:13<DeHackEd>doesn't work with 2.6.19 guest either...
12:13[~]kokoko1 halting one UML, client no longer wants it
12:13<caker>DeHackEd: nope ... what's your host kernel?
12:13<DeHackEd> + SKAS3 + supermount
12:14<caker>ok .. any host kernel >= 2.6.17 with any 2.6 UML kernel [that I tried -- at least 2.6.16 and up] hangs for me
12:14[~]kokoko1 successfully upgraded 5 UMLS and host to fc6 and rebooted :D
12:15<DeHackEd>this is one machine. my local machine is host 2.6.20 (same patches) and guest 2.6.19.x and working.
12:15<kokoko1>still using + skas3 on host
12:15<caker>DeHackEd: same distribution in the guest? Because for me it's only happening with Ubuntu 6. and I think Gentoo
12:15<caker>works with everything else
12:15<DeHackEd>no, the guests are different. gentoo vs fc6
12:16<DeHackEd>though I tried sending a statically linked gentoo busybox into fc6 and use that as init= and no go
12:16<DeHackEd>what I have determined is that the init process does get some work done. A few system calls (open, read, mmap2, close) occur, it looks like the dynamic linker doing its thing. then it hangs.
12:17<jdike>DeHackEd, right
12:34<DeHackEd>just to piss me off, copying the working guest kernel to the not-working machine doesn't work AND network mounting the filesystem image and running the UML from here WORKS
12:42<jdike>my single-stepper just worked the first time after reproducing it from memory
12:42<jdike>it took me an hour to get it working before
12:47<jdike>the offending instruction is 0x083af25b: call *%gs:0x10
12:53<DeHackEd>is that a TLS related thing?
12:54<jdike>there is a set_thread_area call before the hang too
12:54<jdike>I thought tls used %fs though
13:03<jdike>guess #1 as to the fix doesn't pan out
13:04|-|arun [~arun@chobie.cs.Virginia.EDU] has quit [Server closed connection]
13:04|-|arun [~arun@chobie.cs.Virginia.EDU] has joined #uml
13:05<jdike>debugging operations interrupted by
13:05<DeHackEd>a perfectly normal interruption
13:06<DeHackEd>that I don't recognize
13:06|-|strontian [] has quit [Server closed connection]
13:19<DeHackEd>if not a fix, can you recommend a workaround?
13:25|-|VS_ChanLog [] has quit [Server closed connection]
13:35|-|VS_ChanLog [] has joined #uml
13:54[~]jdike suitably refueled
14:07|-|strontian [] has joined #uml
14:21|-|tyler_ [~tyler@] has joined #uml
14:23|-|tyler [~tyler@] has quit [Read error: Connection reset by peer]
14:32[~]DeHackEd wants to help but doesn't have a clue how to proceed.
14:35<caker>DeHackEd: wait and see what Jeff discovers...
14:35<caker>DeHackEd: or, revert to 2.6.16.x on the host
14:47<DeHackEd>sorry, I need some drivers only in 2.6.19
14:47<jdike>note to self - changes only take effect after recompilation
14:56|-|HuK0B^OUT [~HuK0B@] has quit []
15:11<jdike>well, the segment registers seem to be OK
15:11<jdike>just one of them points to an address with a ridiculous value in it
15:12|-|ousado_ [] has joined #uml
15:17[~]caker gives jdike a highfive
15:18[~]jdike still needs to listen to it :-)
15:19|-|ousado__ [] has quit [Ping timeout: 480 seconds]
16:08|-|richardw_ [~richardw@] has joined #uml
16:14|-|pgstudy [] has quit [Quit: Konversation terminated!]
16:15|-|richardw [] has quit [Ping timeout: 480 seconds]
17:12<DeHackEd>would hardware make a difference?
17:14|-|tyler_ [~tyler@] has quit [Remote host closed the connection]
17:24|-|richardw_ [~richardw@] has quit [Quit: Leaving]
17:29<jdike>found it
17:30<jdike>vdso support got broke
17:30<DeHackEd>glibc ?
17:31<jdike>no, it's a kernel thing, with libc support
17:36<DeHackEd>so, is there a patch for it? I'm quite interested.
17:36<jdike>not yet, I need to figure out what's happening
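
The kernel advertises the vDSO to userspace through the auxiliary vector: `AT_SYSINFO` (32) carries the vsyscall entry point on i386 and `AT_SYSINFO_EHDR` (33) the address of the vDSO image. On i386, glibc keeps the `AT_SYSINFO` value in the thread control block and calls through it, which would fit the `call *%gs:0x10` seen earlier, though the log itself does not spell out that link. The sketch below decodes a raw auxiliary vector; it assumes little-endian data, and the addresses in the sample blob are invented.

```python
import struct

AT_NULL = 0           # terminator entry
AT_SYSINFO = 32       # vsyscall entry point (i386)
AT_SYSINFO_EHDR = 33  # address of the vDSO ELF image


def parse_auxv(raw, word_size=4):
    """Decode a little-endian auxiliary vector blob into a {type: value} dict."""
    fmt = "<II" if word_size == 4 else "<QQ"
    entries = {}
    for key, value in struct.iter_unpack(fmt, raw):
        if key == AT_NULL:
            break  # entries after the terminator are not part of the vector
        entries[key] = value
    return entries


if __name__ == "__main__":
    # Synthetic 32-bit vector; the addresses are made up for illustration.
    blob = struct.pack(
        "<6I", AT_SYSINFO, 0xFFFFE400, AT_SYSINFO_EHDR, 0xFFFFE000, AT_NULL, 0
    )
    auxv = parse_auxv(blob)
    print(hex(auxv[AT_SYSINFO]), hex(auxv[AT_SYSINFO_EHDR]))
```

On a live system the same function could be fed the bytes of `/proc/self/auxv`, with `word_size=8` for a 64-bit process.
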
18:40<jdike>OK, who wants a patch?
18:42<jdike>caker, and what's the root password for this filesystem?
18:43[~]DeHackEd raises his hand for the patch
18:46<DeHackEd>I'll have this up in just a minute.
18:49<caker>jdike: who knows -- may I suggest booting with init=/bin/bash; mount -o remount,rw /; passwd root
18:56<caker>wow, cool
18:56<caker>I'll build one tonight and give it a shot
18:56<DeHackEd>oh, nope... it seems to have bombed out after trying to login to my shell...
18:57<DeHackEd>okay, it's just going unresponsive every now and then
18:57<DeHackEd>I have a few soft lockup warnings on my kernel log
18:57<caker>DeHackEd: I've always just turned off softlockups for UML -- pausing the UML will do that, too
18:58<DeHackEd>for debugging purposes I would agree. I logged in and both the UML and userspace thread ate huge amounts of CPU. I think the total exceeded 100%.
18:59[~]caker builds one now
19:00<caker>huh . down?
19:00<caker>there we go
19:10<caker>pastebin ate the @@ in the patch, but the "download" link is correct
20:04|-|jdike [] has quit [Quit: Leaving]
20:04<DeHackEd>the CPU problem seems to be related to one of the IO threads... I'll figure this one out myself.
21:13|-|ousado__ [] has joined #uml
21:21|-|ousado_ [] has quit [Ping timeout: 480 seconds]
21:21|-|rasix [~jeruk@] has joined #uml
21:44<DeHackEd>for whenever jdike returns, something else is up. this time it's massive CPU usage by some processes. not the main process or the userspace process, but one of the others. some kinda IO thing I guess...
21:45<caker>DeHackEd: is the guest doing IO?
21:46<caker>also, is UBD_SYNC turned off in your .config
21:46<DeHackEd>I believe I turned it off in the last rebuild, and I think it's applied.
21:46<caker>and is /tmp mounted as tmpfs
21:46<caker>UBD_SYNC should be off, btw
21:46<DeHackEd>the last listed process is eating 100% CPU sometimes, almost all of it kernel time. the directory with the memory image is TMPFS and there's plenty of RAM for it.
21:48<caker>DeHackEd: ./linux --showconfig | grep CONFIG_BLK_DEV_UBD_SYNC ?
21:48<DeHackEd>this is the system I made with jdike's "fix crash on startup" patch from earlier today
21:48<DeHackEd>is not set
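
The `grep` check above can also be scripted. This is a minimal sketch that assumes kernel-style config text, where a disabled option appears as a `# CONFIG_... is not set` comment; the function name `config_state` and the sample config are illustrative.

```python
import re


def config_state(config_text, option):
    """Return the option's value, "n" if explicitly unset, or None if absent."""
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset = f"# {option} is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset:
            return "n"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


# Made-up config fragment in the usual kernel .config layout.
SAMPLE_CONFIG = """\
CONFIG_BLK_DEV_UBD=y
# CONFIG_BLK_DEV_UBD_SYNC is not set
CONFIG_BLK_DEV_LOOP=m
"""

if __name__ == "__main__":
    print(config_state(SAMPLE_CONFIG, "CONFIG_BLK_DEV_UBD_SYNC"))
```

The text could come from the output of `./linux --showconfig` as used in the log. Matching the full option name avoids confusing `CONFIG_BLK_DEV_UBD` with `CONFIG_BLK_DEV_UBD_SYNC`, which a loose substring search would do.
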
21:48<caker>it's running OK for me, but on another host kernel which wasn't affected by the bug
21:56<DeHackEd>it's running, just really freaking slowly. something's wrong with IO in recent kernels, I swear...
22:40|-|Nem^1 [] has joined #uml
22:47|-|Nem^ [] has quit [Ping timeout: 480 seconds]
22:47|-|Nem^1 changed nick to Nem^
22:59|-|VS_ChanLog [] has left #uml [Rotating Logs]
22:59|-|VS_ChanLog [] has joined #uml
23:04|-|dgraves [] has quit [Server closed connection]
23:04|-|dgraves [] has joined #uml
23:06|-|rasix [~jeruk@] has quit [Quit: Leaving]
--- Log closed Tue Feb 20 00:00:43 2007