#uml IRC Logs for 2007-10-15

---Logopened Mon Oct 15 00:00:56 2007
00:07|-|mgross [] has quit [Quit: Ex-Chat]
00:08|-|mgross [] has joined #uml
00:12|-|Ice [] has joined #uml
00:12|-|Ice [] has quit []
01:25|-|aroscha [] has joined #uml
01:28|-|aroscha [] has quit []
01:49|-|aroscha [] has joined #uml
01:52|-|mgross [] has quit [Read error: Operation timed out]
02:47|-|aroscha [] has quit [Quit: aroscha]
03:04|-|aroscha [] has joined #uml
03:18|-|tyler29 [] has joined #uml
03:29|-|aroscha [] has quit [Quit: aroscha]
04:00|-|tyler29 [] has quit [Ping timeout: 480 seconds]
04:06|-|aroscha [] has joined #uml
04:14|-|balbir [~balbir@] has joined #uml
04:22|-|tyler29 [] has joined #uml
04:29|-|balbir [~balbir@] has quit [Ping timeout: 480 seconds]
05:31|-|krau [~cktakahas@] has quit [Quit: Varei!!!]
06:54|-|tyler29 [] has quit [Ping timeout: 480 seconds]
07:10|-|tyler29 [] has joined #uml
07:38|-|krau [~cktakahas@] has joined #uml
07:42|-|tyler29 [] has quit [Ping timeout: 480 seconds]
07:59|-|tyler29 [] has joined #uml
07:59|-|dang [] has joined #uml
08:11|-|balbir [~balbir@] has joined #uml
08:26|-|arun [] has joined #uml
08:32|-|mgross [] has joined #uml
08:41|-|Urgleflogue [~plamen@] has quit [Ping timeout: 480 seconds]
08:57|-|hfb [] has joined #uml
09:18|-|balbir [~balbir@] has quit [Ping timeout: 480 seconds]
09:30|-|balbir [~balbir@] has joined #uml
09:32|-|karol [] has joined #uml
09:32|-|karol changed nick to Magotari
09:32|-|mgross [] has quit [Ping timeout: 480 seconds]
09:34|-|tyler29 [] has quit [Ping timeout: 480 seconds]
09:35|-|aroscha [] has quit [Quit: aroscha]
09:45|-|balbir [~balbir@] has quit [Read error: Operation timed out]
09:52|-|tyler29 [] has joined #uml
09:58|-|balbir [~balbir@] has joined #uml
10:11|-|tyler29 [] has quit [Ping timeout: 480 seconds]
10:20|-|balbir [~balbir@] has quit [Ping timeout: 480 seconds]
10:25|-|tyler29 [] has joined #uml
10:26|-|mgross [] has joined #uml
10:32|-|balbir [~balbir@] has joined #uml
10:38|-|Netsplit <-> quits: dsoul, linbot, albertito
10:39|-|Netsplit over, joins: dsoul, linbot, albertito
10:42|-|balbir [~balbir@] has quit [Read error: Operation timed out]
10:54|-|balbir [~balbir@] has joined #uml
11:20|-|jdike [] has joined #uml
11:20<jdike>Hi guys
11:20|-|balbir [~balbir@] has quit [Ping timeout: 480 seconds]
11:22<caker>jdike: hola
11:22<caker>jdike: what's new?
11:24|-|aroscha [] has joined #uml
11:24<jdike>Figuring out why crashme works so well
11:24<jdike>and how to stop it
11:25<Magotari>Hi, jdike.
11:32|-|balbir [~balbir@] has joined #uml
11:32|-|aroscha [] has quit [Quit: aroscha]
11:36<jdike>do_syscall_stub : ret = 6, offset = -1080299524, data = 09398ffc
11:36<jdike>do_syscall_stub: syscall 0 failed, return value = 0x6, expected return value = 0x0
11:36<jdike> syscall parameters: 0xfd37 0x201000c 0x2e 0xfa01 0x2020ff4 0x2e2e
11:37<jdike>this is apparently not related to the signal 10 thing
11:37<jdike>coz I fixed that, and this is still happening
11:44|-|aroscha [] has joined #uml
11:45|-|dang [] has quit [Read error: Operation timed out]
11:59|-|dang [] has joined #uml
12:15<dgraves>jdike: it's been 2 weeks now!
12:15<dgraves>you're getting slow in your old age. :)
12:15<jdike>since when?
12:15<jdike>since Magotari reported it?
12:16<dgraves>actually, it's only been two weeks in my time accelerated world.
12:16<dgraves>i need less sugar, i think.
12:16[~]jdike waves his cane at dgraves
12:16<dgraves>careful, old man. you'll knock your glasses off!
12:17<jdike>but a few days since I was able to reproduce it myself
12:21<Magotari>Yeah, crashme is pure genius.
12:21<Magotari>Is the crash also non-deterministic for you?
12:21<Magotari>Crashme is supposed to be predictable.
12:23<dgraves>what's crashme?
12:27<jdike>it's not
12:27<jdike>it requires a load for some reason
12:27<jdike>crashme generates random data, then executes it
12:28<jdike>very good at exposing OS/hardware bugs
12:28<jdike>or at least the crash is non-deterministic
12:31<Magotari>jdike: For the same input crashme will generate the same random data.
12:31<Magotari>Thus by entering the last thing before the crash should crash the OS at once.
12:31<Magotari>This is not true of UML though.
12:32<Magotari>I had to enter it many times. It generated the same program many times. Sometimes I got a message, sometimes it crashed, mostly nothing.
12:33<jdike>yes, deterministic in that sense
12:34<jdike>but not deterministic in the sense that I care about
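The determinism Magotari describes is the ordinary property of a seeded pseudo-random generator: the same seed yields the same byte sequence. The sketch below shows only that property, using Python's random.Random as a stand-in; crashme has its own generator, and the function name random_program is hypothetical. Identical generated bytes say nothing about whether the host and UML react identically, which is the non-determinism jdike is chasing.

```python
import random


def random_program(seed: int, length: int) -> bytes:
    """Hypothetical stand-in for crashme's generator: a seeded PRNG
    produces the same bytes every time it is given the same seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


first = random_program(1234, 16)
second = random_program(1234, 16)
print(first == second)  # True: same seed, same "program"
```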
12:35|-|ElectricElf [] has joined #uml
12:39|-|Electric1lf [~dbharris@] has quit [Ping timeout: 480 seconds]
12:48|-|tyler29 [] has quit [Ping timeout: 480 seconds]
13:01|-|tyler29 [] has joined #uml
13:08|-|remus [~remus@] has joined #uml
13:22<dgraves>where does one get crashme?
13:23[~]caker gives himself a highfive
13:24|-|tyler29 [] has quit [Ping timeout: 480 seconds]
13:25<Magotari>dgraves: Go to debian. They have the source and the patches to make it work on a modern system.
13:27<dgraves>::ROFL:: @ caker.
13:34|-|tyler29 [] has joined #uml
13:38|-|remus [~remus@] has quit [Ping timeout: 480 seconds]
13:40<jdike>apparently a single instruction can generate both a SIGSEGV and a SIGTRAP
13:48<ds2>single stepping a mov instruction that uses a NULL pointer? =)
13:48<jdike>I would handle single-stepping correctly
13:49<jdike>I spent some time seeing if crashme had somehow induced single-stepping
13:50<jdike>set-gs ; jmp somewhere
13:50<jdike>The intel docs say that set-gs is undefined when followed by a branch
13:51|-|remus [~remus@] has joined #uml
13:51<jdike>when somewhere needs to be faulted in, I'm guessing that causes the segfault and the undefinedness of the whole thing causes the trap
13:52<jdike>not positive though
13:52<ds2>is that an errata?
13:52<jdike>you mean the docs?
13:52<ds2>or updates as intel calls them
13:53<jdike>standard volume 2
13:53<jdike>the fix failed spectacularly
13:53<ds2>interesting... is a double fault considered a SIGTRAP?
13:54|-|mgross [] has quit [Ping timeout: 480 seconds]
13:55<jdike>the odd thing is, I would expect any undefined instructions to generate a SIGSEGV, not a SIGTRAP
13:55<ds2>why SIGSEGV not a SIGILL?
13:56<jdike>or a SIGILL
13:56<jdike>well, defined instruction, but undefined behavior
13:56<ds2>(been working on non-x86 lately so if it is not supported, sorry)
13:57<ds2>but if linux maps a double fault to a SIGTRAP it may make sense
13:59<jdike>except that I still get the SIGSEGV
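The "set-gs ; jmp somewhere" pattern is a GS segment-override prefix (byte 0x65) placed directly in front of a branch; the Intel SDM volume 2 treats segment-override prefixes on branch instructions as reserved. The sketch below scans a byte buffer for that pattern, covering only a few common branch opcodes (jmp rel8/rel32, call rel32, Jcc rel8/rel32). It is a naive byte scan, not a disassembler: it ignores instruction boundaries and may report matches inside other instructions' operands. Function names are hypothetical.

```python
GS_PREFIX = 0x65  # GS segment-override prefix byte


def is_branch_opcode(code: bytes, i: int) -> bool:
    """Rough check for a few common x86 branch opcodes starting at code[i]."""
    op = code[i]
    if op in (0xEB, 0xE9, 0xE8):  # jmp rel8, jmp rel32, call rel32
        return True
    if 0x70 <= op <= 0x7F:  # Jcc rel8
        return True
    # Jcc rel32 uses the two-byte opcode 0x0F 0x80..0x8F
    return op == 0x0F and i + 1 < len(code) and 0x80 <= code[i + 1] <= 0x8F


def gs_before_branch(code: bytes) -> list[int]:
    """Offsets where a GS prefix immediately precedes a branch opcode."""
    return [i for i in range(len(code) - 1)
            if code[i] == GS_PREFIX and is_branch_opcode(code, i + 1)]


# nop; gs jmp $ (jmp rel8 with displacement -2)
print(gs_before_branch(bytes([0x90, 0x65, 0xEB, 0xFE])))  # [1]
```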
14:16[~]jdike considers a more diabolical possibility
14:17<jdike>when the stub runs, its first instruction is considered to have the set-gs prefix, and it SIGTRAPs
14:33<Magotari>jdike: I know the time is not yet ripe but... in the future, shall I keep going to the newest -mm before applying a patch, or stick to 2.6.23-rc8-mm1 ?
14:33<jdike>the newest -mm
14:33<jdike>that's where I usually am
14:46|-|Iceman [] has joined #uml
14:47<Iceman>Caker: are u there ??
14:52|-|Iceman [] has quit []
14:57<peterz>jdike: I messed up my kernel, and am getting spurious segfaults on addr 0. Thing is, they appear to come from userspace, even though I know userspace is good - it doesn't crash without my funny patches
14:57<peterz>jdike: any ideas on where to go look?
14:57<peterz>yeah, neat indeed
14:57<peterz>never encountered such a mess
14:57<jdike>what do your patches do?
14:58<peterz>it's my swap over nfs work
14:58<peterz>once I enable nfs swap it does this
14:58<peterz>with regular bdev swap it all works
14:58<jdike>OK, are you swapping zeros in instead of whatever used to be there?
14:59<peterz>hmm, there is an idea
14:59[~]peterz goes to put some sanity checks in swap_readpage()
15:00<peterz>thanks for the idea
15:00<jdike>just look at the page resulting from the swapin
15:00<jdike>if there's something grossly wrong with it, that'll be obvious
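jdike's first suggestion is to inspect each page right after swap-in and see whether it came back zero-filled. In the kernel that check would be C inside the swap-in path; the Python sketch below only models the test itself. The 4 KiB page size is an assumption (the usual size on i386 and x86_64), and looks_zero_filled is a hypothetical name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386/x86_64


def looks_zero_filled(page: bytes) -> bool:
    """True if a swapped-in page consists entirely of zero bytes."""
    if len(page) != PAGE_SIZE:
        raise ValueError(f"expected {PAGE_SIZE} bytes, got {len(page)}")
    return not any(page)


print(looks_zero_filled(bytes(PAGE_SIZE)))  # True
```

A zero page is not always wrong (freshly touched anonymous memory is zero), so in practice this check is most useful combined with knowledge that the page was non-zero when it was swapped out.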
15:03<jdike>also, with UML there are some other tricks you might play
15:03<peterz>do tell :-)
15:04<jdike>like keep a swapfile on the host and read/write pages there as well as to/from nfs
15:04<jdike>and compare on swapin
15:04<peterz>funny idea
15:05<peterz>would need some co-ordination wrt the end_writeback notification
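The host-swapfile trick amounts to keeping a shadow copy of every swapped-out page and comparing it against what comes back from NFS on swap-in. The sketch below models that with two in-memory dictionaries; the class and method names are hypothetical, and the real version would live in the UML swap path writing to a file on the host.

```python
class ShadowSwap:
    """Hypothetical debugging aid: mirror each swapped-out page into a
    second store (standing in for a host-side swapfile) and compare the
    two copies on swap-in."""

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}  # stands in for swap over NFS
        self.shadow: dict[int, bytes] = {}   # stands in for the host swapfile

    def swap_out(self, slot: int, page: bytes) -> None:
        # Both writes complete before this returns, so the shadow copy
        # is never older than the primary one.
        self.primary[slot] = bytes(page)
        self.shadow[slot] = bytes(page)

    def swap_in(self, slot: int) -> bytes:
        got = self.primary[slot]
        if got != self.shadow[slot]:
            raise RuntimeError(f"swap slot {slot}: page differs from shadow copy")
        return got
```

Because this model writes both copies synchronously, it sidesteps peterz's concern; with asynchronous writeback the shadow write would have to finish before end-of-writeback is signalled, or a quick swap-in could compare against a stale shadow copy.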
15:10[~]jdike is now getting a SIGTRAP on a nop
15:10<peterz>how'd you manage that?
15:10<peterz>single stepping a nop?
15:11<jdike>beats the s**t out of me
15:11<jdike>no, I know how to do single-stepping
15:11<jdike>Here's the story as far as I've figured it out
15:12[~]Magotari is curious.
15:12<jdike>crashme installs a SIGFAULT handler
15:12<jdike>it generates and executes a gs; br somewhere
15:12<jdike>err, a SIGTRAP handler, I think
15:13<jdike>the gs; br generates a SIGTRAP
15:13<jdike>UML passes that back to the handler
15:13<jdike>the sigcontext is right on a page boundary and the first instruction in the handler is a push
15:14<jdike>the next stack page is not present, so the push generates a page fault
15:15<jdike>UML allows the SIGSEGV to continue to the stub in order to get the page fault info out of that sigcontext
15:15<jdike>the first instruction in the segfault handler gets a SIGTRAP
15:15<jdike>NO MATTER WHAT IT IS
15:16<jdike>and it's not like the SIGTRAP happened at the same time as the page fault and is just being delivered on that instruction
15:17<Magotari>What a mess.
15:17<jdike>wait() says no pending signals just before I continue into the stub
15:17<jdike>so it actually is SIGTRAPing that nop
15:18<peterz>shadow of the initial trap maybe?
15:18<jdike>and somehow this is dependent on the UML being loaded
15:18<jdike>it doesn't happen with crashme running by itself
15:18<jdike>only with a kernel build running at the same time
15:19<peterz>have you tried another cpu , generation/vendor ?
15:19<jdike>I'm pondering that
15:20<peterz>_if_ that helps, you might question your esteemed colleagues
15:20<jdike>but the thing is that many many instructions have passed between the original gs and the trap that's killing me
15:20<jdike>I have a hard time seeing how it could still be in effect
15:21<Magotari>jdike: I have an idea. Running an empty loop might give different results than a kernel compile. Different load, more consistent.
15:21<jdike>esp since I explicitly put a ds at the start of the stub to override the gs
15:21<peterz>any iret or other funny state fiddling functions in between?
15:21<jdike>and a nop before that in case the ds had no effect because of immediately following the gs
15:21<jdike>hence the presence of the nop
15:22<jdike>we've been in and out of the kernel, and switching between UML and the process, so probably yes
15:23<jdike>Magotari, it actually is fairly consistent - it's always the scenario above, pretty much
15:23<jdike>just not too consistent in how long it takes to reproduce
15:25<peterz>so uml-userspace does trap, host kernel catches, generates signal, guest kernel gets signal, guest kernel faults, host kernel generates sigsegv, guest kernel catches sigsegv, and gets an extra trap
15:25<jdike>basically, yes
15:25<Magotari>I am new to all this, and the kernel especially, but kernel builds do a lot of forking and other system calls? The idea here is to see if just plain load would be a problem, or is this to do with some interference from context/process switches...
15:26<jdike>Magotari, that's the thing
15:26<jdike>there are no context switches or interrupts in the general vicinity of the crash
15:27<Magotari>Oh. I guess you are a step in front of me then. Sorry about that.
15:27<jdike>BTW, that explains the crazy data just before the crash
15:27<Magotari>You mean syscall 0 and negative huge offset and so on?
15:27<jdike>it's printing out the contents of the last faultinfo
15:28<caker>init_new_context_skas - new_mm failed, errno = -24
15:28[~]caker stabs
15:29<Magotari>caker: skas3?
15:29<caker>Magotari: yes
15:29<jdike>caker, increase the open file limit
15:29<caker># cat /proc/sys/vm/max_map_count
15:30<jdike>nope *OPEN FILE* limit
15:30<jdike>not vm area limit
15:30<jdike>whoops, OK
15:34<Magotari>Speaking of skas3. Any idea what is going on with BaisorBlade (sp?) ? I have not seen him yet, and his patches (no surprise) do not apply to the current kernels. I would really love to play with that thing, but I cannot have a kernel under 2.6.23.
15:36<jdike>He's been quiet all summer, and so far into the fall
15:37<caker># cat file-nr
15:37<caker>1760 0 1616692
15:37<caker>num of allocated fh, num of free fh (wtf 0?), max poss fh (set with file-max)
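/proc/sys/fs/file-nr holds three numbers: allocated file handles, allocated-but-unused handles, and the system-wide maximum (the value of fs.file-max). On 2.6 kernels the middle field is always reported as 0; the kernel documentation notes this is not an error and just means allocated handles exactly match used ones. A small parser, with hypothetical names:

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated
    free: int       # allocated but unused; always 0 on 2.6 kernels
    maximum: int    # system-wide limit, same as /proc/sys/fs/file-max


def parse_file_nr(text: str) -> FileNr:
    """Parse the contents of /proc/sys/fs/file-nr."""
    allocated, free, maximum = (int(field) for field in text.split())
    return FileNr(allocated, free, maximum)


print(parse_file_nr("1760 0 1616692\n"))
# FileNr(allocated=1760, free=0, maximum=1616692)
```

With only 1760 of 1616692 handles in use, the system-wide limit was clearly not the problem, which points back at the per-process limit.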
15:38<jdike>caker, look at ulimits first
15:39<caker>open files (-n) 1024
15:39[~]caker feels stupid
15:39<caker>thanks :)
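The errno -24 from new_mm is EMFILE ("Too many open files"), which is governed by the per-process RLIMIT_NOFILE (what `ulimit -n` shows), not by vm.max_map_count or fs.file-max. The sketch below maps the negative errno to its name and raises the soft limit toward a target, capped at the hard limit; raising the hard limit itself needs privilege. Function names are hypothetical; the errno and resource calls are standard-library Python on Linux.

```python
import errno
import resource


def errno_name(err: int) -> str:
    """Map a negative kernel-style errno (e.g. -24) to its symbolic name."""
    return errno.errorcode.get(-err, "unknown")


def raise_nofile_soft_limit(target: int) -> tuple[int, int]:
    """Raise RLIMIT_NOFILE's soft limit toward target, never past the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(errno_name(-24))  # EMFILE
```

The shell equivalent before starting UML is `ulimit -n <value>`; doing it in the launching process means the limit is inherited by the UML process and its children.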
15:44|-|kos_tom [] has joined #uml
15:59|-|balbir [~balbir@] has quit [Ping timeout: 480 seconds]
16:06[~]dgraves makes another offering at the altar of the almighty (and old) Jdike.
16:11[~]jdike ignores the provocation and ponders dinner
16:11<jdike>... and looks for his dentures
16:13<Magotari>Aha. I just remembered. I tried to run UML in a chroot. It would not boot, it would eat 100% CPU instead. Running with con=null solved the problem. Now, I know I was missing something in the chroot, but UML should die or something... Anything but staying there forever with 100% CPU.
16:14<Magotari>That was quite long ago, but I just remembered it.
16:19|-|aroscha [] has quit [Quit: aroscha]
16:21<Magotari>I just had to use QEMU. Yuck. Horrible. UML at least can handle my dvorak keyboard without a problem.
16:26<Magotari>Xen just does not agree with me. Lguest had a horrible crashing streak. Bochs is a configuration nightmare and don't get me started on the performance. Qemu is horrible all around except the super easy networking setup which works out of the box.
16:26<Magotari>OpenVZ is an interesting idea, but just did not do it for me.
16:26<jdike>UML needs some stuff from /proc
16:26<jdike>but should tolerate it not being there
16:27<Magotari>Let me check that setup, sec.
16:27<Magotari>There was no /proc at all.
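What Magotari is asking for is a clear failure instead of a spin when /proc is missing. This is not UML's own code: it is a hypothetical pre-flight check that a wrapper launching UML inside a chroot might run, treating the absence of /proc/self as a sign that procfs is not mounted. Inside the chroot, `mount -t proc proc /proc` is the usual fix.

```python
import os


def proc_available(proc_root: str = "/proc") -> bool:
    """Heuristic: procfs is mounted if <proc_root>/self exists."""
    return os.path.isdir(os.path.join(proc_root, "self"))


def require_proc(proc_root: str = "/proc") -> None:
    """Exit with a message instead of starting without procfs."""
    if not proc_available(proc_root):
        raise SystemExit(f"{proc_root}/self not found: is procfs mounted in this chroot?")
```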
16:40|-|tyler29 [] has quit [Ping timeout: 480 seconds]
16:41|-|dang [] has quit [Quit: Leaving.]
16:58|-|tyler29 [] has joined #uml
17:05|-|aroscha [] has joined #uml
17:09|-|tyler29 [] has quit [Ping timeout: 480 seconds]
17:17|-|Nem^ [] has quit [Read error: Operation timed out]
17:24|-|tyler29 [] has joined #uml
17:24<jdike>so, I can work around the bug by continuing the system call stub if it hasn't finished
17:24<jdike>and it just needs one extra continue, so there's no magic single-stepping involved
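The workaround can be modelled as: after the stub stops, if it has not finished, continue it once more and wait again. In the sketch below, wait_stop, cont and stub_done are hypothetical callbacks standing in for waitpid(), PTRACE_CONT and a "stub completed" check; only continuing when the stop was a SIGTRAP, and allowing at most one extra continue, are assumptions drawn from the description above, not from UML's source.

```python
import signal
from typing import Callable


def finish_stub(wait_stop: Callable[[], int],
                cont: Callable[[], None],
                stub_done: Callable[[], bool],
                max_extra: int = 1) -> int:
    """Continue a stopped stub until it finishes, tolerating up to
    max_extra spurious SIGTRAP stops. Returns the extra continues used."""
    extra = 0
    sig = wait_stop()
    while not stub_done():
        if sig != signal.SIGTRAP or extra >= max_extra:
            raise RuntimeError(f"stub stopped with signal {sig} before finishing")
        extra += 1
        cont()
        sig = wait_stop()
    return extra
```

Capping the retries keeps a genuinely stuck stub from looping forever, while still absorbing the single stray trap.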
17:28|-|Nem^ [] has joined #uml
17:30|-|kos_tom [] has quit [Remote host closed the connection]
17:33|-|tyler29 [] has quit [Ping timeout: 480 seconds]
17:37|-|Electric1lf [] has joined #uml
17:44|-|ElectricElf [] has quit [Ping timeout: 480 seconds]
17:47|-|hfb [] has quit [Quit: Leaving]
17:57|-|Nem^ [] has quit [Ping timeout: 480 seconds]
17:57<Magotari>Good luck with work everyone, I am going to sleep.
18:07|-|Nem^ [] has joined #uml
18:15|-|Nem^ [] has quit [Read error: Operation timed out]
18:25|-|aroscha [] has quit [Quit: aroscha]
18:26|-|Nem^ [] has joined #uml
18:36|-|aroscha [] has joined #uml
18:56|-|aroscha [] has quit [Quit: aroscha]
18:59|-|jdike [] has quit [Quit: Leaving]
20:44|-|albertito [] has quit [Quit: q]
20:58|-|albertito [] has joined #uml
21:03|-|remus_ [~remus@] has joined #uml
21:03|-|remus [~remus@] has quit [Read error: Connection reset by peer]
21:08|-|Nem^ [] has quit [Ping timeout: 480 seconds]
21:18|-|Nem^ [] has joined #uml
21:57|-|remus_ [~remus@] has quit [Read error: Operation timed out]
22:58|-|VS_ChanLog [] has left #uml [Rotating Logs]
22:58|-|VS_ChanLog [] has joined #uml
23:08|-|balbir [~balbir@] has joined #uml
---Logclosed Tue Oct 16 00:00:32 2007