#uml IRC Logs for 2007-03-23

---Logopened Fri Mar 23 00:00:55 2007
00:40|-|baroni [] has quit [Remote host closed the connection]
06:59|-|albertito [] has joined #uml
07:28|-|flatronf700B [~flatronf7@] has joined #uml
07:41|-|Ancalagon [] has joined #uml
07:46|-|krau [~cktakahas@] has joined #uml
08:23|-|ram [] has joined #uml
09:41|-|jdike [] has joined #uml
09:41<jdike>Hi guys
10:06|-|Ancalagon [] has quit [Quit: Chatzilla 0.9.75 [Firefox 1.0.4/20070116]]
10:21|-|hfb [] has joined #uml
10:29|-|waldner [] has joined #uml
10:59<waldner>EXT3-fs: mounted filesystem with ordered data mode.
10:59<waldner>[42949375.530000] VFS: Mounted root (ext3 filesystem) readonly.
10:59<waldner>[42949375.530000] Kernel panic - not syncing: do_syscall_stub : PTRACE_SETREGS failed, errno = -5
10:59<waldner>host 2.6.21-rc4
11:02<waldner>64-bit guest kernels from (any)
11:08<caker>!errno -5
11:08<linbot>caker: (unknown) (#-5): Unknown error 4294967291
11:11<waldner>that's encouraging...
11:12<jdike>messed up the sign on the error message, I guess
11:13<caker>!errno 5
11:13<linbot>caker: EIO (#5): Input/output error
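
Editorial note: the two bot lookups above differ only in sign. A minimal Python sketch of why "-5" came back as "Unknown error 4294967291" while "5" resolved to EIO, assuming the bot reinterprets the number as an unsigned 32-bit value (that is a guess about linbot, not something the log states). The helper name as_unsigned32 is made up for illustration.

```python
import errno
import os


def as_unsigned32(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


# A negated errno read as unsigned 32-bit gives the number the bot printed.
print(as_unsigned32(-5))  # 4294967291

# The positive value is the one that maps to a symbolic name and message.
print(errno.errorcode[5])
print(os.strerror(5))
```

Kernel code conventionally returns negated errno values, so a message printing "errno = -5" is usually reporting errno 5 with the sign left on.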
11:16<jdike>That can happen if the register set address in the parent isn't valid or it's trying to give a segment register a bogus value
11:16<jdike>#1 should be EFAULT, I would think
11:16<jdike>but it isn't
11:16|-|ram [] has quit [Ping timeout: 480 seconds]
11:16<waldner>is that a host or guest problem?
11:26|-|ram [] has joined #uml
11:32<waldner>how can I avoid that?
11:34<jdike>although I don't see how to download that as plain text
11:35<jdike>use that instead
11:35<jdike>and let me know what it says
11:35<jdike>just more debugging
11:41<waldner>ok, I'm going to try that
11:57<waldner>ok, here it is:
12:01<waldner>if you need more info, just ask
12:18<jdike>waldner, willing to be a gdb bot for me?
12:19<waldner>well, let's try :)
12:20<jdike>OK, another patch for you
12:24<waldner>even if the patches are meant for 2.6.17, I can still (manually editing) apply them to, right?
12:25<waldner>I ask because is the version for which I have the source at hand
12:26<jdike>the filename says 2.6.17 because I update that directory in place without renaming it
12:26<jdike>it's really 2.6.21-rc4 now
12:27<waldner>ah ok
12:27<waldner>ok, here we go
12:28<waldner>it hangs two lines after the kernel panic:
12:28<waldner>(pasting here since it's just 3 lines)
12:28<waldner>[42949375.540000] Kernel panic - not syncing: do_syscall_stub : PTRACE_SETREGS failed, errno = 5
12:30<jdike>the rest should have been there
12:30<jdike>oh well
12:30<waldner>"ps aux" shows five instances of the executable (linux)
12:30<jdike>get the lowest UML pid and
12:30<jdike>gdb linux <pid>
12:31<waldner>ok, done that
12:31<waldner>Attaching to process 22272
12:31<waldner>Reading symbols from /home/davide/uml/linux-
12:31<waldner>Using host libthread_db library "/lib/".
12:31<waldner>0x00000000602e51f0 in __nanosleep_nocancel ()
12:32<jdike>p/x exec_regs
12:32<waldner>it's ok pasting here, right?
12:33<waldner>$1 = {0x7fffad92f748, 0x7fffad92f3c8, 0x2b32fd17a000, 0x5700, 0x2b32fd17afc8, 0x5703, 0x246, 0x0, 0x2b32fd17afe8, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x13, 0x5703, 0x3e, 0x602d4d77, 0x33, 0x246,
12:33<waldner> 0x2b32fd17afc0, 0x2b}
12:39<jdike>apply that and see what happens
12:40<waldner>do I keep the hang patch?
12:43<waldner>do I have to reattach gdb and issue the p/x command?
12:44<jdike>did it panic again?
12:45<waldner>but now most of the registers are 0
12:45<jdike>OK, do the same thing again
12:45<waldner>$1 = {0x0 <repeats 27 times>}
12:47<jdike>back out that last patch
12:48<waldner>ok, done
12:48<jdike>it's just moving that memset
12:51<waldner>it still hangs
12:51<waldner>$1 = {0x7fff0b98f7a8, 0x7fff0b98f428, 0x2b069f11a000, 0x5af3, 0x2b069f11afc8, 0x5af6, 0x246, 0x0, 0x2b069f11afe8, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x13, 0x5af6, 0x3e, 0x602d4d87, 0x33, 0x246,
12:51<waldner> 0x2b069f11afc0, 0x2b, 0x0, 0x0, 0x0, 0x0, 0x63, 0x0}
12:52<jdike>and the debug printk isn't happening?
12:53<waldner>yes, just before the kernel panic line
12:53<jdike>OK, can you send that to pastebin or rafb?
12:54<jdike>[42949375.540000] 23 0x44
12:54<jdike>is what I'm concerned about
12:55<jdike>from my reading of ptrace.c, that appears to be an illegal value
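
Editorial note: a small Python sketch of how an x86 segment selector value breaks down, assuming the 0x44 from the debug output is being loaded into a segment register (jdike asks about ds right after this). The field layout is the standard x86 one: bits 0-1 are the requested privilege level (RPL), bit 2 selects GDT or LDT, and the remaining bits are the table index. The names Selector and decode_selector are made up for illustration. Whether the host's ptrace code rejects the value because of these fields is not confirmed anywhere in the log.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # descriptor table index (bits 3 and up)
    table: str   # "GDT" or "LDT" (bit 2, the table indicator)
    rpl: int     # requested privilege level (bits 0-1)


def decode_selector(value: int) -> Selector:
    """Split a segment selector into its index, table and RPL fields."""
    return Selector(
        index=value >> 3,
        table="LDT" if value & 0x4 else "GDT",
        rpl=value & 0x3,
    )


# 0x33 and 0x2b appear in the exec_regs dumps pasted earlier in the log;
# 0x44 is the value jdike flags as suspicious.
for value in (0x33, 0x2B, 0x44):
    print(hex(value), decode_selector(value))
```

The two values from the register dump decode to RPL 3 (user level) in the GDT, while 0x44 decodes to RPL 0 in the LDT, which would be a plausible reason for a user-space register set to be refused.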
12:55<waldner>I can give you more info about the host, if you need
12:55<jdike>I wouldn't know what I want to know :-)
12:55<jdike>yet, anyway
12:56<waldner>if you remember two days ago, when I was using 2.6.19 host, it did not panic but I had those segfault problems
12:56<jdike>attach gdb to it, 'i regs' and tell me what you have for ds
12:56<jdike>Oh yeah
12:57<jdike>and you upgraded and it starts doing this
12:57<waldner>(gdb) i regs
12:57<waldner>Undefined info command: "regs". Try "help info".
12:57<jdike>i reg
12:57<waldner>ds 0x0 0
12:58<jdike>so where is the 0x44 coming from
12:58<jdike>can you paste the results of 'bt'
13:00<jdike>let's try something nasty to see if it is ds causing the problem
13:04<jdike>either back out the previous patch and drop this in
13:04<jdike>or by hand add the two new HOST_DS lines
13:05<jdike>I'd prefer backing out the old and dropping in the new
13:05<jdike>reduce the likelihood of mistakes
13:08<waldner>ok, and so registers.c should have one or two memset()s now?
13:09<waldner>ok, so I remove the one inside init_registers
13:09<jdike>the init_registers one is good
13:09<jdike>the init_thread_registers one is bogus
13:10<waldner>but your patch adds that
13:10<jdike>yeah, I just noticed that
13:10<jdike>the next version of this patch will have that fixed
13:11<waldner>so, we just keep the one inside init_registers?
13:11<jdike>I started on an x86 box, then moved to an x86_64 system to make sure that what I'm giving you compiles and runs
13:11<jdike>and that memset fix got lost in the shuffle
13:13<waldner>ok, everything is here:
13:16<jdike>hum de dum
13:16<jdike>ds didn't change
13:17<waldner>seems so
13:17<jdike>messed up again
13:17<jdike>not sure whether it made any difference though
13:19<jdike>might work better
13:22<waldner>seems different, for what my opinion is worth (lol)
13:23<jdike>so it's not ds
13:28<jdike>that fixes a bug, not sure if it'll help here
13:30<waldner>looks the same to me
13:33<waldner>it's identical to the previous patch
13:34<jdike>you're right, hold on
13:35<jdike>forgot to pull the new version into emacs
13:38<waldner>now it hangs /without/ panic
13:38<waldner>I'll paste to debian, hold on
13:39<jdike>would that be an improvement or not?
13:39<jdike>what UML is this?
13:40<waldner>sorry, (fat fingers)
13:41<waldner>Might I try to remove the hang line?
13:41<jdike>detach (or run it again)
13:42<jdike>and strace it
13:42<jdike>and paste 2-3 cycles of the repeat
13:43<waldner>sorry but I'm not exactly comfortable with strace
13:43<waldner>I run "strace <same command used before>"
13:44<jdike>the same pid as with gdb
13:44<waldner>it produces lots of output, and then exits
13:44<jdike>strace -o log -p <pid>
13:44<waldner>ah, while it's blocked?
13:44<waldner>ah ok
13:44<jdike>let it hang, then strace it
13:45<waldner>Process 25378 attached - interrupt to quit
13:45<waldner>and stays there
13:46<waldner>ah ok, wait
13:46<waldner>now I exit and open the log file
13:46<jdike>then look at the log file, figure out what the repeating unit is, and give me a couple of them
13:46<jdike>yeah, ^C the thing after a few seconds
13:46<waldner>yes yes, sorry for being a little slow to learn :)
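
Editorial note: finding the repeating unit in an strace log can be done by eye, but here is a rough Python sketch of the idea, assuming the log was written with "strace -o log -p <pid>" so that each line starts with the syscall name. It compares only syscall names, since arguments and return values may vary between cycles. The function names and the sample lines are invented for illustration and are not taken from waldner's actual log.

```python
from typing import List


def syscall_name(line: str) -> str:
    """Return the text before the first '(' of an strace line."""
    return line.split("(", 1)[0].strip()


def repeating_unit(lines: List[str], min_repeats: int = 3) -> List[str]:
    """Return the shortest block of lines that repeats at the end of the log.

    A block counts as repeating when its sequence of syscall names occurs
    min_repeats times back to back at the tail. Returns [] if none is found.
    """
    keys = [syscall_name(line) for line in lines]
    n = len(keys)
    for size in range(1, n // min_repeats + 1):
        unit = keys[n - size:]
        if all(
            keys[n - (k + 1) * size : n - k * size] == unit
            for k in range(1, min_repeats)
        ):
            return lines[n - size:]
    return []


# Made-up sample: one startup line followed by a two-call loop.
sample = [
    "getpid() = 100",
    "nanosleep({1, 0}, NULL) = 0",
    "kill(100, SIG_0) = 0",
    "nanosleep({1, 0}, NULL) = 0",
    "kill(100, SIG_0) = 0",
    "nanosleep({1, 0}, NULL) = 0",
    "kill(100, SIG_0) = 0",
]
for line in repeating_unit(sample):
    print(line)
```

Because the search works backwards from the end of the log, it still finds a cycle when ^C cuts the trace partway through, although the unit it returns may start at a different point in the loop than the program's own.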
13:48<jdike>that's in mainline, but not -stable yet
13:49<waldner>ok, but IIUC that was not the only required get there we had to modify some other bit..right?
13:49<waldner>trying that fix now
13:50<jdike>for most people the patch is enough
13:51<jdike>for strange people like you, I also need the memcpy fix
13:52<jdike>it hung again?
13:55<jdike>same thing
13:55<jdike>can you paste /proc/<pid>/maps?
13:59<jdike>gdb it and print the values of __kernel_vsyscall and vsyscall_ehdr
14:01<waldner>I started another instance, I hope it does not make difference
14:02<waldner>it's the "p" command, I suppose
14:02<waldner>(gdb) p __kernel_vsyscall
14:02<waldner>$1 = 0
14:02<waldner>(gdb) p vsyscall_ehdr
14:02<waldner>$2 = 0
14:03<jdike>can you mail me the binary?
14:03<waldner>the guest kernel binary?
14:03<jdike>see how host-specific this is
14:04<waldner>it's ~53MB!
14:04<jdike>symbol tables are big
14:04<jdike>is that a problem?
14:04<waldner>ok, I'm afraid it will take some time though
14:05<waldner>no, but I have ADSL and it's slow when uploading things
14:05<jdike>I don't mind if you don't
14:05<waldner>anyway, hold on
14:06<waldner>I guess that's the only possible way to go to understand things, isn't it?
14:07<jdike>if it works here, then the next thing I would want is access to your host so I can debug it there
14:07<waldner>ok, so you must first test it there
14:08<jdike>and that would be much easier
14:09<waldner>ok, I'm bzip2'ing it
14:11<waldner>well, then just check your email in 30/40 mins or so
14:24|-|baroni [] has joined #uml
14:25<baroni>hi all. has anybody got a gdb breakpoint working at module.c:init ?
14:25<waldner>it's transferring
14:26<baroni>gdb doesn't break at there
14:26<jdike>can you set breakpoints anywhere else?
14:26<waldner>jdike: it was built with gcc 4.1.1
14:26<baroni>(after add-symbol-file <module>)
14:26<baroni>hi jdike :)
14:27<jdike>have you double-checked the symbol information?
14:30<jdike>baroni, in other words, after the add-symbol-file
14:31<jdike>print the module struct again and see that the init and destroy fields say <module-init+0> rather than <module-init+0x10> or something
14:31|-|baroni [] has quit [Read error: Connection reset by peer]
14:31<jdike>I scared him away
14:35|-|baroni [] has joined #uml
14:36<baroni>back. sorry, one of my 123412 tabs of firefox should have some bad script eating the rest of my memory
14:36<baroni>how to check the symbol ? in gdb?
14:37<jdike> after the add-symbol-file
14:37<jdike>print the module struct again and see that the init and destroy fields say <module-init+0> rather than <module-init+0x10> or something
14:38<baroni>thanks jdike
14:38<baroni>let's see
14:42<baroni>what seems to be happening is that because i haven't used _any_ module, the variable "modules" is pointing to trash addr
14:43<baroni>and it just got pointed to something after the module is completely loaded
14:44<baroni>making impossible to get a break inside of the module before it got loaded
14:44<baroni>i'll try load anything else before the proper module
14:48<waldner>jdike: the kernel /should/ have been sent to you (fingers crossed)
14:56<jdike>you set a breakpoint in the module loader first
14:57<jdike>when it's read, and you have a module struct, you can set breakpoints in it
14:57<jdike>and you can use a couple fields in the module struct to check that the add-symbol-file address was correct
14:58<jdike>waldner, got it, I'll take a look a bit later
14:59<waldner>jdike: yes, when it's more comfortable for you, of course
14:59<baroni>thanks jdike ! yeah, i got your idea, is what i'm doing right now :)
14:59<waldner>jdike: moreover, I have to go now
14:59<waldner>jdike: many thanks for now!
15:00|-|waldner [] has quit [Quit: using sirc version 2.211+KSIRC/1.3.12]
15:55|-|krau [~cktakahas@] has quit [Read error: Operation timed out]
15:59|-|baroni [] has quit [Remote host closed the connection]
16:02|-|baroni [] has joined #uml
16:09|-|krau [~cktakahas@] has joined #uml
16:26|-|krau [~cktakahas@] has quit [Quit: Leaving]
16:54|-|kos_tom [] has joined #uml
17:20|-|tyler [] has joined #uml
17:30|-|tyler [] has quit [Read error: Connection reset by peer]
18:01|-|karmatik [] has joined #uml
18:01|-|cmantito [] has quit [Quit: I aer quitted.]
18:01|-|karmatik changed nick to cmantito
18:19|-|hfb [] has quit [Quit: Leaving]
18:36|-|cmantito [] has quit [Quit: I aer quitted.]
18:36|-|cmantito [] has joined #uml
19:01|-|krau [] has joined #uml
19:48|-|kos_tom [] has quit [Remote host closed the connection]
21:03|-|ZaJe [] has quit []
22:25|-|jdike [] has quit [Quit: Leaving]
22:59|-|VS_ChanLog [] has left #uml [Rotating Logs]
22:59|-|VS_ChanLog [] has joined #uml
---Logclosed Sat Mar 24 00:00:29 2007