#uml IRC Logs for 2007-02-14

--- Log opened Wed Feb 14 00:00:14 2007
03:13|-|alb [~net@dsl-201-219-69-184.users.telpin.com.ar] has joined #uml
03:14|-|albertito [~net@dsl-201-219-69-184.users.telpin.com.ar] has quit [Ping timeout: 480 seconds]
03:29|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has joined #uml
03:35|-|tyler [~tyler@89.98.142.26] has joined #uml
04:25|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has quit [Quit: Leaving]
04:26|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has joined #uml
04:26|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has quit []
04:26|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has joined #uml
05:11|-|mode2 [~mode2@ABordeaux-256-1-23-30.w90-11.abo.wanadoo.fr] has joined #uml
05:15|-|mode2_ [~mode2@ABordeaux-256-1-99-208.w90-11.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
05:27|-|mode2_ [~mode2@ABordeaux-256-1-154-125.w86-207.abo.wanadoo.fr] has joined #uml
05:29|-|mode2 [~mode2@ABordeaux-256-1-23-30.w90-11.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
05:29|-|alb [~net@dsl-201-219-69-184.users.telpin.com.ar] has quit [Read error: Connection reset by peer]
05:32|-|albertito [~net@dsl-201-219-70-123.users.telpin.com.ar] has joined #uml
05:38<pgstudy>is there a way to get a root_fs that is ready for 2.6 kernel?
05:38<pgstudy>i.e. with mod utils for 2.6
05:39<pgstudy>i can't even ifup the root_fs from the usermodelinux site because it needs a module :)
06:06|-|richardw [~richardw@M345P026.adsl.highway.telekom.at] has joined #uml
06:14|-|writerz_ [~mode2@ABordeaux-256-1-54-22.w90-11.abo.wanadoo.fr] has joined #uml
06:19|-|mode2_ [~mode2@ABordeaux-256-1-154-125.w86-207.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
06:24|-|SNy [64720e3343@bmx-chemnitz.de] has quit [Ping timeout: 480 seconds]
06:29|-|SNy [101b43d93c@bmx-chemnitz.de] has joined #uml
06:40|-|mode2 [~mode2@ABordeaux-256-1-160-71.w86-207.abo.wanadoo.fr] has joined #uml
06:41|-|alb [~net@dsl-201-219-70-158.users.telpin.com.ar] has joined #uml
06:43|-|albertito [~net@dsl-201-219-70-123.users.telpin.com.ar] has quit [Ping timeout: 480 seconds]
06:45|-|writerz_ [~mode2@ABordeaux-256-1-54-22.w90-11.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
06:59|-|the_hydra [~mulyadi@125.164.96.9] has joined #uml
07:18|-|fbenites [~chatzilla@141.99.45.63] has joined #uml
07:23|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has quit [Quit: Konversation terminated!]
07:28|-|richardw_ [~richardw@M264P025.adsl.highway.telekom.at] has joined #uml
07:35|-|richardw [~richardw@M345P026.adsl.highway.telekom.at] has quit [Ping timeout: 480 seconds]
07:51|-|mode2_ [~mode2@ABordeaux-256-1-186-213.w90-16.abo.wanadoo.fr] has joined #uml
07:56|-|mode2 [~mode2@ABordeaux-256-1-160-71.w86-207.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
08:41|-|Nem^1 [~Nem@dslb-084-057-234-014.pools.arcor-ip.net] has joined #uml
08:41|-|Nem^ [~Nem@dslb-084-056-246-049.pools.arcor-ip.net] has quit [Read error: Connection reset by peer]
08:41|-|Nem^1 changed nick to Nem^
09:14|-|kokoko1 [~Slacker@203.148.65.8] has joined #uml
09:38|-|alb [~net@dsl-201-219-70-158.users.telpin.com.ar] has quit [Read error: Connection reset by peer]
09:40|-|albertito [~net@dsl-201-219-69-170.users.telpin.com.ar] has joined #uml
10:00|-|jdike [~jdike@pool-71-174-247-179.bstnma.fios.verizon.net] has joined #uml
10:01|-|ram [~ram@pool-71-245-96-74.nycmny.fios.verizon.net] has joined #uml
10:17<the_hydra>hi jeff
10:19|-|da-x [~karrde@xiv-glob.ser.netvision.net.il] has quit [Read error: No route to host]
10:42|-|hfb [~hfb@pool-71-118-248-190.lsanca.dsl-w.verizon.net] has joined #uml
10:49|-|hfb [~hfb@pool-71-118-248-190.lsanca.dsl-w.verizon.net] has left #uml [Leaving]
11:03|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has joined #uml
11:04<kokoko1>hiya
11:05<the_hydra>hello
11:12|-|the_hydra [~mulyadi@125.164.96.9] has quit [Quit: using sirc version 2.211+KSIRC/1.3.12]
11:45|-|da-x_ [karrde@bzq-88-153-236-252.red.bezeqint.net] has quit [Ping timeout: 480 seconds]
11:50|-|kos_tom [~thomas@humanoidz.org] has joined #uml
11:58|-|fbenites [~chatzilla@141.99.45.63] has quit [Remote host closed the connection]
12:12|-|tyler [~tyler@89.98.142.26] has quit [Remote host closed the connection]
12:21|-|da-x [karrde@bzq-88-155-44-110.red.bezeqint.net] has joined #uml
12:22|-|tyler [~tyler@89.98.142.26] has joined #uml
12:24|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has quit [Quit: Konversation terminated!]
12:27|-|flatronf700B [~flatronf7@202.75.186.154] has quit [Ping timeout: 480 seconds]
12:33|-|mnky [~nobody@p54B3D217.dip.t-dialin.net] has joined #uml
12:55<uroboros_>Hello.
12:55<uroboros_>jdike: Hello, are you here?
12:55<jdike>yup
13:00<uroboros_>jdike: Please, I have still problem to log in to the running uml (the freeze after banner we discussed here). I tried Debian/etch, Debian/sarge - the same.
13:00<uroboros_>jdike: What should I do to analyze the problem? I really need at least one running UML to test something. :(
13:01<uroboros_>jdike: I really can't see the problem, nothing special about the system. You can chroot in it, everything works OK. I simply do not understand why the login shell does not start. :(((
13:02<jdike>what's the UML doing on the host when the login is hung?
13:02<uroboros_>And one more thing: the /lib/tls must be removed or the init ends in "respawning too fast...".
13:03<uroboros_>jdike: if it is in sleep state or so? I will check...
13:03<jdike>yes
13:04<jdike>what UML version, again?
13:05<jdike>the /lib/tls thing has been fixed for a long time
13:06<uroboros_>UML 2.6.20
13:06<uroboros_>resp. Linux 2.6.20 compiled with ARCH=um
13:06<uroboros_>no extra patches, plain vanilla Linux.
13:07<jdike>what filesystem?
13:07<uroboros_>ext3
13:07<jdike>no, where'd you get the root_fs you're booting?
13:08<uroboros_>created with: dd if=/dev/zero of=/path/to/file bs=1M count=1 seek=1024
13:08<uroboros_>mkfs.ext3 /path/to/file
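The dd invocation above writes a single 1 MiB block at an offset of 1024 MiB, so the image is sparse: about 1025 MiB of apparent size, with only the tail actually written before mkfs runs. A minimal C sketch of the same idea follows, assuming a POSIX host. `make_sparse_image` and `file_size` are illustrative names, not part of UML or dd.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative helper (not from UML): create `path`, skip `hole_bytes`
 * without writing them, then write `data_bytes` of zeros, much like
 * dd's seek= followed by one output block.
 * Returns 0 on success, -1 on error. */
int make_sparse_image(const char *path, off_t hole_bytes, size_t data_bytes)
{
    static const char zeros[4096];   /* zero-initialized write buffer */
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Seeking past the end creates the hole; nothing is written for it. */
    if (lseek(fd, hole_bytes, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    while (data_bytes > 0) {
        size_t chunk = data_bytes < sizeof(zeros) ? data_bytes : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        data_bytes -= (size_t)n;
    }
    return close(fd);
}

/* Apparent size of `path` in bytes, or -1 if stat() fails. */
off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

A call such as `make_sparse_image("/tmp/root_fs", 1024L * 1024 * 1024, 1024 * 1024)` would mirror the dd parameters (the path is a placeholder). Whether the skipped range really stays unallocated depends on the host filesystem.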
13:08<jdike>Oh, it's a debootstrap filesystem, right
13:08<jdike>?
13:09<uroboros_>host system has ext3 (dmcrypted w/ twofish-cbc-essiv:sha256), ubd is ext3 (no encryption)
13:09<uroboros_>yes, debootstrapped, cdebootstrapped etc. I tried several methods... etch & sarge
13:10<jdike>Can you put the filesystem someplace where I can grab it?
13:10<uroboros_>Yes, but it will take a long time to upload. I am on very lazy line ;)
13:10<jdike>hmm
13:11<uroboros_>I will compress it with bzip2 -9 ... but after unpacking, the seek will be lost for you (so you will get a real 1GiB image).
13:11<uroboros_>But first, I will check what the vmlinux is doing while frozen, ok?
13:13<uroboros_>One important thing: if frozen, then in uml_mconsole "halt" or "reboot" or whatever freezes as well!
13:14<uroboros_>The only thing that works is "sysrq m" which gives strange values (0 free memory etc.) I will show you...
13:15<uroboros_>http://rafb.net/p/5b08Mw39.txt
13:16<jdike>uroboros_, that's OK
13:16<infowolfe>mornin jdike
13:16<uroboros_>jdike: ok
13:16<infowolfe>how's x86_64 coming? ;-)
13:16<jdike>infowolfe, it turns out that there is 64-bit support in the skas3 patch - just didn't see it at first
13:16<jdike>looking at it more closely now, and figuring out how to fix systemtap
13:17<infowolfe>is the systemtap what's breaking?
13:17<jdike>no
13:17<jdike>that's the tool to see what's going on in the host kernel
13:17<jdike>and if it doesn't work, I can't tell what'
13:17<infowolfe>gotcha
13:17<jdike>s going on in there
13:17<uroboros_>jdike: ps aux ... http://rafb.net/p/WLRTJg22.txt
13:18<jdike>uroboros_, if mconsole is frozen, then something very bad happened to UML
13:18<jdike>can you gdb it and get a stack trace?
13:18<uroboros_>jdike: I do not know how to do it...
13:18<uroboros_>jdike: If you tell me, I will...
13:18<jdike>find the lowest UML pid
13:18<uroboros_>976
13:18<jdike>gdb /path/to/linux 976
13:19<uroboros_>gdb /path/to/the/vmlinux 976 ?
13:19<jdike>yeah
13:19<uroboros_>under the root user? or under the user I run the uml as?
13:19<jdike>there is also a linux, which most people use
13:19<jdike>normal user
13:19<uroboros_>ok
13:20<jdike>Can you boot it with more memory?
13:20<jdike>maybe this is just strange out-of-memory behavior
13:20<uroboros_>mem=128M
13:21<uroboros_>it should be enough
13:21<jdike>whoops, right
13:21<uroboros_>and there is a swap ubd too (1024MiB)
13:21<jdike>I was looking at the 32768 pages
13:21<jdike>and thinking 32M
13:21<jdike>yeah, that should be fine
13:22<jdike>free:28534
13:22<uroboros_>So, no I get the gdb console, what next?
13:22<jdike>bt
13:22<uroboros_>strange, perhaps the swap is not on ;)
13:23<uroboros_>but still the 128M should be enough to run plain system w/ no extra daemons
13:23<uroboros_>right?
13:23<uroboros_>So, now I am in the GDB...
13:23<jdike>yes
13:23<jdike>should be fine
13:23<jdike>it's mostly still free
13:25<uroboros_>http://rafb.net/p/TLQJ5h96.txt <- the GDB at the very moment...
13:25<uroboros_>(I use x2x to transfer pastes between my notebook and desktop PC, x2x is a pretty useful tool)
13:26<jdike>bt
13:26<uroboros_>to type "bt" ?
13:26<uroboros_>ok
13:27<uroboros_>http://rafb.net/p/K8b9xv56.txt
13:27<uroboros_>os_waiting_for_events () ? It seems it waits for events and gets no event?
13:28<jdike>hmmm
13:28<jdike>that looks like one of the helper threads
13:28<uroboros_>?
13:28<jdike>detach from it
13:29<uroboros_>to end gdb?
13:29<jdike>yeah
13:29<uroboros_>ok
13:29<jdike>find a pid file under ~/.uml
13:30<uroboros_>I was just thinking about it too... do the same with it as with that thread?
13:30<uroboros_>uml@vger:~$ cat /home/uml/.uml/uml1/pid
13:30<uroboros_>976
13:30<uroboros_>It was the right thread.
13:31<uroboros_>AHA
13:32<uroboros_>Sorry, I pointed the GDB to a symlink.. Now I pointed to the real ELF and got this:
13:32<uroboros_>Terribly sorry...
13:32<uroboros_>http://rafb.net/p/qyGq7w81.txt
13:32<uroboros_>Looks pretty different now.
13:34<jdike>that's not any better
13:35<uroboros_>Can not it be because of the symlink? vmlinux1 I run is symlink to vmlinux (I guess it is nonsensical thought, but... I am no programmer...)
13:35<jdike>I don't think so
13:35<uroboros_>Me too. But just in case, I will run it without using the symlink...
13:35<jdike>strace it, and paste some output
13:36<uroboros_>strace? It will fail...
13:36<uroboros_>I tried yesterday..
13:36<jdike>strace -p 967
13:36<uroboros_>aha! OK.
13:36<jdike>976
13:37<uroboros_>Wow...
13:38<uroboros_>A lot of output........... and then repeating still the same... gettimeofday(..., NULL) = 0
13:38<jdike>OK
13:38<jdike>paste a couple of loops worth
13:38<uroboros_>ok
13:39<uroboros_>give me a minute...
13:41<uroboros_>how long do you want it?
13:42<uroboros_>you know, I really do not know what to choose...
13:42<uroboros_>:-D
13:42<uroboros_>Anything?
13:42<jdike>you don't see a loop?
13:42<jdike>if not, then just paste a lot
13:42<uroboros_>I saw... but now, I am logged in after the strace stopped...
13:42<uroboros_>I will have to run it once more to have the same stace...
13:42<uroboros_>state
13:43<jdike>logged in to UML?
13:43<uroboros_>yes
13:43<jdike>when did that happen?
13:43<uroboros_>I can logout/login/logout/login...
13:43<uroboros_>Somewhere during the strace
13:43<uroboros_>really
13:43<jdike>that happened yesterday too?
13:44<uroboros_>Nope.
13:44<jdike>Oh
13:44<jdike>didn't you do something which gave you a shell?
13:44<uroboros_>no
13:44<uroboros_>I just run the strace
13:44<jdike>what's the host?
13:45<uroboros_>I saw this behaviour once before w/ MPlayer... it was freezing, I ran strace mplayer ... and it was not. Without strace it was...
13:45<uroboros_>the host is my shiny IBM T43
13:45<jdike>host kernel version?
13:45<uroboros_>(a laptop)
13:45<uroboros_>2.6.20
13:45<uroboros_>the same
13:45<uroboros_>plain vanilla
13:46<uroboros_>modular
13:46<uroboros_>HA!
13:46<jdike>with behavior like this, I would suspect the host kernel
13:46<uroboros_>now it freezes again...
13:46<uroboros_>but in the shell while pressing enter
13:46<uroboros_>I will strace it...
13:48<uroboros_>http://rafb.net/p/5IFdjo27.txt
13:48<uroboros_>The loop
13:48<uroboros_>?
13:48<jdike>nothing but gettimeofday?
13:48<uroboros_>and before the loop it looked pretty same as before but I have only 10000 lines of buffer in my xterm.. next time I will redirect it to a file ;)
13:48<uroboros_>jdike: yes
13:49<jdike>10000 lines of gettimeofday?
13:49<uroboros_>And during this strace of the loop the shell was unfreezed and works normally now...
13:49<uroboros_>yes
13:49<uroboros_>very, very fast...
13:50<uroboros_>terribly fast...
13:50<uroboros_>in a second or so? really, very fast...
13:52<uroboros_>http://rafb.net/p/th27Rg95.txt
13:52<uroboros_>Now it freezes on logout...
13:53<uroboros_>It seems to be freezeing randomly...
13:53<uroboros_>(or perhaps periodically, who knows)
13:53<jdike>that's the idle loop
13:53<uroboros_>Want more?
13:53<uroboros_>I can give you longer output...
13:53<uroboros_>I have 10000 lines of it ;)
13:53<jdike>if you can get a strace during a freeze, and it's not all gettimeofday
13:54<uroboros_>ok
13:54<uroboros_>now
13:55<uroboros_>http://rafb.net/p/KEnUOb40.txt
13:56<uroboros_>Enough data? More?
13:56<jdike>that's the same
13:56<jdike>it's the idle loop
13:57<uroboros_>There is nothing else in the strace output...
13:57<uroboros_>I will try to get something else...
13:58<uroboros_>Aha. Now, I logged out (dunno when, sorry)... but I know how to freeze it again...
14:00<uroboros_>now it is frozen and I am stracing into a file...
14:01<uroboros_>Aha. It unfreezes the very moment I ctrl-c the strace...
14:01<uroboros_>I have got the whole log.
14:01<uroboros_>It is 220M
14:02<jdike>probably needs some filtering
14:02<uroboros_>It is all the same:
14:02<uroboros_>do you have some tool to filter?
14:02<jdike>no
14:03<uroboros_>http://rafb.net/p/fN5iM094.txt
14:03<jdike>Ah ha
14:03<jdike>someone else just reported that
14:03<uroboros_>:)
14:04<uroboros_>Ok, now I know how to report things... thanks for teaching me...
14:04<uroboros_>:)
14:04<uroboros_>Is there a patch?
14:04<uroboros_>Ok, sorry, "just" ;)
14:05<uroboros_>What exactly is the cause of it?
14:06<uroboros_>Can I get rid of it somehow? (not to include something in the .config, or include something in the .config?) Can I work around it or is it so low-level that it will need kernel patching?
14:06<jdike>don't know
14:07<uroboros_>If you need more information about this bug, I can give you. Just tell me what to do.
14:07<jdike>OK
14:08<uroboros_>So, what would you suggest me? to switch to some other UML version, say 2.6.what_number? Which worked just fine for you? Or it will not help, because it is bug in host's 2.6.20?
14:08<jdike>actually, there is
14:09<jdike>no, it's a UML bug
14:09<uroboros_>ok.
14:09<jdike>gdb it again and 'p write_sigio_pid'
14:09<jdike>then strace that pid during a hang
14:10<uroboros_>the gdb - while frozen or never mind? to run gdb and strace >file together?
14:10<uroboros_>at the same time?
14:10<jdike>no
14:10<uroboros_>one by one, ok
14:10<jdike>get the value of write_sigio_pid first
14:10<jdike>that doesn't change
14:10<uroboros_>ok
14:11<jdike>then freeze it, then strace that pid
14:11<uroboros_>great, interesting, I will... give me a second...
14:12<jdike>also, is this the same UML as before?
14:12<jdike>if so, 'ls /proc/976/fd' on the host
14:12<jdike>ls -l /proc/976/fd
14:12<jdike>and tell me what 8 is
14:14<jdike>actually, forget that for the moment
14:14<jdike>just get the strace
14:15<uroboros_>frozen, stracing...
14:15<uroboros_>http://rafb.net/p/b8AC6946.txt
14:16<uroboros_>the beginning
14:17<uroboros_>it is the strace of the pid I got from the gdb p write_sigio_pid
14:18<uroboros_>And the ls ...
14:18<uroboros_>976
14:18<uroboros_>http://rafb.net/p/IiRKfI55.txt
14:18<uroboros_>1007 (the pid I got from the gdb)
14:18<uroboros_>http://rafb.net/p/lx5L3a23.txt
14:20<jdike>do the /proc thing I mentioned earlier
14:20<jdike>except tell me what 10 is
14:21<uroboros_>ok
14:21<uroboros_>ls /proc/976/fd
14:21<uroboros_>this one?
14:21<jdike>yup
14:21<jdike>ls -l
14:21<uroboros_>uml@vger:~$ ls /proc/976/fd
14:21<uroboros_>0 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9
14:21<uroboros_>uml@vger:~$ ls /proc/976/fd/10
14:21<uroboros_>/proc/976/fd/10
14:22<jdike>ls -l
14:22<uroboros_>sorry, I wanted to paste to pastebin... :(
14:22<uroboros_>ok
14:22<uroboros_>sorry for the paste (hand was quicker than brain)
14:23<uroboros_>http://rafb.net/p/Vwjxtc60.txt
14:23<uroboros_>This way?
14:24<uroboros_>8 is cow, 10 is some socket
14:24<jdike>yup
14:24<jdike>I wish there was more information there about sockets
14:24<uroboros_>How can I get it?
14:24|-|tyler [~tyler@89.98.142.26] has quit [Ping timeout: 480 seconds]
14:25<jdike>you can't
14:25<uroboros_>:(
14:25<jdike>can you gdb UML again
14:25<uroboros_>yes
14:25<uroboros_>anything
14:25<jdike>we need to go looking for that descriptor
14:25<uroboros_>I love UML and want it to be repaired :)
14:25<jdike>print the values of the following:
14:25<uroboros_>ok
14:25<jdike>sigio_private
14:26<jdike>write_sigio_fds
14:26<uroboros_>p sigio_private? (p stands for print? ok)
14:26<jdike>yes
14:26<uroboros_>another one?
14:26<jdike>actually, there's a better way
14:27<jdike>p *active_fds
14:27<jdike>then p *$.next
14:27<jdike>then just hit enter until gdb complains at you
14:28<uroboros_>hitting enter still the same:
14:28<uroboros_>http://rafb.net/p/hZckPU37.txt
14:29<jdike>does this have debugging information enabled?
14:29<jdike>coz none of that makes sense
14:30<uroboros_>will check .config
14:30<uroboros_>Huh! I am afraid not. :(
14:31<uroboros_>I will recompile it. Can you wait 10 minutes?
14:31<jdike>OK, turn CONFIG_DEBUG_INFO
14:31<jdike>yup
14:31<uroboros_>Sorry for that :((((((((
14:31<jdike>turn on
14:31<jdike>np
14:32<uroboros_>Very said of this, I must have switched it off but I do not remember doing it :((
14:32<uroboros_>s/said/sad/
14:34<uroboros_>So, I will save .config, make mrproper, copy it back, make oldconfig, menuconfig -> switch the debugging info on and recompile, correct?
14:34<uroboros_>It will take more than 10 minutes, because lots of things are enabled in the kernel due to my needs of networking stuff to test etc...
14:34<uroboros_>But we need the kernel to be exactly the same except there will be debug on, right?
14:35<uroboros_>So doing just plain defconfig would not satisfy the needs, or would it? It would be much faster without all my modules compiled in...
14:36<uroboros_>re-compilation started...
14:37<uroboros_>I go shopping, will be back in 10 minutes to check whether it is compiled yet or not :-D
14:38<uroboros_>I need to buy something to eat. My refrigerator is totally empty ;)
14:39|-|mnky [~nobody@p54B3D217.dip.t-dialin.net] has quit [Quit: adios]
14:42<jdike>yes, the same except for DEBUG_INFO
14:42<jdike>no rush, I'm eating lunch right now
14:43<uroboros_>back :)
14:44<uroboros_>the shop is only 20 meters from my house :)
14:44<uroboros_>I love my city-part
14:45<uroboros_>I switched on Detect Soft Lockups as well, is that alright? It was the default, it seems...
14:46<jdike>that's fine
14:46|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has joined #uml
14:47<uroboros_>Shit. I see I made a mistake. Need to recompile once more... sorry...
14:47<uroboros_>s/Shit/\*\*\*/ :-)
14:48<uroboros_>I switched debug on but not the DEBUG_INFO, what a stupid person I am! :(
14:49<uroboros_>I know a bit of C (ANSI) and one day I would like to program in the kernel. Where should I start? What should I read? I like to specialize and I love plain ANSI C. I do not like to program "programs" so I think kernel hacking would be just fine for me.
14:49<uroboros_>What would you suggest to me to start with it?
14:51|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has quit []
14:51|-|pgstudy [~pgstudy@bzq-84-108-64-51.cablep.bezeqint.net] has joined #uml
14:53<uroboros_>Now, the kernel is compiling... sorry for the delay.
14:56<uroboros_>jdike: Do you wanna see my .config file (as an additional information)?
15:06<uroboros_>vmlinux compiled.
15:12<uroboros_>jdike: I got the information for you
15:15<uroboros_>http://rafb.net/p/QjJz0k33.html
15:15|-|writerz_ [~mode2@ABordeaux-256-1-36-215.w90-11.abo.wanadoo.fr] has joined #uml
15:15<uroboros_>Or rather textually here:
15:15<uroboros_>http://rafb.net/p/QjJz0k33.txt
15:15<uroboros_>jdike: still here?
15:16<uroboros_>It hanged before the login screen now ;)
15:16<uroboros_>And it remained hanging... (more than 11 minutes now)
15:17<uroboros_>Soft lockup was detected
15:17<uroboros_>Here:
15:18<uroboros_>http://rafb.net/p/LXQTWX22.txt
15:20|-|mode2_ [~mode2@ABordeaux-256-1-186-213.w90-16.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
15:21<jdike>Do you have a strace of 17537?
15:22<uroboros_>I can have
15:22<jdike>during a hang
15:23<uroboros_>yes, it is in the end ...
15:23<uroboros_>$SPID
15:23<uroboros_>last 5 lines or so...
15:23<uroboros_>strace -p $SPID
15:23<uroboros_>Process 17537 attached - interrupt to quit
15:23<uroboros_>poll( <unfinished ...>
15:23<uroboros_>Process 17537 detached
15:23<jdike>nothing there
15:23<uroboros_>yep
15:24<jdike>that wasn't during one of these hangs?
15:24<uroboros_>That's all I can get
15:25<uroboros_>oh, sorry
15:25<uroboros_>Wait a moment...
15:26<uroboros_>I can not freeze it now :(
15:26<uroboros_>The .config is exactly the same, but with the soft lockups option on and the debug info on...
15:27<uroboros_>Sorry, it was not frozen (I just forgot to press enter inside the screen)
15:28<uroboros_>NOW! Great!
15:29<uroboros_>But only to a few seconds... perhaps it was no freeze at all:
15:29<uroboros_>http://rafb.net/p/HBcuI630.txt
15:30|-|rw__ [~richardw@62.47.197.58] has joined #uml
15:31<uroboros_>Another one (longer, about 10 seconds...)
15:31<uroboros_>http://rafb.net/p/XFniiT82.txt
15:33<uroboros_>http://rafb.net/p/z3hJj279.txt <- approx. 12 seconds.
15:34<uroboros_>Every time, a new "BUG: soft lockup detected on CPU#0! ... the stuff ... " occurs in the console I am running uml from...
15:34<jdike>OK, I need to think about this
15:35<uroboros_>But changing the .config options changed something... I can not freeze it for as long time as before without the debugging stuff in it...
15:35<uroboros_>Why?
15:35<jdike>might be timing-related
15:35<jdike>that could explain the strace behavior
15:36|-|richardw_ [~richardw@M264P025.adsl.highway.telekom.at] has quit [Ping timeout: 480 seconds]
15:36<uroboros_>Couldn't the detection of soft lockups have something to do with the fact that it unfreezes after about 10 minutes of frozen state?
15:37<jdike>it could
15:37<uroboros_>Sorry, not "every time" does the message about "BUG: soft lockup" occur, only sometimes...
15:37<uroboros_>To be precise..
15:39<uroboros_>Ok, well.
15:39<uroboros_>No.
15:39<uroboros_>Sorry again.
15:40<uroboros_>It occurs just when the strace causes it to be unfrozen. That's exactly how it behaves.
15:40<uroboros_>It freezes -> strace -> unfreeze -> new message about "BUG: ... "
15:40<uroboros_>It happens exactly this way.
15:41<uroboros_>I am sure that the strace process causes that it unfreeze,
15:41<uroboros_>I will check it more precisely.
15:43<uroboros_>Now I am not so sure. :(
15:44<uroboros_>jdike: Can I make more debuggings for you?
15:44<jdike>hold on, I'm staring at the code, trying to figure out how this happened
15:44<uroboros_>OK.
15:44<uroboros_>So now *I* will have my lunch. :-)
15:47<jdike>actually, I do need some more info
15:48<uroboros_>ok
15:48<uroboros_>let's tell me :)
15:49<jdike>p all_sigio_fds
15:50<uroboros_>wasn't it inluded ?
15:50<uroboros_>included?
15:50<jdike>nope
15:50<jdike>I haven't asked for that before
15:50<uroboros_>ah, of course
15:51<jdike>and for short stuff, just paste it here
15:51<uroboros_>http://rafb.net/p/8Iubik80.txt
15:51<uroboros_>too late ;)
15:51<uroboros_>(x2x -> great util)
15:51<jdike>OK
15:52<jdike>set $i=0
15:52<uroboros_>ok
15:52<uroboros_>done
15:52<jdike>p all_sigio_fds[$i++]
15:52<jdike>and hit return 13 times
15:52<jdike>oops
15:52<uroboros_>ee
15:52<jdike>p all_sigio_fds[$i++].poll
15:52<jdike>and hit return 13 times
15:52<jdike>NO
15:53<jdike>one more try
15:53<jdike> p all_sigio_fds.poll[$i++]
15:53<jdike>and reset $i back to 0 before that
15:54<uroboros_>http://rafb.net/p/cD8HOc34.txt
15:54<uroboros_>(I am not sure whether I accidentally pressed enter 14x)
15:54<jdike>OK
15:55<jdike>doesn't matter
15:55<uroboros_>OK
15:55<jdike>it was $6
15:55<uroboros_>what was $6?
15:55<jdike>file descriptor 10
15:55<uroboros_>I see :)
15:55<uroboros_>OK
15:56<jdike>OK
15:56<jdike>in gdb, 'call isatty(10)'
15:56<uroboros_>$16 = 0
15:57<jdike>hmm
15:57<uroboros_>hmm ;)
15:58<jdike>want to test a patch?
15:58<uroboros_>UML is pretty frozen, I can not type in login name...
15:58<uroboros_>Yes!
15:58<jdike>a debugging patch, not a fix patch
15:59<uroboros_>I'd love to...
15:59<jdike>give me some kind of clue what's happening
15:59<uroboros_>clue?
15:59<uroboros_>ah, not me, but the patch ;)
16:00<uroboros_>I am a bit egocentric sometimes (especially late in the evening) :)
16:01<uroboros_>the 'call isatty(10)' really called a kernel function called isatty() ? this is the first time I use gdb (and I am reading some intros and howtos while I work for you)...
16:02<uroboros_>btw, the UML is definitely frozen now...
16:04<jdike>wget http://rafb.net/p/7i1bsb92.txt | patch -p1
16:05<uroboros_>inside /usr/src/linux-2.6.20-um (that's what's the dir. w/ UML called on my system)
16:05<uroboros_>I am not very sure with the wget, I will rather download it and then do patch -p1 < file, ok?
16:07<jdike>fine
16:07|-|Urgleflogue [~plamen@87-126-143-181.btc-net.bg] has quit [Ping timeout: 480 seconds]
16:07<jdike>it should have been wget -O - -q anyway
16:07<uroboros_>yep
16:07<uroboros_>patch -p1 < jdike-071402-1.patch
16:07<uroboros_>(Stripping trailing CRs from patch.)
16:07<uroboros_>patching file arch/um/os-Linux/sigio.c
16:07<uroboros_>patch unexpectedly ends in middle of line
16:07<uroboros_>Hunk #3 succeeded at 347 with fuzz 1.
16:08<uroboros_>is this correct? not sure...
16:08<jdike>hmm
16:08<jdike>actually, I think that's OK
16:08<uroboros_>ok
16:08|-|Urgleflogue [~plamen@87-126-143-181.btc-net.bg] has joined #uml
16:08<uroboros_>do I need to make mrproper before make ARCH=um vmlinux ?
16:08<jdike>just the make
16:08<uroboros_>ok
16:09<jdike>the second make
16:09<uroboros_>'course
16:09<uroboros_>compiling...
16:09<jdike>BTW, the call isatty() called the libc isatty inside UML
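As jdike says, gdb's `call` runs the function inside the debugged process, so `call isatty(10)` asks the libc whether descriptor 10 of the UML process is a terminal. The sketch below shows, assuming a POSIX host, why a socket such as fd 10 yields 0. `isatty_on_socket` and `check_socket_is_not_tty` are made-up helper names, not UML functions.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns isatty() for one end of a fresh
 * socket pair, or -1 if the pair could not be created. */
int isatty_on_socket(void)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* isatty() returns 1 only for terminals; for a socket it returns 0. */
    result = isatty(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return result;
}

/* Self-check: a socket is never a terminal. */
void check_socket_is_not_tty(void)
{
    assert(isatty_on_socket() == 0);
}
```

This matches the `$16 = 0` result in the gdb session: fd 10 was listed as a socket, so it could not have been a tty.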
16:09<uroboros_>aha, ok
16:10<uroboros_>I do not know whole libc yet
16:10<uroboros_>thanks for explanation, I will check it later,,,
16:10<jdike>gotta shovel snow, back in ~30 minutes
16:10<uroboros_>OK, I will do the same...
16:10<uroboros_>I will kill the hanging poor UML now...
16:53<caker>jdike: oh --
16:53<caker>was/is the free_irq() bug the same as the AS iosched bug?
16:54<caker>I rolled a 2.6.20-um and some are hitting it
16:55<uroboros_>jdike: back again... can continue ...
16:57|-|cmantito [~gphreak@c-68-39-18-162.hsd1.nj.comcast.net] has quit [Ping timeout: 480 seconds]
16:58|-|cmantito [~gphreak@c-68-39-18-162.hsd1.nj.comcast.net] has joined #uml
17:03<uroboros_>jdike: I will be back in 10 mins.
17:04|-|rw__ [~richardw@62.47.197.58] has quit [Quit: Leaving]
17:13<uroboros_>I am here...
17:14|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has quit [Read error: Connection reset by peer]
17:21<jdike>back
17:22<uroboros_>jdike: my patched uml is ready to be run
17:22|-|kos_tom [~thomas@humanoidz.org] has quit [Quit: I like core dumps]
17:23<jdike>OK, run it
17:23<jdike>and look for "maybe_sigio_broken" lines in the boot output
17:24<uroboros_>a lot of
17:24<uroboros_>wanna see?
17:24<uroboros_>http://rafb.net/p/W3QsdW94.txt
17:25<uroboros_>(I am not connected to any pts yet)
17:27<jdike>I want to see
17:27<jdike>maybe_sigio_broken - irq fd <n>
17:27<jdike>and then the output associated with that
17:28<uroboros_>ok, I will try to freeze it and we will see if we will see (it) or not..
17:29<uroboros_>ok, uml freezes... no new line maybe_sigio_broken occured...
17:29<uroboros_>I will strace it (it will possibly unfreeze)
17:31<uroboros_>In console running UML: http://rafb.net/p/GAA7lZ87.txt
17:31<uroboros_>In UML itself (very strange):
17:32<uroboros_>http://rafb.net/p/HEMmqL89.txt
17:32<uroboros_>(the ok, ok,)
17:33<uroboros_>aha, perhaps it was pasted in accidentally...
17:33<uroboros_>OK, so just the BUG occured, no new maybe_sigio_broken line...
17:33<jdike>Can you paste the entire boot output?
17:33<uroboros_>Yes.
17:34<jdike>I can't believe it's not there
17:34<uroboros_>I was not checking the whole boot...
17:36<uroboros_>http://rafb.net/p/jiSAGV58.txt
17:36<uroboros_>Here, whole bootup...
17:36<uroboros_>Ah, there is really no swap file... :-D I see now...
17:37<uroboros_>Ok, but that does no matter at all...
17:37<uroboros_>I can create one if you wish..
17:37<jdike>no
17:38<jdike>Can you gdb it again and look at all_sigio_fds.poll[$i++]?
17:39<uroboros_>the one from p write_sigio_pid
17:39<uroboros_>?
17:39<uroboros_>this pid?
17:39<uroboros_>or the main pid
17:39<jdike>that would work
17:39<jdike>either is fine
17:40<uroboros_>(gdb) set $i=0
17:40<uroboros_>(gdb) p all_sigio_fds.poll[$i++]
17:40<uroboros_>$1 = {fd = 4, events = 3, revents = 0}
17:41<jdike>keep going
17:41<jdike>just hit return
17:41<uroboros_>may I paste it in here?
17:41<jdike>rasb
17:41<jdike>rafb
17:42<uroboros_>http://rafb.net/p/3pm4Np60.txt
17:42<uroboros_>continue.... ?
17:43<jdike>hold on
17:44<uroboros_>another soft lockup detected (uml is frozen now)
17:45<jdike>$5 = {fd = 10, events = 3, revents = 0}
17:45<jdike>maybe_sigio_broken - irq fd 10
17:45<jdike>maybe_sigio_broken - isatty(10) returns 0
17:45<jdike>so, how did fd 10 get there
17:46<uroboros_>dunno :-)
17:47<jdike>hehe
17:47<jdike>WHY NOT!
17:47<uroboros_>???
17:47<jdike>... joke ...
17:48<jdike>actually, I may have found it
17:48<uroboros_>Great!
17:48<jdike>hold on
17:48<uroboros_>I really wanna see the snippet of the code that causes this! :-)
17:49|-|ram [~ram@pool-71-245-96-74.nycmny.fios.verizon.net] has quit [Ping timeout: 480 seconds]
17:52<jdike>it's stupid
17:52<jdike>there's a standard idiom where you have a growable array
17:53<jdike>you say "I want N elements" and, if the array isn't big enough, it allocates N elements, copying the old array into the new one
17:54<jdike>well, I had such a thing, except I took a shortcut of not copying the elements because the caller would do it
17:54<jdike>except, as part of a cleanup some time ago, I used that in a case where it did need to copy them
17:55<jdike>so that pollfds structure is essentially uninitialized memory
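The idiom and the shortcut jdike describes can be sketched in C as follows. This is an illustration only, not the actual code from arch/um/os-Linux/sigio.c: the struct layout merely echoes the `poll` member seen in the gdb output, and `fd_array`, `fd_array_need` and `fd_array_add` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of pollfd entries. */
struct fd_array {
    struct pollfd *poll;   /* heap-allocated entries */
    int size;              /* number of allocated slots */
    int used;              /* number of valid entries */
};

/* Make room for at least n entries, preserving the existing ones.
 * Returns 0 on success, -1 if allocation fails. */
int fd_array_need(struct fd_array *a, int n)
{
    struct pollfd *bigger;

    assert(a != NULL && n >= 0);
    if (n <= a->size)
        return 0;

    bigger = malloc((size_t)n * sizeof(*bigger));
    if (bigger == NULL)
        return -1;

    /* The copy is the step the shortcut skipped. Without it, the first
     * a->used slots of the new array are uninitialized memory. */
    if (a->used > 0)
        memcpy(bigger, a->poll, (size_t)a->used * sizeof(*bigger));

    free(a->poll);
    a->poll = bigger;
    a->size = n;
    return 0;
}

/* Append one descriptor, growing the array if needed. */
int fd_array_add(struct fd_array *a, int fd, short events)
{
    if (fd_array_need(a, a->used + 1) < 0)
        return -1;
    a->poll[a->used].fd = fd;
    a->poll[a->used].events = events;
    a->poll[a->used].revents = 0;
    a->used++;
    return 0;
}
```

Dropping the `memcpy()` gives the shortcut variant. That is safe only while every caller refills the array itself; a caller that relies on the old entries surviving then sees whatever `malloc()` happened to return.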
17:55<uroboros_>I understand.
17:55<uroboros_>How do you navigate to the point in the source code when obtaining such information I gave you?
17:56<jdike>by booting up a UML here and looking at the same stuff you were looking at
17:57<jdike>and seeing garbage
17:57<jdike>and thinking, that looks like uninitialized memory
17:59<uroboros_>what exactly does the 'list' command do in gdb?
18:00<uroboros_>It seems it navigates somewhere to the source code?
18:01|-|ram [~ram@bi01p1.co.us.ibm.com] has joined #uml
18:04<jdike>yup
18:04<uroboros_>that's marvellous!
18:04<caker>jdike: http://www.theshore.net/~caker/uml/panics/free_irq-2.6.20.txt
18:05<uroboros_>How did I live without GDB before?
18:05<jdike>caker, the real bug is this:
18:05<jdike>09fff3f0: [<08081de7>] panic+0x70/0x102
18:05<jdike>09fff408: [<08066190>] segv+0x26e/0x29c
18:06<caker>jdike: meaning? :)
18:07<jdike>and there's code on that stack which is in the middle of what I'm fixing
18:07<jdike>I don't see a segfault being fixed, but maybe there's something I'm missing
18:08<caker>well, those traces are *after* he issued a shutdown of the UML -- he said there were more traces before then (which is why he was rebooting) but those were lost when the log was rotated
18:09<jdike>OK, let me fix uroboros_'s problem, and I'll cc the patch to you
18:09<caker>ok
18:10<uroboros_>:)
18:10<uroboros_>Will the fix be in next kernel step-release?
18:10<uroboros_>(just asking)
18:11<uroboros_>caker: fake_ide fakehd <- what are these? never heard of them...
18:12<caker>uroboros_: http://user-mode-linux.sourceforge.net/switches.html
18:12<jdike>it's to fake out some installation procedures
18:12<caker>yeah, cosmetic only
18:12<uroboros_>thanks
18:16<uroboros_>iomem - can this be used to give UML more memory using a file when host system lacks free memory needed to run UML? (just a question, I've got plenty of memory, but wanna understand it properly)
18:19<jdike>no
18:20<jdike>uroboros_, caker - http://rafb.net/p/0VqvIm27.txt
18:20<caker>hot off the press!
18:22<jdike>uroboros_, it will be in the next kernel if you tell me it works :-)
18:22<caker>jdike: what does this fix exactly?
18:22<jdike>I don't see it fixing a segfault
18:23<jdike>it kept an array of host file descriptors which needed special handling wrt SIGIO
18:23<uroboros_>jdike: would you mind if I test it tomorrow evening? I need to go to sleep. OK, well, I have about 10 minutes to compile the kernel, so I can do a quick and dirty test today and a more precise one tomorrow, OK?
18:23<jdike>uroboros_, yup
18:23<jdike>this bug made most of that array be initialized memory
18:23<jdike>uninitialized memory
18:24<jdike>so it was looking at random numbers as file descriptors
18:24<jdike>except, I guess one of those random numbers was the descriptor used by a thread to communicate with the UML
18:25<jdike>and it was this that caused the hang
18:26<jdike>uroboros_, BTW back out the previous patch before applying this one
18:27<uroboros_>jdike: previous patch was removed...
18:27<jdike>OK
18:28<uroboros_>patch -p1 < jdike-20071502.patch
18:28<uroboros_>(Stripping trailing CRs from patch.)
18:28<uroboros_>patching file arch/um/os-Linux/sigio.c
18:28<uroboros_>patch unexpectedly ends in middle of line
18:28<uroboros_>Hunk #4 succeeded at 334 with fuzz 1.
18:28<uroboros_>Is this ok?
18:28<jdike>same as before, so yes
18:28<uroboros_>OK
18:28<jdike>the fuzz is because you're on a slightly different kernel than me
18:28<uroboros_>I thought o
18:28<uroboros_>so
18:28<jdike>I think the middle of line thing is a rafb oddness
18:29<uroboros_>I do not see anything suspicious in the file in vi ...
18:30<uroboros_>vmlinux is compiling...
18:41|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has joined #uml
18:42<uroboros_>Seems to be working. Will test tomorrow and the day after tomorrow. Right? And, Jeff, thanks a lot for your patience and the patch! :)
18:43<uroboros_>If you have some ideas what else I can do for UML, tell me, please. I like to help when I can...
18:43<uroboros_>jdike, caker: Good night.
18:43<jdike>cool
18:43<jdike>let me know if it continues to work
18:44<uroboros_>I will test it thoroughly tomorrow...
18:44<uroboros_>I will force the UML to do something more than just login/logout ;)
18:44<uroboros_>Ideas appreciated...
18:44<uroboros_>:)
18:44<uroboros_>See ya.
18:44<uroboros_>s/ya/you/
18:46<uroboros_>Please send me the patch for caker's problem as well. Oh, and would you be so kind as to send me the patches at uml{at}ligatura.org ? Just in case I lose them somehow, so that I have them backed up in email... :)
18:46<uroboros_>Thanks.
18:47<uroboros_>So, I'm really going to sleep, bye bye...
18:47<jdike>uroboros_, I'll CC you
18:47<uroboros_>OK:)
18:51[~]infowolfe pokes jdike on x86_64
18:51<infowolfe>:-p
18:51<jdike>heh
18:56|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has quit [Read error: Connection reset by peer]
19:14|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has joined #uml
19:17<jdike>infowolfe, what performance problem are you seeing exactly?
19:17<infowolfe>jdike, other than skas3 resulting in oops?
19:17<jdike>yeah, like when you force skas0
19:18<infowolfe>i'd just like skas3 to work :-p
19:30|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has quit [Read error: Connection reset by peer]
19:56|-|jdike [~jdike@pool-71-174-247-179.bstnma.fios.verizon.net] has quit [Quit: Leaving]
20:11|-|infowolfe [~infowolfe@c-67-164-195-129.hsd1.ut.comcast.net] has joined #uml
20:30|-|writerz_ [~mode2@ABordeaux-256-1-36-215.w90-11.abo.wanadoo.fr] has quit [Ping timeout: 480 seconds]
20:33|-|mode2 [~mode2@ABordeaux-256-1-41-81.w90-11.abo.wanadoo.fr] has joined #uml
21:02|-|ram [~ram@bi01p1.co.us.ibm.com] has quit [Ping timeout: 480 seconds]
21:32|-|SNy [101b43d93c@bmx-chemnitz.de] has quit [Ping timeout: 480 seconds]
21:51|-|SNy [f2e0027f4d@bmx-chemnitz.de] has joined #uml
21:54|-|Urgleflogue [~plamen@87-126-143-181.btc-net.bg] has quit [Ping timeout: 480 seconds]
21:54|-|albertito [~net@dsl-201-219-69-170.users.telpin.com.ar] has quit [Remote host closed the connection]
21:58|-|albertito [~net@dsl-201-219-69-170.users.telpin.com.ar] has joined #uml
22:01|-|SNy_ [d450e9923a@bmx-chemnitz.de] has joined #uml
22:01|-|SNy [f2e0027f4d@bmx-chemnitz.de] has quit [Read error: Connection reset by peer]
22:02|-|SNy_ changed nick to SNy
22:13|-|albertito [~net@dsl-201-219-69-170.users.telpin.com.ar] has quit [Quit: grrrrr]
22:17|-|albertito [~net@dsl-201-219-69-170.users.telpin.com.ar] has joined #uml
22:39|-|Nem^1 [~Nem@dslb-084-056-234-151.pools.arcor-ip.net] has joined #uml
22:47|-|Nem^ [~Nem@dslb-084-057-234-014.pools.arcor-ip.net] has quit [Ping timeout: 480 seconds]
22:47|-|Nem^1 changed nick to Nem^
22:59|-|flatronf700B [~flatronf7@202.75.186.154] has joined #uml
22:59|-|VS_ChanLog [~stats@ns.theshore.net] has left #uml [Rotating Logs]
22:59|-|VS_ChanLog [~stats@ns.theshore.net] has joined #uml
---Logclosed Thu Feb 15 00:00:35 2007