#uml IRC Logs for 2007-01-22

--- Log opened Mon Jan 22 00:00:48 2007
00:53|-|ram [~ram@pool-71-245-96-74.nycmny.fios.verizon.net] has quit [Ping timeout: 480 seconds]
00:59|-|HuK0B [~HuK0B@89.190.202.6] has joined #uml
01:14|-|motp [~motp@83.236.181.13] has joined #uml
01:20<HuK0B>hmm.. I've already had a strange problem on 2 uml servers. I don't know whether my fs is broken or something, but sometimes when I write to some place it gives me a kernel panic and the uml stops
04:32|-|motp [~motp@83.236.181.13] has quit [Quit: Leaving]
08:37<dgraves>morning.
11:02|-|hfb [~hfb@pool-71-160-242-6.lsanca.dsl-w.verizon.net] has joined #uml
11:40|-|ram [~ram@pool-71-245-96-74.nycmny.fios.verizon.net] has joined #uml
12:09|-|hfb [~hfb@pool-71-160-242-6.lsanca.dsl-w.verizon.net] has left #uml [Leaving]
12:10<HuK0B>hmm i got some strange errors
12:10<HuK0B>end_request: I/O error, dev ubdc, sector 27633040
12:10<HuK0B>do_io - write failed err = 28 fd = 22
12:10<HuK0B>end_request: I/O error, dev ubdc, sector 27633048
12:10<HuK0B>do_io - write failed err = 28 fd = 22
12:10<HuK0B>Kernel panic - not syncing: switch_mm_skas - PTRACE_SWITCH_MM failed, errno = 3
12:10<HuK0B>any ideas?
12:23<dgraves>HuK0B: is your ubd backing file sparse?
12:24<HuK0B>? I didn't understand what you mean
12:24<HuK0B>today is some kind of bad day, almost 50% of my umls just stopped
12:24<HuK0B>and they got a kernel panic when somebody writes to some dir
12:24<HuK0B>I checked the filesystems many times
12:26<dgraves>HuK0B: the file you have mounted on ubdc. does its ls -s size match its ls -l size?
12:28<HuK0B>no, they don't match
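
For reference, dgraves's check compares the space actually allocated to the backing file (`ls -s`) with its apparent length (`ls -l`); when the first is smaller, the file is sparse and writes into the holes need fresh blocks on the host. The `err = 28` in the pasted errors is ENOSPC ("No space left on device") on Linux. The same comparison can be sketched in Python, assuming a Unix host where `st_blocks` counts 512-byte units; the helper names and the demo file are hypothetical and not part of UML.

```python
import os
import tempfile

def allocated_and_apparent(path):
    """Return (bytes actually allocated, apparent file length in bytes)."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units on Linux (assumption for other hosts).
    return st.st_blocks * 512, st.st_size

def is_sparse(path):
    """True when fewer bytes are allocated than the file length claims."""
    allocated, apparent = allocated_and_apparent(path)
    return allocated < apparent

def make_demo_sparse_file(size=64 * 1024 * 1024):
    """Create a hypothetical demo file of `size` bytes without writing any data."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size)  # extends the file; most filesystems leave a hole
    return path

if __name__ == "__main__":
    demo = make_demo_sparse_file()
    try:
        print(allocated_and_apparent(demo), is_sparse(demo))
    finally:
        os.remove(demo)
```

Whether the demo file really ends up sparse depends on the host filesystem, so treat the result as an observation rather than a guarantee.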
12:30<HuK0B>Buffer I/O error on device ubda, logical block 2590062
12:30<HuK0B>lost page write due to I/O error on ubda
12:30<HuK0B>do_io - write failed err = 28 fd = 11
12:30<HuK0B>end_request: I/O error, dev ubda, sector 20720504
12:30<HuK0B>Buffer I/O error on device ubda, logical block 2590063
12:30<HuK0B>lost page write due to I/O error on ubda
12:30<HuK0B>do_io - write failed err = 28 fd = 11
12:30<HuK0B>strange, I got these errors
12:30<HuK0B>and after some time
12:30<HuK0B>EIP: 0073:[<400ed5e8>] CPU: 0 Not tainted ESP: 007b:bfb5b1c0 EFLAGS: 00000212
12:30<HuK0B> Not tainted
12:30<HuK0B>EAX: ffffffda EBX: 00000011 ECX: bfb5b660 EDX: 00000006
12:30<HuK0B>ESI: 00000006 EDI: 4014df40 EBP: bfb5b1d8 DS: 007b ES: 007b
12:30<HuK0B>083b77d0: [<0807273c>] show_regs+0xb4/0xb6
12:30<HuK0B>end_request: I/O error, dev ubda, sector 20720520
12:30<HuK0B>Buffer I/O error on device ubda, logical block 2590065
12:30<HuK0B>lost page write due to I/O error on ubda
12:30<HuK0B>do_io - write failed err = 28 fd = 11
12:30<HuK0B>....
12:30<HuK0B>083b77fc: [<0805fe85>] panic_exit+0x25/0x3f
12:30<HuK0B>083b780c: [<08086603>] notifier_call_chain+0x1c/0x3c
12:30<HuK0B>083b782c: [<08086699>] atomic_notifier_call_chain+0x11/0x16
12:30<HuK0B>083b7840: [<0807a2ce>] panic+0x4b/0xd8
12:30<HuK0B>...
12:31<dgraves>HuK0B: is your root filesystem on your host (or whatever filesystem you have these created on) full?
12:31<HuK0B>and the uml stops
12:31<HuK0B>no, there is enough space left
12:34<HuK0B>hmm strange
12:34<HuK0B>it is full but Avail 0 Used 283G Size 294G
12:34<HuK0B>and when I remove something it is full again
12:34<HuK0B>why?
12:46<HuK0B>ok right tnx for help
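
The log does not record an answer to the "why?" above. One possible reading, offered only as an assumption: filesystems such as ext2/ext3 can reserve a share of blocks for root, so `df` may show Avail 0 while Used is still below Size, and sparse backing files keep taking blocks as guests write, which would eat freed space again. A minimal Python sketch that separates the two free-space figures, assuming a Unix host; `space_report` is a hypothetical helper name.

```python
import os

def space_report(path="/"):
    """Break a filesystem's capacity into used, user-available and reserved bytes."""
    vfs = os.statvfs(path)
    unit = vfs.f_frsize                      # fundamental block size in bytes
    size = vfs.f_blocks * unit               # total capacity
    free_for_root = vfs.f_bfree * unit       # free blocks, including reserved ones
    free_for_users = vfs.f_bavail * unit     # free blocks an unprivileged user may take
    return {
        "size": size,
        "used": size - free_for_root,
        "avail": free_for_users,
        "reserved": free_for_root - free_for_users,
    }

if __name__ == "__main__":
    for key, value in space_report("/").items():
        print(f"{key:>8}: {value / 2**30:.1f} GiB")
```

If `avail` is 0 while `reserved` is not, unprivileged writers such as a UML running as a normal user would see ENOSPC even though the disk is not completely full.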
12:46|-|jdike [~jdike@pool-71-174-247-179.bstnma.fios.verizon.net] has joined #uml
12:46<jdike>Hi guys
13:13|-|kos_tom [~thomas@humanoidz.org] has joined #uml
13:16|-|kokoko1 [~Slacker@203.148.65.8] has joined #uml
13:16<kokoko1>hiya
13:18<dgraves>jdike: hey, the raw stuff worked.
13:18<dgraves>HuK0B: sorry, i had to step out for a bit.
13:18<dgraves>HuK0B: the problem is probably that your backing files are sparse. how did you create them?
13:19<jdike>cool
13:19<jdike>I expected it would
13:19<jdike>kokoko1, have you seen whether your UMLs are dropping core files yet?
13:20|-|kos_tom [~thomas@humanoidz.org] has quit [Quit: I like core dumps]
13:21|-|kos_tom [~thomas@humanoidz.org] has joined #uml
13:22<dgraves>jdike: thanks. we had to enable RAW DEV AND MAX_RAW_DEVICES but it works as expected. :)
13:23<jdike>right
13:23<jdike>do you have a patch I can forward to mainline?
13:23<dgraves>::LOL::
13:23<dgraves>nope.
13:24<dgraves>developer had changed so much else it wasn't funny.
13:24<dgraves>i'll see if i can whip one up for you.
13:25|-|richardw [~richardw@M260P009.adsl.highway.telekom.at] has joined #uml
13:27<jdike>hehe
13:27<jdike>what's there to change?
13:27<jdike>dump a couple of config declarations in Kconfig.char or whatever and away you go
13:27<dgraves>right. exactly.
13:28<dgraves>in fact, that's all we did.
13:28<dgraves>however, the developer had prechanged a lot of things.
13:28<dgraves>so his tree wasn't good for a patch baseline.
13:28<dgraves>and he didn't have quilt setup. ;)
13:28<jdike>OK
13:33<dgraves>jdike: what's the patch command line you'd like me to use?
13:33<dgraves>diff, i mean.
13:36<jdike>diff -Nur
13:36<jdike>at the root of the kernel tree
13:37<kokoko1>jdike, sorry i was away
13:37<kokoko1>jdike, nope it's not dropping core files :(
13:37<jdike>and it's still dying
13:37<kokoko1>yes :(
13:38<jdike>well, I'd at least like the exit status from the UML
13:38<jdike>that will tell me something
13:38<kokoko1>dgraves, howdy
13:39<dgraves>kokoko1: heya.
13:40<dgraves>jdike: email to?
13:40<dgraves>sorry, lost my address book.
13:40<kokoko1>all my observation is, this uml keeps dying after we started using SA (spamd) on our mail server
13:42<dgraves>jdike: it's in the mail.
13:42<dgraves>hope i did it right.
13:42<dgraves>i need to set up quilt again, lost it on my box.
13:42<dgraves>lost my box too. :)
14:07|-|tyler [~tyler@89.98.144.15] has joined #uml
14:18|-|HuK0B [~HuK0B@89.190.202.6] has quit [Ping timeout: 480 seconds]
14:18<kokoko1>heh, dgraves you lost it ? :P
14:26<jdike>kokoko1, can you boot a test UML, send it a SIGABRT and see if it dumps core?
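
The test jdike asks for (send SIGABRT, look at the exit status, see whether a core is written) can be sketched in Python, assuming a Unix host. The child here is a stand-in sleeping process, not a real UML, and `abort_and_inspect` is a hypothetical helper name. Whether a core file actually appears also depends on the soft `RLIMIT_CORE` limit and the host's core pattern setting.

```python
import os
import resource
import signal
import sys
import time

def abort_and_inspect(argv):
    """Start argv, send it SIGABRT, and report how the process died."""
    soft_core_limit, _ = resource.getrlimit(resource.RLIMIT_CORE)
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    time.sleep(0.5)                      # give the child a moment to start
    os.kill(pid, signal.SIGABRT)
    _, status = os.waitpid(pid, 0)       # raw wait status, as a shell would see it
    signaled = os.WIFSIGNALED(status)
    return {
        "soft_core_limit": soft_core_limit,   # 0 means no core file will be written
        "signaled": signaled,
        "signal": os.WTERMSIG(status) if signaled else None,
        "core_dumped": os.WCOREDUMP(status) if signaled else False,
    }

if __name__ == "__main__":
    # Stand-in child; replace with the UML command line for a real test.
    child = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(abort_and_inspect(child))
```

A soft core limit of 0 would be one reason a dying process leaves no core file behind, which is worth ruling out before looking further.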
14:26<jdike>dgraves, tx
14:28<dgraves>kokoko1: yeah. :) it died in a gentoo update.
14:28<dgraves>so i went to kubuntu.
14:28<kokoko1>jdike, sure i'll let you know atm doing some important work
14:28<jdike>OK
14:28[~]kokoko1 rebooting xen hosts into new kernel-xen :S
14:29<kokoko1>heh i am kinda nervous when doing these remote reboots
14:29<kokoko1>okay here one host comes back :D
14:30<kokoko1>jdike, i am tired of FC :(
14:30<jdike>why?
14:31<kokoko1>lots of time spent on updating the machines and vms
14:31<kokoko1>lot of updates each day :(
14:31<kokoko1>now see this 2.6.19-1.2895.fc6xen
14:31<jdike>Oh
14:31<jdike>there are lots of updates, but they don't take me a lot of time
14:31<jdike>just hit 'y' a few times and in they come
14:31<kokoko1>yep same here
14:32<kokoko1>but machines are 30+
14:32<jdike>yeah
14:32<kokoko1>i mean vms + hosts
14:32<jdike>I have 6-7
14:33<kokoko1>i tried to convince my boss to switch to centos but he didn't agree :S
14:33<kokoko1>fedora is not for production IMO
14:33<kokoko1>even fc ppl say we can't recommend FC for production use
14:34<jdike>what is, then?
14:34<jdike>spending money on RHEL?
14:35<jdike>I suppose you can dl it for free too
14:36<kokoko1>RHEL is only valid for 30 days, after that you will not get updates :S
14:36<kokoko1>centos == RHEL
14:36<kokoko1>jdike, interesting
14:37<jdike>OK, centos is RHEL with > 1 month of updates?
14:37<jdike>what's interesting?
14:37<kokoko1>i just rebooted one of the xen hosts, and the host uptime and its vms' uptime are different
14:37<kokoko1>looks like xen saves the running vm state to a file
14:37<kokoko1>and starts it from right there
14:51<kokoko1>jdike, you were on vacation?
14:51<jdike>not really
14:51<jdike>LCA in Sydney
14:51<kokoko1>ah right , that's why i didn't see you in the # ;)
14:51<kokoko1>LCA = ?
14:52<jdike>yup
14:53<jdike>linux.conf.au
14:53<kokoko1>Oh right :)
14:56|-|Coder7 [~bhook@164.113.205.197] has joined #uml
15:01<kokoko1>so it was fun there?
15:01<jdike>yup
15:01<kokoko1>nice
15:04<Coder7>any clue why I'm getting this error when I use tunctl? TUNSETIFF: Operation not permitted
15:06<jdike>uml_net not suid root?
15:06<Coder7>hrm, let me check
15:07<Coder7>it is
15:07<Coder7>I can't figure it out, it works on one machine, and not another
15:08<jdike>anything in the host's dmesg?
15:08<Coder7>the only major difference is that one machine is running slackware 10.2, the other 11
15:08<jdike>whoops, uml_net doesn't matter
15:08<jdike>I was thinking you were seeing that in UML, missed the tunctl bit
15:09<jdike>are you running tunctl as root?
15:09<Coder7>no, not as root
15:09<Coder7>I did just find an error in syslog
15:11<Coder7>eh, but that error is not being generated by the command
15:11<jdike>You have to be privileged in order to change network interfaces
15:11<Coder7>I'm running it as a member of the uml group, and I have /dev/net/tun set to root:uml 660
15:12<Coder7>it's working on one machine, and I don't recall doing anything other than changing the permissions
15:12<jdike>OK, I guess works
15:12<jdike>+that
15:12<Coder7>it does allow me to delete tap devices as a regular user, but not add them
15:12<jdike>can you just try as root to see what that does
15:12<jdike>what's the command line?
15:13<Coder7>it does work as root
15:13<Coder7>tunctl -b -u $USER
15:13<Coder7>and tunctl -d tap0 works as a regular user
15:14<jdike>that suggests a permission problem then
15:14<Coder7>right, but I've checked and double checked them
15:14<jdike>maybe the TUN/TAP driver changed how it deals with privileges
15:14<Coder7>eh, I am using different kernel versions
15:15<jdike>not just the permissions on the file, but how they are handled in the driver
15:15<jdike>is the new one or the old one giving problems?
15:15<Coder7>new one is giving problems
15:16<Coder7>2.6.19
15:17|-|tyler [~tyler@89.98.144.15] has quit [Ping timeout: 480 seconds]
15:17<Coder7>reading docs now
15:17<jdike>The permission check is this
15:17<jdike> if (tun->owner != -1 &&
15:17<jdike> current->euid != tun->owner && !capable(CAP_NET_ADMIN))
15:17<jdike> return -EPERM;
15:18<Coder7>has CAP_NET_ADMIN been there all along, or is that new?
15:18<jdike>So you have to be whoever owns the device, or root
15:18<jdike>CAP_NET_ADMIN basically means root on normal systems
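
The condition jdike quotes can be modelled as a small truth function. This is a Python sketch of the quoted three-line check only, not of the rest of the TUN/TAP driver; `tun_owner_check` and `NO_OWNER` are hypothetical names introduced for illustration.

```python
import errno

NO_OWNER = -1  # the quoted code compares tun->owner against -1

def tun_owner_check(owner, euid, has_cap_net_admin):
    """Mirror the quoted check: return -EPERM when access is refused, else 0.

    owner             -- uid the device was assigned to, or NO_OWNER
    euid              -- effective uid of the calling process
    has_cap_net_admin -- whether the caller holds CAP_NET_ADMIN
    """
    if owner != NO_OWNER and euid != owner and not has_cap_net_admin:
        return -errno.EPERM
    return 0

if __name__ == "__main__":
    print(tun_owner_check(1000, 1000, False))  # owner touching own device
    print(tun_owner_check(1000, 1001, False))  # another unprivileged user
    print(tun_owner_check(1000, 1001, True))   # privileged caller
```

Read this way, a user who owns tap0 passes the check for that device, which matches Coder7 being able to run `tunctl -d tap0` as a regular user. Creating a new device is not covered by the quoted lines.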
15:18|-|tommie [~tommie@62.235.155.142] has joined #uml
15:19<Coder7>but I can run tunctl to add taps on the other server, as a normal user
15:21<Coder7>yup, they changed the tun module
15:21<Coder7>I pulled up the docs on the old server, running 2.6.17
15:22<Coder7>I'll just have to change things to use sudo
15:22<jdike>I guess they tightened up the permission checking
15:25<Coder7>yeah, the old docs said to only let root add devices
15:25<Coder7>but it allowed non-root to do it
15:25<Coder7>they changed it so you have to be root
15:25<jdike>it looks like once you assign a device to a user, that user can fiddle it
15:27<Coder7>correct
15:27<Coder7>it was stumping me though... couldn't figure out why I could delete but not add
15:27<Coder7>generally you get all or nothing
15:28<Coder7>thanks for helping
15:28|-|richardw_ [~richardw@M214P018.adsl.highway.telekom.at] has joined #uml
15:30|-|richardw [~richardw@M260P009.adsl.highway.telekom.at] has quit [Read error: Connection reset by peer]
17:24|-|kos_tom [~thomas@humanoidz.org] has quit [Quit: I like core dumps]
17:58|-|richardw_ [~richardw@M214P018.adsl.highway.telekom.at] has quit [Quit: Leaving]
18:21|-|Electric1lf [~dbharris@bas14-toronto12-1167996467.dsl.bell.ca] has quit [Ping timeout: 480 seconds]
18:21|-|ElectricElf [~dbharris@electricelf.netrep.oftc.net] has joined #uml
21:07|-|jdike [~jdike@pool-71-174-247-179.bstnma.fios.verizon.net] has quit [Quit: Leaving]
22:35|-|Nem^1 [~Nem@dslb-084-056-249-057.pools.arcor-ip.net] has joined #uml
22:43|-|Nem^ [~Nem@dslb-084-056-224-204.pools.arcor-ip.net] has quit [Ping timeout: 480 seconds]
22:43|-|Nem^1 changed nick to Nem^
22:58|-|VS_ChanLog [~stats@ns.theshore.net] has left #uml [Rotating Logs]
22:58|-|VS_ChanLog [~stats@ns.theshore.net] has joined #uml
--- Log closed Tue Jan 23 00:00:29 2007