#1
Prospect
Join Date: May 2007
Posts: 4
Netinfo related hang
Our 10.4.10 server hangs around 1-3 AM and 3-6 PM due to a lookupd crash. Restarting lookupd with unlockupd doesn't restore system functionality, however. Retrospect runs at 1:00 AM, along with a psycx backup script, but this doesn't explain the afternoon hang. Any ideas?

Here are the logs from last night. In this case, lookupd didn't restart after it was killed.

Jun 28 00:58:16 server root: MountBackup.sh- Mounting backup drives
Jun 28 01:10:04 server DirectoryService[57]: NetInfo connection failed for server 127.0.0.1/local
Jun 28 01:10:07 server DirectoryService[57]: NetInfo connection failed for server 127.0.0.1/local
Jun 28 01:51:59 server DirectoryService[57]: NetInfo connection failed for server 127.0.0.1/local
Jun 28 01:52:05 server memberd[53]: GetGroups couldn't find uid 0
Jun 28 01:56:16 server DirectoryService[57]: NetInfo connection failed for server 127.0.0.1/local
Jun 28 01:56:23 server memberd[53]: GetGroups couldn't find uid 0
Jun 28 01:57:55 server unlockupd[372]: killing lookupd 141
Jun 28 01:58:26 server unlockupd[372]: killing lookupd 2652
Jun 28 01:58:56 server unlockupd[372]: killing lookupd 2652
Jun 28 01:59:26 server unlockupd[372]: killing lookupd 2652
Jun 28 01:59:56 server unlockupd[372]: killing lookupd 2652
Jun 28 02:00:03 server launchd: Server 0 in bootstrap 1103 uid 0: "/usr/sbin/lookupd"[2652]: exited abnormally: Killed
Jun 28 02:00:37 server unlockupd[372]: killing lookupd 2655
Jun 28 02:01:28 server DirectoryService[57]: NetInfo connection failed for server 127.0.0.1/local
Jun 28 02:01:34 server memberd[53]: GetGroups couldn't find uid 0
Jun 28 02:06:33 server DirectoryService[57]: NetInfo connection failed for server 127.0.0.1/local

************************************************************

The night before last, lookupd restarted after being killed (several times), but this didn't help.

Jun 27 03:23:02 server cp: error processing extended attributes: Operation not permitted
Jun 27 03:29:50 server unlockupd[372]: killing lookupd 176
Jun 27 03:29:50 server launchd: Server 0 in bootstrap 1103 uid 0: "/usr/sbin/lookupd"[176]: exited abnormally: Killed
Jun 27 03:30:20 server unlockupd[372]: killing lookupd 176
Jun 27 03:30:40 server lookupd[2743]: lookupd (version 369.5) starting - Wed Jun 27 03:30:40 2007
Jun 27 03:30:51 server lookupd[2743]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 03:30:51 server lookupd[2743]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 03:30:54 server lookupd[2743]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 04:00:54 server root: Unmounting Backup Drives
Jun 27 04:06:48 server diskarbitrationd[51]: mds [260]:23607 not responding.
Jun 27 04:06:49 server diskarbitrationd[51]: SecurityAgent [170]:24579 not responding.
Jun 27 04:06:49 server diskarbitrationd[51]: loginwindow [160]:22531 not responding.
Jun 27 04:06:49 server diskarbitrationd[51]: ATSServer [158]:22019 not responding.
Jun 27 04:06:49 server diskarbitrationd[51]: coreservicesd [72]:19715 not responding.
Jun 27 04:07:11 server kernel[0]: jnl: close: flushing the buffer cache (start 0x431c00 end 0x434e00)
Jun 27 04:41:48 server DirectoryService[56]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 04:41:51 server DirectoryService[56]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 05:54:03 server unlockupd[372]: killing lookupd 2743
Jun 27 05:54:35 server unlockupd[372]: killing lookupd 2743
Jun 27 05:54:52 server lookupd[3128]: lookupd (version 369.5) starting - Wed Jun 27 05:54:51 2007
Jun 27 05:55:05 server unlockupd[372]: killing lookupd 3128
Jun 27 05:55:19 server launchd: Server 7a73 in bootstrap 1103 uid 0: "/usr/sbin/lookupd"[3128]: exited abnormally: Killed
Jun 27 05:55:35 server unlockupd[372]: killing lookupd 3131
Jun 27 05:55:35 server lookupd[3132]: lookupd (version 369.5) starting - Wed Jun 27 05:55:35 2007
Jun 27 05:56:25 server lookupd[3132]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 05:56:26 server lookupd[3132]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 05:57:52 server lookupd[3132]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 06:22:05 server DirectoryService[56]: NetInfo connection failed for server 127.0.0.1/local
Jun 27 06:22:08 server memberd[52]: GetGroups couldn't find uid 0
Jun 27 06:26:13 server DirectoryService[56]: NetInfo connection failed for server 127.0.0.1/local
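Since the hang shows up in two separate time windows, it may help to tally the log by hour and by process, so the nightly and afternoon windows can be compared side by side. The following is only a rough sketch, not a known tool: the function name `count_by_hour` and the regular expression are illustrative assumptions, and the pattern is based solely on the syslog layout visible in the excerpts above (month, day, time, host, process with optional PID).

```python
import re
import sys
from collections import Counter

# Matches lines shaped like the excerpts above, e.g.
#   Jun 28 01:57:55 server unlockupd[372]: killing lookupd 141
# The PID in brackets is optional ("launchd:" and "root:" have none).
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"(?P<proc>[^\[\]:\s]+)(?:\[\d+\])?:"
)


def count_by_hour(lines):
    """Count log lines per (hour bucket, process name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip separators, blank lines, wrapped fragments
        bucket = f"{m['month']} {int(m['day']):2d} {m['hour']}:00"
        counts[(bucket, m["proc"])] += 1
    return counts


if __name__ == "__main__":
    # Reads log text from standard input and prints one row per bucket.
    for (bucket, proc), n in sorted(count_by_hour(sys.stdin).items()):
        print(f"{bucket}  {proc:<20} {n}")
```

Feeding it the system log on standard input prints one row per hour and process. A burst of `unlockupd` or `DirectoryService` lines at the same hours on several days would suggest a scheduled job rather than a random failure.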
#2
Major Leaguer
Join Date: Aug 2004
Location: Pittsburgh
Posts: 349
It looks like netinfod is doing something strange, rather than lookupd. Does your NetInfo data look ok? What happens if you restart netinfod? (That could be dangerous, I don't know...)
#3
Prospect
Join Date: May 2007
Posts: 4
My NetInfo data looks really, really long. Short of outright gibberish, I don't have much chance of finding an error in there. Our old admins never cleaned out old users after they left (I know, bad idea), so "nidump -r / /" gives 102 KB of XML. I skimmed it for gibberish, but that is about all I could do. Is there some utility that could sanity check the database for me? Do you mean to restart netinfod after the server hangs, or before? I will leave an ssh session open today to see if I can get a response out of it after it hangs. Sometimes I can, but my users always want me to reboot it quickly so they can get work done.
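Skimming a dump by eye for gibberish can be partly automated. The sketch below assumes the output of "nidump -r / /" has been redirected to a file (the name `netinfo_dump.txt` is hypothetical) and flags lines that contain control characters or bytes that are not valid UTF-8. It is not a NetInfo consistency checker and knows nothing about the database structure; it only points at lines that are not clean text.

```python
import sys


def find_suspicious_lines(data: bytes):
    """Return (line_number, reason) pairs for lines that are not clean text."""
    findings = []
    for number, raw in enumerate(data.splitlines(), start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            findings.append((number, "not valid UTF-8"))
            continue
        # Tabs are normal in text dumps; other control characters are not.
        if any(ord(ch) < 32 and ch != "\t" for ch in text):
            findings.append((number, "control character"))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is an assumption): python scan_dump.py netinfo_dump.txt
    with open(sys.argv[1], "rb") as handle:
        for number, reason in find_suspicious_lines(handle.read()):
            print(f"line {number}: {reason}")
```

An empty result only means the dump is plain text throughout; a structurally damaged database could still pass this check.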
#4
Major Leaguer
Join Date: Aug 2004
Location: Pittsburgh
Posts: 349
I mean restart it when the server hangs. If things come back suddenly, you have a lead.
I don't know of any NetInfo checker utility.