05-24-2002, 04:44 AM

I use OS X 10.0 as a mailserver, and we have uptime problems:

Our server stops responding after a restart. In more detail:

We run CGPro 3.4.7 on Mac OS X 10.0.4 (NOT the OS X Server version) on an Apple iMac 450 MHz G3 with 348MB memory and plenty of free HD space. On this machine we run only the CGPro server and remote monitoring software (Timbuktu Pro). The AppleShare File Sharing extension is on so we can back up the CGPro Base Directory. The OS X machine is left logged in as a power user, with energy saving and screen saver settings appropriate for a server environment. (I also tried logging out, with the login window displayed. This did not seem to matter for the problems described below.) No USB or other devices are attached (keyboard and mouse disconnected).

Every day at 1:00 AM a cron shell script copies the /var/Communigate dir to the /backup/Communigate dir. This dir is shared using Apple File Sharing and mounted on our backup server, which copies it onto a backup tape.
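
For readers who want to picture the nightly job, here is a minimal sketch of such a copy script. It is not the actual script from this server: the two directory paths come from the description above, while the function name backup_dir, the script location in the crontab comment, and the use of cp -Rp are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a nightly directory copy (hypothetical, not the poster's script).
# Assumed crontab entry for 1:00 AM:
#   0 1 * * * /usr/local/bin/cgp_backup.sh /var/Communigate /backup/Communigate

# backup_dir SRC DST: copy the contents of SRC into DST.
backup_dir() {
    src="$1"
    dst="$2"
    # Refuse to run if the source directory is missing.
    [ -d "$src" ] || { echo "backup: source $src not found" >&2; return 1; }
    # Create the destination if it does not exist yet.
    mkdir -p "$dst" || return 1
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$src"/. "$dst"/
}

# Run only when called with both paths, so the file can also be sourced.
if [ "$#" -eq 2 ]; then
    backup_dir "$1" "$2"
fi
```

A plain cp keeps the sketch short. Copying files while the mail server has them open can produce an inconsistent snapshot, so this only illustrates the mechanism.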

When we set up this server, everything worked fine for months. Then we had to shut down the server for a minute, and when it came back up, it stopped responding within 20 hours. We reset the computer, and it hung again (up to 5 times in a week, at random intervals, day and night). After about a week the server ran stable again for months. Last week it had to be restarted again, and this week the same problems have returned: hangs at random intervals.

By a 'hang' I mean: it is possible to ping the server, and the screen (in one case) was still on but 'vibrating', as if a second monitor was standing too close. It displayed the OS X login window, and the cursor blinked. Mouse and keyboard did not respond when reconnected.
Remote access, except for ping, was not possible: SSH, the CGPro web interface, the admin web interface, file sharing to the backup server, and Timbuktu login were all down.

I have no clue what causes this OS X machine to hang. The CGPro log and the UNIX logs at /var/logs give no clue as to what is wrong: there are no errors. They just stop at a given time and continue with the boot procedure after the reset.

Do you have any suggestions?


05-24-2002, 08:07 AM
My first suggestion would be to upgrade to 10.1.4. Second, I would keep the Console app running. It is possible that something is crashing and printing out an error (kernel panic, for example) that doesn't make it to disk. (Hence you don't see it in the logs after reboot.)
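
In the same spirit, a small heartbeat script run from cron can help narrow down when the machine stops doing useful work. This is only a sketch: the function name heartbeat, the log path in the comments, and the choice of uptime as the recorded status are all assumptions, not something taken from this setup.

```shell
#!/bin/sh
# Heartbeat sketch (hypothetical): append one timestamped status line per run.
# Assumed crontab entry, once a minute:
#   * * * * * /usr/local/bin/heartbeat.sh /var/log/heartbeat.log

# heartbeat LOGFILE: append the current time and uptime output, then flush.
heartbeat() {
    log="$1"
    # date gives the wall-clock time; uptime adds the load averages.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(uptime)" >> "$log" || return 1
    # sync asks the kernel to write buffered data out to disk, which
    # improves the odds that the last line survives a hard reset.
    sync
}

# Run only when a log file is given, so the file can also be sourced.
if [ "$#" -eq 1 ]; then
    heartbeat "$1"
fi
```

After a hang and reset, the last line in the log gives an approximate time of the freeze, and the load averages may hint at whether something was running away beforehand.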

It is also possible that there is a hardware problem. I would try replacing components one at a time or removing them altogether, if they aren't required.
Swapping the disks with a similar Mac would be a great way to test all the hardware at once, but it won't tell you which part is bad, if any.

- Avi