A 10.3.8 troubleshooting tale...


griffman
02-12-2005, 10:20 AM
A story of woe ... and eventual happiness! Please note there's no "Eureka!" moment in this thread, but some of the troubleshooting techniques I used may help some of you out with issues of your own...

After installing the 10.3.8 update on my PowerBook and using it for a day or so, all seemed good. So I put it on my workhorse, the Dual G5/2.0 which I use for macosxhints every day. This machine is loaded with software that pushes and prods the OS in various ways (MenuMeters, Butler, SnapzPro, GeekTool, etc.). It's also got a plethora of devices attached: D-Link Bluetooth adapter, USB keyboard/mouse switchbox, iSight, iPod, Palm Pilot, Wacom tablet, digital camera cable, Epson scanner and Epson printer, and a FireWire hard drive that I use for backups. In short, it's amazing that it's been as stable as it has been for me since I got it.

I've never had to reinstall OS X on it, despite all the abuse I give it (it's also always running PHP/MySQL/Apache, as I write all the hints locally, then upload to the site).

Enter the 10.3.8 update. Generally, I run a clone of my boot drive before I install an upgrade on the G5. But after 15+ successful updates, I got lazy, and didn't do so (you know where this is going, right?). All seemed OK after the install, but I soon noticed two major issues. First, if I connected to the G5 via a remote GUI control app (like Chicken of the VNC, or any other VNC program), the G5 would die. The mouse would vanish, the keyboard would stop working, and the fans would slowly ramp into vacuum cleaner mode (keep pets and small children from the vents in this mode!). Even worse, I'd get the exact same thing when I tried to sleep the machine. Not a good situation at all.

So I tried the usual tricks first, which included:

Repair permissions and disk: No effect.
Login as a new user: No effect.
Install the 100+MB combined updater: No effect.
Remove any third-party kernel extensions. In my case, that was just my Wacom tablet (I use Apple's bundled scanner driver). No effect.

Hmmm. Things were not looking good. Next step, try safe booting. For those who don't know, safe boot mode (http://docs.info.apple.com/article.html?artnum=107392) is a way to disable all but the necessary Apple extensions. Simply restart the computer and hold the shift key down, and you'll see "Safe Boot Mode" during the progress dialog. I also disabled my login items (http://docs.info.apple.com/article.html?artnum=106756), which you can also do with the shift key (at the login screen). When booted in this manner, sleep worked as expected (I couldn't test VNC, though).

Next step: Reboot again, this time allowing my login items to run, but still booting in safe mode. Again, sleep worked fine, so all my quirky third-party apps seemed to be off the hook. This experiment seemed to point to one of the Apple-required extensions as the culprit, so I took a mild side trip. I ran System Profiler, and dumped a report in text mode. Then I rebooted in full mode, and exported another report. Using diff in the Terminal, I compared the extensions bits of the two reports, and made a list of all the differences. I was prepared to start going through the list one by one if I had to, but then thought to try one slightly easier approach.
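For anyone who wants to repeat that comparison, here is a minimal sketch in Python rather than the Terminal diff used above. It assumes you have already cut the extensions portion out of each System Profiler text report and saved the two pieces separately; the file names and the function name are hypothetical, and the real report layout may need extra cleanup.

```python
import difflib
from pathlib import Path


def extension_differences(safe_text, full_text):
    """Return the lines that appear in only one of the two extension lists.

    Lines starting with "- " are only in the safe-boot list,
    lines starting with "+ " are only in the full-boot list.
    """
    safe = [ln.strip() for ln in safe_text.splitlines() if ln.strip()]
    full = [ln.strip() for ln in full_text.splitlines() if ln.strip()]
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [ln for ln in difflib.ndiff(safe, full) if ln.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical file names: one extension list per boot mode.
    safe_path = Path("safe_boot_extensions.txt")
    full_path = Path("full_boot_extensions.txt")
    if safe_path.exists() and full_path.exists():
        for line in extension_differences(safe_path.read_text(), full_path.read_text()):
            print(line)
```

The "+" lines are the extensions that load only in a full boot, which is the list of suspects to work through one by one.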

I stripped the machine bare: removed everything that was plugged into it, other than the keyboard and mouse. Rebooted, and everything worked. Shut down, plugged in the Bluetooth adapter and the FireWire hub, and rebooted. Still, everything worked. Progress! It seems the problem was specific to either my USB hub, or one of the USB devices hanging off of it. So I started plugging them in again, one by one (no reboots, as USB is a "live" technology). I reinstalled the Wacom tablet drivers, and plugged it in. Then tried sleep. It worked. Repeated for the other five things hanging off the hub. They all worked perfectly.

Now I was really stumped: I had a perfectly functioning machine again, and yet I hadn't changed a single thing! I never emptied caches, I never trashed any prefs, and I didn't install new versions of any drivers. However, I hadn't yet shut down and restarted. So I did, with fingers crossed. Miracle of miracles, the machine rebooted and worked perfectly -- sleep and VNC connections no longer crash the box!

I'm still not sure entirely what happened, but it seems that one of my devices wasn't really happy about 10.3.8. By removing them, then cold booting, then adding them back in, whatever the problem was, it got resolved. It kind of feels like my car wouldn't run right, so I took it completely apart, then put it back together with the same parts, and now it runs fine! I have no idea what I did or why it worked, but I'm quite happy it did!

So before you try an OS X reinstall, it might be worth the effort to go through a 'cycling' of your attached peripherals ... who knows, you might get lucky like I did!

-rob.

voldenuit
02-12-2005, 12:59 PM
I have seen devices chickening out of obnoxious behavior too, once you start taking serious steps to track the problem.
Just like you, I am always torn between the warm, fuzzy feeling of having gotten it to work again and a slight frustration that I didn't learn anything in the process.
Hopefully your G5 remains stable.

bedouin
02-12-2005, 01:35 PM
Just noticed a problem today where my displays would not wake from sleep. Not sure if it's related to the 10.3.8 update or not. So far, after putting the displays to sleep a number of times, it has only happened once. Nonetheless, it still bothers me.

Craig R. Arko
02-12-2005, 03:50 PM
I think one of the things that happens when you disconnect (some) devices and reconnect them later is that it forces the caches "/System/Library/Extensions.kextcache" and "/System/Library/Extensions.mkext" to be rebuilt.

Check the timestamps on these and see if that correlates to the system recovery.

Discrepancies in these caches can cause all sorts of system weirdness, up to and including kernel panics.
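A quick way to do that timestamp check is sketched below in Python. The two paths are the ones quoted above; the function name is made up for illustration, and the script only reads modification times, it changes nothing.

```python
import os
from datetime import datetime

# Cache paths as quoted in the post above.
CACHE_PATHS = [
    "/System/Library/Extensions.kextcache",
    "/System/Library/Extensions.mkext",
]


def modification_times(paths):
    """Map each path to its last-modified time, or None if it cannot be read."""
    times = {}
    for path in paths:
        try:
            times[path] = datetime.fromtimestamp(os.stat(path).st_mtime)
        except OSError:
            # Missing or unreadable file: record that instead of failing.
            times[path] = None
    return times


if __name__ == "__main__":
    for path, when in modification_times(CACHE_PATHS).items():
        stamp = when.isoformat(sep=" ", timespec="seconds") if when else "not found"
        print(f"{path}: {stamp}")
```

If both times fall at about the moment the machine started behaving again, that supports the cache-rebuild explanation.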

griffman
02-12-2005, 11:42 PM
Craig:

I bet that's it -- both files are timestamped about the time I plugged in the last device.

What's odd is that, if you look at the install.log, both files are also rebuilt as part of the 10.3.8 update process. I guess something went wrong during that initial rebuild, perhaps.

In any event, it's been 24 hours now, and the G5 is definitely back to its normal self!

-rob.

mgh02114
02-13-2005, 12:59 AM
I generally use the following upgrade procedure:

1) Clone the drive to a backup FireWire drive and disconnect it (you already knew you should do this)

2) Boot off another drive, and run "Disk Utility -> Repair Disk" on the startup volume. If any errors arise, run DiskWarrior or TechTool or something like that to fix the problem before you upgrade.

3) Boot into the startup volume, and run Cocktail's "Pilot" (or some other utility that will delete the caches and repair permissions)

4) Update using a downloaded "combo" updater

5) Re-run Cocktail (or again some other utility that will delete the caches and repair permissions)

6) Immediately test mission-critical apps to make sure that no huge problems have arisen before adding new data to the system.

This possibly would have averted your problem, by deleting the caches immediately after the install.

Your troubleshooting procedures pretty much match my own, except that I try disconnecting all peripherals BEFORE I start analyzing log files. Actually, I have never needed to examine log files. If a serious problem arises, I just revert to the previous clone and wait a few days. The problem, and the fix, usually is announced on sites such as yours, and I wait for that before I try the upgrade again.

Thanks for posting your experience. You have provided an explanation for why "disconnect all peripherals and reboot" seems to fix some problems.