When a seemingly simple Solaris install spirals hopelessly out of control, you just have to keep digging for answers.

As far as IT work goes, this was one of the simplest tasks you can face: install an OS. Yet it took nearly all day due to a series of unfortunate turns. It was the kind of day that makes you question your sanity.

The task was extremely low profile. In order to support legacy SPARC-based code, an older Sun T2000 needed a fresh install of Solaris 10. This server would likely do next to nothing. It might service a log-in or two each month, as developers may want to check older code or test this and that. Otherwise, it’s nowhere near a critical system. Still, it would serve a purpose, and despite its age, it would easily handle the minor load. I figured it might take 15 minutes of actual effort to kick off a clean install from DVD and voilà, I could move on to more important duties. I mean, I’ve installed Solaris a million times. How long could it take? Little did I know.

For those not familiar with this vintage of SPARC systems, the T2000 does have an ALOM (Advanced Lights Out Manager). This ALOM may have been advanced several years ago, but it’s quite rudimentary compared to modern units. Essentially it allows for server power control, device component and sensor readings, and a text console. There are no facilities for virtual media or anything fancy. Also, there’s no frame buffer on these systems. Everything is done via serial console or a telnet log-in to the network management processor.

That’s not a problem because there’s a DVD-ROM drive, so off we go. I kill the default boot sequence and boot from the DVD. Eventually the Solaris 10 installer pops up, and I enter the requisite configuration information, ho hum. The installer heads off to start the actual package install, then promptly kicks out to an emergency shell, displaying a variety of errors.
This is followed by 30 minutes of digging around, looking for why the package install failed to start. Next up comes another attempt, this time paying slightly more attention. Nope, it bails out again.

After much more digging through various document searches and the system itself, I determine that due to a bug in the kernel used by the installer (and possibly a problem with the firmware on this particular DVD-ROM model), the installer cannot mount the DVD’s file system to access the packages, though it is able to boot from the DVD. It’s not exactly a common problem, but there we are. Time to punt on the DVD install. I’ll just JumpStart it.

There are no other Solaris systems handy, so I’d have to do this with a Linux server. I didn’t really need a full JumpStart configuration server; I just needed the system to be able to boot the installer from the network and access the packages via NFS. Easy peasy. I install rarpd and bootparamd on an available Linux server, toss the MAC address into /etc/ethers, add a hostname and IP to /etc/hosts, toss the appropriate inetboot file into /tftpboot, and configure a bare-bones bootparam entry for the box. After that, I loop-mount the DVD ISO and export it via NFS. Aside from the ISO export, this is how most Solaris JumpStart scenarios work. If I had a Solaris box to work with, most of the process would be more or less automated. But hey, this was quick enough even when done manually.

Rebooting I go, and after the interminably long POST procedure, the box picks up an IP and starts to TFTP the inetboot file, then promptly times out trying to get the file. After checking just about everything possible with the configuration and burning more time, I bust out the packet sniffer. Even though I saw a proper TFTP file request from the T2000, the file was never delivered.
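For reference, the manual RARP and TFTP setup described above boils down to a few short lines of text. The Python sketch below generates them for a hypothetical client; the hostname, MAC address, and IP are placeholders, not the values from this install. It assumes the usual Sun convention that OpenBoot asks the TFTP server for a file named after the client's own IPv4 address in uppercase hexadecimal, so inetboot is typically copied or symlinked to that name in /tftpboot. Some setups also add an architecture suffix to that name, which this sketch leaves out.

```python
import ipaddress


def tftp_boot_name(ip: str) -> str:
    """Return the IPv4 address as 8 uppercase hex digits.

    Assumption: this is the file name the client requests over TFTP.
    """
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


def ethers_line(mac: str, hostname: str) -> str:
    """One /etc/ethers entry: MAC address, then hostname (used by rarpd)."""
    return f"{mac.lower()} {hostname}"


def hosts_line(ip: str, hostname: str) -> str:
    """One /etc/hosts entry: IP address, then hostname."""
    return f"{ip} {hostname}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    host, mac, ip = "t2000", "00:14:4F:AA:BB:CC", "192.168.1.10"
    print("/etc/ethers :", ethers_line(mac, host))
    print("/etc/hosts  :", hosts_line(ip, host))
    print("/tftpboot   :", tftp_boot_name(ip), "-> inetboot")
```

With these placeholder values, the boot file name works out to C0A8010A (192 is C0, 168 is A8, 1 is 01, 10 is 0A). Checking that name against what the packet sniffer shows in the TFTP request is a quick way to rule out a naming mismatch before suspecting the firmware.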
The reason the file never arrived: another bug, this time in the version of OpenBoot on this server, which essentially abandons the TFTP client listener in RARP/bootparam installs. Naturally, that version of OpenBoot could not be upgraded without a functional Solaris 10 installation on the box. Catch, meet 22.

After looking into a few other options that weren’t viable, such as booting from a USB stick, I figured I’d boot from the DVD, but pull the packages from NFS. This is quite uncommon with Solaris, no matter how normal it may be with nearly every other Unix-like operating system, but it can be done. Of course, it requires unbundling the Solaris 10 install DVD, modifying files to point to alternate install locations, and rebuilding the DVD, all on another Solaris box. Scratch that idea. I’m not interested in building a Solaris x86 box for no other reason than to install an OS on this Solaris SPARC box.

So it’s off to using JumpStart with DHCP and PXE. After adding dozens of configuration parameters to the DHCP server, I fire up the box once more, wait impatiently yet again for the boot process, and eventually get to the point where the box successfully receives the inetboot file, but bails because it “can’t find” the miniroot.

I recheck everything, making minor changes that don’t seem to be necessary at all, but it’s still not working. I again fire up the packet sniffer and see that NFS requests are hitting the Linux server, but the T2000 still won’t boot to the installer. At this point, this quick and easy installation has taken hours and hours, most of the time spent waiting to see if the change I made to one small part of the whole scenario would fix the problem. Invariably, it did not.

I looked a bit closer at the packet traces. I noticed that the box was trying to use NFSv4 exclusively and was having problems with the Linux NFS server. Not exactly obvious, but there it was.
I dug around again, trying to see if I could pass a variable to instruct the process to use NFSv3 instead, and came across a way to do that with bootparamd, which I couldn’t use because the TFTP client was broken in that mechanism. Lacking explicit knowledge of how the NFS client in the miniroot environment functioned, I threw a Hail Mary and instructed the Linux NFS server to only use NFSv3, hoping that if the Solaris client didn’t get anything back when it tried NFSv4, it would fall back to NFSv3.

Yet another reboot, and magically, the Solaris installer finally came up. Of course, it was at this point that one of the disks started throwing masses of SCSI errors. I had to start all over again with a new disk, but the path was cleared. This box might actually get an OS after all. The interminably long installation process eventually completed, it rebooted into a pristine Solaris 10 environment, and finally, all was well.

I’ve stood up complete virtualization clusters with a dozen hosts, storage, networking, and everything in roughly the same time that it took to install Solaris on this Sun T2000. It’s proof positive that when you fall down the rabbit hole like this, there’s no telling how, or even if, you can get back out.

This story, “The OS installation from hell,” was originally published at InfoWorld.com. Read more of Paul Venezia’s The Deep End blog at InfoWorld.com.