Wednesday, September 30, 2009

Clusterware Uninstall

Sometimes I really miss the old days. Of course, someone my age saying that really doesn't mean much compared to someone who lived through World War II. When I started in IT there was no www, no texting, no cell phone in your pocket, and no USB coffee warmer. My current lament is for the days when the Oracle installer was all command line, Java was another word for coffee, and an uninstall on Unix meant "rm -rf".

Where is this going? As the title of this post suggests, I recently had to uninstall Oracle Clusterware. I am working on a project to bring a new application on board, and we are using an active/passive failover model built on Oracle Clusterware. I have worked with this model several times and presented on it at Collaborate 08. So, I was going along installing away: I had installed OCFS2, created the voting and OCR disks, and then installed Clusterware. The initial install went great until I realized that I had installed into a non-standard location compared to our other clusters. After the initial "Oh Crap" moment, I figured I would just uninstall what I had done and install into a new home. This is where the fun began.

Of course, I realized this was not going to be as easy as running ./runInstaller and removing the Clusterware install. What I did not figure was that it would cost me an entire day! After running the installer and removing the Clusterware home, I received an error saying that the directory could not be removed from the other node. No problem, I'll go remove it by hand. So, I thought I was clean: no more Clusterware directory, and no more Clusterware home in the inventory because I had run the installer. Boy, was I wrong.

To make this long story a little shorter, I'll skip to the end of what turned out to be four install/uninstall cycles. And, by the way, looking at Metalink, the only document I could find covered removing Clusterware from Windows. I am on Linux and it is not supposed to be this difficult! Okay, enough whining...

1. Make sure the crs and css processes have been stopped. This should have been as easy as going to /etc/init.d and running init.crsd stop and init.cssd stop, but that did not stop all the processes. Even after all the deletions and cleanup, the only way I could really get rid of the processes was to kill them as root.
2. After issuing the stop commands, delete the init.crs, init.crsd, init.cssd, and init.evmd files from the /etc/init.d directory on both nodes.
3. Edit the /etc/inittab file and remove the lines at the bottom that start init.evmd, init.cssd, and init.crsd.
4. Remove the init.crs symbolic link from /etc/rc.d/rc0.d, rc1.d, rc2.d, rc3.d, rc4.d, rc5.d, and rc6.d. The link name starts with K## where ## is a number. Mine looks like K96init.crs and is a symbolic link back to /etc/init.d/init.crs.
5. Remove the physical Oracle home directory that Clusterware was installed in.
6. Reboot! I know, it seems very Windowsish, but it was the only way to make sure all the pieces of crs and css were out of memory.
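
For anyone repeating this, here is a minimal Python sketch of a pre-reboot check for steps 2 through 4. It is not an Oracle tool and not something I ran during the cleanup above; the function names find_leftovers and strip_inittab are made up for illustration, and the only paths it knows about are the ones listed in the steps. It reports what is still there and deletes nothing.

```python
import glob
import os

# Init scripts named in step 2.
INIT_SCRIPTS = ["init.crs", "init.crsd", "init.cssd", "init.evmd"]


def find_leftovers(root="/"):
    """Return Clusterware init scripts and K##init.crs links found under root."""
    found = []
    # Step 2: scripts in /etc/init.d
    for name in INIT_SCRIPTS:
        path = os.path.join(root, "etc/init.d", name)
        if os.path.lexists(path):  # lexists also catches dangling symlinks
            found.append(path)
    # Step 4: K##init.crs links in rc0.d through rc6.d
    for level in range(7):
        pattern = os.path.join(
            root, "etc/rc.d", "rc%d.d" % level, "K[0-9][0-9]init.crs"
        )
        found.extend(sorted(glob.glob(pattern)))
    return found


def strip_inittab(lines):
    """Step 3: return inittab lines that do not mention the three daemons."""
    daemons = ("init.evmd", "init.cssd", "init.crsd")
    return [line for line in lines if not any(d in line for d in daemons)]


if __name__ == "__main__":
    # Report only; review the list and remove the files by hand as root.
    for path in find_leftovers():
        print("still present:", path)
```

Run it on each node after the deletes. If it prints nothing, the init scripts and rc links from steps 2 and 4 are gone. strip_inittab only returns the filtered lines, so writing them back to /etc/inittab (after taking a backup) is left to you.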

Having done all of that, I was finally able to reinstall into my new Oracle home. So, the moral of my story... make sure you are following your own standards before you start!

See you in 10 days at Oracle OpenWorld. Don't forget about User Group Forum Sunday.

3 comments:

  1. Note that 11gR2 introduces a cluster deinstallation tool to automate all the deinstallation tasks and returns the system to a state where clusterware can be installed again.

    For 11gR1, deinstallation was in the standard documentation at http://is.gd/3R5DM

  2. Okay, so I should have read the documentation before just going ahead with the uninstall and deletes. Things I should have known a week ago!

  3. http://bit.ly/egURL - another one for 10.2
