XenServer pool master network issues and the host is still booting error
Since I'm only behind on about a dozen projects, it was nice to come across a refreshingly confusing new problem. It all started when I fired up XenCenter to see if any of the XenServer pools could accommodate a new server and noticed that one of the pools was inaccessible.
This pool consisted of two servers. Was able to ping both and SSH into both. Pool slave was fine. Most xe commands on the pool master reported "The host is still booting". Fired up xsconsole and it reported a network problem. Waited till after business hours and rebooted the master. Problem persisted. It seems this network problem was confined to XenServer -- the host was fine -- ifconfig showed bond0, eth0, eth1 and xapi1. I was able to ping it and SSH into it.
After scouring the web for a while and reading documentation, decided to try "xe pool-emergency-transition-to-master". Interestingly enough, that fixed the "network problem" on the master, and I was now able to run xe commands without the "host is still booting" error.
Was now also able to connect to the pool in XenCenter. However, the pool in XenCenter was missing the slave. According to "xe pool-list" on the slave, it considered itself the pool master -- probably promoted itself after losing connectivity to the master. This is the correct behavior according to Citrix XenServer documentation. To reset the master address, ran "xe pool-emergency-reset-master master-address=the-real-master.my.domain" on the slave to tell it who's really in charge. Now the slave showed up in XenCenter as well.
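For my own future reference, here is the sequence above as a rough shell sketch. The three xe commands are the ones I actually ran. The function names, the run wrapper and the DRY_RUN switch are just my own scaffolding (hypothetical, not part of XenServer), and the master address is a placeholder. With DRY_RUN=1 it only prints what it would do, which seems wise for anything with "emergency" in the name.

```shell
#!/bin/sh
# Rough sketch of the recovery steps described above.
# Assumption: DRY_RUN and the helper names are my own invention;
# only the xe subcommands come from the steps I ran by hand.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run on the host that should be the master:
# makes it (re)assert itself as pool master.
recover_master() {
    run xe pool-emergency-transition-to-master
}

# Run on the slave: shows which host it thinks is the master.
show_pool() {
    run xe pool-list
}

# Run on the slave: point it back at the real master.
# $1 is the master's address (placeholder in the usage below).
repoint_slave() {
    run xe pool-emergency-reset-master master-address="$1"
}
```

Usage would be something like sourcing the file, calling recover_master on the master, then show_pool and "repoint_slave the-real-master.my.domain" on the slave, and setting DRY_RUN=0 only once the printed commands look right.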
Problem solved, it seems. Now if only I could figure out what caused it in the first place...