After unpacking, racking, and mounting the JBOD, I waited until the weekend had started before powering down the server and installing the RAID card. Connected it all up, rebooted into the Adaptec BIOS, and configured the 6x 1TB drives into a RAID6 array. After that, I installed the RAID StorageManager off of Sun’s website, and then the “Common Array Manager” software. CAM is supposed to provide a web GUI to an organization’s worth of Sun JBODs, so you can update JBOD firmware and query status and whatnot from a single interface. There are client and server bits written in Java that run on the various boxes, so the data path was going to look like this:
JBOD -> Xen dom0 running remote proxy tool -> Xen domU running web GUI
I say “was going” and “supposed to” because all the remote proxy tool in CAM ended up doing was consistently triggering a kernel panic in the aacraid driver whenever its detection code fired up.
Take a long drag off the irony of driver and firmware issues, and download the latest-n-greatest aacraid driver and firmware from Intel via Sun, and update. Same results. Repeat in various configurations, and before throwing in the towel, get a basic dump and file a bug. I didn’t put any more serious thought into debugging it simply because this whole thing had to be up and running yesterday, and the last time I asked for documentation on the topic, I was rebuffed with a variant of this classic: “If you were smart enough to debug the kernel, you wouldn’t need documentation on how to debug the kernel.”
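For anyone who wants to grab the same kind of trace before filing their own bug, netconsole is about the least fiddly way to catch a panic backtrace off a dom0. A rough sketch, with every address, port, and NIC name invented, and not the exact incantation I used:

    # Sketch only: stream the dom0 console (panic backtrace included) to
    # another box over UDP. All addresses, ports, and NIC names are made up.
    # On a second machine, listen for the console traffic:
    nc -u -l -p 6666        # or "nc -u -l 6666", depending on netcat flavor
    # On the crashing dom0, load netconsole pointed at that listener:
    modprobe netconsole \
        netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/00:11:22:33:44:55
    # The aacraid oops and backtrace land on the listener, ready to paste
    # into the bug report.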
Take a moment to stand in awe of the massive poisonous cobaggery involved in that statement being offered to someone who wants to help fix a crasher. I’ll wait.
That kind of shit would never fly in any GNOME venue, which is why GNOME kicks so much ass.
Update: The cobaggery about kernel development did not come from Sun or any representative of any company involved in open-source, and was unrelated to this situation at all. I relate it simply as it pertains to debugging kernel issues, and why I don’t do it.
Without wishing to defend the response from Sun, it’s not the first time I’ve been on the receiving end of a Bad Case of Attitude from members of the GNOME community as well. There are good people and bad people in every community; don’t kid yourself that just because GNOME isn’t a business there aren’t assholes in our midst.
James, you describe the CAM proxy running in dom0 and the CAM BUI running in domU. Is the panic occurring in the dom0 or domU RHE instance? If the latter, the BUI needs to be told where the proxy is. The registration wizard will search for you, but I wonder whether, when the search is done in the domU (which will be fruitless, since it is a virtual machine with virtual drivers), the aacraid driver is choking on the virtual drivers. A possible work-around is to specify the IP address of the dom0 host in the registration wizard. The BUI and the proxy communicate via TCP/IP. As long as there is network connectivity between the domU and dom0 (which is required for CAM to work in your setup), specifying the dom0 host’s IP address in the registration wizard will prevent the discovery from running in the domU and make it happen only in dom0 (which is what you want).
numpty: As noted in the update, the cobag in question isn’t a known employee of any company.
Paul: The actual array is attached to dom0, and it’s dom0 that’s crashing when the registration process on dom0 starts talking to the RAID controller. The domUs only get the generic Xen block device (a disk image on a logical volume, on a different array/controller) that’s presented to them.
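To make that concrete (names invented, and purely an illustration of the layout rather than a copy of my config), a domU here is defined along these lines:

    # Illustrative xm guest config; every name and size here is made up.
    name   = "guest1"
    memory = 1024
    # The guest only ever sees a plain Xen block device. It's backed by a
    # disk image sitting on a logical volume on the *other* controller, so
    # the aacraid hardware is never visible from inside the domU.
    disk   = [ 'file:/var/lib/xen/images/guest1.img,xvda,w' ]
    vif    = [ 'bridge=xenbr0' ]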
Irony is having all this beautiful Sun gear and struggling with drivers and “array management” software on Linux when you could be done with a simple “zpool create tank raidz ” on OpenSolaris. Xen is just as easy in OpenSolaris http://opensolaris.org/os/community/xen/docs/2008_11_dom0/ :3
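Fair point. For completeness, spelled out with invented device names, that one-liner would look something like:

    # Hypothetical device names; raidz gives single parity, raidz2 would be
    # the closer match to the double-parity RAID6 above.
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
    zpool status tank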