Virtual Hardware 7 Bug, Woeful VMware Response

UPDATE This post is actually woefully inaccurate – go read this one instead!

I don’t like making posts like this, really I don’t. As much as I’ll sing the praises of VMware ’til I can’t sing no more, I’ll also point out the shortcomings – I don’t turn a blind eye just because I like the company.

Recently in the course of doing some vSphere testing, we came across some rather strange behaviour. When creating a new Windows Server 2008 VM with Virtual Hardware 7, the LSI SAS Adapter, and 2 virtual disks, we found that the second disk was being offlined during Windows installation. Even more strangely, this only seemed to affect Server 2008 Enterprise, both 32-bit and 64-bit. Standard edition worked fine.

That being the case, and knowing that there was no differentiator between Standard and Enterprise edition from the VMware side (unlike with 2003, which has separate options in the Guest OS dropdown), I presumed this must be some weird Microsoft problem. So we logged a call with Microsoft, only to have them come back and say the code base is the same so they didn’t think it was them. Microsoft asked if we could log a call with VMware about it, which we duly did.

And got pointed to this VMware KB article.

Now, aside from the fact that this KB article is misleading (VMware support confirmed that they know this happens on brand new VMs – not just upgrades from Virtual Hardware 4 to Virtual Hardware 7), and that VMware seem to have seen it on all versions of 2008 unlike me, the real shock came when I asked when I could expect the bug to be fixed. The answer? There are no plans to fix it. As far as VMware engineering were concerned, there is a workaround for the problem as per the KB article, and that was that. Too bad if you have automated build systems that rely on the presence of a second disk.

I’m sure I don’t need to explain to anyone reading this how completely unacfuckingceptable this stance is. This is clearly a bug with Virtual Hardware 7, but there are no plans to fix it. Can you imagine if this happened on physical hardware? If HP shipped some dodgy SmartArray firmware that caused the same problem, you can bet it would be fixed as fast as they could after the first time they heard about it and replicated it. They wouldn’t issue a crappy workaround and then say their job is done.

One wonders how Windows Server 2008 got certified on vSphere. One wonders when Microsoft finds out about this if they will retract support for Server 2008 on vSphere. Or at the very least, be forced to add another level to the SVVP which specifies what version of Virtual Hardware the operating systems are certified on rather than just the ESX version.

So, if you too think this situation is ridiculous, I encourage you to replicate the problem for yourselves and then contact your VMware account team (NOT the support guys!) to ask when this will be fixed.

Please don’t rain hell on the support guys, they are just the messengers, and they have enough on their plates already without getting more calls about a known issue. I don’t want to cause more work for them like I have once before.

But unfortunately it seems the only way to get the message to VMware about this is by having enough people yell about it – I obviously wasn’t the first person to have run into this, and nothing has been done about it so far so maybe they’ll listen to a few thousand of us.

UPDATE I have been informed by my friendly account team that I was apparently misinformed by support, and that the case will be re-opened for engineering to look at. Apparently that KB article does only pertain to virtual hardware upgrades, and engineering have never heard of this happening with new VMs. I’ll update this post again when I hear more!

14 Responses to “Virtual Hardware 7 Bug, Woeful VMware Response”

  1. Gabrie van Zanten Says:

    Hope more people will ask their account managers. You truly have a point there.

  2. Justin Says:

    We saw similar issues when we upgraded our Windows 2008 Exchange servers from hardware version 4 to 7. All second and third disks were offline when the VM was restarted. Luckily, it was just a simple solution to fix, and we have no data loss issues. BUT, this should not have occurred in the first place.

  3. Virtual Hardware 7 Bug, Woeful VMware Response « new hardware Says:

    […] the rest here: Virtual Hardware 7 Bug, Woeful VMware Response 29 Oct 09 | […]

  4. Leif Says:

    You mentioned this afflicting your LSI SAS adapter setup. Did you ever test with the PVSCSI adapter on the 2nd drive?

  5. Eja Says:

    Well, I can imagine this happening on HP machines; it did (does?) happen. For several months now HP have released one iLO management driver after another, trying to solve the spurious errors that are generated and that result in an ASR (automatic server recovery), i.e. a reboot. Hopefully it is finally resolved in Supportpack 8.3.
    http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1256887284875+28353475&threadId=1332008

  6. Wes Says:

    Now if the HW vendor was IBM and not HP… btw congrats on the VCP4 cert

  7. Thomas Vuylsteke Says:

    Euhm, isn’t this “by design”? If you take a physical Windows 2008 Enterprise edition server (think a member of a failover cluster), and you present a new disk to it… well, it’s offline! Which makes sense, as you do not want to have your disks online on multiple nodes at the same time before they are part of the cluster.

    Now the new “LSI SAS” controller allows you to create a cluster in a box on vSphere (for Windows 2008)! What you are describing here is no bug, it’s just how the Enterprise OS was created to behave.

    • stu Says:

      Perhaps you should read posts a bit more carefully before commenting… I’m not talking about clustering or adding a new disk to an existing VM, I’m talking about creating a new VM from scratch with 2 virtual disks. It has nothing to do with clustering, and everything to do with the good design principle of separating the OS volume from application / data volumes. By using 2 .vmdk’s rather than partitioning a single .vmdk, you can make the data volume an independent disk and thus use snapshots against your OS volume only (if you choose to), and also if you run out of space on the OS volume you can simply grow the virtual disk and then extend the system partition (which you can do online in 2008, using native Windows tools).

  8. Thomas Vuylsteke Says:

    Sorry if I sounded harsh.

    I do agree that using multiple vmdk’s gives you more flexibility and so on; heck, it’s something I tell my customers as well.

    If I read your post correctly, you stated the problem only occurs on “Windows Enterprise editions (2008)”. Microsoft told you the code base is the same. Well, it’s not.

    I encountered this behaviour with “unattended” setups as well, even before there was anything like “vSphere hardware” out there. When you installed Windows 2008 Enterprise edition using the “lsilogic” controller, the second and subsequent disks started offline as well. You could try asking about it at “http://blogs.technet.com/filecab/”; perhaps they have an answer.

    All I can say is: if you take Windows 2008 Enterprise edition and add a second disk to it, it will be offline. It is not a VMware bug and it’s not a Microsoft bug. Microsoft intended it to be this way. As an Enterprise OS CAN be (which doesn’t mean it will be) part of a failover cluster, you do not want disks to be online immediately. Hence they probably made the decision for the Enterprise flavour to have additional disks offline.

    If you perform a new install of a VM + OS, I can’t see how this is a problem. Either it’s a manual action to put the disks online, or you just let it be part of your scripted rollout. Diskpart can take care of this just fine.

    However, if you do a “virtual hardware” upgrade, then I guess this can be pretty annoying. What happens if you upgrade from hardware version to version 7? Does it actually use the LSI SAS controller then? If that’s the case, I’m not surprised this happens.

    • stu Says:

      Yes but if that was the case, then I should be able to replicate it on a physical server right? I can’t. We’ve tried several times to be sure, and we cannot replicate this on physical hardware.

      You are right regarding the triviality of a workaround (which we have indeed implemented). Care to share some more information on how you know the codebase is not the same? Remember, Microsoft was the first company I contacted about this on the same assumption. They don’t treat calls from us lightly (the same as VMware don’t), and they came back to us saying the codebase is the same. Depending on what VMware come back with, we may need to have a chat with Microsoft again.

      I could see how this might be the default behaviour for adding disk to a running machine, but not for a clean install… as I said, we cannot reproduce this on physical, have you been able to?

      And thx for the comments 🙂

  9. VM Hardware 7 and Additional Disks in Windows Server 2008 - blog.scottlowe.org - The weblog of an IT pro specializing in virtualization, storage, and servers Says:

    […] over at vInternals posted an article a couple of days ago about a problem he encountered with VMware vSphere and Windows Server 2008. […]

  10. Thomas Vuylsteke Says:

    Our “typical” physical installations are on HP blade servers with a Smart Array controller. With 2 or 3 partitions on the “one and only mirror” volume, we haven’t seen this either. But from the second you present a disk from an EVA array over an FC network, the disk is offline.

    I guess it has something to do with how the disks are enumerated or presented to the OS. So this behaviour depends on the (type of) array controller.

    • stu Says:

      Yeh we’re the same, even when you don’t RAID the disks at all so it looks like 2 disks, you don’t get the problem.

      So I suspect you are indeed right in supposing it has something to do with the way the disks are presented to the guest, which will make this into an “undocumented feature” of Virtual Hardware 7 rather than a bug 🙂

  11. Rob Upham Says:

    @Thomas – you’re correct. VH7 presents disk devices as SAN devices, in order to be fully compliant with MSCS clusters, etc.
    Windows 2008 Enterprise and Datacenter default to leaving SAN disks offline, whereas all other editions online them.

    See: http://msdn.microsoft.com/en-us/library/bb525577%28VS.85%29.aspx

    It’s worth noting that VMware/ESX presents itself identically to all editions of W2008 and is not aware of which edition of the OS is running on it. As such, any issue like this, where different behaviour is experienced on different editions, is always going to be due to differences in the OS. As Stu mentioned, the code *base* is the same between editions, but the actual running code / settings (obviously) isn’t.

    VMware will be improving the documentation around this issue.

    Disclaimer: I work for VMware, know Stu, and have worked on this case. 🙂
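
For anyone wanting to fold the workaround discussed in the comments into a scripted rollout, here is a minimal sketch and not a definitive implementation. It assumes Python is available wherever the build scripts are generated, and the function name `build_diskpart_script` and the disk number used are made up for illustration. It only produces the text of a diskpart script that sets the SAN policy to online all disks and then brings the listed disks online; nothing in it comes from VMware or Microsoft guidance for this specific case.

```python
def build_diskpart_script(disk_numbers, set_san_policy=True):
    """Return diskpart script text that brings the given disks online.

    disk_numbers: disk indexes as shown by diskpart's "list disk"
                  (hypothetical values here; check them on the guest).
    set_san_policy: when True, also switch the SAN policy so that newly
                    discovered disks come online by default.
    """
    lines = []
    if set_san_policy:
        # Enterprise and Datacenter leave SAN-presented disks offline by default.
        lines.append("san policy=OnlineAll")
    for number in disk_numbers:
        if number < 0:
            raise ValueError("disk numbers must be non-negative")
        lines.append(f"select disk {number}")
        # noerr keeps the script running if the disk is already online.
        lines.append("online disk noerr")
        # A disk that was offline may also carry the read-only attribute.
        lines.append("attributes disk clear readonly noerr")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: the second virtual disk is usually disk 1 (an assumption).
    print(build_diskpart_script([1]), end="")
```

Saving the output to a file and running `diskpart /s <file>` from an elevated prompt inside the guest is one way to use it. Generating the text separately from running it keeps the build step easy to test on a machine that is not Windows.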
