This is a post I’ve been meaning to write for a while now, but for one reason or another I kept putting it off. So when Mike Laverick asked me what subjects I wanted to cover in my Chinwag, I finally got around to talking about it in public. I say in public, because this is something I and many others have been saying to VMware behind closed doors for a bloody long time now. And as Mike pointed out during the Chinwag, there’s a certain sense of irony in applying this phrase to VMware, given that Paul Maritz is credited with inventing it (or at the very least popularising the saying in the IT world) over 20 years ago.
I’m not writing this post to be inflammatory. Nor do I feel the need to justify my comments any more than what was said in the Chinwag. I just feel that a little more clarity and elaboration is needed – when we started on the topic, the conversation was skirting around several things at once, all of which were pretty negative. And although I wouldn’t go so far as to label those first 20 or so minutes as “VMware bashing”, I can understand how it might be seen that way. So let’s get that straight – when I say VMware should eat their own dogfood, I mean it constructively. I’m not talking about stuff like vCloud Director 1.0 requiring a database from the least-VMware-friendly company on the planet, or the vSphere Client not being supported as a ThinApp package. I’m talking about something much more fundamental than that. I’m talking about glass houses and throwing stones.
The focal point of my thoughts on this is the VMware management layer, specifically vCenter – a product that saw its first release way back in 2003, and which from a 40,000 ft. view has changed very little since. Obviously vCenter 4.1 has many more features and is much more vertically scalable, but it’s essentially the same monolithic application that it was back in 2003. In some ways you could say it regressed with vSphere 4.0, because the previously largely stateless application server acquired a local LDAP service (I still don’t understand why that can’t be remote, but I digress…), invalidating the simplicity of the “warm standby” availability model you might have employed with VI3.
Over the past few years, the noise most people have been making around vCenter falls into two broad categories: scalability and availability. Early in its life, VMware could afford to neglect vCenter availability and so could we – virtual infrastructures weren’t that large, and vCenter was nowhere near a critical component. If it went down, nobody really cared – the VMs kept running. But fast forward a few years and the picture was rapidly changing – virtualisation was taking off, and expanding into new areas like VDI. And when VMware started going down the acquisition path in 2007 with the likes of Dunes and Propero, as well as developing their own products like SRM (which was in development for years before being released), it was obvious that vCenter was moving squarely into the critical path rather than lurking in the periphery of it. That was 3 years ago.
Now ordinarily, this course of events wouldn’t be enough to draw the ‘dogfood’ fire from me, but as we all know, VMware has for a very long time been waging a war with developers and software companies. How many times have they urged us to have the fight with an application owner who said they couldn’t virtualise for some reason that boiled down to shitty application design / coding, or imaginary resource requirements? I can tell you, in the 5 or so years I’ve been in the virtualisation game, I’ve had plenty of those fights. And at the end of the day, the argument invariably comes down to the path of least resistance – either the app gets upgraded / thrown out / re-written, or it stays the same and goes onto physical hardware. When you work for a company whose primary source of revenue is not technology related, guess which one of those is the cheaper short-term option and thus wins most of the time?
VMware of course knows this very well, and to a certain degree the existence of these shitty apps makes VMware’s products all the more attractive. Features like vMotion, HA and Fault Tolerance are designed to provide a level of availability on the infrastructure layer that is unachievable in the application layer with most of the software found in the enterprise today. If all the applications we use had resiliency built into them, there would simply be no need for any of those infrastructure layer features, and a big part of the competitive edge VMware had for a long time might never have come into existence.
But VMware isn’t just any old business. It’s a software company. If anyone had the ability to implement architectural changes to an application, surely it would be someone young (in the software world) and bleeding edge like VMware? Apparently not. When we customers started howling about the lack of resiliency of vCenter, the solution we got was vCenter Heartbeat – a Band-Aid® to address what is, IMHO, a fundamental application design issue.
The fact that VMware chose to focus on the scalability aspects of vCenter first isn’t really the issue; it’s that they appear to have done so in isolation from the availability constraints. What’s the point of being able to manage hundreds of hosts and thousands of VMs from a single application instance, if that instance is a single point of failure? Because of this lack of focus on availability, when vCenter moved into the critical path of our infrastructure, it fell into that poorly designed app bucket. Sure, availability wasn’t a requirement when it was first developed, but it’s certainly been a requirement long enough for the problem to have been addressed at its root. And I don’t accept that VMware was so deaf or shortsighted that the availability requirement caught them completely off-guard.
I don’t know about your environments, but I’ll tell you how many other critical infrastructure applications (DNS, directory services, monitoring, backup, AV, etc.) in my environment need to be protected with a technology like vCenter Heartbeat: 0. None. Nada. Zip. Zilch. Every single one of them can either run on a 3rd party clustering solution, or achieve high availability by way of a distributed architecture. vCenter, on the other hand, is monolithic and has not been certified to run on top of a 3rd party clustering solution since at least version 2.5. A cynic might say that’s because they want to extract more money from customers by making them buy Heartbeat. Personally, I think it’s more likely down to a lack of internal resources to QA such a configuration. Which is a pretty piss-poor excuse – VMware isn’t any old software company, it’s a company that provides mission critical software. They should have QA resources coming out of their ears.
But happily, there are changes afoot. Just look at vCloud Director – that’s how vCenter should have been a long time ago. At least nowadays my feedback to VMware is a whole lot easier – I can just say “make vCenter like vCD”. We’ve been saying things to that effect to VMware for years now, and vCD gives every indication that they have been listening. (Aside: the reason I don’t mention this in the Chinwag is that vCD was released after the Chinwag was recorded, and I wasn’t on the vCD beta.)
There’s a saying that goes something like “God could build the Earth in 6 days because he started with a clean slate”. Whether you believe there is a God or not, there’s a lot of truth in that. And I certainly accept that changing the fundamental architecture of vCenter is not something that could happen overnight. But if VMware wants to go on the offensive against the current state of Enterprise applications, it had damn well better make sure that its own backyard is in order first. I look forward to the day when all VMware management apps can be deployed to a PaaS and backed with a data fabric. A tcServer and GemFire based one of course ;).