Rethinking the Guest

With the arrival of vSphere 4.0, it’s a good time to revisit your guest standards. Not just what virtual hardware and resource allocations you use by default, but also what standard software you deploy into the guest. Virtualisation has been labelled a game-changing technology, but when most people say this, they mean the changes outside of the guest. There is, however, just as much potential for changing the game within the guest. It’s time to start thinking about how you might externalise some of the stuff you would normally run within the guest, because doing so will give you a much more dynamic infrastructure and better prepare you for the inevitable day when “your” (and I use that term very loosely) infrastructure is hosted in someone else’s cloud.

In your virtualisation programmes so far, I’m betting most of you (myself included) haven’t given a huge amount of thought to the implications for your standard guest operating systems (OK, with the exception of VDI). Well, not so much the operating system itself, but the supporting management software. I am of course referring to ‘agent’ software. Just like our friend Agent Smith from The Matrix, these things seem to have multiplied over the years, at least in the Windows world.

But how many of these agents are still really relevant these days? And I do mean the agents themselves, not the features and functionality that so often require an agent. I’ll try to restrict this conversation to the lowest common denominator in the agent world (again, forgive my Windows-centric approach, but it is the most widely deployed guest OS after all): patching, monitoring, backup and anti-virus.

Agent #1 – Ol’ Patchy (Yarrr me ‘earties!)

Every enterprise I have worked in has required the use of an agent to patch the operating system. I won’t name any of the software, you all know them. And to be honest, the patching agent is usually quite light in comparison to other agents, in that it generally doesn’t need to poll the operating system constantly. If anything, it will poll a remote machine to see if there is any work for it to do. Of course, if you are using the same agent to perform other functions such as hardware inventory then it may be a little more intrusive. But probably not a huge amount.

But it doesn’t have to be this way. There are certainly agentless patching tools out there. In fact, the company whose technology VMware Update Manager uses for patch repository operations has such a product: the company is Shavlik, and the product is NetChk Protect. If you have fewer than 500 machines, the free Windows Server Update Services (WSUS) could probably meet your needs. Again, it is agentless, in the sense that it relies on the update client already built into Windows rather than on anything extra you have to deploy. Anyone familiar with the Winternals and Sysinternals suite of products will know the power of a non-persistent agent approach. This is the way that tools such as psexec work: they dynamically inject themselves into their target, do what they need to do, then remove themselves entirely. I’m not sure why more vendors haven’t taken this kind of approach, but I’ll bet it has nothing to do with being unable to provide the required functionality this way, and everything to do with having to write higher quality code. Mark Russinovich and Bryce Cogswell are perfectionists, among other things.
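To make the inject, work, remove lifecycle concrete, here is a minimal Python sketch of the pattern. It is not how psexec or any vendor product is implemented; a local temporary directory stands in for the remote target, and the names `run_non_persistent` and `PAYLOAD` are made up purely for illustration.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical payload: stands in for whatever work a patch tool would do.
PAYLOAD = 'print("patch scan complete")\n'


def run_non_persistent(payload: str) -> tuple[str, Path]:
    """Stage a payload, run it once, then remove every trace of it.

    Assumption: a local temp directory plays the part of the target machine.
    A real tool would copy the payload over the network and start it remotely.
    """
    staging = Path(tempfile.mkdtemp(prefix="transient-agent-"))
    try:
        script = staging / "payload.py"
        script.write_text(payload)                    # 1. inject
        result = subprocess.run(                      # 2. do the work
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip(), staging
    finally:
        # 3. remove itself, even if the payload failed
        shutil.rmtree(staging, ignore_errors=True)


if __name__ == "__main__":
    output, staging = run_non_persistent(PAYLOAD)
    print(output)
    print("left behind:", staging.exists())
```

The point of the `finally` block is that cleanup is not optional: whether the work succeeds or fails, nothing persists on the target afterwards, which is exactly what a resident agent cannot promise.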

But if you have a big environment, and you’re using a single agent for patching, software distribution and hardware inventory, then there’s probably not much you can do for the time being. Except maybe switch off the hardware inventory (you can get all that info from vCenter), and start planting some seeds of thought with your vendor about providing agentless or at least non-persistent agent options.

Agent #2 – I’ll be watching you…

OK, here comes the first candidate for removal. In the *nix world, centralised monitoring via SNMP and syslog is par for the course. But for some reason that wasn’t good enough for the Windows world, and a whole bunch of companies sprang up to fill a void that was essentially created by Microsoft’s sub-optimal event logging architecture.

There are 2 aspects to consider here. The first being why the fuck are you monitoring a virtual machine for hardware failures? If anyone reading thinks this is necessary, stop reading now and don’t ever come back here. You are not welcome. Virtual hardware obviously does not fail. Drivers fail, applications fail, operating systems fail. But not virtual hardware.

The second aspect goes back to the stupid design of Windows eventing pre-Windows Server 2008. Lucky for us, that changed with 2008, and in fact the functionality is available for Server 2003 SP1+. And really, you shouldn’t be running any Windows Server 2003 systems on pre-SP1. But if you are, that doesn’t mean you can’t start designing a new centralised monitoring infrastructure ready for when your fleet is able to use the functionality. Again, large enterprises may condemn the functionality as being Mickey Mouse. But if syslog is good enough for *nix, Windows Eventing 6.0 is sure as hell good enough for Windows. Check it out, you might be surprised.
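For anyone who hasn’t lived in the *nix world, the syslog model being held up as the benchmark here really is this simple: the machine fires its events at a central collector and that’s the end of it. The Python sketch below shows that model only; it is not Windows Eventing. The logger name, the message and the throwaway localhost “collector” are all assumptions made up for the demo.

```python
import logging
import logging.handlers
import socket


def make_central_logger(host: str, port: int) -> logging.Logger:
    """Return a logger that ships every record to a syslog collector over UDP."""
    logger = logging.getLogger("guest-events")  # hypothetical source name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def demo() -> bytes:
    """Send one event to a stand-in collector and return the raw datagram."""
    # Stand-in collector: a UDP socket on localhost with an OS-assigned port.
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))
    collector.settimeout(2.0)
    host, port = collector.getsockname()
    logger = make_central_logger(host, port)
    try:
        logger.warning("service 'spooler' stopped unexpectedly")
        datagram, _sender = collector.recvfrom(4096)
        return datagram
    finally:
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)
        collector.close()


if __name__ == "__main__":
    print(demo())
```

Nothing is installed on the sending side beyond what the platform already provides, and that is the bar any Windows monitoring design should be measured against.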

Agent #3 – Bacdafucup, just bacdafucup!

vSphere 4.0 brings a whole new level of functionality to the table with regard to backup, with VCB functionality being baked right into the core API for any backup vendor to take full advantage of. Currently VMware Data Recovery is the only tool out there that leverages this functionality in the way I hope that all backup vendors will, but the Data Recovery product is not targeted at the Enterprise.

Until the enterprise backup vendors get their chi together, we may have to live with backup agents a little while longer. But again, you should start thinking about how you might take backups agentless now, and pressure your backup vendors to think the same way.

Agent #4 – Medic!

Obviously something drastic was needed on the hypervisor layer if we were ever going to get rid of this stalwart, but VMware has delivered the goods via the VMsafe API. Now we just have to wait for the industry to catch up, and hopefully it won’t take too long. Until then, we’re unfortunately stuck with it, on Windows at least. But don’t be fooled into thinking VMsafe is not significant just because the rest of the industry isn’t on board yet. It is an absolutely necessary piece of functionality to get to where we’re going, and I don’t see any other hypervisor vendor even thinking about offering anything similar. All I can say about that is how short-sighted of them.

Like backup, it’s not something we can do anything about now. But we should start asking questions of our vendors and talking with our security people about this Brave New World.

Bringing It All Together

So why this tirade on agent-based software? Because it will hinder the move to the cloud, or at least be an unnecessary burden in the cloud. Imagine if you could present an application owner with options for patching, monitoring, backup and AV, and switch them on or off without having to deploy an agent into the guest, have it register somewhere, then apply some kind of configuration to it. Not only would that massively simplify your internal provisioning and ongoing maintenance (no more agent upgrades, for a start), it would also massively simplify anything external: an external cloud provider could offer the same services without needing to use the same software that you use internally, and without needing to intrusively modify your corporate image. No worries about application compatibility with agent software. No worries about agent compatibility with other agents. Potentially no more undue burden on the host and other guests from continual agent-initiated CPU scheduling / descheduling.

This world may not be feasible today, but the foundations have been laid with vSphere 4.0 and the latest generation of guest operating systems. Like anything, a critical mass needs to be built before the train of thought spills over into the mainstream. My question to you is: do you want to be an influencer, or be influenced? I think I know the answer to that :-).

2 Responses to “Rethinking the Guest”

  1. Stuff from stuf : Rethinking the guest? Says:

    […] in my area of focus.  I read an interesting post recently on the vinternals.com site called Rethinking the guest.  I was going to comment on the post there, but thought I would blog my response […]

  2. Stu Fox Says:

    Stu, I’ve responded on my blog.

    http://blogs.technet.com/stufox/archive/2009/05/27/rethinking-the-guest.aspx

    Cheers

    Stu Fox
