Challenge Convention

This post is part of a series based on a presentation I gave to the London VMware User Group on February 25th, 2010 about the reality of Enterprise-scale internal cloud platforms. To find other posts in the series, just look for the tag “Bringing the Cloud Down to Earth”.

In the previous post, I talked about some of the things that you should think about before going ahead with an internal Cloud project. I ended by saying you should challenge the way things are done currently if they are impossible or very difficult to automate. But there are other things you should also challenge that aren’t directly related to automation, and I’ll cover some of those now.

Before I do though, I’m going to state the obvious – challenging current conventions is not about pointing fingers and saying things are crap / wrong / <insert expletive here>. It’s just that the current conventions may no longer be as optimal as they once were. But don’t throw the baby out with the bathwater – there may well be good reasons for keeping some things the same, and others not so much. Don’t blindly challenge everything for the sake of it; pick your fights carefully. And by that I don’t mean only pick the fights you think you can win – I mean pick the fights that will have the biggest impact on your end deliverable. Things like:

Self Service
The idea of getting servers via self service in the Enterprise has been around nearly as long as x86 virtualisation itself – it’s certainly been around a lot longer than that in the hosting space of course. And usually it’s met with opposition from many different groups, and questions like “how will we manage capacity?”, “how will we control growth?” etc.

Such gut reactions are nothing more than a misinterpretation of what self service actually means. Put simply, self service does not equal no governance. I’ll say it again, self service != no governance. Just because someone has the ability to _request_ something on their own, doesn’t mean they will automatically and instantly get it! The governance around new requests can be as lean or as obese as you like – it’s completely up to you. And it may change depending on what the state of your business is – a particular business unit might be experiencing rapid growth, so you may decide that requests from that area will go straight through with nothing more than financial approval while requests from other groups may have to go through multistage financial approval and some kind of technical approval.

The important thing is to make sure that whatever you implement to handle governance is flexible. You probably have very similar processes and tools in place today to handle change requests – maybe you can leverage the platform that handles those, or bring the people who were involved in its development and process engineering into your Cloud project.
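
To make the "self service != no governance" point a bit more concrete, here is a minimal sketch of approval rules held as data rather than hard-wired into the request workflow. Everything in it is hypothetical: the stage names, the `rapid-growth-unit` business unit and the `Request` class are made up for illustration and don't come from any particular product.

```python
from dataclasses import dataclass, field

# Hypothetical approval stage names, for illustration only.
FINANCIAL = "financial"
FINANCIAL_SECOND = "financial-second-stage"
TECHNICAL = "technical"

# Default chain for any business unit without a specific policy:
# multistage financial approval plus a technical approval.
DEFAULT_STAGES = (FINANCIAL, FINANCIAL_SECOND, TECHNICAL)

# Per-business-unit overrides, kept as plain data so they can be
# changed as the state of the business changes.
POLICY = {
    # Assumed example: a unit in rapid growth only needs financial approval.
    "rapid-growth-unit": (FINANCIAL,),
}


def approval_stages(business_unit, policy=POLICY):
    """Return the ordered approval stages a request from this unit must pass."""
    return policy.get(business_unit, DEFAULT_STAGES)


@dataclass
class Request:
    requester: str
    business_unit: str
    approvals: list = field(default_factory=list)  # stages already granted

    def pending(self):
        """Stages still outstanding, in the order they must be granted."""
        required = approval_stages(self.business_unit)
        return [stage for stage in required if stage not in self.approvals]

    def is_approved(self):
        """A request is only fulfilled once no stages remain."""
        return not self.pending()
```

The person still makes the request on their own; what happens next is decided by the policy table, which can be as lean or as heavy as the business needs at the time.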

No Support
One of the reasons external clouds can appear very cheap is because they don’t offer any support within the guest OS. There is obviously demand for such a service tier, otherwise the likes of Amazon wouldn’t be doing any business! But do you offer a “no support” option internally today? I mean a real service tier that includes differentiated pricing to account for the fact that no support resources will be available? Until recently, I had never seen such an offering available in any place I have worked. As anyone knows, the ongoing opex associated with a server makes the acquisition and depreciation costs pale in comparison. And the majority of that opex is people related.

The great thing about a “no support” option is that regardless of whether people choose the option or not, you win! If there is uptake, then you have offered your business something that they wanted and that wasn’t available previously. If there is no uptake, then you have just proven that the people who were shouting for Amazon really hadn’t thought things through properly, and it actually isn’t anywhere near as viable an option as they made out.

Having such an offering within an Enterprise can make people nervous, especially support staff – how will they know whether a support request is actually valid or not? This is where you need to step back a little and have a look at the support processes within your organisation (hopefully you have them defined – otherwise here’s another thing for the garbage in / garbage out list). You may need to make an amendment to your CMDB data structure to include a flag to indicate support status, or you may simply have to add a new option to an existing field to achieve the same result.
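
As a rough sketch of how support staff (or their tooling) could check whether a request is valid, assume each configuration item carries a support status field. The in-memory dictionary below is only a stand-in for a real CMDB, and the field name, status values and host names are all hypothetical.

```python
from enum import Enum


class SupportStatus(Enum):
    """Hypothetical values for a support-status field on a configuration item."""
    FULL = "full"
    NONE = "none"  # the "no support" service tier


# Stand-in for a CMDB lookup; a real one would be queried through
# whatever interface your CMDB product provides.
CMDB = {
    "host-a": {"support_status": SupportStatus.FULL},
    "host-b": {"support_status": SupportStatus.NONE},
}


def is_support_request_valid(hostname, cmdb=CMDB):
    """Return True only if the host is known and is on a supported tier."""
    record = cmdb.get(hostname)
    if record is None:
        # Unknown configuration item: cannot be linked to an incident.
        return False
    return record["support_status"] is not SupportStatus.NONE
```

Whether this ends up as a new flag or a new option on an existing field, the check itself stays a single lookup against the configuration item linked to the request.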

Naming Conventions
In every place I have worked in the last 10 years (all companies with 10K or more employees), there has without fail been a carefully constructed host naming convention. And many times, it has been the subject of intense debate between the architecture team who usually propose it and the operational teams it actually impacts. The most common ones I have seen have been something like:

<geographical identifier><functional identifier><environment identifier><a number>

So for example, a development web server in London might be called lonwebd0001. Now if you didn’t have a CMDB and support tools / processes that enforced the mandatory linking of configuration items with incident/problem/change requests then I understand why such a naming scheme might make sense. But if you are living in a world without this, then you have _way_ bigger things than Cloud to worry about my friend!

But assuming you do have the relevant processes in place, is a hostname really the right place for information like geographical location and server function? What about mobility? How many times have you gone through a DC migration and been left with a whole bunch of servers with hostnames that are no longer “compliant” and, even worse, downright confusing to newcomers or outsiders? And you couldn’t rename them because it would mean a whole bunch of risk and testing for the application owner? You all know what I am talking about, I am sure!

So in this world of Cloud, where something may be running in one location one week and another the next, why the hell would you use such a naming convention? Moving to something like an 8-digit hexadecimal scheme will eliminate any problems like this, and will also make it easier to programmatically determine what name a new VM request will get. Remember, like many things, the standards you are responsible for setting only need to be good for one day longer than you are alive. A little over 4 billion unique hostnames should see to that ;).
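
For what it's worth, here is a minimal sketch of how such names could be handed out programmatically, assuming they are simply allocated in sequence. The function names are made up for illustration, and a real implementation would need to persist the last allocated name somewhere safe.

```python
NAME_WIDTH = 8                      # hexadecimal digits in a hostname
NAMESPACE_SIZE = 16 ** NAME_WIDTH   # 4,294,967,296 possible names


def hostname_for(sequence_number):
    """Format a sequence number as a zero-padded, lowercase hex hostname."""
    if not 0 <= sequence_number < NAMESPACE_SIZE:
        raise ValueError("sequence number outside the 8-digit hex namespace")
    return format(sequence_number, "08x")


def next_hostname(last_hostname):
    """Return the name that follows the most recently allocated one."""
    return hostname_for(int(last_hostname, 16) + 1)
```

The name carries no location, function or environment, so nothing about it goes stale when the VM moves; that information lives in the CMDB record instead.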

So when thinking about how the Cloud will change the way things are at your organisation, think about the most valuable things to challenge. It may be a little, it may be a lot – the above points are just some food for thought.


