This Cloud needs an enema!

In the words of Jack Nicholson’s Joker (well… almost), it’s time to clean out the crap from the Cloud and everything the IT media has associated with it over the past year or so, and get down to business.

I’m going to make a bold statement here, and I don’t mean any disrespect to anyone in doing so, but I think a lot of the focus in the blogosphere currently is not quite on track with regards to Cloud. I’ll condition that statement further by saying if you deal with fewer than 1,000 servers and fewer than 3 datacenters in your environment, you can safely ignore that comment and the rest of this post. And I don’t mean that in a “my dick is bigger than yours” sense – as I’ve said many times before, I’ve only worked in large environments, it’s the only thing I know (and even that is arguable).

So what do I mean by not on track? I look at the cloud-related stuff on the Planet V12n aggregator (which I _still_ seem to be dropped from, John :P) and I see stuff about UCS, about APIs from various Cloud providers, and general commentary about the Cloud not being ready for primetime, etc. While UCS and APIs may be interesting (and important), and the Cloud may not be ready for primetime, it is becoming increasingly apparent to me that there is a lack of thought and dialogue surrounding a critical piece of the Cloud puzzle, one that you should start thinking about now. And you don’t need any new hardware or software to do it. I’m talking about metadata.

I’m not for a second claiming this is anything new; there seems to be no shortage of startups seeking venture capital for their cloud metadata based apps these days. But these startups are not offering guidance as to _what_ metadata should be defined – they’re developing products to store metadata and make decisions based on it. And really, that’s not such a big deal, because only so much of this metadata is broadly applicable to all enterprises (plus you can’t really base a business model on IP associated with the metadata definition itself). But you have to wonder how much success such software can have in the absence of the metadata itself!

It used to be OK to make soft allocation / provisioning decisions based purely on available capacity, or at best based on geographical proximity to users (more applicable to VDI). But in the Cloud era, this won’t be good enough – you cannot simply look at where you have capacity. You need to first make a decision as to where to look for capacity. And obviously to do that, you need richer data than what 99% of us have today.

On the demand side (i.e. the request), you may need to know information about entitlement (what class or tier of service can a requestor ask for, or at what stage does their request no longer qualify for ‘lean governance’?), RPO/RTO requirements (does the workload need to go into a cluster that has a stretched network and replicated LUNs available?), performance requirements, and data sensitivity (can the data live outside of the country but still internal to the organisation? can it live in an external cloud, private or not?).
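
To make that a little more concrete, here is a minimal Python sketch of what a demand-side record could look like. Every name in it (`WorkloadRequest`, `DataSensitivity`, the field names, the 60-minute threshold) is made up for illustration; your own attributes and cut-offs will differ.

```python
from dataclasses import dataclass
from enum import Enum


class DataSensitivity(Enum):
    # Hypothetical levels; each organisation will define its own.
    PUBLIC = 1        # may live in an external cloud
    INTERNAL = 2      # must stay inside the organisation, any country
    IN_COUNTRY = 3    # must stay inside the organisation and the country


@dataclass(frozen=True)
class WorkloadRequest:
    requestor: str
    service_tier: str             # entitlement: the tier being asked for
    rpo_minutes: int              # maximum tolerable data loss
    rto_minutes: int              # maximum tolerable downtime
    min_iops: int                 # a stand-in for performance requirements
    sensitivity: DataSensitivity

    def needs_replication(self, threshold_minutes: int = 60) -> bool:
        """Assumed rule: a tight RPO implies stretched network / replicated LUNs."""
        return self.rpo_minutes <= threshold_minutes


request = WorkloadRequest(
    requestor="app-team-a",
    service_tier="gold",
    rpo_minutes=15,
    rto_minutes=60,
    min_iops=2000,
    sensitivity=DataSensitivity.IN_COUNTRY,
)
print(request.needs_replication())
```

The point is not the particular fields, just that the request carries enough information to rule infrastructure in or out before anyone looks at free capacity.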

On the supply side, we need to define and store / advertise the characteristics of the available infrastructure. This can range anywhere from higher level datacenter characteristics (such as geographical location, network bandwidth and latency to other places, whether it’s internal or external), to non-capacity related cluster level characteristics (such as virtual technology platform, available storage tiers, available network tiers) and finally intra-cluster characteristics (like resource pools, not just in the VMware sense: you may have multiple tiers of storage presented to a single cluster, which you then logically separate into different “resource pools”, and then attach different levels of governance or other policies to these resource pools).
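
As a rough sketch of that three-level supply side (datacenter, cluster, resource pool), something like the following Python would do. The class and field names, and the sample values, are assumptions for illustration only, not a proposed standard.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    name: str
    storage_tier: str     # e.g. "tier1"
    governance: str       # e.g. "lean" or "full-approval"


@dataclass
class Cluster:
    name: str
    platform: str                      # virtualisation platform
    network_tiers: list[str]
    pools: list[ResourcePool] = field(default_factory=list)

    def storage_tiers(self) -> set[str]:
        """Storage tiers this cluster can offer, derived from its pools."""
        return {pool.storage_tier for pool in self.pools}


@dataclass
class Datacenter:
    name: str
    country: str
    internal: bool                     # internal vs external provider
    latency_ms: dict[str, float]       # latency to other sites
    clusters: list[Cluster] = field(default_factory=list)


dc = Datacenter(
    name="syd1",
    country="AU",
    internal=True,
    latency_ms={"mel1": 12.0},
    clusters=[
        Cluster(
            name="syd1-c01",
            platform="vSphere",
            network_tiers=["prod"],
            pools=[
                ResourcePool("gold", "tier1", "full-approval"),
                ResourcePool("bronze", "tier3", "lean"),
            ],
        )
    ],
)
print(dc.clusters[0].storage_tiers())
```

Note that none of these attributes describe capacity; they describe what the infrastructure is, which is what you need in order to decide where to look.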

Obviously the above examples are not exhaustive, but hopefully they’re enough to give you the picture. Once you have defined the metadata that is relevant for you, you can then implement the necessary business logic to map the demand to the supply, thus determining where to look for capacity. And of course after you figure out where to look for capacity, you can actually go look there (and hopefully find some capacity, otherwise your capacity planning process needs some attention).
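
The two-step idea (first decide where to look, then look for capacity there) can be sketched in a few lines of Python. The dictionary keys and the "most free vCPUs wins" tie-break are hypothetical choices to keep the example short; real business logic will be richer.

```python
def eligible_clusters(request, clusters):
    """Step 1: use metadata only to decide where to look."""
    return [
        c for c in clusters
        if (not request["replicated"] or c["replicated"])
        and request["storage_tier"] in c["storage_tiers"]
        and (c["internal"] or request["external_ok"])
        and c["country"] in request["allowed_countries"]
    ]


def place(request, clusters):
    """Step 2: look for capacity, but only among the eligible clusters."""
    candidates = [
        c for c in eligible_clusters(request, clusters)
        if c["free_vcpus"] >= request["vcpus"]
    ]
    if not candidates:
        return None  # nowhere suitable: a capacity planning problem
    return max(candidates, key=lambda c: c["free_vcpus"])["name"]


clusters = [
    {"name": "syd-a", "country": "AU", "internal": True, "replicated": True,
     "storage_tiers": {"tier1", "tier2"}, "free_vcpus": 8},
    {"name": "sgp-a", "country": "SG", "internal": True, "replicated": False,
     "storage_tiers": {"tier2"}, "free_vcpus": 64},
    {"name": "ext-a", "country": "US", "internal": False, "replicated": True,
     "storage_tiers": {"tier1"}, "free_vcpus": 128},
]
request = {"vcpus": 4, "replicated": True, "storage_tier": "tier1",
           "external_ok": False, "allowed_countries": {"AU"}}

print(place(request, clusters))
```

In this made-up data the external cluster has by far the most free capacity, yet it is never considered, because the metadata rules it out before capacity is even checked.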

Just because you may need this data doesn’t necessarily mean you need to prompt a requestor for it – there are many ways by which you can derive the requisite information: directory service group membership, organisational group, cost center, association of a known application with the request (if you have a centrally defined application repository that tracks info such as business criticality and data sensitivity), and so on. And probably in the majority of cases you’ll want to make these decisions without prompting the requestor at all. After all, ultimately the request may be coming from the application itself rather than a person.
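
Here is a small Python sketch of deriving request metadata rather than asking for it. The lookup tables are hypothetical stand-ins for a directory service and an application repository, and the defaults (lowest tier, most restrictive sensitivity for unknown applications) are my assumption, not a recommendation for every shop.

```python
# Hypothetical stand-ins for a directory service and an application repository.
GROUP_TO_TIER = {"trading-devs": "gold", "intranet-devs": "bronze"}
TIER_RANK = {"bronze": 0, "silver": 1, "gold": 2}
APP_REPOSITORY = {
    "payments": {"criticality": "high", "sensitivity": "in-country"},
    "wiki": {"criticality": "low", "sensitivity": "internal"},
}
# Assumed defaults for applications the repository does not know about.
UNKNOWN_APP = {"criticality": "low", "sensitivity": "in-country"}


def derive_metadata(groups, app_name):
    """Build request metadata from group membership and the app repository."""
    tiers = [GROUP_TO_TIER[g] for g in groups if g in GROUP_TO_TIER]
    # The highest entitlement among the requestor's groups wins.
    tier = max(tiers, key=TIER_RANK.__getitem__, default="bronze")
    app = APP_REPOSITORY.get(app_name, UNKNOWN_APP)
    return {"service_tier": tier, **app}


print(derive_metadata(["trading-devs", "staff"], "payments"))
```

Nothing here requires a person to fill in a form, which matters when the requestor is an application.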

So where to from here? A standardised set of attributes for these various logical entities will only go so far – the OVF specification is probably a good starting point for technical aspects of a workload, and naturally if you define something as a characteristic of a request it’s probably just as relevant to define the same in your infrastructure at some level. I don’t know of any industry bodies like DMTF, or even any loose affiliations of companies, that are looking at this from an infrastructure perspective; it sure would be nice if they did. But as I said, that will only go so far – each organisation will have its own pieces of metadata that it cares about, so however you implement the metadata store you should at least ensure that it is flexible. But don’t mistake “flexible” for “uncontrolled” – you should absolutely ensure that the metadata schema is well defined and under change control.
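
One way to picture “flexible but controlled” is a store that accepts any attribute the schema declares and rejects everything else. The Python below is only a sketch under that assumption; the `SCHEMA` layout, the attribute names and the version number are all invented for the example.

```python
# A hypothetical, versioned schema. Adding an attribute means changing this
# definition (under change control), not just writing a new key to the store.
SCHEMA = {
    "version": 3,
    "attributes": {
        "service_tier": {"type": str, "allowed": {"bronze", "silver", "gold"}},
        "rpo_minutes": {"type": int},
        "cost_center": {"type": str},
    },
}


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for key, value in record.items():
        spec = schema["attributes"].get(key)
        if spec is None:
            errors.append(f"unknown attribute: {key}")
        elif not isinstance(value, spec["type"]):
            errors.append(f"{key}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"{key}: {value!r} not allowed")
    return errors


print(validate({"service_tier": "gold", "rpo_minutes": 15}))
print(validate({"service_tier": "platinum", "colour": "red"}))
```

The schema can grow as each organisation needs it to, but only through a deliberate change to the definition, which is the distinction between flexible and uncontrolled.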

The technology in this space is rapidly evolving and will continue to do so, but getting your data model right is just as important as (if not more important than) getting the technology right, and probably needs a lot more thought and input from a wider audience within your organisation than the technology decisions do. Thinking about this stuff now won’t cost you a cent in capital expenditure, but will make all the difference to the speed and success with which your organisation can move into the Cloud.


8 Responses to “This Cloud needs an enema!”

  1. Tom Howarth Says:

    I very cogent and well thought out article. You make some very good points; metadata is often overlooked, as are the security aspects of the cloud. Both are the elephants in the room that nobody wishes to be the first to mention.

  2. Tom Howarth Says:

    Ahh it is too early in the morning for typing. The first word was supposed to be “A” not “I”

  3. Gabrie Says:

    Good post Stu !!!

  4. Rodos Says:

    Stu, great post. Not too much of a rant. As a person who has posted on UCS and Cloud in the week, with those posts appearing on v12n, I feel I have a right to reply.

    Firstly, UCS is not cloud. UCS is an interesting platform for building cloud on, just like it may be for building virtualisation on. I post on it because it’s interesting and something that others are interested in. I tag my posts and have asked VMware to feed only posts I tag with VMware, but they don’t; they take all of them. I can’t help that. I know Scott has been posting on UCS too.

    As to the Cloud adoption bit. Agree on the metadata thoughts. Especially as you move into larger organisations. That metadata may be human generated from the business units or it may be automated through other systems generating requests for services. I have described before (and keep pushing VMware to ensure they include in the vCloud APIs) the inclusion of customer specific placement and movement algorithms that manipulate your own metadata. For example the vCloud API may review the standard metadata in your workload and determine that out of your set of 16 data centers (some internal, some external) only 5 of those meet the criteria. However you may have metadata on your workload that vCloud does not understand or you may not even want to tell someone about. You want to write your own placement algorithm that you can program and load that may reduce that set of 5 down to a possible 2. For example you may have your own regulatory requirements around data placement or latency requirements to facilities that only you can know about or measure.

    One other element, you mention metadata being related to tiers of storage as an example. I believe that metadata for the cloud may need to be further abstracted and instead be based on a service level or performance metric. You should not care what tier of storage it’s on, rather care that it performs in a certain way.

    A few of us have been discussing this metadata you speak of a little while ago, see

    Would love to catch up with you at VMworld and get some of your insights.

    When you see UCS posts, hit next or delete and tell Mr Troyer to aggregate based on tags. If it’s on Cloud, keep on keeping us honest and relevant to your market.

    Fantastic stuff!


    • stu Says:

      Thx Rodos – don’t take that UCS part the wrong way, of course UCS != Cloud, what I meant was more along the lines of “stuff in V12n that isn’t about VMware products”. As I said at the start of the post, UCS is interesting and important as applied to the cloud (it’s not a compelling option for anything other than virtualised workloads, by design) I was just saying there seems to be a lot of focus on that at the moment rather than other stuff that people need to put more thought into than something comparatively simple like a hardware platform choice (note the word comparatively – I’m not trivialising the importance of hardware platform, but compared to data modelling it certainly is trivial :-)). If you think I’m not going to read anything you or Scott write about any topic that you care to, you’re dead wrong!

      As for the storage tier stuff, tier differentiation on the basis of characteristics like availability or performance is exactly what I’m talking about – I don’t think I implied anything otherwise? The classification of anything according to things like service level or performance metric is by definition tiering: there is a graded relationship.

      Unfortunately I’m not going to make it to VMworld, but when I come back home for a visit I will definitely try to catch up with you for a beer. And thanks for the comments 🙂

  5. Rodos Says:

    Tom, not all of us are overlooking the metadata! Some of us think it’s critically important and have been talking about it.


  6. Dingo Says:

    Nice post man! 😀
