
May 13, 2011

Blogger.com had a big outage which hampered Nextbigfuture

Blogger.com is visited by 10-15% of internet users on any given day and carries about 0.75% of internet traffic, roughly 12% of the total traffic that goes to the main google.com site.

The blogger.com problem was software maintenance gone wrong.

The Blogger.com status page is here.
They are still restoring posts and comments from the last two days.



ZDNet describes the issue

Earlier this week, Google rolled out a maintenance release for its Blogger service. Something went terribly wrong, and its Blogger customers have been locked out of their accounts for more than a day. Google’s engineers have been frantically working to restore service ever since, although they haven’t shared any details about the problem.

A Blogger Service Disruption update contains four updates from the last 24 hours, starting with this one:

We have rolled back the maintenance release from last night and as a result, posts and comments from all users made after 7:37 am PDT on May 11, 2011 have been removed. Again, we apologize that this happened and our engineers are working hard to return Blogger to normal and restore your posts and comments.

That’s nearly 48 hours of downtime, and counting. Overnight updates promise “We’re making progress” and “We expect everything to be back to normal soon.”

My question is, “What if this had happened to another Google service?” Say, Google Docs? What if every document you wrote and saved on Wednesday was suddenly taken offline on Thursday, and you no longer had your presentation or your notes or your research for a client meeting today? How does this promise from Google sound now?

Goatguy had a comment about Cloud Services and information in the cloud

NBF - Blogger.com has an option to export all articles as an XML file, so that is the route for backing up, or a basis for exporting the blog to another site.
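For readers who would rather script a copy than click through the settings page, here is a minimal sketch that saves a dated XML snapshot of a blog's public Atom feed. This is not the full settings-page export the note above refers to, just a crude off-site copy; the feed URL pattern follows Blogger's usual convention, and the blog host and max-results cap are placeholder assumptions.

```python
# Minimal sketch (not an official tool): save a dated XML copy of a blog's
# public Atom feed as a crude off-site backup. The feed URL pattern follows
# Blogger's usual convention; the blog host and the max-results cap are
# placeholder assumptions.
import urllib.request
from datetime import date

BLOG_HOST = "nextbigfuture.blogspot.com"                        # assumed blog address
FEED_URL = f"https://{BLOG_HOST}/feeds/posts/default?max-results=500"

def backup_feed(out_path: str) -> None:
    """Download the public Atom feed and write it to disk unchanged."""
    with urllib.request.urlopen(FEED_URL) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)

if __name__ == "__main__":
    backup_feed(f"blogger-backup-{date.today().isoformat()}.xml")
```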

In essence, the big trend is to go for cloud computing, for cloud document storage, for aggregated cloud-based or cloud-dependent shared hosting (such as BlogSpot’s thousands of sites). It makes oodles of sense; the administration of the virtual site is itself virtual. You know not, and need not know, what server or servers your site is being hosted on. You don’t worry about backups – because it’s not like you could do much with them to independently bring a site back up. The concept of virtual cloud hosting is a candy bowl full of goodies, addictive goodies, techno-narcotic-laced goodies, deeply stimulating geek-endorphin goodies.

We all do the same thing – talk ourselves, or let ourselves be talked, into timid forays into using the cloud services. First the experiments, then the pilot hosting, then, as things look good, full hosting, then consolidating the hosted services. It works, the narcotic takes hold, and you’re IN, baby. Then … shit happens.

The bowl is empty, you don’t have a backup action plan, and you can’t reasonably reproduce your hosted content in a brief amount of time; indeed, the amount of time needed to reproduce the site elsewhere is so onerous that you only talk about it, countenance it, but … gaahhhh, never do it.

OK, there’s the narrative of “what is”. My deep worry is that the trend toward virtualization isn’t just subject to this kind of technological tsunami (it is equivalent!), but also that it is subject to lawless censure, governmental meddling, unwanted and possibly constitutionally illegal data mining, and what I’ve come to think of as “data leaks” – slow but persistent extraction of data by data mining bots, with no-permission-given aggregation, then unwanted marketing (or worse) subsequently.

THIS IS NOT to say that one’s own hosting on private servers, or a less unified and more Balkanized assortment of servers, would be free from similar targeting. Our $1 bill says, “In God We Trust.” It is kind of the same thing with using the Borg / Google to host our stuff. We are promised, and we hope, that “the largest carrier, with the highest market value and track record for success, will also provide the best service, the highest surveillance for bad boys trying to make a mess of things, the most conservative updates, and the least likelihood of serious system-wide outages.”

Admit it – that’s what you want too! We all do. Hence we pick Google over Amazon, and Amazon over (Rackspace, RightScale, CloudShare, HP, IBM, Foundation Network, Microsoft, Sun Cloud, Sun, VMWare, 3tera…). From what I can tell, there are over 80 cloud providers at this point. Statistically, if Zipf’s law is applicable (as it almost always is for such statistical collections of resources), then 80% of the work is being done by 20% of the providers. And within that, 80% of that 80% is being done by 20% of that 20% of the providers (i.e. 64% by 4%).
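As a rough illustration of that arithmetic (mine, not the commenter's), here is a short sketch of the recursive 80/20 split, alongside the share the top providers would hold if usage across roughly 80 providers followed a plain 1/rank Zipf distribution. The provider count and the exponent of 1.0 are assumptions.

```python
# Illustration only: the recursive 80/20 arithmetic from the paragraph above,
# plus the share the top providers would hold if usage across ~80 providers
# followed a simple 1/rank (Zipf) distribution. The provider count and the
# exponent of 1.0 are assumptions.
N = 80
weights = [1.0 / rank for rank in range(1, N + 1)]   # share of the provider at each rank
total = sum(weights)

def top_share(fraction_of_providers: float) -> float:
    """Fraction of total usage held by the top `fraction_of_providers` of providers."""
    k = max(1, round(N * fraction_of_providers))
    return sum(weights[:k]) / total

# Recursive Pareto arithmetic: 80% of 80% of the work by 20% of 20% of the providers.
print(0.8 * 0.8, "of the work by", 0.2 * 0.2, "of the providers")   # 0.64 by 0.04

# What a plain 1/rank distribution actually yields for the same cuts:
print(f"top 20% of providers: {top_share(0.20):.0%} of usage")
print(f"top  4% of providers: {top_share(0.04):.0%} of usage")
```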

OK, now I’m mostly done. But consider this: I don’t have an alternative plan that is really workable. I can’t see you simultaneously maintaining your Blogspot account, letting all the real-time commentary and new articles accumulate, while at the same time trying to mirror it all onto a separate resource with necessarily different formatting, layout, content, database facilities and programming back-end features … and trying to impose the very same features on top of that resource’s own analogous limitations. It would be mighty hard to keep such a service up if you were forced to take manual control over the cross-syncing.

But here’s another thought. Isn’t it arguably reasonable that a service such as BlogSpot also offer you, the senior bloggist, a streaming service via periodic SFTP (say 4 times an hour, or whenever 10 megabytes of changes have accumulated) that packages up the changes to your hosted site, so that the site theoretically could be brought back up to any particular point in time from periodic “frozen snapshots” of the whole thing plus all the audit/sequential change logs? Blogspot itself, defensively, could host this “alt service” on the HP, IBM, Sun, or Rackspace clouds (why not all of them?), and simply commit to not performing updates and upgrades on the extra-Google site mirrors until its own cloud has run stably, without quirks, for at least 48 hours.
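To make the proposal concrete, here is a minimal sketch of the shipping side, under stated assumptions: every 15 minutes it bundles a local mirror of the site content plus an append-only change log into a tarball and pushes it to an alternate host over SFTP. The host name, user, key path, and directory names are hypothetical, the third-party paramiko library handles the SFTP session, and nothing like this is an actual Blogger feature.

```python
# Minimal sketch of the shipping side, under stated assumptions: every 15
# minutes, bundle a local mirror of the site content plus an append-only
# change log into a tarball and push it to an alternate host over SFTP.
# The host, user, key path, and directory names are hypothetical, and the
# third-party paramiko library is used for the SFTP session.
import tarfile
import time
from datetime import datetime, timezone
from pathlib import Path

import paramiko

ALT_HOST = "alt.nextbigfuture.com"        # hypothetical alternate hosting target
CONTENT_DIR = Path("site-content")        # local mirror of posts and comments
CHANGE_LOG = Path("change-log.jsonl")     # append-only audit log of edits

def package_snapshot() -> Path:
    """Bundle the content mirror and the change log into a timestamped tarball."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    bundle = Path(f"snapshot-{stamp}.tar.gz")
    with tarfile.open(bundle, "w:gz") as tar:
        tar.add(CONTENT_DIR, arcname="content")
        if CHANGE_LOG.exists():
            tar.add(CHANGE_LOG, arcname="change-log.jsonl")
    return bundle

def ship(bundle: Path) -> None:
    """Push the tarball to the alternate host over SFTP."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(ALT_HOST, username="backup",
                   key_filename=str(Path.home() / ".ssh" / "id_rsa"))
    sftp = client.open_sftp()
    try:
        sftp.put(str(bundle), f"/incoming/{bundle.name}")
    finally:
        sftp.close()
        client.close()

if __name__ == "__main__":
    while True:                    # roughly "4 times an hour"
        ship(package_snapshot())
        time.sleep(15 * 60)
```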

Again, you would simply need to feed your base snapshot to sftp://alt.nextbigfuture.com in order to bring up the alternate site, with only 15 or 30 minutes of changes lost. Reasonable. Indeed, your “backup streams” wouldn’t even need to be hosted on your personal computer or corporate server; the service agreement would always have them hosted at “alt.nextbigfuture.com”.
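The restore side could be equally simple: unpack the latest base snapshot, then replay logged changes up to a chosen restore point. This sketch assumes the hypothetical JSON-lines log format from the shipping sketch above (an ISO-8601 timestamp with UTC offset, a relative path, and the new body per entry); the file names are likewise placeholders.

```python
# Sketch of the restore side, under stated assumptions: unpack the latest base
# snapshot, then replay logged changes up to the chosen restore point. The
# JSON-lines log format (ISO-8601 timestamp with UTC offset, relative path,
# and new body per entry) is hypothetical, matching the shipping sketch above.
import json
import tarfile
from datetime import datetime, timezone
from pathlib import Path

def restore(snapshot: Path, log: Path, target: Path, up_to: datetime) -> None:
    """Unpack `snapshot` into `target`, then apply log entries no newer than `up_to`."""
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(snapshot, "r:gz") as tar:
        tar.extractall(target)
    with open(log) as f:
        for line in f:
            entry = json.loads(line)          # e.g. {"ts": "...", "path": "...", "body": "..."}
            ts = datetime.fromisoformat(entry["ts"])
            if ts > up_to:
                break                          # stop at the chosen restore point
            out = target / "content" / entry["path"]
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(entry["body"])

if __name__ == "__main__":
    restore(
        snapshot=Path("snapshot-20110511T070000Z.tar.gz"),   # hypothetical file names
        log=Path("change-log.jsonl"),
        target=Path("restored-site"),
        up_to=datetime(2011, 5, 11, 7, 37, tzinfo=timezone.utc),
    )
```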

Shoot… the alt site ought to also be continuously integrating the logs and sequential snapshots, so that you don’t even have to upload them yourself, or run the processing to integrate them into your segment of the database. In a major outage (or in response to your complaints!) Google would merely need to redirect www.blogspot.com/nextbigfuture to http://alt.blogspot.com/nextbigfuture in order to restore service. Further, if Google were really on top of its game, it would be continuously “test shifting” (or, more positively, “load balancing and capacity evaluating”) sites transparently by swapping them on a daily basis between the primary and the backup site.

Such continuing (and essentially, and necessarily, transparent) swapping would serve to ensure that event-forced swapping works as transparently as possible, and that all the kinetics are worked out to allow any hosted service to run from any cloud as needed, with the cross-synchronization of activity logs keeping the services live and, as opportunity arises, restoring the services to the failed location(s).

So sez me. This is the kind of geeky idea-dandruff you get from working for years with Tandem “NonStop” computers. Too bad they went away. Nice company, good computers, intriguing idea, utterly inefficient, but remarkably workable nonetheless.
