There has been a lot of excitement around Web 2.0 technology lately, but it appears that we haven't learned much from our Web 1.0 days, also known as the dot-com days. These exciting new technologies allow companies to quickly produce new web sites and easily extend existing ones. The problem is that people forget that to be successful in the long term, you still need a reliable, well-designed architecture and some form of governance to ensure that the end product can meet high levels of quality of service.
What do I mean by Quality of Service (QoS)?
In terms of web sites and software, when I talk about QoS I am referring to:
- Service Level Agreements (SLAs)
- Minimal defects
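To make the SLA point a little more concrete, here is a minimal sketch of how an availability target translates into allowed downtime. It assumes an availability-style SLA measured over a 30-day window; the function name `downtime_budget_minutes` and the sample targets are illustrative assumptions, not taken from any real agreement.

```python
def downtime_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Return the minutes of downtime allowed in the window for a given target.

    availability_pct is an assumed SLA target such as 99.9 (percent).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = window_days * 24 * 60
    # The budget is the share of the window that is allowed to be unavailable.
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how quickly the budget shrinks.
    for target in (99.0, 99.9, 99.99):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes down per 30 days")
```

Under these assumptions a 99.9% target leaves roughly 43 minutes of downtime a month, so a single long outage can use up the whole budget.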
Balancing Agility with QoS
Startups are faced with a dilemma. They typically have a limited amount of time, money, and resources. They need to get a prototype up and running quickly to give them an opportunity to go after VC funds. The defining moment comes when they get the funding and then have a limited amount of time to produce a Beta version soon to be followed by a real production version. This is where many startups miss the boat. Many prioritize speed to market over sound architecture design. It is a tough decision to make.
- Do I spend a lot of time and money now and risk missing the window of opportunity?
- Can I worry about scalability after I see how well received the web site is?
- Can I afford to pay an experienced architect now or can I wait until we have more money and a good stream of traffic?
Twitter quickly became a rising star on the web with millions of users. Around April, its traffic spiked to a level beyond what its underlying architecture could handle. Now Twitter is the joke of the town and a victim of daily outages and performance issues. What we found out is that behind the scenes the baby is ugly. Twitter recently responded to some questions from TechCrunch and revealed more information about its "architecture" than any sane person would offer. Here is a summary of what I learned:
- Use one database for writes (because "replication of MySQL is no easy task")
- Limited scalability - 3 physical database machines "POWERING ALL OF TWITTER"
- Human intervention required - "There's a lot of necessary handholding and tweaking"
- They plan to grow operations, rather than fix the handholding
- Tightly coupled - massive traffic on one part affects all
- Many design flaws - "Everything from faulty process, environment, configuration, and just plain load"
But Twitter is not the only startup having issues. It is becoming common for me to experience crashes and outages in my daily routine of using various new web sites and services. There have been days when I have seen four sites down at the same time. It is getting so common that I started creating a collection of screenshots.
Technorati has been struggling recently with a surge in web traffic. Countless other startups have flashed their cute crash messages on my screen. To me, the joke is on them. It's no wonder that the corporate architects of the world (me included) get a little annoyed when the media and talking heads start touting Web 2.0 and mashups as the silver bullet for enterprises. Many have said that we should skip SOA because it is hard and time-consuming and instead go the WOA route. I think Web 2.0, mashups, and WOA are all great technologies, but when they are applied to an architecture that does not scale or does not follow some form of governance to assure some level of QoS, you will likely need to design a cute outage web page to entertain your users as your team scrambles to bring the system back online.
Here is one that I have created for any new startup out there.