Website: http://www.scobleizer.com/
Original page: http://scobleizer.com/2008/10/05/help-im-clueless-about-web-service-scalability/
Also, how difficult is it to alter your data model and architecture without screwing up when you are live? Does everyone have to get an expensive DBA? Or is it mostly common sense?
I usually develop in stages, with the first stage being very bloated with repetitive functions and calls. After I get all of my functionality in order I work through everything and consolidate functions to make them as dynamic as I possibly can. I've tried to plan things on paper ahead of time, but it never works out.
Do they have a development process that addresses this from the beginning, or do they go through the same stages?
Thanks!
Have a great time!
Amy Woidtke
green interior decorator
Seattle, WA
I've picked up on one of the words you just used "glue". Perhaps ask them how they can build a credible, trusted Brand when they glue bits of applications together to create something for the end-user.
How, as a Brand will they cope when these services fail, how will they manage that Brand Experience. Sure, a lot of these products are free to the end user yet that doesn't mean that they should receive a shoddy or poor or failing service.
If they choose free services in the cloud from places like Amazon and Google with no real uptime guarantees, how can they manage that?
Lots of people love twitter, yet people are having to now change their ways to use it well, de-follow, use Friendfeed, Twitter turned off sms in UK as well. All these are actually people problems and not technical problems. They impact upon the way people go about their lives.
I'd also be interested in how they plan (or if they can at all) for people using the product in ways they never imagine that impact upon their initial scope? That's kinda what happened to Twitter as well.
Mike Ashworth
Marketing Coach and Consultant
Brighton and Hove, Sussex, UK
Come up and visit us in San Francisco this week - having gone through a LOT of scalability issues with the growth of Technorati, I can give you lots of grounding on the basics, so you won't feel uncomfortable with these very smart guys. There's a set of basic rules and principles that help out in building scalable systems - but it'd take an entire book to really talk about them all in detail. Come on up and visit - and if you want, I'll give you a sneak peek into the depths of Offbeat Guides, and how we're building to scale as well... :-)
You've got my number/email, drop me a line. It'd be great to catch up as well!!!
Dave
This is a work of art and you must be congratulated. It's no wonder you're everywhere we turn.
Good luck with your interview, I'm sure it'll be a great success.
Pete.
How do you think about ROI for time and energy invested in making your architecture more scalable?
I would use your lack of knowledge about scalability as an asset...you are a curious person, and can help someone like me who wants to learn about scalability through you.
It is sometimes better not to know too much so that you can find out.
So find out for me!
D
Discuss.
Do you build an application and make it scalable, or do you build a scalable application from the beginning?
Is scalability more important than functionality sometimes?
Simon.
In part it means encapsulating your design and code at a mid-to-low level of granularity. If you are strict with encapsulation techniques (following pure OO and message-passing design is a solid method), it doesn't matter what languages, libraries, or databases you use in each part, as you simply rewrite or replace components as they become bottlenecks.
This approach allows you to get a full system bootstrapped without needing to optimize components ahead of time.
The whole point of a web service is that you have a simple API and others don't care about how you implement internals. So taking an API-centric design approach is key.
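To make the encapsulation idea concrete, here is a minimal Python sketch. `KeyValueStore`, `InMemoryStore`, `ShardedStore`, and `ProfileService` are hypothetical names, not anything the commenter described. The service depends only on the interface, so when the single store becomes a bottleneck it can be swapped for a hash-sharded one without touching any caller.

```python
import hashlib
from typing import Dict, List, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical storage interface; callers depend only on these two methods."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Simplest possible backend: one dict on one machine."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class ShardedStore:
    """Drop-in replacement that spreads keys across several backends by hash."""

    def __init__(self, shards: List[KeyValueStore]) -> None:
        self._shards = shards

    def _shard_for(self, key: str) -> KeyValueStore:
        # Stable hash (not Python's per-process randomized hash()) so routing is repeatable.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self._shards[int.from_bytes(digest[:4], "big") % len(self._shards)]

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)

    def put(self, key: str, value: str) -> None:
        self._shard_for(key).put(key, value)


class ProfileService:
    """Public API; it never knows which storage implementation sits behind it."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def set_name(self, user_id: str, name: str) -> None:
        self._store.put(f"user:{user_id}:name", name)

    def get_name(self, user_id: str) -> Optional[str]:
        return self._store.get(f"user:{user_id}:name")


if __name__ == "__main__":
    # Same service code, two interchangeable backends.
    for store in (InMemoryStore(), ShardedStore([InMemoryStore() for _ in range(4)])):
        service = ProfileService(store)
        service.set_name("42", "Robert")
        print(type(store).__name__, service.get_name("42"))
```

The design choice worth noticing is that `ProfileService` is the only thing external clients see; the storage layer can be rewritten as often as load requires.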
1) Start with something very restrictive. App Engine is somewhat close to where you should start. You get 0.2 seconds to handle each request, you have no state on the server, you can't do joins or complex queries on the database, and you can't persist anything outside of the database. At no point should you assume that any two requests will hit the same front end server, and at no point should you assume that any two database entries will be on the same DB server.
2) Ignore everything in step 1 *when necessary for your application*. This important request needs to do a lot of heavy lifting and a few joins? Fine. You need to do a database query that takes 2 seconds on an empty database for a feature of marginal use? No. Make your application unscalable if it is necessary, but only as a conscious choice.
3) Now you've hit the big time, and your system is melting. Add front end servers and throw session info into the database. Federate and replicate the heck out of the database, and do everything you can to get rid of the expensive queries you added in step 2. Generally, this will involve de-normalizing and caching as much as possible. Add caching at every point in your app. You should be able to hire people to help at this point.
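Step 3's "add caching at every point" is commonly done cache-aside: read the cache first, and only on a miss run the expensive query and store its result with an expiry. Below is a minimal in-process Python sketch; `TTLCache`, `cached`, and `expensive_query` are hypothetical names, and a real deployment would more likely put a shared cache tier (memcached, for example) in front of the database so every stateless front end sees the same entries.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny in-process cache with per-entry expiry (a stand-in for a shared cache tier)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get(self, key: str) -> Optional[V]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Stale entry: drop it so the next read goes back to the database.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: V) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def cached(cache: TTLCache[V], key: str, load: Callable[[], V]) -> V:
    """Cache-aside read: try the cache first, fall back to the expensive loader on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value)
    return value


if __name__ == "__main__":
    calls = {"n": 0}

    def expensive_query() -> int:
        # Hypothetical stand-in for the slow join/aggregate query added in step 2.
        calls["n"] += 1
        return 1234

    cache: TTLCache[int] = TTLCache(ttl_seconds=30.0)
    for _ in range(3):
        cached(cache, "follower_count:42", expensive_query)
    print(calls["n"])  # 1: the query ran once, the other two reads hit the cache
```

The TTL trades freshness for load: a longer expiry protects the database more but serves staler data, which is usually acceptable for de-normalized counts like the one above.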
Are there some features in a language / framework that make it better suited for scaling?
Which language / platform would be ideal for building a robust easily scalable web application?
* What technical infrastructure scalability models did they try (horizontal, vertical, or perhaps a combination)?
* Scalability and resilience often go hand in hand... was resilience considered to be a known factor or a bonus when they decided to scale up or scale out?
* Have they encountered any geographical scalability problems?
* On what OSI-level have they concentrated the most, when trying to solve scalability issues?
* How are they proactively monitoring performance problems, and on what OSI levels are they monitoring?
* From a scalability perspective "the cloud" can look like a good idea, but what are their thoughts on resilience in "the cloud"?
Fredrik Wennberg
IT Solution Architect
Amberpoint is the one I'm most familiar with. (www.amberpoint.com). I'd love to know if there are others that are different/better.
- What were the symptoms?
- What part of the architecture or system was the most limiting factor?
- What did they do about it that wasn't just volume/capacity adjustments with more boxes?
- What was the most innovative thing that was created to deal with scale? Did they make any new/interesting algorithms, techniques, methods, etc.?
- They're probably not done dealing with scalability, what are their ongoing efforts to improve?
- Was scalability always/only about technology? What about processes and people scalability?
I regularly speak at conferences about scalability including 4 sessions this year at MySQL Conference. In the past I helped scale Fotolog (then the 13th largest website on the Internet) to achieve a 10x growth without adding any database server. Most recently I presented a session at Dave McClure's Startonomics about Startup Scalability Strategies: How to grow up without blowing up. I regularly help candidates preparing for their interviews. You can reach me at 5 5 1 6 5 5 5 5 9 0 and within 30 minutes I can help you cover major ground in terms of strategies, approaches and tools to become scalable.
Regarding at what point a Startup should consider scalability:
http://startonomics.com/blog/startup-scalabilit...
http://mashraqi.com/2008/09/startonomics-startu...
http://startonomics.com/blog/scalability-for-st...
Talk to you soon,
thanks,
Frank
I wrote a blog post on this issue last month at http://blog.broadpool.com/2008/09/23/it-goes-to... . I guess the biggest questions for most listeners would be, "How do I get there? I can't start from where Yahoo! starts, so how do I build a site so that it can grow over time?" The other question would be, "How do I handle success? What do I need to do to ensure that, should my web app be wildly successful, we don't die because of it?"
You have to bring up the Twitter scenario: ask them their thoughts on why Twitter failed (DB scalability).
You should also ask about 'the cloud' and the future of scalability. Is what was once a major purchase and commitment (new servers, configing and coordinating) soon to be replaced by 'cycles on the cloud'?
Ask them about scaling on LAMP vs. scaling on Windows. :-)
Ironic that tomorrow is the 3 year anniversary of a post I did entitled, "Web 2.0 Conference: The Dirty Little Secret": http://www.iconnectdots.com/ctd/2005/10/web_20_... where I talked about the complete and utter lack of ANY discussion of scalability or latency. It was all about "just build it" which made me shudder.
Here are some key questions to ask:
1) There are two audiences for scalability, developers and users, but they have one shared goal, performance. Are there any best practices or benchmarks for how fast a web app or page SHOULD parse? Is there a min-max window of performance developers should target?
- Developers want application performance but balanced with a need to optimize conflicting priorities (e.g., delivering fast web apps but needing to wait for an ad server to deliver a personalized advertisement).
- Users don't care about the monetization demands...they just want the app to work and be nearly as good an experience as a desktop app (though willing to make a trade-off for having stuff in the cloud accessible from anywhere with different device types)
2) We all know there are accelerating internet loads from an ever-increasing number of broadband users, apps in the cloud, and data types like video. We've seen wildly conflicting estimates of internet capacity as well. Are there *any* definitive internet infrastructure numbers that'll help developers, I.T. professionals or anyone creating 'net-centric strategies, to get a clue about latency, capacity and so forth going forward?
(Need to tell you that this is THE #1 biggest issue with all the startups we cover at Minnov8.com. Without a PhD in network topology and infrastructure, how the hell is a handful of geeks to build and deliver a strategically sound platform or application-set?).
3) "To API or not API: That is the question". One could argue that the root cause of Twitter being the poster whale for fail was the API. It's almost comical how many have leapt on the API and are using it for all sorts of apps. So the question might not be "API or not", but "when to API"?
Good luck.
--
Steve
Sounds like an amazingly interesting webinar. I guess the question I would ask is: how exactly do you test your efficiency and scalability beforehand, so that you can be at least modestly prepared for that overnight jump to 6 million users?
Also, will this be available after the conference? I registered on Fast Company, but I will be working while the conference is live. Thank you, Scobleizer.
Admonishion? You mean admonition?
I know there is a stark difference, but really, come on. If you're allowed to ask for help, why not give folks the benefit of the doubt if they flub a resume, and ask them why?
Fast and big can be great (and usually commands attention) but may or may not be a good indicator of success. Likewise, how do they measure success and then revise/redeploy on the fly in response to the data.
Questions:
"Looking individually at each other's past projects, like iLike and FriendFeed, what items in their development do you think were key to the projects' success? In a similar vein, what would you have done differently, and how would it have made things better?"
"In hindsight, were there key players beneath the surface in these projects which played a large part in making them a success, or was the project so well defined that everyone came together equally in making it a success? If there were key people, what did they provide to the project that you couldn't provide personally?"
Then it should be easy to show remarkable improvement when the real interview comes around. Everyone will be commenting about how you aced it, regardless of the fact that you will still be the least-informed individual in the room on the subject.
It's all about perception management. Instead of comparing you to the more-informed subjects of your interview, most people will compare you to your even-less-informed previous performance.
I'm not going to tell you where I got this idea. Just suffice it to say that political strategists are genius!
Who is going in unprepared. Seems this blog post just prepared me in a BIG way for Thursday!
There are some basic topologies that everybody uses:
a) "the sink" uses the database as repository of every message -- write it first, let it be read second; parallelize/cluster the db server and you easily crank up the volume. This model is very popular amongst web 2.0 systems. It is limited though, as adding a db server only brings 0.75 more power.
b) "the network" uses pipes to distribute messages between writers and readers -- very popular in the telco industry, where speed is master and geography plays an important role. Parallelize writers and readers and you get a linear scale that depends only on the number of servers you put in. The big disadvantage is that distributed persistence needs to be consolidated at some point, which can generate some difficult-to-tame data flows.
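Taking the comment's figure at face value (each added database server in the sink model contributes only 0.75 of a server's capacity; that number is the commenter's claim, not a measurement), the gap between the two topologies widens as the cluster grows. A minimal Python sketch of that arithmetic, with hypothetical function names:

```python
def sink_capacity(servers: int, marginal_gain: float = 0.75) -> float:
    """Sink model: the first DB server gives 1 unit; each extra one adds only `marginal_gain`."""
    if servers < 1:
        raise ValueError("need at least one server")
    return 1.0 + marginal_gain * (servers - 1)


def network_capacity(servers: int) -> float:
    """Network model: writers and readers scale linearly with the number of servers."""
    if servers < 1:
        raise ValueError("need at least one server")
    return float(servers)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:>2} servers  sink={sink_capacity(n):5.2f}  network={network_capacity(n):5.1f}")
```

At 16 servers this works out to 12.25 units for the sink versus 16 for the network, before counting the cost of consolidating the network model's distributed persistence.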
Now, ask yourself: which model was chosen by Google? And which by Twitter?
Services are fascinating, for a couple of reasons. Sure, there are technical "service scaling" issues, which the other folks on the panel will know all about. Matt has great stories on the scaling of Akismet. But far more interesting are the human scaling issues. I always find more thoughtful discussions there.
First, there are terms-of-use issues. When you get to a certain size, you need to have policies for appropriate use. You wind up creating competitors, and an ecosystem emerges around your service. Look at Twitter, and the emergence of complementary products like Summize. Then look at what happened when Notchup exploited LinkedIn to grab a bunch of users. How you monitor use and enforce terms of use is a big question, and it goes far beyond simple APIs and scaling.
Second, there's the fact that we're building human APIs. APIs and web services are typically focused on letting machines talk to one another. But by tying realtime activity feeds to our mobile devices, or location-based services that report our coordinates, we're plugging humans into applications, Amazon Mechanical Turk style. As humans start to interact with applications via web services, through mobile devices and so on, a whole new set of scaling issues emerges.
Since you're the higher-level, human-angle participant on the panel, I'd elevate things beyond bits and bytes and into humans, policies, exploitation, and startup ecosystems.
Not sure that helps... sounds like it'll be a great panel!
A.
- Where does scalability cross paths with standards (WS-*, WSDL, UDDI) vs. simplicity (REST, POX+HTTP)?
- Do these people think that ESB has a place in their view of scalable services?
- How should you look at databases differently in a service-oriented model? (There's all sorts of sub-topics here; do services share a database, or have their own, does it vary? 2PC vs. other kinds of synchronization? Clustered caches vs. RDBMS?)
- What mechanisms are important to keep instances in synch and sharing work without tripping over each other or creating new bottlenecks (messaging, database, clustered caches, etc.)?
- Are location-transparency/routing and directory/discovery services important to scalable SOA, or is this simply the job for DNS and load balancers?
There's tons of interesting things to talk about here, really.
How do you measure and monitor the external performance of your Web API to proactively deal with loading, connectivity, and application issues?
What is the issue you run up against most: bandwidth bottlenecks or application loads?
smp
http://highscalability.com/
2) Is true horizontal scalability ever achievable or will there always be some shared resources?
3) Is designing for scalability a reactive or proactive process?
4) Are we too eager to abandon the benefits of normalization to achieve scalability?
5) Traffic patterns, especially for smaller sites, can be extremely volatile and unpredictable. How do you design for that?
6) What's the time horizon on scalability becoming a commodity that's provided by hosting providers alongside power and ethernet?
What decisions, from your initial design, presented the largest hurdles to scaling?
What issues will you ensure are taken into consideration in your next version 0 design?
How have you changed the way you develop software to improve scalability?
Software/system architecture is only one aspect of successfully building a scalable system. Operations, development/deployment processes, monitoring, and so on all make a big difference to the scalability of any system.
Which provider would they choose today if they needed to build a new service that could face a scalability problem?
Limitation/Marketing
Is an invitation procedure a good way to manage the number of users accessing the platform? Would it frustrate the people not authorized to access the system, or would it force them to find alternative ways to beta test it? ;-)