If we were to design a highly scalable website from scratch, what technologies would we use?
Based on Web 2.0 popularity, LAMP seems to be high in the running. We should be looking for the best community support, the fastest development time and, most importantly, the best scaling approach.
In LAMP, the L and the A are not the complicated parts, and neither is the M. It's the P that you have to think about: PHP, Perl or Python.
PHP is pretty much the de facto choice but lacks an application server.
Perl is pretty good but harder than PHP.
Python is pretty good but is **** about white space.
Here is my plan, in phases, for starting a website from scratch on a single server for now. Later, I'll scale out horizontally when the need arises.
Phase 1: Single server, dual quad-core 2.66 GHz, 8 GB RAM, 500 GB disk in RAID 10
OS: Fedora 8. You could go with pretty much any Linux though. I like Fedora 8 best for servers.
Proxy Cache: Varnish - it is way faster than Squid per my own benchmarks. Squid chokes big time.
Web Server: Lighttpd - faster than Apache 2 and easier to configure for me.
Object Cache: Memcached. Very scalable.
PHP Cache: APC. Easy to configure and seems to work fine.
Language: PHP 5
Database: MySQL 5
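To make the object cache's role in this stack concrete, here is a minimal sketch of the usual cache-aside read path: check the cache first and only hit the database on a miss. It is written in Python for brevity rather than PHP, and everything named in it is an assumption for illustration: `TTLCache` is an in-process stand-in for Memcached, and `load_image_metadata`, `db_lookup` and the `image_meta:` key prefix are hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for Memcached (assumption: a real setup would use a Memcached client)."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def load_image_metadata(image_id, cache, db_lookup, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the database, then populate the cache.

    db_lookup is a hypothetical callable standing in for a MySQL query.
    """
    key = f"image_meta:{image_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = db_lookup(image_id)
    if row is not None:
        cache.set(key, row, ttl_seconds)
    return row
```

With this shape, repeated requests for the same image metadata reach the database only once per TTL window, which is the main reason the database should stay lightly loaded in the later phases.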
Phase 2: Max RAM out to 64 GB, cache everything.
Phase 3: Buy load balancer + 2 more servers for front end Varnish/Memcached/Lighttpd. Use original server as MySQL database server.
Phase 4: Depending on load and usage patterns, scale out the database horizontally with an additional server. I don't expect the database to be a bottleneck for this website, as only metadata is stored there. I'll mostly be serving images stored on the file system. If necessary, split the combined Varnish / Memcached / Lighttpd tier into separate tiers. But I'll carefully evaluate the situation at this point, scale out appropriately, and use a CDN for static content if necessary.
Phase 5: Max all servers to 64 GB of RAM, cache, cache, cache everything.
Phase 6: If I get this far then I'm a multi-millionaire already so I'll replace all of the above machines with whatever the latest and greatest is at that time and keep scaling out.
The important point is that I know how to scale each layer when/if the need arises. I'll scale the individual machines when necessary and scale horizontally too.
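As one illustration of scaling the Memcached layer horizontally, here is a rough sketch of a consistent hash ring that decides which cache server owns a given key. This is a common technique, not something the plan above commits to, and the names are assumptions: `HashRing`, the `replicas` parameter and the server names `web1` to `web3` are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping cache keys to server names (illustrative sketch)."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` points on the ring to even out the key distribution.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys around the ring, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Pick the first ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical server names for the two front-end machines, then a third added later.
two_servers = HashRing(["web1", "web2"])
three_servers = HashRing(["web1", "web2", "web3"])
print(two_servers.node_for("image_meta:42"), three_servers.node_for("image_meta:42"))
```

The point of the ring over a plain `hash(key) % server_count` is that adding a server only moves the keys that the new server takes over, so most of the existing cache stays warm when the tier grows.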