2008/09/24

State of play in EC2-based database hosting

Oracle recently blinked and decided to support their DB and some other stuff on EC2. Reading the actual terms though, assuming I've understood them correctly, they haven't actually done anything other than map EC2 virtual cores onto CPU sockets and let the normal rates apply. That's IT. What this means is that running 100 Oracle servers for one hour is still 100 times more expensive (for licensing costs) than running one Oracle server for 100 hours.

That's not cloud licensing. Cloud computing works on the premise that whether you use one, 10, or 100 CPUs, you pay per CPU-hour, no more, no less. That's what DevPay does. The only problem is that in this scenario, software is a commodity, which I imagine doesn't sit too well with Oracle.
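To make the arithmetic concrete, here is a rough sketch comparing the two pricing models for the same 100 CPU-hours of work. The prices are made up purely for illustration (they are not Oracle's or Amazon's actual rates), and it assumes one licensable core per server to keep things simple.

```python
# Hypothetical prices, chosen only to make the arithmetic visible.
LICENSE_PER_SERVER = 1000.0  # assumed flat licence fee per licensed server
RATE_PER_CPU_HOUR = 0.50     # assumed metered price per CPU-hour


def flat_license_cost(servers, hours):
    """Traditional model: pay per licensed server, however long it runs."""
    return servers * LICENSE_PER_SERVER


def metered_cost(servers, hours):
    """Cloud model: pay for the CPU-hours actually consumed."""
    return servers * hours * RATE_PER_CPU_HOUR


workloads = {
    "1 server x 100 hours": (1, 100),
    "100 servers x 1 hour": (100, 1),
}

for name, (servers, hours) in workloads.items():
    flat = flat_license_cost(servers, hours)
    metered = metered_cost(servers, hours)
    print(f"{name}: flat licence = {flat:.2f}, metered = {metered:.2f}")
```

Both workloads burn the same 100 CPU-hours, so the metered bill is identical, while the flat licence bill differs by a factor of 100.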

Virtualisation has been around for decades, but only once FLOSS commoditized the server operating system 'ecosystem' did it become possible to do things on the scale that Amazon are doing. Back in 2005 I had a Linux VM with the inestimable Bytemark, and it was plain for all with eyes to see that virtualisation was going to pull the floor out from under the server hosting market once players had found the right way to leverage the economies of scale. Right now, that's being done behind closed doors by the big players, but Amazon are the first to have thrown open the doors to the unwashed masses, and that's why I like them so much.

(To repeat, I don't own stock - maybe I should :-)

To get back to the point: how do databases fare on Amazon EC2? EBS has only been around for a couple of weeks; before that, DB hosting on EC2 meant entrusting everything to a block device that could go *poof* at any moment, which wasn't exactly pleasant.

This is something which will remain up in the air until someone with serious [Postgre/My]SQL-fu takes some AMIs, configures them just so, and benchmarks them. We know, right now, that on a small instance, disk throughput tops out at roughly 100 MB/s on a three-volume RAID 0 setup. I'm interested in seeing the speeds for EC2 and EBS on larger instances.
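For anyone wanting a quick sanity check before a proper benchmark, a crude sequential-write test can be sketched as below. It is no substitute for a real benchmarking tool: the function name and sizes are my own invention, it only measures buffered sequential writes followed by an fsync, and zero-filled data may flatter storage layers that optimise for it.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path, total_mb=64, chunk_mb=1):
    """Write roughly total_mb MiB to path in chunks and return MiB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has reached the device
    elapsed = time.perf_counter() - start
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Uses a temporary directory; point this at a directory on the
    # volume under test to measure that volume instead.
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "throughput.bin")
        rate = sequential_write_mb_per_s(target)
        print(f"sequential write: {rate:.1f} MiB/s")
```

Running it against a directory on an EBS volume and again on the instance's local storage would give a first, very rough comparison.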

Moving on from pure throughput, how do PostgreSQL & MySQL stack up on these setups? Do their respective caching mechanisms etc. work with or against this strange new environment? Enquiring minds want to know!