When talking to people about BigDoor, we often get asked about our backend database systems. Potential clients want to know whether or not our system can handle their traffic (yes, it can) and other start-ups want to know how we scaled to handle the high traffic levels of our awesome clients. We thought it would be useful to give a little bit more information about our system as well as share some of the things we learned in the process. There is no better expert in this topic than our CTO/Co-founder Jeff Malek. He took a break from working his daily magic to give some perspective and advice. 

A friend was recently asking about our backend database systems.  BigDoor’s  systems are able to successfully handle high-volume transactional traffic through our API coming from various customers, having vastly different spiking patterns,  including traffic from a site that’s in the top-100 list for highest traffic on the net.   Don’t get me wrong; I don’t want to sound overly impressed with what our little team has been able to accomplish, we’re not perfect by any means and we’re not talking about Google or FaceBook traffic levels.  But serving requests to over one million unique users in an hour, and doing 50K database queries per second isn’t trivial, either.

I responded to my friend along the following lines:

  1. If you’re going with an RDBMS, MySQL is the right, best choice in my opinion.  It’s worked very well for us over the years.
  2. Since you’re going the standard SQL route:
    1. If your database is expected to grow in step with traffic, and you’re thinking about sharding early – kudos.  You’re likely going to have to do it, sooner or later.
      1. Sooner vs. later if you’re in the cloud and running under its performance constraints.
      2. Do it out of the gate, if you have time, after you’ve figured out how you’re going to do it (i.e. whether you’re going to leverage a tool, DYI, etc).
        1. In other words, if you have time, don’t “see how long you can survive, scaling vertically”.
          1. Sharding while running the race : not a fun transition to make.
      3. I realize what I’m saying is counter to popular thinking, which is “don’t shard unless you absolutely have to”.
        1.  Without the assumption that your data set is going to grow in step with your traffic, I’d be saying the same thing.
    2. Designing your schema and app layer for sharding, sharded on as few keys as possible, ideally just one, is not future-proofing, it’s a critical P0.
  3. Since you’re going to be sharding MySQL, your options are relatively limited last I checked.
    1. Ask for input from folks who have done it before.
    2. The other sharding options I started looking at over two years ago all had disallowing limitations, given our business model.
    3. At quick search-glance just now, it also does appear that dbshards is ruling this space at this point.
  4. So barring any other options I’m missing, your best options that I’m aware of:
    1. dbshards
      1. Definitions we/they use, to help clarify discussion  :
        1. global tables : tables that contain the same data on every shard, consistency managed by dbshards.
        2. shard : two (primary and secondary) or more hosts that house all global table data, plus any shard-specific data.
        3. shard tree : conceptually, the distribution of sharded data amongst nodes, based on one or more shard keys.
        4. reliable replication : dbshards-proprietary replication, more details on this below.
      2. pros
        1. The obvious : you’ll be able to do shard-count more reads and writes that you’d otherwise be able to do with a monolithic, non-sharded backend (approximately).
          1. Alternatively, with a single-primary read-write or write-only node, and multi-secondary read-only nodes you could scale reads to some degree.
            1. But be prepared to manage the complexities that come along with eventual read-consistency, including replication-lag instrumentation and discovery, beyond any user notifications around data not being up-to-date (if needed).
        2. It was built by folks who have only been thinking about sharding and its complexities, for many years
          1. who have plans on their roadmap to fill any gaps with their current product
            1. gaps that will start to appear quickly, to anyone trying to build their own sharding solution.
              1. In other words, do-it-yourself-ers will at some point be losing a race with CodeFutures to close the same gaps, while already trying to win the race against their market competitors.
        3. It’s in Java, vs. some other non-performant or obscure (syntactically or otherwise) language.
        4. It allows for multiple shard trees; if you want (or have to) trade in other benefits for sharding on more than one key, you can.
          1. Benefits of just sharding on one key include, amongst other things, knowing that if you have 16 shards, and one is unavailable, and the rest of the cluster is available, 1/16th of your data is unavailable.
            1. With more than one shard tree, good luck doing that kind of math.
        5. It provides a solution for the auto-increment or “I need unique key IDs” problem.
        6. It provides a solution for the “I need connection pooling that’s balanced to shard and node count” problem.
        7. It provides a solution for the “I want an algorithm for balancing shard reads and writes”.
          1. Additionally, “I want the shard key to be based on a column I’m populating with the concatenated result of two other string keys”.
        8. It has a distributed-agent architecture, vs. being deeply embedded (e.g. there are free-standing data streaming agents, replication agents, etc instead of MySQL plugins, code modules, etc ).
          1. Provides future-proofing, scaleability and plug-ability.
          2. Easier to manage than other design approaches.
        9. Streaming agents allow you to plug into the update/insert stream, and do what you like with changes to data.
          1. We use this to stream data into Redis, amongst other things.  Redis has worked out very well for us thus far, by the way.
          2. Other dbshards customers use this to replicate to other DBMS engines, managed by dbShards or not, such as a column store like MonetDb, InfoBright, even a single standalone MySQL server if it can handle the load.
        10. It supports consistent writes to global tables; when a write is done to a global table, its guaranteed to have been done on all global tables.
        11. It doesn’t rely on MySQL’s replication and its shortcomings, but rather on its own robust, low-maintenance and flexible replication model.
        12. Its command-line console provides a lot of functionality you’d rather not have to build.
          1. Allows you to run queries against the shard cluster, like you were at the MySQL command line.
          2. Soon they’re releasing a new plug-compatible version of the open source MyOSP driver, so we’ll be able to use the same mysql command line to access both dbShards and non-dbShards managed MySQL databases.
        13. Its web console provides a lot of functionality you’d rather not have to build.
          1. Agent management and reporting, including replication statistics.
          2. Displays warning, error, diagnostic information, and graphs query counts with types.
          3. Done via the “dbsmanage” host, which provides centralized shard node management as well.
        14. It’s designed with HA in mind.
          1. Each shard is two (or optionally more, I think) nodes.  We put all primary nodes in one AWS availability zone, secondaries in a different one, for protection against zone outages.
          2. Write consistency to two nodes; in other words DB transactions only complete after disk writes have completed on both nodes.  Secondary writes only require file-system (vs. MySQL) disk writes.
          3. Managed backups with configurable intervals; MySQL EC2/EBS backups aren’t trivial.
          4. Web-console based fail-over from primary to secondary; this is very helpful, particularly for maintenance purposes.
        15. Proven to work well in production, by us and others.
          1. We’ve performed 100K queries per second in load-testing, on AWS/EC2, using m1.xlarge instances.
        16. Designed with the cloud and AWS in mind, which was a great fit for us since we’re 100% in AWS.
        17. “dbsmanage” host
        18. Drivers included, of course.
          1. In addition to MyOSP, they have JDBC, PQOSP (native Postgres), ADO OSP (for .NET), and soon ODBC.
        19. Go-fish queries allow you to write standard SQL against sharded data
          1. e.g. sharded on user.id : SELECT * FROM user where FirstName=’Foo’;
            1. will return all results from all shards performing automatic aggregation
              1. sorting using a streaming map-reduce method
        20. Relatively easy to implement and go live with; took us about six weeks of hard work, deadline-looming.
        21. It’s the market-leading product, from what I can tell.
          1. 5 of the Top 50 Facebook apps in the world run dbShards.
        22. It supports sharding RDBMSs besides MySQL, including Postgres, DB2, SQL Server, MonetDb, others coming.
        23. Team : top-notch, jump-through-their-butts-for-you, good guys. 
        24. Ability to stream data to a highly performant BI backend.
      3. cons
        1. As you can see, some of these are on the pro list too, double-edged swords.
        2. Cost – it’s not free obviously, nor is it open source.
          1. Weigh the cost against market opportunity, and/or the additional headcount required to take a different approach.
        3. It’s in Java, vs. Python (snark).
        4. Doesn’t rely on MySQL replication, which has its annoyances but has been under development for a long time.
          1. Nor is there enough instrumentation around lag.  What’s needed is a programmatic way to find this out.
        5. Allows for multiple shard trees.
          1. I’m told many businesses need this as a P0, and that might be true, even for us.
          2. But I’d personally prefer to jump through fire in order to have a single shard tree, if at all possible.
            1. The complexities of multiple shard trees, particularly when it comes to HA, are too expensive to justify unless absolutely necessary, in my humble opinion.
        6. Better monitoring instrumentation is needed, ideally we’d have a programmatic way to determine various states and metrics.
        7. Command line console needs improvement, not all standard SQL is supported.
          1. That said, we’ve managed to get by with it, only occasionally using it for diagnostics.
        8. Can’t do SQL JOINs from between shard trees.  I’ve heard this is coming in a future release.
          1. This can be a real PITA, but it’s a relatively complex feature.
          2. Another reason not to have multiple shard trees, if you can avoid them.
        9. Go-fish queries are very expensive, and can slow performance to a halt, across the board.
          1. We’re currently testing a hot-fix that makes this much less severe.
          2. But slow queries can take down MySQL (e.g. thread starvation), sharding or no.
        10. HA limitations, gaps that are on their near-term roadmap, I think to be released this year:
          1. No support for eventually-consistent writes to global tables means all primaries must be available for global writes.
            1. Async, eventually consistent writes should be available as a feature in their next build, by early October.
          2. Fail-over to secondaries or back to primaries can only happen if both nodes are responding.
            1. in other words, you can’t say via the console:
              1. ‘ignore the unresponsive primary, go ahead and use the secondary’
            2. or:
              1. ‘stand me up a new EC2 instance for a secondary, in this zone/region, sync it with the existing primary, and go back into production with it’
          3. Reliable replication currently requires two nodes to be available.
            1. In other words, if a single host goes down, writes for its shard are disallowed.
              1. In the latest versions, there’s a configuration “switch” that allows for failing-down to primary
                1. But not fail down to secondary.  This is expected in an early Q4 2012 version release.
          4. dbsmanage host must be available.
            1. dbshards can run without it or a bit, but stats/alerts will be unavailable for that period.
          5. Shard 1 must be available for new auto-increment batch requests.
          6. go-fish queries depend on all primaries (or maybe all secondaries via configuration, but not some mix of the two as far as I’m aware) to be available
    2. DYI
      1. I can rattle off the names of a number of companies who have done this, and it took many months longer than our deployment of dbshards (about six weeks, largely due to the schema being largely ready for it).
      2. Given a lot of time to do it, appeals to me even now, but I still wouldn’t go this route, given the pros/cons above.
    3. The latest release of MySQL Cluster may be an option for you, it wasn’t for us back with MySQL 5.0, and not likely now, due to its limitations (e.g. no InnoDB).
    4. AWS RDS was an option for us from the onset, and I chose to manage our own instances running MySQL, before deciding how we’d shard.
      1. For the following reasons:
        1. <li>I wanted ownership/control around the replication stream, which RDS doesn’t allow for (last I looked) for things like:
          1. BI/reporting tools that don’t require queries to be run against secondary hosts.
            1. This hasn’t panned out as planned, but could still be implemented, and I’m happy we have this option, hope to get to it sometime soon.
          2. Asynchronous post-transaction data processing.
            1. This has worked out very well, particularly with dbshards, which allows you to build streaming plugins and do whatever you want when data changes, with that data.
              1. Event-driven model.
              2. Better for us than doing it at the app layer, which would increase latencies to our API.
        2. Concern that the critical foundational knobs and levers would be out of our reach.
          1. Can’t say for sure, but this has likely been a good choice for our particular use-case; without question we’ve been able to see and pull levers that we otherwise wouldn’t have been able to, in some cases saving our bacon.
        3. Their uptime SLAs, which hinted at unacceptable downtime for our use-case.
          1. Perhaps the biggest win on the decision not to use RDS; they’ve had a lot of down-time with this service.
        4. Ability to run tools, like mk-archiver (which we use extensively for data store size management), on a regular basis without a hitch.  Not 100% sure, but I don’t think you can do this with RDS.
        5. CloudWatch metrics/graphing is a very bad experience, and want/need better operational insights to what it provides.  Very glad we don’t depend on CW for this.
      2. All of these reasons have come at considerable cost to us as well, of course.
        1. Besides the obvious host management cycles, we have to manage :
          1. MySQL configurations, that have to map to instance sizes.
          2. Optimization and tuning of the configurations, poor-performance root-cause analysis,
          3. MySQL patches/upgrades.
          4. maybe more of the backup process than we’d like to.
          5. maybe more HA requirements than we’d like to; although I’m glad we have more control over this, per my earlier comment regarding downtime.
          6. maybe more of the storage capacity management than we’d like to.
        2. DBA headcount costs.
          1. We’ve gone through two very expensive and hard-to-find folks on this front, plus costly and often not-helpful, cycle-costing out-sourced DBA expertise.
          2. Currently getting by with a couple of experienced engineers in-house and support from CodeFutures as-needed.
      3. As I’ve seen numerous times in the past, AWS ends up building in features that fill gaps that we’ve either developed solutions for, or worked around.
        1. So if some of the RDS limitations can be worked-around, there’s a good chance that the gaps will be filled by AWS in the future.
        2. But it’s doubtful they’ll support sharding any time soon, there’s too much design and application-layer inter-dependencies involved.  Maybe I’m wrong, that’s just my humble opinion.