Friday, January 13, 2012

The Monkey in the Middle

Choosing a database solution for any application is an extremely difficult but critical decision. You are deciding on the backbone of your application, what's going to be used to capture and store your application's precious data. When surveying all of the options out there, you need to evaluate their strengths and weaknesses, and whether they play into or against the needs of your application. You need to look at your application and attempt to envision the end product. How many users will be using it? Will these users be accessing your data concurrently? Will you have a big fluctuation in the number of concurrent users, say 50,000 users between the hours of 5-8pm, but only 1,000 users between 6am and 11am? How will you handle a large jump in overall users due to the application's increased popularity (hopefully!)? In terms of hard disk space, how much data do you expect to store? At what rate will it grow? How complex do you expect your schema to be? Who do you have that can be your database administrator, and what kinds of technologies can they manage and administer given their current skill set? What are the costs of operating and maintaining the server that will house the database? These are just a sample of the questions that need to be answered.

Relational databases easily dominate the market. They are the popular choice, and for good reason. They have proven to be viable solutions for a vast range of applications, from online retail stores to complex AAA MMORPGs. They have great transaction support, providing full ACID guarantees. They can handle a high number of users concurrently accessing the same data. They support highly complex SQL queries and data manipulation, allowing developers, designers and business executives to query their data and build much-needed metrics and analysis, answering questions such as "of the items that were available during our big annual 4th of July sale, from 2003 to the present, which was the third best seller among our New York City customer base?" Businesses use these reports to further drive their sales tactics, or to identify inefficiencies and improve overall sales.
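
To make that kind of reporting concrete, here's a rough sketch of the sort of query that could answer it. The schema, table and column names below are completely made up for illustration; the point is simply that SQL lets you express the whole report in one statement:

    import sqlite3  # stand-in for whatever RDBMS actually backs the store

    conn = sqlite3.connect("store.db")  # hypothetical reporting database

    # Third best-selling item among New York City customers during the
    # annual 4th of July sale, from 2003 to the present.
    query = """
    SELECT i.item_name, SUM(o.quantity) AS units_sold
    FROM order_lines o
    JOIN items i     ON i.item_id = o.item_id
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.city = 'New York City'
      AND o.sale_name = '4th of July'
      AND o.order_year >= 2003
    GROUP BY i.item_name
    ORDER BY units_sold DESC
    LIMIT 1 OFFSET 2;  -- skip the top two sellers, return the third
    """

    for row in conn.execute(query):
        print(row)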

However, relational databases are not perfect. They have their weaknesses, specifically in the realms of performance, scalability and object-relational impedance mismatch. Yet most applications simply choose to work around these imperfections, due to a simple case of "the benefits of relational databases greatly outweigh their negative impact on my application". In the case of massively multiplayer online games (MMOGs), performance, scalability, completely avoiding object-relational impedance mismatch, high-quality transaction support and complex SQL reporting are all needed, practically equally so. It's been said that regardless of which database solution you choose, you are allowed to pick only two items from the above list, and apparently most developers put highly complex SQL reporting at number one, leaving the second choice up in the air. Historically, however, transaction support tends to be a very close second in terms of desired features.

Cryptic Studios, in their quest for the perfect database solution, have gone as far as designing and developing their own database management system, CrypticDB. The CCP team behind EVE Online actually uses a single-shard SQL Server architecture, and to combat performance issues, has thrown large sums of money into a mind-bogglingly beefy server hardware configuration, a.k.a. scaling vertically (more on that later). While throwing money at the problem may be a viable solution for some, for many (start-ups, for example) that's not an option, or at least not the best one.

From an engineer's perspective, relational databases introduce the problem of object-relational impedance mismatch, making it difficult to translate the data used in an object-oriented application into the relational database's strict structural confines and incompatible datatypes. In order to store complex data structures, we are forced to develop wrapper classes to do the translation, which, if not done perfectly, will have a severe impact on performance, possibly so severe that the end user is affected, like being forced to wait two seconds to loot a green item from a dead zombie. The issue of object-relational impedance mismatch is an extensive one, beyond the scope of this particular post.
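
As a rough illustration of the kind of wrapper code I mean (the classes and tables here are invented, not actual WordWars code), even a simple game object has to be flattened into rows on save and stitched back together on load:

    class Item(object):
        def __init__(self, name, rarity):
            self.name = name
            self.rarity = rarity

    class Player(object):
        def __init__(self, player_id, name, inventory):
            self.player_id = player_id
            self.name = name
            self.inventory = inventory  # list of Item objects

    def save_player(cursor, player):
        # One object graph becomes rows in two tables.
        cursor.execute("INSERT INTO players (id, name) VALUES (?, ?)",
                       (player.player_id, player.name))
        for item in player.inventory:
            cursor.execute(
                "INSERT INTO inventory (player_id, item_name, rarity) "
                "VALUES (?, ?, ?)",
                (player.player_id, item.name, item.rarity))

    def load_player(cursor, player_id):
        # Reassembling the object takes extra queries (or a JOIN) per load.
        cursor.execute("SELECT name FROM players WHERE id = ?", (player_id,))
        name = cursor.fetchone()[0]
        cursor.execute("SELECT item_name, rarity FROM inventory "
                       "WHERE player_id = ?", (player_id,))
        items = [Item(n, r) for (n, r) in cursor.fetchall()]
        return Player(player_id, name, items)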

Luckily, we aren't stuck with just relational database management systems to choose from. Leaving the realm of relational databases introduces the broad range of NoSQL databases. NoSQL databases are object-oriented in nature and, in general, omit the SQL query language (thus the name, NoSQL). There are many different varieties of NoSQL databases, categorized by their data storage methods. NoSQL databases were developed primarily to solve the issue of scalability, scaling horizontally rather than vertically (CCP, if you remember, decided to scale their database vertically by beefing up their hardware configuration). Scaling horizontally means adding more nodes to a system, creating a distributed network. In a distributed environment, data integrity becomes an immediate concern, so the ultimatum of choosing "consistency vs. availability" comes into play.

NoSQL was also introduced to deal with the issue of object-relational impedance mismatch, being able to handle and store a much wider range of datatypes and structures than an RDBMS can. As an extension, NoSQL also handles large volumes of data and retrieves it quickly, virtually eliminating the need for expensive JOIN operations (which most NoSQL solutions decline to even offer). In essence, NoSQL offers either a high level of data integrity or a high level of availability, plus better compatibility and support for complex, object-oriented data.
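
A quick sketch of that last point (the data here is made up): rather than normalizing a player's inventory into its own table and JOINing it back on every read, a document-style store keeps the nested structure together, so a single read returns the whole object:

    # Denormalized, document-style record: the nested inventory lives inside
    # the player record, so no JOIN is needed to read it back.
    player_doc = {
        "player_id": 42,
        "name": "zombieSlayer",
        "inventory": [
            {"name": "green sword", "rarity": "uncommon"},
            {"name": "health potion", "rarity": "common"},
        ],
    }
    # A typical document store writes and reads this structure as one unit,
    # along the lines of store.put(player_doc) / store.get(42).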

When talking NoSQL, we can't neglect to mention the CAP theorem, first proposed by Eric Brewer in 2000. The CAP theorem states that, in a distributed computing environment, you can optimize for only two of three priorities: Consistency, Availability and Partition tolerance.

[CAP theorem diagram: Consistency, Availability and Partition tolerance, pick any two]

When considering walking down the NoSQL path, the needs of your application will need to dictate the two priorities you decide to go with in your NoSQL, distributed database solution. Choosing the correct combination is absolutely critical, and choosing the wrong combination could very easily have a dire impact on your business. For example, Amazon claims that just an extra one tenth of a second on their response times would cost them 1% in sales. This should really drive home the importance of correctly choosing a database that fits your application's needs. No pressure ;)

There's simply no "best" database solution for any application. It's a constant battle of tipping the scales. It's almost guaranteed that regardless of which solution you go with, it'll have a negative impact somewhere. The trick is to limit that negative impact, or at least direct it towards an aspect that can "afford" to be hit by it, where perhaps the developers can make up for the shortcoming within the application's design and implementation.

For WordWars, we knew from the beginning we wanted a database solution that was easily scalable, able to handle 30 users one day and scale up to 10,000 users the next (we could only be so lucky to see a jump like that!), with virtually no manual intervention required. We also want to maintain the same high level of performance as our user base grows and fluctuates, while keeping in mind the fact that we have little to no money to scale vertically. We also need to rely heavily on transactions, due to the nature of gameplay and the high probability of having many concurrent users attempting to access the same data. We need consistency, so that every user viewing a single board sees the same thing, and we need a certain level of data integrity. For example, we simply need to be able to gracefully handle the use case of two users attempting to play a word on the same board location at the same time.

Google's Big Table database solution attempts to play the middle ground. It is NoSQL based, offering the high level of speed and scalability that relational databases tend to fail to deliver, but it also provides transactions, a feature that comes standard in relational databases but is often missing from NoSQL databases. Big Table provides full transaction support using optimistic concurrency. As you can see from the CAP theorem diagram above, Big Table provides Consistency and Partition tolerance, which satisfies the needs of WordWars nicely. Yes, we are giving up complex SQL reporting. Big Table does offer a similar query language called GQL, though at this time it's still very simple and in its infancy, but that's more than enough to satisfy what we predict to be our reporting needs.
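
To give a taste, here's roughly what GQL looks like through the App Engine Python SDK. The GameBoard model and its properties are hypothetical, not our actual schema:

    from google.appengine.ext import db

    class GameBoard(db.Model):
        # Hypothetical model, used only to illustrate GQL.
        owner = db.StringProperty()
        created = db.DateTimeProperty(auto_now_add=True)

    # GQL reads like a stripped-down SELECT: no JOINs, no aggregates.
    boards = db.GqlQuery(
        "SELECT * FROM GameBoard WHERE owner = :1 ORDER BY created DESC",
        "some_user_id")

    for board in boards.fetch(10):
        print(board.key())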

We are also sacrificing Availability, which may appear to be a huge sacrifice given the type of application we are developing. Failed transactions will happen when using Google's datastore, for various reasons (timeouts, concurrent modification issues, etc.). It's inevitable; it's the nature of the beast. However, we are developing our application to handle these failures gracefully and greatly limit the impact of any transaction failure. It's a sacrifice we can make up for in our application design. If we instead chose Availability over Partition tolerance, we would be losing the latter property completely, with virtually no way of recovering it or making up for it in some other way, at least not easily or efficiently.
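
As a sketch of what "handling failures gracefully" can look like (assuming the google.appengine.ext.db API; play_word and apply_move are hypothetical stand-ins for our real game logic), we can retry a bounded number of times before surfacing an error to the player:

    from google.appengine.ext import db

    def play_word(board_key, move):
        # Transaction body: re-read the board inside the transaction,
        # validate the move, then write the updated entity.
        board = db.get(board_key)
        board.apply_move(move)  # raises if the square was already taken
        board.put()

    def play_word_with_retries(board_key, move, attempts=3):
        for _ in range(attempts):
            try:
                db.run_in_transaction(play_word, board_key, move)
                return True
            except db.TransactionFailedError:
                # Optimistic concurrency lost the race; try again.
                continue
        # Give up and let the UI ask the player to retry.
        return False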

Through Google's App Engine service, we could start using Big Table immediately, having a full NoSQL database at our disposal within minutes with virtually zero effort on our part, leaving the administration duties to Google. Most importantly, we get everything completely free. We will only start getting charged once we have regular users, and even then only once we surpass the free quota thresholds. Since we are in the very early stages of development and prototyping, this aspect alone is enough to drive us toward Google App Engine.

I'm not going to go into great detail about the inner workings of Big Table. Instead, in Part II of this multi-part series, I'm going to talk about how we designed and developed our application's back-end to work with Google App Engine, the trials and tribulations of doing so, what we are doing to prevent Google App Engine quotas from sending us into bankruptcy, as well as how we are handling inevitable transaction failures. Most importantly, I will discuss how we are keeping our application as platform-independent as possible, allowing us to migrate off of Google App Engine with relative ease if that time ever arises.
