Introduction to DynamoDB | ScyllaDB

DynamoDB Design Principles

As stated in Werner Vogel’s initial blog about DynamoDB, the database was designed to build on top of a “core set of strong distributed systems principles resulting in an ultra-scalable and highly reliable database system.” It needed to provide these attributes:

  • Managed — provided ‘as-a-Service’ so users would not need to maintain the database
  • Scalable — automatically provision hardware on the backend, invisible to the user
  • Fast — support predictive levels of provisioned throughput at relatively low latencies
  • Durable and highly available — multiple availability zones for failures/disaster recovery
  • Flexible — make it easy for users to get started and continuously evolve their database
  • Low cost — be affordable for users as they start and as they grow

The database needed to “provide fast performance at any scale,” allowing developers to “start small with just the capacity they need and then increase the request capacity of a given table as their app grows in popularity.” Predictable performance was ensured by provisioning the database with guarantees of throughput, measured in “capacity units” of reads and writes. “Fast” was defined as single-digit milliseconds, based on data stored in Solid State Drives (SSDs).

It was also designed based on lessons learned from the original Dynamo, SimpleDB and other Amazon offerings, “to reduce the operational complexity of running large database systems.” Developers wanted a database that “freed them from the burden of managing databases and allowed them to focus on their applications.” Based on Amazon’s experience with SimpleDB, they knew that developers wanted a database that “just works.” By its design users no longer were responsible for provisioning hardware, installing operating systems, applications, containers, or any of the typical devops tasks for on-premises deployments. On the back end, DynamoDB “automatically spreads the data and traffic for a table over a sufficient number of servers to meet the request capacity specified by the customer.” Furthermore, DynamoDB would replicate data across multiple AWS Availability Zones to provide high availability and durability.

As a commercial managed service, Amazon also wanted to provide a system that would make it transparent for customers to predict their operational costs.

DynamoDB Data Storage Format

To manage data, DynamoDB uses hashing and b-trees. While DynamoDB supports JSON, it only uses it as a transport format; JSON is not used as a storage format. Much of the exact implementation of DynamoDB’s data storage format remains proprietary. Generally a user would not need to know about it. Data in DynamoDB is generally exported through streaming technologies or bulk downloaded into CSV files through ETL-type tools like AWS Glue. As a managed service, the exact nature of data on disk, much like the rest of the system specification (hardware, operating system, etc.), remains hidden from end users of DynamoDB.

DynamoDB Query Operations

Unlike Structured Query Language (SQL), DynamoDB queries use a Javascript Object Notation (JSON) format to structure its queries. For example, as seen on this page), if you had the following SQL query:

SELECT * FROM Orders
INNER JOIN Order_Items ON Orders.Order_ID = Order_Items.Order_ID
INNER JOIN Products ON Products.Product_ID = Order_Items.Product_ID
INNER JOIN Inventories ON Products.Product_ID = Inventories.Product_ID
ORDER BY Quantity_on_Hand DESC

In DynamoDB you would redesign the data model so that everything was in a single table, denormalized, and without JOINs. This leads to the lack of deduplication you get with a normalized database, but it also makes a far simpler query. For example, the equivalent of the above in a DynamoDB query could be rendered as simply as:

SELECT * FROM Table_X WHERE Attribute_Y = "somevalue"

It is important to remember that DynamoDB does not use SQL at all. Nor does it use the NoSQL equivalent made popular by Apache Cassandra, Cassandra Query Language (CQL). Instead, it uses JSON to encapsulate queries.

{
    TableName: "Table_X"
    KeyConditionExpression: "LastName = :a",
    ExpressionAttributeValues: {
        :a = "smith"
    }
}

However, just because DynamoDB keeps data in a single denormalized table doesn’t mean that it is simplistic. Just as in SQL, DynamoDB queries can get very complex. This example provided by Amazon shows just how complex a JSON request can get, including data types, conditions, expressions, filters, consistency levels and comparison operators.