Hey! Another blog post about something AWS has that is super trendy! It’s basically a must-read! What a unique topic!
Ya, I know. What a typical programmer blog I have made by my third blog post. 🤷‍♂️ No point in hiding the fact that I am a vanilla programmer dude, I guess. Maybe this will be the differentiator: DynamoDB was f!&$ing complicated to wrap my head around. So I am writing a blog post for the next person with whatever malfunction I have, to accelerate their learning experience. It’s a public service. You’re welcome, other people who have left their car at school thinking they took the bus that day.
I’m totally a respectable leader and have never written a bug in my life…
Excellent, then this blog post is for you. Let’s talk turkey (DynamoDB). What the heck is it?
Well, it is kind of a document store, kind of a key-value store, a kind of magic box that can scale infinitely. Basically, what it really is, is many things.
The best part about that statement is I am technically not wrong, the best kind of not wrong. Seriously though, DynamoDB (DDB) can be used for something as simple as a place to throw some keys or a config, or for something as complex as your entire data store. It is pretty flexible and efficient across many different use cases.
In my current project, we are investigating using DDB as our exclusive datastore. This means no RDBMS at all. Doing this effectively with DDB in a single table definitely takes some foresight and time to understand.
Now for the feature presentation
As I mentioned above, it took me a minute to understand how best to use DDB. Probably the most important thing to understand is the primary key.
The Partition Key and Sort Key (ooooooh)
In DDB, the primary key is made up of a partition key plus an optional sort key. The partition key (pk going forward, I’m paying per letter here) is more than just the index. In DDB, the pk determines the location of the actual data in your system. DDB will hash the pk and use the hash value to determine which “bucket” (partition) stores the data.
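To make the “bucket” idea a bit more concrete, here is a toy sketch of hashing a pk to pick a bucket. To be clear, this is not DynamoDB’s actual hash function (that is internal to the service), and both the bucket count and the example GUID are made up for illustration.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical; DynamoDB decides its real partition count itself


def bucket_for(pk: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a pk to a bucket index using a stable hash (illustration only)."""
    digest = hashlib.sha256(pk.encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and wrap it into the bucket range.
    return int.from_bytes(digest[:8], "big") % num_buckets


# The same pk always hashes to the same bucket, so items sharing a pk live together.
org_guid = "org-1234"  # made-up value
same_bucket = bucket_for(org_guid) == bucket_for(org_guid)
```

The only point here is that the location is a function of the pk alone: anything stored under the same pk ends up in the same place.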
Why does this matter? Well, if you want to get sets of data together, it makes sense to keep them all in the same bucket. The tricky part is that the full primary key must be unique, so if the pk is all you have, you can’t use the same pk for different objects or groups of data. How can you give a bunch of items the same pk, so they all land in the same bucket and speed up your request, and still tell them apart?
Enter: The Sort Key
When a table has a sort key (sk), only the combination of pk and sk has to be unique, so many items can share one pk. Let’s go through an example. In this example, we will have organizations with locations and products, where each organization has many locations and a few products 🤷‍♂️. Let’s also say that the typical access patterns for this data are an organization’s details, an organization’s locations, an organization’s products, and everything together.
To grab these in groups that are efficient to query in one shot, we can create these objects with the following pk’s and sk’s:
Type | Partition Key (pk) | Sort Key (sk) |
---|---|---|
Organization Details | organization_guid | profile#{unix_timestamp} |
Product | organization_guid | product#{product_id}#{unix_timestamp} |
Location | organization_guid | location#{location_id}#{unix_timestamp} |
This data structure keeps each type of object grouped together and queryable by organization, and it also makes it possible to get everything for an organization in one query. You could instead give products their own pk, something along the lines of organization#product, with an sk of product_id#{unix_timestamp}. Products would still be grouped and queryable per organization, but getting all the data for the “everything” page would require two or more requests.
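Here is a rough sketch of the key layout from the table, using a plain Python list to stand in for the DDB table. The helper names, the attribute names `pk` and `sk`, and all of the IDs and timestamps are my own inventions for illustration; the `query` function only mimics the “same pk, sk starts with a prefix” behavior and is not a real DynamoDB call.

```python
def profile_item(org_guid: str, ts: int) -> dict:
    return {"pk": org_guid, "sk": f"profile#{ts}"}


def product_item(org_guid: str, product_id: str, ts: int) -> dict:
    return {"pk": org_guid, "sk": f"product#{product_id}#{ts}"}


def location_item(org_guid: str, location_id: str, ts: int) -> dict:
    return {"pk": org_guid, "sk": f"location#{location_id}#{ts}"}


def query(items: list, pk: str, sk_prefix: str = "") -> list:
    """Return items with this pk whose sk starts with sk_prefix, ordered by sk."""
    matches = [i for i in items if i["pk"] == pk and i["sk"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["sk"])


# A pretend table holding two organizations (all values made up).
table = [
    profile_item("org-1", 1700000000),
    product_item("org-1", "p1", 1700000001),
    location_item("org-1", "l1", 1700000002),
    product_item("org-2", "p9", 1700000003),
]

everything = query(table, "org-1")            # the "everything" page
products = query(table, "org-1", "product#")  # just the products
locations = query(table, "org-1", "location#")
```

One pk covers every access pattern in the example: no prefix gets the whole organization, and a prefix narrows it down to one type of item.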
Access Patterns (aaahhhhh)
In the previous sections, I briefly mentioned the access patterns for our data. They play a significant role in how you store your data. To get maximum value out of DynamoDB, you want to get everything you need in one request by using a pk and, where needed, narrowing on the sk. So, when designing your table structure, you need to know how the data will be read. Since our contrived example had only a few simple access patterns, we used the organization_guid as the pk to make getting data simple. Using the organization_guid as the pk on every item is what makes the “everything” page work in a single query, and it keeps that query performant since all of the data “lives” together in the same “bucket”.
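Putting the product ID in the sk allows us to look up a specific product with a key condition on the sk (for example, sk begins with product#{product_id}#), which is still very performant (and reduces cost on the read). Items that share a pk are stored sorted by sk, so you can imagine DDB jumping to the organization’s bucket and reading only the slice of sks that match, rather than reading everything and throwing most of it away. Don’t confuse this with a filter expression, which is applied after the items have already been read and does not reduce the read cost.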
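The single-product lookup described above could be expressed as a Query request roughly like this. This is a sketch that only builds the request parameters, so it runs without AWS credentials. The table name and the attribute names `pk` and `sk` are assumptions on my part; the parameter names follow the DynamoDB Query API.

```python
def build_product_query(table_name: str, org_guid: str, product_id: str) -> dict:
    """Build Query parameters for one product under one organization."""
    return {
        "TableName": table_name,
        # Key condition: exact match on the pk, prefix match on the sk.
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": org_guid},
            # Trailing '#' so product "p1" does not also match "p10".
            ":prefix": {"S": f"product#{product_id}#"},
        },
    }


# Hypothetical table name and IDs, purely for illustration.
params = build_product_query("org_table", "org-1", "p1")
```

If you have boto3 installed and credentials configured, parameters in this shape should be usable as `boto3.client("dynamodb").query(**params)`. Dropping the product ID from the prefix (just `product#`) would return all of the organization’s products instead.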
I know, I know. That was a lot
I am by no means an expert, and I reserve the right to have someone call me a moron in the comments, but so far, these have been my takeaways from using DynamoDB. I still have to really dig into things like Global Secondary Indexes and Local Secondary Indexes, which seem super exciting to me, but which I have not used enough to speak on. I feel I will be coming back to add to this or make corrections, plus definitely some additional blog posts when I learn more about running it in production!
Post Cover Image from: Charlotte Coneybeer