· 6 years ago · Apr 30, 2020, 06:46 AM
1Filter DynamoDB records by nested structure with
2query or scan
3
4
5DynamoDB is the Amazon take on fully managed NoSQL database services that provide fast and predictable performance with seamless scalability. It is a great solution for projects that require fast response of the data and easy integration with other services like ElasticSearch.
6
7By essence DynamoDB provides a couple of ways to filter the returned results. First let's look at the properties that can be provided as a criteria.
8
9On creation of a Table in DynamoDB you have to specify a pk(partition key) by which you will have a uniquely identifiable record in the database.
10
11The secord parameter by which you can clean the unneeded data is the sk(sort key). The name states that is used for sorting purposes but in fact it can be also used as an unique identifier to filter the outgoing from the db data.
12
13The third option that DynamoDB provides to its users is the gsk(global secondary key). You can make as many global secondary keys as you need and this is a great way to create criterias for filtration. The gsk's also provide the capability to specify nested structures as the values. So let's say we have the following structure in a column of the table named "rentor":
14
15"rentor": {
16 "id": {N: 1234}
17 "name": {S: "John Doe"}
18 "rented_on": {N: 1234123421}
19}
20
21The values in the JSON object are "marshalled" as DynamoDB likes to call them, this way the db know what type of data is stored inside the column, but let's forget about that for the purpose of this example, and imagine that the object is saved as a plain JSON.
22
23"rentor": {
24 "id": "1234"
25 "name": "John Doe"
26 "rented_on": "1234123421"
27}
28
29To make the "id" of this object an gsk we have to specify it when we create our resources with aws-cdk. It will look something like this:
30
31this.table.addGlobalSecondaryIndex({
32 indexName: "index-rentor",
33 partitionKey: {name: "rentor.id", type: dynamodb.AttributeType.STRING},
34});
35
36The this.table is an instance of an "DynamoTableWithElasticSearchStream" class that creates and defines the DynamoDB table. Using this instance we execute the method addGlobalSecondaryIndex by passing the name that the index will have and binding it with the nested structure that we will use as a criteria.
37
38That's all you need to define the gsk.
39
40Now for the interesting part! How do we filter by it?
41
42DynamoDB provides two main mechanisms to filter the returned data to the client, those methods are query and scan. They both have their usage cases, and I will explain which one to use for what now.
43
44The query method is the better performer compared to the scan method. The reason for that lies in the way they work and how they filter in the background of our application. Query is searching for the value by the passed key and is returning the result as soon as it finds a match. This technique gives the query a great advantage in execution time and money wize for the pocket of the consumer.
45
46You can use scan for smaller tables that have a reasonable amount of records. This statement should be taken really in mind, because scan gets the full data of the table and then starts to filter by some criteria. This will for sure impact the budget of the client and the execution time. Despite that scan has some advantages compared to query and for example one of them is nested querying of data. Yes! Query doesn't allow it. Query allows to do your search on a flat structure like a string, number, boolean but not on an object. Scan on the other hand doesn't even have a limit on the dept of that object you just have to specify all of the object keys in the valid order. On top of that scan allows to use the "contains" keyword to make a full text body search, neat!
47
48So sorry, but query is out of the question for this topic.
49
50How do we implement the search by this gsk that we've created previously. Lets explain the params that have to be passed and how the method works. Example request:
51
52const params: ScanInput = {
53 TableName: "tableName",
54 FilterExpression: "#rentor.#id = :rentorId",
55 ExpressionAttributeNames: {
56 "#rentor": "rentor",
57 "#id": "id"
58 },
59 ExpressionAttributeValues: {
60 ":rentorId": "abcd-efgh-abcd-efgh" // UUID id from the database
61 }
62};
63
64try {
65 const queryOutput = await DynamoDB().scan(paramsQuery).promise();
66 return queryOutput.Items;
67} catch (e) {
68 throw e;
69}
70
71So what just happened? First we declared a variable named params in which we constructed our input parameters. We started of with the table that we will be searching, after that we created our filter selection. The filter string is binded in the "ExpressionAttributeNames" and "ExpressionAttributeValues" properties. As you can see in "ExpressionAttributeNames" we have added two key value pairs, first is the column name and then the nested "id" of the object. The "ExpressionAttributeValues" are the values we have passed in our API request for the data.
72
73After we have build the requirements for the filtration we just need to pass them to the scan method of DynamoDB and return the "Items" it has found.
74
75This method comes with its pro's and con's. Yes! we can filter by nested object attributes but we have to be careful of the table size we want to filter. On bigger tables it will be a better solution to create a separate column in the db with the id and query the results and not do the scan technique we just covered.
76
77Hope this helps you on your quest to greatness!