There are a number of wonderful blog posts here on the SAP Community talking about how you can get started with SAP HANA Cloud Document Store, and what you can do with it
For this blog post, I wanted to meet absolute beginners where they are:
- Introducing the design considerations behind using the more familiar Relational (Row and Column) Stores
- Discussing where we might run into challenges when using Relational Stores
- Looking at where a Document Store might help
- Explaining at a high-level what SAP HANA Cloud Document Store is
This blog post is intended to be a high level introduction – it won’t address all design considerations, and will simplify where able. Those looking for more in-depth technical articles can find some recommendations under Other Resources at the end of this post
Relational Stores Recap
When working with a database using Relational Stores, we want to start by deciding how we’ll organise and structure our data. This structure is called a Data Model
To recap a very simplified Relational Data Model, let’s look at how we might store information about a book
You can describe this book in many other ways – it may be brown, it may be sold in a hardcover, and it may be 388 pages long
We only want to store information that will be useful to our business. For our example, the list above will be enough
With an understanding of which information we care about, we know what we’ll need to store in a Database Table. Someone looking at this problem for the first time might sketch out a table as below
A book’s ISBN is unique – each ISBN refers to exactly one published book. This uniqueness lets us use the ISBN as something called a Primary Key (indicated by the key symbol)
By looking at our Primary Key Column (in this case ISBN), we can tell which book each row of data is about
Without knowing better we deploy our table in a database and people begin using it. Before long, our colleagues come to us with issues
Challenges with “One Central Table”
A colleague wants to search for all books written by Cameron Swift. Except it turns out there’s more than one person in the world called Cameron Swift
To help tell them apart, we add an Author ID that is unique to each Author. While we’re at it, we give our Publishers a Publisher ID too
A second colleague wants to enter a book with two Authors.In theory, we could add another set of Author ID and Name columns. Unfortunately, that won’t help us when we need to enter a book with three Authors, let alone four or more Authors
To help us store this data, we start by separating the data into multiple tables. We move the Author and Publisher data out into their own tables
As we can see above, the Author ID and Publisher ID also uniquely identify each row in our Authors and Publishers tables. These are the Primary Keys for those tables, in the same way that ISBN is for our Books table
“Relational” Data
There’s no point storing our data across different tables unless we know how they’re linked so we can get it back out again
This is where the term Relational Data comes from – in order to make sense of the data, we need to understand our tables and the Relationships between those tables
The way we represent this relationship is by adding the Primary Key from one table as a column in another table
As an example, we’re going to link the Publishers table to our Books table by including the primary key for our Publisher table (the Publisher ID) as a column in our Books table
By matching up the Publisher ID columns from our Books and Publishers tables, we can now easily find out the publisher for each book. Now that we understand how this works for Publishers, we can look at our Authors
Including Multiple Authors
When we think about including the Author IDs inside our book table, it may sound like we have the same problem we started with. As we showed earlier, we can’t include multiple Author IDs in our Books table, because there can be any number of Authors (with the exception of zero, because books don’t write themselves)
We need to create a third table to link our Books and Authors tables. We’ll refer to this as a Linking Table. This table is a list of ISBNs and associated Author IDs
Because a book can have many authors and an author can write many books, neither column here is unique. This is by design
Starting with the ISBN for any given book, we can then look in our Linking Table. Each row with a matching ISBN points to an author of that book.
Using this approach means we don’t have to make any changes to the structure of our tables as we add multiple authors. For each additional author, we just add another row to our Linking Table
Our Relational Data Structure
Our full data structure for storing this simplified example would look something like this:
Challenges with Relational Tables
Working through these challenges makes something very clear:
When new business requirements arise, these could require changes in our underlying data model. As we saw above: even with a simplified scenario – changing data models is not a trivial task
We can’t change the set of columns for just a single row – all rows within a table use the same set of columns
Depending on the scope of the change required, we could be looking at adding columns to a single table (like we did with Author ID and Publisher ID), or more complex changes requiring re-mapping our data across a number of tables
In this case, we’ll likely need to update the logic that writes to and reads from these tables – everywhere we use them
Does this mean that storing data in Relational Stores is a bad idea? Absolutely not. There are benefits to storing data in this way – we often want to enforce a known, fixed schema and we want any variation to be considered and approved. This is where Relational Data storage might be the best fit
Document Stores
On the other hand, there are times where the fixed approach of predefined columns and tables isn’t the best way to store our data
Maybe we want to store nested data – for example, our books where we can have any number of authors – without splitting the data between different tables
Perhaps we’re receiving data from Cloud APIs: if we’re getting data from Ariba’s APIs, the data isn’t returned to us as entries ready to be stored in a table. Instead, we’ll likely get results formatted in something called JSON (JavaScript Object Notation)
JSON Documents
The term JSON Document is used to describe either JSON Objects or JSON Arrays. We’re only going to describe JSON Objects in this blog post
Inside our JSON Objects, we have a number of Key/Value Pairs. For example, [Book] Title is a Key, and the Value for this might be Document Storage. Let’s revisit our earlier example
Above we can see how the properties of a book could be modeled in either a Table or JSON Document. Our JSON Document is an object within curly braces { }, made from a number of Key/Value Pairs
The JSON Document is self-describing. That is – if we retrieve the book data, we can check the keys to tell what each value represents. If we retrieve a single row of data from a table, the column names aren’t included by default
With our small number of columns, this is workable. Are we going to be able to remember fifty columns? A hundred? Nobody has that kind of patience
JSON Collections
Because our JSON Documents store the structure alongside the data, we can easily vary the structure between entries. Unlike rows stored in a table, JSON Documents are stored in something called a JSON Collection
To help us understand this, we can picture each JSON Document as a physical sheet of paper with our data written on it
Working with a blank piece of paper, we can write our data exactly as it is – we don’t need to stick to a fixed format
Loose sheets of paper aren’t great for storage and retrieval, so we store our related documents in a Manila Folder. This folder is our JSON Collection
When you store a JSON Document, you put it in a related JSON Collection. When you want to access the Document later, you retrieve it from the same Collection
Using SAP HANA Cloud Document Store, we can do this using a modified SQL Syntax. But that’s a story for another, more technical blog post
SAP HANA Cloud Document Store is an additional feature that can be activated within an existing SAP HANA Cloud instance – for more information you can read the blog post linked under Next Steps below
Conclusion
In this blog post we’ve discussed Relational Storage using Tables, including the benefits and limitations of working with a fixed, known data structure
We’ve also discussed how we can work with data that doesn’t conform to a fixed structure using JSON Documents, stored inside JSON Collections
As always, the practical reality is more complex than the theory. Hopefully this has been a useful high level introduction to when you might use a Document Store like SAP HANA Cloud Document Store
I welcome any questions or comments below
Next Steps
For more Practical Application:
Maybe you’d like to find out how you can activate SAP HANA Document Store within SAP HANA Cloud
Perhaps you’ve already activated SAP HANA Document Store and want to get some experience with inserting and selecting JSON Documents
For more Theory:
You may have heard the term NoSQL Databases before. Document Store is a type of NoSQL Database, but isn’t the only type. Perhaps you’d like to learn about another type of NoSQL Database, the Graph Database
SAP HANA Database as a Graph Store – Introduction by Poornapragna Malamandi Suresh
Other Resources
Enabling JSON Document Store by Cameron Swift
HANA DocStore First Steps by Cameron Swift
The Small JSON Document Store Cookbook by Mathias Kemeter
Spotlight: SAP HANA Cloud JSON Document Store by Laura Nevin
The SAP HANA JSON Document Store – Introduction (Part 1) by Kai Mueller
Ariba Analytics using SAP Analytics Cloud, Data Intelligence Cloud and HANA DocStore – Part 1 by Cameron Swift
Note: While I am an employee of SAP, any views/thoughts are my own, and do not necessarily reflect those of my employer