Only interested in managing JSON as relational data in PostgreSQL? Skip to âRelational views over JSONâ
If your small-medium-sized company is dependent on cloud services or SaaS products, youâve more than likely felt constrained when querying the data, whether you need it for business intelligence, order-flows, dashboards or something entirely else. For example, if your company uses WooCommerce or Shopify, thereâs no easy way to answer complex queries like âaverage order price in product category x bought by women between 20 and 25 y/o in the state of yâ. Add to this that youâre entirely dependent on the speed and capabilities of the API which means waiting for numerous API requests to finish. This is neither fun nor efficient.
In most cases, storing and querying JSON in a NoSQL database like MongoDB does the job. However, if you need to store and query plain JSON in a relational database, you might have a dreadful experience. It is possible to , but you will deviate from traditional SQL syntax and if youâre using ORMs you shouldnât expect great support.
These are just a few of the reasons you might want to format JSON to tabular data instead of storing it plainly. Depending on the scenario, this can be quite a task. One way to solve this is the tedious task of building an ETL flow that retrieves the data, transforms it, and finally inserts or updates existing data across multiple tables. Especially flattening the JSON to individual columns and casting types in the transformation step can be quite cumbersome.
However, there is another option. In short, one can approach this problem by storing the data in a two-column table in its native format, JSON, along with a unique identifier and, as we shall see, easily create relational views over the JSON. To mirror the API to a local database you build a tool that performs a dump of the entire API and continually keeps the data synchronized using webhooks or periodic API calls for modified data. New entries are simply inserted whereas updated fields on existing entries are easily merged. Building such a tool is a relatively small task and it certainly pays off in the long term.
Generally speaking, there are multiple benefits to this approach. First of all, you have an incrementally updated on-premise database which you can instantly query, even complex queries. Additionally, an on-premise database can simplify retrieving data for miscellaneous internal tools and integrations. If such tools become dependent on the database, you also reduce the API contact surface which brings the benefit of only keeping one place updated to specifications and greatly lowers the amount of API requests.
Letâs assume you synchronize orders from a JSON API to PostgreSQL. The API might respond with a JSON document like the following:
{
"order_id": "123456789",
"order_date": "2024-02-25T12:34:56Z",
"total_amount": 149.99,
"currency": "USD",
"status": "processing",
"payment_method": "Credit Card",
"receiver_details": {
"name": "John Doe",
"address": {
"street": "123 Elm Street",
"city": "Springfield",
"state": "IL",
"postal_code": "62704",
"country": "USA"
},
"phone": "+1234567890",
"email": "[email protected]"
},
"items": [
{
"item_id": "001",
"name": "Widget A",
"quantity": 1,
"price_per_item": 49.99,
"total_price": 49.99
},
{
"item_id": "002",
"name": "Gadget B",
"quantity": 2,
"price_per_item": 50.00,
"total_price": 100.00
}
],
"shipping_method": "Standard",
"estimated_delivery": "2024-03-02",
"notes": "Please leave the package at the front door."
}
Looking at the data, we see potential for at least three different tables. We need one for parent orders that stores information such as status, currency, order date, shipping method, notes etc. Since we have an array items, we can also create a table for order items. Finally, we also want a table for customers which stores email. phone, address etc.
To get started, we simply create a table with the order ID as the primary key and a column for JSON. We will opt for the JSONB data type, a binary format that offers numerous advantages, with flexibility and speed being among the most significant. For instance, using JSONB allows PostgreSQL to support creating Generalized Inverted Indexes (GIN), which can significantly enhance the performance of our queries. This makes JSONB an ideal choice for applications that require efficient data manipulation and retrieval.
CREATE TABLE public."Orders" (
"OrderId" SERIAL PRIMARY KEY,
"OrderJson" JSONB NOT NULL
);
At this point we can create our relational views over the JSON. We start by creating the view for parent orders.
CREATE OR REPLACE VIEW orders AS
SELECT
o."OrderId" as order_id,
o."OrderJson"->>'order_date' AS order_date,
o."OrderJson"->>'total_amount' AS total_amount,
o."OrderJson"->>'currency' AS currency,
o."OrderJson"->>'status' AS status,
o."OrderJson"->>'payment_method' AS payment_method,
o."OrderJson"->>'shipping_method' AS shipping_method,
o."OrderJson"->>'estimated_delivery' AS estimated_delivery,
o."OrderJson"->>'notes' AS notes
FROM
public."Orders" o;
This view results in the following table
| order_id | order_date | total_amount | currency | status | payment_method | shipping_method | estimated_delivery | notes |
|---|---|---|---|---|---|---|---|---|
| 123456789 | 2024-02-25T12:34:56Z | 149.99 | USD | processing | Credit Card | Standard | 2024-03-02 | Please leave the package at the front door. |
| 234241225 | 2024-02-26T14:56:32Z | 299.98 | USD | shipped | PayPal | Express | 2024-03-01 | Call before delivery. |
| 349983123 | 2024-02-27T16:18:45Z | 450.00 | USD | delivered | Bank Transfer | Overnight | 2024-02-28 | Leave at back door. |
Similarly, we can create a view for a table consisting of all the items related to the orders. Notice that we can easily cast types from string to float, integers etc.
CREATE OR REPLACE VIEW order_items AS
SELECT
o."OrderId" as order_id,
item->>'item_id' AS item_id,
item->>'name' AS name,
(item->>'quantity')::int AS quantity,
(item->>'price_per_item')::numeric ::float AS price_per_item,
((item->>'quantity')::int * (item->>'price_per_item')::numeric) AS total_price
FROM
public."Orders" o,
jsonb_array_elements(o."OrderJson" -> 'items') item;
| order_id | item_id | name | quantity | price_per_item | total_price |
|---|---|---|---|---|---|
| 123456789 | 001 | Widget A | 1 | 49.99 | 49.99 |
| 123456789 | 002 | Gadget B | 2 | 50.0 | 100.00 |
| 234241225 | 003 | Product C | 1 | 99.99 | 99.99 |
| 234241225 | 004 | Product D | 2 | 100.0 | 200.00 |
| 349983123 | 005 | Accessory E | 3 | 150.0 | 450.00 |
Finally, we also want to store the customers related to orders as well
CREATE OR REPLACE VIEW customers AS
SELECT
o."OrderId" as order_id,
o."OrderJson"->'receiver_details'->>'name' AS name,
o."OrderJson"->'receiver_details'->'address'->>'street' AS street,
o."OrderJson"->'receiver_details'->'address'->>'city' AS city,
o."OrderJson"->'receiver_details'->'address'->>'state' AS state,
o."OrderJson"->'receiver_details'->'address'->>'postal_code' AS postal_code,
o."OrderJson"->'receiver_details'->'address'->>'country' AS country,
o."OrderJson"->'receiver_details'->>'phone' AS phone,
o."OrderJson"->'receiver_details'->>'email' AS email
FROM
public."Orders" o;
| order_id | name | street | city | state | postal_code | country | phone | |
|---|---|---|---|---|---|---|---|---|
| 1 | John Doe | 123 Elm Street | Springfield | IL | 62704 | USA | +1234567890 | [email protected] |
| 2 | Jane Smith | 456 Oak Road | Anytown | CA | 90210 | USA | +1234567891 | [email protected] |
| 3 | Alex Johnson | 789 Pine Street | Metropolis | NY | 10001 | USA | +1234567892 | [email protected] |
At last, we ready to write queries using our new views. For example, letâs say we wanted a breakdown of total sales by each product:
SELECT
item_id,
name,
SUM(quantity) AS total_quantity_sold,
SUM(total_price) AS total_sales
FROM
order_items
GROUP BY
item_id, name
ORDER BY
total_sales DESC;
| item_id | name | total_quantity_sold | total_sales |
|---|---|---|---|
| 570 | Hammer | 34 | 2,656.61 |
| 568 | Saw | 29 | 2,588.41 |
| 692 | Nails | 31 | 2,534.99 |
| 306 | Tape measure | 28 | 2,430.26 |
| 309 | Ladder | 34 | 2,334.26 |
Unfortunately, this method comes with a slight downside as this type of PostgreSQL view is query/read-only If you really need to, you might be able to work around this by creating triggers, but itâs a rather intricate process which is against the spirit of the method Iâve outlined. Of course, youâre also able to update JSON the usual way. Still, it shouldnât pose an actual problem as the database isnât the source of truth and shouldnât be treated as such. Moreover, even if the database was the master, it still shouldnât be changed directly but through the API to ensure integrity.
The performance of the approach outlined in this post should be fine. Of course, this heavily depends on the structure of the JSON, the number of entries and the queries. If you experience unacceptable speeds however, you can create a materialized view for each view.
CREATE MATERIALIZED VIEW orders_mv AS
SELECT * FROM orders;
CREATE MATERIALIZED VIEW order_items_mv AS
SELECT * FROM order_items;
CREATE MATERIALIZED VIEW customers_mv AS
SELECT * FROM customers;
This comes at the cost of freshness of course, but in some cases itâs an acceptable trade-off.
2/28/2024