If you’re interested enough to read this article, you already know what JSON is. It’s been around for about 20 years now, but you may be surprised to hear that it wasn’t standardized until 2013. [Mind blown]. It was a well-established, widely used format for years leading up to that, so, who knew? Any API worth its salt will give you the option of returning your data in JSON; in fact, it’s been the de facto data format for RESTful services for as long as I care to remember. Who wants to work with that bloated, antiquated XML format anyway?

My job here at Zion & Zion is all about data architecture, warehousing, and the ETL/ELT (extract, transform, load) processes that support it. Much to my delight, Snowflake, our data warehouse provider, makes working with JSON objects an incredibly pleasant experience. You have the option to skip the transform step entirely and just load in the raw JSON as is.

The magic of the Snowflake VARIANT

For all our examples, we’re going to use the Snowflake VARIANT field type to store our JSON objects. It optimizes the object behind the scenes and enables you to operate on your JSON as a native, semi-structured object. In my experience, it’s often the case that I only need a single column named “JSON” in a table that will store the entire response from a third-party API. Later, I can use dot syntax within my SQL code to crawl the object to operate on the nested nodes of data as I see fit. Let’s get into some examples so you can see what I mean for yourself.
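For reference, here’s what that single-column table might look like (a minimal sketch; the database, schema, and table names are placeholders reused throughout the examples below):

```sql
-- One VARIANT column is enough to hold an entire API response
CREATE TABLE "DB_NAME"."SCHEMA_NAME"."TABLE_NAME" (
  JSON VARIANT
);
```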

Let’s use the following JSON string in the examples below, assigned to the variable $JSON_STRING:


‘{
 "id": 123,
 "name": "tester",
 "array_val": ["zero", "one", "two"],
 "date": "2020-01-31",
 "nested": {
  "label": "Nested Value",
  "value": 10
 }
}’

Create a new JSON record

Let’s start with some basics. Here we take our JSON string and insert it into Snowflake using the PARSE_JSON function. It looks very similar to a standard INSERT command, but instead of VALUES we use SELECT in combination with the Snowflake helper function PARSE_JSON, which converts the JSON string into an object the VARIANT type can accept.


INSERT INTO "DB_NAME"."SCHEMA_NAME"."TABLE_NAME" (JSON) 
SELECT PARSE_JSON($JSON_STRING);

Update a value within a JSON object

Sometimes you might want to update only a single attribute of a JSON object. The following uses the Snowflake helper function OBJECT_INSERT (with its update flag set to TRUE) to change the JSON:nested.value attribute from 10 to 20. Note that OBJECT_INSERT returns a new object, so we rebuild the nested object and write it back into the parent; assigning the updated nested object directly to the column would discard the rest of the document.


UPDATE "DB_NAME"."SCHEMA_NAME"."TABLE_NAME" 
SET JSON = OBJECT_INSERT(JSON, 'nested', OBJECT_INSERT(JSON:nested, 'value', 20, TRUE), TRUE) 
WHERE JSON:id::number = 123;

Update an entire JSON object

Other times it may be easiest to update the entire JSON object instead. Here’s an example of replacing the entire object using the unique id as the key.


UPDATE "DB_NAME"."SCHEMA_NAME"."TABLE_NAME"   
SET JSON = PARSE_JSON($JSON_STRING)
WHERE JSON:id::number = 123;

Delete an entire JSON record

Along those same lines, you can delete based on JSON attributes as well. The attribute can be of any data type you wish and at any level of depth within the object. Below is an example that uses the unique id as the key to delete the entry.


DELETE FROM "DB_NAME"."SCHEMA_NAME"."TABLE_NAME"  
WHERE JSON:id::number = 123;
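To illustrate the “any level of depth” point, here’s a sketch that deletes based on an attribute inside the nested object instead of the top-level id (the label value assumed here comes from our example JSON):

```sql
-- Delete rows by matching a nested attribute instead of the top-level id
DELETE FROM "DB_NAME"."SCHEMA_NAME"."TABLE_NAME"
WHERE JSON:nested.label::string = 'Nested Value';
```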

Querying of JSON as tabular data

You can query your JSON object much like you would a relational table and return tabular output of rows and columns. Below is an example that coerces each value into a data type of my choosing by appending “::[type]” after the path, then provides an alias for convenience. The data types you specify in the SELECT will also affect behavior in further query logic; for example, coercing a field to a number allows you to use it with mathematical operators. Since you’re dealing with semi-structured data, it’s wise to be explicit with your data type definitions.


SELECT 
JSON:id::number AS "id",
JSON:name::string AS "name",
JSON:date::date AS "date",
JSON:array_val::array AS "array_values",
JSON:nested.label::string AS "nested_label",
JSON:nested.value::number AS "nested_value"
FROM "DB_NAME"."SCHEMA_NAME"."TABLE_NAME" 
WHERE "id" = 123
AND "date" BETWEEN '2020-01-01' AND '2020-02-01';

Flattening JSON into columns

There will inevitably be times when you want to create unique rows of data out of an array found within your JSON object. The example below uses the FLATTEN function to expand the array into multiple rows, feeding them into the query as a table data source via the TABLE function under the alias aptly named “array”. The SELECT statement then uses that alias to access the values of each row. To help visualize the output, I have included a screenshot of the result from Snowflake.


SELECT 
JSON:name::string AS "name", 
"array".VALUE::string AS "array_values"
FROM 
"DB_NAME"."SCHEMA_NAME"."TABLE_NAME", 
TABLE( FLATTEN(INPUT => JSON:array_val) ) AS "array"
;

UPSERT

Snowflake provides the ability to do a single-command update-or-insert (UPSERT) through the MERGE command. The following example takes an array of JSON objects within a file saved to a Snowflake stage and performs either an update or an insert based on whether the unique field (id) matches.

Note: This requires a Snowflake file format named JSON with the “strip outer array” option enabled, a “Snowflake Managed” stage named JSON_STAGE_NAME, and a file_name.json file uploaded to that stage. The contents of the file are expected to be an array of objects similar in format to the example JSON we started with.
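One way to create those prerequisites (a sketch, assuming the default stage options are acceptable for your setup):

```sql
-- File format that strips the outer array so each object loads as one row
CREATE FILE FORMAT JSON
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE;

-- Internal ("Snowflake Managed") stage to receive the uploaded file
CREATE STAGE JSON_STAGE_NAME;
```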


MERGE INTO
"DB_NAME"."SCHEMA_NAME"."TABLE_NAME" AS "foo" 
USING (
  SELECT 
  $1 AS JSON
  FROM @JSON_STAGE_NAME/file_name.json
  (FILE_FORMAT => 'JSON')
) "bar"
ON "foo".JSON:id::number = "bar".JSON:id::number
WHEN MATCHED THEN
 UPDATE SET "foo".JSON = "bar".JSON 
WHEN NOT MATCHED THEN
 INSERT (JSON) VALUES ("bar".JSON)
;

Game-Changing Approach: “Just Feed It All In”

The ability to feed the entire JSON response from your third-party API straight into your data warehouse, easily and without the downsides and limitations we’re used to, is a game-changer. You’ll find you can turn ideas into real proof-of-concept applications by skipping the nuisance of piecemealing together the exact dimensions and metrics needed to support an entire strategy and just feeding everything in. Snowflake is optimized to traverse the entire object stack for you as you make your queries, so if the data is present in the API, you know you have it in the warehouse.

Another benefit of this approach is that you avoid the process-heavy scripting otherwise required to manipulate the data before importing it. It’s as simple as:

  1. Make the request from the API
  2. Write the response to a file
  3. Feed the file into your Snowflake Managed stage
  4. Ingest it into the warehouse
  5. Have a daiquiri
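Steps 3 and 4 above can be sketched in SQL (the local file path is a placeholder; PUT is run from a client such as SnowSQL, and the stage and file format names match the MERGE example earlier):

```sql
-- Step 3: upload the saved API response to the managed stage
PUT file:///tmp/file_name.json @JSON_STAGE_NAME;

-- Step 4: ingest the staged file into the VARIANT column
COPY INTO "DB_NAME"."SCHEMA_NAME"."TABLE_NAME" (JSON)
FROM @JSON_STAGE_NAME/file_name.json
FILE_FORMAT = (FORMAT_NAME = JSON)
ON_ERROR = 'skip_file';
```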

I plan on writing more Snowflake-related content in the future because of how much my workday revolves around this majestic creature, and to sing its praises for how much time it saves me. I’m a fanboy in the making.