Postgresql: denormalize JSONB with nested key/values and arrays

Question

If needing to denormalize a nested JSONB structure into a flat table, can it be realistically done in a single SELECT if I'm mixing key/values and arrays like the following? I have table containing two columns - repository and data and an example data column value is like the following:

{
    "branches": {
        "master": {
            "type1": {
                "files": [{
                    "filename": "a.txt",
                    "contents": [{
                        "identifier": "foo",
                        "value": "bar"
                    }]
                }]
            }
        }
    }
}

My goal is to have a query that flattens this structure e.g. into the following CSV pseudocode:

repository,branch,type,filename,identifier,value
a/b,master,type1,a.txt,foo,bar

I have kept the JSON example as simple as possible, but would ultimately end up with around a million lines.

Although the need for a single data JSON column is fixed, the format of it can be refactored if it would make the denormalization easier or more efficient. e.g. instead of branches and types being keyed off their unique identifiers, these could be converted into arrays with identifiers instead.

Where does the column repository come from in the output? Btw: you are actually normalizing the de-normalized data you have. — user330315
– user330315, Commented Jul 2, 2020 at 6:22
Can you please fix the JSON? It's invalid because the number of { braces doesn't match the number of } braces. And the contents array isn't closed properly. Is contents really an array? If yes what if there are multiple identifiers and values in there? Should they be shown as rows or columns in the output? — user330315
– user330315, Commented Jul 2, 2020 at 7:02

user330315 · Accepted Answer · 2020-07-02 07:15:00Z

Assuming the elements of contents array should be returned as rows and a valid JSON structure like this:

{
  "branches": {
    "master": {
      "type1": {
        "files": [{"filename": "a.txt","contents": [{"identifier": "foo","value": "bar"}, {"identifier": "a2", "value": "a value"}]},
                  {"filename": "b.txt","contents": [{"identifier": "id2","value": "something else"}]}]
      }
    }
  }
}

Then you can use this:

select b.branch, 
       t.type, 
       f.file ->> 'filename' as filename, 
       c.content ->> 'identifier' as identifier, 
       c.content ->> 'value' as value
from the_table tt
  cross join jsonb_each(tt.the_column -> 'branches') as b(branch, branch_value)
  cross join jsonb_each(b.branch_value) as t(type, type_value)
  cross join jsonb_array_elements(t.type_value -> 'files') as f(file)
  cross join jsonb_array_elements(f.file -> 'contents') as c(content)

With the corrected JSON above, this returns:

branch | type  | filename | identifier | value         
-------+-------+----------+------------+---------------
master | type1 | a.txt    | foo        | bar           
master | type1 | a.txt    | a2         | a value       
master | type1 | b.txt    | id2        | something else

If contents isn't actually an array, so something like this:

{
  "branches": {
    "master": {
      "type1": {
        "files": [{"filename": "a.txt","contents": {"identifier": "foo","value": "bar"}},
                  {"filename": "b.txt","contents": {"identifier": "id2","value": "something else"}}]
      }
    }
  }
}

Then you can use this query:

select b.branch, t.type, f.file ->> 'filename' as filename, f.file #>> '{contents, identifier}' as identifier, f.file #>> '{contents,value}' as value
from the_table tt
  cross join jsonb_each(tt.the_column -> 'branches') as b(branch, branch_value)
  cross join jsonb_each(b.branch_value) as t(type, type_value)
  cross join jsonb_array_elements(t.type_value -> 'files') as f(file)

which returns:

branch | type  | filename | identifier | value         
-------+-------+----------+------------+---------------
master | type1 | a.txt    | foo        | bar           
master | type1 | b.txt    | id2        | something else

Collectives™ on Stack Overflow

Postgresql: denormalize JSONB with nested key/values and arrays

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related