
I'm having issues using the COPY command in Redshift to load JSON objects. I receive a file in the bracketed array format shown second below, which fails when I attempt the COPY command; however, when I adjust the file to the newline-delimited format shown first, it works. This is not an ideal solution, as I am not permitted to modify the JSON file.

This works fine:

{
   "id": 1,
   "name": "Major League Baseball"
}
{
   "id": 2,
   "name": "National Hockey League"
}

This does not work (notice the extra square brackets and commas):

[
{"id":1,"name":"Major League Baseball"},
{"id":2,"name":"National Hockey League"}
]

This is my jsonpaths file:

{
    "jsonpaths": [
        "$['id']",
        "$['name']"
    ]
}
  • Can you clarify the question a bit? Commented Jun 16, 2016 at 5:13
  • I am attempting to copy some JSON into Redshift. The trouble is the JSON contains "[" and commas (see the JSON above), which break the loading of this data. I could write a script to get around this issue, but ideally I want to avoid that. Commented Jun 16, 2016 at 8:21
  • Each row's worth of data needs to be a separate JSON object, as in the first example. The jsonpaths specification is a list, but it's a list of column locators within one object. To support the second example, Redshift would have to parse vast files before it could decide which values were part of a row. Commented Jun 21, 2016 at 6:06
  • Can you give me an example of how this could be done? Commented Jun 21, 2016 at 9:04

2 Answers


The problem is that the COPY command does not accept a standard JSON document. Instead, it expects newline-delimited JSON (one object per line), which is shown in the documentation but not stated prominently.

Hence, every line must be valid JSON on its own, while the file as a whole is not. That's why your modified file works.
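The comments mention working around this with a script. Since the source file itself can't be modified, one option is to rewrite a copy of it just before loading. Here is a minimal Python sketch (the function name and file paths are my own, for illustration) that converts a file containing one JSON array into the newline-delimited form COPY expects:

```python
import json

def array_to_jsonlines(src_path, dst_path):
    """Rewrite a file holding a single JSON array as newline-delimited JSON."""
    with open(src_path) as src:
        records = json.load(src)  # parse the whole array up front
    with open(dst_path, "w") as dst:
        for record in records:
            # one compact JSON object per line, as COPY requires
            dst.write(json.dumps(record) + "\n")
```

You would run this against a local copy of the file and upload the result to S3 before issuing the COPY. Note it loads the whole array into memory, which is fine for files of this size but would need a streaming parser for very large inputs.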




The problem is the outer brackets. One alternative is Redshift Spectrum, whose JSON SerDe can strip the enclosing array:

-- Requires an existing external schema and an S3 location
-- (the schema name and location here are illustrative)
CREATE EXTERNAL TABLE my_spectrum_schema.my_data(id int, name varchar)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('strip.outer.array'='true')
LOCATION 's3://bucket/';

Here's a COPY workaround using the SUPER type, assuming the entire file is under 16 MB:

SET json_parse_truncate_strings=ON; -- not strictly needed, but helpful

create table temptable(entirefile super);

copy temptable
from 's3://bucket/file.json'
iam_role '...'
format json 'noshred';

create table mytable as
select data from temptable t, t.entirefile data;

