
I'm having issues using the COPY command in Redshift to load JSON objects. I receive a file in the bracketed array format shown second below, which fails when I attempt the COPY command; however, when I adjust the file to the newline-delimited format shown first, it works. This is not an ideal solution, as I am not permitted to modify the JSON file.

This works fine:

{
   "id": 1,
   "name": "Major League Baseball"
}
{
   "id": 2,
   "name": "National Hockey League"
}

This does not work (notice the extra square brackets and commas):

[
{"id":1,"name":"Major League Baseball"},
{"id":2,"name":"National Hockey League"}
]

This is my jsonpaths file:

{
    "jsonpaths": [
        "$['id']",
        "$['name']"
    ]
}
  • Can you clarify the question a bit? Commented Jun 16, 2016 at 5:13
  • I am attempting to copy some JSON into Redshift. The trouble is the JSON contains "[" and commas (see the JSON above), which break the loading of this data. I could write a script to get around this issue, but ideally I want to avoid that. Commented Jun 16, 2016 at 8:21
  • Each row's worth of data needs to be a separate JSON object, as in the first example. The jsonpaths specification is a list, but it's a list of column locators within one object. To support the second example, Redshift would have to parse vast files before it could decide which values were part of a row. Commented Jun 21, 2016 at 6:06
  • Can you give me an example of how this could be done? Commented Jun 21, 2016 at 9:04

2 Answers


The problem is that the COPY command does not accept a standard JSON document. Instead, it expects newline-delimited JSON (one object per line), which is shown in the documentation but not stated prominently.

Hence, every line must be valid JSON on its own, while the file as a whole is not. That's why your modified file works.
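The comments mention working around this with a script. Since the source file itself can't be modified, one option is to rewrite a copy of it just before loading. Here is a minimal Python sketch (the function name and file paths are my own, for illustration) that converts a file containing one JSON array into the newline-delimited form COPY expects:

```python
import json

def array_to_jsonlines(src_path, dst_path):
    """Rewrite a file holding a single JSON array as newline-delimited JSON."""
    with open(src_path) as src:
        records = json.load(src)  # parse the whole array up front
    with open(dst_path, "w") as dst:
        for record in records:
            # one compact JSON object per line, as COPY requires
            dst.write(json.dumps(record) + "\n")
```

You would run this against a local copy of the file and upload the result to S3 before issuing the COPY. Note it loads the whole array into memory, which is fine for files of this size but would need a streaming parser for very large inputs.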




The problem is the outer brackets. One alternative is Redshift Spectrum, whose JSON SerDe can strip the enclosing array:

-- Requires an existing external schema and an S3 location
-- (the schema name and location here are illustrative)
CREATE EXTERNAL TABLE my_spectrum_schema.my_data(id int, name varchar)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('strip.outer.array'='true')
LOCATION 's3://bucket/';

Here's a COPY workaround using the SUPER type, assuming the entire file is under 16 MB:

SET json_parse_truncate_strings=ON; -- not strictly needed, but helpful

create table temptable(entirefile super);

copy temptable
from 's3://bucket/file.json'
iam_role '...'
format json 'noshred';

create table mytable as
select data from temptable t, t.entirefile data;

