1

I am trying to send records of API calls to S3 then only specific fields to redshift. For example, my program sends the following JSON to S3 successfully:

{

"user": "user_name",
"type": "GET",
"date": "2016-03-22 16:14:13",
"data": [{
    "EmpID": 1,
    "LastName": "Smith",
    "FirstName": "Bob",
    "DOB": "1910-01-01",
    "SSN": "123456789",
    "HireDate": "1910-01-01",
    "City": "Town",
    "Address": "123 Abc"
}]

}

The "data" field may contain one record like the example above or many records if the query returns multiple results. I want to copy only the information from data into the "Employee" table on redshift with the same column names. Copy with json 'auto' does not work because it parses only top level fields. In an attempt to solve this, I created the following JSON paths file:

{

"jsonpaths": [
    "$.data[0].EmpID",
    "$.data[0].LastName",
    "$.data[0].FirstName",
    "$.data[0].DOB",
    "$.data[0].SSN",
    "$.data[0].HireDate",
    "$.data[0].City",
    "$.data[0].Address",
]

}

Here is an example run of my program:

  1. Run my AWS Java web project on Tomcat server
  2. Enter http://localhost:000/projectname/rest/employee/1 (which successfully returns a json object for employee id 1 from my MS SQL server database)
  3. Creates the json object at the top of this post with metadata about the call and the result of the query
  4. Writes this object successfully to 's3://bucket/prefix/'
  5. Tries to run the copy command and fails. When I go to the load tab of my redshift cluster for more information about why the load failed the status of the command is terminated and the SQL field says "COPY ANALYZE employee"
  6. When I try to run the manual version of the copy command below, I get the following error: "ERROR: XX000: Manifest file is not in correct json format"

    COPY employee
    FROM 's3://bucket/prefix/filename'
    CREDENTIALS 'aws_access_key_id=<>;aws_secret_access_key=<>'
    JSON 's3://bucket/jsonpaths.json';
    

In an effort to address the manifest problem, I created the following file:

{

  "entries": [
    {"url":"s3://bucket/prefix/file", "mandatory":true}
  ]

}

then ran the copy command:

    COPY employee
    FROM 's3://bucket/manifest.json'
    CREDENTIALS'aws_access_key_id=<>;aws_secret_access_key=<>' 
    MANIFEST

This yields the stl_load_error: "Delimiter not found" with the raw field value as the full json file saved to s3. If I add the jsonPaths file line from the previous copy command, I get the same manifest error as before.

Questions:

-Is my JSONPaths file in the correct format?

-Is this the best way to extract from lower levels of a JSONArray?

-Is the format of my manifest file correct? Why is a manifest file necessary?

Note: before I added the metadata, I tested the s3 to redshift connection with just the fields I wanted to load and it loaded successfully, so it is unlikely my redshift destination is improperly configured.

Thank you for your help!

1 Answer 1

2

From what I remember, when you try to load JSON data and your JSONPaths file has a syntax error, Redshift throws the (incorrect and misleading) exception that your manifest file is ill-formed. So, I am pretty sure that the following command would work, if there are no syntax errors in your JSONPaths file:

COPY employee
FROM 's3://bucket/prefix/filename'
CREDENTIALS 'aws_access_key_id=<>;aws_secret_access_key=<>'
JSON 's3://bucket/jsonpaths.json';

Now, what's wrong with your JSONPaths file? To me, it looks fine, but I can suggest couple of things you should try:

  • Remove the comma in the last entry, i.e., change "$.data[0].Address", to "$.data[0].Address".
  • If the above doesn't work out, try without using dot notation. For e.g., change "$.data[0].EmpID" to "$['data'][0]['EmpID']".

P.S.- This is completely wrong:

COPY employee
FROM 's3://bucket/manifest.json'
CREDENTIALS'aws_access_key_id=<>;aws_secret_access_key=<>' 
MANIFEST

That is because you are telling Redshift to use a manifest file, but not that the input data is in JSON format or what JSONPaths to use. So, by default,

the COPY command expects the source data to be in character-delimited UTF-8 text files. The default delimiter is a pipe character ( | ).

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks so much! Can't believe I missed that. The automatic copy command issued with Kinesis still does not work, but the manual copy command works at least.
Which part did you miss? The last comma or the change in notation to array notation (vs. dot notation)?
@ketan vijayvargiya, how would you handle it for all the objects using the bracket notation?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.