
I am using Postgres 10 and have a table like this:

             Table "musicbrainz.acoustid_meta"

    Column    |       Type        | Collation | Nullable | Default
--------------+-------------------+-----------+----------+---------
 id           | integer           |           | not null |
 track        | character varying |           |          |
 artist       | character varying |           |          |
 album        | character varying |           |          |
 album_artist | character varying |           |          |
 track_no     | character varying |           |          |
 disc_no      | character varying |           |          |
 year         | character varying |           |          |
Indexes:
    "acoustid_meta_index" btree (id)

and I used to have CSV files such as

id,track,artist,album,album_artist,track_no,disc_no,year
23033007,Satellite,Dave Matthews Band,Under the Table & Dreaming,Dave Matthews Band,3,\N,1994

that I imported with

psql jthinksearch -c "copy musicbrainz.acoustid_meta from '/home/ubuntu/code/acoustid-server/meta.full.$LATEST.csv' DELIMITER ',' CSV HEADER";

But now the files are JSONL files, with each line like this:

{"id":339058430,"track":"Track14","artist":"Unknown Artist","album":"Unknown Title","album_artist":"Unknown Artist","track_no":14,"disc_no":null,"year":null}
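(Each line is a complete JSON object on its own, which is the JSON Lines format, so every line can be parsed independently with a standard JSON parser. A quick Python check of the sample line above:)

```python
import json

# The sample line from above (JSON Lines: one JSON object per line).
line = ('{"id":339058430,"track":"Track14","artist":"Unknown Artist",'
        '"album":"Unknown Title","album_artist":"Unknown Artist",'
        '"track_no":14,"disc_no":null,"year":null}')

record = json.loads(line)
print(record["id"])        # 339058430
print(record["track_no"])  # 14 -- numeric in the JSON, though the column is varchar
print(record["disc_no"])   # None -- JSON null
```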

How do I import these files safely? I have tried using sed as a workaround to convert the file to CSV, but it is not quite right:

cat $LATEST-meta-update.jsonl | sed \
    -e 's/{"id"://' \
    -e 's/"track"://' \
    -e 's/"artist"://' \
    -e 's/"album"://' \
    -e 's/"album_artist"://' \
    -e 's/"track_no"://' \
    -e 's/"disc_no"://' \
    -e 's/"year"://' \
    -e 's/\\\\"//g' > meta.csv

Also, I have 5 different tables to import, so I would have to construct a sed script for each.

Update: I just realized the purpose of the last column, which I had ignored for simplicity.

If it is a new record to be added to the table, it will have

"created":"2020-02-01T00:00:13.225963+00:00"

but if the record needs to replace an existing record, it will have

"updated":"2020-02-03T13:20:12.988533+00:00"

When I do the insert using the cross join with jsonb_populate_record, how do I use a where clause to restrict it to only the rows with the created field?

  • What is "jsonl"? One separate json object per line or something? I've never come across that before (not as a formal format) so I don't think you'll find a standard tool to handle it. Obviously you can throw together a simple python script to either parse each line or just split the lines out and expand them in PostgreSQL itself. Commented Sep 20, 2021 at 13:10
  • I think so; each one represents a row in a table. I don't know Python, and I was hoping Postgres could handle simple JSON itself. No idea why it was changed from CSV to JSONL. Commented Sep 20, 2021 at 13:29

1 Answer


If you have to do this only in Postgres, create an auxiliary schema with go-between tables like this:

create schema jsons;
create table jsons.acoustid_meta(data jsonb);

Copy the file to the go-between table:

copy jsons.acoustid_meta from ...

And parse the JSON with a Postgres statement:

insert into musicbrainz.acoustid_meta
select id, track, artist, album, album_artist, track_no, disc_no, year
from jsons.acoustid_meta 
cross join jsonb_populate_record(null::musicbrainz.acoustid_meta, data);
truncate jsons.acoustid_meta;

Update. You can examine JSON values by referring to data, for example:

insert into musicbrainz.acoustid_meta
select id, track, artist, album, album_artist, track_no, disc_no, year
from jsons.acoustid_meta 
cross join jsonb_populate_record(null::musicbrainz.acoustid_meta, data)
where data->'created' is not null;
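(If it is more convenient to filter before loading, the same created/updated distinction can be applied while pre-processing the JSONL file. A hypothetical Python sketch that splits new records from replacements, mirroring the where clause on the created key:)

```python
import json

def split_created_updated(lines):
    """Split JSONL lines into new records (those with a "created" key)
    and replacements (those with an "updated" key)."""
    created, updated = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        (created if "created" in record else updated).append(record)
    return created, updated
```

The created list can then go through the plain insert path, while the updated list is handled separately (e.g. with an update or an ON CONFLICT upsert).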

6 Comments

But it doesn't look like the file IS JSON. It's line-separated JSON objects rather than one JSON array of objects. However, you could do it in two steps: read the file in as a single-column text table, then cast each value in that table to jsonb and expand that.
Yes, but this is OK. Each line will be imported as a separate row in the table, as-is. No casts needed.
Thanks, yes. I don't quite understand the syntax, but this does seem to work; I think this is the solution.
@klin I was assuming that it would try to parse the file as a single JSON value. Thinking about it, though, of course it will accept everything up to the newline as JSON, treat that as a row, and then COPY will see the newline and start a new row... etc. Cool.
@klin I have a supplementary question regarding filtering the insert with a where clause; if you could take a look at the updated question I would appreciate it.