0

I want to convert bibtex citations stored as bibjson pulled from a single JSON data field in PostgreSQL (13.5) into proper citation strings.

One example:

SELECT data
  FROM citation LIMIT 1;
 {"id": 1, "note": "DOI 10.1186/s12862-017-1032-x", "type": "article", "title": "Contrasting morphology with molecular data: an approach to revision of species complexes based on the example of European Phoxinus (Cyprinidae)", "author": [{"given": "Anja",
 "family": "Palandačić"}, {"given": "Alexander", "family": "Naseka"}, {"given": "David", "family": "Ramler"}, {"given": "Harald", "family": "Ahnelt"}], "issued": {"date-parts": [[2017]]}, "container-title": "BMC Evolutionary biology, 17(1), 184"}

I want to convert these finally to a proper citations string as latex would. For this I started with the title. I found out that PostgreSQL has a daunting amount of functions for dealing with JSON and that I can use e.g. '->' to get one elements text.

SELECT 
  data->>'title' AS title
  from citation limit 1; 

would give me then

"Contrasting morphology with molecular data: an approach to revision of species complexes based on the example of European Phoxinus (Cyprinidae)"

As a second step I wanted to get all the people and convert them to strings. And this is also where I have problems now to progress and my question arises.

SELECT 
  data->>'author' AS authors
  FROM citation limit 1; 

gives me this...

 [{"given": "R.", "family": "Reinartz"}, {"given": "S.", "family": "Lippold"}, {"given": "D.", "family": "Lieckfeldt"}, {"given": "A.", "family": "Ludwig"}]

but how would I smartly condense this into a proper string like this?

"Reinhartz, R. Lippold S., Lieckfeldt D., Ludwig, A."

Or do it even smarter and get Author XY et al if there is more than one author?

Any help with this would be appreciated.

UPDATE:

From the hints in the comments I tried jsonb_array_elements_text:

SELECT 
  jsonb_array_elements_text(data->'author') AS author 
FROM citation LIMIT 1;

However this gives me e.g. only the first object in the array.

{"given": "R.", "family": "Reinartz"}

Interestingly this only happens when I query form the table. When I copy paste the array manually like so:

select * from json_array_elements('[{"given": "R.", "family": "Reinartz"}, {"given": "S.", "family": "Lippold"}, {"given": "D.", "family": "Lieckfeldt"}, {"given": "A.", "family": "Ludwig"}]')

Then this step works:

                 value
-----------------------------------------
 {"given": "R.", "family": "Reinartz"}
 {"given": "S.", "family": "Lippold"}
 {"given": "D.", "family": "Lieckfeldt"}
 {"given": "A.", "family": "Ludwig"}

OK the limit of 1 restricts to 1 element in the output being shown of course. That was the reason here. So this works:

SELECT 
jsonb_array_elements_text(data->'author') AS author 
FROM citation;

And produces:

                 value
-----------------------------------------
 {"given": "R.", "family": "Reinartz"}
 {"given": "S.", "family": "Lippold"}
 {"given": "D.", "family": "Lieckfeldt"}
 {"given": "A.", "family": "Ludwig"}

In a next step process the inner objects:

select 
  concat(jsonb_array_elements(data->'author')->>'family', ' ',
    jsonb_array_elements(data->'author')->>'given') AS author
  from citation;

Concatenates the first and family name into a single string on each row.

@Bergi posted a solution which works out pasting together the title and the authors. See in his comment or:

SELECT data->>'title' || ' by ' || (SELECT string_agg((a->>'family') || ' ' || (a->>'given'), ', ') FROM jsonb_array_elements(data->'author') a) FROM citations

With a slight modification I now ended up with the following:

SELECT data->>'title' || 
       ' by ' || 
       (SELECT string_agg((a->>'family') || 
       ' ' || 
       (a->>'given'), ', ') FROM jsonb_array_elements(data->'author') a) || 
       ' ' || 
       concat('(', substring(data->'issued'->>'date-parts', '(?<=\[\[)\d{4}'), ')')
       FROM citation; 

Which also includes the date of the publication. Thanks all for you help.

12
  • For the first query, use ->> instead of -> for details see the manual Commented Apr 4, 2022 at 8:12
  • Updated the original post with that thanks. Commented Apr 4, 2022 at 10:06
  • For the array, use jsonb_array_elements_text in a subquery and aggregate them with string_agg. Commented Apr 4, 2022 at 10:39
  • But really, do you need to do this conversion in postgres with SQL? Commented Apr 4, 2022 at 10:39
  • Unfortunately I need to. I know in any programming language that would be more easy. I am already experimenting with jsonb_array_elements_text to no avail so far. I will also look into string_agg thx to you all. Commented Apr 4, 2022 at 10:42

1 Answer 1

1

You can use the extensive json support of PostgreSQL for example:

SELECT data->>'title' || 
       ' by ' || 
       (SELECT string_agg((a->>'family') || 
       ' ' || 
       (a->>'given'), ', ') FROM jsonb_array_elements(data->'author') a) || 
       ' ' || 
       concat('(', substring(data->'issued'->>'date-parts', '(?<=\[\[)\d{4}'), ')')
       FROM citation; 

Thanks to all for your support!

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.