
I have a table with an XML column and thousands of rows. Each row contains the XML representation of a metadata file.

How do I extract multiple XML fields from each row? I guess I need to use xpath (https://www.postgresql.org/docs/current/static/functions-xml.html), but the examples in the documentation are not enough for me to understand it.

Let's assume the column "data" of a row in the table "xml" contains this:

> <gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
> xmlns:gco="http://www.isotc211.org/2005/gco"
> xmlns:gml="http://www.opengis.net/gml"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:geonet="http://www.fao.org/geonetwork"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://www.isotc211.org/2005/gmd
> something.com/schemas/inspire/gmd/gmd.xsd">
> <gmd:contact>
>     <gmd:CI_ResponsibleParty>
>       <gmd:organisationName>
>         <gco:CharacterString>Something</gco:CharacterString>
>       </gmd:organisationName>
>       <gmd:contactInfo>
>         <gmd:CI_Contact>
>           <gmd:address>
>             <gmd:CI_Address>
>               <gmd:electronicMailAddress>
>                 <gco:CharacterString>[email protected]</gco:CharacterString>
>               </gmd:electronicMailAddress>
>             </gmd:CI_Address>
>           </gmd:address>
>         </gmd:CI_Contact>
>       </gmd:contactInfo>
>     </gmd:CI_ResponsibleParty>   
> </gmd:contact>
> </gmd:MD_Metadata>

How do I get the organisationName and the electronicMailAddress for all rows in the "xml" table? What would the SELECT statement look like?

1 Answer


Something like the following should do the trick:

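SELECT
    -- For each contact fragment, take the first matching text node and cast it to text
    (xpath('//gmd:organisationName/gco:CharacterString/text()', t1.contact,
           '{{gmd,http://www.isotc211.org/2005/gmd},{gco,http://www.isotc211.org/2005/gco}}'))[1]::text AS organisation_name,
    (xpath('//gmd:electronicMailAddress/gco:CharacterString/text()', t1.contact,
           '{{gmd,http://www.isotc211.org/2005/gmd},{gco,http://www.isotc211.org/2005/gco}}'))[1]::text AS email
FROM xml,
    -- Split each document into its gmd:contact elements (one output row per contact)
    LATERAL unnest(
        xpath('//gmd:contact', data, '{{gmd,http://www.isotc211.org/2005/gmd}}')
    ) AS t1(contact);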

I examine all rows of xml and, for each of them, use LATERAL to unnest the result of xpath('//gmd:contact', ...), which yields every contact in the document. Then, for each contact, I extract the organisationName and electronicMailAddress fields. Unfortunately, the query is a bit long because every xpath() call needs its own namespace array.
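If the repeated namespace arrays are the main annoyance, one option is to declare the prefix/URI pairs once in a CTE and pass them to every xpath() call. This is a minimal sketch, not a tested rewrite of the query above: the CTE names ns and xml(data), the sample document and the address contact@example.org are hypothetical. The sample CTE shadows the real "xml" table only inside this statement; delete it to run the query against the actual data.

```sql
WITH
  -- Hypothetical sample row standing in for the real table "xml";
  -- delete this CTE to run the query against the actual table.
  xml(data) AS (
    VALUES (XMLPARSE(DOCUMENT
      '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                        xmlns:gco="http://www.isotc211.org/2005/gco">
         <gmd:contact><gmd:CI_ResponsibleParty>
           <gmd:organisationName><gco:CharacterString>Something</gco:CharacterString></gmd:organisationName>
           <gmd:contactInfo><gmd:CI_Contact><gmd:address><gmd:CI_Address>
             <gmd:electronicMailAddress><gco:CharacterString>contact@example.org</gco:CharacterString></gmd:electronicMailAddress>
           </gmd:CI_Address></gmd:address></gmd:CI_Contact></gmd:contactInfo>
         </gmd:CI_ResponsibleParty></gmd:contact>
       </gmd:MD_Metadata>'))
  ),
  -- Prefix -> namespace URI pairs, written once and reused by every xpath() call
  ns(arr) AS (
    VALUES (ARRAY[ARRAY['gmd', 'http://www.isotc211.org/2005/gmd'],
                  ARRAY['gco', 'http://www.isotc211.org/2005/gco']])
  )
SELECT
  -- xpath() returns xml[]; [1] takes the first match and ::text converts it
  (xpath('//gmd:organisationName/gco:CharacterString/text()', c.contact, ns.arr))[1]::text AS organisation_name,
  (xpath('//gmd:electronicMailAddress/gco:CharacterString/text()', c.contact, ns.arr))[1]::text AS email
FROM xml
CROSS JOIN ns
-- One row per gmd:contact element in each document
CROSS JOIN LATERAL unnest(xpath('//gmd:contact', xml.data, ns.arr)) AS c(contact);
```

The LATERAL item may reference both xml and ns because they appear to its left in the FROM clause.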


2 Comments

It works well if you limit the nested SELECT by adding LIMIT 1; otherwise it returns more than one row, causing the query to fail. Since the database contains standardised metadata, every single record contains a field called "gmd:contact".
@stopopol Yes, you are right. I rewrote the query with a LATERAL subquery so that it has no problems with multiple rows. It will now extract the fields from all rows.
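On PostgreSQL 10 or later, XMLTABLE is an alternative to chaining xpath() and unnest(): each node matched by the row expression becomes one output row and each PATH becomes a column, so a record with several gmd:contact elements simply yields several rows and no LIMIT 1 is needed. This is a swapped-in technique, not the query from the answer; the sample CTE, the second organisation "Another org" and the column names organisation_name and email are hypothetical.

```sql
-- Requires PostgreSQL 10 or later (XMLTABLE)
WITH
  -- Hypothetical sample row with two contacts, standing in for the real table "xml";
  -- delete this CTE to run the query against the actual table.
  xml(data) AS (
    VALUES (XMLPARSE(DOCUMENT
      '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                        xmlns:gco="http://www.isotc211.org/2005/gco">
         <gmd:contact><gmd:CI_ResponsibleParty>
           <gmd:organisationName><gco:CharacterString>Something</gco:CharacterString></gmd:organisationName>
           <gmd:contactInfo><gmd:CI_Contact><gmd:address><gmd:CI_Address>
             <gmd:electronicMailAddress><gco:CharacterString>contact@example.org</gco:CharacterString></gmd:electronicMailAddress>
           </gmd:CI_Address></gmd:address></gmd:CI_Contact></gmd:contactInfo>
         </gmd:CI_ResponsibleParty></gmd:contact>
         <gmd:contact><gmd:CI_ResponsibleParty>
           <gmd:organisationName><gco:CharacterString>Another org</gco:CharacterString></gmd:organisationName>
         </gmd:CI_ResponsibleParty></gmd:contact>
       </gmd:MD_Metadata>'))
  )
SELECT x.organisation_name, x.email
FROM xml,
     XMLTABLE(
       XMLNAMESPACES('http://www.isotc211.org/2005/gmd' AS gmd,
                     'http://www.isotc211.org/2005/gco' AS gco),
       -- Row expression: one output row per responsible party under each contact
       '//gmd:contact/gmd:CI_ResponsibleParty'
       PASSING xml.data
       COLUMNS
         -- Column paths are evaluated relative to each matched row node;
         -- a path with no match produces NULL
         organisation_name text PATH 'gmd:organisationName/gco:CharacterString',
         email             text PATH 'gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString'
     ) AS x;
```

With the sample data this should return two rows, the second with a NULL email because that contact has no electronicMailAddress. The namespaces are declared once in XMLNAMESPACES, which also keeps the query shorter than the xpath() version.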
