1

I have some XML as follows:

<dd>
    <persson>
        <name>sam</name>
        <tel>9748</tel>
    </persson>
    <cat>
        <name>frank</name>
    </cat>
</dd>

I parsed it into two SQL tables, one for the tags and one for the pcdata. The start and stop columns represent the position the tag appears at and ends.

Tags:
 start | stop |  tag   
-------+------+--------
     3 |    5 | name
     6 |    8 | tel
     2 |    9 | persson
    11 |   13 | name
    10 |   14 | cat
     1 |   15 | dd
(6 rows)

Pcdata:
 pos | pcdata 
-----+--------
   4 | sam
   7 | 9748
  12 | frank
(3 rows)

Now i'd like to parse this database back into XML in the original form. I want to write a function that takes both tables and writes the XML out in a file. I'm using python and psycopg2 to do this.

2
  • What happened to the OP? Commented Nov 10, 2010 at 15:43
  • @Danosure: looks like he was editing his question, stuffed it up, and didn't know how to fix it. I've done a rollback. Commented Nov 10, 2010 at 18:53

3 Answers 3

3

H'mmm having decoded your "columns":

<dd><persson><name>sam</name><tel>9748</tel></persson>
1   2        3     4  5      6    7   8     9                    
<cat><name>frank</name></cat></dd>
10   11    12   13     14    15

I have some questions for you: How did you do that? Why did you do that? Exactly what do you want to achieve? Note that your question title is rather misleading -- "SQL tables" are merely where you have parked your peculiar representation of the data.

Here's some pseudocode to do what you want to do:

pieces = []
result = cursor.execute("select * from tags;")
for start, step, tag in result:
    pieces.append((start, "<" + tag + ">"))
    pieces.append((stop, "</" + tag + ">"))
result = cursor.execute("select * from pcdata;")
for pos, pcdata in result:
    pieces.append((pos, pcdata))
pieces.sort()
xml_stream = "".join(piece[1] for piece in pieces)
your_file_object.write(xml_stream)

In answer to the question about whether the above would put the "positions" in the output stream: No it won't; the following snippet shows it working. The positions are used only to get the soup sorted into the correct order. In the "join", piece[0] refers to the position, but it isn't used, only piece[1] which is the required text.

>>> pieces
[(3, '<name>'), (4, 'sam'), (5, '</name>')]
>>> ''.join(piece[1] for piece in pieces)
'<name>sam</name>'

Relenting on the SQL comment-question:

Although shown with SQLite, this is bog-standard SQL. If your database doesn't grok || as the concatenation operator, try +.

Question that you forgot to ask: "How do I get a <?xml blah-blah ?> thingie up the front?". Answer: See below.

console-prompt>sqlite3
SQLite version 3.6.14
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table tags (start int, stop int, tag text);
sqlite> insert into tags values(3,5,'name');
sqlite> insert into tags values(6,8,'tel');
sqlite> insert into tags values(2,9,'persson');
sqlite> insert into tags values(11,13,'name');
sqlite> insert into tags values(10,14,'cat');
sqlite> insert into tags values(1,15,'dd');
sqlite> create table pcdata (pos int, pcdata text);
sqlite> insert into pcdata values(4,'sam');
sqlite> insert into pcdata values(7,'9748');
sqlite> insert into pcdata values(12,'frank');
sqlite> select datum from (
   ...>     select 0 as posn, '<?xml version="1.0" encoding="UTF-8"?>' as datum
   ...>     union
   ...>     select start as posn, '<' || tag || '>' as datum from tags
   ...>     union
   ...>     select stop as posn, '</' || tag || '>' as datum from tags
   ...>     union
   ...>     select pos as posn, pcdata as datum from pcdata
   ...>     )
   ...> order by posn;
<?xml version="1.0" encoding="UTF-8"?>
<dd>
<persson>
<name>
sam
</name>
<tel>
9748
</tel>
</persson>
<cat>
<name>
frank
</name>
</cat>
</dd>
sqlite>
Sign up to request clarification or add additional context in comments.

8 Comments

Yes i numbered the positions of each tag and stored it in the tables. It is the way i have been asked to represent the data.
How can i now regenerate the XML from the database tables that i've made?
"the way I have been asked" by whom? Teacher? Boss? Chain-saw-toting maniac?
Won't this add the position (start, stop) to the XML file? I'd like to simply recreate the original XML as it was (and the point of the positions are just to help figure out where a tag starts and ends)
@Spawn: No it won't. Try it and see. Or read my updated answer.
|
1

The simple answer is don't. If you are using Postgres 8.3 or above, build the XML using SQL. It will be much easier.

http://www.postgresql.org/docs/current/static/functions-xml.html

Comments

1

First off, if Postgres includes mechanisms to create XML for you, use them.

Second off, don't use string manipulation to create XML unless you really know what you're doing. And even then, don't. Just concatenating string values from your database will produce poorly-formed XML if any column contains an ampersand, for instance.

Unless you will be processing too much data to fit in memory, use John Machin's approach to parse the data into elements and lxml.etree to create the actual XML elements.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.