I am wondering if there is any way to create indexes based on some type conversion of xpath results, specifically an integer index, though I can also imagine dates, floating point, etc. Even if it is experimental or coming in a future version of Postgres... or another database. I already looked at eXist-db, which looked far from production-ready.
Some test XML:
<?xml version="1.0"?>
<book>
  <isbn>1</isbn>
  <title>Goldfinger</title>
  <author>Ian Fleming</author>
</book>
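That document goes into the test table (described below) with an ordinary INSERT:

INSERT INTO test (num, data)
VALUES (1, '<book><isbn>1</isbn><title>Goldfinger</title><author>Ian Fleming</author></book>'::xml);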
I was able to obtain very satisfactory results with a table and index like this:
Table "public.test"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+---------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('test_id_seq'::regclass) | plain | |
num | integer | | plain | |
data | xml | | extended | |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
"test_title_index" btree (((xpath('/book/title/text()'::text, data))[1]::text))
It performs very well for a query such as:
SELECT *
FROM test
WHERE (xpath('/book/title/text()', data))[1]::text = 'Goldfinger';
But there is data in these schemas for which a non-text index would make a great deal more sense. For example (I know this is not valid, but it illustrates the point):
SELECT *
FROM test
WHERE (xpath('/book/isbn/text()', data))[1]::int BETWEEN 5 AND 10;
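My best guess at a valid formulation is to cast through text first, since as far as I can tell there is no direct cast from xml to integer. Something like this sketch (index name hypothetical; I have not confirmed that the planner will actually use it for range scans):

-- Hypothetical: index the isbn extracted via xpath, cast xml -> text -> int
CREATE INDEX test_isbn_int_index
    ON test ((((xpath('/book/isbn/text()', data))[1]::text)::int));

-- Range query written with the same double cast so it can match the index
SELECT *
FROM test
WHERE ((xpath('/book/isbn/text()', data))[1]::text)::int BETWEEN 5 AND 10;

If the text-to-integer cast turns out not to be allowed in an index expression, wrapping the extraction in a small IMMUTABLE SQL function would presumably be the fallback.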
A little background:
I am experimenting with storing XML documents in Postgres, as I have an application where the primary data is already XML and often needs to be retrieved as such. The schemas can be very complex, so splitting them into database columns is extremely time-consuming, especially as the schemas evolve. I only mention this because I suspect a likely reaction to my question will be "break the data out into native columns".