0

I need to parse an HTML table:

<table>
  <tbody>
    <tr class="blue"><td>code</td><td>rate</td></tr>
    <tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
    <tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
    <tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
    <tr class="blue"><td>some comment</td><td>some comment</td></tr>
    <tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
  </tbody>
</table>

and as a result to get:

+------+----+----+------+
|class |code|em  |value |
+------+----+----+------+
|gray_1|USD |   1| 11.11|
|gray_2|AUD |   1| 22.22|
|gray_9|IRR | 100| 33.33|
|gray_1|EUR |   1| 44.44|

I tried to do it like this:

with tbl as
(
    select xmltype('
        <table>
          <tbody>
            <tr class="blue"><td>code</td><td>rate</td></tr>
            <tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
            <tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
            <tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
            <tr class="blue"><td>some comment</td><td>some comment</td></tr>
            <tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
          </tbody>
        </table>
    ') xml_data from dual
)
select
    *
from
    tbl,
    xmltable('//table/tbody/tr'
        passing tbl.xml_data
        columns
            data varchar2(128) path './td'
    )

But this causes an exception:

ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence

What do I need to do to fix my code?

3
  • 1
    XML is not fully compatible with HTML so using XML methods on HTML is not recommended. Commented Sep 24, 2019 at 8:53
  • @jdweng I realize that, but is there a better way to parse html with Oracle? I need to get and parse html page by schedule, so I do it using an oracle job which calls a pl/sql procedure. I just find certain piece if a page, then throw away tags like <br>, &nbsp; etc. to get a code that could be parsed like an XML code Commented Sep 24, 2019 at 9:53
  • Most people recommend HtmlAgilityPack. I really don't think it that bad to use xml, but every time I do it I get lots of negative points. Commented Sep 24, 2019 at 10:11

1 Answer 1

2

Your path is looking for a td under the tr; but there are two, hence the "got multi-item sequence" error you're seeing. You can reference each td tag by its position, as td[1] etc. It's very reliant on the table structure being as expected though.

With this specific example you can do:

with tbl as
(
    select xmltype('
        <table>
          <tbody>
            <tr class="blue"><td>code</td><td>rate</td></tr>
            <tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr>
            <tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr>
            <tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr>
            <tr class="blue"><td>some comment</td><td>some comment</td></tr>
            <tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr>
          </tbody>
        </table>
    ') xml_data from dual
)
select
    x.class, x.currency, x.amount, to_number(x.rate) as rate
from
    tbl
cross join
    xmltable('/table/tbody/tr'
        passing tbl.xml_data
        columns
            class varchar2(10) path '@class',
            currency varchar2(3) path 'td[1]/span',
            amount number path 'td[1]/em',
            rate varchar2(50) path 'td[2]'
    ) x
where
    x.currency is not null

which gets:

CLASS      CUR     AMOUNT       RATE
---------- --- ---------- ----------
gray_1     USD          1     476.16
gray_2     AUD          1     327.65
gray_9     IRR        100       1.13
gray_1     EUR          1     526.54

It won't take much variation in the HTML to break it though. See this answer for some reasons it is fragile, and why it generally considered unwise to try to parse HTML as XML.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.