I need to read HTML table data as XML, but I cannot get all the information in the format I require.

      declare @xml xml='<body bgcolor="#FFFFFF">
  <div id="Edit01" style="position:absolute; left:5px; top:4px; width:462px; height:196px; z-index:1">    
    <table width="462" border="0" cellspacing="0" cellpadding="0">
      <tr>
        <td colspan="5" width="462">
          <span class="auditnoteheader">Charges: </span>
        </td>
      </tr>
      <tr>
        <td colspan="5" width="462">
          <span class="AuditNoteText">Submitted by ELSGH </span>
        </td>
      </tr>
      <tr>
        <td colspan="5" width="462">
          <span class="AuditNoteText">Jul 20 2018  9:15PM Eastern Standard Time</span>
        </td>
      </tr>
      <tr class="AuditNoteSubHeader">
        <td width="8" />
        <td width="230" valign="top">Charge</td>
        <td width="110" valign="top">Old Charge Status</td>
        <td width="114" valign="top">New Charge Status</td>
      </tr>
      <tr class="AuditNoteText">
        <td width="8" />
        <td width="230" valign="top">
          <font color="009900">99214      OFFICE OUTPATIENT VISIT 25 MINUTES</font>
        </td>
        <td width="110" valign="top">
          <font color="009900">Review</font>
        </td>
        <td width="114" valign="top">
          <font color="009900">Submitted</font>
        </td>
      </tr>
      <tr class="AuditNoteText">
        <td width="8" />
        <td width="230" valign="top">
          <font color="009900">36415      COLLECTION VENOUS BLOOD</font>
        </td>
        <td width="110" valign="top">
          <font color="009900">Review</font>
        </td>
        <td width="114" valign="top">
          <font color="009900">Submitted</font>
        </td>
      </tr>
      <tr class="AuditNoteSeparater">
        <td colspan="5" height="2">
                    --------------------------------------------------------------------------------------------
                </td>
      </tr>
    </table>
  </div>
</body>'

I tried this query:

 SELECT TR.AT1.query('data(span)') ,TR.AT1.query('*') ,TR.AT1.value('.','varchar(max)')
FROM @xml.nodes('/body/div/table') as T(N)
cross apply T.N.nodes('./tr/td') as TR(AT1)
cross apply TR.AT1.nodes('.') as para(p1)

Inside the body tag there can be multiple tables. The first three rows of a table (this number can vary) hold the table information. The next row, with class="AuditNoteSubHeader", is the table header, and all rows after it with class="AuditNoteText" contain the table data. I need to extract all of this information. Can anyone please help?

My expected output is:

[image: expected output]

For AuditNoteText I get multiple rows, so to tell them apart I numbered them: AuditNoteText1, AuditNoteText2.

2 Answers

Your expected output is not the best format in my eyes. If it is not an external requirement, you might try something like this:

;WITH AllTr AS
(
    -- One row per <tr>, numbered by RowIndex
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowIndex
          ,tr.value('@class','nvarchar(max)') AS trClass
          ,tr.query('.') AS trNode
    FROM @xml.nodes('//table/tr') A(tr)
)
,AllTd AS
(
    -- One row per <td> that contains text; ColumnIndex restarts within each RowIndex
    SELECT AllTr.*
          ,ROW_NUMBER() OVER(PARTITION BY RowIndex ORDER BY (SELECT NULL)) AS ColumnIndex
          ,td.value('(.//*/@class)[1]','nvarchar(max)') AS tdClass
          ,td.value('(.//text())[1]','nvarchar(max)') AS tdText
    FROM AllTr
    OUTER APPLY trNode.nodes('tr/td[.//text()]') A(td)
)
SELECT RowIndex
      ,ColumnIndex
      ,trClass
      ,tdClass
      ,tdText
FROM AllTd;

This provides a row counter and a partitioned column counter, which might be better than numbered class names.
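
If what you ultimately want is one row per charge, the data rows can also be read positionally. The following is only a sketch under assumptions: it uses a shortened copy of the question's markup, it assumes that cells 2 to 4 of every tr with class="AuditNoteText" always hold the charge, the old status and the new status, and the column aliases (Charge, OldChargeStatus, NewChargeStatus) are made up for illustration.

```sql
-- Shortened copy of the question's markup so the sketch runs on its own.
DECLARE @xml xml = N'<body>
  <div>
    <table>
      <tr class="AuditNoteSubHeader">
        <td width="8" />
        <td>Charge</td>
        <td>Old Charge Status</td>
        <td>New Charge Status</td>
      </tr>
      <tr class="AuditNoteText">
        <td width="8" />
        <td><font color="009900">99214      OFFICE OUTPATIENT VISIT 25 MINUTES</font></td>
        <td><font color="009900">Review</font></td>
        <td><font color="009900">Submitted</font></td>
      </tr>
      <tr class="AuditNoteText">
        <td width="8" />
        <td><font color="009900">36415      COLLECTION VENOUS BLOOD</font></td>
        <td><font color="009900">Review</font></td>
        <td><font color="009900">Submitted</font></td>
      </tr>
    </table>
  </div>
</body>';

-- Assumption: the column positions are fixed (td[1] is the empty spacer cell).
-- The aliases below are hypothetical names, not taken from the question.
SELECT tr.value('(td[2]//text())[1]', 'nvarchar(200)') AS Charge
      ,tr.value('(td[3]//text())[1]', 'nvarchar(100)') AS OldChargeStatus
      ,tr.value('(td[4]//text())[1]', 'nvarchar(100)') AS NewChargeStatus
FROM @xml.nodes('//table/tr[@class="AuditNoteText"]') AS A(tr);
```

The predicate on tr only matches the data rows, because in the question's markup the three leading information rows carry the class on the span, not on the tr. If the column order can change, the generic RowIndex/ColumnIndex output above is the safer starting point.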

;WITH C1 AS (
  SELECT    ISNULL(T.N.value('@class', 'varchar(50)'), TR1.AT1.value('@class', 'varchar(50)')) Hdr
            , CONVERT(VARCHAR, DENSE_RANK() OVER ( PARTITION BY TR1.AT1 ORDER BY N )-1) AS HdrNum
          , TR.AT1.value('.', 'varchar(max)') AS Data
  FROM      @xml.nodes('/body/div/table/tr,/body/div/table/tr/td/span') AS T ( N )  
            CROSS APPLY T.N.nodes('./td') AS TR ( AT1 )
            OUTER APPLY T.N.nodes('./td/span') AS TR1 ( AT1 ) 
            WHERE TR.AT1.value('.', 'varchar(max)') NOT LIKE '%---%' 
                    AND TR.AT1.value('.', 'varchar(max)') <> ''
 )
 SELECT Hdr + CASE WHEN HdrNum = '0' THEN '' ELSE HdrNum END AS Hdr
 , Data
 FROM C1 ORDER BY hdr
