
I have some HTML content stored in a SQL Server column, and I want to read content out of that HTML.

For example:

<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
  <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, 'Options are required.')" onclick="design_validate_choice(1, -1, this, 'Options are required.')" onblur="design_validate_choice(1, -1, this, 'Options are required.')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
    <li>
      <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
      <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
    </li>
    <li>
       <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
       <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
        <label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
        <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
    </li>
  </ol>
</ektdesignns_choices><input type="submit" value="Vote" />

I want to read all the labels in this HTML. Does anyone have an idea how I should go about it?

4 Answers

2

If your HTML is indeed XHTML-compliant, and if you have the HTML stored in an XML column in your SQL Server table, then you could retrieve your labels from it in T-SQL using XQuery:

DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML)

INSERT INTO @HtmlTbl(Html) VALUES('<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
  <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, ''Options are required.'')" onclick="design_validate_choice(1, -1, this, ''Options are required.'')" onblur="design_validate_choice(1, -1, this, ''Options are required.'')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
    <li>
      <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
      <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
    </li>
    <li>
       <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
       <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
        <label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
        <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
    </li>
  </ol></ektdesignns_choices><input type="submit" value="Vote" />')

This will retrieve all <label> elements from your (X)HTML as a single XML fragment:

SELECT
    Html.query('//label')
FROM @HtmlTbl 
WHERE ID = 1

Output:

<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>

Or this will return the text content of each <label> element, one per row:

SELECT
    C.value('(.)[1]', 'varchar(1000)')
FROM @HtmlTbl 
CROSS APPLY Html.nodes('//label') AS T(C)
WHERE ID = 1

Output:

1 or fewer
2-4
5-7
8 or more

0

Pull the data from the DB and then use an HTML parser to pull out the information you want. It will make your life much easier.

Whatever you do, please don't try to use regexes unless you are only looking for data that matches a regular expression. Since HTML is not a regular language, regex-based parsing often causes more problems than it solves.
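As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser`. It assumes the column value has already been read from the database into a Python string (the database access itself is not shown). `LabelTextParser` and `extract_labels` are illustrative names, not part of any library. Unlike the XML-based answers, `HTMLParser` does not require the markup to be well-formed XML.

```python
from html.parser import HTMLParser


class LabelTextParser(HTMLParser):
    """Collects the text inside every <label> element, in document order."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.labels = []
        self._in_label = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case
        if tag == "label":
            self._in_label = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            self._in_label = False
            self.labels.append("".join(self._buf).strip())

    def handle_data(self, data):
        # Only keep text that appears between <label> and </label>
        if self._in_label:
            self._buf.append(data)


def extract_labels(html):
    parser = LabelTextParser()
    parser.feed(html)
    parser.close()
    return parser.labels


sample = """<ol>
  <li><input type="radio" value="1 or fewer_1" id="ID2504263" />
      <label for="ID2504263">1 or fewer</label></li>
  <li><input type="radio" value="2-4_2" id="ID5115606" />
      <label for="ID5115606">2-4</label></li>
</ol><input type="submit" value="Vote" />"""

print(extract_labels(sample))  # ['1 or fewer', '2-4']
```

Run against the full poll markup from the question, this should yield the four label texts. Tags that are never closed (for example a bare `<br>`) do not stop it.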


0

If all the HTML you have is as well-formed as this example, you can cast it to XML and use XQuery to find the label nodes:

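-- YourTable and HtmlCol are placeholder names; HtmlCol holds the HTML as (n)varchar
select T.N.value('.', 'nvarchar(100)')
from YourTable
    cross apply (select cast(HtmlCol as xml) as XMLCol) as X
    cross apply X.XMLCol.nodes('//label') as T(N)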
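The cast only succeeds when the markup is well-formed XML. Below is a rough client-side equivalent using Python's standard-library `xml.etree.ElementTree`. It can be used to check a value before casting, or to apply the same `//label` idea outside SQL Server. `labels_via_xml` is an illustrative name. The dummy `<root>` wrapper is an assumption this sketch adds because ElementTree, unlike SQL Server's untyped `xml` type, rejects a fragment with more than one top-level element. The sample in the question has two: `<ektdesignns_choices>` and `<input>`.

```python
import xml.etree.ElementTree as ET


def labels_via_xml(fragment):
    """Parse an (X)HTML fragment as XML and return the text of every <label>.

    Raises ET.ParseError if the markup is not well-formed XML.
    """
    # Wrap in a dummy root so multiple top-level elements are accepted
    root = ET.fromstring("<root>" + fragment + "</root>")
    # ".//label" is ElementPath's counterpart of the XQuery path //label
    return ["".join(label.itertext()) for label in root.findall(".//label")]


good = '<ol><li><label for="a">1 or fewer</label></li></ol><input type="submit" value="Vote" />'
print(labels_via_xml(good))  # ['1 or fewer']

bad = '<ol><li><label for="a">1 or fewer</label><br></li></ol>'  # <br> is never closed
try:
    labels_via_xml(bad)
except ET.ParseError as err:
    print("not well-formed:", err)
```

The second call fails for the same reason a `cast(... as xml)` would fail on that markup: the unclosed `<br>` makes it invalid XML.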


0

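You can use PATINDEX and SUBSTRING if you want to extract a value from a well-defined tag. This example pulls the PaReq value out of the HTML below, assuming that HTML is held in the variable @webViewData: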

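DECLARE @webViewData NVARCHAR(MAX); -- assumed to be loaded with the HTML below, e.g. from your table
 /*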
    <HTML><head><meta name='viewport' content='width=device-width, initial-scale=1.0'>
    </head>
    <BODY onload='document.frmLaunch.submit();'> Redirecting... 
    <FORM name='frmLaunch' method='POST' action='https://acsabsatest.bankserv.co.za/mdpayacs/pareq'>
    <input type=hidden name='PaReq' value='eJxVUctuwjAQ/JWID8jaJjy1tRRKJXKg4iWQuLnOtkSUJDgJ0H597ZCU9pSZ2ex4dxY3B0M0XZOuDEmcU1GoD/KS+KmzWC1HnHWDHu9IXIQrOku8kCmSLJXcZ75AaKntM/qg0lKi0udJ9Co55wgNxhOZaCq56CLcIabqRHKi0mNB5hK+m0Qrr/VAqKuosyotzZcccIbQEqzMpzyUZV6MAa7Xq//WmPg6878VgqsjPOZZVA4V1u+WxHK3iWfrlzlf87lYH6NgyfP9bhvddtvwCcH9gbEqSQrGRywQgceGYz4aswCh1lGd3CByH6484TM7WCNg7t4J70S4wl8BbbqGUt0u0zKkW56l5FoQfjHGVGi7RPN5bPA8c/nq0iY47AeDXn9Qh1wLziqxAQnOerWXIwiuBZrjQXNdi/5d/Qf60asq'>
    <input type=hidden name='TermUrl' value='http://TermUrl'>
    <input type=hidden name='MD' value='469695'></FORM></BODY></HTML>
    */
--Find the start of the tag
    SELECT PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData); 
--(Answer is 246)
--Find the end of the value (the closing quote before '>'), counted from the start of the tag
    SELECT PATINDEX('%''>%', substring(@webViewData,PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData),len(@webViewData))); 
--(Answer is 468)
--Get the value content of the tag:
--  +39 skips the 38-character search prefix plus the opening quote;
--  468-40 is the value length (closing-quote offset, minus the 39 skipped characters, minus 1)
    select substring(@webViewData,246+39,468-40)
--Everything combined:
    select substring(@webViewData,PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData)+39,PATINDEX('%''>%', substring(@webViewData,PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData),len(@webViewData)))-40)
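The 39 and 40 above are hard-coded from the length of the search pattern. The sketch below reproduces the same find-start / find-end / substring arithmetic in Python, but derives the offsets from the prefix length instead. `extract_between` is a hypothetical helper. Note that `str.find` is 0-based where PATINDEX is 1-based. Like the T-SQL version, this is plain string matching. It will break if the attribute order, quoting or spacing in the tag changes.

```python
def extract_between(html, prefix, terminator):
    """Return the text after `prefix` up to the first `terminator`, or None."""
    start = html.find(prefix)           # 0-based, so PATINDEX result minus 1
    if start == -1:
        return None
    value_start = start + len(prefix)   # first character of the value
    # Search for the terminator only after the prefix, so an earlier "'>"
    # elsewhere in the document (e.g. method='POST'>) is not matched
    end = html.find(terminator, value_start)
    if end == -1:
        return None
    return html[value_start:end]


html = ("<FORM name='frmLaunch' method='POST'>\n"
        "<input type=hidden name='PaReq' value='eJxVUctuwjAQ'>\n"
        "<input type=hidden name='MD' value='469695'></FORM>")

# The prefix includes the opening quote, hence 39 characters
prefix = "<input type=hidden name='PaReq' value='"
print(len(prefix))                                                          # 39
print(extract_between(html, prefix, "'>"))                                  # eJxVUctuwjAQ
print(extract_between(html, "<input type=hidden name='MD' value='", "'>"))  # 469695
```

Because the helper searches for the terminator starting from the value rather than from the tag, it needs no "minus the prefix length" correction when computing the value length.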
