Extracting data from an HTML code

Question

I have this piece of HTML code as a string stored in a variable.

<p style="text-align: center;">
    <span style="font-size: small;font-family: 'comic sans ms', sans-serif;">
        <strong>
            word1&nbsp;
            <span style="line-height: 1.5;">
                word2&nbsp;
            </span>
            <span style="line-height: 1.5;">
                word3&nbsp;
            </span>
            <span style="line-height:1.5;"></span>
        </strong>
    </span>
</p>

I want to extract only word1, word2 and word3. What is the easiest and most time-efficient way to do it?

I was thinking that a > character not immediately preceded by < could be an index from which I start extracting my data.
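One reading of this idea is to collect whatever text sits between a `>` and the next `<`. The sketch below does that with plain string handling. The helper name `extractTextRuns` and the shortened `sample` string are assumptions made for illustration, and the approach breaks if an attribute value itself contains `<` or `>`.

```javascript
// Hypothetical helper (not from the original post): collect the text between
// a '>' and the following '<', skipping runs that are only whitespace.
function extractTextRuns(html) {
  var runs = [];
  var re = />([^<>]+)</g; // at least one non-bracket character between '>' and '<'
  var match;
  while ((match = re.exec(html)) !== null) {
    // Turn the &nbsp; entity into a real non-breaking space, then trim.
    // String.prototype.trim() treats U+00A0 as whitespace, so it is stripped too.
    var text = match[1].replace(/&nbsp;/g, '\u00a0').trim();
    if (text.length > 0) {
      runs.push(text);
    }
  }
  return runs;
}

// Shortened version of the markup from the question (assumed for the example).
var sample =
  '<p style="text-align: center;"><strong>word1&nbsp;' +
  '<span style="line-height: 1.5;">word2&nbsp;</span>' +
  '<span style="line-height: 1.5;">word3&nbsp;</span>' +
  '<span style="line-height:1.5;"></span></strong></p>';

console.log(extractTextRuns(sample)); // logs the three words: word1, word2, word3
```

Because the trailing non-breaking space is trimmed here, this variant is only suitable when that spacing does not need to be kept.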

It's not quite regex, but document.querySelector('p').innerText.split(' ') will extract the information, more or less.
– litel, Commented May 4, 2016 at 6:50
@litel The HTML code above was a string stored in a variable. How will I do it in my case?
– user4621642, Commented May 4, 2016 at 6:55
What language are you extracting the variable from? How are you getting the variable?
– litel, Commented May 4, 2016 at 6:57

Pawan Kashyap · Accepted Answer · 2016-05-04 06:43:07Z

1

With jQuery you can fetch the text easily.

Try this:

$('p').text();

It returns the combined text contents of each element in the set of matched elements, including their descendants. Called with an argument, .text() instead sets the text contents of the matched elements.

answered May 4, 2016 at 6:43


2 Comments

user4621642 Over a year ago

The HTML Code above was a string stored in a variable. How will I do it in my case?

Pawan Kashyap Over a year ago

You can add that markup somewhere inside a div, set its display property to "none", and then do something like $('p').text().trim().split("\xa0");
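To see what splitting on "\xa0" produces, here is a small sketch that needs no jQuery. The `text` string is an assumed stand-in for what .text() would return for the question's markup: each word followed by a non-breaking space (U+00A0) plus the indentation from the HTML source.

```javascript
// Assumed shape of the extracted text (not taken from a real page):
// every word is followed by U+00A0 and then layout whitespace.
var text = '\n  word1\u00a0\n    word2\u00a0\n    word3\u00a0\n  ';

// trim() removes ordinary whitespace and U+00A0 from both ends only,
// so the pieces after the first still start with the indentation.
var rough = text.trim().split('\u00a0');

// Trimming every piece and dropping the empty ones leaves just the words.
var words = text
  .split('\u00a0')
  .map(function (piece) { return piece.trim(); })
  .filter(function (piece) { return piece.length > 0; });

console.log(rough); // three pieces; the second and third keep their leading whitespace
console.log(words); // the three words without surrounding whitespace
```

If the spacing inside the spans has to be kept, the per-piece trim() would need to be left out or replaced with something narrower.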

madalinivascu · Accepted Answer · 2016-05-04 07:03:45Z

1

Try something like this:

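// A template literal lets the string span several lines;
// a '...' string cannot contain raw line breaks.
var html = `<p style="text-align: center;">
    <span style="font-size: small;font-family: comic sans ms, sans-serif;">
        <strong>
            alyssa&nbsp;
            <span style="line-height: 1.5;">
                enganio&nbsp;
            </span>
            <span style="line-height: 1.5;">
                gono&nbsp;
            </span>
            <span style="line-height:1.5;"></span>
        </strong>
    </span>
</p>`;
// $(html) is rooted at the <p> itself, so search for 'strong' below it;
// .find('p strong') would look for a <p> nested inside that root.
// &nbsp; becomes U+00A0 in .text(), so split on it and trim each piece.
var values = $(html).find('strong').text().split('\u00a0')
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });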

or

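var v = [];
// $(html) is rooted at the <p>, so 'strong' is enough to reach the element.
var $strong = $(html).find('strong');
// Text that belongs to <strong> itself: remove the nested spans from a copy first.
v.push($strong.clone().find('span').remove().end().text());
// Then add the text of each non-empty span.
$strong.find('span').each(function (i, val) {
    if ($.trim($(val).text()).length > 0) {
        v.push($(val).text());
    }
});
console.log(v);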

edited May 4, 2016 at 7:03

answered May 4, 2016 at 6:25


2 Comments

user4621642 Over a year ago

The HTML Code above was a string stored in a variable. How will I do it in my case?

Adder Over a year ago

Assuming code is in myhtml variable, just replace $('p strong') with $(myhtml).find('p strong') and $('p strong span') with $(myhtml).find('p strong span'). (has now been fixed in the reply)

memo · Accepted Answer · 2016-05-04 07:14:48Z

1

Since you used the regex tag, I will post a solution with regex.

var re = /\w+&nbsp;/g;
var results = html.match(re);

You can then read the matches from the results array. Note that each match still includes the trailing &nbsp;.
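The pattern above keeps the entity in every match. A minimal sketch of one way to get only the words is to put the word characters in a capture group. The shortened `html` string here is an assumption for illustration, standing in for the full markup held in the original variable.

```javascript
// Shortened sample markup (assumed; the real variable holds the full HTML string).
var html = '<strong>word1&nbsp;<span>word2&nbsp;</span><span>word3&nbsp;</span></strong>';

// The original pattern: every match ends with "&nbsp;".
var withEntity = html.match(/\w+&nbsp;/g);

// Capturing only the word characters leaves the entity out of the result.
var words = [];
var re = /(\w+)&nbsp;/g;
var m;
while ((m = re.exec(html)) !== null) {
  words.push(m[1]); // m[1] is the captured word without "&nbsp;"
}

console.log(withEntity); // matches such as "word1&nbsp;"
console.log(words);      // word1, word2, word3
```

Note that `\w` only covers ASCII letters, digits and underscore, so words with accents or hyphens would be cut short.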

answered May 4, 2016 at 7:14


Dhara Parmar · Accepted Answer · 2016-05-04 06:27:55Z

0

Just use this; it returns all the text inside the p tag, "alyssa  , enganio  , gono ":

alert($("p").text());

answered May 4, 2016 at 6:27


4 Comments

user4621642 Over a year ago

The HTML Code above was a string stored in a variable. How will I do it in my case?

Dhara Parmar Over a year ago

Add that HTML to a hidden div on the page, run the line above, and once you have the result, remove the hidden div.

Dhara Parmar Over a year ago

OK, then after getting 'alyssa enganio gono' you can split it with string.split(" ") to get the words separately.

user4621642 Over a year ago

No, I have to preserve the blank spaces within the span, because they are part of the text format.

Community · Accepted Answer · 2017-05-23 10:28:48Z

0

I think you want to fetch the text of a tag without the text of its children.

See this thread.

This code:

 console.log($("strong").clone().children().remove().end().text());

To turn a string into a jQuery object, see this thread.

This code:

var element = $('<div id="a1"></div><div id="a3"></div>');

edited May 23, 2017 at 10:28 by Community Bot

answered May 4, 2016 at 6:57 by amirpaia
