PHP Extract data between specific tags from an html file

Question

So I have a PHP script that displays an HTML page. What I need to do is extract the innerHTML of a specific element; below I'll show exactly what I need to extract.

What I need to extract is the 0.0225 value. Here is a fragment from the HTML file:

<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>

I tried parsing it with regex (I know that it is not recommended, but I tried it) and got nothing. I've tried different DOM implementations for PHP, but the result was the same. I do not know what else I can do, so I'm asking how I can extract those numbers for further editing and for placing them back...

So, here are my attempts:

The attempt with RegEx:

$html = file_get_contents('the link');    
$regex = '#<td id="income"><font color="green"><span data-c="[.*]">(.*?) BTC</span></font></td>#';
if (preg_match($regex, $html)){echo yay;};

The attempt with DOM:

$html = file_get_contents('the link');    
$dom = new DOMDocument();
$dom->load($html);
$element = $dom->getElemetById("income")->innerHTML;

Using a DOM parser is the right approach. Can you show what you tried and what didn't work?
– Jeto, Commented Jul 29, 2018 at 22:53

user3783243 · Accepted Answer · 2018-07-29 23:12:32Z


It's not worth going into detail on why your regex doesn't work, IMO. For general regex knowledge, though: a . doesn't match newlines (unless the s modifier is used), and [.*] is a character class that matches a single literal . or * character, not "any run of characters".

With DOMDocument, you need to go further down the DOM tree to get the value. You can use XPath for this.
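To make the two regex points concrete, here is a minimal sketch run against the fragment from the question. The final pattern and the variable names are illustrative assumptions, not a recommendation to parse HTML with regex in general; note that it matches "RP", as in the sample markup, rather than "BTC".

```php
<?php
// Sample markup copied from the question; note the newlines between tags.
$html = '<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>';

// "[.*]" is a character class: it matches one literal "." or "*" and nothing else.
$classMatchesDot    = preg_match('/^[.*]$/', '.');
$classMatchesLetter = preg_match('/^[.*]$/', 'a');

// Without the "s" modifier, "." stops at a newline, so it cannot span the lines.
$withoutS = preg_match('#<td id="income">.*</td>#', $html);

// With the "s" modifier, "." also matches newlines.
$withS = preg_match('#<td id="income">.*?</td>#s', $html);

// Illustrative pattern (an assumption): allow whitespace between the tags
// and accept any attribute value with [^"]* instead of [.*].
$pattern = '#<td id="income">\s*<font color="green">\s*'
         . '<span data-c="[^"]*">(.*?) RP</span>#';

$amount = null;
if (preg_match($pattern, $html, $m)) {
    $amount = $m[1];
    echo $amount, PHP_EOL;
}
```

Such a pattern is tied to the exact tag order and attributes, which is one reason a DOM parser is usually the less fragile choice here.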

$html = '<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>';
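$dom = new DOMDocument();
$dom->loadHTML($html); // loadHTML() expects a string of markup
$xpath = new DOMXPath($dom);
// Select the <span> inside the cell with id="income" and print its text
echo $xpath->query('//tr/td[@id="income"]/font/span')[0]->nodeValue;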
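The question also mentions editing the number and placing it back. A minimal sketch of that round trip follows, under a few assumptions that are not in the original: the row is wrapped in a <table> so the fragment is valid on its own, the text always has the form "number unit", and the edit (doubling the amount) is purely hypothetical.

```php
<?php
// Fragment from the question, wrapped in <table> (an assumption for this sketch).
$html = '<table><tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// item(0) returns null when nothing matches, so check before using it.
$span = $xpath->query('//td[@id="income"]//span')->item(0);

if ($span !== null) {
    // nodeValue is "0.0225 RP"; split off the unit to get the number.
    [$amount, $unit] = explode(' ', trim($span->nodeValue), 2);
    $amount = (float) $amount;

    // Hypothetical edit: double the amount, then write it back into the span.
    $span->nodeValue = sprintf('%.4f %s', $amount * 2, $unit);

    // saveHTML() with a node argument serializes just that node.
    echo $dom->saveHTML($span), PHP_EOL;
}
```

Calling saveHTML() without an argument would serialize the whole modified document instead of the single node.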


12 Comments

Eugen-Andrei Coliban Over a year ago

It complains on $dom->loadHTML($html). Is it somehow related to the fact that I get the HTML using file_get_contents()? And regarding the path, how should I modify it if <tr> isn't the parent in my case?

user3783243 Over a year ago

It shouldn't be; loadHTML is looking for a string of HTML. If you var_dump that, what is it? What is it saying the problem is? Modify the XPath as needed. $xpath->query('//td[@id="income"]/font/span')[0]->nodeValue; should get you there.

Eugen-Andrei Coliban Over a year ago

I tried inserting the link directly instead of $html and it works fine, but I've got another problem related to the last line, I think with the path passed to query(). In your case it works fine, but how should I modify it in my case, or are there some rules for creating this path? Sorry for the (maybe) dumb questions, I'm just new to PHP and do not know how to wield it properly yet))

Eugen-Andrei Coliban Over a year ago

Sorry for misleading: both inserting the link and inserting the variable with file_get_contents() do not work. In the first case it throws the following error:

Recoverable fatal error: Object of class DOMDocument could not be converted to string in C:\xampp\htdocs\index.php on line 4

and in the second one this error:

Tag nav invalid in Entity;  htmlParseEntityRef: expecting ';' in Entity; Tag footer invalid in Entity;  Object of class DOMDocument could not be converted to string

Eugen-Andrei Coliban Over a year ago

I cannot post the HTML, but I'll post the code which I'm using
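The asker's code is not shown in the thread, so what follows is only a guess at the two errors quoted above. Messages such as "Tag nav invalid" are parse warnings that libxml can raise for HTML5 elements, and "Object of class DOMDocument could not be converted to string" appears when a DOMDocument object is used where a string is expected (for example, echoing it). A minimal sketch, with a made-up page standing in for the real one:

```php
<?php
// Stand-in for the result of file_get_contents(); the real page is not shown.
$html = '<html><body><nav>Menu</nav>
<table><tr><td id="income"><font color="green">
<span data-c="2250000">0.0225 RP</span></font></td></tr></table>
<footer>Footer</footer></body></html>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// loadHTML() takes a string of markup; load() expects a filename instead.
$dom->loadHTML($html);

// Whether any warnings are recorded depends on the libxml version in use.
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$span = $xpath->query('//td[@id="income"]/font/span')->item(0);
$value = $span !== null ? trim($span->nodeValue) : null;
echo $value ?? 'not found', PHP_EOL;

// "echo $dom;" would fail because a DOMDocument is not a string.
// Serialize it explicitly when the markup is needed as text.
$serialized = $dom->saveHTML();
```

The collected $warnings array can be inspected or logged if the parse problems matter; here it is simply discarded.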
