I have a PHP script that displays an HTML page. I need to extract the inner HTML of a specific element; the exact value I need is shown below.

What I need to extract is the 0.0225 value. Here is a fragment of the HTML file:

<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>

I tried parsing it with a regex (I know that is not recommended, but I tried it anyway) and got nothing. I've also tried different DOM implementations for PHP, with the same result. I don't know what else to try, so I'm asking: how can I extract those numbers so that I can edit them and put them back?

So, here are my attempts:

The attempt with RegEx:

$html = file_get_contents('the link');    
$regex = '#<td id="income"><font color="green"><span data-c="[.*]">(.*?) BTC</span></font></td>#';
if (preg_match($regex, $html)){echo yay;};

The attempt with DOM:

$html = file_get_contents('the link');    
$dom = new DOMDocument();
$dom->load($html);
$element = $dom->getElemetById("income")->innerHTML;
    Using a DOM parser is the right approach. Can you show what you tried and what didn't work? Commented Jul 29, 2018 at 22:53

1 Answer

It's not worth going into why your regex doesn't work, IMO. For general regex knowledge, though: a . does not match newlines (unless the s modifier is used), and [.*] is a character class, so it matches a single literal . or * character rather than "anything".
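To make those two regex points concrete, here is a minimal sketch. The test strings are made up for illustration; only preg_match() itself is standard PHP.

```php
<?php
// A cell whose content spans several lines, like the fragment in the question.
$cell = "<td>\nx\n</td>";

// Without the s modifier, '.' stops at newlines, so this does not match.
$withoutS = preg_match('#<td>.*</td>#', $cell);

// With the s modifier, '.' also matches newlines.
$withS = preg_match('#<td>.*</td>#s', $cell);

// [.*] is a character class: exactly one character, either '.' or '*'.
$classOnDigits = preg_match('#^[.*]$#', '2250000');
$classOnStar   = preg_match('#^[.*]$#', '*');

var_dump($withoutS, $withS, $classOnDigits, $classOnStar);
```

This is why data-c="[.*]" in the question's pattern can never match data-c="2250000".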

With DOMDocument, you need to go further down the DOM tree to reach the value. You can use XPath for this. Also note that loadHTML() takes a string of HTML, whereas load() expects the path to an XML file.

$html = '<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select the <span> inside the cell with id="income" and print its text
echo $xpath->query('//tr/td[@id="income"]/font/span')[0]->nodeValue;
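Since the question also asks about editing the number and putting it back, here is a rough sketch of one way to do that with the same fragment. The doubling is only a placeholder edit, and the variable names ($amount, $unit, $updated) are my own, not anything from the question.

```php
<?php
$html = '<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// item(0) returns null when nothing matches, so check before using it.
$span = $xpath->query('//td[@id="income"]/font/span')->item(0);
$amount = null;
$unit = null;
$updated = null;

// Split "0.0225 RP" into the number and the unit that follows it.
if ($span !== null && preg_match('/^\s*([0-9.]+)\s*(\S*)/', $span->nodeValue, $m)) {
    $amount = (float) $m[1];
    $unit = $m[2];

    // Placeholder edit: double the amount, then write the text back.
    $span->nodeValue = ($amount * 2) . ' ' . $unit;

    // Serialize only the modified node; use saveHTML() for the whole document.
    $updated = $dom->saveHTML($span);
    echo $updated, "\n";
}
```

Assigning to nodeValue replaces the span's text in the tree, so a later saveHTML() call reflects the change.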

12 Comments

It complains at $dom->loadHTML($html). Is that somehow related to the fact that I get the HTML using file_get_contents()? And regarding the path, how should I modify it if <tr> isn't the parent in my case?
It shouldn't be; loadHTML expects a string of HTML. If you var_dump that variable, what do you get? What does it say the problem is? Modify the XPath as needed. $xpath->query('//td[@id="income"]/font/span')[0]->nodeValue; should get you there.
I tried inserting the link directly instead of $html and it works fine, but I've got another problem related to the last line, I think with the path passed to query(). In your case it works fine, but how should I modify it in my case, or are there some rules for building this path? Sorry for the (maybe) dumb questions, I'm just new to PHP and do not know how to wield it properly yet))
Sorry for the confusion: neither inserting the link nor inserting the variable from file_get_contents() works. In the first case it throws the following error: Recoverable fatal error: Object of class DOMDocument could not be converted to string in C:\xampp\htdocs\index.php on line 4, and in the second case this error: Tag nav invalid in Entity; htmlParseEntityRef: expecting ';' in Entity; Tag footer invalid in Entity; Object of class DOMDocument could not be converted to string
I cannot post the HTML, but I'll post the code which I'm using
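For anyone hitting the same messages: "Tag nav invalid" and "Tag footer invalid" are libxml warnings about HTML5 tags it does not recognize, and they can be collected silently with libxml_use_internal_errors(). The "could not be converted to string" error usually means a DOMDocument object ended up where a string was expected (for example echo $dom), though without the real code that is a guess. A minimal sketch, using invented markup as a stand-in for the fetched page:

```php
<?php
// Stand-in for file_get_contents('the link'); this markup is made up for illustration.
$html = '<!DOCTYPE html><html><body><nav>Menu</nav>
<table><tr><td>Income</td><td id="income"><font color="green">
<span data-c="2250000">0.0225 RP</span></font></td></tr></table>
<footer>Terms & Conditions</footer></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);              // pass the HTML string here, not the DOMDocument
libxml_clear_errors();              // discard the collected warnings

$xpath = new DOMXPath($dom);

// '//' before span matches it at any depth below the cell,
// so the path does not depend on the exact parents in the real page.
$span = $xpath->query('//td[@id="income"]//span')->item(0);
$value = $span !== null ? trim($span->nodeValue) : null;

echo $value, "\n";   // echo the extracted string, never $dom itself
```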