
I have a table like this, and I have spent a full day trying to extract the data from it:

<table class="table table-condensed">
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>

<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>

<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>

</tr>

<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>

</table>

As you can see, the data is well organized. I am trying to get this result:

monthly rent => Fr. 1'950. 
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17

The table is stored in a variable $table, and I tried to use DOM:

$dom = new DOMDocument(); 
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr');
return $result; 

But to no avail. Is there an easier way to get the contents with PHP or a regex?

2 Answers


You're on the right track with DOM and XPath. Do not use regular expressions to parse HTML/XML. Regular expressions are for matching text and are often used as part of a parser, but a parser for a format knows about that format's features, and a regex does not.

You should keep your variable names a little cleaner. Do not assign values of different types to the same variable in the same context; it only shows that the variable name might be too generic.

DOMXPath::query() lets you use XPath expressions, but only expressions that return a node list. DOMXPath::evaluate() can fetch scalar values, too.
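To illustrate the difference, here is a minimal sketch on a trimmed-down, two-row version of the table (the $html string and the variable names are illustrative, not from the question). query() is used only for the node-set expression, while evaluate() also returns typed PHP values for count(), string() and boolean():

```php
<?php
// Minimal sketch: DOMXPath::evaluate() returns typed results for scalar
// XPath expressions, while query() is meant for node-set expressions.
// The two-row $html table is a hypothetical, shortened sample.
$html = '<table><tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>'
      . '<tr><td>Surface</td><td><strong>93m2</strong></td></tr></table>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Node-set expression: returns a DOMNodeList.
$rows = $xpath->query('//table/tr');

// Scalar expressions: evaluate() returns a PHP float, string or bool.
$rowCount = $xpath->evaluate('count(//table/tr)');
$firstLabel = $xpath->evaluate('string(//table/tr[1]/td[1])');
$hasSurface = $xpath->evaluate('boolean(//td[. = "Surface"])');

var_dump($rows->length, $rowCount, $firstLabel, $hasSurface);
```

XPath numbers always come back as PHP floats, so count() yields a float even though the value is a whole number.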

So you can fetch the tr elements, iterate over them, and use additional expressions to fetch the two values, with each tr element as the context node.

$document = new \DOMDocument(); 
$document->loadHTML($table);
$xpath = new \DOMXPath($document);

foreach ($xpath->evaluate('//table/tr') as $tr) {
  var_dump(
     $xpath->evaluate('string(td[1])', $tr),
     $xpath->evaluate('string(td[2]/strong)', $tr)
  );
}

Output:

string(13) "Monthely rent"
string(11) "Fr. 1'950. "
string(8) "Rooms(s)"
string(1) "3"
string(7) "Surface"
string(4) "93m2"
string(16) "Date of Contract"
string(8) "01.04.17"
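To get the "label => value" result from the question instead of a var_dump(), the same loop can fill an associative array. This is a sketch that assumes every row has exactly two td cells with the label in the first; it uses string(td[2]) rather than string(td[2]/strong) so that cells without a <strong> wrapper also work, and it trims surrounding whitespace (an assumption about what you want, since the rent cell ends with a space):

```php
<?php
// Sketch: collect label/value pairs into an associative array.
// Assumes two td cells per row, label first, value second.
$table = <<<'HTML'
<table class="table table-condensed">
<tr><td>Monthely rent</td><td><strong>Fr. 1'950. </strong></td></tr>
<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>
<tr><td>Surface</td><td><strong>93m2</strong></td></tr>
<tr><td>Date of Contract</td><td><strong>01.04.17</strong></td></tr>
</table>
HTML;

$document = new DOMDocument();
$document->loadHTML($table);
$xpath = new DOMXPath($document);

$data = [];
foreach ($xpath->evaluate('//table/tr') as $tr) {
    // string(td[2]) returns all text of the second cell, with or without <strong>.
    $label = trim($xpath->evaluate('string(td[1])', $tr));
    $data[$label] = trim($xpath->evaluate('string(td[2])', $tr));
}

foreach ($data as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```

This prints the four pairs in the requested format, for example "Surface => 93m2", with the trailing space of the rent value removed by trim().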

3 Comments

Thanks so much, this is without a doubt the best answer I have found on SO. One last question: I have a table without <strong> in the second <td>. How can I change the second XPath expression, string(td[2]/strong), to add an OR for that /strong?
To add: I tried string(td[2]/text()), and it gets the content from a <td> without <strong>, but not from a <td> that has <strong> in it.
It is just string(td[2]). td[2] fetches the td in the second position, and string() casts the first node of the result to a string, which returns all of its text content, including the text of its descendants. td[2]/text() fetches only the text child nodes of the second td; casting the first of them to a string gives you only that part of the text content.
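A small sketch of that difference, using a hypothetical two-row table where only the first value is wrapped in <strong>:

```php
<?php
// Sketch: compare string(td[2]) with string(td[2]/text()).
// Hypothetical table: only the first row wraps its value in <strong>.
$html = '<table>'
      . '<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>'
      . '<tr><td>Surface</td><td>93m2</td></tr>'
      . '</table>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$full = [];
$textOnly = [];
foreach ($xpath->evaluate('//table/tr') as $tr) {
    // All text content of the second cell, including descendants like <strong>.
    $full[] = $xpath->evaluate('string(td[2])', $tr);
    // Only the first direct text child of the second cell (empty if there is none).
    $textOnly[] = $xpath->evaluate('string(td[2]/text())', $tr);
}

var_dump($full, $textOnly);
```

string(td[2]) yields "3" and "93m2", while string(td[2]/text()) yields an empty string for the first row, because that cell's only child is the <strong> element.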

Try this out:

$document = new DOMDocument();
$document->loadHTML($table);
$xpath = new \DOMXPath($document);
$result = $xpath->query('//table/tr/td/strong');

foreach($result as $item) {
  echo $item->nodeValue . "\n";
}

That will print the text content of each <strong> element. However, you will probably want to set up your data so that you don't have to deal with HTML tags like <strong>. You might want to use XML or even JSON.
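That query only returns the values. One way to also get each label, sketched here under the assumption that the label is always the td immediately before the cell containing the <strong>, is to evaluate a relative expression from each <strong> node (the sample table and $pairs name are illustrative):

```php
<?php
// Sketch: keep the //table/tr/td/strong query, but look up each value's
// label in the neighbouring cell. Assumes the label td directly precedes it.
$table = '<table>'
       . '<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>'
       . '<tr><td>Surface</td><td><strong>93m2</strong></td></tr>'
       . '</table>';

$document = new DOMDocument();
$document->loadHTML($table);
$xpath = new DOMXPath($document);

$pairs = [];
foreach ($xpath->query('//table/tr/td/strong') as $strong) {
    // ".." is the enclosing td; preceding-sibling::td[1] is the nearest td before it.
    $label = $xpath->evaluate('string(../preceding-sibling::td[1])', $strong);
    $pairs[$label] = $strong->nodeValue;
}

print_r($pairs);
```

Because preceding-sibling is a reverse axis, td[1] refers to the closest preceding cell, not the first cell in the row.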

4 Comments

Is there any regex solution for this? I mean, to get the pairs.
Not sure why the above answer wouldn't work for you, but with just a regex you could do something like this: regex101.com/r/uDhahb/1. That being said, you should be able to get both the name and the value from the DOMDocument.
@user7342807 You could use a regex to get the items, but that will be complicated too. The problem with a regex is that if the markup changes, it may no longer extract the data correctly.
@Quixrick Is there any way to get the values? I only got the keys, i.e. monthly rent, rooms, surface ..., but not the values. Your answer is the only thing that helped me even come close. I did preg_match_all(pattern, html, matches); I'm not sure what I am missing.
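For completeness, here is a hedged regex sketch (not the pattern behind the regex101 link above) that captures the label and the value as two separate groups. It only works for markup shaped exactly like the question's table and will break if the tags or attributes change. With preg_match_all's default PREG_PATTERN_ORDER, the labels land in $matches[1] and the values in a separate sub-array, $matches[2]:

```php
<?php
// Hedged sketch: only works for rows shaped like the question's table
// (a label td followed by a td containing a single <strong>).
$table = <<<'HTML'
<table class="table table-condensed">
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
</table>
HTML;

// Group 1 captures the label, group 2 the value; \s* allows the line break
// between the two cells.
$pattern = '~<td>([^<]*)</td>\s*<td><strong>([^<]*)</strong></td>~';
preg_match_all($pattern, $table, $matches);

// $matches[0]: full matches, $matches[1]: labels, $matches[2]: values.
$pairs = array_combine($matches[1], $matches[2]);
print_r($pairs);
```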
