0

I want to extract some data from a table using PHP's preg_match_all(). Given the HTML below, I want to get the values in the td elements, for example RC063154016 from "Product code: RC063154016". How can I do that? I don't have any experience with regex.

  <table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>                   
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
2
  • DomDocument might be better. Take a look at this. Commented Jan 31, 2014 at 12:02
  • HTML and regex tags are not good friends. Commented Jan 31, 2014 at 12:03

4 Answers

3

Use DOMDocument:

$str = <<<STR
<table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>                   
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
STR;

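$dom = new DOMDocument();
// @ suppresses the libxml warnings raised while parsing the HTML fragment
@$dom->loadHTML($str);
// Collect every <td> in the document and print its text content
$tds = $dom->getElementsByTagName('td');
foreach ($tds as $td) {
    echo $td->nodeValue . '<br>';
}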

OUTPUT

Product code: RC063154016
Gender: Female
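
The loop above prints each cell's whole text, label included. To get only the value (for example RC063154016), one option is to subtract the span's label text from the cell text. The following is a minimal sketch, not part of the original answer: the helper name extractLabelledCells is hypothetical, and it assumes every cell of interest starts with a single span holding the label.

```php
<?php
// Hypothetical helper: maps each <span> label to the text that follows it
// inside the same <td>. Assumes the <span> comes first in the cell.
function extractLabelledCells(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName('td') as $td) {
        $span = $td->getElementsByTagName('span')->item(0);
        if ($span === null) {
            continue;                    // cell has no label, skip it
        }
        $labelText = trim($span->textContent);          // e.g. "Product code:"
        $cellText  = trim($td->textContent);            // e.g. "Product code: RC063154016"
        $value     = trim(substr($cellText, strlen($labelText)));
        $result[rtrim($labelText, ':')] = $value;       // key without the trailing colon
    }
    return $result;
}

$html = '<table width="100%" border="0" cellspacing="0" cellpadding="0"><tbody><tr>'
      . '<td><span>Product code:</span> RC063154016</td>'
      . '<td><span>Gender:</span> Female</td>'
      . '</tr></tbody></table>';

$cells = extractLabelledCells($html);
echo $cells['Product code'], "\n";
echo $cells['Gender'], "\n";
```

Using libxml_use_internal_errors() instead of the @ operator is a design choice here: it silences only the parser warnings rather than every error the call might raise.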

5 Comments

Yes, that's nice. But there are a lot of td elements in a webpage, and I want only the ones under the table with <table width="100%" border="0" cellspacing="0" cellpadding="0">. So what about that?
Well, how do you identify what you want? By id or class attributes, or by the content of certain elements; your choice. This is an excellent time to read up on DOMDocument and DOMXPath. With those two tools you can manipulate HTML reliably. Regex is not well suited to structured languages. I do use regex to parse simple HTML, but let's see your full table HTML, then we can determine the best approach.
I can only identify the table by its attributes, such as width="100%" border="0" cellspacing="0". So, any suggestions?
Post the HTML table markup that you're trying to parse. Regex may be better, or DOMDocument may be better; let's see the code you're working with. A little nuance here and there adds up to mountains, if you understand what I mean :)
Figured it out myself by using Query. Thanks for your answer! :)
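
For readers with the same problem, the DOMXPath route suggested in the comments could look roughly like this. It is a sketch under assumptions, not the asker's actual solution: the first table in the sample markup is invented to show that other cells are skipped, and the XPath expression matches the table purely by the four attributes quoted in the question.

```php
<?php
// Sample markup: the first table is a made-up decoy, the second is from the question.
$html = <<<HTML
<table width="50%"><tr><td><span>Other:</span> ignore me</td></tr></table>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
  <tbody>
    <tr>
      <td><span>Product code:</span> RC063154016</td>
      <td><span>Gender:</span> Female</td>
    </tr>
  </tbody>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select only <td> elements inside a table carrying all four attribute values.
$query = '//table[@width="100%" and @border="0" and @cellspacing="0" and @cellpadding="0"]//td';

$values = [];
foreach ($xpath->query($query) as $td) {
    $values[] = trim($td->textContent);
}

print_r($values);
```

Matching on presentational attributes is fragile; if the real page offers an id or class on the table or a parent element, that would usually make a more stable selector.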
0

This should work for you:

preg_match_all('|<td><span>Product code:</span>([^<]*)</td>|', $html, $match);

But if there may be arbitrary whitespace around the tags, use this one instead:

preg_match_all('|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|', $html, $match);
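
As a rough illustration of how to read the result: preg_match_all() fills $match with one array per group, so the captured values are in $match[1]. The sketch below assumes the cell markup from the question; because the capture group includes the space after the closing span tag, the values are trimmed.

```php
<?php
// Single cell from the question, used as sample input.
$html = '<td><span>Product code:</span> RC063154016</td>';

$pattern = '|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|';
$count = preg_match_all($pattern, $html, $match);

// $match[0] holds the full matches, $match[1] the first capture group.
// The captured text starts with a space, so trim each value.
$codes = array_map('trim', $match[1]);

print_r($codes);
```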


0
$data = <<<HTML
  <table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
HTML;


if(preg_match_all('#<td>\s*<span>Product code:</span>\s*([^<]*)</td>#i', $data, $matches)) {
    print_r($matches);
}


0

Use an HTML parser to parse the markup and work with the result. Don't use the preg_* functions here. Please read the answers to "How do you parse and process HTML/XML in PHP?"
