One site presents its data as a table. Its source code looks like this:

<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>
        <td>7 October 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>Envirotech &amp; Clean Energy Investor Summit</td>
        <td>4-5 November 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>

Is it possible to write a regex that extracts the event, date, and city separately?

3 Comments

  • Why don't you use a real HTML parser? Commented Feb 20, 2014 at 18:19
  • What do you mean by "gives event, date, city separately"? They are already in separate <td> tags. How do you want them separated further? Commented Feb 20, 2014 at 18:19
  • @talemyn: I want to extract each of them and store them in separate variables named event, date, and city. I could not figure it out. Commented Feb 20, 2014 at 18:28

2 Answers

You should be able to use: <td>.+?</td>

4 Comments

Thanks, Paul. So it is possible. Can you please expand on how to get each td value separately for each tr?
Well, to TimWolla's point above, RegEx isn't the best tool for parsing HTML. If you do use it, I'd use two loops. The outer loop would match the rows with <tr>.+?</tr> (with the s modifier, so that . also matches the newlines inside a row), and the inner loop would match the cells of each row with the td pattern.
I could not understand the structure of the regex here. Can you please give a hint by showing the loop?
preg_match_all("/(?<=<td>).+?(?=<\/td>)/", $source_string, $matches) will grab just the values (note that the / in </td> has to be escaped because / is also the pattern delimiter). Since your HTML is very well structured, you could map the results array to a new array whose elements each contain an array of three elements (keys being event, date, city).
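
The mapping described in the last comment could be sketched roughly as follows. This is only an illustration under a few assumptions: $source_string holds a shortened, hypothetical copy of the table from the question, '#' is used as the pattern delimiter so that </td> needs no escaping, and the grouping into threes relies on the empty leading <td></td> of each row being skipped (.+? needs at least one character). strip_tags and html_entity_decode are added here to drop the link markup and decode &amp;; they are not part of the original suggestion.

```php
<?php
// Hypothetical sample input: two rows in the same shape as the question's table.
$source_string = <<<'HTML'
<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>
HTML;

// Match the contents of every non-empty <td>; '#' is the delimiter here.
preg_match_all('#(?<=<td>).+?(?=</td>)#', $source_string, $matches);

$events = array();
// Every row contributes exactly three non-empty cells: event, date, city.
foreach (array_chunk($matches[0], 3) as $cells) {
    if (count($cells) !== 3) {
        continue; // incomplete trailing group, skip it
    }
    $events[] = array(
        // Remove the <a> wrapper and decode entities such as &amp;.
        'event' => html_entity_decode(strip_tags($cells[0])),
        'date'  => $cells[1],
        'city'  => $cells[2],
    );
}

print_r($events);
```

Grouping by position is fragile: if a row ever had an empty date or city cell, the groups of three would shift, so matching per row first is the safer variant.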
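// $s holds the HTML source of the table.
$matches = array();
// Modifier s lets . match newlines; U makes the quantifiers ungreedy,
// so each match stops at the first closing tag.
preg_match_all("/<tr>(.*)<\/tr>/sU", $s, $matches);
$trs = $matches[1]; // inner HTML of every row
$td_matches = array();
foreach ($trs as $tr) {
    $tdmatch = array();
    // Collect the contents of every cell in this row.
    preg_match_all("/<td>(.*)<\/td>/sU", $tr, $tdmatch);
    $td_matches[] = $tdmatch[1];
}
print_r($td_matches);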

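Put your HTML string in $s. $td_matches then holds a nested array: one entry per <tr>, each containing the contents of that row's <td> cells. The first element of every row is the empty leading cell, and the cell contents are raw, so link markup and entities such as &amp; are left as they are.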
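
For comparison, the HTML parser route suggested in the comments under the question might look something like the sketch below. It uses PHP's bundled DOM extension (DOMDocument) and is an assumption-laden illustration rather than a drop-in solution: $html is a shortened, hypothetical copy of the table, wrapped in <table> so the parser sees a complete table, and the cell positions (1 = event, 2 = date, 3 = city) are taken from the sample in the question.

```php
<?php
// Hypothetical sample input, wrapped in <table> for the parser's benefit.
$html = <<<'HTML'
<table><tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody></table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$events = array();
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = $tr->getElementsByTagName('td');
    if ($cells->length < 4) {
        continue; // not a data row in the expected shape
    }
    $events[] = array(
        // textContent drops nested tags and decodes entities.
        'event' => trim($cells->item(1)->textContent),
        'date'  => trim($cells->item(2)->textContent),
        'city'  => trim($cells->item(3)->textContent),
    );
}

print_r($events);
```

Unlike the regex versions, this does not depend on the exact spelling of the tags, so attributes on <td> or <tr> would not break it.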