I have to parse an HTML page that presents a call list. After converting it to XML, the structure is:

<body>
    <form name="mainform" method="POST" action="baz" class="all">
        <input type="submit" value="" style="position:absolute;top:-9999px;left:-9999px;" name="apply"/>
        <p>foo</p>
        <div class="bar">
            ..
        </div>
        <br/>
        <div class="onPageTabsBox">
            <ul class="tabs onPageTabs">
                ...
            </ul>
        </div>
        <table id="baz">
            <tr class="thead">
                ...
            </tr>
        </table>
        <div id="uiScroll">
            <table id="bla">
                <tr class="showif_in">
                    ...
                </tr>
                ...    
                <tr class="showif_out">
                    <td class="call_out" title="outbound call" datalabel="29.12.19 11:13"/>
                    <td>29.12.19 11:13</td>
                    <td title="Doe, John (privat) = 0123456789" datalabel="Name / Rufnummer">
                        <a href=" " onclick="return onDial('0123456789');">Doe, John (privat)</a>
                    </td>
                    <td datalabel="foo">bar</td>
                    <td title="987654 (Internet)" datalabel="own number">987654</td>
                    <td class="duration" data-timestr="0:02" datalabel="duration">2 Min</td>
                    <td class="btncolumn">
                        ...                        
                    </td>
                </tr>
                <tr class="showif_out">
                    ...
                </tr>

I need a function that returns the phone numbers from incoming, outgoing, ... calls. To do that, I try to extract the phone number(s) from the td node whose title contains " = ". At present the function looks like this:

function getCallList($config, string $type = '')
{
    ...
    $xmlSite = convertHTMLtoXML($response);
    switch ($type) {
        case 'in':
        case 'out':
        case 'fail':
        case 'rejected':
            $query = sprintf('//form/div/table/tr[@class="showif_%s"]', $type);
            break;
        default:                                   // get all recorded calls
            $query = '//form/div/table/tr';
    }
    $rows = $xmlSite->xpath($query);
    foreach ($rows as $row) {
        $numbers = $row->xpath('substring-after(//td[@title], " = ")');
    }
    ...
}

After consulting similar questions here, I tried $numbers = $row->evaluate('substring-after(//td[@title], " = ")'); and several other XPath expressions. Unfortunately, I can't get the substring. Apart from that, I suspect that it should also be possible to get an array of the phone numbers with just one query.

1 Answer

As mentioned here and here, you unfortunately can't accomplish this in one query with XPath 1.0.

What you could do instead is list all the title attributes belonging to these <td>s, then use preg_match to grab anything that's after an = surrounded by spaces:

$rowTitleAttrs = $xmlSite->xpath('//tr[@class="showif_out"]/td/@title');

$phoneNumbers = [];
foreach ($rowTitleAttrs as $rowTitleAttr) {
  if (preg_match('/(?<= = )(?<phoneNumber>.*?)$/', $rowTitleAttr->title, $matches)) {
    $phoneNumbers[] = $matches['phoneNumber'];
  }
}

I took the liberty of simplifying your XPath query in the process, as a class name should be accurate enough to not have to state the whole path leading to it.
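
As a further sketch (separate from the linked demo), the same idea could be wrapped in a helper that honours the `$type` switch from the question. The function name `extractPhoneNumbers` and the sample markup are assumptions made for illustration. It lets XPath's `contains()` pre-filter the titles and uses a plain string split instead of a regular expression:

```php
<?php
// Hypothetical helper, not part of the original code: returns the text after
// the first " = " of every matching td title attribute.
// $type is 'in', 'out', 'fail' or 'rejected'; anything else means "all rows".
function extractPhoneNumbers(SimpleXMLElement $xml, string $type = ''): array
{
    $allowed  = ['in', 'out', 'fail', 'rejected'];
    $rowQuery = in_array($type, $allowed, true)
        ? sprintf('//tr[@class="showif_%s"]', $type)
        : '//tr';

    // contains() skips titles such as "987654 (Internet)" that have no " = ".
    $titles = $xml->xpath($rowQuery . '/td[contains(@title, " = ")]/@title');

    $numbers = [];
    foreach ($titles ?: [] as $title) {
        // Casting an attribute node to string yields the attribute value.
        $value = (string) $title;
        $pos   = strpos($value, ' = ');
        if ($pos !== false) {
            $numbers[] = substr($value, $pos + 3);
        }
    }
    return $numbers;
}

// Reduced sample data modelled on the structure shown in the question.
$sample = <<<XML
<body><form><div id="uiScroll"><table id="bla">
<tr class="showif_in">
  <td title="Roe, Jane = 0555123" datalabel="Name / Rufnummer">Roe, Jane</td>
</tr>
<tr class="showif_out">
  <td title="Doe, John (privat) = 0123456789" datalabel="Name / Rufnummer">Doe, John (privat)</td>
  <td title="987654 (Internet)" datalabel="own number">987654</td>
</tr>
</table></div></form></body>
XML;

$xml = simplexml_load_string($sample);
print_r(extractPhoneNumbers($xml, 'out')); // outgoing calls only
print_r(extractPhoneNumbers($xml));        // all recorded calls
```

Casting the attribute node to a string avoids relying on the attribute's name, so the loop keeps working if the query is later pointed at a different attribute.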

Demo: https://3v4l.org/1oqqA
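
On why the original attempt returned nothing: `SimpleXMLElement::xpath()` only returns node lists, so a string-valued expression such as `substring-after(...)` cannot come back through it, and SimpleXML has no `evaluate()` method. In addition, `//td` always searches from the document root, even when called on a row. If switching to the DOM extension is an option, `DOMXPath::evaluate()` can return the string directly, one row at a time. The following is only a sketch with assumed sample data:

```php
<?php
// Reduced sample data modelled on the structure shown in the question.
$sample = <<<XML
<body><form><div id="uiScroll"><table id="bla">
<tr class="showif_out">
  <td title="Doe, John (privat) = 0123456789" datalabel="Name / Rufnummer">Doe, John (privat)</td>
  <td title="987654 (Internet)" datalabel="own number">987654</td>
</tr>
<tr class="showif_out">
  <td title="Roe, Jane = 0555123" datalabel="Name / Rufnummer">Roe, Jane</td>
</tr>
</table></div></form></body>
XML;

$doc = new DOMDocument();
$doc->loadXML($sample);
$xpath = new DOMXPath($doc);

$numbers = [];
foreach ($xpath->query('//tr[@class="showif_out"]') as $row) {
    // The leading "." keeps the search inside the current row;
    // a bare "//td" would start again at the document root.
    $number = $xpath->evaluate(
        'substring-after(.//td[contains(@title, " = ")]/@title, " = ")',
        $row
    );
    // Rows without a matching td yield an empty string.
    if ($number !== '') {
        $numbers[] = $number;
    }
}

print_r($numbers);
```

This still needs one `evaluate()` call per row, which is consistent with the XPath 1.0 limitation mentioned at the top of the answer: `substring-after()` converts a node set to the string value of its first node only.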

5 Comments

Thanks for your help. ->query doesn't work for me, but ->xpath does. The regex pattern does not match yet; I'm trying to figure out why. Apart from that, regular expressions are still as cryptic as XPath to me.
@BlackSenator Oh you're using the SimpleXML version, missed that (though ideally that's something you should've shared). Anyway, edited my answer, should be better now :)
Sorry for the concealment :) But the regex doesn't match, so I get an empty array. Since the question is now drifting in another direction: should I open a separate question for it?
I just noticed the change from ->value to ->title. Runs like a charm :)
@BlackSenator Happy to hear it.