I'm trying to parse some HTML with phpQuery, but I'm having trouble.

I need to extract only the URLs (the href attributes) into an array, but it is not working.

Here is some example code:

$doc = phpQuery::newDocumentHTML('<div align = "left" style="background-color:#FFFFFF;border:1px solid #C3D9FF"> </p>

        <table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">

          <tr>
            <td align="left" width="531" height="20"><small>
            <strong>

            <a href="/1153414/">

            <font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Industrial</font><a/> </a></small></strong>
            </td>

          </tr>
          <tr>
            <td align="left" vAlign="top" width="100%" height="1">
            <table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
                Data:</font></strong></td>

                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;4-1-2011 </font></td>
                <td align="left" vAlign="top" width="59">
                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp; Castelo Branco</font></td>

              </tr>
              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Indústria / Produção </font></td>
                <td align="left" vAlign="top" width="59">

                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Isotransfo, Unipessoal LDA</font></td>
              </tr>
              </table>
            </td>

          </tr>
        </table>

 </p>

        <table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
          <tr>
            <td align="left" width="531" height="20"><small>
            <strong>

            <a href="/1153399/">

            <font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Admite-se<a/> </a> </font></small></strong>
            </td>
          </tr>
          <tr>
            <td align="left" vAlign="top" width="100%" height="1">
            <table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">

              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
                Data:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;4-1-2011 </font></td>
                <td align="left" vAlign="top" width="59">

                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp; Castelo Branco</font></td>
              </tr>
              <tr>

                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Indústria / Produção </font></td>
                <td align="left" vAlign="top" width="59">
                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">

                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Isotransfo, Unipessoal LDA</font></td>
              </tr>
              </table>
            </td>
          </tr>
        </table>

 </p>

        <table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
          <tr>
            <td align="left" width="531" height="20"><small><font face="Arial">
            <strong>

            <a href="/1153280/">

            <font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Precisa-se</font><a/> </a> </font></small></strong>

            </td>
          </tr>
          <tr>
            <td align="left" vAlign="top" width="100%" height="1">
            <table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">

                Data:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;4-1-2011 </font></td>
                <td align="left" vAlign="top" width="59">
                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>

                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp; ( Todas as Zonas )</font></td>
              </tr>
              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>

                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Saúde / Medicina / Enfermagem </font></td>
                <td align="left" vAlign="top" width="59">
                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Emprego Radiologia</font></td>

              </tr>
              </table>
            </td>
          </tr>
        </table>

 </p>

        <table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
          <tr>

            <td align="left" width="531" height="20"><small><font face="Arial">
            <strong>

            <a href="/1152665/">

            <font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Operadores</font><a/> </a> </font></small></strong>
            </td>
          </tr>

          <tr>
            <td align="left" vAlign="top" width="100%" height="1">
            <table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
                Data:</font></strong></td>

                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;4-1-2011 </font></td>
                <td align="left" vAlign="top" width="59">
                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp; Viseu</font></td>

              </tr>
              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Lojas / Comércio / Balcão </font></td>
                <td align="left" vAlign="top" width="59">

                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Dia Portugal Supermercados - Soc. Unip., Lda.</font></td>
              </tr>
              </table>
            </td>

          </tr>
        </table>

 </p>

        <table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
          <tr>
            <td align="left" width="531" height="20"><small><font face="Arial">
            <strong>

            <a href="/1153524/">

            <font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Responsável</font><a/> </a> </font></small></strong>
            </td>
          </tr>
          <tr>
            <td align="left" vAlign="top" width="100%" height="1">
            <table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">

              <tr>
                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
                Data:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;4-1-2011 </font></td>
                <td align="left" vAlign="top" width="59">

                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp; Santarem</font></td>
              </tr>
              <tr>

                <td align="left" vAlign="top" width="67">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
                <td align="left" vAlign="top" width="150">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;Comercial / Vendas </font></td>
                <td align="left" vAlign="top" width="59">
                <font color="#000000" face="Arial" size="2">
                <strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">

                <font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
                <td align="left" vAlign="top" width="473">
                <font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">&nbsp;ALDI Supermercados Lda.</font></td>
              </tr>
              </table>
            </td>
          </tr>
        </table>
</div>');
//echo $doc['div table a']->attr('href');
foreach ($doc['div table a'] as $a) {
    $hrefs[] .= pq($a)->attr('href');
}
print_r ($hrefs);

If I echo the expression below, I get a single href URL, which is fine:

echo $doc['div table a']->attr('href');

If I run the foreach statement, I get an array with some empty values:

foreach ($doc['div table a'] as $a) {
    $hrefs[] .= pq($a)->attr('href');
}
print_r ($hrefs);

The array I got is:

Array ( 
    [0] => /1153414/ 
    [1] => 
    [2] => /1153399/ 
    [3] => 
    [4] => /1153280/ 
    [5] => 
    [6] => /1152665/ 
    [7] => 
    [8] => /1153524/ 
    [9] => 
    ) 

How can I generate an array like this:

Array ( 
    [0] => /1153414/ 
    [1] => /1153399/ 
    [2] => /1153280/ 
    [3] => /1152665/ 
    [4] => /1153524/ 
    ) 

Any pointers would be appreciated.

2 Answers


You have five instances of <a/> in your HTML. Each one creates an empty a element with no href attribute, rather than closing an existing one, which is why every second array entry is empty. Remove them and your code should work fine.
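If the markup cannot be edited, another option is to select only the anchors that actually carry an href. The sketch below illustrates the idea with PHP's built-in DOMDocument and DOMXPath instead of phpQuery; the $html fragment is a shortened, hypothetical stand-in for the page in the question. phpQuery's selectors may allow the same thing with something like 'div table a[href]', but that is an assumption and is not verified here.

```php
<?php
// Shortened stand-in for the scraped markup, including the stray <a/> tags.
$html = '<div>'
      . '<a href="/1153414/"><font>Industrial</font><a/> </a>'
      . '<a href="/1153399/"><font>Admite-se<a/> </a></font>'
      . '</div>';

// Collect parser warnings about the malformed markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// //a[@href] matches only anchors that have an href attribute,
// so the empty anchors created by <a/> are never visited.
$xpath = new DOMXPath($dom);
$hrefs = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

Filtering at selection time means no empty entries ever reach the array, so no clean-up pass is needed afterwards.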


Edit: A very simple way of removing empty values from an array is to run array_filter with no second argument:

$hrefs = array_filter($hrefs);
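One caveat: array_filter preserves the original keys, so the filtered array is still indexed 0, 2, 4 and so on. A minimal sketch, using a hard-coded copy of the array from the question, that also reindexes with array_values to get the layout that was asked for:

```php
<?php
// Hard-coded copy of the array printed in the question (empty entries as '').
$hrefs = ['/1153414/', '', '/1153399/', '', '/1153280/', '', '/1152665/', '', '/1153524/', ''];

// array_filter with no callback drops every falsy value ('' and null,
// but also the string '0'), keeping the original keys 0, 2, 4, ...
$filtered = array_filter($hrefs);

// array_values renumbers the keys consecutively from 0.
$hrefs = array_values($filtered);

print_r($hrefs);
```

Because the callback-free form also discards the string '0', it is only safe when no legitimate value is falsy, which holds for these URL paths.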

1 Comment

Oh, I see. The HTML is not mine; it is from a website I need to scrape. Thanks for the help.
1
$href = pq($a)->attr('href');
// Skip the empty anchors produced by the stray <a/> tags.
if ($href != '') {
    $hrefs[] = $href;
}

3 Comments

Thanks for the reply. It works! Do you know why I get '' when I call attr('href') inside the foreach?
@What is the Question, your solution solves the problem; array_filter($hrefs) also does the job, silently.
@Andre, lonesomeday was quicker at explaining :)
