I am looking for a way to remove duplicate lines from a variable:

$x = '<IMGURL>one.jpg</IMGURL>';
$x .= '<IMGURL>two.jpg</IMGURL>';
//remove the following line:
$x .= '<IMGURL>one.jpg</IMGURL>';
$x .= '<IMGURL>third.jpg</IMGURL>';

The output should be:

$x = '<IMGURL>one.jpg</IMGURL><IMGURL>two.jpg</IMGURL><IMGURL>third.jpg</IMGURL>';

Maybe some regex does the trick?

Edit:

Some more info

The source XML:

<?xml version="1.0" encoding="utf-8"?>
<SHOP>
  <SHOPITEM>
    <name>BLUE product</name>
    <IMGURL>main_picture.jpg</IMGURL>
    <PRODUCT_VARIANT id="2">
      <name>blue L</name>
      <IMGURL>blue.jpg</IMGURL>
    </PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="3">
      <name>BLUE XL</name>
      <IMGURL>blue.jpg</IMGURL>
    </PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="4">
      <name>BLUE XXL</name>
      <IMGURL>blue.jpg</IMGURL>
    </PRODUCT_VARIANT>
  </SHOPITEM>
</SHOP>

From this I need the two unique JPG file names:

  • main_picture.jpg
  • blue.jpg

The relevant part of the module that processes the source XML:

foreach ($xml->SHOPITEM as $product) {
    if (isset($product->IMGURL)) {
        $xml_content .= '<IMAGE>' . htmlspecialchars($product->IMGURL) . '</IMAGE>' . "\n";
    }

    foreach ($product->variant as $option) {
        if (isset($option->IMGURL)) {
            $xml_content .= '<IMAGE>' . htmlspecialchars($option->IMGURL) . '</IMAGE>' . "\n";
        }
    }
}
  • 1
    question is; how are those created in the first place? Commented Mar 10, 2016 at 18:45
  • 1
    it's xml. load it into dom, find the dupes, remove those nodes Commented Mar 10, 2016 at 18:45
  • This was my first idea, with XSLT. But the source XML is too complicated; if needed, I can post a sample here. Commented Mar 10, 2016 at 18:46
  • @Adrian yes, it's a good idea. Commented Mar 10, 2016 at 18:48
  • Do you want to remove only the <IMGURL> tag, or its parent <PRODUCT_VARIANT> as well? Commented Mar 10, 2016 at 19:03

1 Answer

This sample code reduces your XML to the desired result:

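$dom = new DOMDocument();
$dom->formatOutput = true;
libxml_use_internal_errors( true );   // collect parse errors instead of emitting warnings
$dom->loadXML( $x, LIBXML_NOBLANKS ); // $x holds the source XML string

$xpath = new DOMXPath( $dom );

// All <IMGURL> nodes that are direct children of a <PRODUCT_VARIANT>
$nodes = $xpath->query( '//SHOP/SHOPITEM/PRODUCT_VARIANT/IMGURL' );
$found = array();                     // values seen so far

foreach( $nodes as $node )
{
    if( in_array( $node->nodeValue, $found ) )
    { $node->nodeValue = ''; }        // duplicate: blank it (the empty element remains)
    else
    { $found[] = $node->nodeValue; }  // first occurrence: remember it
}

$result = $dom->saveXML();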

In short, an array keeps track of the values already seen. After retrieving the <IMGURL> nodes through XPath, a foreach loop checks each node: if its value is already in the array, the node value is set to an empty string; otherwise the value is added to the array. Note that a blanked duplicate stays in the document as an empty <IMGURL> element.
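If the goal is only the list of unique image names for the output, the same "seen values" bookkeeping can be applied while building the <IMAGE> lines, so duplicates are never emitted in the first place. This is a minimal sketch, not the module's actual code: it assumes SimpleXML (as in the question), the function name uniqueImageTags and the shortened $sample string are hypothetical, and it deduplicates across the whole document rather than per <SHOPITEM>.

```php
<?php
// Hypothetical helper: build one <IMAGE> line per distinct <IMGURL> value.
function uniqueImageTags(string $xmlString): string
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return ''; // unparsable input: nothing to emit
    }

    $seen   = [];  // values already emitted, stored as array keys
    $output = '';

    // '//IMGURL' matches every <IMGURL> element anywhere in the document
    foreach ($xml->xpath('//IMGURL') as $img) {
        $value = (string) $img;
        if (isset($seen[$value])) {
            continue; // duplicate: skip it
        }
        $seen[$value] = true;
        $output .= '<IMAGE>' . htmlspecialchars($value) . '</IMAGE>' . "\n";
    }

    return $output;
}

// Shortened stand-in for the source XML in the question (assumption).
$sample = '<SHOP><SHOPITEM><IMGURL>main_picture.jpg</IMGURL>'
    . '<PRODUCT_VARIANT id="2"><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>'
    . '<PRODUCT_VARIANT id="3"><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>'
    . '</SHOPITEM></SHOP>';

echo uniqueImageTags($sample);
```

Using the values as array keys with isset() avoids the linear scan that in_array() performs on every node, which may matter for large feeds.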

The XPath query '//SHOP/SHOPITEM/PRODUCT_VARIANT/IMGURL' matches only <IMGURL> nodes whose parent is a <PRODUCT_VARIANT>; to check all <IMGURL> nodes, change the XPath line to:

$nodes = $xpath->query( '*//IMGURL' );
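If the duplicate elements should disappear from the XML entirely instead of being left as empty tags, the nodes can be detached from their parents. The following is a sketch under assumptions rather than a drop-in replacement: the function name removeDuplicateImgUrls and the shortened $sample string are hypothetical, and it uses '//IMGURL' to match every <IMGURL> in the document.

```php
<?php
// Hypothetical helper: keep the first <IMGURL> of each value, remove the rest.
function removeDuplicateImgUrls(string $xmlString): string
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dom->loadXML($xmlString);

    $xpath = new DOMXPath($dom);
    $seen  = []; // values already encountered, stored as array keys

    // iterator_to_array() takes a snapshot of the node list before any removal
    foreach (iterator_to_array($xpath->query('//IMGURL')) as $node) {
        $value = trim($node->nodeValue);
        if (isset($seen[$value])) {
            $node->parentNode->removeChild($node); // duplicate: detach the element
        } else {
            $seen[$value] = true;                  // first occurrence: remember it
        }
    }

    return $dom->saveXML();
}

// Shortened stand-in for the source XML in the question (assumption).
$sample = '<SHOP><SHOPITEM><IMGURL>main_picture.jpg</IMGURL>'
    . '<PRODUCT_VARIANT id="2"><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>'
    . '<PRODUCT_VARIANT id="3"><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>'
    . '</SHOPITEM></SHOP>';

$result = removeDuplicateImgUrls($sample);
echo $result;
```

In this sketch the second <PRODUCT_VARIANT> keeps its id attribute but loses its <IMGURL> child; whether the now-empty parent should also be removed depends on how the feed is consumed.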

1 Comment

Niiiice! Thank you very much.
