
I want to search for keywords in an XML file. freshvideos.xml contains "video" elements. For example, if I search for "gear slow" or "new England gear", the search should return the "id" of the matching "video" element.

Below is a sample of my xml file.

<freshvideos>
    <video>
        <id>
            <![CDATA[ 4f1a6a21e779d227eaff33de8f571f95 ]]>
        </id>
        <title>
            <![CDATA[ New England Snowstorm - \"Low Gear\" ]]>
        </title>
        <ensub>
            <![CDATA[ I put it in low gear and take it slow. ]]>
        </ensub>
        <cnsub>
            <![CDATA[ 我挂了抵挡,慢慢开。 ]]>
        </cnsub>

        <filesrc>
            <![CDATA[ videos/New England Snowstorm Low Gear.mp4 ]]>
        </filesrc>
    </video>
</freshvideos>

To make the search case-insensitive, I first convert the keywords to lower case, and I also convert the whole XML document to lower case.

Currently I'm doing this:

$dom = new DOMDocument;
$dom->load("freshvideos.xml");
$xml = $dom->saveXML($dom);
$xml = strtolower($xml);
$lowerCaseDom = new DOMDocument;
$lowerCaseDom->loadXML($xml);

The problem is that loadXML() emits these warnings:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: StartTag: invalid element name in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Sequence ']]>' not allowed in content in Entity

I also thought of using this delimiter:

$xml = strtolower($xml);
$xml2 =<<<XML
echo strtolower($xml);
XML;
$lowerCaseDom->loadXML($xml2);

It turned out that the string has quotation marks at the beginning below the "<<

So, how can I make this search case-insensitive?

Thanks in advance!

  • @Kyle could you please look at this? Commented Dec 20, 2013 at 2:48
  • $xml2 is a heredoc string, it's not valid XML. I'm not sure what you're doing there. Can't call functions inside it unless you use {} Commented Dec 20, 2013 at 3:16
  • Try $xml =$dom->saveXML($dom->documentElement); Commented Dec 20, 2013 at 3:16
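
To illustrate the heredoc comment above, here is a minimal sketch of what the heredoc in the question actually produces. The value assigned to $xml is a made-up placeholder, not taken from the real file. The variable is interpolated, but strtolower is never called; it is just part of the string.

```php
<?php
// Placeholder content, assumed for illustration only.
$xml = '<Title>Low Gear</Title>';

// Same shape as the heredoc in the question.
$xml2 = <<<XML
echo strtolower($xml);
XML;

// $xml is interpolated, but "echo strtolower(...)" stays literal text,
// so $xml2 is not the lowercased XML and not valid XML either.
var_dump($xml2 === 'echo strtolower(<Title>Low Gear</Title>);'); // bool(true)
```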

1 Answer

When you run your document through strtolower, this is what ends up happening (remember, you're still passing around a string at this point, not a DOMDocument):

<freshvideos>
    <video>
        <id>
            <![cdata[ 4f1a6a21e779d227eaff33de8f571f95 ]]>
        </id>
        <title>
            <![cdata[ new england snowstorm - \"low gear\" ]]>
        </title>
        <ensub>
            <![cdata[ i put it in low gear and take it slow. ]]>
        </ensub>
        <cnsub>
            <![cdata[ 我挂了抵挡,慢慢开。 ]]>
        </cnsub>

        <filesrc>
            <![cdata[ videos/new england snowstorm low gear.mp4 ]]>
        </filesrc>
    </video>
</freshvideos>

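XML is case-sensitive, so your opening CDATA delimiter is no longer valid once it's been lowercased to <![cdata[. The parser no longer recognizes the start of a CDATA section, which is presumably why it reports an invalid element name, and the closing ]]> then shows up in ordinary content, where it is not allowed. A CDATA section must start with exactly <![CDATA[ and end with ]]>, and nothing else.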
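
One way around this is to leave the markup alone and lowercase only the text you compare against. The following is a rough sketch, not a definitive implementation: searchVideos is a hypothetical helper name, the choice to search only title, ensub and cnsub is an assumption, and it relies on the mbstring extension so the Chinese subtitles are handled as UTF-8. A video matches when every keyword appears somewhere in those fields.

```php
<?php
// Trimmed copy of the sample from the question, inlined so the sketch runs on its own.
$sampleXml = <<<'XML'
<freshvideos>
    <video>
        <id><![CDATA[ 4f1a6a21e779d227eaff33de8f571f95 ]]></id>
        <title><![CDATA[ New England Snowstorm - \"Low Gear\" ]]></title>
        <ensub><![CDATA[ I put it in low gear and take it slow. ]]></ensub>
        <cnsub><![CDATA[ 我挂了抵挡,慢慢开。 ]]></cnsub>
        <filesrc><![CDATA[ videos/New England Snowstorm Low Gear.mp4 ]]></filesrc>
    </video>
</freshvideos>
XML;

// Hypothetical helper: ids of all <video> elements whose searchable
// fields contain every keyword of $query, ignoring case.
function searchVideos(DOMDocument $dom, string $query): array
{
    // Lowercase the query and split it on whitespace.
    $keywords = preg_split('/\s+/u', mb_strtolower(trim($query), 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
    $ids = [];
    if (!$keywords) {
        return $ids;
    }
    foreach ($dom->getElementsByTagName('video') as $video) {
        // Collect the text of the searchable fields (assumed: title, ensub, cnsub).
        $parts = [];
        foreach (['title', 'ensub', 'cnsub'] as $field) {
            $node = $video->getElementsByTagName($field)->item(0);
            if ($node !== null) {
                $parts[] = $node->textContent; // includes CDATA content
            }
        }
        // Only this copy of the text is lowercased; the document is untouched.
        $haystack = mb_strtolower(implode(' ', $parts), 'UTF-8');

        $matchesAll = true;
        foreach ($keywords as $keyword) {
            if (mb_strpos($haystack, $keyword, 0, 'UTF-8') === false) {
                $matchesAll = false;
                break;
            }
        }
        $idNode = $video->getElementsByTagName('id')->item(0);
        if ($matchesAll && $idNode !== null) {
            $ids[] = trim($idNode->textContent);
        }
    }
    return $ids;
}

$dom = new DOMDocument;
$dom->loadXML($sampleXml);
echo implode(', ', searchVideos($dom, 'new England gear')), "\n";
```

For the real file you would call $dom->load("freshvideos.xml") instead of loadXML on the inline sample.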


2 Comments

Hi @Zarazthuztra! You are right! How blind and stupid I was! Thank you!
No worries dude, we hit small issues like this all the time. "Missing semicolon" errors that are so easy to diagnose they're hard :) It's going to be a little more complicated, but I'd recommend grabbing each CDATA text node that you want, running strtolower on its content, and then writing the lowercased string back into the existing CDATA node. Check out this as a good starting point for working with CDATA: php.net/manual/en/class.domcdatasection.php
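
The suggestion in the comment above could look roughly like this. It is a sketch under a few assumptions: only title and ensub are lowercased (so that id and filesrc keep their original values), mb_strtolower from the mbstring extension is swapped in for strtolower so that non-ASCII text is treated as UTF-8, and the inline sample stands in for freshvideos.xml.

```php
<?php
// Trimmed sample, inlined so the sketch is self-contained.
$sampleXml = <<<'XML'
<freshvideos>
    <video>
        <id><![CDATA[ 4f1a6a21e779d227eaff33de8f571f95 ]]></id>
        <title><![CDATA[ New England Snowstorm - \"Low Gear\" ]]></title>
        <ensub><![CDATA[ I put it in low gear and take it slow. ]]></ensub>
        <filesrc><![CDATA[ videos/New England Snowstorm Low Gear.mp4 ]]></filesrc>
    </video>
</freshvideos>
XML;

$dom = new DOMDocument;
$dom->loadXML($sampleXml);
$xpath = new DOMXPath($dom);

// In XPath, text() selects CDATA sections as well as plain text nodes.
// Only the node content is changed; the <![CDATA[ ... ]]> delimiters
// are written by the serializer and keep their required upper case.
foreach ($xpath->query('//video/title/text() | //video/ensub/text()') as $textNode) {
    $textNode->data = mb_strtolower($textNode->data, 'UTF-8');
}

$lowerXml = $dom->saveXML();

// The result is still well-formed, so it can be parsed again.
$lowerCaseDom = new DOMDocument;
$reloaded = $lowerCaseDom->loadXML($lowerXml);

echo $lowerXml;
```

Because the element names and CDATA delimiters are never passed through a lowercase function, the second loadXML call no longer trips over the markup.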
