2

I want to extract some information from HTML using a DOM parser, but I'm stuck.

<div id="posts">
    <div class="post">
        <div class="user">me:</div>
        <div class="post">I am an apple</div>
    </div>
    <div class="post">
        <div class="user">you:</div>
        <div class="post">I am a banana</div>
    </div>
    <div class="post">
        <div class="user">we:</div>
        <div class="post">We are fruits</div>
    </div>
</div>

This will print the users.

$users= $html->find('div[class=user]');
foreach($users as $user)
    echo $user->innertext;

This will print the posts.

$posts = $html->find('div[class=post]');
foreach($posts as $post)
    echo $post->innertext;

I want to print them together rather than separately, like so:

me:
I am an apple
you:
I am a banana
we:
We are fruits

How can I do this using the parser?

3
  • DOMDocument maybe? Commented Jan 13, 2015 at 12:20
  • If your HTML is exactly as shown, you can use strip_tags() to get that output very simply. Commented Jan 13, 2015 at 12:35
  • strip_tags() will separate the markup from the text; what I want to do is parse it, though. Commented Jan 13, 2015 at 12:43

3 Answers

1

Using the markup you provided, you can select the children of the main div (div#posts) and loop over them. For each child, take its first and second children:

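foreach($html->find('div#posts', 0)->children() as $post) {
    // First child holds the user, second child holds the text.
    $user = $post->children(0)->innertext;
    $text = $post->children(1)->innertext; // separate name, so the loop variable is not overwritten
    echo $user . '<br/>' . $text . '<hr/>';
}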

That said, I would really suggest using DOMDocument for this:

$dom = new DOMDocument;
$dom->loadHTML($html_markup);
$xpath = new DOMXPath($dom);
// Only the post containers that are direct children of div#posts.
$elements = $xpath->query('//div[@id="posts"]/div[@class="post"]');
foreach($elements as $element) {
    // Evaluate relative to the current container and cast to string.
    $user = $xpath->evaluate('string(./div[@class="user"])', $element);
    $post = $xpath->evaluate('string(./div[@class="post"])', $element);
    echo $user . '<br/>' . $post . '<hr/>';
}
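
For anyone who wants to run the DOMDocument approach end to end, here is a minimal self-contained sketch. It assumes the markup from the question is held in a string; the names $markup and $lines are illustrative and not part of the original answer. Because each value is looked up by class rather than by position, extra sibling elements inside a container should not affect the result.

```php
<?php
// Sample markup copied from the question ($markup is an illustrative name).
$markup = <<<HTML
<div id="posts">
    <div class="post">
        <div class="user">me:</div>
        <div class="post">I am an apple</div>
    </div>
    <div class="post">
        <div class="user">you:</div>
        <div class="post">I am a banana</div>
    </div>
    <div class="post">
        <div class="user">we:</div>
        <div class="post">We are fruits</div>
    </div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$lines = [];
// Visit only the containers that are direct children of div#posts.
foreach ($xpath->query('//div[@id="posts"]/div[@class="post"]') as $container) {
    // Look up each part by class, relative to the current container.
    $lines[] = $xpath->evaluate('string(./div[@class="user"])', $container);
    $lines[] = $xpath->evaluate('string(./div[@class="post"])', $container);
}

// Prints user and post on alternating lines, as requested in the question.
echo implode("\n", $lines), "\n";
```

Note that [@class="post"] only matches when the class attribute is exactly "post"; elements carrying several classes would need a different predicate.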

2 Comments

Again, what if I don't know whether there are more children? I want to target those specific divs.
@GeorgeIrimiciuc The catch is: what if several divs share those classes? Since the markup you posted puts the values at fixed positions, going through children() seems the most direct route. You could also check the ->class property, e.g. ->class == "user".
1

Assuming that you are using Simple HTML DOM Parser, you can pass find() a comma-separated list of selectors. Try this:

$posts = $html->find('div.post');
foreach($posts as $post){
  $children = $post->find('div.user,div.post');
  foreach($children as $child){
    echo $child->class . ' -- ';
    echo $child->innertext;
    echo '<br>';
  }
}

Output

user -- me:
post -- I am an apple
user -- you:
post -- I am a banana
user -- we:
post -- We are fruits

3 Comments

This works for this format, but what if I don't know whether there are other children?
This will not make a difference between users and posts, though.
@GeorgeIrimiciuc $child->class will do.
0

Use the code below

$users = $html->find('div[class=user]');
$posts = $html->find('div[class=post]');
foreach($users as $i => $user) {
    echo $user->innertext . "<br>";
    echo $posts[$i]->innertext;
}

Hope this helps you

1 Comment

This will fail because there are more "div class=post" than there are div class=user. (very poor format design)
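
To make the mismatch concrete, here is a small sketch that counts the matches. It uses DOMXPath rather than Simple HTML DOM so that it runs without extra libraries, and it assumes the markup from the question (shortened to one line per container); the variable names are illustrative.

```php
<?php
// Markup from the question, compacted ($markup is an illustrative name).
$markup = <<<HTML
<div id="posts">
    <div class="post"><div class="user">me:</div><div class="post">I am an apple</div></div>
    <div class="post"><div class="user">you:</div><div class="post">I am a banana</div></div>
    <div class="post"><div class="user">we:</div><div class="post">We are fruits</div></div>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// Every div with class "user".
$userCount = $xpath->query('//div[@class="user"]')->length;
// Every div with class "post": the outer containers AND the inner text divs.
$allPostCount = $xpath->query('//div[@class="post"]')->length;
// Only the inner text divs, i.e. a div.post nested inside a div.post.
$innerPostCount = $xpath->query('//div[@class="post"]/div[@class="post"]')->length;

printf("users: %d, all posts: %d, inner posts: %d\n",
       $userCount, $allPostCount, $innerPostCount);
```

Matches come back in document order (container, inner div, container, ...), so pairing users with the full post list by index puts a whole container next to the first user. Restricting the selector to the nested divs gives three matches, which line up one-to-one with the users.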
