There are several problems:
- getElementsByTagName apparently returns a single node, not an array, so it would not work if you have more than one article tag on the page. Instead, use find, which does return an array;
- But once you make that switch, you cannot call find on the result of find (which is an array), so you should either call it on each individual matched article tag or, better, pass a combined selector as the argument to find;
- Main issue: you must retrieve the text content of the node explicitly with ->plaintext; otherwise you get the object representation of the node, with all its attributes and internals;
- Some of the text contains HTML entities, such as the entity for the ’ character (for instance &rsquo;). These can be decoded with html_entity_decode.
So this code should work:
$a = array();
foreach ($html->find('article h2') as $h2) { // any h2 within article
    $a[] = html_entity_decode($h2->plaintext);
}
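A side note on the html_entity_decode call above: its default flags depend on the PHP version (before PHP 8.1 the default left single-quote entities such as &#039; undecoded), so passing the flags explicitly may give more predictable results. Here is a minimal sketch of that; the sample strings are made up for illustration and are not taken from the page in question:

```php
<?php
// Sample strings are invented for illustration only.
$samples = array(
    'It&rsquo;s here',          // named entity for the right single quote
    'Tom &amp; Jerry',          // ampersand entity
    'Rock &#039;n&#039; roll',  // numeric entity for the apostrophe
);

$decoded = array();
foreach ($samples as $text) {
    // ENT_QUOTES also decodes single-quote entities;
    // ENT_HTML5 selects the HTML5 entity table; the output is UTF-8.
    $decoded[] = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

print_r($decoded);
```

If the default flags already give you the text you expect, the plain one-argument call is fine.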
Using array_map, you could also do it like this:
$a = array_map(function ($h2) { return html_entity_decode($h2->plaintext); },
               $html->find('article h2'));
If you need to retrieve other tags within articles as well, to store their texts in different arrays, then you could do as follows:
$a = array();
$b = array();
foreach ($html->find('article') as $article) {
    foreach ($article->find('h2') as $h2) {
        $a[] = html_entity_decode($h2->plaintext);
    }
    foreach ($article->find('h3') as $h3) {
        $b[] = html_entity_decode($h3->plaintext);
    }
}
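If the list of tags grows, the same idea could be wrapped in a small helper that uses the combined selector once per tag. This is only a sketch: collect_article_texts is a hypothetical name of my own, not part of the parser library, and it assumes $html is the parsed document object used in the snippets above (anything with a find method whose results expose plaintext):

```php
<?php
// Hypothetical helper, not part of the parser library.
// Returns an array keyed by tag name, each holding the decoded texts
// of that tag found inside article elements.
function collect_article_texts($html, array $tags)
{
    $result = array();
    foreach ($tags as $tag) {
        $result[$tag] = array();
        // Combined selector: any $tag element that is a descendant of an article.
        foreach ($html->find('article ' . $tag) as $node) {
            $result[$tag][] = html_entity_decode($node->plaintext);
        }
    }
    return $result;
}

// Usage, assuming $html already holds the parsed page:
// $texts = collect_article_texts($html, array('h2', 'h3'));
// $a = $texts['h2'];
// $b = $texts['h3'];
```

This trades the nested loops for one find call per tag, which keeps the code short when more tags are added.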