
Hi, I'm currently trying to parse some HTML news for our new fan page, because the company doesn't offer an RSS feed.

I have a new JS file that contains this:

function getNews() {
      // The iframe's DOM is reached through contentWindow.document
      // (or iframe.contentDocument), not through contentWindow itself
      var doc = document.getElementById('news').contentWindow.document;
      var tables = doc.getElementsByTagName('table');
      var news = new Array(7);
      var y = 0; // each news item appears to span two tables, so step by 2
      for (var i = 0; i < news.length; i++)
      {
            var table = tables[y];
            var news_content = [
                  table.rows[0].cells[0].getElementsByTagName('img')[0].src,
                  table.rows[0].cells[1].getElementsByTagName('span')[0].innerHTML,
                  table.rows[0].cells[2].getElementsByTagName('span')[0].innerHTML,
                  table.rows[1].cells[0].getElementsByTagName('p')[0].innerHTML,
                  table.rows[0].cells[0].getElementsByTagName('a')[0].href
            ];
            news[i] = news_content.join("\n") + "\n";
            y = y + 2;
      }
      alert(news.join("\n")); // show every collected item, not only the first five
}

and this HTML:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script src="test.js"></script>
</head>

<body>
<a href="page.html" onclick="getNews(); return false;">Click here</a>
<iframe id="news" src="http://www.aerosoft-shop.com/list_news.php?cat=fs&amp;lang=de"></iframe>
</body>
</html>

Finally: if I paste the news page's source code directly into my HTML file, it works. But is there no way to parse an external page?

1 Answer


If you debug your code with a tool like Firebug, you'll get an error message like this: Permission denied to access property 'getElementsByTagName'
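If you want the page to fail gracefully instead of stopping at that error, one option is to catch it before parsing. This is a minimal sketch; the helper name getIframeDocument is hypothetical, and the exact error text differs between browsers (older Firefox reported "Permission denied", current browsers throw a DOMException named "SecurityError").

```javascript
// Hypothetical helper: return the iframe's document, or null if the browser
// refuses access because the frame is on a different origin.
function getIframeDocument(iframe) {
  try {
    // For a cross-origin frame, contentDocument is null in current browsers,
    // and reaching through contentWindow.document throws instead.
    return iframe.contentDocument || iframe.contentWindow.document || null;
  } catch (e) {
    // The cross-origin access was blocked; report "no document" to the caller.
    return null;
  }
}
```

In the page itself you would call it as getIframeDocument(document.getElementById('news')) and skip the parsing (or show a message) when it returns null.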

It's indeed not possible in JavaScript to access an iframe that points to a different domain: the browser's same-origin policy blocks it. A subdomain of your own domain counts as a different origin too, although according to the comment on this answer you can work around that with document.domain. The real question here is whether the site owner wants you to crawl the site, or has at least given you permission, because being scraped by other sites is generally not welcome (traffic and possible copyright problems).
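The origin comparison behind this rule can be sketched with the standard URL API: two pages share an origin only if scheme, host and port all match, while path and query string are ignored. The function name isSameOrigin and the host www.example.com are hypothetical placeholders, and this sketch does not handle opaque origins such as file: URLs.

```javascript
// Hypothetical helper illustrating the same-origin rule.
function isSameOrigin(urlA, urlB) {
  // URL#origin serializes scheme + host + port and drops default ports (80/443).
  // Note: opaque origins (e.g. file: URLs) serialize as "null" and are not handled here.
  return new URL(urlA).origin === new URL(urlB).origin;
}

// The fan page (hypothetical host) versus the shop's news page from the question.
var fanPage = "http://www.example.com/page.html";
var newsPage = "http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de";
console.log(isSameOrigin(fanPage, newsPage)); // false
```

Because the two hosts differ, the browser treats the iframe as foreign content, which is why the same script works once the markup is pasted into your own page.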


2 Comments

Actually, accessing content from a different subdomain (of the same domain) is possible if you add document.domain = "yourdomain.com"; to both documents.
Thanks for the clarification; I edited my answer and pointed to your comment.
