
Is it the right way to scrape other websites' content into my website using simple_html_dom? If it is wrong, please suggest a method for displaying news on my website.

  • Hmm... RSS feed? API? Commented May 14, 2015 at 5:51
  • I didn't know about that. Please guide me on how it works? Commented May 14, 2015 at 5:53
  • 1
    I actually never tried RSS feeds before so I can't be of much help there, but, it shouldn't be complicated as the w3schools tutorial is rather short and straightforward. w3schools.com/webservices/rss_intro.asp As for API, it is application interface, what I mean by that is you can check with that website if they already provide some kind of interface for developers like yourself to withdraw news from their site by use of some function. Commented May 14, 2015 at 5:54
  • 1
    It depends. An API is best, RSS/XML is second best, and scraping is third best. Scraping is the least stable since it is not a recognised mechanism to copy content, and you may find yourself blocked. To aid your long-term scraping, you should add a few seconds delay between each scrape, read/parse/obey robots.txt, use a unique user agent string, and be willing to be blocked if that's what the site owner chooses. Commented May 14, 2015 at 6:03

1 Answer


simple_html_dom is a third-party library, I am guessing. If you are looking for something in core PHP (a bundled extension), use DOMDocument.
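
For example, a core-PHP equivalent of a simple_html_dom scrape might look like the sketch below; the URL and the h2.title selector are made-up placeholders, not taken from the question.

```php
<?php
// Fetch the remote page (hypothetical URL).
$html = file_get_contents('http://example.com/news');

// Suppress warnings from real-world, slightly invalid HTML.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Query headlines with XPath; the element and class name are assumptions.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h2[@class="title"]') as $node) {
    echo trim($node->textContent), PHP_EOL;
}
```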

Basically, by scraping you are taking the site's content. If you do this with the site team's consent, it's okay; otherwise it may not be legal (it depends on their T&C). Sites also have mechanisms to block such acts.

Better to ask the site team for the content; they might be able to provide the data in a much better and simpler way, such as an API, an RSS feed, or direct database access.
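
If the site does publish an RSS feed, consuming it from PHP can be as simple as the sketch below. The feed URL is a placeholder, and a real feed may use different element names; this assumes the standard RSS 2.0 channel/item layout.

```php
<?php
// Load the feed (hypothetical URL). SimpleXML ships with core PHP.
$feed = simplexml_load_file('http://example.com/rss.xml');

if ($feed !== false) {
    // Standard RSS 2.0 layout: <channel> containing <item> elements
    // with <title>, <link>, and <description> children.
    foreach ($feed->channel->item as $item) {
        echo '<h3><a href="', htmlspecialchars($item->link), '">',
             htmlspecialchars($item->title), '</a></h3>';
        echo '<p>', htmlspecialchars($item->description), '</p>';
    }
}
```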


2 Comments

If you crawl in the open (i.e. without a proxy) and have an identifiable user agent string, and you don't overload the scrape target, that is what search engines do and is fine in most jurisdictions. However, republishing the data can sometimes be seen as a breach of copyright, depending on the attitude of the author (e.g. search engines OK, price comparison sites not).
Note that law is not created by terms and conditions, thankfully - law is created by legislators. T&Cs attempt to bind users into a contract they haven't signed, and how binding that is probably depends on the country in question. Usually sites opposed to scraping (e.g. large consumer auction sites) will send a stiff legal letter, which will be far too expensive to challenge in court. Building a scraper service that is not dependent on the scraping of one site is thus very good advice!

