Is it the right way to scrape other websites' content into my website using simple_html_dom? If it is wrong, please suggest a method to display news on my website.
Hmm... RSS feed? API? – odedta, May 14, 2015 at 5:51
I didn't know about that. Please guide me on how it works. – Vignesh Bala, May 14, 2015 at 5:53
I have actually never tried RSS feeds before, so I can't be of much help there, but it shouldn't be complicated, as the w3schools tutorial is rather short and straightforward: w3schools.com/webservices/rss_intro.asp As for an API (application programming interface), what I mean is that you can check whether that website already provides some kind of interface for developers like yourself to pull news from their site through documented functions. – odedta, May 14, 2015 at 5:54
It depends. An API is best, RSS/XML is second best, and scraping is third best. Scraping is the least stable since it is not a recognised mechanism for copying content, and you may find yourself blocked. To aid your long-term scraping, you should add a few seconds' delay between each scrape, read/parse/obey robots.txt, use a unique user agent string, and be willing to be blocked if that's what the site owner chooses. – halfer, May 14, 2015 at 6:03
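For illustration, here is a minimal PHP sketch of the kind of "polite" scraping the comment above describes: an identifiable user agent, a naive robots.txt check, and a delay between requests. The base URL, paths, and contact address are hypothetical placeholders, and the robots.txt handling is deliberately simplistic.

```php
<?php
// Minimal sketch of a "polite" scraper (hypothetical URLs throughout).

function fetch(string $url, string $userAgent): ?string {
    // Send an identifiable User-Agent header with the request.
    $context = stream_context_create([
        'http' => ['header' => "User-Agent: $userAgent\r\n"],
    ]);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

function isDisallowed(string $robotsTxt, string $path): bool {
    // Naive check: treat every "Disallow:" rule as applying to all agents.
    foreach (explode("\n", $robotsTxt) as $line) {
        if (preg_match('/^Disallow:\s*(\S+)/i', trim($line), $m)
            && strpos($path, $m[1]) === 0) {
            return true;
        }
    }
    return false;
}

$userAgent = 'MyNewsBot/1.0 (+https://example.com/contact)'; // example identifiable UA
$base      = 'https://news.example.com';                      // hypothetical site
$paths     = ['/headlines', '/world'];                        // hypothetical pages

$robots = fetch("$base/robots.txt", $userAgent) ?? '';

foreach ($paths as $path) {
    if (isDisallowed($robots, $path)) {
        continue; // obey robots.txt
    }
    $html = fetch($base . $path, $userAgent);
    if ($html !== null) {
        // ... parse $html here ...
    }
    sleep(3); // a few seconds between requests so the site is not overloaded
}
```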
1 Answer
simple_html_dom is a third-party library, I am guessing. If you are looking for something in core PHP (the built-in DOM extension), use DOMDocument.
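As a rough illustration of the DOMDocument approach, here is a minimal sketch that loads a page and pulls out headline elements with DOMXPath. The URL and the `h2.headline` markup are assumptions and would need to match the actual page.

```php
<?php
// Minimal sketch: parse HTML with PHP's built-in DOMDocument / DOMXPath.

$html = file_get_contents('https://news.example.com'); // hypothetical source

$doc = new DOMDocument();
// Real-world markup is rarely valid; suppress the resulting libxml warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Grab every <h2 class="headline"> element (assumed structure).
foreach ($xpath->query('//h2[@class="headline"]') as $node) {
    echo trim($node->textContent), PHP_EOL;
}
```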
Basically, by scraping you are taking the site's content. If you are doing it with the consent of the site's team, then it's okay; otherwise it may not be legal (depending on their T&C). Sites also have mechanisms to block such acts.
Better to ask the site team for the content; they may be able to provide the data in a much better and simpler way, such as an API, an RSS feed, or direct database access.
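If the site does offer an RSS feed, displaying news from it is straightforward with PHP's built-in SimpleXML. A minimal sketch, assuming a hypothetical RSS 2.0 feed URL:

```php
<?php
// Minimal sketch: render news items from an RSS 2.0 feed (hypothetical URL).

$feed = simplexml_load_file('https://news.example.com/rss.xml');

if ($feed === false) {
    exit('Could not load the feed.');
}

// A standard RSS 2.0 feed exposes its entries under channel->item.
foreach ($feed->channel->item as $item) {
    printf(
        "<article><h3><a href=\"%s\">%s</a></h3><p>%s</p></article>\n",
        htmlspecialchars((string) $item->link),
        htmlspecialchars((string) $item->title),
        htmlspecialchars((string) $item->description)
    );
}
```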
2 Comments
halfer
If you crawl in the open (i.e. without a proxy) and have an identifiable user agent string, and you don't overload the scrape target, that is what search engines do and is fine in most jurisdictions. However, republishing the data can sometimes be seen as a breach of copyright, depending on the attitude of the author (e.g. search engines OK, price comparison sites not).
halfer
Note that law is not created by terms and conditions, thankfully - law is created by legislators. T&Cs attempt to bind users into a contract they haven't signed, and how binding that is probably depends on the country in question. Usually sites opposed to scraping (e.g. large consumer auction sites) will send a stiff legal letter, which will be far too expensive to challenge in court. Building a scraper service that is not dependent on the scraping of one site is thus very good advice!