I'm trying to fetch all the visible text from a website using python-scrapy. However, I've observed that Scrapy only works with HTML tags such as div, body, and head, and not with AngularJS tags such as ng-view. If there is any element inside the ng-view tags, its content doesn't appear when I right-click the page and choose View Source; it shows up as <ng-view> </ng-view>. How can I use Python to scrape the elements within these ng-view tags? Thanks in advance.

1 Answer

To answer your question:

how can I use Python to scrape the elements within these ng-view tags

With Scrapy alone, you can't.

The content you want to scrape is rendered on the client side (in the browser). What Scrapy gets you is just the static content from the server. Your browser then interprets the HTML and runs the JS code, and the JS code then fetches more content from the server and renders it into the page.
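To make the difference concrete, here is a minimal sketch using only Python's standard library. The two HTML strings are made-up samples (not from any real site): one stands in for what the server sends, the other for what the browser holds after Angular has run. The helper name `ng_view_text` is also just an illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample: what a static fetch (e.g. Scrapy) would receive.
STATIC_HTML = "<html><body><h1>Shop</h1><ng-view> </ng-view></body></html>"
# Hypothetical sample: the same page after the browser ran the JS.
RENDERED_HTML = (
    "<html><body><h1>Shop</h1>"
    "<ng-view><p>Blue widget</p></ng-view></body></html>"
)


class NgViewText(HTMLParser):
    """Collects the text found between <ng-view> and </ng-view>."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "ng-view":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "ng-view":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def ng_view_text(html):
    """Return the stripped text inside the ng-view element."""
    parser = NgViewText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(repr(ng_view_text(STATIC_HTML)))    # ''
print(repr(ng_view_text(RENDERED_HTML)))  # 'Blue widget'
```

The static response really does contain the ng-view tag; it is just empty until the JS fills it in.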

Can it be done?

Yes!

One way is to use some sort of headless browser, such as http://phantomjs.org/, to fetch all the content. Once you have the content, you can save it and scrape it as you wish. The catch is that this kind of web scraping is not as easy and straightforward as scraping regular HTML. There is a reason why Google still doesn't scrape web pages that render their content via JS.
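If you would rather stay in Python, the headless-browser step could look roughly like the sketch below. It assumes Selenium with headless Chrome in place of PhantomJS, which is my substitution and not something the approach requires. The function names, the `ng-view, [ng-view]` selector, and the timeout values are all illustrative assumptions.

```python
import time


def make_headless_chrome():
    """Create a headless Chrome driver (assumes selenium and Chrome are installed)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, driver_factory=make_headless_chrome,
                        selector="ng-view, [ng-view]", timeout=10.0, poll=0.25):
    """Load url in a browser, wait for the selected element to get text,
    and return the rendered page source."""
    driver = driver_factory()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            # Ask the browser for the text inside the first matching element.
            text = driver.execute_script(
                "var el = document.querySelector(arguments[0]);"
                "return el ? el.textContent : null;",
                selector,
            )
            if text and text.strip():
                break
            time.sleep(poll)
        # On timeout this still returns whatever the page holds at that point.
        return driver.page_source
    finally:
        driver.quit()


# Usage (the URL is a placeholder):
# html = fetch_rendered_html("https://example.com/app")
```

The polling loop is there because Angular fills the view some time after the page load completes, so reading the page source straight after `get` may still give you the empty tag.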

4 Comments

Is there any way I can do it using Python? I don't have any experience working with JS.
You can take the content rendered in the headless browser, save it to your system, and then scrape the content with Python.
@min2bro If you want to use Python, you can use Selenium. It has a Python client driver: seleniumhq.org/download
@AlesMaticic Is there a way to do it within the client browser? I have a Chrome extension that reads a page on the domain via XHR, but I'm running into the same issue. When I pass the resulting source to jQuery, it doesn't contain the dynamic <div ng-view> content from Angular. Can I $compile it or something to get the full page code?
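
For the save-then-scrape route suggested in the comments above, pulling the visible text out of the saved HTML might look like this. It is a rough sketch with the standard library only; the list of skipped tags and the name `visible_text` are my assumptions, and it does not account for elements hidden via CSS.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text, ignoring tags whose content is not shown on the page."""

    SKIP = {"head", "script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with runs of whitespace collapsed to single spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical saved page, standing in for the rendered HTML you stored.
saved = (
    "<html><head><title>T</title><style>p{}</style></head><body>"
    "<h1>Shop</h1><ng-view><p>Blue   widget</p>"
    "<script>var x = 1;</script></ng-view></body></html>"
)
print(visible_text(saved))  # Shop Blue widget
```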
