I have a webpage that prints out CSV data via a CGI script, and I want to parse that data with Python. So far I know that I can use urllib to request the URL and fetch the page as one giant byte string. However, it contains much more than the CSV data I need: HTML tags, newlines, and so on. What I would like to do with this data is filter rows and columns. The result will eventually go into another CSV file, which I can use as the data source for graphs (Highcharts).

How can I parse the HTML for just the CSV? And is there a library that can gather the CSV into dictionaries or, even better, a CSV file?

Thanks

  • Scrapy maybe? scrapy.org – Commented Apr 18, 2013 at 22:16
  • Thanks for the suggestion. It looks like Scrapy could definitely work. Unfortunately, this will be a lot more work than I imagined to simply filter rows and columns from a webpage :( – Commented Apr 18, 2013 at 22:22
  • Yes, direct DB access would make things much easier. – Commented Apr 18, 2013 at 22:43

1 Answer


Try:

1) Use urllib, as you mentioned, to fetch the page.

2) Use Beautiful Soup to extract the part of the document that holds the CSV.

3) Use the standard csv module or pandas to parse the data from the previous step (see the sketch below).
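Here is a minimal sketch putting those three steps together. The URL, the assumption that the CGI script wraps the CSV in a `<pre>` tag, and the column names are all placeholders; adjust them to match the actual page.

```python
import csv
import io
import urllib.request

from bs4 import BeautifulSoup

# 1) Fetch the page (hypothetical URL; replace with your CGI endpoint).
url = "http://example.com/cgi-bin/report.cgi"
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8")

# 2) Extract the CSV text from the HTML. This assumes the script wraps
# the CSV in a <pre> tag; change the selector to match your markup.
soup = BeautifulSoup(html, "html.parser")
csv_text = soup.find("pre").get_text()

# 3) Parse the CSV into dictionaries keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Filter rows and columns (column names here are placeholders).
wanted_columns = ["date", "value"]
filtered = [
    {col: row[col] for col in wanted_columns}
    for row in rows
    if row["value"]  # e.g. keep only rows with a non-empty value
]

# Write the filtered data out to a new CSV file for the charts.
with open("filtered.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=wanted_columns)
    writer.writeheader()
    writer.writerows(filtered)
```

If you prefer pandas, `pandas.read_csv(io.StringIO(csv_text))` loads the same text into a DataFrame, where row and column filtering is a one-liner before writing out with `DataFrame.to_csv`.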
