
I have a web scraper that saves the scraped data into a CSV file. The data looks like this:

random text
Johm May
1234 Big Street
Atlanta, GA 30331
acre .14 small
random text
Jane Jones
4321 Little Street
Atlanta, GA 30322
acre .07 small
random text

I would like to:

(1) Add in the columns Name,Street,,Address <--- Note that this sample is delimited by a comma.

(2) I would like to add commas to the address results I posted above. An example would be:

jane jones
,4321 Little Street
,,Atlanta, GA 30344
,,,acre .07 small
,,,random text

Note how the commas are used to push each line to the desired column with the unneeded data acre .07 small and random text being pushed away from the named columns.

How do I do this in Python? I can do it by hand, but I'm dealing with thousands of addresses and I need a simple way to do this in Python.

Is it possible to pull all the data into a list after it has been scraped, assign a variable for each run of commas (like a = , b = ,, c = ,,,), join the right variable to each line in the list, and then save it again?

Also, I need to add the column info as well: columns Name,Street,,Address

  • I think you'll need to clarify your question quite a bit. It sounds like you want a "sparse" CSV file where each row only has one column filled. And I guess "acre .07 small" and "random text" are both supposed to go in your "Address" column? Commented Oct 6, 2012 at 16:27

1 Answer

I'm just guessing what you mean on a lot of this, since your question seems to be missing some details, but this should get you something similar to what you want:

import csv

# Python 3: open the output in text mode with newline='' so the csv
# module controls the line endings itself.
with open('data.txt', 'r') as f, open('data.csv', 'w', newline='') as csv_out:
    line_iter = (l.rstrip('\n') for l in f)
    writer = csv.writer(csv_out)
    writer.writerow(['Name', 'Street', '', 'Address'])
    try:
        next(line_iter)    # discard 'random text' (?)
        while True:
            writer.writerow([next(line_iter), '', '', ''])   # name
            writer.writerow(['', next(line_iter), '', ''])   # street
            writer.writerow(['', '', next(line_iter), ''])   # city, state, zip
            writer.writerow(['', '', '', next(line_iter)])   # 'acre ...' line
            writer.writerow(['', '', '', next(line_iter)])   # 'random text'
    except StopIteration:
        pass        # reached end of file

It gives this output for your example data above:

Name,Street,,Address
Johm May,,,
,1234 Big Street,,
,,"Atlanta, GA 30331",
,,,acre .14 small
,,,random text
Jane Jones,,,
,4321 Little Street,,
,,"Atlanta, GA 30322",
,,,acre .07 small
,,,random text
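
The same idea can also be written as a small reusable function for Python 3. This is only a sketch under the same guess about the layout (one leading line to skip, then five lines per record); the names sparse_rows, to_sparse_csv, COLUMN_PATTERN and HEADER are made up for illustration and are not part of the csv module.

```python
import csv
import io

# Assumed layout of each five-line record, as a column index per line:
# name, street, city/state/zip, 'acre ...' line, trailing 'random text'.
COLUMN_PATTERN = (0, 1, 2, 3, 3)
HEADER = ['Name', 'Street', '', 'Address']


def sparse_rows(lines, pattern=COLUMN_PATTERN, width=4):
    """Yield one row per input line, with the text placed in its column."""
    it = iter(lines)
    next(it, None)  # skip the leading 'random text' line, if there is one
    for i, text in enumerate(it):
        row = [''] * width
        row[pattern[i % len(pattern)]] = text.rstrip('\n')
        yield row


def to_sparse_csv(lines):
    """Return the sparse CSV, header included, as a single string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(HEADER)
    writer.writerows(sparse_rows(lines))
    return buf.getvalue()


sample = [
    'random text', 'Johm May', '1234 Big Street', 'Atlanta, GA 30331',
    'acre .14 small', 'random text',
]
print(to_sparse_csv(sample))
```

To process a real file, pass the open file object instead of the list, for example to_sparse_csv(open('data.txt')), and write the returned string to data.csv. If your records are not always five lines long, the pattern approach will drift out of step and you would need some other way to detect where a record starts.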

2 Comments

Why not use f directly, instead of line_iter?
I could do that, but since file.readline() doesn't raise an exception at end of file (it just returns an empty string), I'd have to add code around each read to check for EOF and break.
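
To illustrate the point about readline(), here is a rough sketch of what reading with readline() directly might look like. The function name read_records and the size parameter are hypothetical, and unlike the answer's code this version drops an incomplete record at the end of the file instead of writing it out.

```python
import io


def read_records(f, size=5):
    """Collect complete records of `size` lines using readline()."""
    records = []
    f.readline()  # discard the leading 'random text' line
    while True:
        record = []
        for _ in range(size):
            line = f.readline()
            if line == '':       # readline() signals EOF with an empty string
                return records   # an incomplete trailing record is dropped
            record.append(line.rstrip('\n'))
        records.append(record)


# Small in-memory example; a real run would pass open('data.txt') instead.
demo = io.StringIO('random text\nA\nB\nC\nD\nE\nF\n')
print(read_records(demo))
```

The explicit check after every readline() call is the extra code the comment refers to; with an iterator and next(), the StopIteration exception does that job in one place.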
