5

I need some help. I have a CSV file that contains an address field, and whoever entered the data into the original database used commas to separate the different parts of the address, for example:

Flat 5, Park Street

When I try to use the CSV file, this one entry is treated as two separate fields when it is in fact a single field. I have used Python to strip out commas that sit between inverted commas, since those are easy to distinguish from commas that should actually be there, but this problem has me stumped.

Any help would be gratefully received.

Thanks.

  • The problem is not how it is stored in the database but how the CSV file was generated. If you still have access to the DB, use python's built-in CSV module to re-generate the CSV file. It will then have properly escaped string sequences. Commented Jan 3, 2013 at 17:49
  • Please show an actual sample of the data you're trying to read (so we can tell if it's quoted in any way), and say what technique you're using to "use" the CSV file. Commented Jan 3, 2013 at 17:49
  • The proper way to handle this is to enclose the strings in double-quotes. CSV readers treat commas within quoted strings as part of the string. Commented Jan 3, 2013 at 17:49
  • Are you talking about these? en.wiktionary.org/wiki/inverted_comma Commented Jan 3, 2013 at 17:50
  • Is the address format the same for every record? Meaning, does every line contain the same amount of "unwanted" commas? If yes you can fix this in a few lines with split, surrounding the whole address field with double-quotes - or simply edit the header line to use multiple fields for the address. Commented Jan 3, 2013 at 17:54
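
The split-based repair suggested in the last comment could look roughly like the sketch below. It assumes the address is the only field that contains stray commas and that you know how many fields each record should have. The helper name `rejoin_address`, the three-column layout, and the sample postcode are hypothetical and not taken from the question.

```python
def rejoin_address(line, n_fields, address_index):
    """Split a raw line on commas and merge the surplus pieces back into the address."""
    parts = line.rstrip("\n").split(",")
    extra = len(parts) - n_fields  # number of stray commas inside the address
    if extra < 0:
        raise ValueError("line has fewer fields than expected")
    # Everything from address_index up to and including the surplus pieces is the address.
    address = ",".join(parts[address_index:address_index + extra + 1])
    return parts[:address_index] + [address] + parts[address_index + extra + 1:]


# Hypothetical record: id, address, postcode
row = rejoin_address("1,Flat 5, Park Street,AB1 2CD", n_fields=3, address_index=1)
print(row)
```

This only works when exactly one field can contain extra commas. If two fields can, the line is ambiguous and no split can recover it.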

2 Answers

3

You can define the separating and quoting characters with Python's CSV reader. For example:

With this CSV:

1,`Flat 5, Park Street`

And this Python:

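import csv

# Python 3: open in text mode with newline='' so the csv module handles line endings
with open('14144315.csv', newline='') as csvfile:
    # Commas separate fields; backticks quote fields that contain commas
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print(row)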

You will see this output:

['1', 'Flat 5, Park Street']

This uses commas to separate values and the inverted comma (here a backtick) as the quote character, so a comma inside a quoted field is kept as part of that field.
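
If you would rather end up with a conventional double-quoted file, the same reader can feed a `csv.writer`. This is a minimal sketch that uses in-memory strings in place of real files; the sample row is the one from above, and the variable names are only illustrative.

```python
import csv
import io

# In-memory stand-ins for the input and output files
source = io.StringIO("1,`Flat 5, Park Street`\n")
target = io.StringIO()

reader = csv.reader(source, delimiter=",", quotechar="`")
# Writer defaults: double-quote as quotechar, quoting only where needed
writer = csv.writer(target, lineterminator="\n")
for row in reader:
    writer.writerow(row)

converted = target.getvalue()
print(converted)
```

With real files you would pass the file objects (opened with `newline=''`) instead of the `StringIO` buffers.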

1

The CSV file was not generated properly. CSV files should have some form of escaping of text, usually using double-quotes:

1,John Doe,"City, State, Country",12345

Some CSV exports do this to all fields (this is an option when exporting from Excel/LibreOffice), but ambiguous fields (such as those including commas) must be escaped.
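
As a rough illustration of those two export styles, Python's `csv.writer` can produce either one. This sketch assumes you can regenerate the file from the original data; the helper `to_csv_line` is just for the example.

```python
import csv
import io

row = [1, "John Doe", "City, State, Country", 12345]


def to_csv_line(row, quoting):
    """Write one row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\n").writerow(row)
    return buffer.getvalue()


minimal = to_csv_line(row, csv.QUOTE_MINIMAL)  # quote only fields that need it
everything = to_csv_line(row, csv.QUOTE_ALL)   # quote every field
print(minimal, everything, sep="")
```

`QUOTE_MINIMAL` is the writer's default, so a plain `csv.writer(f)` already quotes the fields that contain commas.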

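Either fix this manually or regenerate the CSV properly. If the fields were written without any quoting, this generally cannot be fixed programmatically, because nothing in the file distinguishes a comma inside a field from a delimiter.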

Edit: I just noticed something about "inverted commas" being used for escaping - if that is the case see Jason Sperske's answer, which is spot on.
