1

I have some lines in a CSV file like this:

1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2

If you notice, some numbers are enclosed in " " and has a thousand separator ",". I want to remove the thousand separator and the double quote enclosure. For the qoute enclosure, I'm thinking of using string.replace() but how about the comma inside the quote marks?

What's the best way of doing this in Python?

6 Answers 6

2

You could simply parse the CSV, make the necessary changes and then write it again.

(I haven't tested this code but it should be something like this)

import csv
reader = csv.reader(open('IN.csv', 'r'))
writer = csv.writer(open('OUT.csv', 'w')
for row in reader:
 # do stuff to the row here
 # row is just a list of items
 writer.writerow(row)
Sign up to request clarification or add additional context in comments.

6 Comments

I can't comment on the other post, but if you replace all the commas, you'll also destroy all the CSV commas and it won't be a CSV file anymore.
Definitely the way to go. Use csv module in the standard library.
@Dumb Guy, that's why I wanted to remove the commas inside the quote marks and not anywhere else. Thanks for the tip!
Other than re-writing the CSV file using csv.writer, is there another way?
You could make it into comma separated string by using newrow = ','.join(row), which should work if you've removed all the commas in the strings.
|
2

Here is a bit of regular expression fiddling that will do the trick:

>>> import re
>>> p = re.compile('["]([^"]*)["]')
>>> x = """1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2"""
>>> p.sub(lambda m: m.groups()[0].replace(',',''), x)
'1000001234,Account Name,0,0,3711.32,0,0,18629.64,22340.96,COD,20000.00,Some string,Some string 2'

Removes the commas from the parts of the string that is between pairs of quotes.

Comments

1

If all you want is to remove double quotes and commas from a string, a couple of replaces will do it:

s = s.replace('"','').replace(',','')

A faster way is to use s.translate, but that requires a minimum of preparation:

import string
identity = string.maketrans('', '')

...

s = s.translate(identity, '",')

This removes any occurrence of double quotes or commas, and does it pretty fast too. In general, the .translate method of string objects is the best way to remove certain kinds of characters from a string (as well as possibly performing some character-to-character translation, but, by using a translate table such as the identity one I show here, the translation part may in fact be easily bypassed). Note that .translate works a bit differently for Unicode objects (and therefore for Python 3 strings, too) -- I'm giving the approach that's suitable for plain Python 2 string objects.

2 Comments

But this would also remove the occurence of the commas outside of the quote marks, right?
@Francis, yes, it removes every comma within a field (use the csv module to parse the line into fields -- the comma-removal from individual fields is a follow-on step).
1

Here is something I just tested, you may not need pprint, I just want to use for clear output.

test.csv

1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2
1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2

Code, use csv reader, and pass each item to parseNum function to check valid digit or not.

from pprint import pprint
import csv

def parseNum(x):
    xx=x.replace(",","")
    if not xx.replace(".","").isdigit(): return x
    return "." in xx and float(xx) or int(xx)

x=[map(parseNum,line) for line in csv.reader(open("test.csv"))]

pprint(x)

Output

[[1000001234,
  'Account Name',
  0,
  0,
  3711.3200000000002,
  0,
  0,
  18629.639999999999,
  22340.959999999999,
  'COD',
  20000.0,
  'Some string',
  'Some string 2'],
 [1000001234,
  'Account Name',
  0,
  0,
  3711.3200000000002,
  0,
  0,
  18629.639999999999,
  22340.959999999999,
  'COD',
  20000.0,
  'Some string',
  'Some string 2']]

Note: If you need good precision on float numbers, replace float with Decimal

Comments

1

Use the csv module. It has all sorts of constants and parameters to help you set the delimiters, quotes, and everything else for the type of file you are working with. It even has a Sniffer that can help you identify the csv format of the file. In fact this is the only module I have found that can properly and easily work with csv files.

http://docs.python.org/library/csv.html

Comments

1

You should absolutely use the csv module. If you use a csv.reader, you only have one very small problem: testing fields to see if they're numbers, and stripping commas if they are. I've packaged it as a generator:

import csv

def read_and_fix_numbers(f):
    """Iterate over a file object that returns CSV data, stripping commas out of numbers."""
    for row in csv.reader(f):
        for field in row:
            try:
                x = float(field)
                field.replace(",", "")
            except ValueError:
                pass
            fixed.append(field)
        yield fixed

Usage:

>>> data = '1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2'
>>> import StringIO
>>> f = StringIO.StringIO(data)
>>> for row in read_and_fix_numbers(f):
        print row
['1000001234', 'Account Name', '0', '0', '3711.32', '0', '0', '18629.64', '22340.96', 'COD', '20000.00', 'Some string', 'Some string 2']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.