Re-format items inside list read from CSV file in Python

Question

I have some lines in a CSV file like this:

1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2

If you notice, some numbers are enclosed in " " and has a thousand separator ",". I want to remove the thousand separator and the double quote enclosure. For the qoute enclosure, I'm thinking of using string.replace() but how about the comma inside the quote marks?

What's the best way of doing this in Python?

Dumb Guy · Accepted Answer · 2009-12-08 03:48:48Z

2

You could simply parse the CSV, make the necessary changes and then write it again.

(I haven't tested this code but it should be something like this)

import csv
reader = csv.reader(open('IN.csv', 'r'))
writer = csv.writer(open('OUT.csv', 'w')
for row in reader:
 # do stuff to the row here
 # row is just a list of items
 writer.writerow(row)

answered Dec 8, 2009 at 3:48

Dumb Guy

3,44623 silver badges23 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Dumb Guy Over a year ago

I can't comment on the other post, but if you replace all the commas, you'll also destroy all the CSV commas and it won't be a CSV file anymore.

thebat Over a year ago

Definitely the way to go. Use csv module in the standard library.

FrancisV Over a year ago

@Dumb Guy, that's why I wanted to remove the commas inside the quote marks and not anywhere else. Thanks for the tip!

FrancisV Over a year ago

Other than re-writing the CSV file using csv.writer, is there another way?

Dumb Guy Over a year ago

You could make it into comma separated string by using newrow = ','.join(row), which should work if you've removed all the commas in the strings.

|

Joel Bender · Accepted Answer · 2009-12-08 04:06:17Z

2

Here is a bit of regular expression fiddling that will do the trick:

>>> import re
>>> p = re.compile('["]([^"]*)["]')
>>> x = """1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2"""
>>> p.sub(lambda m: m.groups()[0].replace(',',''), x)
'1000001234,Account Name,0,0,3711.32,0,0,18629.64,22340.96,COD,20000.00,Some string,Some string 2'

Removes the commas from the parts of the string that is between pairs of quotes.

answered Dec 8, 2009 at 4:06

Joel Bender

8854 silver badges8 bronze badges

Comments

Alex Martelli · Accepted Answer · 2009-12-08 03:49:00Z

1

If all you want is to remove double quotes and commas from a string, a couple of replaces will do it:

s = s.replace('"','').replace(',','')

A faster way is to use s.translate, but that requires a minimum of preparation:

import string
identity = string.maketrans('', '')

...

s = s.translate(identity, '",')

This removes any occurrence of double quotes or commas, and does it pretty fast too. In general, the .translate method of string objects is the best way to remove certain kinds of characters from a string (as well as possibly performing some character-to-character translation, but, by using a translate table such as the identity one I show here, the translation part may in fact be easily bypassed). Note that .translate works a bit differently for Unicode objects (and therefore for Python 3 strings, too) -- I'm giving the approach that's suitable for plain Python 2 string objects.

answered Dec 8, 2009 at 3:49

Alex Martelli

887k175 gold badges1.3k silver badges1.4k bronze badges

2 Comments

FrancisV Over a year ago

But this would also remove the occurence of the commas outside of the quote marks, right?

Alex Martelli Over a year ago

@Francis, yes, it removes every comma within a field (use the csv module to parse the line into fields -- the comma-removal from individual fields is a follow-on step).

YOU · Accepted Answer · 2009-12-08 04:13:52Z

Here is something I just tested, you may not need pprint, I just want to use for clear output.

test.csv

1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2
1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2

Code, use csv reader, and pass each item to parseNum function to check valid digit or not.

from pprint import pprint
import csv

def parseNum(x):
    xx=x.replace(",","")
    if not xx.replace(".","").isdigit(): return x
    return "." in xx and float(xx) or int(xx)

x=[map(parseNum,line) for line in csv.reader(open("test.csv"))]

pprint(x)

Output

[[1000001234,
  'Account Name',
  0,
  0,
  3711.3200000000002,
  0,
  0,
  18629.639999999999,
  22340.959999999999,
  'COD',
  20000.0,
  'Some string',
  'Some string 2'],
 [1000001234,
  'Account Name',
  0,
  0,
  3711.3200000000002,
  0,
  0,
  18629.639999999999,
  22340.959999999999,
  'COD',
  20000.0,
  'Some string',
  'Some string 2']]

Note: If you need good precision on float numbers, replace float with Decimal

eric.christensen · Accepted Answer · 2009-12-08 05:27:54Z

1

Use the csv module. It has all sorts of constants and parameters to help you set the delimiters, quotes, and everything else for the type of file you are working with. It even has a Sniffer that can help you identify the csv format of the file. In fact this is the only module I have found that can properly and easily work with csv files.

http://docs.python.org/library/csv.html

answered Dec 8, 2009 at 5:27

eric.christensen

3,2514 gold badges32 silver badges35 bronze badges

Comments

Robert Rossney · Accepted Answer · 2009-12-08 06:08:12Z

You should absolutely use the csv module. If you use a csv.reader, you only have one very small problem: testing fields to see if they're numbers, and stripping commas if they are. I've packaged it as a generator:

import csv

def read_and_fix_numbers(f):
    """Iterate over a file object that returns CSV data, stripping commas out of numbers."""
    for row in csv.reader(f):
        for field in row:
            try:
                x = float(field)
                field.replace(",", "")
            except ValueError:
                pass
            fixed.append(field)
        yield fixed

Usage:

>>> data = '1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2'
>>> import StringIO
>>> f = StringIO.StringIO(data)
>>> for row in read_and_fix_numbers(f):
        print row
['1000001234', 'Account Name', '0', '0', '3711.32', '0', '0', '18629.64', '22340.96', 'COD', '20000.00', 'Some string', 'Some string 2']

Collectives™ on Stack Overflow

Re-format items inside list read from CSV file in Python

6 Answers 6

6 Comments

Comments

2 Comments

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

6 Comments

Comments

2 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related