Python Regex replace

Question

Hey I'm trying to figure out a regular expression to do the following.

Here is my string

Place,08/09/2010,"15,531","2,909",650

I need to split this string on the commas. However, because commas are also used inside the numerical data fields, the split doesn't work correctly. So I want to remove the commas in the numbers before splitting the string.

Thanks.

If none of the answers fits well enough, could you point out what is still missing?
– Morten Kristensen, Commented Nov 17, 2011 at 19:35

kjhughes · Accepted Answer · 2013-12-23 17:27:28Z

49

import re
new_string = re.sub(r'"(\d+),(\d+)"', r'\1.\2', original_string)

This replaces the comma inside each quoted number with a period (and drops the surrounding quotes, since they are matched but not written back), so you can then just use the string's split method.
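As a rough illustration of the substitution above followed by the split, here is a minimal sketch. The second half is not part of the answer: `strip_grouping` is a hypothetical helper name, and the callback-based pattern is a swapped-in variant that removes every comma inside a quoted number instead of turning one comma into a period.

```python
import re

original_string = 'Place,08/09/2010,"15,531","2,909",650'

# Substitution from the answer: the single comma inside each quoted number
# becomes a period, and the quotes disappear because they are not re-emitted.
new_string = re.sub(r'"(\d+),(\d+)"', r'\1.\2', original_string)
fields = new_string.split(',')
print(fields)  # ['Place', '08/09/2010', '15.531', '2.909', '650']

# Hypothetical variant (an assumption, not the answer's method): use a
# replacement function so that any number of comma groups is handled.
def strip_grouping(match):
    # match.group(1) is the quoted content, e.g. '548,122,909'
    return match.group(1).replace(',', '')

cleaned = re.sub(r'"(\d+(?:,\d+)*)"', strip_grouping, original_string)
cleaned_fields = cleaned.split(',')
print(cleaned_fields)  # ['Place', '08/09/2010', '15531', '2909', '650']
```

The variant only touches quoted fields made of digits and commas; quoted text fields containing commas would still break a plain split.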

edited Dec 23, 2013 at 17:27

kjhughes


answered Nov 17, 2011 at 19:29

spowers


8 Comments

Code Jockey Over a year ago

...unless the number is "123,456,789" - then it will be "123.456,789"

spowers Over a year ago

good point, I considered only the case where the , was the radix point in a valid number.

Ciarán Over a year ago

Thanks, yours is the correct solution I was looking for. Cheers!

Code Jockey Over a year ago

@Ciarán Careful to not have any number greater than 999,999 inside the quotes...

Code Jockey Over a year ago

The expression offered does not match "999,999,999" at all - you would need something with lookaround, like: "(\d+),(?=\d\d\d(,\d\d\d)*") (which should be replaced with "\1)


kojiro · Accepted Answer · 2011-11-17 19:16:09Z

26

>>> from StringIO import StringIO
>>> import csv
>>> r = csv.reader(StringIO('Place,08/09/2010,"15,531","2,909",650'))
>>> r.next()
['Place', '08/09/2010', '15,531', '2,909', '650']

answered Nov 17, 2011 at 19:16

kojiro


2 Comments

Ciarán Over a year ago

Though actually I now get this error Traceback (most recent call last): , line 19, in <module> from StringIO import StringIO ImportError: No module named StringIO

Acorn Over a year ago

In Python 3 it's io.StringIO: docs.python.org/py3k/library/…
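For readers on Python 3, the csv approach from this answer might look like the sketch below. It assumes only the standard library; the variable names `line`, `row`, and `numbers` are illustrative, and the final conversion step is an addition that the answer itself does not perform.

```python
import csv
from io import StringIO  # Python 3 location of StringIO

line = 'Place,08/09/2010,"15,531","2,909",650'

# csv.reader understands quoted fields, so commas inside quotes are kept.
reader = csv.reader(StringIO(line))
row = next(reader)  # Python 3 replacement for r.next()
print(row)  # ['Place', '08/09/2010', '15,531', '2,909', '650']

# Optional extra step (an assumption about what is wanted next):
# drop the thousands separators and convert the numeric fields.
numbers = [int(field.replace(',', '')) for field in row[2:]]
print(numbers)  # [15531, 2909, 650]
```

Since csv.reader accepts any iterable of lines, `csv.reader([line])` would also work without StringIO.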

Morten Kristensen · Accepted Answer · 2011-11-17 19:20:53Z

1

Another way of doing it using regex directly:

>>> import re
>>> data = "Place,08/09/2010,\"15,531\",\"2,909\",650"
>>> res = re.findall(r"(\w+),(\d{2}/\d{2}/\d{4}),\"([\d,]+)\",\"([\d,]+)\",(\d+)", data)
>>> res
[('Place', '08/09/2010', '15,531', '2,909', '650')]
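A possible follow-up, sketched under the assumption that every line has exactly this five-field layout: compile the same pattern, require the whole line to match, and convert the captured numbers. The names `pattern`, `place`, `date`, and `numbers` are illustrative only.

```python
import re

data = 'Place,08/09/2010,"15,531","2,909",650'

# Same pattern as above, written with single quotes to avoid the escapes.
pattern = re.compile(r'(\w+),(\d{2}/\d{2}/\d{4}),"([\d,]+)","([\d,]+)",(\d+)')

# fullmatch rejects lines with leading or trailing junk instead of
# silently matching a fragment of them.
match = pattern.fullmatch(data)
if match is None:
    raise ValueError("line does not have the expected layout")

place, date, *raw_numbers = match.groups()
numbers = [int(n.replace(',', '')) for n in raw_numbers]
print(place, date, numbers)  # Place 08/09/2010 [15531, 2909, 650]
```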

answered Nov 17, 2011 at 19:20

Morten Kristensen


dawg · Accepted Answer · 2013-12-23 19:11:48Z

You could parse a string of that format using pyparsing:

import pyparsing as pp
import datetime as dt

st='Place,08/09/2010,"15,531","2,909",650'

def line_grammar():
    integer=pp.Word(pp.nums).setParseAction(lambda s,l,t: [int(t[0])])
    sep=pp.Suppress('/')
    date=(integer+sep+integer+sep+integer).setParseAction(
              lambda s,l,t: dt.date(t[2],t[1],t[0]))
    comma=pp.Suppress(',')
    quoted=pp.Regex(r'("|\').*?\1').setParseAction(
              lambda s,l,t: [int(e) for e in t[0].strip('\'"').split(',')])
    line=pp.Word(pp.alphas)+comma+date+comma+quoted+comma+quoted+comma+integer
    return line

line=line_grammar()
print(line.parseString(st))
# ['Place', datetime.date(2010, 9, 8), 15, 531, 2, 909, 650]

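The advantage is that you parse, convert, and validate in a few lines. Note that the numeric fields are converted to ints and the date to a datetime.date object (read as day/month/year). Each quoted field is split at its commas, so "15,531" comes out as the two ints 15 and 531, not 15531.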

Robus · Accepted Answer · 2011-11-17 19:20:26Z

0

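a = """Place,08/09/2010,"15,531","2,909",650""".split(',')
result = []
i = 0
while i < len(a):
    if '"' not in a[i]:
        # Unquoted piece: it is already a complete field
        result.append(a[i])
    else:
        # Piece with an opening quote: re-join the following pieces until
        # the one with the closing quote (assumes every quoted field
        # contains at least one comma)
        string = a[i]
        i += 1
        while True:
            string += "," + a[i]
            if '"' in a[i]:
                break
            i += 1
        result.append(string)
    i += 1
print(result)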

Result:
['Place', '08/09/2010', '"15,531"', '"2,909"', '650']
Not a big fan of regular expressions unless you absolutely need them
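The same regex-free idea can also be sketched as a single character scan that tracks whether it is currently inside quotes. This is an alternative formulation, not the code above; `split_outside_quotes` and its parameters are hypothetical names, and escaped or doubled quotes are not handled.

```python
def split_outside_quotes(line, sep=',', quote='"'):
    """Split line on sep, ignoring separators that sit between quotes."""
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle the state and keep the quote character in the field
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A separator outside quotes ends the current field
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))  # the last field has no trailing separator
    return fields

result = split_outside_quotes('Place,08/09/2010,"15,531","2,909",650')
print(result)  # ['Place', '08/09/2010', '"15,531"', '"2,909"', '650']
```

Unlike the re-joining loop, this version does not care whether a quoted field contains a comma.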

answered Nov 17, 2011 at 19:20

Robus


Code Jockey · Accepted Answer · 2011-11-17 19:20:21Z

-1

If you need a regex solution, this should do:

r"(\d+),(?=\d\d\d)"

then replace with:

"\1"

It will replace any comma-delimited numbers anywhere in your string with their number-only equivalent, thus turning this:

Place,08/09/2010,"15,531","548,122,909",650

into this:

Place,08/09/2010,"15531","548122909",650

I'm sure there are a few holes to be found and places you don't want this done, and that's why you should use a parser!
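In Python, the expression and replacement above could be applied as in this sketch. The second call shows one of the holes: two adjacent unquoted numeric fields are merged, because the pattern pays no attention to quotes. The variable names are illustrative.

```python
import re

# Remove a comma that follows digits and is followed by at least three digits.
pattern = re.compile(r"(\d+),(?=\d\d\d)")

line = 'Place,08/09/2010,"15,531","548,122,909",650'
cleaned = pattern.sub(r"\1", line)
print(cleaned)  # Place,08/09/2010,"15531","548122909",650

# A hole: two separate unquoted fields are fused into one number.
merged = pattern.sub(r"\1", "1000,1234")
print(merged)  # 10001234
```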

Good luck!

answered Nov 17, 2011 at 19:20

Code Jockey


2 Comments

Code Jockey Over a year ago

Does the downvoter dare to tell me why my solution doesn't answer the question?

user97370 Over a year ago

The question is very unclear, but I think it was really asking how to ignore commas in double quotes (that is, how to parse a line of a csv file) and the example given was just one example of something that got parsed wrong by naive comma-splitting. The regexp fails on cases like '1000,1234' as well as ignoring double quotes. The downvoter wasn't me though, so consider this just my guess as to his or her motivation. The poster who suggested using the csv module clearly gave the right answer I think.
