0

I have multiple CSV files that may represent the same thing in several ways. For instance, 15 years can be written as age: 15, age (years): 15, or age: 15 years (these are all the patterns I've seen so far). I'd like to replace all of those with 15 years. I know how to do it when I know the actual age or the column number, but the age differs for each occurrence and the column is not fixed. The CSV files could look like this:

CSV1:

h1,h2,h3
A1,age:15,hh
B3,age:10,fg

Desired CSV1

h1,h2,h3
A1,15 years,hh
B3,10 years,fg

Whenever it's just age: 15, it's definitely years and not months or any other unit.

6
  • For the age field, will just the numbers suffice? If so, you can use the str.translate method: create a table that maps letters to empty strings. An example is here Commented Jan 16, 2015 at 0:58
  • @b10n: That sounds like a good idea...except you've left out a lot of details -- so I suggest you post an answer with some actual code in it. Commented Jan 16, 2015 at 1:00
  • @dan: How do you propose determining which column to fix? Commented Jan 16, 2015 at 1:02
  • @martineau By "which column to fix" if you mean which column in the file to use, to be frank I have no answer because I have multiple files, each processed by researchers from all over the world. So the format is not the same. Commented Jan 16, 2015 at 1:06
  • @b10n I need "years" following the number. Commented Jan 16, 2015 at 1:09

3 Answers

1

Use re.sub as shown below:

re.sub(r'(,|^)(?:age\s*(?:\(years\))?:\s*(\d+)\s*(?:years)?)(?=,|$)',
       r'\1\2 years', string)

Example:

import re
import csv
with open('file') as f:
    reader = csv.reader(f)
    for i in reader:
        print(re.sub(r'(,|^)(?:age\s*(?:\(years\))?:\s*(\d+)\s*(?:years)?)(?=,|$)', r'\1\2 years', ','.join(i)))

Output:

h1,h2,h3
A1,15 years,hh
B3,10 years,fg

OR

for i in reader:
    print(re.sub(r'(,|^)[^,\n]*age\s*:[^,\n]*\b(\d+)\b[^,\n]*', r'\1\2 years', ','.join(i)))
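Joining each row back into one line means the pattern has to handle the commas itself, and any CSV quoting is lost on the way. A minimal sketch of the same substitution applied field by field and written back with csv.writer follows. The helper name normalize_age and the in-memory sample data are illustrative assumptions, not part of the answer above.

```python
import csv
import io
import re

# Matches a whole field such as "age:15", "age (years): 15" or "age: 15 years"
AGE_FIELD = re.compile(r'\s*age\s*(?:\(years\))?\s*:\s*(\d+)\s*(?:years)?\s*')

def normalize_age(field):
    """Return 'N years' if the whole field is an age entry, else the field unchanged."""
    match = AGE_FIELD.fullmatch(field)
    return match.group(1) + " years" if match else field

# Hypothetical sample standing in for one of the CSV files
sample = "h1,h2,h3\nA1,age:15,hh\nB3,age (years): 10,fg\n"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(io.StringIO(sample)):
    # Every field is checked, so the age column does not need to be known
    writer.writerow([normalize_age(field) for field in row])

result = out.getvalue()
print(result)
```

Because fullmatch must cover the entire field, headers and unrelated values pass through untouched.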

3 Comments

I don't think it needs to be that complicated. re.sub(r'age.*?: (\d{1,2})[^,]*', r'\1 years', text)
I made the regex to satisfy only these conditions: age: 15, age (years): 15, age: 15 years.
True, but long regular expressions can be really hard to debug. regex101.com/r/hK1uH1/5
1

Use the string translate-table methods (str.maketrans and str.translate).

import csv
from string import ascii_letters

# The third argument of str.maketrans lists the characters to delete
# (letters plus the punctuation and spaces seen in the age patterns)
tran = str.maketrans("", "", ascii_letters + ":() ")

with open("infile.csv", newline="") as infile, open("output.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # assuming the first row is a header, copy it unchanged
    writer.writerow(next(reader))
    for row in reader:
        # assuming the age is in the second field here
        row[1] = row[1].translate(tran) + " years"
        writer.writerow(row)

I generally prefer string.translate over regex where applicable as it's easier to follow and debug.
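
As a quick check of the translate idea on the three formats from the question, here is a minimal Python 3 sketch. It uses the three-argument form of str.maketrans, and it assumes that parentheses and spaces must be deleted along with the letters and the colon. The helper name strip_to_years is hypothetical.

```python
from string import ascii_letters

# The third argument of str.maketrans lists the characters to delete
DELETE_TABLE = str.maketrans("", "", ascii_letters + ":() ")

def strip_to_years(field):
    """Drop letters, ':', parentheses and spaces, then append ' years'."""
    return field.translate(DELETE_TABLE) + " years"

examples = ["age:15", "age (years): 15", "age: 15 years"]
cleaned = [strip_to_years(e) for e in examples]
print(cleaned)
```

Note that translate looks at characters only, not context, so it should be applied only to fields already known to hold an age.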

1 Comment

@martineau thanks for the heads up on translate. You were right.
0

It's a guessing game, but if the rule is that you want to convert any field that contains the word "year" or "age" and some decimal number, this should do.

import csv
import os
import re

_is_age_search = re.compile(r"year|age", re.IGNORECASE).search
_find_num_search = re.compile(r"(\d+)").search

outdir = '/some/dir'
# csv_filenames is assumed to be a list of input CSV paths defined elsewhere
for filename in csv_filenames:
    with open(filename, newline='') as f_in, open(os.path.join(outdir, filename), 'w', newline='') as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            for i, val in enumerate(row):
                if _is_age_search(val):
                    search = _find_num_search(val)
                    if search:
                        # group(1) is a string, so format with %s rather than %d
                        row[i] = "%s years" % search.group(1)
            writer.writerow(row)
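
Since this heuristic rewrites any field that contains "age" or "year" plus a number, it may be worth trying it on a few rows before running it over whole files. A minimal sketch, with convert_row as a hypothetical helper wrapping the same two regexes; the second row is made-up data showing that a field like stage 2 is rewritten as well.

```python
import re

_is_age_search = re.compile(r"year|age", re.IGNORECASE).search
_find_num_search = re.compile(r"(\d+)").search

def convert_row(row):
    """Apply the keyword-plus-number heuristic to every field of one row."""
    converted = []
    for val in row:
        # Only look for a number when the field mentions "year" or "age"
        found = _find_num_search(val) if _is_age_search(val) else None
        converted.append(found.group(1) + " years" if found else val)
    return converted

print(convert_row(["A1", "age (years): 15", "hh"]))  # ['A1', '15 years', 'hh']
print(convert_row(["B3", "stage 2", "fg"]))          # ['B3', '2 years', 'fg']
```

If false positives like the second row matter for your data, a stricter pattern anchored on the whole field is the safer choice.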

2 Comments

Thank you. It's just that the word "year" may or may not be there. However, I could try it with the word "age".
@dan - you're right. Added a regex search that can do multiple string compares.
