0

I am trying to fix the first row of a CSV file. If a column name in the header starts with anything other than a-z, NUM has to be prepended. The following code replaces the special characters in each column name of the first row, but it does not catch names that start with a character outside a-z.

import csv
import glob

path = 'test.csv'

for fname in glob.glob(path):

    with open(fname, newline='') as f:
        reader = csv.reader(f)
        header = next(reader) 
        header = [column.replace ('-','_') for column in header]
        header = [column.replace ('[!a-z]','NUM') for column in header]

What am I doing wrong? Any suggestions would be appreciated. Thanks.

1
  • str.replace does not take regex patterns. You want re.sub instead. Commented Oct 24, 2017 at 17:30

3 Answers 3

1

You can do it like this.

# csv file:
# 2HELLO,?WORLD
# 1,2

import csv

with open("test.csv", newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    print("Original header", header)
    # Replace a non-letter first character with "NUM"; keep all other names unchanged
    header = ["NUM" + col[1:] if not col[0].isalpha() else col for col in header]
    print("Modified header", header)

Output:

Original header ['2HELLO', '?WORLD']
Modified header ['NUMHELLO', 'NUMWORLD']

The above list comprehension is equivalent to the following for loop:

for indx in range(len(header)):
    if not header[indx][0].isalpha():
        header[indx] = "NUM" + header[indx][1::]

If you want to replace only a leading digit, use this condition instead:

if header[indx][0].isdigit():

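Note that this replaces the first character with NUM. To prepend NUM and keep that character, use "NUM" + header[indx] instead. If your requirements change, you can adapt the condition with other string functions; see https://docs.python.org/2/library/string.html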
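
To get the modified header back into a file, the data rows have to be written out along with it. The following is a minimal sketch, assuming the result goes to a separate output file; the function name fix_header and both paths are hypothetical and not part of the question. It uses the same rule as the loop above and leaves empty column names untouched, which is an assumption.

```python
import csv


def fix_header(src_path, dst_path):
    """Copy src_path to dst_path with a cleaned-up header row (hypothetical helper)."""
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        # Swap a non-letter first character for "NUM".
        # Empty names are skipped so that col[0] cannot raise an IndexError.
        header = ["NUM" + col[1:] if col and not col[0].isalpha() else col
                  for col in header]
        writer.writerow(header)
        writer.writerows(reader)  # copy the remaining rows unchanged
    return header
```

Writing to a second file rather than overwriting the input keeps the original intact if something goes wrong partway through.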

0

I believe you would want to replace the 'column.replace' portion with something along these lines:

re.sub(r'[!a-z]', 'NUM', column)

The full documentation reference is here for specifics: https://docs.python.org/2/library/re.html https://www.regular-expressions.info/python.html
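
For reference, here is a sketch of how re.sub could be applied to each column name. Inside a character class, negation is written with ^ rather than !, and a ^ outside the class anchors the match to the start of the string, so only the first character is tested. The sample header list is made up for illustration.

```python
import re

# Made-up sample header for illustration
header = ["2hello", "?world", "name", "Upper"]

# ^([^a-z]) captures a first character that is not a lowercase ASCII letter;
# the replacement puts NUM in front of it and keeps the character via \1.
fixed = [re.sub(r'^([^a-z])', r'NUM\1', column) for column in header]
print(fixed)  # ['NUM2hello', 'NUM?world', 'name', 'NUMUpper']
```

Because the class only covers a-z, a name starting with an uppercase letter also gets the prefix; use [^A-Za-z] if that is not wanted.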

3 Comments

And the ! needs to be replaced with ^.
Well, I replaced it with header = re.sub('[^a-z]', 'NUM', str(header)), but now each column name is split up and the pieces end up in separate columns.
You would need to do something like this to make the re.sub() approach work, applied to each column name rather than to the whole list: [re.sub(r'^([^a-z])', r'NUM\1', column) for column in header]

0

Since you said you want to prepend 'NUM', you could do something like this (which could be more efficient, but this shows the basic idea).

import string

column = '123'

if column[0] not in string.ascii_lowercase:
    column = 'NUM' + column

# column is now 'NUM123'
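
Applied to a whole header row, the same check could be wrapped in a small helper. This is only a sketch: the name prepend_num is hypothetical, and leaving empty column names unchanged is an assumption, since the question does not say how they should be handled.

```python
import string


def prepend_num(column):
    """Prepend 'NUM' unless the name starts with a lowercase ASCII letter."""
    # Slicing (column[:1]) instead of indexing avoids an IndexError on empty names.
    first = column[:1]
    if first and first not in string.ascii_lowercase:
        return 'NUM' + column
    return column


header = ['123', 'name', '_id', '']
header = [prepend_num(column) for column in header]
print(header)  # ['NUM123', 'name', 'NUM_id', '']
```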
