Efficiently assign columns headers to csv file

Question

I'm enquiring about efficiently assigning column headers to CSV files with a comma delimiter. At the moment time I'm manually assigning the headers once I know how many columns there are. The problem is, the number of columns varies with different files.

So the first Dataframe below has 3 columns. Which I assign via the following.

import pandas as pd

d = ({
    'Col 1' : ['X','Y'],  
    'Col 2' : ['A','B'], 
    'Col 3' : ['C','D'],        
    })

df = pd.DataFrame(data=d)

df.columns = ['A','B','C']

If I have the following df and I use the same code it will return an error.

ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements

d = ({
    'Col 1' : ['X','Y'],  
    'Col 2' : ['A','B'],    
     })

df = pd.DataFrame(data=d)

df.columns = ['A','B','C']

I understand this is because there are only 2 columns. I'm asking about efficiently assigning headers A-n.

I know it's not not hard to alter df.columns to ['A','B'] but if I'm doing this multiple times a day it becomes very inefficient.

Why use spaces as the dictionary keys when you can put the actual column names there instead? d = {'A':[...], 'B':[...]}. By the way, after you edited you dictionary, it has only one value now (bvecause all three keys are identical). — DYZ
– DYZ, Commented Aug 22, 2018 at 3:25
I don't know how you're getting the DataFrame from what you've written, but in the second scenario, you're trying to assign a third column name to a DataFrame with only 2 columns. — aydow
– aydow, Commented Aug 22, 2018 at 3:38
@DYZ. @aydow. I've amended the input data. I basically want to iterate through the file and insert Column headers A-n. With n being the final header. At the moment I'm manually counting this. — user9639519
– user9639519, Commented Aug 22, 2018 at 4:07
Looks like you are solving a wrong problem. How does the data in the file look like? Is it a CSV file? — DYZ
– DYZ, Commented Aug 22, 2018 at 4:09
Yep. I'm pulling the same data out of the files but the amount of extra content varies. So each file that's uploaded has to be manually labelled. Does this make sense? — user9639519
– user9639519, Commented Aug 22, 2018 at 4:15

MarianD · Accepted Answer · 2018-08-22 04:32:50Z

1

Use the list comprehension, string built-in module and the length of your dictionary d:

df.columns = ([x for x in string.ascii_uppercase if ord(x) < ord("A") + len(d)])

as string.ascii_uppercase is the 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' string.

You may also use string.ascii_letters or similar constans, if you need more than 26 column headers.

answered Aug 22, 2018 at 4:32

MarianD

14.4k12 gold badges50 silver badges61 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Efficiently assign columns headers to csv file

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related