I'm enquiring about efficiently assigning column headers to CSV files with a comma delimiter. At the moment I'm manually assigning the headers once I know how many columns there are. The problem is that the number of columns varies between files.

The first DataFrame below has 3 columns, which I assign via the following.

import pandas as pd

d = ({
    'Col 1' : ['X','Y'],  
    'Col 2' : ['A','B'], 
    'Col 3' : ['C','D'],        
    })

df = pd.DataFrame(data=d)

df.columns = ['A','B','C']

If I use the same code on a df with only two columns (shown after the message), it raises an error:

ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements

d = ({
    'Col 1' : ['X','Y'],  
    'Col 2' : ['A','B'],    
     })

df = pd.DataFrame(data=d)

df.columns = ['A','B','C']

I understand this is because there are only 2 columns. I'm asking about efficiently assigning headers A-n.

I know it's not hard to alter df.columns to ['A','B'], but doing this multiple times a day becomes very inefficient.

  • Why use spaces as the dictionary keys when you can put the actual column names there instead? d = {'A':[...], 'B':[...]}. By the way, after you edited your dictionary, it has only one value now (because all three keys are identical). Commented Aug 22, 2018 at 3:25
  • I don't know how you're getting the DataFrame from what you've written, but in the second scenario, you're trying to assign a third column name to a DataFrame with only 2 columns. Commented Aug 22, 2018 at 3:38
  • @DYZ. @aydow. I've amended the input data. I basically want to iterate through the file and insert column headers A-n, with n being the final header. At the moment I'm manually counting the columns. Commented Aug 22, 2018 at 4:07
  • Looks like you are solving the wrong problem. What does the data in the file look like? Is it a CSV file? Commented Aug 22, 2018 at 4:09
  • Yep. I'm pulling the same data out of the files but the amount of extra content varies. So each file that's uploaded has to be manually labelled. Does this make sense? Commented Aug 22, 2018 at 4:15

1 Answer

Use a list comprehension, the built-in string module, and the length of your dictionary d:

import string
df.columns = [x for x in string.ascii_uppercase if ord(x) < ord("A") + len(d)]

This works because string.ascii_uppercase is the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
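Since the question is about CSV files rather than a dictionary, the same idea can be driven by the DataFrame's own column count. The following is only a sketch under some assumptions: the CSV text is made-up sample data standing in for a real file, the file has no header row (hence header=None), and there are at most 26 columns. It slices the alphabet instead of filtering with ord():

```python
import io
import string

import pandas as pd

# Hypothetical two-column CSV content standing in for a real file.
csv_text = "X,A\nY,B\n"

# header=None tells pandas that the file has no header row of its own.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Take as many uppercase letters as there are columns (up to 26).
df.columns = list(string.ascii_uppercase[:df.shape[1]])

print(df.columns.tolist())
```

Because the number of labels comes from df.shape[1], the same lines should work unchanged for files with different column counts.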

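If you need more than 26 column headers, you may also use string.ascii_letters or a similar constant. Note that string.ascii_letters starts with the lowercase letters, and the ord() comparison above only covers the uppercase range, so slice the constant to the required length instead (for example string.ascii_letters[:len(d)]).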
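For files wider than the available letters, one alternative (not part of the original answer) is to generate spreadsheet-style labels: A to Z, then AA, AB, and so on. The helper name spreadsheet_labels below is hypothetical, and this is a minimal sketch rather than a definitive implementation:

```python
import itertools
import string


def spreadsheet_labels(n):
    """Return n labels: A..Z, then AA, AB, ... (hypothetical helper)."""
    def generate():
        length = 1
        while True:
            # All letter combinations of the current length, in order.
            for letters in itertools.product(string.ascii_uppercase, repeat=length):
                yield "".join(letters)
            length += 1

    # Take only the first n labels from the endless generator.
    return list(itertools.islice(generate(), n))


labels = spreadsheet_labels(28)
print(labels[-3:])  # ['Z', 'AA', 'AB']
```

With a DataFrame, this could be applied as df.columns = spreadsheet_labels(df.shape[1]).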
