1

This is my excel input

enter image description here

This is my expected output

I am expecting all possible combinations for all comma seperated values of each columns into separate rows

enter image description here

Current work

df = pd.DataFrame()
for file in files:
   if file.endswith('.xlsx'):
       df = df.append(pd.read_excel('downloads/' + file), ignore_index=True) 
df.head() 
df.to_excel(r'downloads/merged.xlsx')

df.type_c = df.type_c.str.split(',')
df1 = df.explode('type_c') 

df1.language_c = df1.language_c.str.split(',')
df1.explode('language_c')

Here I am exploding multiple columns, Can I get this done in single command, where it can do this exploding for all columns without specifying? OR should it run through a loop for all columns which has ',' in it?

2

2 Answers 2

1

Can just make it a definition.

def explodePandas(files):
    global df, df1 # If needed
    for file in files:
        if file.endswitch('.xlsx'):
            df = df.append(pd.read_excel('downloads/' + file), ignore_index = True)
    df.head()

    df.to_excel(r'downloads/merged.xlsx')

    df.type_c = df.type_c.str.split(',')
    df1 = df.explode('type_c') 

    df1.language_c = df1.language_c.str.split(',')
    df1.explode('language_c')

explodePandas()
Sign up to request clarification or add additional context in comments.

1 Comment

i dont want to specify every column name, I might have 20 other columns like this but would like explode function on it
0

Maybe this approach could assist you:

Generating working dataframe

  • Code

    dfx = pd.DataFrame(
        data={
            'scope': ['internal'],
            'type_c': ['bmm, pitcher'],
            'subtype_c': ['ad experiment'],
            'language_c': ['en, esp'],
        }
    )
    
  • Result

    dfx
    
        scope       type_c          subtype_c       language_c
    0   internal    bmm, pitcher    ad experiment   en, esp
    

Exploding rows based on entries that contain a comma

  • Code

    ## Iterable of all column names in dataframe
    cols = dfx.columns
    
    ## Looping through every column to split and make cross join
    for col in cols:
    
        ## Exploding a column into various columns using ',' as a separator
        dfx2 = dfx[col].str.split(pat=',', expand=True).T
    
        ## Renaming the obtained exploded result to match the original column name
        dfx2.rename(columns={0: col}, inplace=True)
    
        ## Dropping the processed column from the original dataframe
        dfx.drop(col, axis=1, inplace=True)
    
        ## Conducting a cross join between the 'exploded column' and the original dataframe
        dfx = pd.merge(
            left=dfx,
            right=dfx2,
            how='cross',
        )
    
    ## Ensuring that you only keep the columns from the original list
    ### Note: this is a 'hard-coded' solution to deal with the additional columns obtained with the cross join
    dfx = dfx.loc[:, cols].copy()
    
  • Result

    dfx
    
        scope   type_c          subtype_c       language_c
    0   internal    bmm         ad experiment   en
    1   internal    bmm         ad experiment   esp
    2   internal    pitcher     ad experiment   en
    3   internal    pitcher     ad experiment   esp
    

Hope this approach helps you!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.