Here's my problem. I have a dataframe with x columns and y rows. Some columns contain lists. I want to expand each of those columns into multiple columns, each holding a single value.

An example speaks for itself:

My dataframe:

            ans_length ans_unigram_numbers  ...  levenshtein_dist  que_entropy
0             [19, 14]             [12, 8]  ...              9.00     3.189898
1                 [19]                [12]  ...              4.00     3.189898
2                  [0]                 [0]  ...            170.00     4.299996
3                  [0]                 [0]  ...            170.00     4.303341
4                  [0]                 [0]  ...            170.00     4.304335
5                  [0]                 [0]  ...            170.00     4.311820
28                [56]                [23]  ...             24.00     4.110291
29                 [0]                 [0]  ...             56.00     4.181720
...                ...                 ...  ...               ...          ...
1976              [24]                [11]  ...             24.00     3.084963
1977              [24]                [11]  ...             24.00     3.084963
1992  [31, 24, 32, 28]    [14, 15, 17, 11]  ...             18.75     3.292770
1993  [31, 24, 32, 28]    [14, 15, 17, 11]  ...             18.75     3.292770

[1998 rows x 9 columns]

What I expect:

    ans_length_0    ans_length_1    ans_length_2    ans_length_3    \
0             19              14            
1             19                
2              0                
3              0                
4              0                
5              0                
28            56                
29             0                
1976          24                
1977          24                
1992          31              24               32             28    
1993          31              24               32             28    

ans_unigram_numbers_0   ans_unigram_numbers_1   ans_unigram_numbers_2   ans_unigram_numbers_3   \
                   12                       8           
                   12               
                   0                
                   0                
                   0                
                   0                
                   23               
                   0                
                   11               
                   11               
                   14                      15                      17                      11   
                   14                      15                      17                      11   

levenshtein_dist    que_entropy
               9       3.189898
               4       3.189898
             170       4.299996
             170       4.303341
             170       4.304335
             170        4.31182
              24       4.110291
              56        4.18172
              24       3.084963
              24       3.084963
            18.75       3.29277
            18.75       3.29277

Each newly generated column should take the name of the original column, with an index appended to it.

2 Answers

I think you can use:

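import pandas as pd

cols = ['ans_length', 'ans_unigram_numbers']

# Expand each list column; reuse df's index so rows stay aligned, and add '_' to get names like ans_length_0
df1 = pd.concat([pd.DataFrame(df[x].values.tolist(), index=df.index).add_prefix(x + '_') for x in cols], axis=1)
df = pd.concat([df1, df.drop(cols, axis=1)], axis=1)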
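
To see what this approach produces, here is a minimal sketch on a made-up three-row frame (the values and the index `[0, 1, 1992]` are hypothetical, chosen only to mimic the ragged lists and the gaps in the index shown in the question). It assumes the expanded frame should keep the original index and that names should follow the `ans_length_0` pattern asked for.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: ragged lists, non-contiguous index
df = pd.DataFrame(
    {
        "ans_length": [[19, 14], [19], [31, 24, 32, 28]],
        "ans_unigram_numbers": [[12, 8], [12], [14, 15, 17, 11]],
        "levenshtein_dist": [9.0, 4.0, 18.75],
    },
    index=[0, 1, 1992],
)

cols = ["ans_length", "ans_unigram_numbers"]

# One sub-frame per list column: each list position becomes its own column.
# Passing index=df.index keeps the rows aligned with the original frame.
expanded = pd.concat(
    [
        pd.DataFrame(df[c].tolist(), index=df.index).add_prefix(c + "_")
        for c in cols
    ],
    axis=1,
)

# Put the expanded columns in front of the untouched scalar columns
result = pd.concat([expanded, df.drop(columns=cols)], axis=1)
print(result)
```

Rows whose lists are shorter than the longest one are padded with NaN, so the expanded columns end up as floats rather than integers.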

1 Comment

Worked perfectly, with an execution time of 2 seconds for 2000 rows :D Short and fast, thanks a lot. I had been working on this for 4 hours.

Based on @jezrael's answer, I wrote a function that does what is asked, given a dataframe and a list of columns:

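import pandas as pd

def flattencolumns(df1, cols):
    # Expand each list column into indexed columns, reusing df1's index so rows stay aligned
    df = pd.concat([pd.DataFrame(df1[x].values.tolist(), index=df1.index).add_prefix(x + '_') for x in cols], axis=1)
    return pd.concat([df, df1.drop(cols, axis=1)], axis=1)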
