3

I've searched for quite a while, but I haven't found a similar question. If there is one, please let me know!

I am trying to split one dataframe into n dataframes, one for each column of the original dataframe other than the first. Every resulting dataframe must keep the first column of the original dataframe. As an extra, I would like to gather them all together in a list, for example, for further access.

To illustrate what I mean, here is a brief example:

 >> original df

 GeneID   A      B      C      D      E
   1     0.3    0.2    0.6    0.4    0.8
   2     0.5    0.3    0.1    0.2    0.6
   3     0.4    0.1    0.5    0.1    0.3
   4     0.9    0.7    0.1    0.6    0.7
   5     0.1    0.4    0.7    0.2    0.5

My desired output would be something like this:

 >> df1

 GeneID   A
   1     0.3 
   2     0.5
   3     0.4
   4     0.9
   5     0.1

 >> df2

 GeneID   B
    1    0.2
    2    0.3
    3    0.1
    4    0.7
    5    0.4


 ....

And so on, until all the columns of the original dataframe are covered. What would be the best solution?

3 Answers

1

You can use df.columns to get all column names and then create sub-dataframes:

outdflist = []
# for each column after the first:
for col in oridf.columns[1:]:
    # create a sub-dataframe with GeneID and the current column:
    subdf = oridf[['GeneID', col]]
    # append the sub-dataframe to the list:
    outdflist.append(subdf)

# to view all dataframes created: 
for df in outdflist:
    print(df)

Output:

   GeneID    A
0       1  0.3
1       2  0.5
2       3  0.4
3       4  0.9
4       5  0.1
   GeneID    B
0       1  0.2
1       2  0.3
2       3  0.1
3       4  0.7
4       5  0.4
   GeneID    C
0       1  0.6
1       2  0.1
2       3  0.5
3       4  0.1
4       5  0.7
   GeneID    D
0       1  0.4
1       2  0.2
2       3  0.1
3       4  0.6
4       5  0.2
   GeneID    E
0       1  0.8
1       2  0.6
2       3  0.3
3       4  0.7
4       5  0.5

The for loop above can also be written more simply as a list comprehension:

outdflist = [ oridf[['GeneID', col]] 
              for col in oridf.columns[1:] ]
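
For a version that runs on its own, here is a minimal sketch that rebuilds the sample frame from the question and takes the key column from `oridf.columns[0]` instead of hard-coding 'GeneID'. The variable name `key` is my own addition; everything else follows the code above.

```python
import pandas as pd

# Sample data copied from the question.
oridf = pd.DataFrame({
    "GeneID": [1, 2, 3, 4, 5],
    "A": [0.3, 0.5, 0.4, 0.9, 0.1],
    "B": [0.2, 0.3, 0.1, 0.7, 0.4],
    "C": [0.6, 0.1, 0.5, 0.1, 0.7],
    "D": [0.4, 0.2, 0.1, 0.6, 0.2],
    "E": [0.8, 0.6, 0.3, 0.7, 0.5],
})

# 'key' is a hypothetical name for the first column, kept in every sub-dataframe.
key = oridf.columns[0]

# One two-column DataFrame per remaining column.
outdflist = [oridf[[key, col]] for col in oridf.columns[1:]]

print(len(outdflist))  # 5
print(outdflist[1])    # the GeneID/B sub-dataframe
```

Taking the key from the column position means the same line works if the first column is later renamed.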

1 Comment

Thanks, it works just fine. However, I was trying to do it without looping. Wen's answer does it perfectly.
1

You can do this with groupby:

d={'df'+ str(x): y for x , y in df.groupby(level=0,axis=1)}
d
Out[989]: 
{'dfA':      A
 0  0.3
 1  0.5
 2  0.4
 3  0.9
 4  0.1, 'dfB':      B
 0  0.2
 1  0.3
 2  0.1
 3  0.7
 4  0.4, 'dfC':      C
 0  0.6
 1  0.1
 2  0.5
 3  0.1
 4  0.7, 'dfD':      D
 0  0.4
 1  0.2
 2  0.1
 3  0.6
 4  0.2, 'dfE':      E
 0  0.8
 1  0.6
 2  0.3
 3  0.7
 4  0.5, 'dfGeneID':    GeneID
 0       1
 1       2
 2       3
 3       4
 4       5}
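
Two things are worth noting about the output above: GeneID ends up in its own frame rather than next to each value column, and `groupby(..., axis=1)` is, as far as I know, deprecated in recent pandas releases (2.1 and later). The following is a sketch of one alternative, not the method used above: it moves GeneID into the index, iterates over the remaining columns with `DataFrame.items()`, and restores GeneID as a column with `reset_index()`. It assumes `df` holds the sample data from the question, and the `'df' + name` keys simply mirror the naming above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "GeneID": [1, 2, 3, 4, 5],
    "A": [0.3, 0.5, 0.4, 0.9, 0.1],
    "B": [0.2, 0.3, 0.1, 0.7, 0.4],
    "C": [0.6, 0.1, 0.5, 0.1, 0.7],
    "D": [0.4, 0.2, 0.1, 0.6, 0.2],
    "E": [0.8, 0.6, 0.3, 0.7, 0.5],
})

# With GeneID as the index, items() yields (column name, Series) pairs;
# reset_index() turns each Series back into a GeneID + value DataFrame.
d = {
    "df" + str(name): series.reset_index()
    for name, series in df.set_index("GeneID").items()
}

print(list(d))   # ['dfA', 'dfB', 'dfC', 'dfD', 'dfE']
print(d["dfB"])
```

If a list is preferred over a dictionary, `list(d.values())` gives the frames in column order.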

4 Comments

Thank you! Very brief and simple. Is there any way to store the new dataframes in a list instead of a dictionary?
@JoãoFernandes you just need [ y for x , y in df.groupby(level=0,axis=1)]
@JoãoFernandes is this what you need?
Yes, it is! Thank you!
0

You can build a list of the column names, loop over it, and create a new DataFrame on each iteration.

>>> import pandas as pd
>>> d = {'col1':[1,2,3], 'col2':[3,4,5], 'col3':[6,7,8]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2  col3
0     1     3     6
1     2     4     7
2     3     5     8
>>> newstuff=[]
>>> columns = list(df)
>>> for column in columns:
...     newstuff.append(pd.DataFrame(data=df[column]))

Unless your dataframe is unreasonably massive, the code above should do the job.
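
As for how it works: `list(df)` returns the column labels, `df[column]` selects one column as a Series, and `pd.DataFrame(data=...)` wraps that Series in a one-column DataFrame. As written, the loop does not carry the first column along. A minimal sketch of a variant that does, assuming the same toy `df`; the name `first` is my own:

```python
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [3, 4, 5], 'col3': [6, 7, 8]}
df = pd.DataFrame(data=d)

columns = list(df)   # iterating a DataFrame yields its column labels
first = columns[0]   # hypothetical name for the column to keep in every piece

newstuff = []
for column in columns[1:]:
    # Selecting with a list of labels returns a DataFrame, not a Series.
    newstuff.append(df[[first, column]])

print(len(newstuff))  # 2
print(newstuff[0])    # col1 and col2
```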

1 Comment

It would help if you could explain how the code works.
