I am trying to merge multiple columns so that each column starts at a specific index, right after the previous column ends. For example, as you can see in the code below, I have 15 sets of data, from df20 to df90. In the code, I merge each dataframe i with another one starting from index = 1,000.

I want my output to be df20, followed by df25 starting at index=1000, then df30 starting at index=2000, then df35 starting at index=3000, and so on. I expected to see all 15 columns, but I only get one of them in my output.

I tried the code below, but it doesn't seem to work. Please help.

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]
for i in dframe:
  a = i.merge((i).set_index((i).index+1000), how='outer', left_index=True, right_index=True)

print(a)

Output:

                      df90_x              df90_y
0                     0.000757                      NaN
1                     0.001435                      NaN
2                     0.002011                      NaN
3                     0.002497                      NaN
4                     0.001723                      NaN
...                        ...                      ...
10995                      NaN             1.223000e-12
10996                      NaN             1.305000e-12
10997                      NaN             1.809000e-12
10998                      NaN             2.075000e-12
10999                      NaN             2.668000e-12

[11000 rows x 2 columns]

Expected Output:

                      df20                 df25                  df30
0                     0.000757             0                     0
1                     0.001435             0                     0
2                     0.002011             0                     0
3                     0.002497             0                     0
4                     0.001723             0                     0
...                  ...                   ...                   ...
1000                                      1.223000e-12           0
1001                                      1.305000e-12           0
1002                                      1.809000e-12           0
1003                                      2.668000e-12           0
...                                                              ...
2000                                                             0.1234
2001                                                             0.4567
2002                                                             0.8901
2003                                                             0.2345
  • That is doing what merge is expected to do. Try pd.concat(dframe, axis=1) Commented Aug 12, 2021 at 6:17
  • What would you like the output to be? Why is the output you got wrong? Explaining this would help answer a lot of questions I have about the format of your data. Commented Aug 12, 2021 at 6:22
  • @MarijnvanVliet I want my output to be df20, followed by df25 starting at index=1000, then df30 starting at index=2000, then df35 starting at index=3000. Commented Aug 12, 2021 at 6:34
  • @ThePyGuy I want my output to be df20, followed by df25 starting at index=1000, then df30 starting at index=2000, then df35 starting at index=3000. Commented Aug 12, 2021 at 6:35
  • If it's only the index you are concerned about, you can use pd.concat. Post a small sample from the dataframes, and also add the expected output for the sample data. Please take a look at How to ask and How to make good pandas example Commented Aug 12, 2021 at 6:37

2 Answers

You can try this code if you want the number of dataframes (num_dataframe) and their length (len_dataframe) to be variables:

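import random

import pandas as pd

num_dataframe = 3
len_dataframe = 5

# Frame i gets column name i and the index range
# i*len_dataframe .. (i+1)*len_dataframe - 1, so no two frames share a row label.
dframe = []
for i in range(num_dataframe):
    values = [random.randrange(1, 50) for _ in range(len_dataframe)]
    index = range(i * len_dataframe, (i + 1) * len_dataframe)
    dframe.append(pd.DataFrame({i: values}, index=index))

# Align the frames on their index, then replace the NaN gaps with 0.
result = pd.concat(dframe, axis=1).fillna(0)
print(result)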

For your question, with 20 dataframes of length 1000 each, you can try this:

import numpy as np
import pandas as pd

num_dataframe = 20
len_dataframe = 1000

# Same construction as above, with random floats in [0, 1) as placeholder data.
dframe = []
for i in range(num_dataframe):
    values = np.random.random(len_dataframe)
    index = range(i * len_dataframe, (i + 1) * len_dataframe)
    dframe.append(pd.DataFrame({i: values}, index=index))

# Align the frames on their index, then replace the NaN gaps with 0.
result = pd.concat(dframe, axis=1).fillna(0)
print(result)

As you mentioned in the comments, I edited the post and added this code:

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]

result = pd.concat([dframe[i] for i in range(len(dframe))], axis=0)

result.fillna(0)
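
Note that if every frame in the list starts at index 0, concatenating alone does not stagger them: the row labels have to be shifted first. The following is only a minimal sketch of one way to do that, assuming each frame has a default 0..n-1 index and its own column name. The three small frames and the value of `step` are hypothetical stand-ins for df20 ... df90 and the offset of 1000.

```python
import pandas as pd

# Hypothetical stand-ins for df20, df25, df30 (3 rows each instead of 1000).
df20 = pd.DataFrame({"df20": [0.1, 0.2, 0.3]})
df25 = pd.DataFrame({"df25": [1.1, 1.2, 1.3]})
df30 = pd.DataFrame({"df30": [2.1, 2.2, 2.3]})
dframe = [df20, df25, df30]
step = 3  # assumption: this would be 1000 for the frames in the question

# Shift the k-th frame's index by k * step so no two frames share a row label.
shifted = [df.set_index(df.index + k * step) for k, df in enumerate(dframe)]

# Align on the now disjoint index; cells without data become NaN, then 0.
result = pd.concat(shifted, axis=1).fillna(0)
print(result)
```

With the index shifted this way, each column holds its own values in its own block of rows and 0 everywhere else.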
6 Comments

So how exactly do I input my own dataframes? I see those are only random numbers. Can you enlighten me on this part please?
@KimYejun, I edited the post and added code as you requested, maybe this helps you.
I have tried your code but the data inside the dataframes were not shown. Everything is just zero :(
@KimYejun, I posted three code blocks, which code block did you run?
Yes, I did run everything. The output of the first code is what I want, but the values are random. The third code uses my dataframes from df20 to df90, but when I ran it, it only showed zeros, not the actual values in my dataframes :( I'm sorry, maybe I'm just really not good with coding

Please refer to the official documentation.


Concat multiple dataframes

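import pandas as pd

# Three frames with different column names and non-overlapping indexes.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({"B": ["B4", "B5"]}, index=[4, 5])
df3 = pd.DataFrame({"C": ["C6", "C7", "C8", "C9", "C10"]}, index=[6, 7, 8, 9, 10])

# axis=1 places the frames side by side, aligned on the index;
# rows that a frame does not cover are filled with NaN.
result = pd.concat([df1, df2, df3], axis=1)
print(result)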

Output:

      A    B    C
0    A0  NaN  NaN
1    A1  NaN  NaN
2    A2  NaN  NaN
3    A3  NaN  NaN
4   NaN   B4  NaN
5   NaN   B5  NaN
6   NaN  NaN   C6
7   NaN  NaN   C7
8   NaN  NaN   C8
9   NaN  NaN   C9
10  NaN  NaN  C10

Import file into a list via looping

Method 1: write all the filenames out in a list

filenames = ['sample_20.csv', 'sample_25.csv', 'sample_30.csv', ...]
dataframes = [pd.read_csv(f) for f in filenames]

Method 1-1: if you have lots of files, you need a faster way to build the list of names

# range() excludes its stop value, so use 95 to include sample_90.csv
filenames = ['sample_{}.csv'.format(i) for i in range(20, 95, 5)]
dataframes = [pd.read_csv(f) for f in filenames]

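Method 2: collect the filenames with glob

from glob import glob
# glob returns files in arbitrary order, so sort them to keep the order stable
filenames = sorted(glob('sample*.csv'))
dataframes = [pd.read_csv(f) for f in filenames]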
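
Frames read from files usually all start at index 0, so their indexes overlap until they are adjusted. As a rough sketch of that adjustment, assuming one data column per file, the example below writes three tiny CSV files to a temporary directory (made-up data, reusing the hypothetical sample_*.csv names), reads them back, and gives each frame an index that starts where the previous one ended.

```python
import os
import tempfile
from glob import glob

import pandas as pd

# Write three tiny CSV files so the sketch runs on its own (made-up data).
tmpdir = tempfile.mkdtemp()
for n, values in [(20, [0.1, 0.2]), (25, [1.1, 1.2, 1.3]), (30, [2.1])]:
    path = os.path.join(tmpdir, f"sample_{n}.csv")
    pd.DataFrame({f"df{n}": values}).to_csv(path, index=False)

filenames = sorted(glob(os.path.join(tmpdir, "sample*.csv")))
dataframes = [pd.read_csv(f) for f in filenames]

# Re-index each frame so it starts right after the previous frame's last row.
offset = 0
shifted = []
for df in dataframes:
    shifted.append(df.set_index(pd.RangeIndex(offset, offset + len(df))))
    offset += len(df)

# The indexes are now disjoint, so axis=1 gives the staggered layout.
result = pd.concat(shifted, axis=1)
print(result)
```

Using a running offset instead of a fixed step means the frames do not all need to have the same length.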

5 Comments

Thank you very much for this answer. Actually, I have tried something similar, but I have a lot of dataframes, around a thousand or more, to concat, so I was trying to figure out how to do it with a for loop instead.
I updated the post with how to build the list of dataframes in a loop.
Perhaps you need to arrange the index of each dataframe this way, making sure no indexes overlap, and then concat them all.
Sorry, but can you enlighten me on the list looping?
I updated the answer with how to loop files into a list. After this step, you might need to deal with the index of every single dataframe in the dataframes list.
