How can I use two pandas dataframes to create a new dataframe with specific rows from one dataframe?

Question

I am currently working with two sets of dataframes. Each set contains 60 dataframes. They are sorted to line up for mapping (e.g. set1 df1 corresponds with set2 df1). Each dataframe in the first set is about 27 rows x 2 columns; each dataframe in the second set is over 25,000 rows x 8 columns. I want to create a new dataframe that contains rows from the 2nd dataframe according to the values in the 1st dataframe.

For simplicity I've created a shortened example of the first df of each set to illustrate. I want to use the 797 to take the first 796 rows (indexes 0 - 795) from df2 and add them to a new dataframe, and then take rows 796 to 930 and put them in a 2nd new dataframe. Any suggestions on how I could do that for all 60 pairs of dataframes?

          0        1
0     797.0    930.0
1    1650.0   1760.0
2    2500.0   2570.0
3    3250.0   3333.0
4    3897.0   3967.0


0        -1    -2    -1    -3    -2    -1     2     0
1         0     0     0    -2     0    -1     0     0
2        -3     0     0    -1    -2    -1    -1    -1
3         0     1    -1    -1    -3    -2    -1     0
4         0    -3    -3     0     0     0    -4    -2

edit to add:

import pandas as pd

df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2), 
                    (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), 
                    (10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), 
                    (13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])


#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2), 
                    (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), 
                    (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2), 
                    (11, 0, 2, 3, 1, 0, 1, 2)])

Can you create a minimal, complete, and verifiable example? That means the first DataFrame should contain small numbers, like the pairs 2-4, 6-7, 8-10, and the second DataFrame should contain about 10 rows. Most importantly, can you add the expected output for the input data?
– jezrael, Commented Jan 23, 2020 at 6:33
@jezrael - I have added a simplified version of a pair of dataframes and the output that I am hoping for.
– whntrshll, Commented Jan 23, 2020 at 7:14

jezrael · Accepted Answer · 2020-01-23 07:52:45Z

1

You can use a flattening list comprehension to build the list of row positions:

rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
print (rng)
[2, 3, 4, 7, 8, 9, 10]

And then filter by DataFrame.iloc and Index.difference:

output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
     0    1  2  3    4  5  6  7
0    1  0.0  2  3  1.0  0  1  2
1    2  0.5  1  3  1.0  0  1  2
5    6  0.0  2  3  1.0  0  1  2
6    7  0.0  2  3  1.0  0  1  2
11  12  0.0  2  3  1.0  0  1  2
12  13  0.0  2  3  1.0  0  1  2
13  14  0.0  0  1  2.0  5  2  3
14  15  0.5  1  3  1.5  2  3  1

output2 = df2.iloc[rng]
print (output2)
     0    1  2  3    4  5  6  7
2    3  0.0  2  3  1.0  0  1  2
3    4  0.0  2  3  1.0  0  1  2
4    5  0.0  2  3  1.0  0  1  2
7    8  0.0  2  3  1.0  0  1  2
8    9  0.0  2  3  1.0  0  1  2
9   10  0.0  2  3  1.0  0  1  2
10  11  0.0  2  3  1.0  0  1  2
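
As a possible alternative to building a list of positions, the same split can be sketched with a boolean mask. This is not the answerer's method, just a minimal sketch under the same reading of df1 (1-based, inclusive start and end values); the single-column toy df2 and the names `mask`, `inside`, and `outside` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy inputs mirroring the question: df1 holds 1-based, inclusive (start, end) pairs.
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame({"value": range(1, 16)})  # 15 rows holding the values 1..15

# Mark every row position covered by some (start, end) pair.
mask = np.zeros(len(df2), dtype=bool)
for start, end in df1.to_numpy():
    mask[int(start) - 1:int(end)] = True

inside = df2[mask]    # rows that fall inside a range (output2 above)
outside = df2[~mask]  # all remaining rows (output1 above)

print(inside["value"].tolist())   # [3, 4, 5, 8, 9, 10, 11]
print(outside["value"].tolist())  # [1, 2, 6, 7, 12, 13, 14, 15]
```

The mask has one entry per row of df2, so it selects by position and does not depend on what the index labels of df2 are.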

EDIT:

#list of DataFrames
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]

#if necessary output lists
out1 = []
out2 = []
#loop with zipped lists and apply solution
for df1, df2 in zip(L1, L2):
    print (df1)
    print (df2)

    rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]

    #if necessary append output df to lists
    out1.append(output1)
    out2.append(output2)
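
The loop above could also be wrapped in a small helper so that each pair is handled by one call. The following is only a sketch: `split_by_ranges`, `ranges_list`, and `data_list` are hypothetical names, the tiny dataframes stand in for the two lists of 60, and it uses `DataFrame.drop` on the selected labels instead of `Index.difference`, which assumes the index of each data frame has no duplicate labels.

```python
import pandas as pd

def split_by_ranges(ranges, data):
    """Return (outside, inside) for 1-based, inclusive (start, end) pairs."""
    rng = [x for a, b in ranges.to_numpy() for x in range(int(a) - 1, int(b))]
    inside = data.iloc[rng]                # rows at the listed positions
    outside = data.drop(data.index[rng])   # everything else (needs unique labels)
    return outside, inside

# Hypothetical stand-ins for the two lists of 60 dataframes.
ranges_list = [pd.DataFrame([(1.0, 2.0)]), pd.DataFrame([(2.0, 3.0), (5.0, 5.0)])]
data_list = [pd.DataFrame({"v": [10, 20, 30]}), pd.DataFrame({"v": [1, 2, 3, 4, 5]})]

results = [split_by_ranges(r, d) for r, d in zip(ranges_list, data_list)]
out1 = [outside for outside, _ in results]
out2 = [inside for _, inside in results]

print(out2[1]["v"].tolist())  # [2, 3, 5]
print(out1[1]["v"].tolist())  # [1, 4]
```

`zip` pairs the two lists by position, so this relies on both lists being in the same order, as the question says they are.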

edited Jan 23, 2020 at 7:52

answered Jan 23, 2020 at 7:23

jezrael


5 Comments

whntrshll Over a year ago

Any idea how to loop through the two sets with that?

jezrael Over a year ago

@whntrshll - Is list of pairs something like L = [(df11, df12), (df21, df22), (df31, df32)] ?

whntrshll Over a year ago

No, they are not. They are currently in two separate lists.

whntrshll Over a year ago

I tried your solution with the first two dataframes to make sure it would work and I got this error. "TypeError: 'numpy.float64' object cannot be interpreted as an integer" Here is my input... 'rng1 = [x for a, b in user1.values for x in range(a-1, b)] print(rng1) print('') eating = emg1.iloc[emg1.index.difference(rng1)] print (eating) print('') noneating = emg1.iloc[rng1] print (noneating) print('')'

jezrael Over a year ago

@whntrshll - change rng = [x for a, b in df1.values for x in range(a-1, b)] to rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
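
The error reported in the comments can be reproduced in a few lines. This is a small sketch, assuming the first dataframe stores its values as floats (as the `797.0` in the question suggests); the toy `user1` frame and the `raised` flag are made up for illustration.

```python
import pandas as pd

user1 = pd.DataFrame([(3.0, 5.0)])  # float values, as in the error report
a, b = user1.to_numpy()[0]          # a and b are numpy.float64 here

raised = False
try:
    range(a - 1, b)                 # range() only accepts integers
except TypeError as exc:
    raised = True
    print(exc)

# Casting each bound to int first makes the call valid.
positions = list(range(int(a) - 1, int(b)))
print(positions)  # [2, 3, 4]
```

Converting the whole frame once with `user1.astype(int)` before the comprehension should work as well, provided every value is a whole number.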

Wael Almadhoun · Accepted Answer · 2020-01-23 10:53:03Z

0

This might not be efficient, but it generates your desired results:

import pandas as pd
import numpy as np

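# collect the slice of df2 named by each (start, end) pair in df1
# (values are 1-based and the end is inclusive)
pieces = [df2.iloc[int(x)-1:int(y)] for x, y in np.array(df1)]
# generate the second dataframe; DataFrame.append was removed in pandas 2.0,
# so the slices are joined with pd.concat instead
df_out2 = pd.concat(pieces, ignore_index=True)
# get the difference: rows of df2 that do not appear in df_out2
# (this compares row values, so it assumes df2 has no repeated rows)
df_out1 = pd.concat([df_out2, df2]).drop_duplicates(keep=False)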

To compare the results with yours:

np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
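
One caveat with taking the difference through `drop_duplicates(keep=False)` is that it compares row values, so it may drop more than intended when df2 contains repeated rows, which seems plausible for 25,000 rows of small integers. The sketch below uses a made-up four-row frame to show the effect next to a position-based difference; `picked`, `by_value`, and `by_position` are illustrative names.

```python
import pandas as pd

# Toy df2 with repeated row values: rows 0, 1 and 3 are identical.
df2 = pd.DataFrame({"a": [0, 0, 1, 0], "b": [0, 0, 1, 0]})
picked = df2.iloc[0:1]  # pretend row 0 is the selected range

# Value-based difference: every row equal to the picked one disappears.
by_value = pd.concat([picked, df2]).drop_duplicates(keep=False)
print(len(by_value))  # 1

# Position-based difference keeps the other identical rows.
by_position = df2.drop(df2.index[0:1])
print(len(by_position))  # 3
```

When every row of df2 is distinct, as in the question's simplified example, both approaches give the same rows.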

edited Jan 23, 2020 at 10:53

answered Jan 23, 2020 at 10:47

Wael Almadhoun
