I am currently working with two sets of dataframes. Each set contains 60 dataframes, sorted so that they line up for mapping (e.g. set1 df1 corresponds to set2 df1). Each dataframe in the first set is about 27 rows x 2 columns; each dataframe in the second set is over 25000 rows x 8 columns. For each pair, I want to create new dataframes that contain rows from the second dataframe, selected according to the values in the first dataframe.

For simplicity, I've created a shortened example of the first df of each set to illustrate. I want to use the 797 to take the first 796 rows (indexes 0 - 795) from df2 and add them to a new dataframe, and then take rows 796 to 930 and put them in a second new dataframe. Any suggestions for how I could do that for all 60 pairs of dataframes?

          0        1
0     797.0    930.0
1    1650.0   1760.0
2    2500.0   2570.0
3    3250.0   3333.0
4    3897.0   3967.0


0        -1    -2    -1    -3    -2    -1     2     0
1         0     0     0    -2     0    -1     0     0
2        -3     0     0    -1    -2    -1    -1    -1
3         0     1    -1    -1    -3    -2    -1     0
4         0    -3    -3     0     0     0    -4    -2

edit to add:

import pandas as pd

df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2), 
                    (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), 
                    (10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), 
                    (13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])


#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2), 
                    (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), 
                    (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2), 
                    (11, 0, 2, 3, 1, 0, 1, 2)])
2 Comments

Can you create a minimal, complete, and verifiable example? That means the first DataFrame should contain small numbers, such as the pairs 2-4, 6-7, 8-10, and the second DataFrame should contain about 10 rows. Most importantly, can you add the expected output for the input data? Commented Jan 23, 2020 at 6:33
@jezrael - I have added a simplified version of a pair of dataframes and the output that I am hoping for. Commented Jan 23, 2020 at 7:14

2 Answers

You can use a list comprehension to flatten the ranges into a single list of row positions:

rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
print (rng)
[2, 3, 4, 7, 8, 9, 10]

And then filter by DataFrame.iloc and Index.difference:

output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
     0    1  2  3    4  5  6  7
0    1  0.0  2  3  1.0  0  1  2
1    2  0.5  1  3  1.0  0  1  2
5    6  0.0  2  3  1.0  0  1  2
6    7  0.0  2  3  1.0  0  1  2
11  12  0.0  2  3  1.0  0  1  2
12  13  0.0  2  3  1.0  0  1  2
13  14  0.0  0  1  2.0  5  2  3
14  15  0.5  1  3  1.5  2  3  1

output2 = df2.iloc[rng]
print (output2)
     0    1  2  3    4  5  6  7
2    3  0.0  2  3  1.0  0  1  2
3    4  0.0  2  3  1.0  0  1  2
4    5  0.0  2  3  1.0  0  1  2
7    8  0.0  2  3  1.0  0  1  2
8    9  0.0  2  3  1.0  0  1  2
9   10  0.0  2  3  1.0  0  1  2
10  11  0.0  2  3  1.0  0  1  2
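
One thing to keep in mind: `df2.iloc[df2.index.difference(rng)]` passes index labels to a positional indexer, so it only lines up when `df2` has the default 0..n-1 index. As a minimal sketch of a purely positional variant (an alternative, not part of the solution above), a boolean mask over the row positions gives the same split. The column names `id` and `value` are made up for the illustration, and the pairs in `df1` are assumed to be 1-based and inclusive, as in the question's sample.

```python
import numpy as np
import pandas as pd

# Sample data: 1-based, inclusive (start, end) pairs and a 15-row frame.
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame({"id": range(1, 16), "value": np.arange(15) * 0.5})

# Boolean mask over row positions: True for rows inside any range.
mask = np.zeros(len(df2), dtype=bool)
for start, end in df1.to_numpy():
    # int() also covers float pairs such as 797.0, 930.0
    mask[int(start) - 1:int(end)] = True

output2 = df2[mask]    # rows inside the ranges
output1 = df2[~mask]   # all remaining rows
```

Because the mask is built from positions only, the result does not depend on what the index of `df2` looks like.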

EDIT:

#list of DataFrames
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]

#if necessary output lists
out1 = []
out2 = []
#loop with zipped lists and apply solution
for df1, df2 in zip(L1, L2):
    print (df1)
    print (df2)

    rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]

    #if necessary append output df to lists
    out1.append(output1)
    out2.append(output2)
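
For a quick check that the loop behaves as expected, here is a small self-contained sketch of the same idea wrapped in a function. The name `split_by_ranges` and the two tiny stand-in pairs are hypothetical; the real lists would hold the 60 dataframes of each set, and each `data_df` is assumed to have the default 0..n-1 index.

```python
import pandas as pd

def split_by_ranges(ranges_df, data_df):
    """Return (outside, inside) rows of data_df for 1-based, inclusive ranges."""
    # Flatten every (start, end) pair into 0-based row positions.
    rng = [x for a, b in ranges_df.to_numpy() for x in range(int(a) - 1, int(b))]
    inside = data_df.iloc[rng]
    outside = data_df.iloc[data_df.index.difference(rng)]
    return outside, inside

# Two stand-in pairs; float values mimic the ranges in the question.
L1 = [pd.DataFrame([(2.0, 3.0)]), pd.DataFrame([(1.0, 2.0), (5.0, 5.0)])]
L2 = [pd.DataFrame({"v": range(10, 15)}), pd.DataFrame({"v": range(20, 26)})]

out1, out2 = [], []
for ranges_df, data_df in zip(L1, L2):
    outside, inside = split_by_ranges(ranges_df, data_df)
    out1.append(outside)
    out2.append(inside)
```

After the loop, `out1[i]` and `out2[i]` belong to the i-th pair, so the pairing between the two original lists is preserved.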

5 Comments

any idea how to loop the two sets through with that?
@whntrshll - Is list of pairs something like L = [(df11, df12), (df21, df22), (df31, df32)] ?
no they are not. They are currently in two separate lists.
I tried your solution with the first two dataframes to make sure it would work, and I got this error: "TypeError: 'numpy.float64' object cannot be interpreted as an integer". Here is my input:
    rng1 = [x for a, b in user1.values for x in range(a-1, b)]
    print(rng1)
    print('')
    eating = emg1.iloc[emg1.index.difference(rng1)]
    print (eating)
    print('')
    noneating = emg1.iloc[rng1]
    print (noneating)
    print('')
@whntrshll - change rng = [x for a, b in df1.values for x in range(a-1, b)] to rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]

This might not be efficient, but it generates your desired results:

import pandas as pd
import numpy as np

# generate the second dataframe: rows inside each 1-based, inclusive range
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
df_out2 = pd.concat([df2.iloc[int(x)-1:int(y)] for x, y in np.array(df1)],
                    ignore_index=True)
# get the difference (this assumes the rows of df2 are unique)
df_out1 = pd.concat([df_out2, df2]).drop_duplicates(keep=False)

To compare the results with yours:

np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
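
A caveat worth checking on the real data: `drop_duplicates(keep=False)` compares rows by value, so any row of `df2` that occurs more than once is dropped from the difference, even if it lies outside the ranges. The sketch below uses made-up data to show the effect next to a position-based difference; the names `by_value` and `by_position` are only for illustration, and the `drop` call assumes unique index labels.

```python
import pandas as pd

# df2 has a repeated row: positions 0 and 3 hold the same values.
df2 = pd.DataFrame({"a": [1, 2, 3, 1], "b": [0, 0, 0, 0]})
df_out2 = df2.iloc[1:2]  # the "inside" rows: position 1 only

# Value-based difference: every row that appears more than once is removed.
by_value = pd.concat([df_out2, df2]).drop_duplicates(keep=False)

# Position-based difference: remove exactly the selected positions.
by_position = df2.drop(df2.index[1:2])
```

Here `by_value` keeps only the row with `a == 3`, while `by_position` keeps the three rows outside the range, including both copies of the repeated row.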
