How to add dataframe column data to a range of indexes in another dataframe?

Question

I have a dataframe called df1:

        Long_ID     IndexBegin         IndexEnd
0       10000001        0                  3
1       10000002        3                  6
2       10000003        6                 10

I have a second dataframe called df2, which can be up to 1 million rows long:

      Short_ID
0        1
1        2
2        3
3        10
4        20
5        30
6        100
7        101
8        102
9        103

I want to link Long_ID to Short_ID in such a way that if (IndexBegin:IndexEnd) is (0:3), then Long_ID gets inserted into df2 at indexes 0 through 2 (IndexEnd - 1). The starting index and ending index are determined using the last two columns of df1.

So that ultimately, my final dataframe looks like this: df3:

      Short_ID       Long_ID
0        1          10000001
1        2          10000001
2        3          10000001
3        10         10000002
4        20         10000002
5        30         10000002
6        100        10000003
7        101        10000003
8        102        10000003
9        103        10000003

First, I tried storing the index of df2 as a key and Short_ID as a value in a dictionary, then iterating row by row, but that was too slow. This led me to learn about vectorization.

Then, I tried using where(), but I got "ValueError: Can only compare identically-labeled Series objects."

df2 = df2.reset_index()
df2['Long_ID'] = df1['Long_ID'] [ (df2['index'] < df1['IndexEnd']) & (df2['index'] >= df1['IndexBegin']) ]

I am relatively new to programming, and I appreciate if anyone can give a better approach to solving this problem. I have reproduced the code below:

df1_data = [(10000001, 0, 3), (10000002, 3, 6), (10000003, 6, 10)]
df1 = pd.DataFrame(df1_data, columns = ['Long_ID', 'IndexBegin', 'IndexEnd'])

df2_data = [1, 2, 3, 10, 20, 30, 100, 101, 102, 103]
df2 = pd.DataFrame(df2_data, columns = ['Short_ID'])

cs95 · Accepted Answer · 2019-07-21 22:14:01Z

5

df2 does not need "IndexEnd" as long as the ranges are contiguous. You may use pd.merge_asof:

(pd.merge_asof(df2.reset_index(), df1, left_on='index', right_on='IndexBegin')
   .reindex(['Short_ID', 'Long_ID'], axis=1))

   Short_ID   Long_ID
0         1  10000001
1         2  10000001
2         3  10000001
3        10  10000002
4        20  10000002
5        30  10000002
6       100  10000003
7       101  10000003
8       102  10000003
9       103  10000003

answered Jul 21, 2019 at 22:14

cs95

406k106 gold badges744 silver badges797 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

BENY · Accepted Answer · 2019-07-21 22:26:34Z

4

Here is one way using IntervalIndex

df1.index=pd.IntervalIndex.from_arrays(left=df1.IndexBegin,right=df1.IndexEnd,closed='left')
df2['New']=df1.loc[df2.index,'Long_ID'].values

answered Jul 21, 2019 at 22:26

BENY

324k22 gold badges176 silver badges250 bronze badges

Comments

Ayoub ZAROU · Accepted Answer · 2019-07-21 21:55:00Z

0

you may do :

df3 = df2.copy()
df3['long_ID'] =  df2.merge(df1, left_on =df2.index,right_on = "IndexBegin", how = 'left').Long_ID.ffill().astype(int)

answered Jul 21, 2019 at 21:55

Ayoub ZAROU

2,4178 silver badges20 bronze badges

Comments

Gustavo Gradvohl · Accepted Answer · 2019-07-21 22:53:53Z

I created a function to solve your question. Hope it helps.

df = pd.read_excel('C:/Users/me/Desktop/Sovrflw_data_2.xlsx')
df

    Long_ID     IndexBegin  IndexEnd
0   10000001    0           3
1   10000002    3           6
2   10000003    6           10

df2 = pd.read_excel('C:/Users/me/Desktop/Sovrflw_data.xlsx')
df2

    Short_ID
0   1
1   2
2   3
3   10
4   20
5   30
6   100
7   101
8   102
9   103

def convert_Short_ID(df1,df2):
    df2['Long_ID'] = None
    for i in range(len(df2)):
        for j in range(len(df)):
            if (df2.index[i] >= df.loc[j,'IndexBegin']) and (df2.index[i] < df.loc[j,'IndexEnd']):
                number = str(df.iloc[j, 0])
                df2.loc[i,'Long_ID'] = df.loc[j, 'Long_ID']
                break
            else:
                df2.loc[i, 'Long_ID'] = np.nan

    df2['Long_ID'] = df2['Long_ID'].astype(str)
    return df2

convert_Short_ID(df,df2)   
    Short_ID    Long_ID
0   1           10000001
1   2           10000001
2   3           10000001
3   10          10000002
4   20          10000002
5   30          10000002
6   100         10000003
7   101         10000003
8   102         10000003
9   103         10000003

Ramon · Accepted Answer · 2019-07-23 23:25:27Z

Using Numpy to create the data before creating a Data Frame is a better approach since adding elements to a Data Frame is time-consuming. So:

import numpy as np
import pandas as pd

#Step 1: creating the first Data Frame
df1 = pd.DataFrame({'Long_ID':[10000001,10000002,10000003],
                 'IndexBegin':[0,3,6],
                   'IndexEnd':[3,6,10]})

#Step 2: creating the second chunk of data as a Numpy array
Short_ID = np.array([1,2,3,10,20,30,100,101,102,103])

#Step 3: creating a new column on df1 to count Long_ID ocurrences
df1['Qt']=df1['IndexEnd']-df1['IndexBegin']

#Step 4: using append to create a Numpy Array for the Long_ID item 
Long_ID = np.array([])
for i in range(len(df1)):
    Long_ID = np.append(Long_ID, [df1['Long_ID'][i]]*df1['Qt'][i])

#Finally, create the seconc Data Frame using both previous Numpy arrays
df2 = pd.DataFrame(np.vstack((Short_ID, Long_ID)).T, columns=['Short_ID','Long_ID'])
df2
    Short_ID    Long_ID
0        1.0 10000001.0
1        2.0 10000001.0
2        3.0 10000001.0
3       10.0 10000002.0
4       20.0 10000002.0
5       30.0 10000002.0
6      100.0 10000003.0
7      101.0 10000003.0
8      102.0 10000003.0
9      103.0 10000003.0

Collectives™ on Stack Overflow

How to add dataframe column data to a range of indexes in another dataframe?

5 Answers 5

Comments

Comments

Comments

I created a function to solve your question. Hope it helps.

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

Comments

Comments

Comments

I created a function to solve your question. Hope it helps.

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related