
I've got two Excel spreadsheets. The first has multiple columns, three of which (named differently) all contain company IDs. My second spreadsheet contains a key along with the data I want to bring into the first spreadsheet. For example:

Table 1:

Target Company ID  Subsidiary ID  Parent Company ID
1                  2              3

Table 2:

Company ID  Company Size
1           Small
2           Medium
3           Large

With Excel I would do a series of VLOOKUPs, but Table 1 is huge and takes a long time to calculate. I would like to generate a table such as:

Target Company ID  Target Size  Subsidiary ID  Sub Size  Parent Company ID  Parent Size
1                  Small        2              Medium    3                  Large

I've tried pd.merge, but it is not as targeted as a VLOOKUP would be in assigning a value to a specific column.
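
(For reference, a minimal sketch of the example data as pandas DataFrames; the names df1 and df2 are assumptions that match the answers below.)

import pandas as pd

# Minimal sketch of the example tables above; df1 and df2 are assumed names
df1 = pd.DataFrame({'Target Company ID': [1],
                    'Subsidiary ID': [2],
                    'Parent Company ID': [3]})

df2 = pd.DataFrame({'Company ID': [1, 2, 3],
                    'Company Size': ['Small', 'Medium', 'Large']})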

2 Answers


You can build a mapping of Company ID to Company Size from df2 by calling .set_index() and selecting the column.

Then, use .map() to look up (VLOOKUP-style) each ID column of df1 against that mapping, as follows:

# mapping Series: Company ID -> Company Size (drop duplicate IDs so the index is unique)
mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']

# look up each ID column of df1 and store the result in a new '<prefix> Size' column
for col in df1.columns:
    df1[f'{col.split()[0]} Size'] = df1[col].map(mapping)

Result:

print(df1)

   Target Company ID  Subsidiary ID  Parent Company ID Target Size Subsidiary Size Parent Size
0                  1              2                  3       Small          Medium       Large

In case you have other columns in the dataframes, you can select the related columns with .filter(), picking all columns whose names end with 'ID', as follows:

mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']

# select the ID columns automatically with .filter(), or list them manually
cols = df1.filter(regex=r'ID$').columns

for col in cols:
    df1[f'{col.split()[0]} Size'] = df1[col].map(mapping)

If you want to group the related columns together, you can further sort the columns by name with .sort_index(axis=1), as follows:

df1 = df1.sort_index(axis=1)

Result:

print(df1)

   Parent Company ID Parent Size  Subsidiary ID Subsidiary Size  Target Company ID Target Size
0                  3       Large              2          Medium                  1       Small

Comments

I get an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Try: mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']
@premiumcopypaper Does your Table 2 have duplicated entries of Company ID? If so, the code provided by Pascal can be used.
@PascalG.Bernard Thanks for sharing the code to handle the issue :-)
My pleasure @SeaBean. Congrats on the code you have provided; I was working on it, but you clearly outperformed my proposal.
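
The error discussed in these comments comes from duplicated Company ID values in Table 2, which make the mapping's index non-unique. A minimal sketch of the failure and the fix (the df2_dup name is hypothetical):

import pandas as pd

# Hypothetical Table 2 with a duplicated Company ID
df2_dup = pd.DataFrame({'Company ID': [1, 1, 2, 3],
                        'Company Size': ['Small', 'Small', 'Medium', 'Large']})

ids = pd.Series([1, 2, 3])

# Non-unique index -> InvalidIndexError when used with .map():
# ids.map(df2_dup.set_index('Company ID')['Company Size'])

# Dropping duplicates first keeps the index unique, so the lookup works:
mapping = df2_dup.drop_duplicates('Company ID').set_index('Company ID')['Company Size']
print(ids.map(mapping))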

Basically, you want to merge the tables with a left join three times, taking a different key column each time; in other words, treat the lookup table (Table 2) as three tables, like the following:

import pandas as pd

table_1 = ...  # read Table 1 (Target Company ID, Subsidiary ID, Parent Company ID)
table_2 = ...  # read Table 2 (Company ID, Company Size)

# build one lookup table per key, renaming its columns to match table_1
targets = table_2.rename(columns={'Company ID': 'Target Company ID', 'Company Size': 'Target Size'})
subsidiaries = table_2.rename(columns={'Company ID': 'Subsidiary ID', 'Company Size': 'Sub Size'})
parents = table_2.rename(columns={'Company ID': 'Parent Company ID', 'Company Size': 'Parent Size'})

comb_table = pd.merge(table_1, targets, on='Target Company ID', how='left')
comb_table = pd.merge(comb_table, subsidiaries, on='Subsidiary ID', how='left')
comb_table = pd.merge(comb_table, parents, on='Parent Company ID', how='left')
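
With the example data, the combined frame should reproduce the desired output (the table_1 / table_2 names here correspond to df1 / df2 in the sketch in the question):

print(comb_table)
# expected columns: Target Company ID, Subsidiary ID, Parent Company ID,
#                   Target Size, Sub Size, Parent Size
# expected row:     1, 2, 3, Small, Medium, Large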

