Pandas Using String Contains to match values in 2 dataframes

Question

Let's say I have two dataframes with city names in different formats. I want to match them on their states and on the first four characters of each city name. A small example is as follows:

import pandas as pd
df1 = pd.DataFrame({'city': ['NEW YORK', 'DALLAS', 'LOS ANGELES', 'SAN FRANCISCO'],
                   'state' : ['NY', 'TX', 'CA', 'CA'],
                   'value' : [1,2,3,4]})
df2 = pd.DataFrame({'city': ['NEW YORK CITY', 'DALLAS/ABC', 'LOS ANG', 'ABC'],
                    'state': ['NY', 'TX', 'CA', 'CA'],
                   'temp': [20,21,21,23]})
df1
            city state  value
0       NEW YORK    NY      1
1         DALLAS    TX      2
2    LOS ANGELES    CA      3
3  SAN FRANCISCO    CA      4

df2
            city state  temp
0  NEW YORK CITY    NY    20
1     DALLAS/ABC    TX    21
2        LOS ANG    CA    21
3            ABC    CA    23

What I want is a dataframe as follows:

       city state  temp  values
0  NEW YORK    NY    20       1
1    DALLAS    TX    21       2
2   LOS ANG    CA    21       3

I cannot use isin(), since the city names will not match exactly. So far I am thinking of using str.contains, but I cannot think of an efficient way to do this.

Help is greatly appreciated.

Zero · Accepted Answer · 2017-09-30 09:22:31Z

Create a temporary city4 column holding the first 4 characters of city, then merge on it together with state.

In [5247]: pd.merge(df1.assign(city4=df1.city.str[:4]),
                    df2.assign(city4=df2.city.str[:4]),
                    on=['city4', 'state']).drop(columns='city4')
Out[5247]:
        city_x state  value         city_y  temp
0     NEW YORK    NY      1  NEW YORK CITY    20
1       DALLAS    TX      2     DALLAS/ABC    21
2  LOS ANGELES    CA      3        LOS ANG    21

More specifically, to keep only df1's city column:

In [5251]: (pd.merge(df1.assign(city4=df1.city.str[:4]),
      ...:           df2.assign(city4=df2.city.str[:4]),
      ...:           on=['city4', 'state'])
      ...:    .drop(columns=['city4', 'city_y'])
      ...:    .rename(columns={'city_x': 'city'}))
Out[5251]:
          city state  value  temp
0     NEW YORK    NY      1    20
1       DALLAS    TX      2    21
2  LOS ANGELES    CA      3    21
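
The same merge can be wrapped in a small function. The following is only a sketch under assumptions: the helper name merge_on_city_prefix, its parameter n and the temporary column name city_prefix are made up for illustration and are not part of pandas or of the answer above. It drops df2's city column before merging, so no city_x/city_y pair is created and no rename is needed.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['NEW YORK', 'DALLAS', 'LOS ANGELES', 'SAN FRANCISCO'],
                    'state': ['NY', 'TX', 'CA', 'CA'],
                    'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'city': ['NEW YORK CITY', 'DALLAS/ABC', 'LOS ANG', 'ABC'],
                    'state': ['NY', 'TX', 'CA', 'CA'],
                    'temp': [20, 21, 21, 23]})


def merge_on_city_prefix(left, right, n=4):
    """Inner-join on state plus the first n characters of city.

    Hypothetical helper for illustration; keeps the left frame's city names.
    """
    left_keyed = left.assign(city_prefix=left['city'].str[:n])
    # Drop the right frame's city so the merge has no overlapping column
    # and therefore adds no _x/_y suffixes.
    right_keyed = (right.assign(city_prefix=right['city'].str[:n])
                        .drop(columns='city'))
    return (left_keyed.merge(right_keyed, on=['state', 'city_prefix'], how='inner')
                      .drop(columns='city_prefix'))


result = merge_on_city_prefix(df1, df2)
print(result)
```

Passing a different n changes how many leading characters must agree, which may be worth tuning if four characters turn out to be too loose or too strict for the real data.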

Details

In [5255]: df1.assign(city4=df1.city.str[:4])
Out[5255]:
            city state  value city4
0       NEW YORK    NY      1  NEW
1         DALLAS    TX      2  DALL
2    LOS ANGELES    CA      3  LOS
3  SAN FRANCISCO    CA      4  SAN

In [5256]: df2.assign(city4=df2.city.str[:4])
Out[5256]:
            city state  temp city4
0  NEW YORK CITY    NY    20  NEW
1     DALLAS/ABC    TX    21  DALL
2        LOS ANG    CA    21  LOS
3            ABC    CA    23   ABC
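
One thing the city4 values above suggest: a 4-character key is coarse, so two different cities in the same state can share it (for example, anything starting with "SAN " in CA). The sketch below uses made-up data (SAN DIEGO and SAN FRAN do not appear in the question) to show how such a collision duplicates rows, and how merge's validate argument could be used to catch it. Treat it as an illustration, not as part of the original answer.

```python
import pandas as pd

# Hypothetical data: two CA cities whose first 4 characters are both 'SAN '.
left = pd.DataFrame({'city': ['SAN FRANCISCO', 'SAN DIEGO'],
                     'state': ['CA', 'CA'],
                     'value': [4, 5]})
right = pd.DataFrame({'city': ['SAN FRAN'],
                      'state': ['CA'],
                      'temp': [23]})

left_keyed = left.assign(city4=left['city'].str[:4])
right_keyed = right.assign(city4=right['city'].str[:4])

# Both left rows carry the key ('SAN ', 'CA'), so the single right row
# is matched to each of them.
merged = left_keyed.merge(right_keyed, on=['city4', 'state'])
print(len(merged))

# validate='one_to_one' asks pandas to check that the merge keys are unique
# on both sides; it raises MergeError when they are not.
try:
    left_keyed.merge(right_keyed, on=['city4', 'state'], validate='one_to_one')
    collision_detected = False
except pd.errors.MergeError:
    collision_detected = True

print(collision_detected)
```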

Bharath M Shetty · Accepted Answer · 2017-09-30 09:38:52Z

One way is to use map, building keys from the state and the first 4 letters of the city, i.e.

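# Composite keys: state + first 4 characters of city
one = df1.state + df1.city.str[:4]
two = df2.state + df2.city.str[:4]
# Look up each df1 key in a {key: temp} dict built from df2; unmatched keys give NaN
df1['temp'] = one.map(df2.set_index(two)['temp'].to_dict())
# Drop the rows that found no match
df1 = df1.dropna()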

          city state  value  temp
0     NEW YORK    NY      1  20.0
1       DALLAS    TX      2  21.0
2  LOS ANGELES    CA      3  21.0
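
In the output above, temp comes back as float because the unmatched row holds NaN before dropna. A possible variant, sketched here under the assumption that the composite keys built from df2 are unique, leaves df1 untouched and casts temp back to an integer dtype. The names key1, key2, lookup and matched are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['NEW YORK', 'DALLAS', 'LOS ANGELES', 'SAN FRANCISCO'],
                    'state': ['NY', 'TX', 'CA', 'CA'],
                    'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'city': ['NEW YORK CITY', 'DALLAS/ABC', 'LOS ANG', 'ABC'],
                    'state': ['NY', 'TX', 'CA', 'CA'],
                    'temp': [20, 21, 21, 23]})

# Composite keys: state + first 4 characters of city.
key1 = df1['state'] + df1['city'].str[:4]
key2 = df2['state'] + df2['city'].str[:4]

# Lookup Series indexed by the df2 key (assumed unique here).
lookup = pd.Series(df2['temp'].to_numpy(), index=key2)

matched = (df1.assign(temp=key1.map(lookup))   # NaN where no key matches
              .dropna(subset=['temp'])          # drop only rows without a temp
              .astype({'temp': 'int64'}))       # undo the float upcast caused by NaN
print(matched)
```

Using dropna(subset=['temp']) rather than a bare dropna() means rows are removed only for a missing temp, not for missing values in unrelated columns.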

edited Sep 30, 2017 at 9:38

answered Sep 30, 2017 at 9:26
