Value not found in data frame in python

Question

The original data frame has three columns: name, description and specialties.

I want to input a company name and compare its specialties with the specialties of every other company. Whenever I find a match, I want to print and save all the details of the matching company.

df_descrip = df_original[['name', 'description']]
df_spec = df_original[['name', 'specialties']]
INPUT = 'TOTAL'
all_names = df_original['name']
df_original = df_original.set_index('name', drop=False)
columns = df_original.columns
for index, row in df_original.iterrows():
    if row['name'] == INPUT:
        specialties_input = df_original.loc[INPUT, 'specialties']
        print('INPUT SPECIALTIES: ', specialties_input)

for spec in specialties_input:
    for item in df_spec['specialties']:
        if spec in item:
            # here I want to display details of a match
            pass  # placeholder so the block is valid Python

NOTE: Suppose I input the company name 'TOTAL' and it has 5 specialties (s1, s2, s3, s4, s5). I compare each of them with the specialties of all companies in my data frame. If I find a match, e.g. s3, how can I get the name of the matched company?

Can you create a minimal, complete, and verifiable example with expected output?
– jezrael, Commented Mar 13, 2019 at 8:08
Kindly check the edit and help if you can. I have added images of sample data; it is not the actual data.
– irum zahra, Commented Mar 13, 2019 at 8:29
Hmm, so check how to provide a great pandas example and create samples from actual data.
– jezrael, Commented Mar 13, 2019 at 8:30
Please provide a script with the data embedded so that people can run it. Nobody will retype data by hand from those images: see minimal reproducible example.
– roschach, Commented Mar 13, 2019 at 9:08

ecortazar · Accepted Answer · 2019-03-13 15:00:14Z

1

The data you've provided is not very clean or replicable, so I've created sample data here.

Assuming you can split specialties by ',', it's simpler to work with lists and sets than with strings for this kind of analysis.
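To illustrate why sets are easier here, this is a small sketch with made-up specialty strings (they are not from the question's data). It shows two pitfalls of working on the raw string: iterating over it yields single characters, and a substring test can report a match that is not really there.

```python
# Hypothetical specialty strings, chosen only to show the pitfalls
target = "s1,s2"
other = "s10,s21"

# Iterating over a string yields single characters, not specialties
chars = list(target)

# Substring test: 's1' is "found" inside 's10', which is a false positive
substring_hit = "s1" in other

# After splitting, a set intersection compares whole specialties only
shared = set(target.split(",")) & set(other.split(","))

print(chars)
print(substring_hit)
print(shared)
```

With these inputs the substring test reports a match while the set intersection is empty, which is the behaviour you probably want.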

import pandas as pd

# Sample Data
df = pd.DataFrame({'description': ['d1', 'd2', 'd3'],
                   'specialties': ['s1,s2,s3', 's3,s4,s5,s6', 's5,s6,s7']},
                  index=['name1', 'name2', 'name3'])

# Sample Input
name_lookup = 'name3'

# Specialties of the looked-up company, as a set
tgt_set = set(df.loc[name_lookup, 'specialties'].split(','))
# For every company, the specialties it shares with the looked-up company
intersection = df['specialties'].str.split(',').apply(lambda x: tgt_set.intersection(x))
match = intersection != set() # Remove companies with 0 matches

# Output:

intersection[match] # will deliver the specialties they have in common

df[match] # will return the data only on the ones that have at least one specialty in common
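To get the names of the matched companies, one possible sketch is below. It rebuilds the same sample data so it runs on its own. Dropping the looked-up company from the result and the variable names shared and matches are my own assumptions, not something the question specified.

```python
import pandas as pd

# Same sample data as above, with company names in the index
df = pd.DataFrame(
    {"description": ["d1", "d2", "d3"],
     "specialties": ["s1,s2,s3", "s3,s4,s5,s6", "s5,s6,s7"]},
    index=["name1", "name2", "name3"],
)
name_lookup = "name3"

tgt_set = set(df.loc[name_lookup, "specialties"].split(","))
shared = df["specialties"].str.split(",").apply(lambda x: tgt_set.intersection(x))

# Assumption: the looked-up company should not be reported as its own match
shared = shared.drop(name_lookup)
# Keep only companies that share at least one specialty
shared = shared[shared.apply(len) > 0]

# The index holds the company names; map each name to its shared specialties
matches = {name: sorted(common) for name, common in shared.items()}
for name, common in matches.items():
    print(name, common)
```

Because the names are the index, df.loc[list(matches)] should then give the full rows (description and specialties) of the matched companies.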

edited Mar 13, 2019 at 15:00

answered Mar 13, 2019 at 14:12


3 Comments

irum zahra Over a year ago

Hi, can you help a little more? I want only the 5 names with the highest number of matches in intersection. How would I do that?

ecortazar Over a year ago

Adding something like this will give you the top 5 names: intersection.apply(len).sort_values().tail(5).index

irum zahra Over a year ago

I want a DataFrame as the result, containing the entries with the most matches.
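A possible way to turn the top names into a DataFrame is sketched below, reusing the sample data from the answer. The names top_n, n_matches and result are hypothetical, and excluding the looked-up company and companies with zero matches is an assumption about what is wanted.

```python
import pandas as pd

# Sample data from the answer, with company names in the index
df = pd.DataFrame(
    {"description": ["d1", "d2", "d3"],
     "specialties": ["s1,s2,s3", "s3,s4,s5,s6", "s5,s6,s7"]},
    index=["name1", "name2", "name3"],
)
name_lookup = "name3"
top_n = 5  # hypothetical parameter: how many companies to keep

tgt_set = set(df.loc[name_lookup, "specialties"].split(","))
intersection = df["specialties"].str.split(",").apply(lambda x: tgt_set.intersection(x))

# Number of shared specialties per company, without the looked-up company
n_matches = intersection.apply(len).drop(name_lookup)
# Assumption: companies with no shared specialty are not wanted
n_matches = n_matches[n_matches > 0]

# Names of the (at most) top_n companies with the most shared specialties
top_names = n_matches.nlargest(top_n).index

# Full rows for those companies, plus the match count as an extra column
result = df.loc[top_names].assign(n_matches=n_matches[top_names])
print(result)
```

If only the companies tied for the single highest count are wanted, selecting n_matches[n_matches == n_matches.max()].index in place of top_names should work in the same way.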
