The original data frame has three columns: name, description, and specialties.

I want to input a company name and compare its specialties with those of every other company. Whenever I find a match, I want to print and save all the details of the matching company.

df_descrip = df_original[['name', 'description']]
df_spec = df_original[['name', 'specialties']]
INPUT = 'TOTAL'
all_names = df_original['name']
df_original = df_original.set_index('name', drop=False)
columns = df_original.columns
for index, row in df_original.iterrows():
    if row['name'] == INPUT:
        specialties_input = df_original.loc[INPUT,'specialties']
        print('INPUT SPECIALTIES: ', specialties_input)

for spec in specialties_input:
    for item in df_spec['specialties']:
        if spec in item:
            # here I want to display details of a match
            pass

NOTE: Suppose I input the company name 'TOTAL' and it has 5 specialties (s1, s2, s3, s4, s5). I compare each of them with the specialties of all companies in my data frame. If I find a match, e.g. s3 appears in another company's specialties, how can I get the name of the matched company?

1 Answer

The data you've provided isn't clean or reproducible, so I've created sample data here.

Assuming you can split specialties on ',', it's simpler to work with lists and sets than with strings for this kind of analysis.

import pandas as pd

# Sample Data
df = pd.DataFrame({'description': ['d1', 'd2', 'd3'], 
                   'specialties': ['s1,s2,s3', 's3,s4,s5,s6', 's5,s6,s7']}, 
                  index=['name1', 'name2', 'name3'])

# Sample Input
name_lookup = 'name3'

tgt_set = set(df.loc[name_lookup, 'specialties'].split(','))
intersection = df['specialties'].str.split(',').apply(lambda x: tgt_set.intersection(x))
match = intersection.apply(len) > 0  # True for companies with at least one match

# Output:

intersection[match]  # the specialties each matching company has in common with the target

df[match]  # the rows for companies with at least one specialty in common (company names are the index)
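
Since the question asks specifically for the names of the matched companies, here is a minimal sketch that builds on the sample data above. It assumes the company names are the index, and it drops the looked-up company itself, which would otherwise always match its own specialties. The names `others`, `common`, `matched_names`, and `matched_details` are introduced here for illustration only.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'description': ['d1', 'd2', 'd3'],
                   'specialties': ['s1,s2,s3', 's3,s4,s5,s6', 's5,s6,s7']},
                  index=['name1', 'name2', 'name3'])
name_lookup = 'name3'

tgt_set = set(df.loc[name_lookup, 'specialties'].split(','))

# Compare against every company except the one being looked up
others = df.drop(index=name_lookup)
common = others['specialties'].str.split(',').apply(lambda x: tgt_set.intersection(x))

# Keep only companies that share at least one specialty
has_match = common.apply(len) > 0
matched_names = others[has_match].index.tolist()

# Full details of each match, plus a column holding the shared specialties
matched_details = others[has_match].assign(common=common[has_match])
```

`matched_names` is a plain list that can be printed, and `matched_details` is a DataFrame that can be saved, for example with `to_csv`.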

3 Comments

Hi, can you help a little more? I want only the 5 names with the highest number of matches in intersection. How would I do that?
Adding something like this will give you the top 5 names: intersection.apply(len).sort_values().tail(5).index
I want a DataFrame as the result, containing the entries with the most matches.
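
One possible way to get a DataFrame for the last request is sketched below, reusing the sample data from the answer. It assumes that "maximum matches" means the top N companies ranked by number of shared specialties and that the looked-up company itself should be left out. Neither assumption is confirmed in the thread. `top_n`, `n_matches`, and `result` are hypothetical names.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({'description': ['d1', 'd2', 'd3'],
                   'specialties': ['s1,s2,s3', 's3,s4,s5,s6', 's5,s6,s7']},
                  index=['name1', 'name2', 'name3'])
name_lookup = 'name3'

tgt_set = set(df.loc[name_lookup, 'specialties'].split(','))
intersection = df['specialties'].str.split(',').apply(lambda x: tgt_set.intersection(x))

# Number of shared specialties per company, ignoring the looked-up company (assumption)
n_matches = intersection.apply(len).drop(name_lookup)
n_matches = n_matches[n_matches > 0]

top_n = 5  # hypothetical cutoff taken from the comment above
top_names = n_matches.sort_values(ascending=False).head(top_n).index

# Rows of the original frame for the top matches, with the match count attached
result = df.loc[top_names].assign(n_matches=n_matches[top_names])
```

If only the companies tied for the single highest count are wanted, filtering with `n_matches[n_matches == n_matches.max()]` instead of `head(top_n)` should do it.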
