I am having a lot of trouble joining two pandas data frames, because the merge should be based on a partial string match. More specifically:

I have a dataframe called df with about 10,000 rows that look like this:

{
    "writtenAt": "2015-01-01T18:31:01+00:00",
    "content":" India\u2019s banks will ramp up sales of bonds that act as capital buffers in 2015"
}

Now, I have another dataframe called compNames with about 500 rows, which looks like this:

{
    "ticker": "A",
    "name": "Agilent Technologies Inc.",
    "keyword": "Agilent"
}

I am trying to assign a ticker value from compNames to the matching entry of df by the following mechanism:

  1. check if any item from the entire column compNames['keyword'] is contained in an entry of df['content']

  2. if there is a match, then return the matching word as a separate column of the df dataframe (e.g. df['matchedName'])

  3. if there are multiple matches, then store a list of all matching words for the corresponding entry of df['content']

  4. Finally, join df and compNames by using df['matchedName'] and compNames['keyword'] as my key variables
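
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample rows and the second company are made up so that some articles match, keywords are treated as literal, case-sensitive whole words, and `Series.str.findall`, `DataFrame.explode`, and `DataFrame.merge` are one possible way to do it, not necessarily the best fit for the real data.

```python
import re
import pandas as pd

# Made-up sample data (not from the real files), chosen so that some rows match.
df = pd.DataFrame({
    "writtenAt": [
        "2015-01-01T18:31:01+00:00",
        "2015-01-02T09:00:00+00:00",
        "2015-01-03T12:00:00+00:00",
    ],
    "content": [
        "Agilent and Apple both report earnings this week",
        "India's banks will ramp up sales of bonds",
        "Apple unveils a new phone",
    ],
})
compNames = pd.DataFrame({
    "ticker": ["A", "AAPL"],
    "name": ["Agilent Technologies Inc.", "Apple Inc."],
    "keyword": ["Agilent", "Apple"],
})

# Steps 1-3: build one alternation pattern; re.escape keeps each keyword literal,
# and \b restricts matches to whole words.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in compNames["keyword"]) + r")\b"
df["matchedName"] = df["content"].str.findall(pattern)  # list of matches per row

# Step 4: one row per matched keyword, drop articles without a match, then join.
matched = df.explode("matchedName").dropna(subset=["matchedName"])
result = matched.merge(compNames, left_on="matchedName", right_on="keyword", how="inner")
print(result[["writtenAt", "matchedName", "ticker"]])
```

An article that mentions two companies ends up as two rows in `result`, one per ticker. A keyword that appears twice in the same article would also produce two rows, so de-duplicating the lists before `explode` may be needed.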

What I have so far is:

# Load select company names
compNames = pd.read_csv("compNameList_LARA.txt")
compList = '|'.join(compNames['keyword'].tolist())
df['compMatch'] = df.content.str.contains(compList)

# drop unmatched articles
df = df[df['compMatch']==True]

# assign firm names
df['matchedName'] = df['content'].apply(lambda x: [x for x in compNames['keyword'].tolist() if x in df['content']])

However, when I do this, I get an empty list for df['matchedName'].

What went wrong?

  • Please provide a reproducible pandas example including desired output. From what I can tell, these two example datasets have no matches, so the result wouldn't be interesting. Add some more data that includes matches to clarify the question. Commented Feb 27 at 22:29

1 Answer

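Figured it out. I just needed to split each article into words and check every word against the keywords, lowercasing both sides so the comparison is consistent:

# lowercase the keywords once; a set makes the membership test fast
keywords = set(compNames['keyword'].str.lower())
df['content'] = df['content'].str.lower().str.split()
df['matchedName'] = df['content'].apply(lambda words: [w for w in words if w in keywords])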
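
For context on why the lambda in the question returned empty lists: `in` applied to a pandas Series tests the index labels, not the values, and the lambda compared each keyword against the whole `df['content']` column instead of the current row's text. A small sketch with made-up data (the sample sentences and the second keyword are assumptions for illustration only):

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"content": ["Agilent beats estimates", "Bond sales rise in India"]})
keywords = ["Agilent", "Apple"]

# `in` on a Series checks the index labels (0 and 1 here), not the stored strings.
value_found = "Agilent beats estimates" in df["content"]
label_found = 0 in df["content"]
print(value_found, label_found)

# Pattern from the question: each keyword is tested against the column's index,
# so no keyword is ever found and every row gets an empty list.
broken = df["content"].apply(lambda x: [x for x in keywords if x in df["content"]])

# Testing against the current row's text performs a substring match per article.
fixed = df["content"].apply(lambda text: [k for k in keywords if k in text])
print(broken.tolist())
print(fixed.tolist())
```

Unlike splitting into words, the substring test also handles multi-word keywords, but it will match a keyword that happens to occur inside a longer word.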