I have a dataframe ('sp500news') which looks like the following:

        date_publish         title
79944   2007-01-29 19:08:35  Microsoft Vista corporate sales go very well
181781  2007-12-14 19:39:06  Williams No Anglican consensus on Episcopal Church
213175  2008-01-22 11:17:19  CSX quarterly profit rises
93554   2008-01-22 18:52:56  Citigroup says 30 bln capital helps exceed target
...

I am trying to replace each company name in the titles with its corresponding ticker from the 'Symbol' column of another dataframe ('constituents'), which looks like:

     Symbol  Name        Sector
0    MMM     3M          Industrials
1    AOS     A.O. Smith  Industrials
2    ABT     Abbott      Health Care
3    ABBV    AbbVie      Health Care
...
116  C       Citigroup   Financials
...

I've already tried:

for item in sp500news['title']:
    for word in item:
        if word in constituents['Name']:
            indx = constituents['Name'].index(word)
            str.replace(word, constituents['Symbol'][indx])
  • How do you want your output to look? Commented Feb 4, 2019 at 11:06
  • The 'title' column from sp500news, with all company names replaced by ticker values (from the 'Symbol' column in 'constituents'). Commented Feb 4, 2019 at 11:07
  • Where are your ticker values? Commented Feb 4, 2019 at 11:08
  • In the 'Symbol' column of the 'constituents' dataframe. Commented Feb 4, 2019 at 11:08
  • How do I know which symbol corresponds to which company name? Your question is very unclear. Commented Feb 4, 2019 at 11:12

2 Answers

Try this:

Here are dummy dataframes that represent your data:

import pandas as pd

df1 = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                    'Name': ['Microsoft Vista', 'A.O.', 'Abbot']})
df1
  Symbol             Name
0     MV  Microsoft Vista
1    AOS             A.O.
2    ABT            Abbot

df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln capital helps exceed target']})
df2
    title                                       comment
0   79944  Microsoft Vista corporate sales go very well
1  181781           Abbot consensus on Episcopal Church
2  213175  A.O. says 30 bln capital helps exceed target

Make a dictionary mapping names to their respective symbols:

rep = dict(zip(df1.Name, df1.Symbol))
rep

{'Microsoft Vista': 'MV', 'A.O.': 'AOS', 'Abbot': 'ABT'}

Replace them using the Series.replace method (regex=True is needed so that names are matched as substrings rather than as whole cell values):

df2['comment'] = df2['comment'].replace(rep, regex=True)
df2
    title                                      comment
0   79944              MV corporate sales go very well
1  181781            ABT consensus on Episcopal Church
2  213175  AOS says 30 bln capital helps exceed target
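One caveat with regex=True: the dictionary keys are interpreted as regular expressions, so a dot in a company name matches any character. Below is a minimal sketch of one way to guard against that by escaping each name with re.escape before building the dictionary. The frames `names` and `news` and their rows are made-up illustration data, not the real constituents list.

```python
import re
import pandas as pd

# Hypothetical dummy data for illustration only
names = pd.DataFrame({'Symbol': ['AOS', 'ABT'],
                      'Name': ['A.O. Smith', 'Abbott']})
news = pd.DataFrame({'title': ['A.O. Smith raises outlook',
                               'AXOY Smith is not a company',
                               'Abbott quarterly profit rises']})

# Escape each name so that '.' is matched literally instead of as "any character"
rep = {re.escape(name): symbol
       for name, symbol in zip(names['Name'], names['Symbol'])}

news['title'] = news['title'].replace(rep, regex=True)
print(news['title'].tolist())
```

Without the escaping step, the unescaped pattern 'A.O. Smith' would also match the second title, because each dot would accept any character.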
1 Comment

I'm a bit confused. What do you want to put in a function? What exactly do you mean by an object?
Try the following code:

import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})

constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})

# Replace each company name with its symbol, one name at a time.
# regex=False treats the name as literal text, so '.' in a name is not a wildcard.
for name, symbol in zip(constituents['name'], constituents['symbol']):
    df['title'] = df['title'].str.replace(name, symbol, regex=False)

Output

                                           title
0      C says 30 bln capital helps exceed target
1  WLM No Anglican consensus on Episcopal Church
2         MCR Vista corporate sales go very well

I copied a few rows of your sp500news['title'] and made up some values for constituents['Name'] to demonstrate the transformation. The loop uses the .str accessor of the title column (a pd.Series), which exposes string methods such as replace, and calls replace once per company name so that every occurrence of that name is swapped for its symbol.

1 Comment

Avoid using for loops on a dataframe. For loops are slow
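For what it's worth, the per-name loop can be collapsed into a single pass over the column by joining all names into one regular expression and looking up the symbol for whichever name matched. The following is only a sketch under assumptions: the data is dummy data, `mapping` and `pattern` are names introduced here for illustration, and whether it is actually faster on the real data has not been measured.

```python
import re
import pandas as pd

# Hypothetical dummy data for illustration only
df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Microsoft Vista corporate sales go very well',
                             'CSX quarterly profit rises']})
constituents = pd.DataFrame({'Symbol': ['C', 'MSFT', 'MMM'],
                             'Name': ['Citigroup', 'Microsoft', '3M']})

mapping = dict(zip(constituents['Name'], constituents['Symbol']))

# Longest names first, so a longer name wins over a shorter one that is its prefix
alternatives = sorted(mapping, key=len, reverse=True)

# The lookarounds keep a name from matching inside a longer word
pattern = re.compile(
    r'(?<!\w)(' + '|'.join(re.escape(n) for n in alternatives) + r')(?!\w)'
)

# One call for the whole column; the callable maps each matched name to its symbol
df['title'] = df['title'].str.replace(pattern, lambda m: mapping[m.group(1)], regex=True)
print(df['title'].tolist())
```

Titles that mention no known company name, like the third row here, pass through unchanged.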
