
I have the following data frame (both columns are of type str):

+------+-----------------+
| year | indicator_short |
+------+-----------------+
| 2020 | ind_1           |
| 2019 | ind_2           |
| 2019 | ind_3           |
| N/A  | ind_4           |
+------+-----------------+

I would like to add a new column that concatenates the two existing columns, formatted like this:

+------+-----------------+--------------------+
| year | indicator_short |   indicator_full   |
+------+-----------------+--------------------+
| 2020 | ind_1           | Indicator_1 (2020) |
| 2019 | ind_2           | Indicator_2 (2019) |
| 2019 | ind_3           | Indicator_3 (2019) |
| N/A  | ind_4           | Indicator_4 (N/A)  |
+------+-----------------+--------------------+

One thing that comes to mind is string formatting, something like:

df['indicator_full'][df['indicator_short']=='ind_1'] = 'Indicator_1 ({})'.format(df['year'])

but it gives the wrong result.

3 Answers

3

I'd go with string concatenation, formatting the year column as a string first:

# Strip a trailing ".0" in case the year column holds floats
years = '(' + df['year'].astype(str).str.replace(r'\.0$', '', regex=True) + ')'
# years = '(' + df['year'] + ')' if the year column already holds strings
df['indicator_full'] = ('Indicator_' + df['indicator_short'].str.rsplit('_').str[-1]) \
                           .str.cat(years, sep=' ')

print(df)
     year indicator_short      indicator_full
0  2020.0           ind_1  Indicator_1 (2020)
1  2019.0           ind_2  Indicator_2 (2019)
2  2019.0           ind_3  Indicator_3 (2019)
3     NaN           ind_4   Indicator_4 (nan)

2

Use Series.str.extract to get the integers from indicator_short, convert the floats in the year column to integers, and then join everything together:

i = df['indicator_short'].str.extract(r'(\d+)', expand=False)
y = df['year'].astype('Int64').astype(str).replace('<NA>','N/A')

df['indicator_full'] = 'Indicator_' + i + ' (' + y + ')'
print(df)
     year indicator_short      indicator_full
0  2020.0           ind_1  Indicator_1 (2020)
1  2019.0           ind_2  Indicator_2 (2019)
2  2019.0           ind_3  Indicator_3 (2019)
3     NaN           ind_4   Indicator_4 (N/A)
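
For reference, the same idea as a self-contained sketch. It assumes the year column holds floats with NaN (as in the printed output above) rather than strings as in the question. Instead of the Int64 round trip it formats the year with a small helper, fmt_year (a hypothetical name, not part of pandas), so the result does not depend on how missing values are stringified:

```python
import numpy as np
import pandas as pd

# Sample data; assumes a float year column with a missing value
df = pd.DataFrame({
    "year": [2020.0, 2019.0, 2019.0, np.nan],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

def fmt_year(value):
    """Hypothetical helper: 2020.0 -> '2020', NaN -> 'N/A'."""
    return "N/A" if pd.isna(value) else str(int(value))

# Pull the digits out of 'ind_1', 'ind_2', ...
i = df["indicator_short"].str.extract(r"(\d+)", expand=False)
# Format each year as text, mapping missing values to 'N/A'
y = df["year"].map(fmt_year)

df["indicator_full"] = "Indicator_" + i + " (" + y + ")"
print(df["indicator_full"].tolist())
```

If the year column is already str, as stated in the question, the fmt_year step is unnecessary and df["year"] can be concatenated directly.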


2

Use .str.cat() to concatenate the two columns after replacing ind with Indicator using .str.replace().

df['indicator_full'] = (df['indicator_short'].str.replace('ind', 'Indicator')
                        .str.cat('(' + df['year'] + ')', sep=' '))
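
A minimal runnable sketch of this approach on the question's data, assuming both columns are str (including the literal 'N/A'); regex=False is passed explicitly so that 'ind' is treated as a plain substring:

```python
import pandas as pd

# Sample data from the question; both columns are strings
df = pd.DataFrame({
    "year": ["2020", "2019", "2019", "N/A"],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

# 'ind_1' -> 'Indicator_1', then append ' (year)'
df["indicator_full"] = (
    df["indicator_short"]
    .str.replace("ind", "Indicator", regex=False)
    .str.cat("(" + df["year"] + ")", sep=" ")
)
print(df)
```

Note that this only works as written when year is already a string column; with a numeric year, the "(" + df["year"] + ")" step would need a conversion to str first.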
