I am learning Pandas and have a DataFrame of strings which looks a little like this:
import pandas as pd
df = pd.DataFrame([['Apple', 'Med6g7867'], ['Orange', 'Med7g8976'], ['Banana', 'Signal'], ['Peach', 'Med8g8989'], ['Mango', 'Possible result %gggyy']], columns=['A', 'B'])
df
        A                       B
0   Apple               Med6g7867
1  Orange               Med7g8976
2  Banana                  Signal
3   Peach               Med8g8989
4   Mango  Possible result %gggyy
Note that column B holds two kinds of value: either a unique identifier of the form MedXgXXXX or a descriptive string. I would like to do two related things:
- Replace every value of B that is a unique identifier with NaN.
- Keep the descriptive strings, but truncate any that contain a % sign so that only the text before the % remains.
I would like a table like this:
        A                B
0   Apple              NaN
1  Orange              NaN
2  Banana           Signal
3   Peach              NaN
4   Mango  Possible result
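To make the target concrete, here is a minimal sketch of one way such a table might be produced. It assumes the identifiers always match the pattern `Med<digits>g<digits>` and that whitespace left in front of the % should be trimmed. The names `id_pattern`, `is_identifier` and `result` are made up for illustration, and `Series.str.fullmatch` needs pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame(
    [['Apple', 'Med6g7867'], ['Orange', 'Med7g8976'], ['Banana', 'Signal'],
     ['Peach', 'Med8g8989'], ['Mango', 'Possible result %gggyy']],
    columns=['A', 'B'],
)

# Assumption: an identifier is "Med", digits, "g", digits, and nothing else.
id_pattern = r'Med\d+g\d+'
is_identifier = df['B'].str.fullmatch(id_pattern)

result = df.copy()
# Series.mask swaps in NaN wherever the condition is True.
result['B'] = df['B'].mask(is_identifier)
# Keep only the text before the first %, then trim surrounding whitespace.
# NaN rows pass through the .str methods unchanged.
result['B'] = result['B'].str.split('%').str[0].str.strip()

print(result)
```

Matching the whole string rather than a substring means a descriptive value that merely contains "Med" would be left alone.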
Currently I can subset the table like so:
df[df['B'].str.contains("Med")]
df[df['B'].str.contains("%")]
but none of the replace() calls I have tried produces the result I want.
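As a sketch of where those two boolean masks might lead, they could be used with `.loc` to assign into column B directly instead of going through `replace()`. The variable names `has_med`, `has_pct` and `truncated` are hypothetical, and the sketch assumes that any value containing "Med" really is an identifier, which would not hold for a descriptive string such as "Median".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Apple', 'Med6g7867'], ['Orange', 'Med7g8976'], ['Banana', 'Signal'],
     ['Peach', 'Med8g8989'], ['Mango', 'Possible result %gggyy']],
    columns=['A', 'B'],
)

# The same two conditions used for subsetting, kept as boolean Series.
has_med = df['B'].str.contains('Med')
has_pct = df['B'].str.contains('%', regex=False)

# str.partition splits at the first '%'; column 0 holds the text before it.
truncated = df['B'].str.partition('%')[0].str.rstrip()

# Assign through .loc so only the selected rows of B are changed.
df.loc[has_pct, 'B'] = truncated[has_pct]
df.loc[has_med, 'B'] = np.nan

print(df)
```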
Any help appreciated.