I have a pandas dataframe that looks like df and I want to add a column so it looks like df2.
import pandas as pd
df = pd.DataFrame({'Alternative' : ['a_x_17MAR2016_Collectedran30dom', 'b_17MAR2016_CollectedStuff', 'c_z_k_17MAR2016_Collectedan3dom'], 'Values': [34, 65, 7]})
df2 = pd.DataFrame({'Alternative' : ['a_x_17MAR2016_Collectedran30dom', 'b_17MAR2016_CollectedStuff', 'c_z_k_17MAR2016_Collectedan3dom'], 'Values': [34, 65, 7], 'Alts': ['a x 17MAR2016', 'b 17MAR2016', 'c z k 17MAR2016']})
df
Out[4]:
                       Alternative  Values
0  a_x_17MAR2016_Collectedran30dom      34
1       b_17MAR2016_CollectedStuff      65
2  c_z_k_17MAR2016_Collectedan3dom       7
df2
Out[5]:
                       Alternative             Alts  Values
0  a_x_17MAR2016_Collectedran30dom    a x 17MAR2016      34
1       b_17MAR2016_CollectedStuff      b 17MAR2016      65
2  c_z_k_17MAR2016_Collectedan3dom  c z k 17MAR2016       7
In other words, each value in Alternative is a string of varying length that I can split on an underscore delimiter. I want to split it, drop the piece containing the substring 'Collected' together with every piece after it, and then join the remaining pieces with a space.
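A minimal sketch of that per-string logic, assuming 'Collected' always marks where the unwanted tail begins. The helper name `strip_collected` is hypothetical. Instead of looking up the index of the marker piece, it uses `itertools.takewhile`, which keeps pieces only until the first one containing the marker, and then maps the helper over the column:

```python
from itertools import takewhile

import pandas as pd

df = pd.DataFrame({'Alternative': ['a_x_17MAR2016_Collectedran30dom',
                                   'b_17MAR2016_CollectedStuff',
                                   'c_z_k_17MAR2016_Collectedan3dom'],
                   'Values': [34, 65, 7]})


def strip_collected(s, marker='Collected', sep='_'):
    """Hypothetical helper: split s on sep, keep the pieces before the first
    piece that contains marker, and join the kept pieces with a space."""
    pieces = s.split(sep)
    # takewhile stops at the first piece containing the marker, so that piece
    # and everything after it are dropped. If the marker never appears,
    # all pieces are kept.
    kept = takewhile(lambda piece: marker not in piece, pieces)
    return ' '.join(kept)


# Apply the per-string helper to every value in the column.
df['Alts'] = df['Alternative'].map(strip_collected)
print(df)
```

Because `strip_collected` works on one string, it can be checked on its own before being mapped over the whole column.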
I can locate the index of the piece containing the substring 'Collected' in an individual list, as I found here, and then join the other pieces, but I cannot find a 'pythonic' way to do this across the whole DataFrame.
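To work on the whole column without a Python-level helper, one option is pandas' vectorized string methods. This is a sketch that swaps in a regular expression for the split/index approach, under the same assumption that 'Collected' marks the start of the unwanted tail. The pattern removes the first underscore-separated piece containing 'Collected' plus everything after it, then the remaining underscores become spaces. `regex` is passed explicitly because the default of `Series.str.replace` changed between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({'Alternative': ['a_x_17MAR2016_Collectedran30dom',
                                   'b_17MAR2016_CollectedStuff',
                                   'c_z_k_17MAR2016_Collectedan3dom'],
                   'Values': [34, 65, 7]})

# (?:^|_)       the start of the string, or the underscore that opens a piece
# [^_]*         any characters of that same piece before the marker
# Collected.*$  the marker and the rest of the string
pattern = r'(?:^|_)[^_]*Collected.*$'

df['Alts'] = (df['Alternative']
              .str.replace(pattern, '', regex=True)     # drop marker piece and tail
              .str.replace('_', ' ', regex=False))      # underscores -> spaces
print(df)
```

Since `[^_]*` cannot cross an underscore, the match can only start at the piece that actually contains 'Collected', so a marker in the middle of a piece still removes that whole piece.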
Thanks in advance
