So I have a dataframe with some text in one column. I'm trying to find 2 strings within each row of the column, and then slice the row text between those two strings to get a substring. Something like this:
startinds = df[column].str.find("First Event = ")
endinds = df[column].str.find("\nLast Event = ")
df["first_timestamp"] = df[column].str.slice(startinds,endinds)
Now this doesn't work because startinds and endinds are series, so I can't use them as indices for slicing the strings in column.
Anyone know a way I can access the values to do the substrings on each row?
Example Input:
Data
0 "Blahblah
First Event = 09/20/2017 12:00:00
Last Event = 09/20/2017 13:00:00
Blahblahblah"
1 "Blahblahblahblah
Blahablahblah
First Event = 09/20/2017 12:30:00
Last Event = 09/20/2017 12:45:00
Blahblahblah"
Output:
first_timestamp
0 "First Event = 09/20/2017 12:00:00"
1 "First Event = 09/20/2017 12:30:00"
"First Event = " + df.Data.str.extract('(?<=First Event = )(.*)(?=\\\\nLast Event)', expand=False)?