I have a dataframe column of strings representing paths. I'd like to use part of each path as the value in another column.

The strings are similar to the following and are in a column titled 'Image Location':

C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D2\Image9.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D6\Image7.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D7\Image3.tif
...
C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D2\Image7.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D2\Image1.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D2\Image6.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D3\Image4.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D3\Image9.tif
...
C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image4.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image9.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image3.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image7.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image1.tif
C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image6.tif

Right now I'm doing the following:

df['Interval'] = df['Image Location'].str.split('\\').apply(lambda x: x[5])
df['Device'] = df['Image Location'].str.split('\\').apply(lambda x: x[6])

This clearly requires the path not to change very much because I'm counting the number of \ to find the Interval and Device values.

I'm wondering if there's a more robust way to do this, for instance by finding a pattern such as Day # and D#. Any thoughts would be appreciated.

2 Answers

If you don't want to depend on the number of \'s, you can do something like this:

import re
import numpy as np

df['Image Location'].map(lambda x: re.findall(r'(?<=Day )[0-9]+', x)).map(lambda x: np.nan if not x else x[0])
df['Image Location'].map(lambda x: re.findall(r'(?<=D)[0-9]+', x)).map(lambda x: np.nan if not x else x[0])

This finds the substring Day (or D) and returns the digits that immediately follow it, or NaN when there is no match. It assumes that no other such pattern appears anywhere else in the string, because it picks up every D that is followed by one or more digits and keeps only the first hit.
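To make the lookbehind approach concrete, here is a small self-contained sketch. The helper name first_match and the third row (a path with no Day or D folder) are made up for illustration; they are not part of the original data.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Image Location': [
        r'C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D2\Image9.tif',
        r'C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image3.tif',
        r'C:\Users\Chris H\Desktop\no_match_here.tif',  # hypothetical row without Day/D folders
    ]
})


def first_match(pattern, text):
    """Return the first regex match in text, or NaN when there is none."""
    found = re.findall(pattern, text)
    return found[0] if found else np.nan


# (?<=Day ) is a lookbehind: the digits must be preceded by "Day ",
# but "Day " itself is not part of the returned match.
df['Interval'] = df['Image Location'].map(lambda p: first_match(r'(?<=Day )[0-9]+', p))
df['Device'] = df['Image Location'].map(lambda p: first_match(r'(?<=D)[0-9]+', p))
```

The values come back as strings ('4', '2', ...), so convert them if you need numbers.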

UPDATE: Looks like it's easier to use Series.str.extract as @MaxU suggested. Here it goes:

df['Image Location'].str.extract(r'Day ([0-9]+)', expand=False)
df['Image Location'].str.extract(r'D([0-9]+)', expand=False)
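One thing to watch with str.extract patterns: square brackets in a regular expression denote a character class that matches a single character, so a literal prefix such as Day has to be written without them. The sketch below shows the difference on a made-up path (the "data 2016" folder is hypothetical, chosen only to trigger the problem). The device pattern with surrounding backslashes is likewise just one possible way to tighten the match.

```python
import pandas as pd

# Hypothetical path whose parent folder name ends in a space plus digits.
s = pd.Series([r'C:\data 2016\Day 4\D2\Image9.tif'])

# Character class: any ONE of D, a, y or space, followed by digits.
char_class = s.str.extract(r'[Day ]([0-9]+)', expand=False)

# Literal prefix: the exact text "Day " followed by digits.
literal = s.str.extract(r'Day ([0-9]+)', expand=False)

# Requiring backslashes on both sides limits the match to a whole folder name.
device = s.str.extract(r'\\D([0-9]+)\\', expand=False)

print(char_class[0], literal[0], device[0])
```

With expand=False and a single capture group, str.extract returns a Series, which can be assigned straight to a new column.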

1 Comment

This was the direction I was thinking of originally. I'm not sure which solution is better, this or the one from @MaxU. This one seems robust to a path change between \Day # and \D#, e.g. C:\Users\Chris H\Desktop\20161017HCT116\Day 8\run 1\D2\Image6.tif, but that's unlikely to happen. Max's solution is robust to the Interval changing from days to hours, e.g. C:\Users\Chris H\Desktop\20161017HCT116\48 hr\D2\Image6.tif. That is probably more likely, but both are great solutions!
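The trade-off described in this comment can be checked on the two example paths it mentions. This is only a sketch: the variable names are made up, and the position-based pattern is trimmed to capture just the Interval folder.

```python
import pandas as pd

paths = pd.Series([
    r'C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image6.tif',        # original layout
    r'C:\Users\Chris H\Desktop\20161017HCT116\Day 8\run 1\D2\Image6.tif',  # extra folder inserted
    r'C:\Users\Chris H\Desktop\20161017HCT116\48 hr\D2\Image6.tif',        # interval given in hours
])

# Pattern-based: look for "Day <digits>" anywhere in the path.
by_pattern = paths.str.extract(r'Day ([0-9]+)', expand=False)

# Position-based: take the folder two levels above the file name.
by_position = paths.str.extract(r'([^\\]+)\\[^\\]+\\[^\\]+$', expand=False)

print(pd.DataFrame({'by_pattern': by_pattern, 'by_position': by_position}))
```

The pattern-based version survives the extra folder but yields NaN for the hours path, while the position-based version handles the hours path but picks up "run 1" when a folder is inserted.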

I would use the Series.str.extract() method:

In [40]: df[['Interval','Device']] = \
             df['Image Location'].str.extract(r'([^\\]+)\\([^\\]+)\\[^\\]+$', expand=True)

In [41]: df
Out[41]:
                                                 Image Location Interval Device
0   C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D2\Image9.tif    Day 4     D2
1   C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D6\Image7.tif    Day 4     D6
2   C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D7\Image3.tif    Day 4     D7
3   C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D2\Image7.tif    Day 6     D2
4   C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D2\Image1.tif    Day 6     D2
5   C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D2\Image6.tif    Day 6     D2
6   C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D3\Image4.tif    Day 6     D3
7   C:\Users\Chris H\Desktop\20161017HCT116\Day 6\D3\Image9.tif    Day 6     D3
8   C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image4.tif    Day 8     D1
9   C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image9.tif    Day 8     D1
10  C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D1\Image3.tif    Day 8     D1
11  C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image7.tif    Day 8     D2
12  C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image1.tif    Day 8     D2
13  C:\Users\Chris H\Desktop\20161017HCT116\Day 8\D2\Image6.tif    Day 8     D2

Here is the RegEx, parsed and explained:

The RegEx in this solution assumes that the last two path parts (directories) are always the Interval and the Device, in that order.

It does NOT matter how many \ (backslashes) there are at the beginning of the path.
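For readers who want the pattern spelled out piece by piece, here is the same expression written with re.VERBOSE so each part can carry a comment. The name PATH_TAIL is made up for this sketch; the pattern itself is the one used above.

```python
import re

PATH_TAIL = re.compile(r'''
    ([^\\]+)   # group 1: a run of non-backslash characters (the Interval folder)
    \\         # one literal backslash
    ([^\\]+)   # group 2: the next folder name (the Device folder)
    \\         # one literal backslash
    [^\\]+     # the file name, matched but not captured
    $          # end of string, which anchors the match to the tail of the path
''', re.VERBOSE)

path = r'C:\Users\Chris H\Desktop\20161017HCT116\Day 4\D2\Image9.tif'
interval, device = PATH_TAIL.search(path).groups()
print(interval, device)
```

Because the pattern is anchored at the end with $, everything before the last two folders is ignored, which is why the depth of the path does not matter.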

1 Comment

Very interesting solution.
