Extract substring from urls stored in a pandas column

Question

A pandas column contains a series of URLs. I'd like to extract a substring from each URL. MRE below.

import pandas as pd

s = pd.Series(['https://url-location/img/xxxyyy_image1.png'])

# Attempt: slice between the first "/" and the first "_"
s.apply(lambda x: x[x.find("/")+1:x.find("_")])

I'd like to extract xxxyyy from each URL and store the results in a new column.

Wiktor Stribiżew · Accepted Answer · 2021-09-14 19:23:56Z

3

You can use

>>> s.str.extract(r'.*/([^_]+)')
        0
0  xxxyyy

See the regex demo. Details:

.* - zero or more chars other than line break chars as many as possible
/ - a slash
([^_]+) - Capturing group 1 (the value captured into this group will be the actual return value of Series.str.extract): one or more chars other than _ char.
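Since the question asks for the result in a new column, here is a minimal sketch of how the pattern could be applied to a DataFrame. The frame `df` and the column names `url` and `code` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame and column names, for illustration only
df = pd.DataFrame({"url": ["https://url-location/img/xxxyyy_image1.png"]})

# expand=False returns a Series (rather than a one-column DataFrame)
# when the pattern has a single capturing group
df["code"] = df["url"].str.extract(r".*/([^_]+)", expand=False)

print(df["code"].tolist())
```

Rows where the pattern does not match end up as missing values, so it may be worth checking for those before using the new column.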

edited Sep 14, 2021 at 19:23

answered Sep 14, 2021 at 19:22

Wiktor Stribiżew


2 Comments

kms Over a year ago

How does it skip over /img and know to look at the one _ in the substring?

Wiktor Stribiżew Over a year ago

@kms .* is a greedily quantified pattern, it grabs the whole string at first. The engine starts backtracking then, trying to match some text with the subsequent patterns. So the / char found is the last / char that is followed by one or more chars other than _. The [^_] is a negated character class, it matches any char other than a _ char, so it cannot match across more _s, it will stop before the first _ or end of string. Here is my YT video about backtracking in regex.
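The effect of greedy matching can be seen with the standard `re` module, which is a rough stand-in here for what `Series.str.extract` does per element. The lazy variant `.*?` and the second URL are added only for comparison and are not part of the answer.

```python
import re

url = "https://url-location/img/xxxyyy_image1.png"

# Greedy .* backtracks from the end, so the "/" that matches is the last one
# that is followed by at least one non-underscore character
greedy = re.search(r".*/([^_]+)", url).group(1)

# Lazy .*? stops at the first "/", and [^_]+ then runs on until the first "_"
lazy = re.search(r".*?/([^_]+)", url).group(1)

# With no "_" after the last "/", [^_]+ runs to the end of the string
no_underscore = re.search(r".*/([^_]+)", "https://host/a/b.png").group(1)

print(greedy)
print(lazy)
print(no_underscore)
```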

Andreas · Accepted Answer · 2021-09-14 19:26:54Z

1

Also possible:

s.str.split('/').str[-1].str.split('_').str[0]
# Out[224]: xxxyyy

This works because the .str accessor supports indexing and slice notation on each element. For example, .str[-1] returns the last item of each list produced by the split.
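Breaking the chained call into steps may make it easier to follow. This is only a sketch; the intermediate names `parts`, `filename`, and `prefix` are made up for illustration.

```python
import pandas as pd

s = pd.Series(["https://url-location/img/xxxyyy_image1.png"])

# Each element becomes a list of the pieces between slashes
parts = s.str.split("/")

# .str[-1] indexes into each list and keeps the last piece (the file name)
filename = parts.str[-1]

# Split the file name on "_" and keep the first piece
prefix = filename.str.split("_").str[0]

print(filename.tolist())
print(prefix.tolist())
```

Unlike the regex approach, this always takes the text after the last slash, even if that text starts with an underscore.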

answered Sep 14, 2021 at 19:26

Andreas

