How do I extract some string from a long string in Python?

Question

I have a lot of long strings (they differ in length and content, so I can't use indices), and I want to extract a substring from each of them. This is what I want to extract:

http://www.someDomainName.com/anyNumber

SomeDomainName doesn't contain any numbers, and anyNumber is different in each long string. The code should extract the desired substring from any possible string and should cope with spaces and any other odd characters that might appear in the long string. This should be possible with regex, right? Could anybody help me with this? Thank you.

Update: I should have said that www. and .com are always the same, and so is someDomainName! But there's another http://www. URL in the string.

No! I mean they are always www. and .com. See my update please. (Loolooii, commented Sep 30, 2012 at 17:05)

jfs · Accepted Answer · 2012-09-30 17:31:50Z

2

import re
results = re.findall(r'\bhttp://www\.someDomainName\.com/\d+\b', long_string)
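A minimal usage sketch, assuming a made-up `long_string` that also contains an unrelated `http://www.` URL (as the question's update mentions). It shows that `re.findall` returns the whole URLs when the pattern has no capture group, and only the numbers when `\d+` is wrapped in one group.

```python
import re

# Hypothetical sample input: an unrelated http://www. URL, odd spacing, a tab
long_string = (
    "see http://www.example.org/page  and "
    "http://www.someDomainName.com/2134, also\thttp://www.someDomainName.com/77 end"
)

# No capture groups: findall returns each full matching URL
urls = re.findall(r'\bhttp://www\.someDomainName\.com/\d+\b', long_string)
print(urls)  # ['http://www.someDomainName.com/2134', 'http://www.someDomainName.com/77']

# Exactly one capture group: findall returns only the captured numbers
numbers = re.findall(r'\bhttp://www\.someDomainName\.com/(\d+)\b', long_string)
print(numbers)  # ['2134', '77']
```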

answered Sep 30, 2012 at 17:31

jfs


1 Comment

Loolooii Over a year ago

Thanks. Exactly what I wanted.

Rohit Jain · Answer · 2012-09-30 17:25:01Z

1

>>> import re
>>> pattern = re.compile("(http://www\\.)(\\w*)(\\.com/)(\\d+)")
>>> matches = pattern.search("http://www.someDomainName.com/2134")
>>> if matches:
...     print(matches.group(0))
...     print(matches.group(1))
...     print(matches.group(2))
...     print(matches.group(3))
...     print(matches.group(4))
...
http://www.someDomainName.com/2134
http://www.
someDomainName
.com/
2134

The pattern above has 4 capture groups, plus group 0, so you can read 5 groups:

Group 0 is the complete string that is matched.
Groups 1 to 4 follow the order of the parentheses in the pattern, so the domain name you are looking for is group 2, (\\w*).
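To make the numbering concrete, here is a small Python 3 sketch with the same pattern: the compiled pattern's `groups` attribute counts only the parenthesized groups (group 0 is not included), and `Match.groups()` returns groups 1 to 4 as a tuple.

```python
import re

pattern = re.compile("(http://www\\.)(\\w*)(\\.com/)(\\d+)")
matches = pattern.search("http://www.someDomainName.com/2134")

print(pattern.groups)     # 4: capture groups defined in the pattern
print(matches.group(0))   # whole match: http://www.someDomainName.com/2134
print(matches.groups())   # ('http://www.', 'someDomainName', '.com/', '2134')
domain = matches.group(2) # the domain name part
```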

If you only need that part, you can capture just it: remove the parentheses from the rest of the pattern and keep only (\w*).

>>> pattern = re.compile("http://www\\.(\\w*)\\.com/\\d+")
>>> matches = pattern.search("http://www.someDomainName.com/2134")
>>> if matches:
...     print(matches.group(1))
...
someDomainName

In this version there are no groups 2, 3 and 4 as in the previous example, because only one group is captured. Group 0 is still always available: it is the complete string that matched.
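A quick Python 3 check of that claim with the one-group pattern: `groups()` now has a single element, group 0 still holds the full match, and asking for a group that no longer exists raises `IndexError`.

```python
import re

pattern = re.compile("http://www\\.(\\w*)\\.com/\\d+")
matches = pattern.search("http://www.someDomainName.com/2134")

print(pattern.groups)     # 1: only (\w*) is a capture group now
print(matches.groups())   # ('someDomainName',)
print(matches.group(0))   # group 0 is still the whole match

# Groups 2 to 4 no longer exist, so group(2) raises IndexError
try:
    matches.group(2)
    missing_group_error = None
except IndexError as exc:
    missing_group_error = exc
print("group(2) raised:", repr(missing_group_error))
```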

edited Sep 30, 2012 at 17:25

answered Sep 30, 2012 at 17:12

Rohit Jain


5 Comments

Loolooii Over a year ago

Are you sure this works for every string? It doesn't match anything in my case. How do I use a fixed string instead of \w*? I know the name, so there's no need for that.

Loolooii Over a year ago

Only the number is variable each time.

Rohit Jain Over a year ago

What input string are you giving? As I showed, it matches in my case, with a variable number and any domain name.

Rohit Jain Over a year ago

If you have a fixed domain name, you can replace (\\w*) with your domain name, someDomainName, and it will match.
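Following that suggestion, a sketch that hard-codes the fixed part of the URL (the question says only the number varies) and captures just the number. Using `re.escape` to build the literal prefix is a choice made here, not something from the thread; the sample `text` is invented and includes another `http://www.` URL.

```python
import re

# Fixed parts of the URL, escaped so "." matches a literal dot
prefix = re.escape("http://www.someDomainName.com/")
pattern = re.compile(prefix + r"(\d+)")

# Hypothetical input that also contains an unrelated http://www. URL
text = "Other link: http://www.example.net/9 Target: http://www.someDomainName.com/2134 done"

match = pattern.search(text)
if match:
    number = match.group(1)
    print(number)  # 2134
```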

Loolooii Over a year ago

J.F. Sebastian's answer solved my problem. Thank you though for your explanation and time.

Chrismit · Answer · 2012-09-30 17:09:07Z

0

Yeah, your simplest bet is regex. Here's something that will probably get the job done:

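import re

yourstring = "... http://www.someDomainName.com/2134 ..."  # your long input string
# Escape the dots so they match literally; capture the domain name and the number
matcher = re.compile(r'www\.(\w+)\.com/(\d+)')
matches = matcher.search(yourstring)
if matches:
    str1, str2 = matches.groups()  # ('someDomainName', '2134')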
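`search()` stops at the first match. If a long string can hold several such URLs, `finditer()` walks every match in order. A sketch with a pattern along these lines (dots escaped, number restricted to digits) and an invented sample string:

```python
import re

matcher = re.compile(r'www\.(\w+)\.com/(\d+)')

# Hypothetical input with two matching URLs
yourstring = "a http://www.someDomainName.com/12 b http://www.someDomainName.com/345 c"

# One (domain, number) tuple per match, in the order they appear
pairs = [m.groups() for m in matcher.finditer(yourstring)]
print(pairs)  # [('someDomainName', '12'), ('someDomainName', '345')]
```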

answered Sep 30, 2012 at 17:09

Chrismit


Ant · Answer · 2012-09-30 17:37:03Z

0

If you are sure that there are no dots in SomeDomainName, you can find the first occurrence of the string ".com/" and take everything after it.

This avoids regular expressions, which can be harder to maintain.

exp = 'http://www.aejlidjaelidjl.com/alieilael'
idx = exp.find('.com/')
if idx != -1:  # find() returns -1 when '.com/' is not present
    print(exp[idx + len('.com/'):])
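Since the update says the whole prefix is fixed, the same `find` idea can target the full prefix (so the other `http://www.` URL is skipped) and stop at the first non-digit, so trailing text in the long string is not included. A regex-free sketch; the helper name `extract_number` and the sample string are hypothetical.

```python
def extract_number(text, prefix="http://www.someDomainName.com/"):
    """Return the digits right after `prefix`, or None if there are none."""
    start = text.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = start
    # Advance past consecutive digit characters
    while end < len(text) and text[end].isdigit():
        end += 1
    return text[start:end] or None

sample = "x http://www.other.com/1 y http://www.someDomainName.com/2134 trailing text"
print(extract_number(sample))  # 2134
```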

answered Sep 30, 2012 at 17:37

Ant
