Vogue (@voguemagazine) • Instagram photos and videos

Fashionista (@fashionista_com) • Instagram photos and videos

The Business of Fashion (@bof) • Instagram photos and videos

I parsed the strings above from the <title> tag of each Instagram page.

I need to extract the screen name, which is everything before the (@...) part of each string.

For my examples above, it will be Vogue, Fashionista, and The Business of Fashion respectively.

I tried something like

string.split(' ')[0].replace('\n', '')

but this only returns the first space-separated token, so it fails for multi-word names such as The Business of Fashion.

1 Answer

The "re" module will help. The pattern below extracts the screen name:

import re

# Raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r"(.+?) \(@.*?\)")

string = "Vogue (@voguemagazine) • Instagram photos and videos"
word = pattern.findall(string)[0]  # "Vogue"

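In the pattern "(.+?) \(@.*?\)":

  • (.+?) - lazily captures every character up to the space that precedes the opening parenthesis; this captured group is the screen name;
  • \(@.*?\) - matches the handle in parentheses: a literal "(" (written "\("), then "@", then as few characters as possible (".*?") up to the literal ")" (written "\)").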
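
If some titles might lack the "(@...)" part, findall(...)[0] raises an IndexError. A minimal sketch of a more defensive variant follows. The helper name screen_name, the None fallback, and the slightly looser pattern (which also tolerates leading whitespace such as a stray newline) are my own assumptions, not part of the question:

```python
import re

# Hypothetical helper: the name and the None fallback are illustrative choices.
# \s* skips leading whitespace, (.+?) lazily captures the screen name,
# and \(@[^)]*\) matches the handle in parentheses.
SCREEN_NAME = re.compile(r"\s*(.+?)\s*\(@[^)]*\)")

def screen_name(title):
    """Return the text before '(@handle)', or None if there is no such part."""
    match = SCREEN_NAME.match(title)
    return match.group(1) if match else None

titles = [
    "Vogue (@voguemagazine) • Instagram photos and videos",
    "Fashionista (@fashionista_com) • Instagram photos and videos",
    "The Business of Fashion (@bof) • Instagram photos and videos",
]
names = [screen_name(t) for t in titles]
print(names)
```

For the three titles from the question this should give Vogue, Fashionista, and The Business of Fashion; a title without a handle returns None instead of raising.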