0

How can I extract a list of substrings based on a pattern in Python?

For example:

str = 'this {{is}} a sample {{text}}'

Expected result: a Python list that contains 'is' and 'text'.

3
  • Are you trying to extract only substrings that appear in double curly braces? Commented Dec 21, 2010 at 17:54
  • @Rafe Yes. I just need the strings inside those curly braces. Commented Dec 21, 2010 at 17:57
  • It is generally not a good idea to name a variable string as it is a commonly used Python module. Commented Dec 21, 2010 at 17:59

5 Answers

14
>>> import re
>>> re.findall("{{(.*?)}}", "this {{is}} a sample {{text}}")
['is', 'text']
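
The `?` after `.*` is what keeps each match inside a single pair of braces. A small sketch of the difference, using the question's sample string (the variable names `lazy` and `greedy` are only for illustration):

```python
import re

text = "this {{is}} a sample {{text}}"

# Non-greedy: stops at the first "}}" that follows each "{{".
lazy = re.findall(r"{{(.*?)}}", text)

# Greedy: runs on to the last "}}" in the string, merging both tags.
greedy = re.findall(r"{{(.*)}}", text)

print(lazy)    # ['is', 'text']
print(greedy)  # ['is}} a sample {{text']
```

Without the `?`, the two tags collapse into one long match, which is usually not what you want here.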

1 Comment

@Siva: you'll need to escape the [s because they have a special meaning within a regular expression: re.findall(r"\[\[(.*?)]]", "this [[is]] a sample [[text]].")
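
If the delimiters can vary, one option is to let `re.escape` do the escaping rather than adding backslashes by hand. A minimal sketch; the helper name `between` and its parameters are made up for this example:

```python
import re

def between(text, open_delim, close_delim):
    """Return the substrings found between open_delim and close_delim.

    Hypothetical helper: re.escape makes both delimiters literal, so
    characters such as "[" lose their special regex meaning.
    """
    pattern = re.escape(open_delim) + r"(.*?)" + re.escape(close_delim)
    return re.findall(pattern, text)

print(between("this [[is]] a sample [[text]].", "[[", "]]"))  # ['is', 'text']
print(between("this {{is}} a sample {{text}}", "{{", "}}"))   # ['is', 'text']
```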
2

Assuming "some patterns" means "single words between double {}'s":

import re

re.findall(r'{{(\w*)}}', string)

Edit: Andrew Clark's answer implements "any sequence of characters at all between double {}'s"
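
To make the difference between the two patterns concrete, here is a short sketch on a made-up input (not from the question) that includes a two-word tag and an empty tag:

```python
import re

# Hypothetical sample input, chosen to show where the patterns differ.
text = "{{word}} and {{two words}} and {{}}"

# \w* only accepts letters, digits and underscores, so the tag
# containing a space is skipped. An empty tag still matches.
words_only = re.findall(r"{{(\w*)}}", text)

# .*? accepts any characters (except newlines) up to the next "}}".
anything = re.findall(r"{{(.*?)}}", text)

print(words_only)  # ['word', '']
print(anything)    # ['word', 'two words', '']
```

If empty tags should be ignored, `\w+` instead of `\w*` would be one way to do that.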


2

You can use the following:

import re
a = 'this {{is}} a sample {{text}}'
res = re.findall(r"{{([^{}]*)}}", a)
print("a python list which contains %s and %s" % (res[0], res[1]))
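
Excluding `{` as well as `}` inside the group matters when the input has a stray opening pair. A small sketch on a made-up malformed string (an assumption, not part of the question):

```python
import re

# Hypothetical input with an unclosed "{{" before a proper tag.
text = "x {{ y {{z}}"

# [^{}]* cannot cross a brace, so only the innermost, well-formed tag matches.
strict = re.findall(r"{{([^{}]*)}}", text)

# [^}]* may cross "{", so the match starts at the stray "{{".
loose = re.findall(r"{{([^}]*)}}", text)

print(strict)  # ['z']
print(loose)   # [' y {{z']
```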

Cheers

2 Comments

You have to use %r instead of %s otherwise you won't get the quotes ;)
Thanks, I didn't know that. I would usually have put the quotes in the format string itself, for example '%s'. Cheers
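
For reference, a quick sketch of what the two format specifiers produce (the list `res` is written out by hand here so the snippet runs on its own):

```python
res = ['is', 'text']

# %s formats with str(), so the strings appear without quotes.
with_s = "contains %s and %s" % (res[0], res[1])

# %r formats with repr(), which keeps the quotes around each string.
with_r = "contains %r and %r" % (res[0], res[1])

print(with_s)  # contains is and text
print(with_r)  # contains 'is' and 'text'
```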
1

A regex-based solution is fine for your example, although I would recommend something more robust for more complicated input.

import re

def match_substrings(s):
    return re.findall(r"{{([^}]*)}}", s)

The regex from inside-out:

[^}] matches anything that's not a '}'
([^}]*) matches any number of non-} characters and groups them
{{([^}]*)}} puts the above inside double-braces

Without the parentheses above, re.findall would return the entire match (i.e. ['{{is}}', '{{text}}']). However, when the regex contains a group, findall returns the contents of that group instead.
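
The effect of the group can be checked side by side. A minimal sketch using the question's sample string (the names `whole` and `inner` are just for illustration):

```python
import re

text = "this {{is}} a sample {{text}}"

# No group: findall returns each full match, braces included.
whole = re.findall(r"{{[^}]*}}", text)

# One group: findall returns only what the group captured.
inner = re.findall(r"{{([^}]*)}}", text)

print(whole)  # ['{{is}}', '{{text}}']
print(inner)  # ['is', 'text']
```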


0

You could use a regular expression to match anything that occurs between {{ and }}. Will that work for you?

Generally speaking, for tagging certain strings in a large body of text, a suffix tree will be useful.

