
Say I have this string:

"Input:Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux +-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep | +-- Lagos NNP pobj +-- ? . punct "

and I want to get a list like this:

['book VB ROOT', 'Can MD aux',..., '? . punct']

using a regular expression.

I have tried:

result = re.findall('\||\+-- (.*?)\+--|\| ', result, re.DOTALL)

Any help would be appreciated.

  • Do you really have to use regex? You can achieve what you want with a simple split() Commented Jun 3, 2016 at 12:40

3 Answers

Without regex, using only built-in functions and string methods:

>>> list(filter(bool, map(str.strip, s.replace('+--', '|').split('Parse:')[1].split('|'))))
['book VB ROOT', 'Can MD aux', 'we PRP nsubj', 'hotel NN dobj', 'an DT det', 'in IN prep', 'Lagos NNP pobj', '? . punct']

I would use re.split:

>>> s = 'Can we book an hotel in Lagos ? Parse: book VB ROOT  +-- Can MD aux  +-- we PRP nsubj  +-- hotel NN dobj  |   +-- an DT det  |   +-- in IN prep  |       +-- Lagos NNP pobj  +-- ? . punct'
>>> re.split(r'\s*\|?\s*\+\s*--\s*', s.split('Parse:')[1].strip())
['book VB ROOT', 'Can MD aux', 'we PRP nsubj', 'hotel NN dobj', 'an DT det', 'in IN prep', 'Lagos NNP pobj', '? . punct']

1 Comment

This is not working for me. The output is ['Can we book an hotel in Lagos ? Parse']. Note that the string is "Input: Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux +-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep | +-- Lagos NNP pobj +-- ? . punct"
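The two versions of the string on this page differ in spacing and in the "Input:" prefix, so a split that tolerates both may be easier to debug. Here is a minimal sketch, assuming every node is introduced by `+--` and the tree always follows a `Parse:` label. The name `parse_nodes` is hypothetical and not part of the answer above.

```python
import re

def parse_nodes(text):
    # Keep only what follows the "Parse:" label; sep is '' if the label is missing.
    _, sep, tree = text.partition('Parse:')
    if not sep:
        return []
    # Split on '+--' together with any whitespace or '|' branch markers before it.
    parts = re.split(r'[\s|]*\+--\s*', tree.strip())
    # Drop empty strings, e.g. if the tree happens to start with '+--'.
    return [p for p in parts if p]

line = ("Input: Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux "
        "+-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep "
        "| +-- Lagos NNP pobj +-- ? . punct")
print(parse_nodes(line))
```

Because the character class `[\s|]*` absorbs any run of spaces and pipes, the same call should handle both the single-spaced and the indented form of the tree.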

Here's a version that does use a regex, but doesn't require looping over all the parts twice:

import re

def extract(line):
    # Keep only the text after the ' Parse: ' label
    _, _, parts = line.strip().partition(' Parse: ')
    # Split on ' +-- ', optionally preceded by a ' |' branch marker
    return re.split(r'(?: \|)? \+-- ', parts)

line = "Input:Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux +-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep | +-- Lagos NNP pobj +-- ? . punct "
print(extract(line))
# ['book VB ROOT', 'Can MD aux', 'we PRP nsubj', 'hotel NN dobj', 'an DT det', 'in IN prep', 'Lagos NNP pobj', '? . punct']
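Since the question started from re.findall, the same result can probably be reached that way too. This is only a sketch, assuming the first node sits at the start of the tree and every later node follows a `+--` marker. The name `find_nodes` is hypothetical.

```python
import re

def find_nodes(text):
    # Keep only what follows the "Parse:" label.
    _, _, tree = text.partition('Parse:')
    # A node starts at the beginning of the tree or right after '+--',
    # and runs lazily up to the next '|', the next '+--', or the end.
    pattern = r'(?:^|\+--)\s*(.*?)\s*(?=\||\+--|$)'
    return re.findall(pattern, tree.strip())

line = ("Input:Can we book an hotel in Lagos ? Parse: book VB ROOT +-- Can MD aux "
        "+-- we PRP nsubj +-- hotel NN dobj | +-- an DT det | +-- in IN prep "
        "| +-- Lagos NNP pobj +-- ? . punct ")
print(find_nodes(line))
```

The lookahead keeps the delimiters out of the match, so each captured group is just the node text with surrounding whitespace trimmed.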
