
I'm not really sure of the best way to summarize this in one sentence for the title, so please edit it to make it clearer if necessary.

I have a list of strings (parsed from a Web page) of the format

"\tLocation\tNext Available Appointment: Date\n"

I'd like to turn this into a list of lists, each with the format

["Location", "Date"]

I know what regular expression I would use, but I don't know how to use the results.

(For reference, here's the regular expression that would find what I want.)

^\t(.*)\t.*: (.*)$

I found how to match regexes against text, but not how to extract the results to something else. I am new to Python, though, so I acknowledge that I probably missed something while searching.

  • use the above regex in re.findall Commented May 28, 2015 at 12:35

2 Answers


You can use the re.findall() function within a list comprehension:

import re
[re.findall(r'^\t(.*)\t.*: (.*)$',i) for i in my_list]

For example:

>>> my_list=["\tLocation\tNext Available Appointment: Date\n","\tLocation2\tNext Available Appointment: Date2\n"]
>>> [re.findall(r'^\t(.*)\t.*: (.*)$',i) for i in my_list]
[[('Location', 'Date')], [('Location2', 'Date2')]]

You can also use re.search() with the groups() method:

>>> [re.search(r'^\t(.*)\t.*: (.*)$',i).groups() for i in my_list]
[('Location', 'Date'), ('Location2', 'Date2')]

Note that the advantage of re.search() here is that you get a list of tuples instead of a list of lists of tuples (as with findall()).
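One caveat: re.search() returns None when a line does not match, so calling .groups() directly would raise an AttributeError on such a line. Here is a minimal sketch that skips non-matching lines and builds lists rather than tuples, as the question asked. The names PATTERN and parse_lines and the sample data are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as above, compiled once so it can be reused for every line.
PATTERN = re.compile(r'^\t(.*)\t.*: (.*)$')

def parse_lines(lines):
    """Return a [location, date] list for each matching line; skip the rest."""
    result = []
    for line in lines:
        match = PATTERN.search(line)
        if match is not None:  # re.search returns None when nothing matches
            result.append(list(match.groups()))
    return result

# Hypothetical sample input, including one line that does not match.
my_list = [
    "\tLocation\tNext Available Appointment: Date\n",
    "a line that does not match\n",
    "\tLocation2\tNext Available Appointment: Date2\n",
]
print(parse_lines(my_list))
```

If tuples are fine for your purposes, drop the list() call and append match.groups() directly.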


4 Comments

I didn't know about tuples. I think a list of tuples makes more sense than a list of lists.
@stephenwade Yeah, I suggest the second one myself!
This is literally my first Python script. I had something to do and I decided I'd try to do it in Python instead of Bash.
@stephenwade Be sure that your choice is correct! Keep using Python and enjoy ;)

You can get a flat list with

import re
p = re.compile(r'^\t(.*)\t.*: (.*)$')
test_str = "\tLocation\tNext Available Appointment: Date\n"  # fields are tab-separated
print([item for sublist in re.findall(p, test_str) for item in sublist])

Output:

['Location', 'Date']


EDIT:

Or, you can make use of finditer:

import re
p = re.compile(r'(?m)^\t(.*)\t.*: (.*)$')
test_str = "\tLocation\tNext Available Appointment: Date\n\tLocation1\tNext Available Appointment: Date1\n"  # tab-separated, one record per line
print([(x.group(1), x.group(2)) for x in re.finditer(p, test_str)])

Output:

[('Location', 'Date'), ('Location1', 'Date1')]
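If the whole page text is available as a single string, named groups can make the extraction easier to read. This is only a sketch: the group names location and date and the variable page_text are my own choices, not something from the question.

```python
import re

# re.MULTILINE makes ^ and $ match at every line boundary, like (?m) above.
p = re.compile(r'^\t(?P<location>.*)\t.*: (?P<date>.*)$', re.MULTILINE)

# Hypothetical sample text: one tab-separated record per line.
page_text = (
    "\tLocation\tNext Available Appointment: Date\n"
    "\tLocation1\tNext Available Appointment: Date1\n"
)

# Build a list of [location, date] lists, one per matching line.
rows = [[m.group('location'), m.group('date')] for m in p.finditer(page_text)]
print(rows)
```

Lines that do not match the pattern are simply not yielded by finditer, so no extra None check is needed here.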
