Use regex backreferences to create array

Question

I'm not really sure of the best way to summarize this in one sentence for the title, so please edit it to make it clearer if necessary.

I have a list of strings (parsed from a Web page) of the format

"\tLocation\tNext Available Appointment: Date\n"

I'd like to turn this into a list of lists, each with the format

["Location", "Date"]

I know what regular expression I would use, but I don't know how to use the results.

(For reference, here's the regular expression that would find what I want.)

^\t(.*)\t.*: (.*)$

I found how to match regexes against text, but not how to extract the results to something else. I am new to Python, though, so I acknowledge that I probably missed something while searching.

Use the above regex in re.findall.

– Avinash Raj, Commented May 28, 2015 at 12:35

Kasravnd · Accepted Answer · 2015-05-28 12:43:22Z

4

You can use the re.findall() function within a list comprehension:

import re
[re.findall(r'^\t(.*)\t.*: (.*)$',i) for i in my_list]

For example:

>>> my_list=["\tLocation\tNext Available Appointment: Date\n","\tLocation2\tNext Available Appointment: Date2\n"]
>>> [re.findall(r'^\t(.*)\t.*: (.*)$',i) for i in my_list]
[[('Location', 'Date')], [('Location2', 'Date2')]]

You can also use re.search() with the groups() method:

>>> [re.search(r'^\t(.*)\t.*: (.*)$',i).groups() for i in my_list]
[('Location', 'Date'), ('Location2', 'Date2')]

Note that the advantage of re.search() here is that you get a list of tuples instead of a list of lists of tuples (as with findall()).
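One caveat with the re.search() version: it raises an AttributeError on any string that does not match, because re.search() returns None in that case. Below is a minimal sketch that skips such strings and builds the list of lists the question asked for. The function name parse_appointments and the sample "not an appointment line" entry are made up for illustration; the pattern is the one from the question.

```python
import re

# Same pattern as in the question, compiled once so it can be reused.
PATTERN = re.compile(r'^\t(.*)\t.*: (.*)$')


def parse_appointments(lines):
    """Return [location, date] lists for matching lines; skip the rest.

    Hypothetical helper, not part of the original answer.
    """
    result = []
    for line in lines:
        match = PATTERN.search(line)
        if match is not None:  # re.search() returns None when nothing matches
            result.append(list(match.groups()))
    return result


my_list = [
    "\tLocation\tNext Available Appointment: Date\n",
    "not an appointment line\n",  # made-up non-matching entry
    "\tLocation2\tNext Available Appointment: Date2\n",
]
print(parse_appointments(my_list))
# [['Location', 'Date'], ['Location2', 'Date2']]
```

If tuples are fine, drop the list() call and append match.groups() directly.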

edited May 28, 2015 at 12:43

answered May 28, 2015 at 12:37


4 Comments

stephenwade Over a year ago

I didn't know about tuples. I think a list of tuples makes more sense than a list of lists.

Kasravnd Over a year ago

@stephenwade Yeah, I suggest the second one myself!

stephenwade Over a year ago

This is literally my first Python script. I had something to do and I decided I'd try to do it in Python instead of Bash.

Kasravnd Over a year ago

@stephenwade Be sure that your choice is correct! Keep using Python and enjoy ;)

Wiktor Stribiżew · Accepted Answer · 2015-05-28 13:14:35Z

2

You can get a flat list with

import re
p = re.compile(r'^\t(.*)\t.*: (.*)$')
test_str = "\tLocation\tNext Available Appointment: Date\n"
print([item for sublist in re.findall(p, test_str) for item in sublist])

Output:

['Location', 'Date']

See IDEONE demo

EDIT:

Or, you can make use of finditer:

import re
p = re.compile(r'(?m)^\t(.*)\t.*: (.*)$')
test_str = "\tLocation\tNext Available Appointment: Date\n\tLocation1\tNext Available Appointment: Date1\n"
print([(x.group(1), x.group(2)) for x in re.finditer(p, test_str)])

Output of another demo:

[('Location', 'Date'), ('Location1', 'Date1')]
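The (?m) prefix in the second pattern is the inline form of the re.MULTILINE flag. It matters once several records sit in one string, because without it ^ and $ anchor only at the start and end of the whole string. A small sketch of the difference, using the same sample data as above (the variable names are just for illustration):

```python
import re

# Two records in a single string, one per line.
text = (
    "\tLocation\tNext Available Appointment: Date\n"
    "\tLocation1\tNext Available Appointment: Date1\n"
)
pattern = r'^\t(.*)\t.*: (.*)$'

# Without MULTILINE, $ cannot match at the end of the first line
# and ^ cannot match at the start of the second, so nothing is found.
without_flag = re.findall(pattern, text)

# With MULTILINE (same effect as the inline (?m)), each line matches.
with_flag = re.findall(pattern, text, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # [('Location', 'Date'), ('Location1', 'Date1')]
```

When the input is already a list with one record per string, as in the question, the flag is not needed.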

edited May 28, 2015 at 13:14

answered May 28, 2015 at 12:41
