How to split a Python list by a regex expression

Question

I am reading a file from the web row by row, and each row is a list. Each row has three columns, visibly separated by this pattern: +++$+++.

This is my code:

import codecs
import csv
from contextlib import closing

import requests

with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'latin-1'))
    for i, row in enumerate(reader):
        if i < 5:
            t = row[0].split('(\s\+{3}\$\+{3}\s)+')
            print(t)

I have tried to split the list using this instruction in Python 3.6 and can't get it to work. Any suggestion is much appreciated.

The list:

['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']

This is my regex expression:

row[0].split('(\s\+{3}\$\+{3}\s)+')

Each row has only one component: row[0].

When I print the result, the row has not been split.

.split() on a string isn't a regex match at all - it's literally looking for the string (\s\+{3}\$\+{3}\s)+! You want re.split(r'(\s\+{3}\$\+{3}\s)+', row[0]) instead.
– jasonharper, Commented Jul 15, 2018 at 23:27
Or use row[0].split(" +++$+++ "), since nothing you're doing here appears to benefit from the power of regular expressions.
– jasonharper, Commented Jul 15, 2018 at 23:29
Also remove the brackets (the capturing group) in the re.split pattern so that the +++$+++ separators are not returned.
– F. Elliot, Commented Jul 15, 2018 at 23:32
Thanks, @jasonharper, for the clarification. I've learned this one now.
– user_dhrn, Commented Jul 16, 2018 at 3:40
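
To make the points in these comments concrete, here is a minimal sketch using the first row from the question. The variable names (line, literal, with_group, without_group) are illustrative and not from the original post.

```python
import re

line = "m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html"

# str.split treats its argument as a literal string, so the regex text
# is searched for verbatim and never found: the line comes back unsplit.
literal = line.split(r'(\s\+{3}\$\+{3}\s)+')

# re.split interprets the pattern, but a capturing group makes it
# return the matched separators alongside the fields.
with_group = re.split(r'(\s\+{3}\$\+{3}\s)+', line)

# Dropping the parentheses returns only the three fields.
without_group = re.split(r'\s\+{3}\$\+{3}\s', line)

print(literal)
print(with_group)
print(without_group)
```

With the capturing group the result has five elements (three fields plus two separators); without it, only the three fields remain.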

Joe0815 · Accepted Answer · 2018-07-16 10:31:34Z

1

Doing

row[0].split(' +++$+++ ')

should give you exactly what you wanted without regex.
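
As a rough sketch of how that could be wrapped up, assuming each row is a one-element list as in the question (SEPARATOR and parse_row are hypothetical names introduced here for illustration):

```python
SEPARATOR = " +++$+++ "

def parse_row(row):
    """Split the single string held in a one-element row into its fields."""
    return row[0].split(SEPARATOR)

row = ['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']

# Unpacking works here because this row has exactly three fields.
movie_id, title, url = parse_row(row)
print(movie_id, title, url, sep=" | ")
```

One thing to watch: csv.reader splits on commas by default, so a line whose title contains a comma would probably arrive as more than one element, and row[0] would then hold only part of the line.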

answered Jul 16, 2018 at 10:31

Inquisitor01 · Accepted Answer · 2018-07-15 23:51:39Z

0

Assuming you don't want to use split(), and you are happy to relax things and get tuples back, maybe this can help.

Input:

import re

# Sample rows as they appear when printed; named `text` to avoid shadowing the built-in input()
text = '''['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']'''
# Raw string so the regex escapes reach the re module unchanged; one group per column
output = re.findall(r"\['([\S\s]+?)[\s]+[\+]{3}\$[\+]{3}[\s]+([\S\s]+?)[\s][\+]{3}\$[\+]{3}[\s]+([\S\s]+?)'\]", text)
print(output)

Output:

[('m0', '10 things i hate about you', 'http://www.dailyscript.com/scripts/10Things.html'), ('m1', '1492: conquest of paradise', 'http://www.hundland.org/scripts/1492-ConquestOfParadise.txt'), ('m2', '15 minutes', 'http://www.dailyscript.com/scripts/15minutes.html'), ('m3', '2001: a space odyssey', 'http://www.scifiscripts.com/scripts/2001.txt'), ('m4', '48 hrs.', 'http://www.awesomefilm.com/script/48hours.txt')]
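
The pattern above matches the printed form of each row, including the surrounding [' and ']. If the goal is to parse the plain string in row[0] instead, a per-line variant might look like the sketch below. LINE_PATTERN is a hypothetical name, and the pattern assumes exactly three fields per line.

```python
import re

# Hypothetical pattern: three fields separated by " +++$+++ ".
# The first two groups are lazy so they stop at the first separator.
LINE_PATTERN = re.compile(r"(.+?)\s\+{3}\$\+{3}\s(.+?)\s\+{3}\$\+{3}\s(.+)")

line = "m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt"

match = LINE_PATTERN.fullmatch(line)
# groups() returns the captured fields as a tuple; None if the line does not fit.
fields = match.groups() if match else None
print(fields)
```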

I'm also experimenting with an alternation-based regex but, for the life of me, I can't get the pattern to work yet. I'll post it later if I do; hopefully the above helps.

edited Jul 15, 2018 at 23:51

answered Jul 15, 2018 at 23:40

1 Comment

user_dhrn Over a year ago

Thanks, @Inquisitor01. I got a good one from jasonharper. Appreciate it.
