I want to split my text into a list based on a certain pattern. For example, my text is:
134. Lorem Ipsum is simply dummy text of the printing and typesetting industry 135. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book 136. It has survived not only five centuries
I want to convert it into a list, splitting at each item number, as below:
[134. Lorem Ipsum is simply dummy text of the printing and typesetting industry,
135. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
136. It has survived not only five centuries]
I already tried using:
import re
xx = re.split(pattern="d{1,3}. ", string=file_read)
list = []
for xy in xx:
    xy = re.sub(pattern="\s+", repl=" ", string=xy)
    list.append(xy)
But the output is:
[134. Lorem Ipsum is simply dummy text of the printing and typesetting industry 135. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s 136. It has survived not only five centuries]
In a regex, . means "any character". If you want it to be interpreted as a literal period, you have to escape it with a backslash, like \.
re.findall(pattern=r"\d{1,3}\. \w+", string=file_read) gets the number and then the first word.
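To get the full list asked for in the question, one option is to split on the whitespace in front of each number rather than on the number itself, so the number stays with its item. The following is a minimal sketch, not the only way to do it: split_numbered is a hypothetical helper name, text stands in for file_read, and it assumes every item starts with 1 to 3 digits followed by a period and a space.

```python
import re

# Sample text from the question; assumed to be what file_read contains.
text = (
    "134. Lorem Ipsum is simply dummy text of the printing and typesetting industry "
    "135. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, "
    "when an unknown printer took a galley of type and scrambled it to make a type specimen book "
    "136. It has survived not only five centuries"
)


def split_numbered(text):
    """Split text into items that each start with a marker like '134. '.

    Hypothetical helper for illustration only.
    """
    # Split on whitespace that is followed by 1-3 digits, a literal period
    # and a space. The lookahead (?=...) checks for the marker without
    # consuming it, so the number is kept at the start of the next item.
    parts = re.split(r"\s+(?=\d{1,3}\. )", text.strip())
    # Collapse runs of whitespace (including newlines) inside each item.
    return [re.sub(r"\s+", " ", part) for part in parts]


items = split_numbered(text)
for item in items:
    print(item)
```

Note that "1500s," does not trigger a split, because the digits are not followed by a period. A sentence that happens to end in a short number (for example "... page 12. Next ...") would be split, though, so the pattern may need tightening for other inputs.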