I want to read the text between two markers ("#*" and "#@") from a file. The file contains thousands of records in the format shown below. I tried the following code, but it does not return the required output.
import re

start = '#*'
end = '#@'
myfile = open('lorem.txt')
for line in myfile:
    line = line.rstrip()
    # slice from just after `start` up to the last `end` found on this same line
    print(line[line.find(start)+len(start):line.rfind(end)])
myfile.close()
My Input:
#*OQL[C++]: Extending C++ with an Object Query Capability
#@José A. Blakeley
#t1995
#cModern Database Systems
#index0
#*Transaction Management in Multidatabase Systems
#@Yuri Breitbart,Hector Garcia-Molina,Abraham Silberschatz
#t1995
#cModern Database Systems
#index1
My Output:
51103
OQL[C++]: Extending C++ with an Object Query Capability
t199
cModern Database System
index
...
Expected output:
OQL[C++]: Extending C++ with an Object Query Capability
Transaction Management in Multidatabase Systems
Comments:

Remove the for cycle and add contents = myfile.read() and then print(re.findall(r'#\*(.*?)#@', contents, re.S))

^#\*(.*)(?:\r?\n){2}#@ (see regex101.com/r/5ouxbw/1)

for match in re.findall(r'#\*(.*?)#@', contents, re.S): // do something with the match
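The re.findall suggestion from the comments can be sketched as follows. This is only a minimal illustration, not a definitive solution: the helper name extract_titles and the inline sample string are assumptions made for the example, and the sample reuses the records from the question instead of reading lorem.txt. Because "#*" and "#@" sit on different lines, the search has to run over the whole text rather than line by line.

```python
import re

def extract_titles(contents):
    """Return the text between each '#*' marker and the next '#@' marker."""
    # re.S lets '.' match newlines, so a match can span the line break
    # that separates the title line from the author line.
    matches = re.findall(r'#\*(.*?)#@', contents, re.S)
    # Each capture ends with the newline(s) before '#@'; strip them.
    return [m.strip() for m in matches]

# Sample data copied from the question (hypothetical stand-in for the file).
sample = (
    "#*OQL[C++]: Extending C++ with an Object Query Capability\n"
    "#@José A. Blakeley\n"
    "#t1995\n"
    "#cModern Database Systems\n"
    "#index0\n"
    "#*Transaction Management in Multidatabase Systems\n"
    "#@Yuri Breitbart,Hector Garcia-Molina,Abraham Silberschatz\n"
    "#t1995\n"
    "#cModern Database Systems\n"
    "#index1\n"
)

for title in extract_titles(sample):
    print(title)
```

To use it on the real file, read the whole file into one string first (for example with myfile.read()) and pass that string to extract_titles. If the file is not in the platform's default encoding, an explicit encoding argument to open() may be needed; that depends on the data and is not stated in the question.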