I would like to group a string in this format:
Some_text Some_text 1 2 3
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END Some_text
Some_Text Some_text 1 4 5
I would like to group everything from BEGIN to END, like this:
Some_text Some_text 1 2 3
<!-- START -->
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END <!-- END --> Some_text
Some_Text Some_text 1 4 5
<!-- START --> and <!-- END --> are just comments marking the start and end of the group.
I want to get only the text between BEGIN and END.
I have something like this, but it doesn't work in every case; with a lot of data, it fails:
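The marked-up layout shown above can be produced without a regex by walking the text line by line. This is a minimal sketch, assuming BEGIN and END each appear at the start of a line and blocks do not nest. The function name mark_groups is hypothetical.

```python
def mark_groups(text, begin="BEGIN", end="END"):
    """Insert <!-- START --> before each BEGIN line and <!-- END --> after the END keyword.

    Assumes (hypothetically) that BEGIN/END start their lines and blocks do not nest.
    """
    out = []
    inside = False
    for line in text.split("\n"):
        if not inside and line.startswith(begin):
            out.append("<!-- START -->")  # marker on its own line, before the BEGIN line
            out.append(line)
            inside = True
        elif inside and line.startswith(end):
            # Marker right after the END keyword; the rest of the line is kept
            out.append(f"{end} <!-- END -->{line[len(end):]}")
            inside = False
        else:
            out.append(line)
    return "\n".join(out)


text = "Some_text Some_text 1 2 3\nBEGIN Some_text Some_text\n44 76 1321\nSome_text Some_text\nEND Some_text\nSome_Text Some_text 1 4 5"
print(mark_groups(text))
```

Because it only checks startswith, a line such as "ENDING ..." inside a block would also close it. Add a word-boundary check if that can happen in your data.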
reg = re.compile(rf"{begin}[\-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")
text is my string. After grouping, I convert the result into a list. I don't know how to apply this regex directly to a list, so that I could work on a Python list instead of on the string text.
Could you give me some tips or help on how to solve this?
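On working with a list directly: once the text is split into lines, a regex isn't strictly needed. A small loop can collect the lines between BEGIN and END. This is a sketch, assuming each BEGIN and END keyword starts its own line. The name blocks_from_lines is hypothetical. It mirrors what the regex above returns: the rest of the BEGIN line is kept, and the END line is dropped.

```python
def blocks_from_lines(lines, begin="BEGIN", end="END"):
    """Return a list of blocks; each block is the list of lines between begin and end."""
    blocks = []
    current = None  # None means we are currently outside a block
    for line in lines:
        if current is None:
            if line.startswith(begin):
                # Keep whatever follows the BEGIN keyword on the same line
                rest = line[len(begin):].lstrip("- \t")
                current = [rest] if rest else []
        elif line.startswith(end):
            blocks.append(current)  # close the block; the END line itself is dropped
            current = None
        else:
            current.append(line)
    return blocks  # an unclosed trailing block is ignored


text = "Some_text Some_text 1 2 3\nBEGIN Some_text Some_text\n44 76 1321\nSome_text Some_text\nEND Some_text\nSome_Text Some_text 1 4 5"
lines = text.split("\n")  # or any list of lines you already have
print(blocks_from_lines(lines))
# → [['Some_text Some_text', '44 76 1321', 'Some_text Some_text']]
```

Since it returns one list per block, several BEGIN/END pairs in a large input each come back separately instead of being merged.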
Sample code:
import re
text="Some_text Some_text 1 2 3\nBEGIN Some_text Some_text\n44 76 1321\nSome_text Some_text\nEND Some_text\nSome_Text Some_text 1 4 5"
begin = "BEGIN"
end = "END"
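reg = re.compile(rf"{begin}[\-\s]+(.*)\n{end}", re.DOTALL)  # greedy (.*): captures up to the LAST newline followed by END
core = re.search(reg, text).group(1)  # the captured block as one string
lines = core.split("\n")  # one list item per line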
print(lines)
It works, but sometimes it doesn't, and I don't know why. It seems to happen when the input is large, e.g. around 20k words.
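The failing input isn't shown, so this is a guess about the cause. With re.DOTALL, the greedy (.*) makes the capture run to the last "\nEND" in the whole string. On the small sample, with one block, that is the right END. In a large text with several BEGIN/END blocks, or a later line that starts with END, everything in between gets lumped together. Separately, if nothing matches at all, re.search returns None and .group(1) raises AttributeError. The sketch below shows one possible fix, assuming BEGIN and END each start a line. It uses a non-greedy (.*?), ^ anchors via re.MULTILINE, and re.finditer to collect every block. The function extract_blocks and the two-block test string are hypothetical.

```python
import re


def extract_blocks(text, begin="BEGIN", end="END"):
    """Return every BEGIN...END block as a list of lines (an empty list if there is none)."""
    pattern = re.compile(
        # ^ with re.MULTILINE: BEGIN must start a line
        # (.*?) is non-greedy: it stops at the FIRST following END line
        # \b: END must be a whole word, so a line starting with "ENDING" does not close the block
        rf"^{re.escape(begin)}[\-\s]+(.*?)\n{re.escape(end)}\b",
        re.DOTALL | re.MULTILINE,
    )
    # finditer simply yields nothing when there is no match (no None to guard against)
    return [m.group(1).split("\n") for m in pattern.finditer(text)]


two_blocks = "x 1\nBEGIN a\n1 2\nEND a\nmiddle line\nBEGIN b\n3 4\nEND b\ntail"

# The original greedy pattern runs to the LAST "\nEND" and swallows the text in between
greedy = re.compile(r"BEGIN[\-\s]+(.*)\nEND", re.DOTALL)
print(greedy.search(two_blocks).group(1).split("\n"))
# → ['a', '1 2', 'END a', 'middle line', 'BEGIN b', '3 4']

print(extract_blocks(two_blocks))
# → [['a', '1 2'], ['b', '3 4']]
```

On the single-block sample from the question, extract_blocks returns the same lines as the original code, wrapped in an outer list.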
Regarding rf"^BEGIN[.\n]*\nEND": [.] is just the literal ".", not the regex metacharacter...
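The remark about [.] is easy to confirm. Inside a character class the dot loses its special meaning, so [.\n]* matches only dots and newlines. [\s\S] (or . together with re.DOTALL) is the usual way to match any character, including newlines. Here is a small check, using a made-up two-line sample:

```python
import re

text = "BEGIN Some_text\n44 76 1321\nEND"

# Inside [...] the dot is a literal '.', so [.\n]* matches only '.' and newline characters
literal_class = re.compile(r"^BEGIN[.\n]*\nEND", re.MULTILINE)
print(literal_class.search(text))  # → None

# [\s\S] matches any character, newlines included, without needing re.DOTALL;
# the trailing ? keeps the repetition non-greedy
any_char = re.compile(r"^BEGIN([\s\S]*?)\nEND", re.MULTILINE)
print(any_char.search(text).group(1).split("\n"))  # → [' Some_text', '44 76 1321']
```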