I am trying to match paragraphs using Python and the re module.
An example text:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum.
two or more line breaks here
Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
two or more line breaks here
Ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
This expression almost does the job:
paragraphs = re.findall(r'(?s)((?:[^\n][\n]?)+)', textContent)
But I want a paragraph boundary only where there are two or more consecutive line breaks. Currently it matches too often and also splits at single line breaks.
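For illustration, here is a minimal sketch of splitting only at runs of two or more line breaks, using re.split instead of re.findall. The function name split_paragraphs and the sample string are hypothetical, and the sketch assumes that lines containing only spaces or tabs should also count as blank.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs at runs of two or more line breaks."""
    # Normalize Windows and old Mac line endings so only "\n" remains.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # One newline followed by at least one more newline; the lines in
    # between may contain only spaces or tabs (assumption).
    parts = re.split(r"\n(?:[ \t]*\n)+", normalized)
    # Drop empty pieces that arise from leading or trailing blank lines.
    return [p.strip() for p in parts if p.strip()]

sample = "First paragraph,\nstill first.\n\nSecond paragraph.\n\n\n\nThird paragraph."
paragraphs = split_paragraphs(sample)
print(len(paragraphs))  # 3
```

A single line break inside a paragraph is left alone, so the first paragraph keeps both of its lines.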
Edit:
ART. WEFWEFEW
1 SDVSDRG: **<at the moment it breaks here, but it shouldn't>**
a. wevvdfvdfd
b. sdfsdfsdfsdfsdfsdghtrhrth
Edit2:
ART. WEFWEFEW
1 SDVSDRG:
**here are two line breaks, but don't split this paragraph**
**at the moment it breaks here, but it shouldn't**
a. wevvdfvdfd
b. sdfsdfsdfsdfsdfsdghtrhrth
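One possible reading of the case above is that a line ending in a colon introduces the lines after it, so blank lines directly after the colon should not end the paragraph. The sketch below rests on that assumption; the names BOUNDARY and split_keep_intro are hypothetical, and the rule may not fit every document.

```python
import re

# Assumption: a run of two or more newlines is a paragraph boundary
# unless the character just before the run is ":".
# The "\n" inside the lookbehind stops the pattern from matching
# partway through a longer run of newlines that follows a colon.
BOUNDARY = re.compile(r"(?<![:\n])\n{2,}")

def split_keep_intro(text):
    """Split on blank lines, but keep 'heading:' lines with their body."""
    return [p for p in BOUNDARY.split(text) if p.strip()]

sample = (
    "ART. WEFWEFEW\n"
    "1 SDVSDRG:\n"
    "\n"
    "a. wevvdfvdfd\n"
    "b. sdfsdf\n"
    "\n"
    "Next paragraph."
)
parts = split_keep_intro(sample)
print(len(parts))  # 2
```

The check only looks at the single character before the blank lines, so a colon followed by trailing spaces would still cause a split.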
"two or more line-breaks" OR "two-or-more empty lines"?
"Currently it matches too often." - Which portions of the example text do you expect .findall() to return?