Split text file Python

Question

I am working on text files like this:

Chapter 01

Lorem ipsum

dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt

Chapter 02

consectetur adipiscing

sed do eiusmod tempor

Chapter 03

et dolore magna aliqua.

with delimiters like "chapter", "Chapter", "CHAPTER", etc... and 1 or 2 digits ("Chapter 1" or "Chapter 01").

I managed to open and read the file in Python, with .open() and .read()

mytext = myfile.read()

Now I need to split my string, in order to get text for "Chapter XX".

For Chapter 02, that would be:

consectetur adipiscing

sed do eiusmod tempor

I'm new to Python, I read about regex, match, map, or split, but... well...

(I'm writing a GIMP Python-fu plugin, so I use the Python version bundled with GIMP, which is 2.7.15.)

End genocide - save Gaza · Accepted Answer · 2021-03-24 15:37:54Z

2

You can use regular expressions like so:

import re

split_text = re.split("Chapter [0-9]+\n",  # splits on "Chapter " + numbers + newline
                      mytext, 
                      flags=re.IGNORECASE) # splits on "CHAPTER"/"chapter"/"Chapter" etc

>>> split_text
['', '\nLorem ipsum\n\ndolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt\n\n', '\nconsectetur adipiscing\n\nsed do eiusmod tempor\n\n', '\net dolore magna aliqua.']

You can now choose the text from each chapter by the index of split_text e.g.:

print(split_text[2])

>>> 
consectetur adipiscing

sed do eiusmod tempor
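If you would rather look a chapter up by its number than by its position in the list, a capturing group in the pattern makes re.split keep the matched numbers in the result. Here is a minimal sketch of that idea, assuming every heading sits on its own line; the helper name split_chapters, the HEADING pattern and the sample string are made up for illustration and are not from the question.

```python
import re

# Hypothetical pattern, for illustration only: a heading on its own line,
# "chapter" in any letter case, followed by 1 or 2 digits.
HEADING = re.compile(r"^chapter[ \t]+([0-9]{1,2})[ \t]*$",
                     flags=re.IGNORECASE | re.MULTILINE)


def split_chapters(text):
    """Return a dict mapping chapter number (int) to its stripped text."""
    # Because the pattern has one capturing group, split() returns
    # [text before first heading, number, body, number, body, ...].
    parts = HEADING.split(text)
    numbers = parts[1::2]
    bodies = parts[2::2]
    return dict((int(n), body.strip()) for n, body in zip(numbers, bodies))


sample = ("Chapter 01\n\nLorem ipsum\n\n"
          "CHAPTER 2\n\nconsectetur adipiscing\n\nsed do eiusmod tempor\n\n"
          "chapter 03\n\net dolore magna aliqua.\n")

chapters = split_chapters(sample)
print(chapters[2])
```

Converting the number with int() means "Chapter 2" and "Chapter 02" end up under the same key. The generator passed to dict() is used instead of a dict comprehension only to stay friendly to old interpreters; it runs on Python 2.7 and 3 alike.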

edited Mar 24, 2021 at 15:37

answered Jul 21, 2018 at 10:42

End genocide - save Gaza


Aominé · Accepted Answer · 2018-07-21 10:57:20Z

0

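You can try the code below:

chapter = [""]
for i in range(1, 4):
    # Find where this chapter's heading and the next one start.
    nb1 = text.find("Chapter " + "%02d" % i)
    nb2 = text.find("Chapter " + "%02d" % (i + 1))
    # find() returns -1 when there is no next chapter; slice to the end instead.
    if nb2 == -1:
        nb2 = len(text)
    chapter.append(text[nb1:nb2])

for i in range(1, 4):
    print(chapter[i])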

Or with regular expressions:

import re

chapter = re.split("Chapter [0-9]+\n", text)  # [0-9] so that any digit matches, not only 0 to 4

for i in range(1,4):
    print(chapter[i])

edited Jul 21, 2018 at 10:57

answered Jul 21, 2018 at 10:40

Aominé


1 Comment

End genocide - save Gaza Over a year ago

"with delimiters like chapter, Chapter, CHAPTER, etc... and 1 or 2 digits (Chapter 1 or Chapter 01)": this doesn't account for the variability in case in 'Chapter', for chapter numbers outside the example's range, or for numbers less than 10 without a leading 0 (in the first code block, the regex does capture this last case).
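One way to cover those three points without hard-coding the chapter range is to locate every heading first and then slice between consecutive matches. The following is only a sketch under assumptions: the names HEADING and chapter_text are hypothetical, and the pattern will also match a phrase such as "see chapter 2" inside the body text, so it relies on that phrase never appearing outside headings.

```python
import re

# Hypothetical pattern and helper, for illustration only.
# Any letter case, 1 or 2 digits, with or without a leading zero.
HEADING = re.compile(r"chapter +([0-9]{1,2})\b", flags=re.IGNORECASE)


def chapter_text(text, number):
    """Return the stripped text of one chapter, or None if it is missing."""
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        if int(m.group(1)) == number:
            start = m.end()
            # Slice up to the next heading, or to the end of the text
            # when this is the last chapter.
            if i + 1 < len(matches):
                end = matches[i + 1].start()
            else:
                end = len(text)
            return text[start:end].strip()
    return None


sample = "chapter 1\nfoo\nCHAPTER 02\nbar\nChapter 10\nbaz"
print(chapter_text(sample, 2))
```

Falling back to len(text) for the last chapter avoids the -1 that str.find returns when nothing is found, which would otherwise cut the final character off the slice.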

endlish emmet · Accepted Answer · 2022-06-23 06:59:41Z

0

import re

# Split on the chapter headings, then drop the empty strings that re.split leaves behind.
splitted_str = list(filter(lambda x: x != '', re.split("Chapter [0-9]+", my_text)))
print(splitted_str)
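Note that the comparison with '' only drops pieces that are exactly empty, and the pieces that remain keep their surrounding newlines. A small variation, assuming you also want the blank lines trimmed, is to strip each piece before filtering. The sample string and the name pieces are illustrative only.

```python
import re

# Made-up sample input, shaped like the file in the question.
my_text = "Chapter 01\n\nLorem ipsum\n\nChapter 02\n\nsed do eiusmod tempor\n"

# Strip each chunk first, then keep only the chunks that still contain text.
pieces = [chunk.strip() for chunk in re.split(r"Chapter [0-9]+", my_text)]
pieces = [chunk for chunk in pieces if chunk]
print(pieces)
```

List comprehensions behave the same way on Python 2.7 and 3, whereas filter() returns a list on 2.7 and an iterator on 3.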

edited Jun 23, 2022 at 6:59

answered Jun 23, 2022 at 6:59

endlish emmet
