read file into array separated by paragraph Python

Question

I have a text file that I want to read into three different arrays: array1, array2 and array3. The first paragraph goes into array1, the second into array2, and the third into array3. The fourth paragraph then becomes the second element of array1, and so on. Paragraphs are separated by a blank line. Any ideas?

Codahk · Accepted Answer · 2011-11-27 01:39:07Z

12

This is the basic code I would try:

f = open('data.txt', 'r')
data = f.read()
f.close()  # release the file handle once the contents are in memory

array1 = []
array2 = []
array3 = []
# Paragraphs are assumed to be separated by exactly one blank line.
splat = data.split("\n\n")
for number, paragraph in enumerate(splat, 1):
    if number % 3 == 1:
        array1.append(paragraph)
    elif number % 3 == 2:
        array2.append(paragraph)
    else:
        array3.append(paragraph)

This should be enough to get you started. If the paragraphs in the file are split by two new lines then "\n\n" should do the trick for splitting them.
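As a rough sketch of a slightly more tolerant split: real files often have Windows line endings, whitespace on the "blank" line, or several blank lines in a row, and a plain split("\n\n") then leaves stray or empty entries. The helper name split_paragraphs below is hypothetical (it is not part of the answer above), and it assumes a paragraph break is any run of whitespace-only lines.

```python
import re


def split_paragraphs(text):
    """Split text on runs of blank (whitespace-only) lines."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A break is a newline followed by one or more whitespace-only lines.
    chunks = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    return [chunk for chunk in chunks if chunk]


# Illustrative input only: mixed line endings and extra blank lines.
sample = "first\nparagraph\r\n\r\nsecond\n \n\n\nthird\n"
paragraphs = split_paragraphs(sample)

# Deal the paragraphs out round-robin, as in the loop above.
arrays = [[], [], []]
for index, paragraph in enumerate(paragraphs):
    arrays[index % 3].append(paragraph)
```

Indexing a list of lists with index % 3 does the same job as the if/elif chain and extends to any number of arrays.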

edited Nov 27, 2011 at 1:39

answered Nov 27, 2011 at 1:33

Codahk


1 Comment

MERose Over a year ago

with open('data.txt', 'r') as f: would be preferable.

Rob Cowie · Accepted Answer · 2011-11-27 02:34:30Z

4

import itertools as it


def paragraphs(fileobj, separator='\n'):
    """Iterate over a file object by paragraph."""
    # Makes no assumptions about the encoding used in the file
    lines = []
    for line in fileobj:
        if line == separator:
            # A blank line ends the current paragraph; repeated blank
            # lines are skipped instead of starting empty paragraphs.
            if lines:
                yield ''.join(lines)
                lines = []
        else:
            lines.append(line)
    if lines:  # do not yield an empty trailing paragraph
        yield ''.join(lines)

paragraph_lists = [[], [], []]
with open('/Users/robdev/Desktop/test.txt') as f:
    paras = paragraphs(f)
    # Python 3: zip is lazy, so it replaces itertools.izip
    for para, group in zip(paras, it.cycle(paragraph_lists)):
        group.append(para)

print(paragraph_lists)

answered Nov 27, 2011 at 2:34

Rob Cowie


1 Comment

traal Over a year ago

Huge plus for using a streaming state machine approach to splitting text by paragraph! This should be the preferred solution rather than simply split("\n\n") which has many suboptimal edge cases.

JKC · Accepted Answer · 2017-09-18 06:12:12Z

I know this question was asked a long time ago, but I am adding my input in case it is useful to someone else. There is a much simpler way to split the input file into paragraphs based on the paragraph separator (which can be \n, a space, or anything else). The code for your question is given below:

with open("input.txt", "r") as f:
    input_ = f.read().split("\n\n")   # "\n\n" means there is a blank line between paragraphs.

After running this, input_[0] holds the first paragraph, input_[1] the second, and so on. All the paragraphs in the input file end up in a list, with each list element containing one paragraph.

CPSO · Accepted Answer · 2019-05-01 12:48:58Z

2

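This code extracts the text between two marker strings:

rr = []  # list for saving the extracted text
for f in file_list:  # file_list: an iterable of file paths, defined elsewhere
    with open(f, 'rt') as fl:
        lines = fl.read()
        # Keep the text from 'String1' up to (not including) 'String2'.
        # Note: find() returns -1 when a marker is missing.
        lines = lines[lines.find('String1'):lines.find('String2')]
        rr.append(lines)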

answered May 1, 2019 at 12:48

CPSO


Karl Knechtel · Accepted Answer · 2011-11-27 02:10:29Z

1

Because I feel like showing off:

with open('data.txt') as f:
    f = list(f)
    a, b, c = (list(__import__('itertools').islice(f, i, None, 3)) for i in range(3))

edited Nov 27, 2011 at 2:10

answered Nov 27, 2011 at 1:39

Karl Knechtel


6 Comments

Rob Cowie Over a year ago

That doesn't split the content of the file into paragraphs; the islice objects will iterate over lines in the file.

Karl Knechtel Over a year ago

Each one is an iterator over a subset of lines in the file, which is then explicitly converted into a list. What's the problem? Edit: tested and found that everything ends up in the first iterator, for reasons I don't understand. This is fixed by reading from a list instead of a stream (which is what I tested originally); edited accordingly.

Rob Cowie Over a year ago

Yeah. OP wants to iterate paragraphs, not lines

Karl Knechtel Over a year ago

Follow-up: the problem occurs because each islice attempts to read the stream fully before the next gets to operate. Annoying; there ought to be a more elegant way to multiplex a stream. @Rob "paragraphs" are normally defined by newlines in text files; if the lines of text are explicitly wrapped then the OP needs to say so, and identify exactly what separates paragraphs.

Rob Cowie Over a year ago

I'd say paragraphs are usually separated by a blank line, so two newlines '\n\n'. There may be (and are likely to be) more than one line (and hence newline chars) within a paragraph. So simple line-based iteration is not enough. Anyway, the islice issue is an interesting one.
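On the multiplexing question raised in the comments above, one possible approach (a sketch, not something proposed in the thread) is itertools.tee, which gives each islice its own buffered copy of the stream. The function name multiplex is hypothetical, and io.StringIO stands in for an open file.

```python
import io
import itertools


def multiplex(iterable, n):
    """Return n lazy iterators; iterator i yields items i, i+n, i+2n, ..."""
    # tee() gives each islice an independent, buffered view of the source,
    # so draining one of them does not starve the others.
    copies = itertools.tee(iterable, n)
    return [itertools.islice(copy, i, None, n) for i, copy in enumerate(copies)]


stream = io.StringIO("1\n2\n3\n4\n5\n")  # stands in for an open file
a, b, c = (list(part) for part in multiplex(stream, 3))
```

Two caveats: tee buffers every item that one copy has consumed and another has not, so fully draining the first iterator holds the whole stream in memory. It also still works on whatever the source yields, so to split by paragraph it would need a paragraph iterator rather than a raw file object.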


Bora Caglayan · Accepted Answer · 2011-12-10 15:22:52Z

0

Using slices would also work.

par_separator = "\n\n"
paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6".split(par_separator)
a,b,c = paragraphs[0:len(paragraphs):3], paragraphs[1:len(paragraphs):3],\
        paragraphs[2:len(paragraphs):3]

Slice syntax: [start index:end index:step]
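The same slicing idea generalizes to any number of groups. In this minimal sketch the end index is omitted (it defaults to the end of the list), and the function name deal_out is made up for illustration.

```python
def deal_out(items, n):
    """Distribute items round-robin into n lists using extended slices."""
    # items[i::n] starts at index i and takes every n-th element;
    # leaving out the end index means "up to the end of the list".
    return [items[i::n] for i in range(n)]


paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6\n\n7".split("\n\n")
a, b, c = deal_out(paragraphs, 3)
```

When the paragraph count is not a multiple of n, the later lists simply end up one element shorter.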

answered Dec 10, 2011 at 15:22

Bora Caglayan


brc · Accepted Answer · 2017-01-04 09:56:30Z

0

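A more elegant way that avoids slices:

import itertools

def grouper(n, iterable, fillvalue=None):
    # n references to the same iterator, so each group takes n consecutive items
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

# text is assumed to already hold the contents of the file
for p in grouper(5, [sent.strip() for sent in text.split('\n') if sent != '']):
    print(p)

Just make sure you handle the None values that pad the final group.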
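One way to deal with that padding, sketched here under the assumption that the data itself might legitimately contain None, is to pad with a unique sentinel and filter it out afterwards. The names _PAD and chunks are hypothetical.

```python
import itertools


def grouper(n, iterable, fillvalue=None):
    # n references to the same iterator, so each group takes n consecutive items
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)


_PAD = object()  # unique sentinel that cannot collide with real data


def chunks(n, iterable):
    """Yield lists of up to n consecutive items, with the padding removed."""
    for group in grouper(n, iterable, fillvalue=_PAD):
        yield [item for item in group if item is not _PAD]


result = list(chunks(3, ["a", "b", "c", "d", "e"]))
```

Note that grouper collects consecutive items into each group, which differs from the round-robin distribution asked for in the question.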

answered Jan 4, 2017 at 9:56

brc
