How to group arrays of the same name using Python?

Question

I have over a thousand array categories in a text file, for example:

Category A1 and Category A2 (arrays in MATLAB code):

A1={[2,1,2]};
A1={[4,2,1,2,3]};
A2={[3,3,2,1]};
A2={[4,4,2,2]};
A2={[2,2,1,1,1]};

I would like to use Python to help me read the file and group them into:

A1=[{[2,1,2]} {[4,2,1,2,3]}];  
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

Padraic Cunningham · Accepted Answer · 2015-05-31 11:21:20Z

4

Use a dict to group the lines. I presume you mean grouping them as strings, as they are not valid Python containers coming from a .mat MATLAB file:

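from collections import OrderedDict

od = OrderedDict()
with open("infile") as f:
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        name, data = line.split("=", 1)  # e.g. "A1", "{[2,1,2]};\n"
        # drop the trailing ";" and newline, then group the data by name
        od.setdefault(name, []).append(data.rstrip(";\n"))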

from pprint import pprint as pp
pp(list(od.values()))

Output:

[['{[2,1,2]}', '{[4,2,1,2,3]}'],
 ['{[3,3,2,1]}', '{[4,4,2,2]}', '{[2,2,1,1,1]}']]

To write the grouped data back to your file:

with open("infile", "w") as f:  # reopening with "w" truncates the file
    for k, v in od.items():
        f.write("{}=[{}];\n".format(k, " ".join(v)))

Output:

A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

This is your desired output: the semicolon is removed from each sub-array, the elements are grouped, and a semicolon is added to the end of each group to keep the data valid in your MATLAB file.

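collections.OrderedDict keeps the names in the order in which each one first appears in your original file, whereas a normal dict has no guaranteed order (plain dicts only preserve insertion order from Python 3.7 onward).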
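A minimal sketch of what "keeping the order" means in practice, using a made-up interleaved list of lines in place of the file: the keys come out in the order each name is first seen, not in the order of every line. On Python 3.7 and later a plain dict behaves the same way. The `group` helper is hypothetical.

```python
from collections import OrderedDict

# Hypothetical interleaved input standing in for the lines of "infile"
lines = [
    "A2={[3,3,2,1]};\n",
    "A1={[2,1,2]};\n",
    "A2={[4,4,2,2]};\n",
    "A1={[4,2,1,2,3]};\n",
]


def group(lines, mapping):
    """Append each line's data to the list stored under its name."""
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        name, data = line.split("=", 1)
        mapping.setdefault(name, []).append(data.rstrip(";\n"))
    return mapping


od = group(lines, OrderedDict())
d = group(lines, {})

print(list(od))  # ['A2', 'A1']: order of first appearance
print(list(d))   # same order on Python 3.7+
print(od["A1"])  # ['{[2,1,2]}', '{[4,2,1,2,3]}']
```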

A safer approach when updating a file is to write to a temporary file and then replace the original file with the updated one, using a NamedTemporaryFile and shutil.move:

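from collections import OrderedDict
from shutil import move
from tempfile import NamedTemporaryFile

od = OrderedDict()
# mode "w" so the temp file accepts str (the default mode is binary "w+b")
with open("infile") as f, NamedTemporaryFile("w", dir=".", delete=False) as temp:
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        name, data = line.split("=", 1)
        od.setdefault(name, []).append(data.rstrip("\n;"))
    for k, v in od.items():
        temp.write("{}=[{}];\n".format(k, " ".join(v)))
# both files are closed here; replace the original with the grouped version
move(temp.name, "infile")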

If the code raises an error in the loop or your computer crashes during the write, your original file is preserved.
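The same temp-file idea can be sketched as a reusable function. Two assumptions worth flagging: it swaps `shutil.move` for `os.replace`, which replaces the destination in a single rename when both paths are on the same filesystem, and it deletes the temporary file if anything fails, since `delete=False` would otherwise leave it behind. The function name `group_file` is hypothetical.

```python
import os
from collections import OrderedDict
from tempfile import NamedTemporaryFile


def group_file(path):
    """Group NAME={...}; lines by NAME and rewrite path in place (hypothetical helper)."""
    groups = OrderedDict()
    with open(path) as f:
        for line in f:
            if not line.strip():  # ignore blank lines
                continue
            name, data = line.split("=", 1)
            groups.setdefault(name, []).append(data.rstrip(";\n"))

    # Create the temp file next to the target so os.replace stays on one filesystem
    directory = os.path.dirname(os.path.abspath(path))
    temp = NamedTemporaryFile("w", dir=directory, delete=False)
    try:
        with temp:
            for name, items in groups.items():
                temp.write("{}=[{}];\n".format(name, " ".join(items)))
        os.replace(temp.name, path)  # swap in the new content only after a full write
    except BaseException:
        os.remove(temp.name)  # clean up; the original file is untouched
        raise
```

Called on a file containing the question's sample, `group_file` should leave exactly the two grouped lines from the desired output.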

edited May 31, 2015 at 11:21

answered May 31, 2015 at 10:39

Padraic Cunningham


4 Comments

Sven Marnach Over a year ago

Any particular reason for using an OrderedDict?

Padraic Cunningham Over a year ago

@SvenMarnach, it is a file, so I presume order matters, and the OP's desired output is also in order. Will a normal dict have the same order?

Sven Marnach Over a year ago

Claiming that the OrderedDict will "keep the order of the original file" is misleading. The original file might have lines in the order A2, A2, A1, A2, A1, A1. The ordered dict will end up with the keys A2, A1, the order of the first appearance of each key. If you assume that the lines are also grouped by key, keeping the order might make sense, but I can't see how it makes sense without that assumption. And with this assumption, I'd go for a solution using itertools.groupby.

Padraic Cunningham Over a year ago

@SvenMarnach, I used the input given and matched the expected output, and the only way I could guarantee that was with an OrderedDict. Even if the order is A2, A2, A1, A2, A1, A1, you are still keeping the order in which each key is first seen, so there is still an order, as opposed to no order whatsoever with a normal dict. I don't see how that is misleading at all. Yes, a groupby would also work once the elements are grouped, but as I don't have the full file content I cannot say for sure.
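Sven's itertools.groupby suggestion can be sketched as follows, under the assumption he names: lines with the same name are already contiguous in the file. If they are not, groupby emits the same name more than once, so you would need to sort first (losing the original order) or use the dict approach instead. The helper name `group_contiguous` is hypothetical.

```python
from itertools import groupby


def group_contiguous(lines):
    """Yield 'NAME=[...];' strings for runs of consecutive lines sharing a NAME."""
    # Split each non-blank line into [name, data] with the trailing ";" and newline removed
    pairs = (line.rstrip(";\n").split("=", 1) for line in lines if line.strip())
    for name, run in groupby(pairs, key=lambda pair: pair[0]):
        yield "{}=[{}];".format(name, " ".join(data for _, data in run))


sample = [
    "A1={[2,1,2]};\n",
    "A1={[4,2,1,2,3]};\n",
    "A2={[3,3,2,1]};\n",
    "A2={[4,4,2,2]};\n",
    "A2={[2,2,1,1,1]};\n",
]
for out_line in group_contiguous(sample):
    print(out_line)
```

With the question's sample this prints the two grouped lines of the desired output; unlike the dict version, it never holds more than one group in memory at a time.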

Kasravnd · Accepted Answer · 2015-05-31 11:09:33Z

3

You can loop over your lines, split each line on =, use str.strip and ast.literal_eval to extract the list within the brackets, and finally use a dictionary's setdefault method to get your expected result:

import ast

d = {}
with open('file_name') as f:
    for line in f:
        var, set_ = line.split('=')
        # strip "{", "}", ";" and the newline, leaving e.g. "[2,1,2]", then parse it
        d.setdefault(var, []).append(ast.literal_eval(set_.strip("{}\n;")))
print(d)

Result:

{'A1': [[2, 1, 2], [4, 2, 1, 2, 3]], 'A2': [[3, 3, 2, 1], [4, 4, 2, 2], [2, 2, 1, 1, 1]]}

If you want the result in exactly your expected format, you can do:

d = {}
with open('ex.txt') as f, open('new', 'w') as out:
    for line in f:
        var, set_ = line.split('=')
        # keep the data as text, only dropping the trailing ";" and newline
        d.setdefault(var, []).append(set_.strip(";\n"))
    print(d)
    for i, j in d.items():
        out.write('{}=[{}];\n'.format(i, ' '.join(j)))

Finally, you'll have the following result in the file new:

A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];
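The two snippets above can also be combined: parse the arrays into Python lists (as in the first snippet) for any processing, then turn each list back into MATLAB cell syntax when writing. A sketch, where `to_matlab_cell` and `format_group` are hypothetical helper names and the arrays are assumed to hold numbers whose `str()` is valid MATLAB:

```python
import ast


def to_matlab_cell(values):
    """Render a Python list such as [2, 1, 2] as the MATLAB cell text {[2,1,2]}."""
    return "{[" + ",".join(str(v) for v in values) + "]}"


def format_group(name, arrays):
    """Render one group as NAME=[{[...]} {[...]}]; like the desired output."""
    return "{}=[{}];".format(name, " ".join(to_matlab_cell(a) for a in arrays))


d = {}
for line in ["A1={[2,1,2]};\n", "A1={[4,2,1,2,3]};\n"]:
    var, set_ = line.split("=", 1)
    # Strip the braces, semicolon and newline, then parse the bracketed list
    d.setdefault(var, []).append(ast.literal_eval(set_.strip("{}\n;")))

print(d)  # {'A1': [[2, 1, 2], [4, 2, 1, 2, 3]]}
print(format_group("A1", d["A1"]))  # A1=[{[2,1,2]} {[4,2,1,2,3]}];
```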

edited May 31, 2015 at 11:09

answered May 31, 2015 at 10:45

Kasravnd


6 Comments

Padraic Cunningham Over a year ago

Where does the OP say anything about creating lists of lists? You also don't need translate when a simple strip will work.

Kasravnd Over a year ago

@PadraicCunningham The OP doesn't say so, but it seems that he/she wants a data structure containing the arrays.

Padraic Cunningham Over a year ago

No, it seems they want to group lines in their file.

Kasravnd Over a year ago

@PadraicCunningham Yeah, strip is more straightforward, and this is a suggestion relevant to the OP's request!

Padraic Cunningham Over a year ago

The file is a .mat file, which is MATLAB, not Python. Have you looked at the desired output?
