1

I have over a thousand array categories in a text file, for example:

Category A1 and Category A2 (arrays in MATLAB code):

A1={[2,1,2]};
A1={[4,2,1,2,3]};
A2={[3,3,2,1]};
A2={[4,4,2,2]};
A2={[2,2,1,1,1]};

I would like to use Python to help me read the file and group them into:

A1=[{[2,1,2]} {[4,2,1,2,3]}];  
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

2 Answers

4

Use a dict to group the lines. I presume you want to group them as strings, since they are MATLAB code rather than valid Python containers:

from collections import OrderedDict
od = OrderedDict()
with open("infile") as f:
    for line in f:
        name, data = line.split("=")
        od.setdefault(name,[]).append(data.rstrip(";\n"))

from pprint import pprint as pp
# list() so the output looks the same on Python 3, where values() is a view
pp(list(od.values()))
[['{[2,1,2]}', '{[4,2,1,2,3]}'],
 ['{[3,3,2,1]}', '{[4,4,2,2]}', '{[2,2,1,1,1]}']]

To group the data in the file itself, write the grouped content back:

with open("infile", "w") as f:
    for k, v in od.items():
        f.write("{}=[{}];\n".format(k, " ".join(v)))

Output:

A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

This matches your desired output: the semicolon is removed from each sub-array, the elements are grouped, and a single semicolon is added at the end of each group so the data stays valid in your MATLAB file.

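collections.OrderedDict keeps the keys in the order in which they first appear in your file. On Python versions before 3.7 a plain dict gives no ordering guarantee; from 3.7 onwards a plain dict preserves insertion order as well.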
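To make the ordering behaviour concrete, here is a small sketch. It assumes an interleaved input that is not in the question (the `lines` list is made up for illustration). Keys come out in the order of their first appearance, and the values under each key keep their file order.

```python
from collections import OrderedDict

# Hypothetical interleaved input; the question's sample has its lines already grouped.
lines = [
    "A2={[3,3,2,1]};\n",
    "A1={[2,1,2]};\n",
    "A2={[4,4,2,2]};\n",
    "A1={[4,2,1,2,3]};\n",
]

od = OrderedDict()
for line in lines:
    name, data = line.split("=")
    # Drop the trailing semicolon and newline, keep the braces.
    od.setdefault(name, []).append(data.rstrip(";\n"))

grouped = ["{}=[{}];".format(k, " ".join(v)) for k, v in od.items()]
print("\n".join(grouped))
# A2=[{[3,3,2,1]} {[4,4,2,2]}];
# A1=[{[2,1,2]} {[4,2,1,2,3]}];
```

A2 is written first because it is seen first, even though A1 lines sit between the A2 lines.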

A safer approach when updating a file is to write to a temporary file and then replace the original with it, using NamedTemporaryFile and shutil.move:

from collections import OrderedDict
from shutil import move
from tempfile import NamedTemporaryFile

od = OrderedDict()
# mode="w" opens the temp file in text mode (the default, "w+b", is binary)
with open("infile") as f, NamedTemporaryFile(mode="w", dir=".", delete=False) as temp:
    for line in f:
        name, data = line.split("=")
        od.setdefault(name, []).append(data.rstrip("\n;"))
    for k, v in od.items():
        temp.write("{}=[{}];\n".format(k, " ".join(v)))
# Both files are closed here, so the temp file can replace the original
move(temp.name, "infile")

If the code raised an error in the loop, or your computer crashed during the write, your original file would be preserved.
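As a rough check of that claim, the sketch below wraps the same idea in a function and feeds it a deliberately malformed file. The `regroup` name, the sample content and the removal of the leftover temp file on failure are my additions for illustration, not part of the code above.

```python
import os
from collections import OrderedDict
from shutil import move
from tempfile import NamedTemporaryFile, TemporaryDirectory


def regroup(path):
    """Group 'name={...};' lines in place, writing through a temp file."""
    od = OrderedDict()
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as f, NamedTemporaryFile(mode="w", dir=directory, delete=False) as temp:
        try:
            for line in f:
                name, data = line.split("=")  # ValueError unless exactly one "="
                od.setdefault(name, []).append(data.rstrip(";\n"))
            for k, v in od.items():
                temp.write("{}=[{}];\n".format(k, " ".join(v)))
        except Exception:
            # Extra cleanup (an assumption of this sketch): drop the partial temp file.
            temp.close()
            os.remove(temp.name)
            raise
    # Only reached when everything above succeeded.
    move(temp.name, path)


with TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "infile")
    original = "A1={[2,1,2]};\nthis line has no equals sign\n"
    with open(path, "w") as f:
        f.write(original)
    try:
        regroup(path)
    except ValueError:
        pass  # expected: the second line cannot be split into name and data
    with open(path) as f:
        survived = f.read()
    leftovers = os.listdir(tmp)

print(survived == original)
```

Because the move only happens after the loop finishes, a failure part-way through leaves "infile" exactly as it was.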


4 Comments

Any particular reason for using an OrderedDict?
@SvenMarnach, it is a file so I presume order matters, the OP's desired output is also in order. Will a normal dict have the same order?
Claiming that the OrderedDict will "keep the order of the original file" is misleading. The original file might have lines in the order A2, A2, A1, A2, A1, A1. The ordered dict will end up with the keys A2, A1, the order of the first appearance of each key. If you assume that the lines are also grouped by key, keeping the order might make sense, but I can't see how it makes sense without that assumption. And with this assumption, I'd go for a solution using itertools.groupby.
@SvenMarnach, I used the input given and matched the expected output wanted which the only way I could guarantee doing so was using an OrderedDict. Even if the order is A2, A2, A1, A2, A1, A1. you are still keeping the order of the first time you see each key so there is still order as opposed to no order whatsoever using a normal dict, I don't see how that is misleading at all. Yes a groupby would also work once the elements are grouped but as I don't have the full file content then I cannot say for sure.
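For reference, the itertools.groupby variant mentioned in these comments might look like the sketch below. It assumes all lines for a given name are consecutive, as in the question's sample; the `group_consecutive` helper and the `lines` list are hypothetical names used only for illustration.

```python
from itertools import groupby


def group_consecutive(lines):
    """Merge runs of consecutive 'name={...};' lines into 'name=[{...} {...}];'."""
    # Split each non-blank line into (name, data), dropping the trailing ';' and newline.
    pairs = (line.split("=", 1) for line in lines if line.strip())
    pairs = ((name, data.rstrip(";\n")) for name, data in pairs)
    # groupby only merges adjacent items with equal keys, so a name that
    # reappears later starts a new group instead of joining the earlier one.
    return [
        "{}=[{}];".format(name, " ".join(data for _, data in run))
        for name, run in groupby(pairs, key=lambda pair: pair[0])
    ]


lines = [
    "A1={[2,1,2]};\n",
    "A1={[4,2,1,2,3]};\n",
    "A2={[3,3,2,1]};\n",
    "A2={[4,4,2,2]};\n",
    "A2={[2,2,1,1,1]};\n",
]
print("\n".join(group_consecutive(lines)))
# A1=[{[2,1,2]} {[4,2,1,2,3]}];
# A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];
```

Unlike the dict approach, this never holds more than one group in memory, but it gives a different result if the lines are interleaved.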
3

You can loop over the lines, split each one on =, use str.strip and ast.literal_eval to extract the list inside the braces, and collect the lists in a dictionary with setdefault to get your expected result:

import ast

d = {}
with open('file_name') as f:
    for line in f:
        var, set_ = line.split('=')
        # Strip the braces, semicolon and newline, leaving e.g. "[2,1,2]" to parse
        d.setdefault(var, []).append(ast.literal_eval(set_.strip("{}\n;")))
print(d)

Result:

{'A1': [[2, 1, 2], [4, 2, 1, 2, 3]], 'A2': [[3, 3, 2, 1], [4, 4, 2, 2], [2, 2, 1, 1, 1]]}

If you want the result in exactly your expected format, you can do:

d = {}
with open('ex.txt') as f, open('new', 'w') as out:
    for line in f:
        var, set_ = line.split('=')
        # Keep the braces; only the trailing semicolon and newline are removed
        d.setdefault(var, []).append(set_.strip(";\n"))
    print(d)
    for i, j in d.items():
        out.write('{}=[{}];\n'.format(i, ' '.join(j)))

The file new will then contain:

A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

6 Comments

where does the OP say anything about creating lists of lists? You also don't need translate when a simple strip will work
@PadraicCunningham The OP doesn't say so, but it seems that he/she wants a data structure containing the arrays.
No it seems they want to group lines in their file
@PadraicCunningham Yeah, strip is more straightforward; this is a suggestion related to the OP's request!
The file is a .mat file which is matlab not python, have you looked at the desired output?