I recently changed my religion from Matlab to Python for my data analysis. I often need to import a lot of data files, which I used to store in a structure array in Matlab. That way I could keep many different data sets in a single structure array, with the parameters belonging to each data set stored in the same structure. I would end up with structures like:

data(i).parameterx = a
data(i).parametery = b
data(i).data1 = [3, 5, 7, 8, 423, 2, 56, 6]
data(i).data2 = [4, 2, 5; 4, 6, 2; 5, 1, 3]

The length of data is the number of data files I imported. This way I can retrieve all the parameters of each data set based on the index of data. I want to do the same in Python. I tried using a class, but it seems that I can't store it in an array. What I tried was the following:

#!/usr/bin/env python

# Import modules
import numpy                as np
import fnmatch
import os

# Set up all elements of the path
class path:
    main, dir = 'root/dir', 'data_dir'
    complete = os.path.join(main,dir)
    ext = 'odt'     # Extension of the file to look for

# Initialize data array and data-class
all_data=[]
class odt_data:
    pass

# Loop through all files in path.complete and find all ODT-files
for (root, dirnames, filenames) in os.walk(path.complete):
    for filename in fnmatch.filter(filenames, '*.'+path.ext):
        # Only ODT-files
        # Extract parameters from filename
        parameters = filename.split('__')
        odt_data.parameterx = parameters[0]; odt_data.parametery = parameters[2]

        # Import data from file and assign to correct attribute of 'odt_data'
        file=os.path.join(root, filename)
        data=np.loadtxt(file)
        odt_data.data1 = data[:,0]; odt_data.data2 = data[:,1]

        # Append new data to array of all data
        all_data.append(odt_data)

The problem here is that it doesn't really save the data in odt_data but rather a reference to it. Hence when I do print(all_data) the result is:

[<class '__main__.odt_data'>, <class '__main__.odt_data'>, <class '__main__.odt_data'>]

And therefore only the last imported data is stored: all_data[0] is exactly the same as all_data[1] and so on. Hence, I'm not able to access the data of the first or second imported file, only that of the last one. So when I call all_data[1].parameterx I get the parameterx value of the last imported file, not the second! NOTE: I don't care about how it prints; I only care about accessing all the imported data.

Does anybody know a way to store the actual data in class odt_data in an array (or possibly use something other than a class)?


1 Answer


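You just need to instantiate the class inside the loop, so that every file gets its own object. As written, the code sets the attributes on the class odt_data itself and appends that same class object on every iteration, which is why every element of all_data shows the values of the last file.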
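
As a minimal sketch of that fix, with made-up parameter values standing in for the ones parsed from your filenames (the class name OdtData and the values are only for illustration):

```python
class OdtData:
    """Empty container; attributes are attached to each instance."""
    pass

all_data = []

# Hypothetical parameter pairs standing in for values parsed from filenames
for px, py in [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]:
    record = OdtData()        # a new, independent object on every iteration
    record.parameterx = px
    record.parametery = py
    all_data.append(record)

# Each element now keeps its own values
first_x = all_data[0].parameterx
second_x = all_data[1].parameterx
print(first_x, second_x)
```

Because OdtData() is called inside the loop, each list element is a separate object instead of the same class appended three times.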

However, this isn't how you would usually store such data in Python. You don't benefit from using a class here (in either case). NumPy actually has structured arrays, but they are overkill in this situation. The simplest solution would be to use a list of dicts:

# Import modules
import numpy as np
import os

main = 'root/dir'
data_dir = 'data_dir'   # Named data_dir to avoid shadowing the built-in dir()
complete = os.path.join(main, data_dir)
ext = '.odt'     # Extension of the file to look for

# Initialize data array
all_data = []

for root, _, filenames in os.walk(complete):
    for filename in filenames:
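        # Only ODT-files
        if os.path.splitext(filename)[-1] != ext:
            continue

        odt_data = {}   # A fresh dict for every file
        # Extract parameters from filename (same indices as in the question)
        parameters = filename.split('__')
        odt_data['parameterx'] = parameters[0]
        odt_data['parametery'] = parameters[2]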

        # Import data from file and store it under the correct key of 'odt_data'
        fname = os.path.join(root, filename)
        data = np.loadtxt(fname)
        odt_data['data1'] = data[:,0]
        odt_data['data2'] = data[:,1]

        # Append new data to array of all data
        all_data.append(odt_data)
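
Once the list is built, each file's parameters and data are retrieved by index and then by key, much like data(i).parameterx in Matlab. A small sketch with hand-made entries (the values are invented for illustration, and plain lists stand in for the NumPy arrays that np.loadtxt would return):

```python
# Hand-made stand-in for the all_data list built by the loop above
all_data = [
    {"parameterx": "a1", "parametery": "b1",
     "data1": [3.0, 5.0, 7.0], "data2": [4.0, 2.0, 5.0]},
    {"parameterx": "a2", "parametery": "b2",
     "data1": [8.0, 423.0], "data2": [6.0, 2.0]},
]

# Access by index, then by key (Python indices start at 0, not 1)
second = all_data[1]
print(second["parameterx"])

# Collect one field across all files
all_x = [entry["parameterx"] for entry in all_data]

# Select the entries whose parameter matches a given value
matches = [entry for entry in all_data if entry["parametery"] == "b1"]
```

Since every dict is created inside the loop, all_data[0] and all_data[1] hold different data, and the entries may have data arrays of different lengths.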

2 Comments

List of dicts is indeed the way to go here. I have one additional question about your script: what does the pass in your if-statement do? Because how I interpret pass it means it just goes out of the if-statement and the part after that is executed whether the condition is true or not.
Sorry, that was supposed to be continue, not pass. I fixed it.
