Description

My problem is about loading data from CSV files. I already wrote code that loads a fixed number of columns into arrays (see the example below). Now I would like to improve it so that I can change the number of columns to read and load without modifying the code every time. In other words, I would like my code to adapt dynamically to the number of columns I choose. Here is an example of my present code.

Code example

Steps :

1. With Tkinter I select the files I want to load. This part of the code returns file_path, which contains the file paths.

2. Then I define the parameters needed for CSV reading, create the arrays that will hold my data, and load the data.

from numpy import zeros, loadtxt

n = len(file_path)    # number of files

# define CSV parameters first: size_data and loadtxt both need them
row_skip = 5
delim = ';'
col_to_read = (0,1)    # <= This is where I choose the columns to be read

# here I just determine the size of each file with a custom function, m is the maximum size
all_size, m = size_data(file_path, row_skip, col_to_read, delim)

# I create the arrays
shape = (n, m)
time = zeros(shape)
CH1 = zeros(shape)

# I load the arrays
for k in range(0, len(file_path)):
    end = all_size[k]    # this is the size of the array to be loaded.
                         # I do this in order to avoid the annoying error
                         # ValueError: could not broadcast input array from shape (20) into shape (50)

    time[k][:end], CH1[k][:end] = loadtxt(file_path[k],
                                           delimiter=delim,
                                           skiprows=row_skip,
                                           usecols=col_to_read,
                                           unpack=True)

My problem is that if each file has 3 columns, i.e. col_to_read = (0,1,2), I have to add a new array CH2 = zeros(shape) at creation and again at loading. I would like a solution that adapts dynamically to the number of columns I want to load, so that only col_to_read is changed by hand. Ideally I would implement this inside a function, because I do a lot of data analysis and I don't want the same code pasted into every program.
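One possible way to make the array creation follow col_to_read is to replace the separate time, CH1, CH2 arrays with a single zero-padded array that has one axis for the columns. The sketch below is only an illustration under assumptions: the name load_padded is hypothetical, col_to_read is assumed to be a tuple, and the sizes are taken from the loaded data instead of the custom size_data function.

```python
import numpy as np

def load_padded(file_paths, row_skip, col_to_read, delim):
    """Load the chosen columns of every file into one zero-padded array.

    Returns an array of shape (n_files, n_columns, max_rows) and the list
    of row counts per file. load_padded is a hypothetical helper name.
    """
    # ndmin=2 keeps each result 2-D even for a single column or a single row,
    # so after unpack=True every array has shape (n_columns, n_rows).
    columns = [np.loadtxt(path, delimiter=delim, skiprows=row_skip,
                          usecols=col_to_read, unpack=True, ndmin=2)
               for path in file_paths]
    sizes = [c.shape[1] for c in columns]

    data = np.zeros((len(file_paths), len(col_to_read), max(sizes)))
    for k, c in enumerate(columns):
        # Fill only the first sizes[k] rows; the rest stays zero,
        # which avoids the broadcasting ValueError.
        data[k, :, :sizes[k]] = c
    return data, sizes
```

With col_to_read = (0,1), data[:, 0] and data[:, 1] would then play the role of the time and CH1 arrays of shape (n, m), and adding a column only requires changing col_to_read.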

First idea

I already found a way to dynamically create a given number of zeros arrays (see here). That's quite direct.

dicty = {}
for i in file_path:
    dicty[i] = []

This seems good, but now I would like the loading line to work whatever the number of columns. I believe there is a convenient way to adapt my code and use this dicty, but there's something I don't understand and I'm stuck.

I would appreciate any help.

1 Answer


Well, I found a solution to this problem, which I had had in mind for a few weeks. Asking it here surely helped me make the problem clearer.

I learned more about dictionaries, which were new to me, and realised how powerful they are. I could replace the whole code with a few lines:

from numpy import loadtxt

def import_data(file_path, row_skip, col_to_read, delim):
    # file_path: list of the path strings of the CSV files
    # row_skip: number of rows to skip before loading starts
    # col_to_read: e.g. (0,1,2), the columns to read
    # delim: e.g. ';', the delimiter of the CSV files

    dicty = {}                       # create dictionary
    for path in file_path:           # associate each file with its columns
        dicty[path] = loadtxt(path, delimiter=delim,
                              skiprows=row_skip, usecols=col_to_read,
                              unpack=True)

    # it gives one entry per file; with unpack=True each array holds
    # one row per column read, e.g. for col_to_read = (0,1,2):
    # dicty = {'my_file1.csv': array of shape (3, rows in file 1),
    #          'my_file2.csv': array of shape (3, rows in file 2),
    #          'my_file3.csv': array of shape (3, rows in file 3)}

    return dicty

This is quite straightforward. Each entry of the dictionary holds all the columns of one file, and I don't need to tell the dictionary how many columns to expect. Then, to read the data, I use dicty.get(file_path[0]), for example. This may not be optimal, but I can surely create variables in a for loop to get rid of the dicty.get() calls.
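As a rough illustration of reading the dictionary without repeated dicty.get() calls, the sketch below iterates over dicty.items() and unpacks each file's array into a first column plus the remaining ones. The helper name split_columns, the sample values, and the assumption that the first column is the time axis (as in the question's time and CH1 arrays) are all made up for the example.

```python
import numpy as np

def split_columns(columns):
    """Split one file's array into its first column and a list of the others."""
    # Iterating over a 2-D array yields its rows, i.e. one loaded column each.
    time_col, *channels = columns
    return time_col, channels

# Hypothetical stand-in for the dictionary returned by import_data
dicty = {
    "my_file1.csv": np.array([[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]),
    "my_file2.csv": np.array([[0.0, 1.0], [20.0, 21.0]]),
}

for path, columns in dicty.items():
    time_col, channels = split_columns(columns)
    print(path, len(time_col), len(channels))
```

Because channels is a list, the same loop works unchanged whether col_to_read selects two columns or ten.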

Tell me what you think about it, especially about computation time. Sometimes I have 20 files with 200,000 rows and 3 columns each. Maybe I could optimize the loading.
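Before optimizing, it may help to measure how long the loading actually takes per file. The following is a minimal timing sketch, not a benchmark; timed_load is a hypothetical helper name and the parameters mirror those of import_data.

```python
import time
import numpy as np

def timed_load(path, row_skip, col_to_read, delim):
    """Return the loaded columns and the wall-clock seconds loadtxt took."""
    start = time.perf_counter()
    data = np.loadtxt(path, delimiter=delim, skiprows=row_skip,
                      usecols=col_to_read, unpack=True)
    elapsed = time.perf_counter() - start
    return data, elapsed
```

Summing elapsed over all files gives a baseline against which any alternative loader can be compared on the same data.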
