4

I am trying to load data from a CSV file row by row, create a 2D array out of each row, and store those arrays inside another array:

loading:

with open('data_more.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))

parsing:

def getTrainingData():
    label_data = []
    for i in range( 0 , len(data) - 1):
        y = list(data[i][1:41:1])
        y = list(map(lambda x: list(map(lambda z: int(z),x)),y))
        y = create2Darray(y)
        label_data.append(y)
    labelY = np.array(label_data,dtype=float)

create2Darray func:

def create2Darray( arr ):
    final_arr = []
    index = 0
    while( index < len(arr)):
        temp = arr[index:index+4:1]
        final_arr.append(temp)
        index+=4
    return final_arr

This is a simple task, yet I keep receiving this error:

ValueError: setting an array element with a sequence.

I have read that it is related to the situation where the elements don't all have the same shape. However, when I print the shape of every element inside labelY, they all have the same shape.

What is causing this problem then? The problem occurs on this line

labelY = np.array(label_data,dtype=float)

My CSV has the format

number, number, number

Basically, each row holds N numbers separated by "," (example). Thanks for the help.

9
  • What does your variable data look like? Commented Apr 3, 2018 at 17:44
  • Have you tried np.genfromtxt? Commented Apr 3, 2018 at 17:45
  • My data variable looks like a 2D array (at least after printing). Commented Apr 3, 2018 at 17:47
  • What about using pandas to read the CSV and perform the operations, then using loc and iloc to slice the rows into Series that convert directly to np.array? You can also use .dropna() to drop any value that is NoneType. Commented Apr 3, 2018 at 17:49
  • @iam.Carrot would you mind posting an example? I am not very familiar with Python. Thanks. Commented Apr 3, 2018 at 17:49

2 Answers

1

Let's start from the beginning:

  1. You seem to want to iterate through every line of your file to create an array. The iteration should be over range(0, len(data)), not range(0, len(data) - 1): the last element of the range is exclusive, so you are currently skipping the last line. In fact, you can write simply range(len(data)), or what is even more Pythonic, do

    for y in data:
        y = y[1:41]
    
  2. Based on what comes later, you want the 40 elements of y starting with the second element. In that case y[1:41] is correct (you don't need the trailing :1). If you didn't mean to skip the first element, use y[0:40], or more Pythonically y[:40]. Remember that the indexing is zero-based and the stop index is exclusive.

  3. Each element of your y list is not a number. It is a string, which you read from a file. Normally, you could convert it to a list of numbers using

    y = [float(x) for x in y]
    

    OR

    y = list(map(float, y))
    

    Your code is instead creating a nested list for each element, splitting it by its digits. Is this really what you intend? It certainly does not seem that way from the rest of the question.

  4. create2Darray seems to expect a list of 4n numbers, and break it into a 2D list of size n-by-4. If you want to keep using pure Python at this point, you can shorten the code using range:

    def create2Darray(arr):
        return [arr[i:i + 4] for i in range(0, len(arr), 4)]
    
  5. The result of the 2D operation is appended to a 3D list with label_data.append(y). Currently, because of the digit splitting, label_data is a 4D list with a ragged 4th dimension. It is pretty inefficient to append a list that way. You would do much better to have a small function containing the statements in the body of your for loop, and use that in a list comprehension.
  6. Finally, you convert your 4D list (which should probably be 3D) into a numpy array. This operation fails because your numbers don't all have the same number of digits. Once you fix step #3, the error will go away. There still remains the question of why you want dtype=float when you explicitly converted everything to an int, but that is for you to figure out.
  7. Don't forget to add a return value to getTrainingData!
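
To make steps 3 to 7 concrete, here is a minimal pure-Python sketch. The sample rows and the parse_row helper are hypothetical, and getTrainingData takes the rows as a parameter here instead of reading the global data, which is an assumption on my part rather than something from the question.

```python
# Hypothetical sample rows, shaped like the output of csv.reader:
# a file name followed by numeric strings.
data = [
    ['frame_a.jpg', '126', '37', '147', '112', '100', '41', '126', '116'],
    ['frame_b.jpg', '79', '34', '96', '92', '68', '31', '77', '88'],
]

# The question's conversion maps int() over the characters of each
# string, so every cell turns into a list of digits.
cells = data[0][1:5]
split_digits = list(map(lambda x: list(map(lambda z: int(z), x)), cells))
print(split_digits)  # [[1, 2, 6], [3, 7], [1, 4, 7], [1, 1, 2]]


def create2Darray(arr):
    # Break a flat list into rows of 4 (the shortened version from step 4).
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]


def parse_row(row):
    # Hypothetical helper: skip the first column, convert each cell as a
    # whole number, then chunk the result.
    return create2Darray([float(x) for x in row[1:41]])


def getTrainingData(rows):
    # List comprehension instead of repeated append; returns the result.
    return [parse_row(row) for row in rows]


label_data = getTrainingData(data)
print(label_data[0])  # [[126.0, 37.0, 147.0, 112.0], [100.0, 41.0, 126.0, 116.0]]
```

The digit lists have lengths 3 and 2, which is the raggedness that np.array complains about. After the fix every innermost list has exactly 4 numbers, so the nested list is rectangular.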

TL;DR

The simplest thing you can really do though, is to do all the transformations after you convert the file to a 2D numpy array. Your program could be rewritten as

import csv
import numpy as np

with open('data_more.csv', newline='') as file:
    reader = csv.reader(file)
    # one list of floats per row, skipping the first column
    data = [[float(x) for x in line[1:]] for line in reader]
data = np.array(data).reshape(len(data), -1, 4)
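
If you want to check the reshape without touching the real file, something like the following should work. It is only a sketch: the in-memory sample (a label plus 8 numbers per row) is made up and stands in for data_more.csv.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for data_more.csv: a label plus 8 numbers per row.
sample = io.StringIO(
    "frame_a.jpg,126,37,147,112,100,41,126,116\n"
    "frame_b.jpg,79,34,96,92,68,31,77,88\n"
)

# One list of floats per row, skipping the first (text) column.
rows = [[float(x) for x in line[1:]] for line in csv.reader(sample)]

arr = np.array(rows)                     # 2D: (rows, numbers per row)
arr = arr.reshape(arr.shape[0], -1, 4)   # 3D: (rows, groups, 4)

print(arr.shape)   # (2, 2, 4)
print(arr[0, 1])   # second group of 4 from the first row
```

The -1 lets numpy work out the number of groups, so this only succeeds when the count of numbers per row is a multiple of 4.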

0

With a copy-n-paste from your link:

In [367]: txt="""frame_video_02_0.jpg,126,37,147,112,100,41,126,116,79,34,96,92,
     ...: 68,31,77,88,1
     ...: """
In [368]: txt=txt.splitlines()
In [369]: data =np.genfromtxt(txt, delimiter=',')

data is a 2d array of floats:

In [370]: data.shape
Out[370]: (3, 401)
In [371]: data[0,:10]
Out[371]: array([ nan, 126.,  37., 147., 112., 100.,  41., 126., 116.,  79.])

The first column is nan, because it's a text that can't be made into a float. I could remove it with data = data[:, 1:]

I can load the file names separately:

In [373]: labels = np.genfromtxt(txt, delimiter=',', usecols=[0],dtype=None,encoding=None)
In [374]: labels
Out[374]: 
array(['frame_video_02_0.jpg', 'frame_video_02_50.jpg',
       'frame_video_02_100.jpg'], dtype='<U22')

I haven't tried to debug your code, though with a file like this, reading the numbers into a Python list of lists shouldn't be hard.
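
As a rough sketch of skipping the text column at load time instead of slicing it off afterwards, usecols can be given the numeric column indices. The sample lines below are hypothetical (a file name plus 8 numbers), and the column count has to be known in advance.

```python
import numpy as np

# Hypothetical sample lines: a file name followed by 8 numbers.
lines = [
    "frame_a.jpg,126,37,147,112,100,41,126,116",
    "frame_b.jpg,79,34,96,92,68,31,77,88",
]

# usecols selects columns 1..8, so the text column is never parsed
# and no nan column appears in the result.
data = np.genfromtxt(lines, delimiter=',', usecols=range(1, 9))
print(data.shape)     # (2, 8)

# Group every 4 numbers, giving one small 2D block per input line.
blocks = data.reshape(len(data), -1, 4)
print(blocks.shape)   # (2, 2, 4)
```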

5 Comments

Why not just pass range(1,41) to usecols the first time around?
@MadPhysicist. I'm too lazy to use a complicated usecols parameter like that!
I'm not sure it would even work. Hoping you would test it for me :)
A range does work - but you have to know ahead of time how many columns there are.
@hpaulij. OP seems to have a particular number in mind. The 2D convert thing implies that they want a multiple of 4 at all times.
