1

I'm reading in data and trying to create a NumPy array of shape (194, 1). So it should look like: [[4], [0], [9], ...]

I'm doing this:

def parse_data(file_name):
    data = []
    target = []
    with open(file_name) as f:
        for line in f:
            temp = line.split()
            x = [float(x) for x in temp[:2]]
            y = float(temp[2])
            data.append(np.array(x))
            target.append(np.array(y))
    return np.array(data), np.array(target)

x, y = parse_data("data.txt")

When I inspect y.shape, it's (194,), not (194, 1) as I expected.

x, however, has shape (194, 2), as I'd expect.

Any idea what I'm doing incorrectly?

Thanks!

1
  • Can you provide some lines from data.txt? Commented Apr 12, 2018 at 20:22

3 Answers

3

You seem to have expected np.array(y) to automatically turn your scalar y into a 1-element row. That's not how NumPy works.

np.array(y) is 0-dimensional. Putting a bunch of those in a list and calling array on the list produces a 1-dimensional result, not a 2-dimensional one.
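A quick way to see the difference is to compare shapes directly. This is just a minimal sketch with made-up values (4.0, 0.0, 9.0), not the asker's actual data:

```python
import numpy as np

scalar_arr = np.array(4.0)    # 0-dimensional: shape ()
row_arr = np.array([4.0])     # 1-dimensional, one element: shape (1,)

# A list of 0-d arrays becomes a 1-d array.
from_scalars = np.array([np.array(4.0), np.array(0.0), np.array(9.0)])

# A list of one-element 1-d arrays becomes a 2-d array.
from_rows = np.array([np.array([4.0]), np.array([0.0]), np.array([9.0])])

print(scalar_arr.shape, row_arr.shape)      # () (1,)
print(from_scalars.shape, from_rows.shape)  # (3,) (3, 1)
```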

1

When np.array() is called on a list of 0-dimensional arrays (arrays built from scalars), each one contributes a single element, so the result is 1-dimensional. That gives you your (194,) shape.

You can always reshape y to your desired shape:

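import numpy as np

def parse_data(file_name):
    data = []
    target = []
    with open(file_name) as f:
        for line in f:
            temp = line.split()
            # first two columns are the features, third is the target
            x = [float(v) for v in temp[:2]]
            y = float(temp[2])
            data.append(np.array(x))
            target.append(y)  # plain float, no np.array wrapper
    # reshape(-1, 1) turns the (n,) target into an (n, 1) column
    return np.array(data), np.array(target).reshape(-1, 1)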

x, y = parse_data("data.txt")

Of course you can also fix your problem with:

target.append(np.array([y]))

An example of the behavior I stated above:

>>> import numpy as np
>>> a = np.array(5)
>>> b = np.array(4)
>>> v = [a, b]
>>> v
[array(5), array(4)]
>>> np.array(v)
array([5, 4])
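To check that both fixes mentioned in this answer end up with the same column shape, here is a small sketch. The values list is made up for illustration and stands in for the parsed third column:

```python
import numpy as np

values = [4.0, 0.0, 9.0]  # hypothetical targets, not from data.txt

# Fix 1: collect plain floats, then reshape into a column.
via_reshape = np.array(values).reshape(-1, 1)

# Fix 2: append one-element arrays, as with target.append(np.array([y])).
via_rows = np.array([np.array([v]) for v in values])

print(via_reshape.shape, via_rows.shape)      # (3, 1) (3, 1)
print(np.array_equal(via_reshape, via_rows))  # True
```

The reshape version avoids creating one small array per line, which is probably the simpler choice here.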

3 Comments

return np.array(data), np.array(target).reshape(-1, 1) is likely to be better, in case the amount of data varies.
I think it should be (-1, 1), not (1, -1).
You could also use return np.array(data), np.vstack(target). It's a bit more concise and works as long as target is 1D.

0

I'd skip the np.array in the iteration.

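import numpy as np

def parse_data(file_name):
    data = []
    target = []
    with open(file_name) as f:
        for line in f:
            temp = line.split()
            x = [float(v) for v in temp[:2]]
            y = float(temp[2])
            data.append(x)    # plain list of two floats
            target.append(y)  # plain float
    # convert once at the end instead of building an array per line
    return np.array(data), np.array(target)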

This would create data like:

 [[1.0, 2.0],[3.0, 4.0], ....]

and target like

 [1.2, 3.2, 3.1, ...]

np.array(data) then turns the list of lists into a 2d array, and np.array(target) turns the list of numbers into a 1d array.

It is then easy to reshape or add a dimension to the 1d array, making it (1,n) or (n,1) or whatever you need.
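For instance, a dimension can be added with reshape, with np.newaxis indexing, or with np.expand_dims. This is a minimal sketch on three made-up numbers:

```python
import numpy as np

target = np.array([1.2, 3.2, 3.1])         # 1d, shape (3,)

col = target.reshape(-1, 1)                # (n, 1); -1 lets NumPy infer n
col_idx = target[:, np.newaxis]            # (n, 1) via indexing
row = target[np.newaxis, :]                # (1, n)
col_exp = np.expand_dims(target, axis=1)   # (n, 1) again

print(col.shape, col_idx.shape, row.shape, col_exp.shape)
# (3, 1) (3, 1) (1, 3) (3, 1)
```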

Remember the basic array construction methods are:

np.array([1,2,3])             # 1d
np.array([[1,2],[3,4]])       # 2d
