How to build a Pandas Dataframe with a Numpy Array from Imported CSV data with multiple numbers

Question

I'm a little stumped on this one. I've created a proof of concept where I built a Pandas Dataframe with a static Numpy Array of numbers. I got this working fine, but now I'm taking it a step further and importing a CSV file to build this same Dataframe and Numpy Array. Here is the snippet of the file and what I've written. I want to take the second column of 'numbers' and build an array of 6 numbers per line. For example, [[11],[21],[27],[36],[62],[24]], [[14],[18],[36],[49],[67],[18]], etc.

CSV:

date,numbers,multiplier
09/26/2020,11 21 27 36 62 24,3
09/30/2020,14 18 36 49 67 18,2
10/03/2020,18 31 36 43 47 20,2

CODE:

data = pd.read_csv('pbhistory.csv')
data['date'] = pd.to_datetime(data.date, infer_datetime_format=True)
data.sort_values(by='date', ascending=True, inplace=True)
df = pd.DataFrame(data.numbers).to_numpy()
df2 = pd.DataFrame(df, columns=['1', '2', '3', '4', '5', '6'])
print(df2.head())

ERROR: I'm expecting 6 columns of data from df2 as I thought it was converted to an array properly after importing the 'numbers' column from the CSV, but I get the following:

ValueError: Shape of passed values is (1414, 1), indices imply (1414, 6)

So, I change the code to df2 = pd.DataFrame(df, columns=['1']) and get the following output. The problem is, I need it to be in 6 columns, not 1.

                   1
0  11 21 27 36 62 24
1  14 18 36 49 67 18
2  18 31 36 43 47 20

So, as you can see, I'm only getting one column with all numbers, instead of an array of numbers with 6 columns.

The column 'numbers' is a series of strings, eg the first element is '11 21 27 36 62 24'. Split this into separate columns first. — ndr
– ndr, Commented Jan 6, 2023 at 3:08

biyazelnut · Accepted Answer · 2023-01-06 03:26:49Z

1

data = pd.read_csv('pbhistory.csv')
data['date'] = pd.to_datetime(data.date, infer_datetime_format=True)
data.sort_values(by='date', ascending=True, inplace=True)
df = pd.DataFrame(data.numbers).to_numpy()

Then split it first

df2 = df['numbers'].str.split(' ', expand=True)

answered Jan 6, 2023 at 3:26

biyazelnut

2561 silver badge7 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Galo do Leste · Accepted Answer · 2023-01-06 03:15:56Z

0

Remember that CSV stands for Comma Separated Values, ie it reads everything between two commas as one input. If you want the numbers separated you have to put commas between them, otherwise you'll have to parse the longer text of 6 non-comma separated values and rebuild the dataframe.

answered Jan 6, 2023 at 3:15

Galo do Leste

7265 silver badges14 bronze badges

Collectives™ on Stack Overflow

How to build a Pandas Dataframe with a Numpy Array from Imported CSV data with multiple numbers

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related