
I am trying to import a .csv file containing various stock prices into a Python script inside a getData() function but I am having trouble with indexes and can't see how to resolve the problem.

I am new to both CSV and NumPy, so I am unsure exactly where the problem is, but when I run this code I get the following error:

  File "../StockPlot.py", line 20, in getData
    date[i-1] = data[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

import numpy as np
import matplotlib.pyplot as plt
import csv

def getData():
  date = np.array([])
  openPrice = np.array([])
  closePrice = np.array([])
  volume = np.array([])

  i = 1
  with open('aapl.csv', 'rb') as f:
      reader = csv.reader(open('aapl.csv'))
      data_as_list = list(reader)
      items = len(data_as_list)

      while i < items:
          data = data_as_list[i]
          date[i-1] = data[0]
          openPrice[i-1] = data[1]
          closePrice[i-1] = data[4]
          volume[i-1] = data[5]
          i += 1

  return date, openPrice, closePrice, volume

getData()

The AAPL.csv file I am trying to read has lines taking the form:

Date, Open, High, Low, Close, Volume
26-Jul-17,153.35,153.93,153.06,153.46,15415545
25-Jul-17,151.80,153.84,151.80,152.74,18853932
24-Jul-17,150.58,152.44,149.90,152.09,21493160

I would appreciate any help solving this problem. As far as I can tell, data_as_list is a list of lists, one per line of the file, and printing data[0] etc. inside the while loop shows the expected values, but I can't assign those values to the arrays I have created.
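Why the assignment fails: np.array([]) creates an array of size 0, and unlike a Python list, a NumPy array never grows when you assign to an index past its end, so date[0] = ... is already out of bounds on the first pass through the loop. A minimal csv-only sketch of a fix, assuming the file has exactly one header row and the column order shown above: collect the values in Python lists and convert them to arrays once at the end. It also opens the file only once, in text mode (the original opens it twice, and the 'rb' handle f is never used). The path parameter is added for illustration, and keeping the dates as plain strings is an assumption; parse them with datetime.strptime if you need real dates.

```python
import csv

import numpy as np


def getData(path='aapl.csv'):
    # NumPy arrays have a fixed size, so gather the values in lists first.
    dates, open_prices, close_prices, volumes = [], [], [], []

    # Open the file once, in text mode; newline='' is what the csv module expects.
    with open(path, newline='') as f:
        # skipinitialspace=True ignores the blank after each comma in the header.
        reader = csv.reader(f, skipinitialspace=True)
        next(reader)  # skip the header row: Date, Open, High, Low, Close, Volume
        for row in reader:
            if not row:  # csv.reader yields [] for blank lines; skip them
                continue
            dates.append(row[0])                # kept as strings, e.g. '26-Jul-17'
            open_prices.append(float(row[1]))
            close_prices.append(float(row[4]))
            volumes.append(int(row[5]))

    # Convert to arrays once, when the number of rows is known.
    return (np.array(dates),
            np.array(open_prices),
            np.array(close_prices),
            np.array(volumes))
```

Called as date, openPrice, closePrice, volume = getData(), this returns the same four names as the original function, each holding one entry per data row.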

1 Answer

IMO it's much more convenient to use Pandas for that:

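import pandas as pd

fn = r'/path/to/AAPL.csv'
# skipinitialspace=True drops the blank after each comma in the header ("Date, Open, ..."),
# parse_dates turns the Date column into real timestamps
df = pd.read_csv(fn, skipinitialspace=True, parse_dates=['Date'])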
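To get back the same four arrays the question's getData() returned, you can read only those columns and convert each one with Series.to_numpy(). A sketch, assuming pandas 0.24 or later (where to_numpy() was added) and the column names from the sample file; io.StringIO stands in for the real AAPL.csv so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for AAPL.csv, using the sample rows from the question.
csv_text = """Date, Open, High, Low, Close, Volume
26-Jul-17,153.35,153.93,153.06,153.46,15415545
25-Jul-17,151.80,153.84,151.80,152.74,18853932
24-Jul-17,150.58,152.44,149.90,152.09,21493160
"""

# Same options as the read_csv call above, plus usecols to keep only
# the four columns getData() used.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True,
                 parse_dates=['Date'],
                 usecols=['Date', 'Open', 'Close', 'Volume'])

date = df['Date'].to_numpy()         # NumPy datetime64 values
openPrice = df['Open'].to_numpy()    # float64
closePrice = df['Close'].to_numpy()  # float64
volume = df['Volume'].to_numpy()     # int64
```

Unlike the loop in the question, no array has to be sized or filled by index here; pandas works out the row count while parsing.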

Result:

In [83]: df
Out[83]:
        Date    Open    High     Low   Close    Volume
0 2017-07-26  153.35  153.93  153.06  153.46  15415545
1 2017-07-25  151.80  153.84  151.80  152.74  18853932
2 2017-07-24  150.58  152.44  149.90  152.09  21493160

As a NumPy 2D array:

In [84]: df.values
Out[84]:
array([[Timestamp('2017-07-26 00:00:00'), 153.35, 153.93, 153.06, 153.46, 15415545],
       [Timestamp('2017-07-25 00:00:00'), 151.8, 153.84, 151.8, 152.74, 18853932],
       [Timestamp('2017-07-24 00:00:00'), 150.58, 152.44, 149.9, 152.09, 21493160]], dtype=object)
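Note the dtype=object in that output: because the Date column holds timestamps while the other columns hold numbers, df.values has to fall back to a generic object array. If you need a numeric 2D array for calculations or plotting, selecting only the numeric columns keeps a float64 dtype. A small sketch, again assuming pandas 0.24 or later for DataFrame.to_numpy() and using io.StringIO in place of the real file:

```python
import io

import pandas as pd

# Stand-in for AAPL.csv, using the sample rows from the question.
csv_text = """Date, Open, High, Low, Close, Volume
26-Jul-17,153.35,153.93,153.06,153.46,15415545
25-Jul-17,151.80,153.84,151.80,152.74,18853932
24-Jul-17,150.58,152.44,149.90,152.09,21493160
"""

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, parse_dates=['Date'])

mixed = df.to_numpy()                                     # like df.values: mixed column types
prices = df[['Open', 'High', 'Low', 'Close']].to_numpy()  # only float columns

print(mixed.dtype)   # object
print(prices.dtype)  # float64
```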