I have the following method, which reads a .csv file and stores its contents in a NumPy array.

import csv
import numpy as np

def read_csv(self, filename, delimiter=',', quotechar='"'):
    # 'rb' is the mode the csv module expects on Python 2
    with open(filename, 'rb') as f:
        reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
        # read the first line as the column headings
        self.column_headings = np.array(next(reader))
        # read the remaining lines as data rows
        rows = [row for row in reader]
    self.data = np.array(rows)
    self.m, self.n = self.data.shape

I'm trying to read a .tsv file so that the result has the same form. This is what I have so far:

traindata = np.array(p.read_table('train.tsv'))[:,2]

However, when I run:

m, n = traindata.data.shape

# Display
print m, n, traindata.column_headings

I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-1f877ccb37b5> in <module>()
----> 1 m, n = traindata.data.shape

AttributeError: 'buffer' object has no attribute 'shape'

What is causing this error, and how can I fix it?

  • If you're reading a .tsv file (i.e. a tab-separated values file), shouldn't the delimiter be '\t'? Commented Feb 9, 2014 at 21:06
  • @superjump The read_csv function is only for reading .csv files. I'd like the .tsv file I am reading to end up in the same format, but I cannot get the print m, n, traindata.column_headings statement to work :) Commented Feb 9, 2014 at 21:12
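
To illustrate the delimiter suggestion, here is a minimal sketch (not the asker's actual code) of the same csv.reader approach pointed at a tab-separated file. It assumes Python 3; the function name read_delimited and the sample file contents are made up for the example.

```python
import csv
import os
import tempfile

import numpy as np


def read_delimited(filename, delimiter="\t", quotechar='"'):
    """Return (column_headings, data) as NumPy arrays of strings."""
    # newline="" is what the csv module recommends on Python 3
    with open(filename, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
        column_headings = np.array(next(reader))  # first line
        data = np.array(list(reader))             # remaining lines
    return column_headings, data


# Hypothetical sample file, written only so the sketch runs on its own.
sample = "col_a\tcol_b\tcol_c\nx1\t1\tfoo\nx2\t2\tbar\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, newline=""
) as tmp:
    tmp.write(sample)
    path = tmp.name

column_headings, data = read_delimited(path)
os.remove(path)

m, n = data.shape  # works because data is two-dimensional
print(m, n, column_headings)
```

Because the whole table is kept, data stays two-dimensional and its shape unpacks into m and n just as in read_csv.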

1 Answer

You explicitly convert traindata to a list:

traindata = list(np.array(p.read_table('train.tsv'))[:,2])
          # ^ here

If you want to use it as a numpy.array, remove the list() call:

traindata = np.array(p.read_table('train.tsv'))[:,2]

Secondly, you want the shape of the array itself, not of its data attribute:

m, n = traindata.shape

7 Comments

Oh. God. I need a coffee. Sorry, didn't mean to spam like this, you know sometimes how your eyes get crossed from coding too long?! Sorry again, thank you :)
You have changed the question, meaning my answer is no longer correct. However, I have added an answer to your second problem.
Thank you very much for this. Unfortunately, it now returns this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-e8668026004b> in <module>()
----> 1 m, n = traindata.shape

ValueError: need more than 1 value to unpack

Would you be able to offer some guidance here?
It appears that your traindata is one-dimensional, so shape is a 1-tuple and can't be unpacked to two values.
This confuses me, as my traindata isn't one-dimensional. I have uploaded an extract of it here: pastebin.com/w9tc7tHZ, and the full file can be downloaded from kaggle.com/c/stumbleupon/data (train.tsv file). I don't understand why it should produce this error message, unless perhaps I am handling quotation marks incorrectly? Apologies for the nuisance, but any help would be very much appreciated :)
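
A likely explanation, sketched on a small made-up array rather than the real train.tsv: the table in the file is two-dimensional, but indexing it with [:, 2] selects a single column, and NumPy drops the indexed axis. The result then has a one-element shape, which cannot be unpacked into m and n. The array contents below are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the table read from train.tsv: 4 rows, 3 columns.
table = np.array([
    ["u1", "1", "text one"],
    ["u2", "2", "text two"],
    ["u3", "3", "text three"],
    ["u4", "4", "text four"],
])

column = table[:, 2]        # integer index: the column axis is dropped
print(column.shape)         # (4,)

kept_2d = table[:, 2:3]     # slice: the column axis is kept
print(kept_2d.shape)        # (4, 1)

m, n = table.shape          # unpacking works on the full 2-D table
(m_only,) = column.shape    # a 1-D shape unpacks to a single value
```

So either take the shape before selecting the column, or select it with a slice such as [:, 2:3] if a two-dimensional result is needed.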