7

Let's have a CSV table with row and column headers, e.g.:

, "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1

I want to get row and column headers, i.e.:

col_headers = ["Car", "Bike", "Boat", "Plane", "Shuttle"]
row_headers = ["Red", "Green", "Blue"]
data = [[1, 7, 3, 0, 0],
        [5, 0, 0, 0, 0],
        [1, 1, 4, 0, 1]]

Of course I can do something like

import csv
with open("path/to/file.csv", "r") as f:
    csvraw = list(csv.reader(f))
col_headers = csvraw[1][1:]
row_headers = [row[0] for row in csvraw[1:]]
data = [row[1:] for row in csvraw[1:]]

...but it does not look Pythonic enough.

Is there a neater way for this natural operation?

1
  • @martineau There was a mistake in my code. csvraw = list(csv.reader(f)). Commented Nov 10, 2012 at 19:25

5 Answers 5

4

Take a look at csv.DictReader.

If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.

Then you can just do reader.fieldnames. This, of course, only gives you column headers. You would still have to parse the row headers manually.

I think your original solution is pretty good, however.

Sign up to request clarification or add additional context in comments.

Comments

2

Now I see that what I want is the easiest (and the most robust) to accomplish with Pandas.

import pandas as pd
df = pd.read_csv('foo.csv', index_col=0)

And if I want, it is easy to extract:

col_headers = list(df.columns)
row_headers = list(df.index)

Otherwise, in the "raw" Python, it seems that the method I wrote in the question is "good enough".

Comments

0

I am aware that this solution gives you another output format than the requested, but it is very convenient. This reads the csv line into a dictionary:

reader = csv.reader(open(parameters_file), dialect)

keys = [key.lower() for key in reader.next()]
for line in reader:
    parameter = dict(zip(keys, cells))

2 Comments

Why would you post this when I'd already posted about DictReader, which does this without the need to do it manually?
@Lattyware, because it allows you to transform the field. For example make them lower key or even filter them.
0

Without 3rd party libs (and if you can live with results being tuples from the zip op):

import csv

with open('your_csv_file') as fin:
    csvin = csv.reader(fin, skipinitialspace=True)
    col_header = next(csvin, [])[1:]
    row_header, data = zip(*((row[0], row[1:]) for row in csvin))

Gives you for col_header, row_header and data:

['Bike', 'Boat', 'Plane', 'Shuttle']
('Red', 'Green', 'Blue')
(['1', '7', '3', '0', '0'], ['5', '0', '0', '0', '0'], ['1', '1', '4', '0', '1'])

Comments

0

Agree, pandas is the best I found too. I'm interested into read specific values of my dataframe. Here is what I did:

import pandas as pd
d=pd.read_csv(pathToFile+"easyEx.csv")
print(d)
print(d.index.values)
print(d.index.values[2])
print(d.columns.values)
print(d.columns.values[2])
print(pd.DataFrame(d,index=['Blue'],columns=['Boat'])+0.333)

And this is what it returns:

         Car  Bike  Boat  Plane  Shuttle                                                                                                            
Red      1     7     3      0        0                                                                                                            
Green    5     0     0      0        0                                                                                                            
Blue     1     1     4      0        1                                                                                                            
['Red' 'Green' 'Blue']                                                                                                                            
Blue                                                                                                                                              
['Car' 'Bike' 'Boat' 'Plane' 'Shuttle']                                                                                                           
Boat                                                                                                                                              
      Boat                                                                                                                                        
Blue     4.333                              

Note that I can check row names with 'index' and column names with 'columns'. Also note that I can read a specific element of the 'dataframe' base on its row and column names, and that the values are still numeric, that why I added '+0.333' in the last print.

I chaged the data file, I removed the quote chars (" ") and spaces after commas in the first row. Here you have 'easyEx.csv' file:

Car,Bike,Boat,Plane,Shuttle
Red, 1, 7, 3, 0, 0
Green, 5, 0, 0, 0, 0
Blue, 1, 1, 4, 0, 1

Hope this can help =)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.