I have a CSV file with n columns. I need to get the row count of each column by column name and produce a dictionary in the following format:

csv_dict= {col_a:10,col_b:20,col_c:30}

where 10, 20 and 30 are the row counts of columns a, b and c respectively. I obtained the list of columns using the fieldnames attribute of DictReader. Now I need the row count of every column in my list.

This is what I tried:

for row in csv.DictReader(filename):
    col_count= sum(1 for row['col_a'] in re)+1

This only gets the row count of column a. How can I get the row counts of all the columns in my list and put them in a dictionary in the format above? Any help is appreciated. Thanks and regards.

2 Answers

You can try this:
# Save this file as FileName.csv
Name,age,DOB
abhijeet,17,17/09/1990
raj,17,7/09/1990
ramesh,17,17/09/1990
rani,21,17/09/1990
mohan,21,17/09/1990
nil,25,17/09/1990

# Following is the Python code.
import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('FileName.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value
            if not v=='':
                columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(len(columns['Name']))    # print the number of non-empty values in the specified column
print(len(columns['age']))     # print the number of non-empty values in the specified column
print(len(columns['DOB']))     # print the number of non-empty values in the specified column
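
Since the column list in the question is dynamic, the same DictReader loop can be wrapped in a function that builds the requested dictionary from whatever header the file has. This is only a sketch, not part of the original answer: the function name count_filled_cells and the in-memory sample data are made up for illustration, and it assumes a cell should be counted only when it is non-empty.

```python
import csv
import io


def count_filled_cells(csv_file):
    """Return {column name: number of non-empty cells} for an open CSV file."""
    reader = csv.DictReader(csv_file)
    # Start every column at zero so all-empty columns still appear in the result.
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Short rows yield None for missing cells; treat those like ''.
            if row.get(name) not in (None, ''):
                counts[name] += 1
    return counts


# Hypothetical in-memory sample; with a real file, pass
# open('FileName.csv', newline='') instead.
sample = io.StringIO(
    "Name,age,DOB\n"
    "abhijeet,17,17/09/1990\n"
    "raj,,7/09/1990\n"
    "ramesh,17,\n"
)
csv_dict = count_filled_cells(sample)
print(csv_dict)
```

Because the keys come from reader.fieldnames, no column name has to be hard-coded, and the result already has the csv_dict shape asked for in the question.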
4 Comments

Can you please come up with fewer lines of code? This isn't exactly what I was looking for: my list is dynamic, and so are the columns.
abhijeetmote's approach at least shows you how to get hands-on with the counting process. If you precede the line columns[k].append(v) with something like if not v=='', you get the count of the actual items in each column. At least that is what I found by trying.
Yes, I agree with that. I just wanted to know if he can provide an easier way to do it. His solution does work, anyway.
@Tania, you need DictReader, which reads each row as a dictionary of key-value pairs. row.items() then lets you iterate over those pairs, so you can check whether each value is empty.

I would use pandas!

import pandas as pd

# FULLNAME = path/filename.extension of the CSV file to read
data = pd.read_csv(FULLNAME, header=0)

# counting empty values
nan_values = data.isnull().sum()

# multiply by -1
ds = nan_values.multiply(-1)

# add total of rows from CSV
filled_rows = ds.add(len(data))

# create dict from data series
csv_dict = filled_rows.to_dict()

If you want to preserve column name order, use an OrderedDict:

from collections import OrderedDict

csv_dict_ordered = OrderedDict()
for idx in filled_rows.index:
    csv_dict_ordered[idx] = filled_rows[idx]
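
As a shorter variant of the steps above, pandas also provides DataFrame.count(), which returns the number of non-NA cells per column directly, so the subtraction from len(data) is not needed. The sketch below is an illustration rather than the answer's own code: the in-memory sample and its column names are invented, and it assumes that whatever read_csv parses as missing (empty cells, and by default strings such as "NA") should not be counted.

```python
import io

import pandas as pd

# Hypothetical sample data; replace `sample` with the path of a real CSV file.
sample = io.StringIO(
    "col_a,col_b,col_c\n"
    "1,x,\n"
    "2,,\n"
    "3,y,z\n"
)
data = pd.read_csv(sample, header=0)

# count() gives the non-NA cells per column; int() turns the NumPy
# integers into plain Python ints for a clean dictionary.
csv_dict = {column: int(n) for column, n in data.count().items()}
print(csv_dict)
```

On Python 3.7 and later a plain dict keeps insertion order, so the keys here follow the column order of the file.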
