I'm trying to figure out how to start a loop in Python that goes through a CSV file. I believe it would be a while loop (I can't use pandas for this assignment), but I'm not sure how to start. The file is from Kaggle and contains posts from a Reddit page. I'm trying to get the following:

  • the average number of comments across all posts
  • the average score across all posts
  • what the highest score is and the title for that post
  • what the lowest score is and the title for that post
  • what the most commented post is, with its title and number of comments

this is what I have so far for importing the file:

import csv  # standard-library CSV parser, used to read reddit_vm.csv

def analyze(entries):
    print(f'first entry: {entries[0]}')

# named f rather than input, so the built-in input() is not shadowed
with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as f:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(f)]
    avgScore = analyze(entries)  # analyze() returns nothing yet, so avgScore is None for now

and this is what I think I need to do:

pseudocode:

need a variable to control the loop reading the lines (while loop)

average the number of comments across all posts

average score across all posts

largest variable for the highest score, and print its title

smallest variable for the lowest score

most_comments
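The pseudocode above could be sketched roughly as follows. This is only one possible shape, not the required solution: it uses a for loop instead of a while loop, because csv.DictReader hands out one row at a time and stops by itself at the end of the file. The function name summarize and the in-memory sample rows are made up for illustration; the column names follow the CSV layout described in the question.

```python
import csv
import io

def summarize(rows):
    """Make one pass over rows (dicts with 'title', 'score', 'comms_num')."""
    count = 0
    total_score = 0
    total_comments = 0
    # Each of these holds a (value, title) tuple once a row has been seen.
    highest = lowest = most_commented = None

    for row in rows:
        score = int(row["score"])
        comments = int(row["comms_num"])
        title = row["title"]

        count += 1
        total_score += score
        total_comments += comments

        if highest is None or score > highest[0]:
            highest = (score, title)
        if lowest is None or score < lowest[0]:
            lowest = (score, title)
        if most_commented is None or comments > most_commented[0]:
            most_commented = (comments, title)

    if count == 0:
        raise ValueError("no rows to summarize")

    return {
        "avg_comments": total_comments / count,
        "avg_score": total_score / count,
        "highest": highest,
        "lowest": lowest,
        "most_commented": most_commented,
    }

# Made-up sample data standing in for reddit_vm.csv (assumption, not real rows).
sample = io.StringIO(
    'title,score,id,url,comms_num,created,body,timestamp\n'
    '"Hello, world",10,a1,,4,0,,\n'
    'Second post,3,a2,,9,0,,\n'
    'Third post,25,a3,,2,0,,\n'
)
stats = summarize(csv.DictReader(sample))
print(stats)
```

With the real file, the same function could be called as summarize(csv.DictReader(f)) inside the with open(...) block, since it only needs something it can loop over once.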

  • Hi, read the file with pandas, e.g. pd.read_csv("yourcsvfile.csv"). You will get a DataFrame, which is much easier to handle and work with. Commented Sep 7, 2021 at 7:06
  • @user2906838 that is what I want to do as well, but this is for an assignment that hasn't started using Pandas yet so I'm trying to find a simpler way. Commented Sep 7, 2021 at 7:10
  • Can you share some example of data and what you expect to get? Commented Sep 7, 2021 at 7:12
  • OK, if that is the case, you can open the CSV file, read it row by row, and store the information you need in a dict or list, since you have to calculate the max, average, etc. If you can be a little more specific, I could write the needed code as an answer. Commented Sep 7, 2021 at 7:16
  • The CSV file is organized by: title, score, id, url, comms_num, created, body, timestamp. I'm supposed to use a loop to read the file and find the things I listed above: averages, highest/lowest, etc. Commented Sep 7, 2021 at 7:17

2 Answers

As we discussed in the comments, a simple way to do this is to read the CSV file line by line and use loops to store the data in a dict that maps each column name to a list of that column's values, which makes the aggregation easier later:

with open('sample1.csv', 'r') as f:
    # read the CSV line by line; rstrip() removes trailing whitespace, including the '\n'
    lines = [line.rstrip() for line in f]

columnslist = lines[0].split(',')  # the header row gives the column names
numcolumns = len(columnslist)  # the number of columns

result_dict = {}

for colm in columnslist:
    result_dict[colm] = []  # one list per column, to hold that column's values separately


for line in lines[1:]:
    # split on commas; note that this breaks if a quoted field itself contains a comma
    words = line.split(',')
    for i in range(numcolumns):
        result_dict[columnslist[i]].append(words[i])  # add the value to its column's list

print(result_dict)

For example, I have the following CSV file:

name,score,roll
Vag,0.9,11
Sam,0.12,12
Harris,0.98,13

The print statement would give the following dict: {'name': ['Vag', 'Sam', 'Harris'], 'score': ['0.9', '0.12', '0.98'], 'roll': ['11', '12', '13']}

As you can see, each column is now a list, so it's easier to analyze.

# the values were read as strings, so convert them to float before comparing
scores = [float(s) for s in result_dict["score"]]
max_score = max(scores)
min_score = min(scores)
print(max_score, min_score)
# 0.98 0.12
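To go from the max and min values to the rows they belong to, one option is to work with list positions, since every column list is in the same row order. The sketch below reuses the sample dict printed above; the variable names average, best, and worst are just illustrative.

```python
# The sample dict from above: one list per column, all in the same row order.
result_dict = {
    "name": ["Vag", "Sam", "Harris"],
    "score": ["0.9", "0.12", "0.98"],
    "roll": ["11", "12", "13"],
}

# Convert the string values to numbers once, then aggregate.
scores = [float(s) for s in result_dict["score"]]
average = sum(scores) / len(scores)

# Find the row positions of the highest and lowest score,
# so the matching value in any other column can be looked up.
best = max(range(len(scores)), key=scores.__getitem__)
worst = min(range(len(scores)), key=scores.__getitem__)

print(f"average score: {average:.2f}")
print(f"highest: {scores[best]} ({result_dict['name'][best]})")
print(f"lowest: {scores[worst]} ({result_dict['name'][worst]})")
```

For the Reddit file, the same idea would apply with "score" or "comms_num" as the numeric column and "title" as the lookup column.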

Now you can do much more, but yes, it is quite cumbersome without pandas.
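One caveat with splitting lines on ',' by hand: post titles often contain commas, and CSV files wrap such fields in quotes. The standard-library csv module (which is not pandas) handles that quoting. The following is a small sketch with made-up data showing the difference, and how the same column dict could be built with csv.reader; the names raw, naive, and columns are illustrative.

```python
import csv
import io

# Made-up two-row sample; the first title contains a comma inside quotes.
raw = 'title,score\n"Hello, world",10\nPlain title,3\n'

# Naive splitting cuts the quoted title into two pieces.
naive = raw.splitlines()[1].split(",")
print(naive)

# csv.reader understands the quotes and keeps the title in one field.
reader = csv.reader(io.StringIO(raw))
header = next(reader)                      # first row holds the column names
columns = {name: [] for name in header}    # one list per column
for row in reader:
    for name, value in zip(header, row):
        columns[name].append(value)
print(columns)
```

When reading from a real file instead of a string, the csv documentation recommends opening it with newline="" so that line breaks inside quoted fields are handled correctly.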

I strongly advise you to use pandas for this. These are basic operations:

import pandas as pd

df = pd.read_csv("filename.csv")  # read the CSV file into a DataFrame
print(df['comms_num'].mean())  # mean number of comments (the column must be numeric)
print(df['score'].mean())  # mean score
print(df.sort_values('score', ascending=False).head(10))  # sort by score and show the top 10 rows

1 Comment

I agree that pandas would be much simpler; unfortunately, I am not allowed to use it for this assignment.
