I currently have a dataset with a column called 'logid' that contains 4-digit numbers. I have about 200k rows in my CSV files, and I would like to count each unique logid and output something like this:
logid | number of occurrences of that id. For example, 1000 | 10 would mean that logid 1000 appears 10 times in the 'logid' column of the CSV file. The | separator isn't necessary; it's just there to make this easier for you to read. This is my code currently:
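For reference, a `logid | count` table like the one described is what pandas' `Series.value_counts()` returns when it is called on the right column. A minimal sketch, assuming the CSV has a header row with a column literally named `logid` (the sample rows below are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical sample data standing in for one of the real CSV files.
sample_csv = io.StringIO(
    "logid,other\n"
    "1000,a\n"
    "1000,b\n"
    "2000,c\n"
    "1000,d\n"
)

df = pd.read_csv(sample_csv)

# value_counts() gives a Series: index = each unique logid, values = occurrences,
# sorted from most to least frequent.
counts = df["logid"].value_counts()

# Turn the Series into a two-column table (logid, occurrences) for printing or saving.
table = counts.rename_axis("logid").reset_index(name="occurrences")
print(table.to_string(index=False))
```

The same `table` could be written out with `table.to_csv(...)` if a file is wanted instead of printed output.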
import pandas as pd
import os, sys
import glob

count = 0
path = "C:\\Users\\cam19\\Desktop\\New folder\\*.csv"
for fname in glob.glob(path):
    df = pd.read_csv(fname, dtype=None, names=['my_data'], low_memory=False)
    counts = df['my_data'].value_counts()
counts
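As written, the loop reassigns `df` and `counts` on every pass, so only the last file matched by the glob ends up in `counts`. A minimal sketch that instead reads just the `logid` column from every file, stacks them, and counts once. It assumes each file has a header row containing a column named `logid`; `count_logids` is a hypothetical helper name, and the two demo files are made up:

```python
import glob
import os
import tempfile

import pandas as pd


def count_logids(pattern, column="logid"):
    """Count occurrences of each value in `column` across all CSVs matching `pattern`."""
    frames = [
        pd.read_csv(fname, usecols=[column])  # read only the column we need
        for fname in glob.glob(pattern)
    ]
    if not frames:
        # No files matched: return an empty result instead of failing in concat.
        return pd.Series(dtype="int64", name="count")
    # Stack every file's column into one, then count each unique value once.
    return pd.concat(frames, ignore_index=True)[column].value_counts()


# Demo: two small hypothetical CSV files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w") as f:
        f.write("logid,x\n1000,1\n2000,2\n")
    with open(os.path.join(tmp, "b.csv"), "w") as f:
        f.write("logid,x\n1000,3\n1000,4\n")
    counts = count_logids(os.path.join(tmp, "*.csv"))

print(counts)
```

Concatenating first keeps the counting to a single `value_counts()` call; with around 200k rows of one column this should fit comfortably in memory.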
Using this, I get output that I don't quite understand:
4 16463
10013 490
pserverno 1
Name: my_data, dtype: int64
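One possible explanation, inferred because the actual file layout isn't shown: when `names` lists fewer names than the file has columns, `read_csv` uses the extra leading columns as the index, so `my_data` becomes the *last* column rather than column C. And because `names` is supplied, the header row is read as ordinary data, which would account for a header cell such as `pserverno` showing up in the counts. The three-column sample below is hypothetical:

```python
import io

import pandas as pd

# Hypothetical 3-column file whose first line is a header row.
sample_csv = io.StringIO(
    "col_a,col_b,col_c\n"
    "1,10,x\n"
    "2,20,x\n"
)

# One name for three columns: the first two columns become a two-level index,
# and "my_data" is filled from the last column, header cell included.
df = pd.read_csv(sample_csv, names=["my_data"])

print(df)
print(df["my_data"].value_counts())
```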
I know I am doing something wrong in the last line
counts = df['my_data'].value_counts()
but I'm not sure what. For reference, the values I'm extracting are in column C of the Excel file (so I guess that's the third column?). Thanks in advance!
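Since the wanted values sit in spreadsheet column C, one option is to select that column by its zero-based position with `usecols=[2]` (A=0, B=1, C=2) and let pandas take the column names from the first row instead of passing `names`. A sketch, assuming the first line of each file is a header row; the column names and values below are made up:

```python
import io

import pandas as pd

# Hypothetical file where the wanted values are in the third column (C).
sample_csv = io.StringIO(
    "col_a,col_b,logid,col_d\n"
    "x,1,1000,a\n"
    "y,2,1000,b\n"
    "z,3,2000,c\n"
)

# header=0 is the default, so the first row becomes the column names;
# usecols=[2] keeps only the third column (spreadsheet column C).
df = pd.read_csv(sample_csv, usecols=[2])

# Count by position, so this works whatever the header in column C is called.
counts = df.iloc[:, 0].value_counts()
print(counts)
```

If the files have no header row at all, adding `header=None` stops pandas from consuming the first data line as column names.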