2

I'm attempting to write a program in python that searches ~27,000 rows for each string in a list. Each string I am searching for is in one column, and has an 'id' value in another column that I would like printed if found. The code I currently have counts the number of times that string appears in the document, but I am still unable to find a way to return specific values for each unique row in which the strings are found.

import csv
fin = open('data.csv')
words = ["happy","sad","good","bad","sunny","rainy"]
found = {}
count = 0
for line in fin:
    for word in words:
        if word in line:
            count = count + 1
    found[word] = count
print(found)

The main semantic problem with the code above is that printing the 'found' dictionary only yields one of the results and its count from the 'words' list.

3 Answers 3

2

For something like this, I think using the pandas library will keep your sanity in check. Assuming a 15,000-row CSV file with two columns, String and ID.

In [1]: import pandas as pd

In [2]: words = ['happy','sad','good','bad','sunny','rainy']

In [3]: df = pd.read_csv('data.csv')

In [4]: df.head(5)
Out[4]: 
  Strings  ID
0   happy   1
1     sad   2
2   happy   3
3     sad   4
4    good   5

In [5]: for word in words:
   ...:     print '{} : {}'.format(word, df['Strings'].str.lower().str.contains(word).sum())
   ...:     
happy : 2501
sad : 2500
good : 2500
bad : 2500
sunny : 2499
rainy : 2500

Alternatively, you can just create a pivot table and it will have similar results.

In [30]: df_pt = df.pivot_table(index='Strings',values='ID',aggfunc=len)

In [31]: df_pt
Out[31]: 
Strings
bad        2500
good       2500
happy      2501
rainy      2500
sad        2500
sunny      2499
Name: ID, dtype: int64

If you need to get the IDs for each word, you can just simply select/index/filter the data:

In [6]: df_happy = df[df['Strings'] == 'happy']

In [7]: df_happy.head(5)
Out[7]: 
   Strings  ID
0    happy   1
2    happy   3
12   happy  13
14   happy  15
18   happy  19

If you need it as a list, then:

In [8]: list_happy = df_happy['ID'].tolist()

In [9]: list_happy[:5]
Out[9]: [1, 3, 13, 15, 19]

I've truncated some parts, obviously, but the idea remains the same.

Sign up to request clarification or add additional context in comments.

Comments

1

You said that you would like to print the id of the row when you found a word. Assuming that you have a comma separated csv file with only two colums, this is how I would do it:

fin = open('data.csv')
words = ["happy","sad","good","bad","sunny","rainy"]
found = {}
for line in fin:
    str1,id=line.split(',')
    for w in words:
        if w in str1:
            print id
            found[w]=found.get(w,0)+1
            break

print(found)

If you have multiple columns in the file, you could instead do:

split_line=line.split(',')
str1=split_line[0] # Whatever columns
id=split_line[1] # Whatever columns

Comments

0
for line in fin: # loops over the lines of the file
    for word in words: # loops over your word list
        if word in line: # checks if current word is in line
            count = count + 1 # increments global variable "count". Everytime a word in the list is found in the line. With no reset or save-away inbetween whatsoever. This is the number of times any word appears in any line at the end.
    found[word] = count # assigns - after all words are looped over - current "count" value to found[current_word]

so what you are doing is assigning an arbitrary value to a dictionary key which happens to be the last word you check for in every iteration.

Does not seem very useful to me. I guess you intended to do something like:

from collections import defaultdict
found = defaultdict(int)
for line in fin:
    for word in words:
        if word in line:
           found[word] += 1

2 Comments

Awesome. I'm still a novice in Python, so thanks for the added comments and solution. This solves the counting problem, but how can I return the 'id' value I mentioned for each row that 'word' is found in?
Well, you'd have to keep track of the id's. Just store them away, as you find them.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.