
This is my first ever Python script, my first question on Stack Overflow, and my first step into real coding.

I want to count how many times certain strings appear in the rows I am iterating through, then print the counts. I have not set up a delimiter, so there is only one column. Essentially: if the column contains a given string, add one to its counter.

The problem is, I get an output of 0 for all my variables. Any suggestions?

Here is the code (sorry it's long).

# read the CSV file

import csv
with open('example.csv', 'r') as csvfile:

reader = csv.reader(csvfile)

# set up counter variables
googlebot = 0
googlebot_mobile = 0
apis_google = 0
adsense = 0
adsbot_mobile_web_android = 0
adsbot_mobile_web = 0
adsbot = 0
googlebot_images = 0
googlebot_news = 0
googlebot_video = 0
mobile_adsense = 0
mobile_apps_android = 0

# set up counter identifiers
googlebot_string = 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
googlebot_mobile_string = 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
apis_google_string = 'APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html)'
adsense_string = 'Mediapartners-Google'
adsbot_mobile_web_android_string = 'Mozilla/5.0 (Linux; Android 5.0; SM-G920A) AppleWebKit (KHTML, like Gecko) Chrome Mobile Safari (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)'
adbot_mobile_web_string = 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)'
adsbot_string = 'AdsBot-Google (+http://www.google.com/adsbot.html)'
googlebot_images_string = 'Googlebot-Image/1.0'
googlebot_news_string = 'Googlebot-News'
googlebot_video_string = 'Googlebot-Video/1.0'
mobile_adsense_string = 'compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html'
mobile_apps_android_string = 'AdsBot-Google-Mobile-Apps'

# iterate and search for/add to counter
for row in reader:
    if googlebot_string in row:
        googlebot += 1

    elif googlebot_mobile_string in row:
        googlebot_mobile += 1

    elif apis_google_string in row:
        apis_google += 1

    elif adsense_string in row:
        adsense += 1

    elif adsbot_mobile_web_android_string in row:
        adsbot_mobile_web_android += 1

    elif adbot_mobile_web_string in row:
        adsbot_mobile_web += 1

    elif adsbot_string in row:
        adsbot += 1

    elif googlebot_images_string in row:
        googlebot_images += 1

    elif googlebot_news_string in row:
        googlebot_news += 1

    elif googlebot_video_string in row:
        googlebot_video += 1

    elif mobile_adsense_string in row:
        mobile_adsense += 1

    elif mobile_apps_android_string in row:
        mobile_apps_android += 1



# print counts
print "Googlebot (Desktop): ", googlebot
print "Googlebot (Mobile): ", googlebot_mobile
print "APIs Google: ", apis_google
print "AdSense: ", adsense
print "AdsBot Mobile Web Android: ", adsbot_mobile_web_android
print "AdsBot Mobile Web: ", adsbot_mobile_web
print "AdsBot: ", adsbot
print "Googlebot Images: ", googlebot_images
print "Googlebot News: ", googlebot_news
print "Googlebot Video: ", googlebot_video
print "Mobile AdSense: ", mobile_adsense
print "Mobile Apps Android: ", mobile_apps_android
  • Looks like there is quite a bit of repetition in there, and the else-if control flow doesn't seem to make sense, because it increments at most one counter for each row. Have you considered using an array instead of a thousand separate variables? Commented Mar 11, 2018 at 12:51
  • There's a handy dictionary subclass for this sort of thing named collections.Counter. Commented Mar 11, 2018 at 13:16
  • @AndreyTyukin Can you elaborate a bit further on how it doesn't make sense? Only one of the strings I created (listed under # set up counter identifiers) will ever be present in a row, so I would expect only one counter to increment per row. Can you clarify whether it is correct syntax to say elif googlebot_mobile_string in row? I'll look into arrays to make things easier to read. Commented Mar 11, 2018 at 17:38
  • @Matt Ah, ok... If you are expecting at most one keyword per line, then the elif is correct, and even more efficient than checking every keyword every time. It wouldn't work as expected if there was more than one keyword per line, but that was not required, apparently. Still, quite a bit of duplication there. Commented Mar 11, 2018 at 17:41

2 Answers


You are reading the file outside the with context manager. Your code should be:

with open('example.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)

otherwise you are opening and closing the file before even reading it.

EDIT:

As @yann-vernier has pointed out, reader must be consumed inside the with block. That is, the entire for loop must also be indented under the with statement, so that the file is still open while the rows are read.
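
A minimal sketch of that layout, assuming Python 3 (the question uses Python 2 print statements) and a throwaway file whose name and contents are made up for illustration. Everything that touches reader stays inside the with block; only the printing happens afterwards.

```python
import csv
import os
import tempfile

# Hypothetical sample data, written to a temporary file just for this demo.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", newline="") as f:
    f.write("Googlebot-News\nGooglebot-Image/1.0\nGooglebot-News\n")

googlebot_news = 0
with open(path, "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    # The loop is indented under `with`, so the file is still open here.
    for row in reader:
        if "Googlebot-News" in row:
            googlebot_news += 1

# The counters are plain integers, so printing can happen after the block.
print("Googlebot News:", googlebot_news)
```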


2 Comments

Apologies, the code must've not translated properly when I copied and pasted it. I have the code laid out exactly as you have.
That's not enough. The reader must consume the contents of the file within the with block.

I have not set up a delimiter so there is only one column

If you do not specify a delimiter, the default one is used, which is a comma (,). So there may still be multiple columns and, therefore, multiple elements in the list row.

Now, the string in googlebot_string also contains commas, so, if this string is present in your input CSV, it never appears in row as a single element. Therefore googlebot_string in row is always false. Some of the other *_string strings have the same problem.
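
To illustrate the point above, here is a small sketch (Python 3 assumed, with io.StringIO standing in for the real file) that feeds the question's googlebot_string through csv.reader with the default delimiter. The comma in "(KHTML, like Gecko)" splits the line into two fields, so the list membership test fails even though the text is there.

```python
import csv
import io

# The googlebot_string from the question, split across lines for readability.
ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
      "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")

# Parse a single line containing exactly that string.
row = next(csv.reader(io.StringIO(ua + "\n")))

print(len(row))   # number of fields after splitting on the comma
print(ua in row)  # list membership: is any *whole field* equal to ua?
# A substring test on each field still finds the bot token.
print(any("Googlebot/2.1" in field for field in row))
```

Note that `x in row` on a list compares x against whole elements; it is not a substring search.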

You can open the file just as a text file (without using the csv module) and iterate over the lines.
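
A rough sketch of the plain-text approach, assuming Python 3. The labels, the count_bots helper, and the sample lines are hypothetical; the substrings are a subset of the identifiers in the question. It also uses collections.Counter, as suggested in the comments, instead of one variable per bot.

```python
from collections import Counter
import io

# Hypothetical (label, substring) pairs; put more specific substrings first,
# because the first match wins.
patterns = [
    ("Googlebot Images", "Googlebot-Image/1.0"),
    ("Googlebot News", "Googlebot-News"),
    ("AdSense", "Mediapartners-Google"),
]

def count_bots(lines):
    """Count, per label, the lines that contain that label's substring."""
    counts = Counter()
    for line in lines:
        for label, needle in patterns:
            if needle in line:  # substring test on the raw line
                counts[label] += 1
                break           # at most one match per line, like elif
    return counts

# io.StringIO stands in for open('example.csv'); any iterable of lines works.
sample = io.StringIO(
    "Googlebot-News\n"
    "Mozilla/5.0 (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)\n"
    "Googlebot-News\n"
)
counts = count_bots(sample)
for label, _ in patterns:
    print(label + ":", counts[label])
```

Because this is a substring test, a short identifier such as 'Mediapartners-Google' also matches lines that contain the longer mobile AdSense string, so the order of the patterns matters.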

A dirty solution would be to specify a character that is not present in your input file as delimiter for csv.reader.
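
A sketch of that workaround, assuming Python 3 and assuming the ASCII unit separator ('\x1f') never occurs in the input; that choice of character is only an illustration. Quote handling is switched off as well, so each line comes back as a single field.

```python
import csv
import io

# The googlebot_string from the question.
ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
      "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")

# io.StringIO stands in for the real file.
data = io.StringIO(ua + "\n")

# '\x1f' is assumed to be absent from the data, so no line is ever split;
# QUOTE_NONE stops the reader from treating '"' specially.
reader = csv.reader(data, delimiter="\x1f", quoting=csv.QUOTE_NONE)
rows = list(reader)

print(len(rows[0]))   # fields in the first row
print(ua in rows[0])  # the whole line is now one element
```

With this setup, `googlebot_string in row` only matches when the entire line equals the string; for partial matches, test against row[0] instead.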

2 Comments

This solved my problem, thank you! I ended up using " as the delimiter and it worked.
Cool, I've accepted and upvoted (the upvote won't show because I have less than 15 rep).
