I'm writing a script to clean up some data. It's very unoptimized, but the real problem is that the cursor is returning the number of results for the LIKE query rather than the rows. What am I doing wrong?

#!/usr/bin/python
import re
import MySQLdb
import collections

db = MySQLdb.connect(host="localhost",  # your host, usually localhost
                     user="admin",      # your username
                     passwd="",         # your password
                     db="test")         # name of the database

# you must create a Cursor object. It will let
#  you execute all the query you need
cur = db.cursor() 

# Use all the SQL you like
cur.execute("SELECT * FROM vendor")

seen = []

# collect every word from the second column of each row
for row in cur.fetchall() :
    for word in row[1].split(' '):
        seen.append(word)

_digits = re.compile(r'\d')
def contains_digits(d):
    return bool(_digits.search(d))


count_word = collections.Counter(seen)
found_multi = [i for i in count_word if count_word[i] > 1 and not contains_digits(i) and len(i) > 1]

unique_multiples = list(found_multi)

groups = dict()

for word in unique_multiples:
    like_str = '%' + word + '%'
    res = cur.execute("""SELECT * FROM vendor where name like %s""", like_str)
  • You've got at least two execute statements here. Which one is doing the wrong thing? Commented Sep 12, 2013 at 21:34
  • the last one. the "like" statement Commented Sep 12, 2013 at 21:34

1 Answer

You are storing the return value of cur.execute(), which in MySQLdb is the number of rows, not the rows themselves. You are never actually fetching any of the results.

Use .fetchall() to get all result rows or iterate over the cursor after executing:

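for word in unique_multiples:
    like_str = '%' + word + '%'
    # pass the parameters as a sequence, and ignore execute()'s return value
    cur.execute("""SELECT * FROM vendor where name like %s""", (like_str,))
    # iterate over the cursor to get the actual rows
    for row in cur:
        print(row)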
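
If you want to try the execute-then-fetch pattern without a MySQL server, here is a minimal sketch using the standard-library sqlite3 module as a stand-in. Everything in it is an assumption made for illustration: the helper name rows_matching, the two-column vendor table, and the sample names are hypothetical. Note that sqlite3 uses ? as its parameter marker where MySQLdb uses %s, and that sqlite3's execute() returns the cursor rather than a row count, so the one thing that carries over is the idea: ignore what execute() returns and read the rows from the cursor.

```python
import sqlite3


def rows_matching(cur, word, placeholder="%s"):
    """Run the LIKE query and return the rows, not execute()'s return value.

    `placeholder` is the driver's parameter marker ("%s" for MySQLdb,
    "?" for sqlite3). It is a fixed marker, never user data.
    """
    like_str = "%" + word + "%"
    sql = "SELECT * FROM vendor WHERE name LIKE " + placeholder
    cur.execute(sql, (like_str,))  # return value deliberately ignored
    return cur.fetchall()          # the actual result rows


# Hypothetical in-memory table standing in for the real `vendor` table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE vendor (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany(
    "INSERT INTO vendor (name) VALUES (?)",
    [("Acme Tools",), ("Acme Paper",), ("Globex",)],
)

# Group the matching rows by the word that was searched for.
groups = {}
for word in ["Acme", "Globex"]:
    groups[word] = rows_matching(cur, word, placeholder="?")

for word, rows in groups.items():
    print(word, rows)
```

With MySQLdb you would call the same helper with the default placeholder, so groups[word] ends up holding row tuples instead of a count.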

6 Comments

Do you need fetchall() in MySQLdb? With sqlite3, you can just do for row in cur:—and usually should. Besides being simpler, it's faster (besides the usual avoiding-an-unnecessary-list thing, looping over fetchmany() allows sqlite and/or the application to pick an ideal arraysize).
MySQLdb is somewhat archaic, and using the cursor as an iterator is only listed as optional in the DB-API 2.0 spec. Can't test it right now, but IIRC MySQLdb does not support it.
Correction: .__iter__() is supported but uses .fetchone() instead.
IIRC, MySQLdb's fetchmany is useless anyway; the underlying engine is already batching, so fetchmany just uses the same batch shared by fetchone (something like batch[row:min(row+size, batchsize)] instead of batch[row]). And it makes sense for __iter__ to use fetchone if it's simpler and just as fast.
I see two cursor types; one that always fetches all results in one go and one that fetches rows one by one. I see no batched fetching, not in the cursors.py file I linked. Default is to fetch everything, meaning there is no difference between iterating and .fetchall(), really.