
I have a Python script running on a Raspberry Pi that sits waiting for user input and records the input in a SQLite database:

#!/usr/bin/env python

import logging
import db

while True:
    barcode = raw_input("Scan ISBN: ")
    if ( len(barcode) > 1 ):
        logging.info("Recording scanned ISBN: " + barcode)
        print "Recording scanned ISBN: " + barcode
        db.recordScan(barcode, 1)

That db.recordScan() method looks like this:

# Adds an item to queue
def recordScan(isbn, shop_id):
    insert = "INSERT INTO scans ( isbn, shop_id ) VALUES ( ?, ? )"
    conn = connect()
    conn.cursor().execute(insert, [isbn, shop_id])
    conn.commit()
    conn.close()

(Note: The whole code repo is available at https://github.com/martinjoiner/bookfetch-scanner-python/ if you want to see how I'm connecting to the db and such.)

My problem is that with a USB barcode scanner (which is effectively just a keyboard that sends a series of keystrokes followed by the Enter key), it is really easy to input at such a fast rate that the command line seems to get "confused".

For example, compare the following results.

When you go slow, the script works well and the command-line output looks neat, like this:

Scan ISBN: 9780465031467
Recording scanned ISBN: 9780465031467
Scan ISBN: 9780141014593
Recording scanned ISBN: 9780141014593
Scan ISBN: 

But when you hammer it hard and go really fast, the input prompt gets ahead of itself and the messages printed by the script end up written on top of the input prompt:

Recording scanned ISBN: 9780141014593
9780141014593
9780141014593
9780465031467
Recording scanned ISBN: 9780141014593
Scan ISBN: Recording scanned ISBN: 9780141014593
Scan ISBN: Recording scanned ISBN: 9780141014593
Scan ISBN: Recording scanned ISBN: 9780465031467
Scan ISBN: 9780571273188
9780141014593

It sometimes hangs in that position indefinitely. I don't know what it's doing, but you can wake it back up again with another input and it carries on as normal. However, the input before the one it hung on doesn't get recorded, which is bad because it makes the whole system unreliable.

My question is: Is this an inevitability that I just have to live with? Will I always be able to out-pace the low-powered Raspberry Pi by hitting it with too many inputs in close succession, or is there some faster way of doing this? Can I push the database write operation to another thread or something along those lines? Forgive my ignorance; I am learning.
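To illustrate that last idea, something along these lines is what I have in mind (a rough, untested sketch; scan_queue and db_writer are just names I made up, not from the repo):

#!/usr/bin/env python

import threading
import Queue   # Python 2 name of the queue module
import db      # the repo's db module with recordScan()

scan_queue = Queue.Queue()

# Worker thread: pulls barcodes off the queue and writes them to SQLite
def db_writer():
    while True:
        isbn = scan_queue.get()
        db.recordScan(isbn, 1)
        scan_queue.task_done()

writer = threading.Thread(target=db_writer)
writer.daemon = True   # don't block program exit
writer.start()

while True:
    barcode = raw_input("Scan ISBN: ")
    if len(barcode) > 1:
        print "Queueing scanned ISBN: " + barcode
        scan_queue.put(barcode)   # returns immediately, so raw_input comes back sooner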

2 Comments

  • Not sure which Raspberry Pi you're using (and I've only worked with generation 1) but these gizmos usually have a single-core CPU. If you're running a MySQL instance on the same Pi then you might want to look into moving that over to another machine, perhaps something with a bit more power. Commented Oct 19, 2016 at 13:05
  • Same goes for SQLite (since I'm bad at reading and just realised that it says SQLite, not MySQL!) Commented Oct 19, 2016 at 13:15

4 Answers


Don't build SQL strings from user input. Ever.

Always use parameterized queries.

# Adds an item to queue
def recordScan(isbn, shop_id):
    insert = "INSERT INTO scans ( isbn, shop_id ) VALUES ( ?, ? )"
    conn = connect()
    conn.cursor().execute(insert, [isbn, shop_id])
    conn.commit()
    conn.close()

Please read https://docs.python.org/2/library/sqlite3.html, at the very least the upper part of the page, where they explain this approach.


7 Comments

I agree with your suggestion, but does it address the OP's issue of multiple raw_input calls getting jumbled up by super-fast user input? I expect them to get the same mangled data regardless of whether they parameterize or not.
It is not a public system and I am the only person using it, so for now SQL injection attacks are not a concern. You may notice that the method is called with the shop_id hard-coded to 1; this is still just a proof of concept in early development. But as this SQL issue is distracting from the question, I will edit the code to follow best practice. Thank you for your input.
That's not the point. "I don't need to use SQL parameters in this case" is not a valid position to hold. One never uses unparameterized SQL unless there is absolutely no way to avoid it. Then again, you are right, the whole SQL part of your question is a red herring. It's not the SQL part that causes the trouble here, you should edit that out of your question completely. Reduce it to the part of the code that deals with the barcode scanner.
@Tomalak that's a good point. I've just tried commenting out the call to db.recordScan() and I can still scan fast enough to make it miss inputs, although I cannot recreate the hanging behavior which leads me to believe that's caused by 2 scripts reading/writing from the same database (a subject for a separate question).
I don't think it could be a locking issue, since this is single-threaded code and no other process is accessing the database. Try print("Recording scanned ISBN: " + str(len(barcode))), i.e. don't print the scanned data itself. Maybe there are odd characters in there, and not printing it might straighten out the output. Just tossing ideas around.

You appear to be opening and closing the database each and every time. That will clearly add a huge overhead, especially as you are "hammering" away at it.
Connect to the database once at the beginning and close it upon exit.
In between, simply perform your insert, update and delete statements.

Edit:
For the purposes of this test I renamed db.py to barcode1.py, so edit appropriately. Alter listen.py to be as follows:

#!/usr/bin/env python

import logging
import barcode1

DB_FILE_NAME = "scan-queue.db"
# Open the database once, up front, and reuse the connection
my_db = barcode1.sqlite3.connect(DB_FILE_NAME)
my_cursor = my_db.cursor()

def InsertScan(isbn, shop_id):
    insert = "INSERT INTO scans ( isbn, shop_id ) VALUES ( ?, ? )"
    my_cursor.execute(insert, [isbn, shop_id])
    my_db.commit()

try:
    while True:
        barcode = raw_input("Scan ISBN: ")
        if len(barcode) > 1:
            logging.info("Recording scanned ISBN: " + barcode)
            print "Recording scanned ISBN: " + barcode
            InsertScan(barcode, 1)
except KeyboardInterrupt:
    pass
finally:
    my_db.close()   # close the single connection once, on exit

For your purposes, replace references to "barcode1" with "db".
As you can see, all that happens here is that a separate function has been added to do the writing and only the writing.
Clearly this is a quick mock-up and could be improved immeasurably; in fact I'd rewrite it as a single script. This is one of those classic examples where, in an attempt to write object-oriented code, you end up shooting yourself in the foot.
In fact you could do without the function entirely and just include the insert code within the while statement, as sketched below.
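For example, the loop with the insert inlined might look like this (a sketch reusing the my_db connection and my_cursor from above, not a tested drop-in):

while True:
    barcode = raw_input("Scan ISBN: ")
    if len(barcode) > 1:
        print "Recording scanned ISBN: " + barcode
        # Insert directly, reusing the single open connection
        my_cursor.execute("INSERT INTO scans ( isbn, shop_id ) VALUES ( ?, ? )",
                          [barcode, 1])
        my_db.commit()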

Locking: from the sqlite3 documentation:

 sqlite3.connect(database[, timeout, detect_types, isolation_level, check_same_thread, factory, cached_statements, uri])

Opens a connection to the SQLite database file database. You can use ":memory:" to open a database connection to a database that resides in RAM instead of on disk.

When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed. The timeout parameter specifies how long the connection should wait for the lock to go away until raising an exception. The default for the timeout parameter is 5.0 (five seconds).
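So, if lock contention from the other script is the issue, a longer timeout can be passed when opening the connection (the 30 seconds below is just an arbitrary example value):

# Wait up to 30 seconds for a competing connection's lock to clear
my_db = sqlite3.connect(DB_FILE_NAME, timeout=30.0)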

5 Comments

Ah yes, I connect and disconnect each time because there is another script in a separate shell reading/writing to the same database. Is there a middle ground? I am actually writing a separate question to cover the "2 scripts 1 database" issue.
I have created a second question to deal with the concurrent access issue stackoverflow.com/questions/40134943/…
I have covered it above. See docs.python.org/3/library/sqlite3.html
Ooo yes, good effort on the re-edits. I will play with an in-memory database now. I will want the queue to persist even if the device is powered off, though. Am I correct in thinking the best way to do that would be to add some code that occasionally backs up the queue to a file, with the in-memory queue then populated from that file on future loads?
If you are attempting to share an in-memory database between 2 or more processes, I think you will struggle. However, see stackoverflow.com/questions/3315046/… I really don't see why you can't simply use a bog-standard sqlite3 db.

After much experimenting based on helpful advice from users @tomalak, @rolf-of-saxony and @hevlastka my conclusion is that yes, this is an inevitability that I just have to live with.

Even if you strip the example down to the basics by removing the database write process and making it a simple parrot script that just repeats back inputs (See Python on Raspberry Pi user input inside infinite loop misses inputs when hit with many), it is still possible to scan items so fast that inputs get missed/skipped/ignored. The Raspberry Pi simply cannot keep up.
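The parrot script was essentially nothing more than this (a minimal reconstruction, not the exact code from the linked question):

#!/usr/bin/env python

# No database, no logging: just echo each input straight back
while True:
    barcode = raw_input("Scan ISBN: ")
    if len(barcode) > 1:
        print "You scanned: " + barcode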

So my approach will now be to add an audio feedback feature, such as a beep sound, to indicate to the user when the device is ready to receive the next input. It's a route I didn't want to go down, but it seems my code is as efficient as it can be and we're still able to hit the limits. Responsibility is with the user not to go at breakneck speed, and the best we can do as responsible product builders is give them good feedback.
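As a starting point, the simplest "ready" signal is probably the terminal bell, something like this (assuming the Pi's console or an attached speaker actually sounds it):

import sys

def ready_beep():
    # ASCII bell character; whether it is audible depends on the terminal setup
    sys.stdout.write("\a")
    sys.stdout.flush()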

5 Comments

This is a good separate question by the way. Post your parrot script and everything else that someone in possession of the necessary hardware needs to reproduce, maybe someone comes up with an idea. Somehow I have a hard time imagining that GHz hardware is too slow for a few rapid keystrokes. Maybe it's something else.
Maybe something in here helps as well? raspberrypi.org/forums/viewtopic.php?f=45&t=55100 - as in, you might be able to use a different way of reading out data from the scanner.
That being said, here is a question with more info on making SQLite play nice with multiple write processes. stackoverflow.com/questions/1063438/…
@Tomalak I've made a separate question as suggested stackoverflow.com/questions/40156905/…
In parallel also try the alternative to raw_input() as suggested by the forum post.

In addition to the issues I brought up in my first answer, there is another problem which affects the speed of the updates, namely the commits.
You will find that if you commit in batches, the speed goes up dramatically. Adjust the synchronous setting (see the PRAGMA in the code below) and up it goes again.
Working on a Pi 3, I mocked up 5000 updates in 10 seconds with that setting on and in 0.43 seconds with it off.
If you change your code to store the barcodes in a list and then fire off the database updates in batches, your code will work on a Raspberry Pi.

See below for my test code:

#!/usr/bin/env python
import sqlite3
import time
DB_FILE_NAME = "scan-queue.db"
my_db = sqlite3.connect(DB_FILE_NAME)
my_cursor = my_db.cursor()
my_cursor.execute('CREATE TABLE if not exists scans(id INTEGER PRIMARY KEY AUTOINCREMENT,isbn TEXT NOT NULL,shop_id INT NOT NULL)')   
my_db.commit()
# The next line turns off synchronous disk writes (fsync), handing durability to the OS.
# Data can be lost or the database corrupted if the machine has an issue mid-write,
# but you're not NASA
my_cursor.execute("PRAGMA synchronous = OFF") # Can increase speed 20 fold
def InsertScan(isbn, shop_id):
    insert = "INSERT INTO scans ( isbn, shop_id ) VALUES ( ?, ? )"
    my_cursor.execute(insert, [isbn, shop_id])

tot_t = time.time() #Time entire run
shop_id = 1
barcode = 11111111111111
batch=[]
while shop_id < 5000:
    #barcode = raw_input("Scan ISBN: ")
    batch_cnt = 0
    while batch_cnt < 100:
        shop_id +=1
        barcode +=1
        batch_cnt +=1
        print "Recording scanned ISBN: ", barcode, shop_id
        batch.append((barcode,shop_id))
    print "Saving", str(len(batch)), "scanned ISBN's"
    t = time.time() #Time batch update
    for i in batch:
        InsertScan(i[0],i[1])
    batch=[]
    my_db.commit()
    t2 = time.time() - t
    print "Secs =", t2 #Print update time in seconds
print "Saving", str(len(batch)), "scanned ISBN's"
for i in batch: #Final update (just in case) or when program is quit
    InsertScan(i[0],i[1])
my_db.commit()
x = my_cursor.execute("select count(*) from scans")
tot_t2 = time.time() - tot_t
print "5000 Updates in ", tot_t2 #Print update time in seconds
for i in x:
    print i,"Total rows in scans table" #Print total of records in table
my_db.close() 
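As an aside, the per-row loop over the batch could also be collapsed into a single executemany() call, which the sqlite3 module supports (a sketch, assuming the same insert statement and the batch list of (barcode, shop_id) tuples from above):

insert = "INSERT INTO scans ( isbn, shop_id ) VALUES ( ?, ? )"
my_cursor.executemany(insert, batch)   # one call for the whole batch
my_db.commit()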

