
I am creating a weather station using a Raspberry Pi. I have a MySQL database set up for the different sensors (temp, humidity, pressure, rain, etc.) and am now getting to processing the wind sensors.

I have a Python program that watches the GPIO pins for the anemometer and counts the pulses to calculate the wind speed. It also reads from a wind vane, processed through an ADC, to get the direction. The other sensors I only sample every few minutes, dumping the data directly to the DB. Because I have to calculate a lot of things from the wind sensor data, I don't necessarily want to write to the DB every 5 seconds and then have to read back the past 5 minutes of data to calculate the current speed and direction. I would like to collect the data in memory, do the processing, then write the finalized data to the DB. A sensor reading looks something like:

datetime, speed, direction
2013-6-20 09:33:45, 4.5, W
2013-6-20 09:33:50, 4.0, SW
2013-6-20 09:33:55, 4.3, W

The program calculates data every 5 seconds from the wind sensors, and I would like to write data to the DB every 5 minutes. Because the DB is on an SD card, I obviously don't want to write to the DB 60 times, read it all back to process it, then write it to the permanent archival DB every 5 minutes.

Would I be better off using a list of lists? Or a dictionary of tuples keyed by datetime?
{datetime.datetime(2013, 6, 20, 9, 33, 45, 631816): ('4.5', 'W'),
 datetime.datetime(2013, 6, 20, 9, 33, 50, 394820): ('4.0', 'SW'),
 datetime.datetime(2013, 6, 20, 9, 33, 55, 387294): ('4.3', 'W')}

For the latter, what is the best way to update a dictionary? Should I just dump it to a DB and read it back? That seems like an excessive amount of read/writes a day for so little data.
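For buffering like this, a plain list of (timestamp, speed, direction) tuples is usually the simplest structure: insertion order is preserved, and the 5-minute flush can hand the whole batch to the DB at once. A minimal sketch of the idea (the names `readings`, `record`, and `average_speed` are illustrative, not from the question's code):

```python
import datetime

# In-memory buffer: one (timestamp, speed, direction) tuple per 5-second reading.
readings = []

def record(speed, direction, when=None):
    """Append one wind reading to the in-memory buffer."""
    readings.append((when or datetime.datetime.now(), speed, direction))

def average_speed():
    """Mean speed over everything currently buffered."""
    return sum(speed for _, speed, _ in readings) / len(readings)

record(4.5, 'W',  datetime.datetime(2013, 6, 20, 9, 33, 45))
record(4.0, 'SW', datetime.datetime(2013, 6, 20, 9, 33, 50))
record(4.3, 'W',  datetime.datetime(2013, 6, 20, 9, 33, 55))
print(round(average_speed(), 2))  # 4.27
```

A dict keyed by datetime would work too, but the keys buy you nothing here since readings arrive and are consumed in order; clearing the buffer after a flush is just `readings.clear()`.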

  • I would suggest trying the easy way first. 5 seconds is an eternity in computer time, so writing everything to the database shouldn't be a problem. If you're worried about wearing out the SD card, don't be; the read/write limits on flash memory have been greatly exaggerated. Commented Jun 20, 2013 at 16:05
  • Another option is to use transactions. I'm not familiar with MySQL's transactions, but in SQLite, holding a transaction open and then committing every couple of minutes would be significantly more efficient (not that it really matters in your case). Commented Jun 20, 2013 at 16:08
  • Early optimization is a cardinal sin in Python... I agree w/ BrendanLong Commented Jun 20, 2013 at 16:11
  • I agree, BrendanLong; the problem in my head is efficiency. To calculate a 3-minute average, I need to average 36 readings, and that is something my mind wants done locally in memory, not written to disk and read back. Also, if I dump it to a table in MySQL, I need another procedure to clean it out periodically. Once the data is processed, I don't need it anymore, so there is no historical value in storing it. I guess I could sort the table and dump anything over 15 minutes old while I was pulling it to get the average... Commented Jun 20, 2013 at 16:23
  • So you are worrying about less than a kilobyte of data that you don't even want to store? Look up ring-buffers, Python list operations, and moving averages. This is some extreme premature optimization. Commented Jun 20, 2013 at 16:30
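The ring-buffer idea from the last comment maps directly onto `collections.deque` with a `maxlen`: at one reading every 5 seconds, a 3-minute average needs the last 36 values, and the deque discards older ones automatically. A sketch under those assumptions (the names `window`, `add_reading`, and `moving_average` are illustrative):

```python
from collections import deque

# 36 slots = 3 minutes of readings at one sample per 5 seconds.
window = deque(maxlen=36)

def add_reading(speed):
    window.append(speed)  # oldest value is silently dropped once full

def moving_average():
    return sum(window) / len(window)

# Feed 100 fake readings; only the newest 36 survive.
for i in range(100):
    add_reading(float(i))
print(len(window))       # 36
print(moving_average())  # 81.5, the mean of 64..99
```

This keeps memory bounded no matter how long the program runs, which matters on a Pi that's up for months.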

3 Answers


There are multiple cache layers between a Python program and a database. In particular, the Linux disk block cache may keep your database in core depending on patterns of usage. Therefore, you should not assume that writing to a database and reading back is necessarily slower than some home-brew cache that you'd put in your application. And code that you write to prematurely optimize your DB is going to be infinitely more buggy than code you don't write.

For the workload as you've specified it, MySQL strikes me as a little heavyweight relative to SQLite, but you may have unstated reasons to require it.
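If SQLite does turn out to fit, the 5-minute flush becomes a single `executemany` inside one transaction, so the SD card sees one write burst instead of 60 separate ones. A minimal sketch with an illustrative schema (table and column names are assumptions, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real file path on the Pi
conn.execute("CREATE TABLE wind (ts TEXT, speed REAL, direction TEXT)")

# Five minutes of buffered readings, flushed in one transaction.
buffered = [
    ("2013-06-20 09:33:45", 4.5, "W"),
    ("2013-06-20 09:33:50", 4.0, "SW"),
    ("2013-06-20 09:33:55", 4.3, "W"),
]
with conn:  # the connection context manager commits once, at the end
    conn.executemany("INSERT INTO wind VALUES (?, ?, ?)", buffered)

row = conn.execute("SELECT COUNT(*), AVG(speed) FROM wind").fetchone()
print(row[0], round(row[1], 2))  # 3 4.27
```

The same batching pattern works with MySQL connectors too; `executemany` plus an explicit commit is part of the Python DB-API that both drivers implement.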


1 Comment

I'm not worried as much about speed as about efficiency. If I give you three numbers between 1 and 5, you can probably average them in your head, but if I give you 50 numbers, you probably want to write them down. In the spirit of Python, I'm just trying to be efficient. As for MySQL instead of SQLite, it's just laziness on my part. I usually use MySQL on my Linux boxes at work, so that's what I'm comfortable and familiar with. I'll take a look at SQLite.

One option is to use Redis to store your data. It's a key-value store that's really good for data like what you're talking about. It operates in memory and writes data to disk for persistence; this is configurable, so you could have it write to disk only every few hours or once a day. The redis-py library is very easy to use.

1 Comment

When the data are already tabular, key-value stores bring no advantage over a traditional RDBMS.

In my general experience, a dictionary keyed by datetime is easier to work with. A list of lists can get very confusing, very quickly.

I'm not certain, however, of the best way to update a dictionary. My Python may be rusty, but dumping to a DB and reading back seems redundant to me, though your statement may just have been a smidgen unclear. Is there any way you can keep the data in a variable inside your program?

If not, dumping to the DB and reading back may be your only option... but again, my Python is a bit rusty.

That said, while I don't want to be a Programmaticus Takeitovericus, have you ever looked into XML for data storage? I ended up switching to it because I found it easier to work with than a database, and it involved far less reading and writing. I don't know your project specs, so this suggestion may be pointless to you altogether.

2 Comments

XML as an RDBMS replacement? Please don't suggest that: it has all the disadvantages of flat text files, plus the disadvantage of requiring a parser to get at your data.
Hm, I guess it really depends on your platform (that, and I don't know a lot about RDBMSs). It's been about a year since I did any heavy work in Python, so I just threw that out there as a random possibility vis-à-vis SQL. I work more heavily with C++ and ActionScript nowadays.
