I was recently told off by my VPS provider because my Python script was using too much CPU (apparently the script was utilising an entire core for a few hours).

My script uses the Twython library to stream tweets:
def on_success(self, data):
    if 'text' in data:
        self.counter += 1
        self.tweetDatabase.save(Tweet(data))
        # we only want to commit when we have a batch
        if self.counter >= 1000:
            print("{0}: committing {1} tweets".format(datetime.now(), self.counter))
            self.counter = 0
            self.tweetDatabase.commit()
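Stripped of the Twython and database specifics, the batching logic in that handler boils down to the following (the names here are stand-ins, not my real classes):

```python
class BatchingHandler:
    """Collects items and flushes them once a batch threshold is reached."""

    def __init__(self, flush, batch_size=1000):
        self.flush = flush          # callback that persists a list of items
        self.batch_size = batch_size
        self.items = []

    def on_item(self, item):
        self.items.append(item)
        if len(self.items) >= self.batch_size:
            self.flush(self.items)
            self.items = []

# quick demonstration with a small batch size
batches = []
handler = BatchingHandler(batches.append, batch_size=3)
for item in range(7):
    handler.on_item(item)
# batches == [[0, 1, 2], [3, 4, 5]]; handler.items == [6]
```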
Tweet is a class whose job is to throw away the metadata about the tweet that I do not need:
class Tweet():
    def __init__(self, json):
        self.user = {"id": json.get('user').get('id_str'),
                     "name": json.get('user').get('name')}
        self.timeStamp = datetime.datetime.strptime(json.get('created_at'),
                                                    '%a %b %d %H:%M:%S %z %Y')
        self.coordinates = json.get('coordinates')
        self.tweet = {
            "id": json.get('id_str'),
            "text": json.get('text').split('#')[0],
            "entities": json.get('entities'),
            "place": json.get('place')
        }
        self.favourite = json.get('favorite_count')
        self.reTweet = json.get('retweet_count')
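For context, the created_at format that strptime call handles looks like this (the sample timestamp below is made up, but follows Twitter's documented format):

```python
from datetime import datetime

raw = 'Wed Aug 27 13:08:45 +0000 2008'  # example in Twitter's created_at format
ts = datetime.strptime(raw, '%a %b %d %H:%M:%S %z %Y')
print(ts.isoformat())
# 2008-08-27T13:08:45+00:00
```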
It also has a __str__ method that returns a very compact string representation of the object.
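I haven't shown the __str__ itself; a minimal sketch of the kind of compact representation it produces (my real class's format differs, this is only illustrative) would be:

```python
import json


class CompactTweet:
    """Hypothetical stand-in for my Tweet class, showing a compact __str__."""

    def __init__(self, user, text):
        self.user = user
        self.text = text

    def __str__(self):
        # json.dumps with tight separators drops all optional whitespace
        return json.dumps({"user": self.user, "text": self.text},
                          separators=(',', ':'))


print(str(CompactTweet({"id": "1"}, "hello")))
# {"user":{"id":"1"},"text":"hello"}
```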
tweetDatabase.commit() appends the buffered tweets to a file, while tweetDatabase.save() just appends a tweet to an in-memory list:
def save(self, tweet):
    self.tweets.append(str(tweet))

def commit(self):
    with open(self.path, mode='a', encoding='utf-8') as f:
        # trailing newline so consecutive batches don't run together on one line
        f.write('\n'.join(self.tweets) + '\n')
    self.tweets = []
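As a sanity check on the file side, the append-a-batch behaviour can be exercised in isolation (a tempfile stands in for self.path here):

```python
import os
import tempfile


def commit_batch(path, tweets):
    # same shape as my commit(): append one batch, newline-separated
    with open(path, mode='a', encoding='utf-8') as f:
        f.write('\n'.join(tweets) + '\n')


fd, path = tempfile.mkstemp()
os.close(fd)
commit_batch(path, ['t1', 't2'])
commit_batch(path, ['t3'])
with open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
os.remove(path)
# lines == ['t1', 't2', 't3']
```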
What's the best way to keep the CPU usage low? If I sleep, I will be losing tweets, since that is time the program spends not listening to Twitter's API. Despite that, I tried sleeping for a second after each write to file, but it did nothing to bring the CPU down. For the record, saving to file every 1000 tweets works out to just over once a minute.
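One thing I can try is profiling to see where the time is actually going; a stand-alone cProfile sketch of that (the workload below is a dummy stand-in, not my real handler):

```python
import cProfile
import io
import pstats


def dummy_handler(n):
    # stand-in workload; in practice I'd feed recorded tweets to on_success
    return sum(len(str(i)) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
dummy_handler(50_000)
profiler.disable()

# dump the top entries sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(3)
print('function calls' in report.getvalue())  # True
```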
Many thanks