I was recently told off by my VPS provider because my Python script was using too much CPU (apparently the script was utilising an entire core for a few hours).

My script uses the Twython library to stream tweets:
def on_success(self, data):
    if 'text' in data:
        self.counter += 1
        self.tweetDatabase.save(Tweet(data))
        # we only want to commit when we have a batch
        if self.counter >= 1000:
            print("{0}: committing {1} tweets".format(datetime.now(), self.counter))
            self.counter = 0
            self.tweetDatabase.commit()
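Stripped of the Twython and database specifics, the batching logic in that handler boils down to the following (the names here are stand-ins, not my real classes):

```python
class BatchingHandler:
    """Collects items and flushes them once a batch threshold is reached."""

    def __init__(self, flush, batch_size=1000):
        self.flush = flush          # callback that persists a list of items
        self.batch_size = batch_size
        self.items = []

    def on_item(self, item):
        self.items.append(item)
        if len(self.items) >= self.batch_size:
            self.flush(self.items)
            self.items = []

# quick demonstration with a small batch size
batches = []
handler = BatchingHandler(batches.append, batch_size=3)
for item in range(7):
    handler.on_item(item)
# batches == [[0, 1, 2], [3, 4, 5]]; handler.items == [6]
```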
Tweet is a class whose job is to throw away the metadata about the tweet that I do not need:
class Tweet():
    def __init__(self, json):
        self.user = {"id": json.get('user').get('id_str'),
                     "name": json.get('user').get('name')}
        self.timeStamp = datetime.datetime.strptime(json.get('created_at'),
                                                    '%a %b %d %H:%M:%S %z %Y')
        self.coordinates = json.get('coordinates')
        self.tweet = {
            "id": json.get('id_str'),
            "text": json.get('text').split('#')[0],
            "entities": json.get('entities'),
            "place": json.get('place')
        }
        self.favourite = json.get('favorite_count')
        self.reTweet = json.get('retweet_count')
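For context, the created_at format that strptime call handles looks like this (the sample timestamp below is made up, but follows Twitter's documented format):

```python
from datetime import datetime

raw = 'Wed Aug 27 13:08:45 +0000 2008'  # example in Twitter's created_at format
ts = datetime.strptime(raw, '%a %b %d %H:%M:%S %z %Y')
print(ts.isoformat())
# 2008-08-27T13:08:45+00:00
```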
It also has a __str__ method that returns a very compact string representation of the object.
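I haven't shown the __str__ itself; a minimal sketch of the kind of compact representation it produces (my real class's format differs, this is only illustrative) would be:

```python
import json


class CompactTweet:
    """Hypothetical stand-in for my Tweet class, showing a compact __str__."""

    def __init__(self, user, text):
        self.user = user
        self.text = text

    def __str__(self):
        # json.dumps with tight separators drops all optional whitespace
        return json.dumps({"user": self.user, "text": self.text},
                          separators=(',', ':'))


print(str(CompactTweet({"id": "1"}, "hello")))
# {"user":{"id":"1"},"text":"hello"}
```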
tweetDatabase.commit() appends the buffered tweets to a file, while tweetDatabase.save() just appends a tweet to an in-memory list:
def save(self, tweet):
    self.tweets.append(str(tweet))

def commit(self):
    with open(self.path, mode='a', encoding='utf-8') as f:
        # trailing newline so consecutive batches don't run together on one line
        f.write('\n'.join(self.tweets) + '\n')
    self.tweets = []
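As a sanity check on the file side, the append-a-batch behaviour can be exercised in isolation (a tempfile stands in for self.path here):

```python
import os
import tempfile


def commit_batch(path, tweets):
    # same shape as my commit(): append one batch, newline-separated
    with open(path, mode='a', encoding='utf-8') as f:
        f.write('\n'.join(tweets) + '\n')


fd, path = tempfile.mkstemp()
os.close(fd)
commit_batch(path, ['t1', 't2'])
commit_batch(path, ['t3'])
with open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
os.remove(path)
# lines == ['t1', 't2', 't3']
```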
What's the best way to keep the CPU usage low? If I sleep, I will be losing tweets, since that is time the program spends not listening to Twitter's API. Despite that, I tried sleeping for a second after each write to file, but it did nothing to bring the CPU down. For the record, saving to file every 1000 tweets works out to just over once a minute.
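One thing I can try is profiling to see where the time is actually going; a stand-alone cProfile sketch of that (the workload below is a dummy stand-in, not my real handler):

```python
import cProfile
import io
import pstats


def dummy_handler(n):
    # stand-in workload; in practice I'd feed recorded tweets to on_success
    return sum(len(str(i)) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
dummy_handler(50_000)
profiler.disable()

# dump the top entries sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(3)
print('function calls' in report.getvalue())  # True
```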
Many thanks