I have a log file that is full of tweets. Each tweet is on its own line so that I can iterate through the file easily.
An example tweet would be like this:
@ sample This is a sample string $ 1.00 # sample
I want to clean this up by removing the white space between each special character and the alphanumeric character that follows it, i.e. the "@ s", "$ 1", and "# s" sequences.
The result should look like this:
@sample This is a sample string $1.00 #sample
I'm trying to use regular expressions to match these instances because they can be variable, but I am unsure of how to go about doing this.
I've been using re.sub() and re.search() to find the instances, but am struggling to figure out how to only remove the white space while leaving the string intact.
Here is the code I have so far:
#!/usr/bin/python
import csv
import re
import sys
import pdb
import urllib

f = open('output.csv', 'w')
with open('retweet.csv', 'rb') as inputfile:
    read = csv.reader(inputfile, delimiter=',')
    for row in read:
        a = row[0]
        # Look for: non-word character, whitespace, word character
        matchObj = re.search(r"\W\s\w", a)
        if matchObj:  # re.search() returns None when nothing matches
            print matchObj.group()
f.close()
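The search above only reports the first match on each line; it does not change the string. A substitution with a capture group might be closer to what I need. Here is a minimal sketch, assuming the only special characters that matter are @, $ and # (the name clean_tweet and the pattern are illustrative, not part of my script):

```python
import re

# Assumption: only '@', '$' and '#' count as "special characters" here.
# The lookahead (?=\w) requires an alphanumeric character after the
# whitespace without consuming it, so only the whitespace is dropped.
SPECIAL_GAP = re.compile(r"([@$#])\s+(?=\w)")

def clean_tweet(text):
    """Remove whitespace between a special character and the word after it."""
    # \1 puts the captured special character back; the whitespace is discarded.
    return SPECIAL_GAP.sub(r"\1", text)

print(clean_tweet("@ sample This is a sample string $ 1.00 # sample"))
# prints: @sample This is a sample string $1.00 #sample
```

Listing the characters explicitly seems safer than \W, since \W\s\w would also match ordinary punctuation followed by a space and a word, such as ", a".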
Thanks for any help!