If I understand your task correctly, you want to replace all Unicode control characters with spaces, except \t, \n and \r.
Here's how to do this more efficiently with regular expressions instead of loops.
import re
# make a string of all unicode control characters
# EXCEPT \t - chr(9), \n - chr(10) and \r - chr(13)
# (Python 2 only: relies on unichr() and on range() returning lists)
control_chars = ''.join(map(unichr, range(0,9) + \
                                    range(11,13) + \
                                    range(14,32) + \
                                    range(127,160)))
# build your regular expression
cc_regex = re.compile('[%s]' % re.escape(control_chars))
def cleanup(s):
    # substitute all control characters in the regex
    # with spaces and return the new string
    return cc_regex.sub(' ', s)
You can control which characters to include or exclude by manipulating the ranges that make up the control_chars variable. Refer to the List of Unicode characters.
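For readers on Python 3, here is a minimal sketch of the same idea. It assumes Python 3 (the code above is Python 2), and the names KEEP and cc_from_category are my own additions for illustration. The second string is derived from the Unicode "Cc" (control) category via unicodedata, as a cross-check that the hand-written ranges cover the same characters.

```python
import re
import unicodedata

# Control characters we want to leave untouched (assumed from the task above).
KEEP = {"\t", "\n", "\r"}

# Python 3 port of the ranges: chr() replaces unichr(), and the ranges are
# chained in a generator because range objects cannot be added with "+".
control_chars = "".join(
    chr(c)
    for r in (range(0, 9), range(11, 13), range(14, 32), range(127, 160))
    for c in r
)

# Cross-check: every code point in category "Cc", minus the ones in KEEP.
cc_from_category = "".join(
    chr(c)
    for c in range(0x110000)
    if unicodedata.category(chr(c)) == "Cc" and chr(c) not in KEEP
)

# Same regular expression as before: one character class with all of them.
cc_regex = re.compile("[%s]" % re.escape(control_chars))


def cleanup(s):
    # Replace every matched control character with a single space.
    return cc_regex.sub(" ", s)
```

If the two strings ever differ for your Python version, the category-based one is probably the safer source for the character class.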
EDIT: Timing results.
Just out of curiosity, I ran some timing tests to see which of the three current methods is fastest.
I made three functions: cleanup_op(s), a copy of the OP's code; cleanup_loop(s), which is Cristian Ciupitu's answer; and cleanup_regex(s), which is my code.
Here's what I ran:
from timeit import default_timer as timer
sample = u"this is a string with some characters and \n new lines and \t tabs and \v and other stuff"*1000
start = timer(); cleanup_op(sample); end = timer(); print end - start
start = timer(); cleanup_loop(sample); end = timer(); print end - start
start = timer(); cleanup_regex(sample); end = timer(); print end - start
The results:
cleanup_op finished in about 1.1 seconds
cleanup_loop finished in about 0.02 seconds
cleanup_regex finished in about 0.004 seconds
So either answer is a significant improvement over the original code. I think @CristianCiupitu's answer is more elegant and Pythonic, while the regex is still faster.
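A single timer() measurement like the one above can be noisy, so here is a rough sketch of how the comparison might be repeated with the timeit module. It assumes Python 3. The helper best_time and the function cleanup_join are hypothetical names of mine; cleanup_join is only a stand-in for a loop-style approach, not the code from the other answers, so do not expect it to reproduce the numbers above.

```python
import re
import timeit

# Same character ranges as in the answer, written for Python 3.
_CONTROL = "".join(
    chr(c)
    for r in (range(0, 9), range(11, 13), range(14, 32), range(127, 160))
    for c in r
)
_CC_REGEX = re.compile("[%s]" % re.escape(_CONTROL))
_CONTROL_SET = frozenset(_CONTROL)


def cleanup_regex(s):
    # Regex approach: one substitution pass over the whole string.
    return _CC_REGEX.sub(" ", s)


def cleanup_join(s):
    # Hypothetical loop-style stand-in: test each character, join once.
    return "".join(" " if ch in _CONTROL_SET else ch for ch in s)


def best_time(func, arg, repeat=5, number=10):
    # Run func(arg) `number` times per trial, for `repeat` trials, and
    # return the best average time per call in seconds.
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


sample = ("this is a string with some characters and \n new lines "
          "and \t tabs and \v and other stuff") * 1000

if __name__ == "__main__":
    for func in (cleanup_regex, cleanup_join):
        print(func.__name__, best_time(func, sample))
```

Taking the minimum over several repeats is the usual way to reduce interference from other processes; absolute numbers will still vary by machine and Python version.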
elif char not in good: to else:. If you want someone to maybe find a better way then add example string, unicodedata.category and explain more what you are doing.
some_string += some_other_string in a loop will be slow. It has quadratic complexity (although the interpreter will try to optimize it); you should refactor it to use a list with .append, then ''.join at the end.
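The refactor suggested in that last comment could look roughly like this. It is a sketch assuming Python 3; the OP's original code is not shown here, so cleanup_concat and cleanup_append are hypothetical functions that only illustrate the two ways of building the result.

```python
# Control characters to replace: 0-31 and 127-159, except \t, \n and \r.
CONTROL = frozenset(
    chr(c) for c in list(range(32)) + list(range(127, 160))
    if chr(c) not in "\t\n\r"
)


def cleanup_concat(s):
    # Builds the result with +=; each step may copy the whole string so far,
    # which is what makes this pattern potentially quadratic.
    out = ""
    for ch in s:
        out += " " if ch in CONTROL else ch
    return out


def cleanup_append(s):
    # Collects the pieces in a list and joins them once at the end.
    parts = []
    for ch in s:
        parts.append(" " if ch in CONTROL else ch)
    return "".join(parts)
```

Both functions return the same string; only the way the result is accumulated differs.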