@Random despite the already very useful answer, I would like to give some more insight into the issue you faced, which is very common, mostly because of what people expect the strip function to do.
Logically, one may think that it should remove every matching character in a string, but it only does so at the two ends and not 'within' the string. But why?
You should first understand how strings are really stored in Python: they behave like 'arrays' (a kind of list) of characters. For instance:
string = " Hello world "
can be thought of as:
string = [' ', 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ']
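You can check this yourself: list() splits a string into its individual characters, and indexing works from both ends just as it does on a list. This is only a small sketch, and the variable name text is mine:

```python
text = " Hello world "

# A string is a sequence: list() exposes the individual characters.
chars = list(text)
print(chars)

# Indexing works from both ends, just like with a list.
first, last = text[0], text[-1]
print(repr(first), repr(last))
```

Keep in mind that, unlike a list, a string is immutable, so the comparison only goes so far.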
So the strip function checks characters starting from the left end and from the right end of the string, removes those that match, and stops on each side as soon as it encounters a different character.
Hence it does not loop over the whole string (array); it only works inward from index 0 and from index -1!
For more on this, I suggest having a look at array-of-strings-in-c. Although that is not Python, the reference implementation of Python (CPython) is written in C, and its strings are likewise stored internally as contiguous arrays of characters.
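The behaviour described above can be sketched in plain Python. This is not the real implementation, just an illustration of the "walk in from both ends" idea; the function name manual_strip and its default character set are my own assumptions (the real str.strip() also handles other Unicode whitespace):

```python
def manual_strip(text, chars=" \t\n\r\x0b\x0c"):
    """Rough re-implementation of str.strip(), for illustration only."""
    start = 0
    end = len(text)
    # Walk in from the left until a character outside `chars` is found.
    while start < end and text[start] in chars:
        start += 1
    # Walk in from the right in the same way.
    while end > start and text[end - 1] in chars:
        end -= 1
    # Everything between the two stopping points is left untouched.
    return text[start:end]


sample = "  Hello   world  "
print(repr(manual_strip(sample)))
print(manual_strip(sample) == sample.strip())
```

Notice that the three spaces between the two words survive, because neither loop ever reaches them.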
Some Solutions (Not yet given)
Use string.whitespace
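import string

def string_cleaner(text):
    cleaned_string = []
    # Splitting on a single space leaves '' entries for repeated spaces
    string_separated = text.split(' ')
    for word in string_separated:
        # Skip empty entries and entries made only of whitespace (e.g. '\n\n')
        if word and not all(ch in string.whitespace for ch in word):
            cleaned_string.append(word)
    return ' '.join(cleaned_string)

# test_string: the string to clean (defined elsewhere)
result = string_cleaner(test_string)
print(result)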
--> Short form using list comprehension
print(' '.join([s for s in test_string.split(' ') if s and not all(ch in string.whitespace for ch in s)]))
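One detail worth knowing when testing against string.whitespace: `in` between two strings is a substring test, not a per-character test, so a run such as two newlines is not "in" string.whitespace. The helper is_all_whitespace below is a hypothetical name of mine, shown only to illustrate a per-character check:

```python
import string

# `in` between two strings checks for a substring.
single_newline = "\n" in string.whitespace      # one whitespace character
double_newline = "\n\n" in string.whitespace    # a run of two newlines
print(single_newline, double_newline)

def is_all_whitespace(word):
    # Hypothetical helper: non-empty and every character is whitespace.
    return bool(word) and all(ch in string.whitespace for ch in word)

print(is_all_whitespace("\n\n"))
```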
Use the isspace method (NO IMPORT NEEDED, pure built-in)
def string_cleaner(text):
    cleaned_string = []
    # Splitting on a single space leaves '' entries for repeated spaces
    string_separated = text.split(' ')
    for word in string_separated:
        # Skip empty entries and entries made only of whitespace
        if word and not word.isspace():
            cleaned_string.append(word)
    return ' '.join(cleaned_string)

result = string_cleaner(test_string)
print(result)
--> Short form using list comprehension
print(' '.join([s for s in test_string.split(' ') if s and not s.isspace()]))
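Since test_string comes from the question and is not shown here, this is a small self-contained run of the same idea on a made-up input. The name sample_text and its contents are assumptions of mine, not the original data:

```python
# Made-up input: repeated spaces plus a whitespace-only chunk ("\n\n").
sample_text = "  Hello    \n\n   world  "

# split(' ') produces '' for every extra space and keeps "\n\n" as an item.
pieces = sample_text.split(' ')

# Keep only items that are non-empty and not made purely of whitespace.
words = [s for s in pieces if s and not s.isspace()]
cleaned = ' '.join(words)
print(cleaned)
```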
Re-visit re.sub function
Here is a rough reproduction of what the re.sub call does; note that I added many "useless" variables in order to make the code more explicit:
def string_cleaner(text):
    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        if word:  # Not an empty entry
            # NOTE: we do this because there are multiple \n\n in some strings
            remove_whitespace_from_side = word.strip().replace('\n', ' ')
            separate_each_string = remove_whitespace_from_side.split()
            if separate_each_string:  # NOTE: if empty, it was useless white space within the string
                rejoined_sub_string = ' '.join(separate_each_string)
                ready_string = rejoined_sub_string.replace(' ', '\n')  # Add the \n back
                cleaned_string.append(ready_string)
    ready_to_go = ' '.join(cleaned_string)
    return ready_to_go

result = string_cleaner(test_string)
print(result)
Documentation
"Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl."
It works somewhat like the C function scanf, in that the input is scanned against a pattern; more information here.
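For comparison, this is how a re.sub call for this kind of clean-up might look. The exact pattern used in the other answer is not reproduced here, so the `\s+` pattern and the made-up sample_text are my assumptions:

```python
import re

# Made-up input with repeated spaces and newlines.
sample_text = "  Hello    \n\n   world  "

# \s+ matches any run of whitespace; each run is replaced by a single space.
# strip() then removes the one space left at each end.
collapsed = re.sub(r"\s+", " ", sample_text).strip()
print(collapsed)
```

Unlike strip alone, the pattern is applied along the whole string, which is why the whitespace 'within' it is collapsed as well.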