How do I remove empty spaces and lines within a string?

Question

This is what I have

S = """Missing Since 06/01/1976

Missing From 
                                Napa,                               California                          
Classification Endangered Missing
Sex Female
Race 
                                    White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."""

Which I want to get to

S = """Missing Since 06/01/1976
Missing From Napa,California                            
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."""

I tried using S.strip(), but that only removed the spaces at the beginning and the end.

I was wondering if there was any implementation (I couldn't find any) that would work.

I also tried using S.replace(" ","") for the bigger spaces but that also got me nowhere.

Please repeat "on topic" and "how to ask" from the intro tour. "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a specific question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation.
– Prune, Commented Dec 30, 2020 at 20:01
Please see How to Ask a Homework Question. Simply dumping your assignment here is not acceptable.
– Prune, Commented Dec 30, 2020 at 20:01
What does "based on input" mean? How are you getting the string?
– user5386938, Commented Dec 30, 2020 at 20:05
@Random, I added some more explanation of the topic, also for future reference.
– Federico Baù, Commented Dec 31, 2020 at 11:13

iqmaker · Accepted Answer · 2020-12-30 20:14:30Z

1

This removes repeated spaces (but not empty lines):

print(' '.join([s for s in S.split(' ') if s.strip()]))
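To see what this one-liner does and does not do, here is a small sketch on a shortened sample. The name `sample` and its contents are made up for illustration. Splitting on a single space yields an empty string for every extra space; those tokens are dropped, while newlines inside a token stay where they are.

```python
# Shortened, hypothetical sample in the same shape as S from the question.
sample = "Sex  Female\n\nRace \n      White"

tokens = sample.split(' ')
# Each extra space produces an empty string, and a lone "\n" token is
# whitespace-only, so s.strip() is falsy for both kinds of token.
kept = [s for s in tokens if s.strip()]

result = ' '.join(kept)
print(repr(result))
```

The value after "Race" is pulled up onto the same line, but the empty line between "Female" and "Race" survives, because those newlines are part of a token that also contains text.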

answered Dec 30, 2020 at 20:14

Terence Barrett · Accepted Answer · 2020-12-30 21:21:45Z

1

Here is a way to produce the requested string literal output using only the string literal input, as (was originally) requested:

from re import sub

print(sub('Race\n', 'Race ',
          sub('Missing From\n', 'Missing From ', '\n'.join(
              [sub(r' \s', '', line) for line in [line.strip() for line in """
"Missing Since 06/01/1976

Missing From
                                Napa,                               California
Classification Endangered Missing
Sex Female
Race
                                    White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."
""".split('\n') if line.strip()]]))))

Output:

"Missing Since 06/01/1976
Missing From Napa, California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."

edited Dec 30, 2020 at 21:21

answered Dec 30, 2020 at 21:14

2 Comments

Terence Barrett Over a year ago

It's not the way I would implement in my own code, but I tackled it as a puzzle to solve as requested :-)

Random Over a year ago

Thanks, I was stuck and this helped!

DRPK · Accepted Answer · 2020-12-31 14:20:22Z

1

Try this:

import re


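def normalize_text(get_text):
    saved_new_lines = []
    counter = 0  # number of output lines collected so far
    for each_line in get_text.split("\n"):
        if each_line != "":  # skip empty lines
            # Trim the ends and collapse inner runs of whitespace to one space.
            normalize_each_line = re.sub(r'\s+', ' ', each_line.strip())
            if each_line.startswith(" ") and counter:
                # An indented line continues the previous output line.
                saved_new_lines[counter-1] += " " + normalize_each_line
            else:
                saved_new_lines.append(normalize_each_line)
                counter += 1
    return "\n".join(saved_new_lines)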


print(normalize_text(S))

Output:

Missing Since 06/01/1976
Missing From Napa, California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes.

@FedericoBaù gave me the hint, so I updated my code. This version does not need a separate empty-line check, so it should be faster than the previous one.

Updated:

import re


def normalize_text(get_text):
    saved_new_lines = []
    counter = 0  # number of output lines collected so far
    # Squeeze runs of newlines first, so no empty lines reach the loop.
    for each_line in re.sub(r'\n+', '\n', get_text.strip()).splitlines():
        normalize_each_line = re.sub(r'\s+', ' ', each_line.strip())
        if each_line.startswith(" ") and counter:
            # An indented line continues the previous output line.
            saved_new_lines[counter-1] += " {}".format(normalize_each_line)
        else:
            saved_new_lines.append(normalize_each_line)
            counter += 1
    return "\n".join(saved_new_lines)


print(normalize_text(S))

edited Dec 31, 2020 at 14:20

answered Dec 30, 2020 at 20:24

2 Comments

Random Over a year ago

Thank you so much! I finally see it now

DRPK Over a year ago

@Random I'd appreciate it if you accepted and upvoted my answer :)

stillKonfuzed · Accepted Answer · 2020-12-30 20:16:04Z

0

I don't use Python much, but as far as I know from PHP, those gaps appear because of \n and \r. To fix it, you can simply do string.replace("<double-space-here>","") and then repeat the process on the result, this time replacing "\n" and then "\r". It would be better if you could find a built-in method to filter HTML characters in Python.
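A minimal Python sketch of this replace-only idea follows. One pass of replace over double spaces can still leave doubles behind, so the sketch repeats it until none remain. It replaces with a single space rather than an empty string, an assumption made here so that words are not glued together, and the function name `squeeze_with_replace` is hypothetical.

```python
def squeeze_with_replace(text):
    """Collapse whitespace using only str.replace (illustrative sketch)."""
    # Normalise "\r\n" and bare "\r" line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Repeat until no double space is left.
    while "  " in text:
        text = text.replace("  ", " ")
    # Drop the single space that may sit right next to a line break.
    text = text.replace(" \n", "\n").replace("\n ", "\n")
    # Collapse empty lines the same way.
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text.strip()


print(squeeze_with_replace("Sex   Female\r\n\r\n   Age 19  "))
```

This squeezes spaces and empty lines, but it does not join a value such as "White" onto its label line "Race".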

answered Dec 30, 2020 at 20:16

Federico Baù · Accepted Answer · 2020-12-31 13:09:33Z

0

@Random, although the existing answers are already very useful, I would like to give some more insight into the issue you ran into. It is a very common one, mostly because of what people expect the strip function to do.

Logically, one might think that it removes all whitespace in a string, but it only does so at the two ends and not within the string. Why?

You should first understand how strings are stored in Python: they behave like arrays (a kind of list) of characters. For instance:

string = "  Hello world "

Conceptually, this is:

string = [" ", " ", "H", 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ']

So the strip function checks characters starting from the left end and from the right end of the string, and stops on each side as soon as it encounters a non-whitespace character.

Hence it does not loop over the whole string (array), but only inward from index 0 and from index -1!

For more on this, I suggest having a look at array-of-strings-in-c. It is not Python, but CPython is written in C and stores strings as arrays internally in a similar way.
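A quick demonstration of the strip behaviour described above, using the same example string. strip(), lstrip() and rstrip() only trim from the ends, so whitespace between the words is never touched.

```python
text = "  Hello world "

print(repr(text.strip()))   # both ends trimmed
print(repr(text.lstrip()))  # only the left end trimmed
print(repr(text.rstrip()))  # only the right end trimmed

# Whitespace in the middle survives strip(), however much there is.
inner = "Hello      world".strip()
print(repr(inner))

# split() with no argument splits on any run of whitespace, so joining the
# pieces back with one space squeezes the inside of the string as well.
squeezed = " ".join("  Hello      world ".split())
print(repr(squeezed))
```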

Some Solutions (Not yet given)

Use string.whitespace

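import string

def string_cleaner(text):
    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        # Skip the empty strings produced by consecutive spaces, and any
        # token found in string.whitespace, such as a lone "\n".
        if word and word not in string.whitespace:
            cleaned_string.append(word)

    ready_baby = ' '.join(cleaned_string)
    return ready_baby


# test_string is the input text, e.g. S from the question
result = string_cleaner(test_string)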

--> Short form using list comprehension

print(' '.join([s for s in test_string.split(' ') if s and s not in string.whitespace]))

Use the isspace function (no import needed, pure built-in)

def string_cleaner(text):
    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        # Skip the empty strings produced by consecutive spaces, and any
        # token that consists only of whitespace characters.
        if word and not word.isspace():
            cleaned_string.append(word)

    ready_baby = ' '.join(cleaned_string)
    return ready_baby


result = string_cleaner(test_string)

print(result)

--> Short form using list comprehension

print(' '.join([s for s in test_string.split(' ') if s and not s.isspace()]))
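The string.whitespace test and the isspace() test are not quite equivalent. As a small sketch (the token list is made up for illustration), `in string.whitespace` is a substring test against the whitespace characters in a fixed order, while isspace() asks whether a non-empty string consists only of whitespace.

```python
import string

# Hypothetical tokens of the kind that text.split(' ') can produce.
tokens = ["\n", "\n\n", "\t\n", "word", ""]

results = {}
for tok in tokens:
    in_ws = tok in string.whitespace  # substring test against ' \t\n\r\x0b\x0c'
    is_sp = tok.isspace()             # True only for non-empty, all-whitespace strings
    results[tok] = (in_ws, is_sp)
    print(repr(tok), in_ws, is_sp)
```

A token such as "\n\n" is kept by the string.whitespace version but dropped by the isspace() version. The empty string counts as "in" every string, which is why both versions also test that the token is non-empty.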

Re-visit re.sub function

Here is a hand-written reproduction of what re.sub does in the answers above. Note that I added many "useless" variables in order to make the code more explicit:

def string_cleaner(text):

    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        if word:  # skip the empty strings produced by consecutive spaces
            # NOTE: some tokens contain several newlines in a row ("\n\n"),
            # so turn them into spaces and let split() collapse them.
            # Caveat: strip() also removes a newline at the very start or end of a token.
            remove_whitespace_from_side = word.strip().replace('\n', ' ')
            separate_each_string = remove_whitespace_from_side.split()
            if separate_each_string:  # NOTE: empty means the token was only whitespace
                rejoined_sub_string = ' '.join(separate_each_string)
                ready_string = rejoined_sub_string.replace(' ', '\n')  # add a single \n back
                cleaned_string.append(ready_string)
    ready_to_go = ' '.join(cleaned_string)
    return ready_to_go

result = string_cleaner(test_string)

print(result)

Documentation

Python re.sub function:

"Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl."

Regular expressions can do work similar to the C function scanf; the Python re documentation has a section on simulating scanf().
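As a hedged sketch of how re.sub could be applied to the whitespace problem directly, the function below squeezes spaces and drops empty lines in three substitutions. The function name `collapse_whitespace` and the sample text are illustrative and do not come from the answers.

```python
import re


def collapse_whitespace(text):
    """Squeeze spaces/tabs and drop blank lines (illustrative sketch)."""
    # Runs of spaces or tabs become one space; newlines are left alone here.
    text = re.sub(r'[ \t]+', ' ', text)
    # Two newlines with only whitespace between them form a blank line;
    # keep a single newline in their place.
    text = re.sub(r'\n\s*\n', '\n', text)
    # Trim the single space that may remain at the start or end of a line.
    text = re.sub(r' ?\n ?', '\n', text)
    return text.strip()


sample = "Sex    Female\n\n   \nAge 19   years  old  \n"
print(collapse_whitespace(sample))
```

Each call replaces every non-overlapping match from left to right, as the quoted documentation describes. Joining an indented value onto the preceding label line would still need extra logic.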

edited Dec 31, 2020 at 13:09

answered Dec 31, 2020 at 11:11

5 Comments

greybeard Over a year ago

(SO isn't CodeReview@SE.)

Federico Baù Over a year ago

@greybeard if you assume this because I included other users' answers and changed them, that isn't a review but a needed clarification (which they hadn't provided), especially given the nature of the issue, which is the mistaken expectation of how the strip function works. I realize now that it looked like my intention was to review others' code, but it wasn't. I edited my answer to avoid this ambiguity.

DRPK Over a year ago

@FedericoBaù cheers; I saw your comment. Thanks for your Pythonic version; I updated my answer and changed it a little bit.

Federico Baù Over a year ago

@DRPK you're welcome. I saw some code and couldn't resist, and I'm glad you took it as a way to improve it and not as mere criticism. The algorithm itself is really good! Upvoted your answer, and happy new year ;)

DRPK Over a year ago

@FedericoBaù happy new year bro/sis :) I suggest doing some code comparison with the timeit module for each function (including my code) and reporting it in your answer; I am curious about their speed and memory usage. Your answer is explained well enough; this answer should be accepted.
