0

This is what I have

S = """Missing Since 06/01/1976

Missing From 
                                Napa,                               California                          
Classification Endangered Missing
Sex Female
Race 
                                    White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."""

Which I want to get to

S = """Missing Since 06/01/1976
Missing From Napa,California                            
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."""

I tried using S.strip(), but that only removed the spaces at the beginning and the end.

I was wondering if there was any implementation (I couldn't find any) that would work.

I also tried using S.replace(" ","") for the bigger spaces but that also got me nowhere.

5
  • Is it in a file? Commented Dec 30, 2020 at 20:00
  • 2
    Please repeat on topic and how to ask from the intro tour. "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a specific question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation. Commented Dec 30, 2020 at 20:01
  • Please see How to Ask a Homework Question. Simply dumping your assignment here is not acceptable. Commented Dec 30, 2020 at 20:01
  • What does "based on input" mean? How are you getting the string? Commented Dec 30, 2020 at 20:05
  • @Random, I added some more explanation of the topic, Also as future reference. Commented Dec 31, 2020 at 11:13

5 Answers 5

1

This removes repeated spaces (but not blank lines):

print( ' '.join([s for s in S.split(' ') if s.strip()]) )
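
To see why the filter in that one-liner matters, here is a small sketch on a made-up sample string (the variable names are only illustrative). Splitting on a single space leaves an empty string for every extra space, and the `if` clause drops those pieces before joining:

```python
# Made-up sample with four spaces between the first two words.
sample = "Napa,    California\nSex Female"

# split(' ') yields '' for each extra consecutive space.
pieces = sample.split(" ")

# Keep only pieces that still contain something after stripping.
cleaned = " ".join(p for p in pieces if p.strip())

print(pieces)
print(repr(cleaned))
```

Note that the newline survives inside the piece 'California\nSex', which is why this approach collapses spaces but leaves line breaks alone.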

1

Here is a way to produce the requested string literal output using only the string literal input, as (was originally) requested:

from re import sub

print(sub('Race\n', 'Race ',
          sub('Missing From\n', 'Missing From ', '\n'.join(
              [sub(r'\s{2,}', ' ', line) for line in [line.strip() for line in """
"Missing Since 06/01/1976

Missing From
                                Napa,                               California
Classification Endangered Missing
Sex Female
Race
                                    White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."
""".split('\n') if line.strip()]]))))

Output:

"Missing Since 06/01/1976
Missing From Napa, California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes."

2 Comments

It's not the way I would implement in my own code, but I tackled it as a puzzle to solve as requested :-)
Thanks, I was stuck and this helped!
1

Try this:

import re


def normalize_text(get_text):
    saved_new_lines = []
    counter = 0
    for each_line in get_text.split("\n"):
        if not each_line == "":
            normalize_each_line = re.sub(r'\s+', ' ', each_line.strip())
            if each_line.startswith(" "):
                saved_new_lines[counter-1] += " " + normalize_each_line
            else:
                saved_new_lines.append(normalize_each_line)
                counter += 1
    return "\n".join(saved_new_lines)


print(normalize_text(S))

Output:

Missing Since 06/01/1976
Missing From Napa, California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes.

@FedericoBaù gave me the hint, so I updated my code (this version does not have an empty-line check, so it should be faster than the previous one).

Updated:

import re


def normalize_text(get_text):
    saved_new_lines = []
    counter = 0
    for each_line in re.sub(r'\n+', '\n', get_text.strip()).splitlines():
        normalize_each_line = re.sub(r'\s+', ' ', each_line.strip())
        if each_line.startswith(" "):
            saved_new_lines[counter-1] += " {}".format(normalize_each_line)
        else:
            saved_new_lines.append(normalize_each_line)
            counter += 1
    return "\n".join(saved_new_lines)


print(normalize_text(S))

2 Comments

Thank you so much! I finally see it now
@Random, I would appreciate it if you accepted and upvoted my answer :)
0

I am not a Python user, but as far as I know, in PHP such gaps appear because of \n and \r. To fix it, you can simply call string.replace("<double-space-here>", ""), then take the result and repeat the process, this time replacing "\n" and then "\r". It would be better if you could find a built-in method in Python to filter HTML characters.
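
A minimal sketch of one reading of this suggestion in Python. It departs from the text in two ways, both of which are my assumptions: pairs of spaces are replaced with a single space rather than an empty string, so that words do not fuse together, and \r is normalised to \n instead of being deleted, because the question wants the line breaks kept. `collapse_spaces` is a hypothetical helper name.

```python
def collapse_spaces(text):
    """Collapse runs of spaces using only str.replace (hypothetical helper)."""
    # Normalise Windows (\r\n) and bare \r line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Repeat the replacement until no double space is left.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


print(repr(collapse_spaces("Napa,      California\r\nSex Female")))
```

This only touches spaces between words; spaces at the start of a line and blank lines would still need separate handling.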

Comments

0

@Random, although there are already very useful answers, I would like to give some more insight into the issue you faced. It is a very common one, mostly because of what people expect the strip function to do.

Logically, one might think that it removes all whitespace in a string, but it only does so at the two ends and not 'within' the string. Why?

You should first understand how strings behave in Python: they are like 'arrays' (a kind of list) of characters. For instance:

string = "  Hello world "

can be thought of as:

string = [" ", " ", "H", 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ']

So the strip function checks characters starting from the left end and from the right end of the string, removes them while they are whitespace, and stops as soon as it meets a non-whitespace character.

Hence it does not loop over the whole string (array); it only works inward from index 0 and from index -1!

For more on this, I suggest having a look at array-of-strings-in-c. It is not about Python, but CPython is written in C and also stores string data in a contiguous buffer.
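
A short sketch of the behaviour described above, using a variation of the example string with extra spaces in the middle (the variable names are only illustrative):

```python
s = "  Hello   world "

# strip() only trims the two ends; the inner run of spaces is untouched.
stripped = s.strip()

# split() with no argument splits on any run of whitespace,
# so joining the pieces back collapses the inner spaces as well.
collapsed = " ".join(s.split())

print(repr(stripped))
print(repr(collapsed))
```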

Some Solutions (Not yet given)

  1. Use string.whitespace

         import string

         def string_cleaner(text):
             cleaned_string = []
             # Splitting on a single space leaves '' between consecutive spaces
             for word in text.split(' '):
                 # Skip empty pieces and pieces found in string.whitespace (e.g. '\n')
                 if word and word not in string.whitespace:
                     cleaned_string.append(word)
             return ' '.join(cleaned_string)


         # test_string is assumed to hold the text from the question
         result = string_cleaner(test_string)

--> Short form using list comprehension

print(' '.join([s for s in test_string.split(' ') if s and s not in string.whitespace]))

  2. Use the isspace method (no import needed, pure built-in)

     def string_cleaner(text):
         cleaned_string = []
         for word in text.split(' '):
             # Keep only non-empty pieces that are not made up entirely of whitespace
             if word and not word.isspace():
                 cleaned_string.append(word)
         return ' '.join(cleaned_string)


     result = string_cleaner(test_string)

     print(result)

--> Short form using list comprehension

print(' '.join([string for string in test_string.split(' ') if string and not string.isspace()]))

Re-visit re.sub function

Here is a rough reproduction, without re, of what the re.sub approach achieves. Note that I added many "useless" variables in order to make the code more explicit:

def string_cleaner(text):

    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        if word:  # Skip the empty strings produced by consecutive spaces
            remove_whitespace_from_side = word.strip().replace('\n', ' ')  # NOTE: we do this because some pieces contain multiple \n\n
            separate_each_string = remove_whitespace_from_side.split()
            if separate_each_string:  # NOTE: if empty, the piece was only useless whitespace
                rejoined_sub_string = ' '.join(separate_each_string)
                ready_string = rejoined_sub_string.replace(' ', '\n')  # Put back a single \n
                cleaned_string.append(ready_string)
    ready_to_go = ' '.join(cleaned_string)
    return ready_to_go

result = string_cleaner(test_string)

print(result)

Documentation

"Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl."

What it does is loosely similar to the C function scanf; more information here.
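
As a small illustration of the quoted documentation, here is a sketch with made-up input strings. The pattern `\s+` matches each run of whitespace, and every non-overlapping match is replaced from left to right; the optional `count` argument limits how many matches are replaced:

```python
import re

line = "Napa,      California"

# Every run of whitespace is replaced by a single space.
collapsed = re.sub(r"\s+", " ", line)

# With count=1 only the leftmost match is replaced.
first_only = re.sub(r"\s+", " ", "a  b   c", count=1)

print(repr(collapsed))
print(repr(first_only))
```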

5 Comments

(SO isn't CodeReview@SE.)
@greybeard if you assume this because I took other users' answers and changed them, it isn't a review but a needed clarification (which they hadn't provided), especially given the nature of the issue, which is a mistaken expectation of how the strip function works. I realize now that it looked like my intention was to review others' code, but it wasn't. I edited my answer to avoid this ambiguity.
@FedericoBaù cheers; I saw your comment. Thanks for your Pythonic version; I updated my answer and changed it a little bit.
@DRPK you're welcome. I saw some code and couldn't resist, and I'm glad you took it as a way to improve it and not as mere criticism. The algorithm itself is really good! Thumbed up your answer, and happy new year ;)
@FedericoBaù happy new year bro/sis :) I suggest doing some code comparison with the timeit module for each function (including my code) and reporting it in your answer; I am curious about their speed and memory usage ... Your answer is explained well enough. This answer should be accepted.
