Python: Removing whitespace from multiple lines of a string

Question

So I need the output of my program to look like:

ababa
ab ba 
 xxxxxxxxxxxxxxxxxxx
that is it followed by a lot of spaces .
 no dot at the end
The largest run of consecutive whitespace characters was 47.

But what I am getting is:

ababa

ab ba

xxxxxxxxxxxxxxxxxxx
that is it followed by a lot of spaces .
no dot at the end
The longest run of consecutive whitespace characters was 47.

When looking further into the code I wrote, I found with the print(c) statement that this happens:

['ababa', '', 'ab           ba ', '', '                                      xxxxxxxxxxxxxxxxxxx', 'that is it followed by a lot of spaces                         .', '                                               no dot at the end']

Between some of the lines there is a '' (an empty string), which is probably why my print statement won't work.

How would I remove them? I've tried using different list functions but I keep getting syntax errors.

This is the code I made:

a = '''ababa

    ab           ba 

                                      xxxxxxxxxxxxxxxxxxx
that is it followed by a lot of spaces                         .
                                               no dot at the end'''


c = a.splitlines()
print(c)

#d = c.remove(" ") #this part doesnt work
#print(d)

for row in c:
    print(' '.join(row.split()))

last_char = ""
current_seq_len = 0
max_seq_len = 0

for d in a:
    if d == last_char:
        current_seq_len += 1
        if current_seq_len > max_seq_len:
            max_seq_len = current_seq_len
    else:
        current_seq_len = 1
        last_char = d
    #this part just needs to count the whitespace

print("The longest run of consecutive whitespace characters was",str(max_seq_len)+".")

What kind of logic creates " xxxxxxxx" out of " xxxxxxxx" ?? — Alfe
– Alfe, Commented Sep 20, 2013 at 11:44
Side note: the remove method modifies the list in place and returns None. Hence you should not write d = c.remove('') but simply c.remove(''), after which c has one fewer empty string. To remove all empty strings via remove, do: for _ in range(c.count('')): c.remove(''). (By the way, the empty string is '', i.e. quote-quote, without any space. In your case you were trying to remove a single-space string, ' ' (quote-space-quote), and you probably got some ValueErrors.)
– Bakuriu, Commented Sep 20, 2013 at 11:51
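
The behaviour described in the comment above can be checked in a few lines. This is a minimal sketch that assumes a small hypothetical list named `lines`, not the asker's actual data:

```python
# Hypothetical sample list; not the asker's data.
lines = ['ababa', '', 'ab ba', '', 'xxx']

# list.remove mutates the list in place and returns None.
result = lines.remove('')
print(result)   # None
print(lines)    # ['ababa', 'ab ba', '', 'xxx']

# Remove the remaining empty strings one at a time.
for _ in range(lines.count('')):
    lines.remove('')
print(lines)    # ['ababa', 'ab ba', 'xxx']

# Removing a value that is not in the list raises ValueError.
try:
    lines.remove(' ')
    raised = False
except ValueError:
    raised = True
print(raised)   # True
```

Only the first matching element is removed per call, which is why the loop over `count('')` is needed.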

Veedrac · Accepted Answer · 2013-09-20 11:57:26Z

2

Regex time:

import re

print(re.sub(r"([\n ])\1*", r"\1", a))
#>>> ababa
#>>>  ab ba 
#>>>  xxxxxxxxxxxxxxxxxxx
#>>> that is it followed by a lot of spaces .
#>>>  no dot at the end

re.sub(matcher, replacement, target_string)

Matcher is r"([\n ])\1*" which means:

([\n ]) → match either "\n" or " " and put it in a group (#1)
\1*     → match whatever group #1 matched, 0 or more times

And the replacement is just

\1 → group #1

You can get the longest whitespace sequence with

max(len(match.group()) for match in re.finditer(r"([\n ])\1*", a))

This uses the same matcher, but instead of replacing the matches it collects their lengths and takes the maximum.
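
To see both calls side by side, here is a small sketch on a hypothetical sample string (`text` is an assumption, shorter than the string in the question):

```python
import re

# Hypothetical sample text, shorter than the string in the question.
text = "ab     ba\n\n   xxx"

pattern = r"([\n ])\1*"

# Each run of one repeated character (space or newline) collapses to a single copy.
collapsed = re.sub(pattern, r"\1", text)
print(repr(collapsed))   # 'ab ba\n xxx'

# Length of the longest run of a single repeated whitespace character.
longest = max(len(m.group()) for m in re.finditer(pattern, text))
print(longest)           # 5
```

Because of the backreference, a run only counts repeats of the same character, so a space followed by a newline is treated as two separate runs. If the text might contain no spaces or newlines at all, `max(..., default=0)` (Python 3.4+) avoids a ValueError on the empty sequence.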

edited Sep 20, 2013 at 11:57

answered Sep 20, 2013 at 11:51


Michael Aquilina · Accepted Answer · 2013-09-20 12:03:18Z

2

From what I can tell, your easiest solution would be to use a list comprehension:

c = [item for item in a.splitlines() if item != '']

If you wish to make it slightly more robust by also removing strings that contain only whitespace, such as ' ', you can alter it as follows:

c = [item for item in a.splitlines() if item.strip() != '']

You can then join the list back together as follows:

output = '\n'.join(c)
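
Putting those steps together with the whitespace squeezing the question asks for might look like the sketch below. The helper name `clean_lines` and the sample text are hypothetical, and collapsing runs of spaces with `re.sub` is an assumption about the desired output (it keeps one leading space, as in the asker's expected result), not something this answer prescribes:

```python
import re

def clean_lines(text):
    """Drop blank lines and squeeze each run of spaces down to one space."""
    # Keep only lines that contain something other than whitespace.
    kept = [line for line in text.splitlines() if line.strip() != '']
    # Collapse runs of spaces inside each kept line, then rejoin.
    return '\n'.join(re.sub(r" +", " ", line) for line in kept)

# Hypothetical sample text, shorter than the string in the question.
sample = "ababa\n\nab     ba \n\n    xxx\n   \nend"
print(clean_lines(sample))
```

Unlike `' '.join(row.split())`, this keeps a single leading or trailing space on a line instead of stripping it entirely.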

edited Sep 20, 2013 at 12:03

answered Sep 20, 2013 at 11:48


2 Comments

Matthias Over a year ago

if item.strip() is enough. No need to add != "".

Michael Aquilina Over a year ago

While that's true, I prefer to use the explicit form for readability's sake.

jbaiter · Accepted Answer · 2013-09-20 11:50:16Z

1

This can be easily solved with the built-in filter function:

c = filter(None, a.splitlines())
# or, more explicit
c = filter(lambda x: x != "", a.splitlines())

The first variant keeps every element of the list returned by a.splitlines() that does not evaluate to False, which drops the empty strings. The second variant uses a small anonymous function (lambda) that returns False when an element is the empty string, which is more explicit. Note that in Python 2 filter returns a list, whereas in Python 3 it returns a lazy iterator, so wrap the call in list(...) if you need an actual list.

Another option would be to use a list comprehension that achieves the same thing:

c = [string for string in a.splitlines() if string]
# or, more explicit
c = [string for string in a.splitlines() if string != ""]
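
As a quick check that the two approaches agree, here is a sketch on a hypothetical sample string (`text` is an assumption, not the asker's data), written for Python 3:

```python
# Hypothetical sample text, not the asker's string.
text = "ababa\n\nab ba\n\nxxx"

# filter(None, ...) drops falsy items; list() materialises the result in Python 3.
filtered = list(filter(None, text.splitlines()))

# The equivalent list comprehension.
comprehension = [s for s in text.splitlines() if s]

print(filtered)                    # ['ababa', 'ab ba', 'xxx']
print(filtered == comprehension)   # True
```

Neither form removes whitespace-only lines such as `' '`; testing `s.strip()` instead of `s` would cover those.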

edited Sep 20, 2013 at 11:50

answered Sep 20, 2013 at 11:45


4 Comments

Michael Aquilina Over a year ago

This would work. However if one of the items in the list is an empty string i.e. just white space such as ' ', then it would not be filtered out.

Bakuriu Over a year ago

@MichaelAquilina If a string contains whitespace then it is not an empty string. To check whether a string is empty or whitespace-only, simply use lambda x: x.strip(). Called without arguments, strip() removes all leading and trailing whitespace, resulting in an empty string if the string is whitespace-only.

Michael Aquilina Over a year ago

@Bakuriu this is in fact the approach I suggested in my answer.

jbaiter Over a year ago

But from what I could gather from the OP's question he was only dealing with true empty strings (""), that's why I didn't include strip here.
