Python loop through string in nested for loops

Question

I'm trying to do a very simple text reduction: I want to collapse every run of spaces (except those inside double quotes) into a single space. I also have some semantic action that depends on each character read, which is why I don't want to use a regex. It's a kind of pseudo-FSM model.

So here's the deal:

s = '''that's my     string, "   keep these spaces     "    but reduce these '''

Desired output:

that's my string, "   keep these spaces     " but reduce these

What I would like to do is something like this: (I don't mention the '"' case to keep the example simple)

out = ""
for i in range(len(s)):

  if s[i].isspace():
    out += ' '
    while s[i].isspace():
      i += 1

  else:
    out += s[i]

I don't quite understand how the scopes are created or shared in this case.

Thank you for advice.

The problem is that advancing i inside the while loop has no lasting effect: on the next iteration the for loop rebinds i to the next value from range(len(s)), so the spaces you skipped are not dropped, you just iterate over them again...
– avenet, Commented Jan 10, 2014 at 20:16
aah, sorry, I've missed them, they are both the s string, I'm blind I guess.
– dakov, Commented Jan 10, 2014 at 20:17
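
To make the point about i concrete: a for loop over range() assigns the next value to i at the start of every iteration, so `i += 1` inside the body cannot skip anything. The question's inner while loop would also raise an IndexError on a string that ends in whitespace, because nothing stops i from reaching len(s). A minimal sketch with an explicit while loop, where the index is under manual control, might look like this. The function name `collapse_spaces` is made up for illustration, and quote handling is left out, as in the question.

```python
def collapse_spaces(s):
    """Collapse each run of whitespace into one space (no quote handling)."""
    out = []
    i = 0
    while i < len(s):
        if s[i].isspace():
            out.append(' ')
            # Skip the rest of the run. The bounds check prevents an
            # IndexError when the string ends in whitespace.
            while i < len(s) and s[i].isspace():
                i += 1
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)


print(repr(collapse_spaces("a   b  c ")))  # 'a b c '
```

Because the loop owns the index, advancing it inside the inner loop really does move past the run of spaces.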

Robᵩ · Accepted Answer · 2014-01-10 21:05:54Z

1

I also have some semantic action dependent on each character read ... It's some kind of pseudo FSM model.

You could actually implement an FSM:

s = '''that's my     string, "   keep these spaces     "    but reduce these '''


normal, quoted, eating = 0,1,2
state = eating
result = ''
for ch in s:
  if (state, ch) == (eating, ' '):
    continue
  elif (state,ch) == (eating, '"'):
    result += ch
    state = quoted
  elif state == eating:
    result += ch
    state = normal
  elif (state, ch) == (quoted, '"'):
    result += ch
    state = normal
  elif state == quoted:
    result += ch
  elif (state,ch) == (normal, '"'):
    result += ch
    state = quoted
  elif (state,ch) == (normal, ' '):
    result += ch
    state = eating
  else: # state == normal
    result += ch

print(result)

Or, the data-driven version:

actions = {
    'normal' : {
        ' ' : lambda x: ('eating', ' '),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'eating' : {
        ' ' : lambda x: ('eating', ''),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'quoted' : {
        '"' : lambda x: ('normal', '"'),
        '\\': lambda x: ('escaped', '\\'),
        None: lambda x: ('quoted', x)
    },
    'escaped' : {
        None: lambda x: ('quoted', x)
    }
}

def reduce(s):
    result = ''
    state = 'eating'
    for ch in s:
        state, ch = actions[state].get(ch, actions[state][None])(ch)
        result += ch
    return result

s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(reduce(s))
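
Since the question asks for semantic actions tied to individual characters, one possible extension is to give the loop a callback hook. The following is only a sketch under assumptions: the names `reduce_with_actions`, `transitions`, `defaults` and `on_quote` are hypothetical and not part of the answer, and the table is a flattened `(state, char)` variant of the one above rather than the author's exact structure.

```python
def reduce_with_actions(s, on_quote=None):
    # (state, char) -> (next_state, emitted_text)
    transitions = {
        ('normal', ' '): ('eating', ' '),
        ('normal', '"'): ('quoted', '"'),
        ('eating', ' '): ('eating', ''),
        ('eating', '"'): ('quoted', '"'),
        ('quoted', '"'): ('normal', '"'),
        ('quoted', '\\'): ('escaped', '\\'),
    }
    # Fallback state for any character not listed above; the character
    # itself is emitted unchanged.
    defaults = {'normal': 'normal', 'eating': 'normal',
                'quoted': 'quoted', 'escaped': 'quoted'}

    state = 'eating'
    out = []
    for pos, ch in enumerate(s):
        if (state, ch) in transitions:
            next_state, emitted = transitions[(state, ch)]
        else:
            next_state, emitted = defaults[state], ch
        # Hypothetical semantic action: fires on every unescaped quote.
        if ch == '"' and state != 'escaped' and on_quote is not None:
            on_quote(pos, state, next_state)
        out.append(emitted)
        state = next_state
    return ''.join(out)


seen = []
text = r'a   "x \"  y"   b'
print(reduce_with_actions(text, on_quote=lambda pos, old, new: seen.append(pos)))
print(seen)  # [4, 12]
```

The escaped quote at index 8 does not trigger the callback, because the machine is in the 'escaped' state when it is read.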

edited Jan 10, 2014 at 21:05

answered Jan 10, 2014 at 20:37

Robᵩ


3 Comments

dakov Over a year ago

I've started doing that :)

dakov Over a year ago

Works nice, I only add escape \" sequence check and it should be enough for my purpose.

Robᵩ Over a year ago

Or see the data-driven version for a more explicit state machine, with \" escaping.

Filip Malczak · Accepted Answer · 2014-01-10 20:20:03Z

1

Use shlex to parse your string into quoted and unquoted parts, then use a regex on the unquoted parts to replace each sequence of whitespace with one space.
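
As a rough illustration of the "split into quoted and unquoted parts, then collapse whitespace" idea, here is a sketch that swaps shlex for `re.split`, so it is not the shlex approach itself. The function name `reduce_outside_quotes` is made up, and the sketch assumes balanced double quotes with no escaped quotes inside them.

```python
import re


def reduce_outside_quotes(s):
    # The capturing group makes re.split keep the quoted segments.
    # They land at the odd indexes; even indexes hold the text outside quotes.
    parts = re.split(r'("[^"]*")', s)
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r'\s+', ' ', parts[i])
    return ''.join(parts)


s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(reduce_outside_quotes(s))
```

The single quote in "that's" is not special here, since only double quotes are matched. An unbalanced trailing quote is simply treated as unquoted text.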

answered Jan 10, 2014 at 20:20

Filip Malczak


1 Comment

kalhartt Over a year ago

That's rather ingenious actually, but it's not going to work. It should fail on the single quote in "that's" in his example, and in other similar cases. I wonder if there is an appropriate parser somewhere in the standard library though. EDIT: it looks like shlex might be configurable to do this. I leave it to you to sort this out :)

bereal · Accepted Answer · 2014-01-10 20:51:21Z

1

As already suggested, I'd use the standard shlex module instead, with some adjustments:

import shlex

def reduce_spaces(s):
    lex = shlex.shlex(s)
    lex.quotes = '"'             # ignore single quotes
    lex.whitespace_split = True  # use only spaces to separate tokens
    tokens = iter(lex.get_token, lex.eof)  # exhaust the lexer
    return ' '.join(tokens)

>>> s = '''that's my   string, "   keep these spaces     "   but reduce these '''
>>> reduce_spaces(s)
'that\'s my string, "   keep these spaces     " but reduce these'

edited Jan 10, 2014 at 20:51

answered Jan 10, 2014 at 20:36

bereal


inspectorG4dget · Accepted Answer · 2014-01-10 20:17:58Z

i = iter((i for i,char in enumerate(s) if char=='"'))
zones = list(zip(*[i]*2))  # a list of all the "zones" where spaces should not be manipulated
answer = []
space = False
for i,char in enumerate(s):
    if not any(zone[0] <= i <= zone[1] for zone in zones):
        if char.isspace():
            if not space:
                answer.append(char)
        else:
            answer.append(char)
    else:
        answer.append(char)
    space = char.isspace()

print(''.join(answer))

And the output:

>>> s = '''that's my     string, "   keep these spaces     "    but reduce these '''
>>> i = iter((i for i,char in enumerate(s) if char=='"'))
>>> zones = list(zip(*[i]*2))
>>> answer = []
>>> space = False
>>> for i,char in enumerate(s):
...     if not any(zone[0] <= i <= zone[1] for zone in zones):
...         if char.isspace():
...             if not space:
...                 answer.append(char)
...         else:
...             answer.append(char)
...     else:
...         answer.append(char)
...     space = char.isspace()
... 
>>> print(''.join(answer))
that's my string, "   keep these spaces     " but reduce these
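
The `zip(*[i]*2)` line in this answer may look cryptic. It passes the same iterator to zip twice, so each tuple consumes two consecutive items, which pairs every opening quote index with its closing one. A small sketch of that idiom on a shorter, made-up string (the variable name `quote_positions` is an assumption for readability):

```python
s = 'a "b c" d "e"'

# A generator over the index of every double quote in s.
quote_positions = (idx for idx, ch in enumerate(s) if ch == '"')

# zip() pulls from the same iterator twice per tuple, so consecutive
# indexes are paired as (opening, closing). Same effect as zip(*[it]*2).
zones = list(zip(quote_positions, quote_positions))
print(zones)  # [(2, 6), (10, 12)]
```

If the string contains an odd number of quotes, zip stops early and the unmatched last index is dropped.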

user1969453 · Accepted Answer · 2014-01-10 20:48:48Z

0

It is a bit of a hack, but you could reduce runs of spaces to a single space with a one-liner.

one_space = lambda s: ' '.join([part for part in s.split(' ') if part])

This joins the non-empty parts, that is, the parts that contain non-space characters, separated by a single space. Note that it also drops leading and trailing spaces. The harder part, of course, is separating out the exceptional part in double quotes. In real production code you would also want to be careful about cases like escaped double quotes. But presuming you have only well-behaved input, you can separate those out as well. I presume real input may have more than one double-quoted section.

You can do this by splitting the string on double quotes, applying one_space only to the even-indexed items (the text outside quotes), and putting the odd-indexed items (the quoted text) back unchanged inside their quotes. I believe this is right from working through some examples. Joining with a single space assumes each quoted section is separated from the surrounding text by spaces.

def fix_spaces(s):
  dbl_parts = s.split('"')
  # Even indexes are outside the quotes; odd indexes are quoted and get their quotes back.
  normalize = lambda i: one_space(dbl_parts[i]) if not i%2 else '"' + dbl_parts[i] + '"'
  return ' '.join([normalize(i) for i in range(len(dbl_parts))])

answered Jan 10, 2014 at 20:48

user1969453

1 Comment

dakov Over a year ago

As I said I need also to assign semantic actions to several characters, so I don't think this approach would be transparent enough to do this.

Pulimon · Accepted Answer · 2014-01-11 13:56:02Z

I'm a bit concerned about whether this solution is readable. I modified the string the OP suggested so that it includes multiple double-quote pairs.

s = '''that's my     string,   "   keep these spaces     "" as    well    as these    "    reduce these"   keep these spaces too   "   but not these  '''
s_split = s.split('"')

# The substrings at odd positions of s_split should retain their spaces.
# These elements have, however, lost their double quotes during .split('"'),
# so add the quotes back for the new string. For the substrings at even
# positions, remove the repeated spaces by splitting them again with .split()
# and joining them with a single space. However, this would not preserve
# leading and trailing spaces. To preserve them, add a dummy character
# (in this case '-') at the start and end of the substring before the split,
# and remove the dummy characters after the join.
#
# Finally, join the elements of new_string_list to create the desired string.

new_string_list = ['"' + x + '"' if i%2 == 1
                   else ' '.join(('-' + x + '-').split())[1:-1]                   
                   for i,x in enumerate(s_split)]
new_string = ''.join(new_string_list)
print(new_string)

Output is

>>>that's my string, "   keep these spaces     "" as    well    as these    " reduce these"   keep these spaces too   " but not these
