Replace multiple lines with regex

Question

Given the following text, I want to remove everything in data_augmentation_options{..}

i.e., the input is:

  batch_size: 4
  num_steps: 30
  data_augmentation_options {
    random_horizontal_flip {
      keypoint_flip_permutation: 0
      keypoint_flip_permutation: 2
      keypoint_flip_permutation: 1
      keypoint_flip_permutation: 4
      keypoint_flip_permutation: 3
      keypoint_flip_permutation: 6
      keypoint_flip_permutation: 5
      keypoint_flip_permutation: 8
      keypoint_flip_permutation: 7
      keypoint_flip_permutation: 10
      keypoint_flip_permutation: 9
      keypoint_flip_permutation: 12
      keypoint_flip_permutation: 11
      keypoint_flip_permutation: 14
      keypoint_flip_permutation: 13
      keypoint_flip_permutation: 16
      keypoint_flip_permutation: 15
    }
  }

  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7
      random_coef: 0.25
    }
  }

expected output is:

  batch_size: 4
  num_steps: 30

I tried

import re

s='''
      batch_size: 4
      num_steps: 30
      data_augmentation_options {
        random_horizontal_flip {
          keypoint_flip_permutation: 0
          keypoint_flip_permutation: 2
          keypoint_flip_permutation: 1
          keypoint_flip_permutation: 4
          keypoint_flip_permutation: 3
          keypoint_flip_permutation: 6
          keypoint_flip_permutation: 5
          keypoint_flip_permutation: 8
          keypoint_flip_permutation: 7
          keypoint_flip_permutation: 10
          keypoint_flip_permutation: 9
          keypoint_flip_permutation: 12
          keypoint_flip_permutation: 11
          keypoint_flip_permutation: 14
          keypoint_flip_permutation: 13
          keypoint_flip_permutation: 16
          keypoint_flip_permutation: 15
        }
      }
    
      data_augmentation_options {
        random_crop_image {
          min_aspect_ratio: 0.5
          max_aspect_ratio: 1.7
          random_coef: 0.25
        }
      }
'''
print(re.sub('data_augmentation_options \{*\}','',s,flags=re.S))

It does not seem to work. What is the right way to achieve this?

dawg · Accepted Answer · 2020-12-28 14:46:12Z

1

Rather than deleting what you don't want, you could capture what you do want:

>>> re.findall(r'batch_size: *\d+|num_steps: *\d+',s)
['batch_size: 4', 'num_steps: 30']

Or if you want to capture the leading spaces:

>>> re.findall(r'^[ \t]*(?:batch_size:|num_steps:)[ \t]*\d+',s, flags=re.M)
['      batch_size: 4', '      num_steps: 30']

Then print the result:

>>> print('\n'.join(re.findall(r'^[ \t]*(?:batch_size:|num_steps:)[ \t]*\d+',s, flags=re.M)))
        batch_size: 4
        num_steps: 30

If you want to use re.sub, you can use a "conflicted" character class such as [\s\S]. It matches either a whitespace or a non-whitespace character, i.e. any character including newlines, so [\s\S]* consumes everything from the match to the end of the string:

>>> re.sub(r'data_augmentation_options[\s\S]*','',s)

        batch_size: 4
        num_steps: 30
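
As a small illustration of what [\s\S]* does, the sketch below compares it with .* under re.DOTALL and shows that everything after the first occurrence is dropped. The sample string is shortened, and max_number_of_boxes is a hypothetical trailing field added only for this example.

```python
import re

# Sample text; max_number_of_boxes is a hypothetical trailing field added for illustration.
s = """
  batch_size: 4
  num_steps: 30
  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
    }
  }
  max_number_of_boxes: 100
"""

# [\s\S]* matches any character, newlines included, up to the end of the string.
with_class = re.sub(r'data_augmentation_options[\s\S]*', '', s)

# .* with re.DOTALL behaves the same way, since the flag lets . match newlines.
with_dotall = re.sub(r'data_augmentation_options.*', '', s, flags=re.DOTALL)

print(with_class == with_dotall)            # True
print('max_number_of_boxes' in with_class)  # False: the trailing field is gone too
```

This is fine for the input in the question, where nothing follows the last block, but it would also discard any settings that come after it.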

Perhaps even easier is to just use Python's str.partition with the string that you want to use as a separator:

>>> s.partition('data_augmentation_options')[0]

        batch_size: 4
        num_steps: 30
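
For reference, str.partition returns a three-part tuple (text before the separator, the separator itself, text after it), which is why [0] is used above. A minimal sketch with a shortened sample string:

```python
s = "batch_size: 4\nnum_steps: 30\ndata_augmentation_options {\n}\n"

head, sep, tail = s.partition('data_augmentation_options')
print(repr(head))  # 'batch_size: 4\nnum_steps: 30\n'
print(repr(sep))   # 'data_augmentation_options'

# When the separator is absent, partition returns the whole string as the head.
head2, sep2, tail2 = "batch_size: 4\n".partition('data_augmentation_options')
print(repr(head2), repr(sep2), repr(tail2))  # 'batch_size: 4\n' '' ''
```

So if the separator never occurs, [0] simply gives back the original string unchanged.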

edited Dec 28, 2020 at 14:46

answered Dec 28, 2020 at 14:38


1 Comment

william007 Over a year ago

Thanks, could you help looking at stackoverflow.com/questions/65479947/…

Jarvis · Accepted Answer · 2020-12-28 14:41:39Z

0

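This will work, as long as nothing you want to keep follows the last closing brace (the greedy match runs from the first data_augmentation_options to the final }):

s = re.sub(r"\W+data_augmentation_options {(?:.|\n)*}", "", s).strip()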
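
The pattern from the question fails because \{* means "zero or more { characters", not "anything between braces". If other settings may follow the blocks, one alternative is to remove each block separately. The sketch below is only a rough illustration: it assumes at most one level of nested braces inside each block, as in the question's input, and max_number_of_boxes is a hypothetical field added to show that later content survives.

```python
import re

# Shortened sample; max_number_of_boxes is a hypothetical field added to show
# that content after the blocks survives.
s = """\
batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
  }
}

data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.5
  }
}
max_number_of_boxes: 100
"""

# The question's pattern: \{* means "zero or more { characters", so it needs a }
# right after the opening brace(s) and never matches here.
print(re.search(r'data_augmentation_options \{*\}', s, flags=re.S))  # None

# One block at a time: the body is any run of non-brace characters or
# inner { ... } groups that themselves contain no braces (one nesting level).
block = re.compile(
    r'^[ \t]*data_augmentation_options\s*\{(?:[^{}]|\{[^{}]*\})*\}[ \t]*\n?',
    flags=re.M,
)
cleaned = block.sub('', s)
print(cleaned)
```

For deeper or arbitrary nesting, a regex of this kind is not enough, and a proper parser for the config format would be the safer choice.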

answered Dec 28, 2020 at 14:41
