Given the following text, I want to remove every data_augmentation_options { ... } block, including its contents.

i.e., the input is:

  batch_size: 4
  num_steps: 30
  data_augmentation_options {
    random_horizontal_flip {
      keypoint_flip_permutation: 0
      keypoint_flip_permutation: 2
      keypoint_flip_permutation: 1
      keypoint_flip_permutation: 4
      keypoint_flip_permutation: 3
      keypoint_flip_permutation: 6
      keypoint_flip_permutation: 5
      keypoint_flip_permutation: 8
      keypoint_flip_permutation: 7
      keypoint_flip_permutation: 10
      keypoint_flip_permutation: 9
      keypoint_flip_permutation: 12
      keypoint_flip_permutation: 11
      keypoint_flip_permutation: 14
      keypoint_flip_permutation: 13
      keypoint_flip_permutation: 16
      keypoint_flip_permutation: 15
    }
  }

  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7
      random_coef: 0.25
    }
  }

The expected output is:

  batch_size: 4
  num_steps: 30

I tried:

import re

s='''
      batch_size: 4
      num_steps: 30
      data_augmentation_options {
        random_horizontal_flip {
          keypoint_flip_permutation: 0
          keypoint_flip_permutation: 2
          keypoint_flip_permutation: 1
          keypoint_flip_permutation: 4
          keypoint_flip_permutation: 3
          keypoint_flip_permutation: 6
          keypoint_flip_permutation: 5
          keypoint_flip_permutation: 8
          keypoint_flip_permutation: 7
          keypoint_flip_permutation: 10
          keypoint_flip_permutation: 9
          keypoint_flip_permutation: 12
          keypoint_flip_permutation: 11
          keypoint_flip_permutation: 14
          keypoint_flip_permutation: 13
          keypoint_flip_permutation: 16
          keypoint_flip_permutation: 15
        }
      }
    
      data_augmentation_options {
        random_crop_image {
          min_aspect_ratio: 0.5
          max_aspect_ratio: 1.7
          random_coef: 0.25
        }
      }
'''
print(re.sub('data_augmentation_options \{*\}','',s,flags=re.S))

This does not work. What's the right way to achieve this?

2 Answers

Rather than deleting what you don't want, you could capture what you do want:

>>> re.findall(r'batch_size: *\d+|num_steps: *\d+',s)
['batch_size: 4', 'num_steps: 30']

Or if you want to capture the leading spaces:

>>> re.findall(r'^[ \t]*(?:batch_size:|num_steps:)[ \t]*\d+',s, flags=re.M)
['\t\t\tbatch_size: 4', '\t\t\tnum_steps: 30']

Then print the result:

>>> print('\n'.join(re.findall(r'^[ \t]*(?:batch_size:|num_steps:)[ \t]*\d+',s, flags=re.M)))
        batch_size: 4
        num_steps: 30

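If you want to use re.sub, you can use a "conflicted" character class such as [\s\S], which matches either a whitespace or a non-whitespace character, i.e. any character at all, including newlines. Followed by *, it consumes everything from the first data_augmentation_options to the end of the string, so this only works when nothing you want to keep comes after the blocks: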

>>> print(re.sub(r'data_augmentation_options[\s\S]*','',s))

        batch_size: 4
        num_steps: 30

Perhaps even easier is Python's str.partition, which splits the string at the first occurrence of a separator; element [0] of the result is everything before it:

>>> print(s.partition('data_augmentation_options')[0])

        batch_size: 4
        num_steps: 30

1 Comment

Thanks. Could you also take a look at stackoverflow.com/questions/65479947/…

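This works for the sample input. The match is greedy, so it runs from the first data_augmentation_options to the last closing brace in the string, and strip() also removes the indentation before the first remaining line:

s = re.sub(r"\W+data_augmentation_options \{(?:.|\n)*\}", "", s).strip()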
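
If the blocks can nest to arbitrary depth, or other fields appear between or after them, a regular expression gets awkward. One alternative is to walk the text line by line and track brace depth. This is a minimal sketch rather than a full parser: remove_blocks is a hypothetical helper name, the hypothetical_nested and trailing_field lines are invented for the demonstration, and it assumes each block opens on its own line, nothing worth keeping shares a line with a closing brace, and braces never appear inside quoted strings.

```python
def remove_blocks(text, name):
    """Return text with every `name { ... }` block dropped (hypothetical helper)."""
    kept = []
    depth = 0  # brace depth inside the block currently being skipped
    for line in text.splitlines(keepends=True):
        if depth == 0:
            # A block starts when the text before the first '{' is exactly `name`.
            if '{' in line and line.split('{')[0].strip() == name:
                # A one-line block brings depth straight back to 0.
                depth = line.count('{') - line.count('}')
                continue
            kept.append(line)
        else:
            # Inside a skipped block: only track where it ends.
            depth += line.count('{') - line.count('}')
    return ''.join(kept)


sample = '''
  batch_size: 4
  num_steps: 30
  data_augmentation_options {
    random_horizontal_flip {
      keypoint_flip_permutation: 0
      hypothetical_nested { value: 1 }
    }
  }
  trailing_field: 1
'''

result = remove_blocks(sample, 'data_augmentation_options')
print(result)
```

Unlike the greedy re.sub, this keeps any line that sits outside a matching block, wherever it appears in the input.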
