import sys
import os
import re
import numpy as np
#Tags to remove, sample line:  1:one:2:two:....:122:twentytwo:....:194:ninetyfour:....
r122 = re.compile(':122:(.):')
r194  = re.compile(':194:(.):')

if len(sys.argv) < 2 :
    sys.exit('Usage: python %s <file2filter>' % sys.argv[0])
if not os.path.exists(sys.argv[1]):
    sys.exit('ERROR: file %s not found!' % sys.argv[1])
with open (sys.argv[1]) as f:
    for line in f:
        line = re.sub(r':122:(.):', '', str(line))
        line = re.sub(r':194:(.):', '', str(line))
        print(line,end=" ")

In

1:one:2:two:....:122:twentytwo:....:194:ninetyfour:....

Out

1:one:2:two:....:122:twentytwo:....:194:ninetyfour:....

The tags 122 and 194 are not removed. What am I missing here?

  • What is this code supposed to do? Commented Apr 30, 2020 at 17:30
  • I want to remove :122:twentytwo: and :194:ninetyfour: from the lines in the file Commented Apr 30, 2020 at 17:31
  • So, you need to replace (.) with [^:]+ in your patterns. And you need just one, with open (sys.argv[1], 'r') as f: and then for line in f: print(re.sub(r':1(?:22|94):[^:]+:', '', line)) Commented Apr 30, 2020 at 17:33
  • (.) only matches one character between :122: and :, but twentytwo is longer than 1 character. Commented Apr 30, 2020 at 17:33
  • @WiktorStribiżew Post that as an answer. Commented Apr 30, 2020 at 17:36

1 Answer

Your patterns contain (.), which matches and captures exactly one character other than a line break. You want to match one or more characters other than :, so use [^:]+ instead.
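A minimal sketch of the difference, assuming a shortened sample line (the `line` value and the variable names are made up for illustration). Note that the second pattern consumes both surrounding colons, so the neighbouring fields end up joined:

```python
import re

# Hypothetical shortened sample line, for illustration only.
line = "1:one:2:two:122:twentytwo:3:three"

# (.) allows exactly one character between the colons, so nothing matches
# and the line comes back unchanged.
unchanged = re.sub(r':122:(.):', '', line)

# [^:]+ allows one or more non-colon characters, so the whole tag is removed,
# together with the colons on both sides of it.
stripped = re.sub(r':122:[^:]+:', '', line)

print(unchanged)
print(stripped)
```

If the delimiter between the neighbouring fields should survive, one option is to pass ':' as the replacement string instead of ''.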

You do not need to compile separate regex objects if only a part of your regex changes. You may build your regex dynamically and compile it once before reading the file. E.g. if you have the values 122, 194 and 945 to use in the :...:[^:]+: pattern in place of ..., then you may use

vals = ["122", "194", "945"]
r = re.compile(r':(?:{}):[^:]+:'.format("|".join(vals)))
# Or, using f-strings
# r = re.compile(rf':(?:{"|".join(vals)}):[^:]+:')

The resulting regex is :(?:122|194|945):[^:]+: and it breaks down as follows:

  • : - a colon
  • (?:122|194|945) - a non-capturing group matching 122, 194 or 945
  • : - a colon
  • [^:]+ - 1+ chars other than a :
  • : - a colon

Then use

with open(sys.argv[1], 'r') as f:
    for line in f:
        # each line already ends with a newline, so do not let print add another
        print(r.sub('', line), end='')
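Putting the pieces together, here is a small self-contained sketch that runs the compiled pattern against the sample line from the question instead of a file. The helper name `build_tag_pattern` is hypothetical, the use of `re.escape` is an extra precaution not present in the answer, and the `....` runs in the sample are treated as literal text:

```python
import re

def build_tag_pattern(tags):
    """Compile one regex that matches :<tag>:<value>: for any of the given tags.

    Hypothetical helper; re.escape guards against tags that contain
    regex metacharacters (plain numeric tags are unaffected).
    """
    alternation = "|".join(re.escape(tag) for tag in tags)
    return re.compile(r':(?:{}):[^:]+:'.format(alternation))

pattern = build_tag_pattern(["122", "194"])

# Sample line from the question, with the trailing newline a file would supply.
sample = "1:one:2:two:....:122:twentytwo:....:194:ninetyfour:....\n"

cleaned = pattern.sub('', sample)
print(cleaned, end='')
```

One caveat under these assumptions: each match consumes its closing colon, so when two tags to remove sit directly next to each other (for example `:122:a:194:b:`), the second one no longer has a leading colon to match and is left in place.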