Regex to find strings not containing a specified value

Question

I'm using Notepad++'s regular expression search to find all strings in a .txt document that do not contain a specific value (HIJ in the example below), where every string begins with the same value (ABC in the example below).

How would I go about doing this?

Example

Every string starts with ABC
ABC is never used in a string other than at the beginning: ABCABC123 would be two strings, "ABC" and "ABC123"
HIJ may appear multiple times in a string
I need to find the strings that do not contain HIJ
Input is one long file with no line breaks, but does contain special characters (*, ^, @, ~, :) and spaces

Example Input:

ABC1234HIJ56ABC7@HIJABC89ABCHIJ0ABE:HIJABC12~34HI456J

The example input would be viewed as the following strings:

ABC1234HIJ56
ABC7@HIJ
ABC89
ABCHIJ0ABE:HIJ
ABC12~34HI456J

The third and fifth strings both lack "HIJ" and are therefore included in the output. All others are excluded.

Example desired output:

ABC89
ABC12~34HI456J

I am 99% new to regex and will be looking into it more in the future. My job description changed suddenly earlier this week when someone else in the company left, so I have been doing this manually: searching for (ABC|HIJ) and going through the search results looking for "ABC" appearing twice in a row. Supposedly the former employee was able to do this in an automated way, but left no documentation.

Any help would be appreciated!

This question is a repeat of a prior question I asked, but I formatted that one very badly and it seems to have sunk out of sight.

Casimir et Hippolyte · Accepted Answer · 2014-12-04 16:32:35Z

2

You can find the items you want with:

ABC(?:[^HA]+|H(?!IJ)|A(?!BC))*+(?=ABC|$)

Note: in this first pattern, you can replace (?=ABC|$) with (?!HIJ), since the group can only stop before "HIJ", before "ABC", or at the end of the string.

pattern details:

ABC
(?:            # non-capturing group
    [^HA]+     # one or more characters other than H or A
  |            # OR
    H(?!IJ)    # an H not followed by IJ
  |            # OR
    A(?!BC)    # an A not followed by BC
)*+            # repeat the group zero or more times (possessive: no backtracking)
(?=ABC|$)      # followed by "ABC" or the end of the string
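
For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch run against the sample input from the question. It is an illustration under assumptions, not part of the original answer: Python's re module only accepts possessive quantifiers such as *+ from version 3.11 onward, so the sketch emulates the possessive group with a capturing lookahead followed by \1. The names text, ITEM_WITHOUT_HIJ, and find_items_without_hij are made up for the example.

```python
import re

# Sample input from the question: one long line, every item starts with "ABC".
text = "ABC1234HIJ56ABC7@HIJABC89ABCHIJ0ABE:HIJABC12~34HI456J"

# The lookahead captures the longest run that contains neither "HIJ" nor "ABC",
# and \1 then consumes exactly that run. The regex engine never backtracks into
# a completed lookahead, so this behaves like the possessive *+ group above.
ITEM_WITHOUT_HIJ = re.compile(
    r"ABC(?=((?:[^HA]+|H(?!IJ)|A(?!BC))*))\1(?=ABC|$)"
)

def find_items_without_hij(s):
    """Return every ABC-prefixed item that is followed by ABC or the end."""
    return [m.group(0) for m in ITEM_WITHOUT_HIJ.finditer(s)]

print(find_items_without_hij(text))  # ['ABC89', 'ABC12~34HI456J']
```

finditer with group(0) is used rather than findall because the pattern now contains a capturing group, and findall would return only that group.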

Note: if you want to remove everything except the items you want, you can use this search and replace:

search: (?:ABC(?:[^HA]+|H(?!IJ)|A(?!BC))*+HIJ.*?(?=ABC|$))+|(?=ABC)
replace: \r\n
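
The same search and replace can be sketched in Python to see what it leaves behind. This is only an approximation of what Notepad++ does: the possessive group is again emulated with a lookahead plus \1 (an assumption made so that the code runs on Python versions before 3.11), and regex engines differ in how they treat an empty match right after a non-empty one, so the sketch drops any empty lines instead of relying on the exact number of line breaks. UNWANTED and keep_items_without_hij are hypothetical names.

```python
import re

text = "ABC1234HIJ56ABC7@HIJABC89ABCHIJ0ABE:HIJABC12~34HI456J"

# First alternative: one or more consecutive items that contain "HIJ".
# Second alternative: the empty position just before any remaining "ABC".
UNWANTED = re.compile(
    r"(?:ABC(?=((?:[^HA]+|H(?!IJ)|A(?!BC))*))\1HIJ.*?(?=ABC|$))+|(?=ABC)"
)

def keep_items_without_hij(s):
    """Replace unwanted spans with CRLF and return the non-empty lines left."""
    replaced = UNWANTED.sub("\r\n", s)
    # Empty lines can appear where two replacements touch, so filter them out.
    return [line for line in replaced.split("\r\n") if line]

print(keep_items_without_hij(text))  # ['ABC89', 'ABC12~34HI456J']
```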

edited Dec 4, 2014 at 16:32

answered Dec 4, 2014 at 16:02


5 Comments

blackmind Over a year ago

You need the /g modifier to get multiple matches.

Casimir et Hippolyte Over a year ago

@blackmind: No, since it is to be used in Notepad++, the user has the choice of clicking a "find" or a "findall" button.

blackmind Over a year ago

Missed that part. Well, FYI for anyone not using Notepad++.

Casimir et Hippolyte Over a year ago

@blackmind: Note that the g flag doesn't always exist (PHP, Python); in those cases, whether the search is global is determined by the function you use.

Andrew Che Over a year ago

THANK YOU! This works perfectly! Plus I understand how it works! Great answer!

alpha bravo · Accepted Answer · 2014-12-04 16:44:15Z

0

You could use this pattern:

(ABC(?:(?!HIJ).)*?)(?=ABC|\R)


(               # Capturing Group (1)
  ABC           # "ABC"
  (?:           # Non Capturing Group
    (?!         # Negative Look-Ahead
      HIJ       # "HIJ"
    )           # End of Negative Look-Ahead
    .           # Any character except line break
  )             # End of Non Capturing Group
  *?            # (zero or more)(lazy)
)               # End of Capturing Group (1)
(?=             # Look-Ahead
  ABC           # "ABC"
  |             # OR
  \R            # <line break>
)               # End of Look-Ahead

answered Dec 4, 2014 at 16:44


nitishagar · Accepted Answer · 2014-12-04 16:46:14Z

0

You can use the following expression to match your criterion:

(^ABC(?:(?!HIJ).)*$)

This matches a line that starts with ABC and uses a negative lookahead before each character to make sure HIJ never appears. It works when each string is on its own line.

For single-line input (as in your question), a slight modification works:

(ABC(?:(?!HIJ).)*?)(?=ABC|$)
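
As a rough check, both variants can be tried in Python against the sample data from the question. This is a sketch under assumptions: re.MULTILINE is added so that ^ and $ apply per line for the first pattern, and the variable names are invented for the example.

```python
import re

# The question's input as one long line, and the same items one per line.
one_line = "ABC1234HIJ56ABC7@HIJABC89ABCHIJ0ABE:HIJABC12~34HI456J"
separated = "\n".join(
    ["ABC1234HIJ56", "ABC7@HIJ", "ABC89", "ABCHIJ0ABE:HIJ", "ABC12~34HI456J"]
)

# Variant for separated strings: the whole line must be free of "HIJ".
PER_LINE = re.compile(r"(^ABC(?:(?!HIJ).)*$)", re.MULTILINE)

# Variant for one long line: stop lazily at the next "ABC" or at the end.
SINGLE_LINE = re.compile(r"(ABC(?:(?!HIJ).)*?)(?=ABC|$)")

print(PER_LINE.findall(separated))    # ['ABC89', 'ABC12~34HI456J']
print(SINGLE_LINE.findall(one_line))  # ['ABC89', 'ABC12~34HI456J']
```

Each pattern has exactly one capturing group, so findall returns the captured item text directly.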

edited Dec 4, 2014 at 16:46

answered Dec 4, 2014 at 16:38
