2

A question out of curiosity, is there a way to do pattern matching on a bit level ? Currently all the regex systems I have seen operate on a byte or character based representation, but I haven't seen, any that will let you match on a bit level.

For example, if I have a bit field like this :

011101100011100110110001

(24 bits!) can I check that bits 7,8 & 9 are the pattern 100 ?

Language agnostic answers are preferable, but as I know of nowhere that does it, I would appreciate any insight.

NOTE: I wish to do this on an arbitrary number of bits so converting to bytes (or padding to a byte size) and applying a convoluted normal regexp is NOT what I want !

Thanks,

5
  • What do you mean by bitfield? What language are you using? Is bitfield a special type or just a string of 0 and 1s? Commented May 26, 2012 at 20:24
  • A better example would be to find the pattern 100 anywhere in the stream. You don't want or need regex to check a known fixed position. Commented May 26, 2012 at 20:51
  • @tripleee ok poor example... just imagine a more complex pattern anywhere in a stream of bits. Commented May 26, 2012 at 20:53
  • @MarkByers Bit field as in field of bits. ones and zeros, binary data. Also See "The C programming Language" By K&R for where i may come across this in a programming context. - Note, the question is not C specific. Commented May 26, 2012 at 20:56
  • @NWS : please check out my answer and if you are interested in Python code written in a self explaining way providing a kind of meta-programming instructions which create required regex pattern out of any bit pattern of form "111xxx000" where the "x" marks bits which do not matter in pattern matching, let me know and I will extend my answer with it. Commented Nov 18, 2023 at 14:29

3 Answers 3

0

Certainly there is no theoretical limit which would make it impossible. In fact, the associated theory can apply to any alphabet, and examples often use quite small alphabets, though not usually the one consisting of the symbols 0 and 1. You might want to read a book about computational theory.

Sign up to request clarification or add additional context in comments.

3 Comments

I suspected the theory would allow it, which is why I posed the question to see if there was an application of it also.
As a baseline inplementation, converting a bit vector to a string of ones and zeros, and applying a normal textual regex sounds like two to three lines of Perl.
Ill mark this as correct, as it is probably the best solution.
0

Assuming that you're trying to check actual bits and not a string of 1s and 0s, I don't believe you can do this with regex per se, but you can apply a bit mask to check the status of certain bits. For example, to check the LMB is 1:

11000100

AND

10000000

= 10000000

4 Comments

Yes Bits, but a bit mask was not quite what i has in mind (albeit implied by my poor example!).
@NWS I have limited experience with regex as an engine, but from my experience I would have to say that no, it is not possible.
And as Tripleee said, the theory should allow it. Just the current applications dont support it.
@NWS Definitely- but I stick with my main point that implementing a bit mask will solve your issues. Checking if three bits are 100 is as simple is applying a bit mask of 111 and seeing if you get 100 returned.
0

Is there a way to do pattern matching on a bit level ?

Sure there is a way and already in wide use. Just take a look at regex patterns for checking if a byte string represents valid utf-8 coded Unicode text.

Here how it goes:

Let's assume that the bit stream is given by a byte string of ASCII characters with values hex: 00-FF (dec: 0-255) and you are searching for the occurrence of the bit pattern bin: 00111111 which is hex: 3F (dec: 77).

The solution for a regular expression able to find this bit pattern in a string across byte boundaries is to search for the byte pairs delivering this bit pattern.

To achieve this you need to define all appropriate two character long combinations the regular expression pattern has to search for.

Let's show here only some of them to give an example:

First regex pattern in a chain of alternatives:

\0x3F

Second regex pattern in a chain of alternatives:

[\x1F\x9F][\x80-\xFF]

... and so on for all of the 8 possible bit patterns across byte boundaries ...

The resulting regex will look like: (\x3F)|([\x1F\x9F][\x80-\xFF])| ...

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.