
I am parsing a stream of input from a third-party piece of hardware. The device prints messages meant for a human, so the stream contains keywords mixed with other characters I don't care about. I want to scan the stream for the next occurrence of one of these keywords using a regex, then use a switch statement to work out which command was sent.

I cannot use the Scanner class because its reads block, and I cannot interrupt a blocked read to stop the thread. Closing the stream as a workaround is not an option either.

Are there any libraries that do this? I found Streamflyer, but it seems like overkill and may not be what I need. FilterInputStream and FilterReader were also suggested, but I don't think they fit either.

  • I would wrap the InputStream with a Scanner, and use either findWithinHorizon(myPattern, 0) or next(myPattern). You would write myPattern a bit differently for those two methods, as the second assumes the pattern starts at the current position. Commented Oct 26, 2015 at 22:51
  • If it "prints messages out meant for a human", then it likely prints them as separate lines, so use a BufferedReader and call readLine(), then run the regex on the line. Commented Oct 26, 2015 at 23:05
  • @YangYing The findWithinHorizon method blocks if nothing is found, and as far as I know you cannot interrupt that block. Am I wrong on that? Commented Oct 27, 2015 at 0:39
  • @Andreas It mostly prints things on their own lines, but there is one command that doesn't print a newline when it is sent. Commented Oct 27, 2015 at 0:39

1 Answer

I have an open source project that can help with this, and it's much faster than a regex-based solution:

http://mtimmerm.github.io/dfalex/

In outline:

  • Use DfaBuilder to make a DFA that matches .*KEYWORD for each keyword. The easiest way to specify that pattern is Pattern.maybeRepeat(CharRange.ALL).then("KEYWORD");

  • Call build() and you'll get a DfaState out. Then call state = state.getNextState(c) for each character of your input in turn. Whenever you're at the end of a keyword, state.getMatch() will tell you which keyword you've matched.

EDIT: Building the DFA looks like this:

// The <Integer> here means you want Integer results
DfaBuilder<Integer> builder = new DfaBuilder<>();

// Let's say you have a list of keywords:
for (int i = 0; i < keywords.size(); ++i)
{
    // Matches any run of characters followed by the keyword
    Pattern pat = Pattern.maybeRepeat(CharRange.ALL)
        .then(keywords.get(i));
    builder.addPattern(pat, i);  // when this pattern matches, we get i out
}
DfaState<Integer> startState = builder.build(null);

And then use it like this:

DfaState<Integer> st = startState;
for (... each input character c ...)
{
    st = st.getNextState(c);
    //if this is non-null, then it's the index of the matched keyword
    //in the keywords list
    Integer match = st.getMatch();
}
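For readers who want to try the character-at-a-time idea without the library, here is a minimal stand-in sketch that uses only the JDK. It is not dfalex and not a DFA: it keeps a short tail of recent characters and checks whether any keyword ends at the current character, which gives the same answer as the .*KEYWORD patterns but more slowly. The class name StreamingKeywordMatcher and the method feed are made up for this illustration.

```java
import java.util.List;

/**
 * Naive stand-in for the ".*KEYWORD" DFA idea (hypothetical class, not part of dfalex).
 * Feed it one character at a time; it reports which keyword, if any, ends at that character.
 */
final class StreamingKeywordMatcher {
    private final List<String> keywords;
    private final StringBuilder tail = new StringBuilder();
    private final int maxLen;

    StreamingKeywordMatcher(List<String> keywords) {
        this.keywords = keywords;
        int longest = 0;
        for (String k : keywords) {
            longest = Math.max(longest, k.length());
        }
        this.maxLen = longest;
    }

    /** Returns the index of the first listed keyword that ends at c, or -1 if none does. */
    int feed(char c) {
        tail.append(c);
        // Only the last maxLen characters can be part of a keyword ending here.
        if (tail.length() > maxLen) {
            tail.deleteCharAt(0);
        }
        for (int i = 0; i < keywords.size(); i++) {
            String k = keywords.get(i);
            int start = tail.length() - k.length();
            if (start >= 0 && tail.indexOf(k, start) == start) {
                return i;
            }
        }
        return -1;
    }
}
```

Because feed takes a single character, the caller decides when to read, so it can poll the stream or check an interrupt flag between characters instead of sitting inside a blocking Scanner call.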

7 Comments

Could you provide an example? I am having trouble understanding what you are saying.
I have multiple keywords as well. Would I need multiple calls to then?
I added an example to the answer
OK, so I understand that it can find patterns and return an object that I can associate with a command. In addition to normal commands, I also need to retrieve data from the input. I can use a Pattern to match the data, which must match the regex Lane\s\d\s*\d.\d\d\d\d, for example Lane 5 1.2345. I need to get the lane number (here 5) and the time (here 1.2345). As far as I can tell, I can only match a pattern and return a static value.
If your parsing requirements are all this simple, you'll be OK, but if they get much more complicated you might have to look at building a real parser. For this kind of thing, what you would normally do is associate the pattern for each command with the procedure for parsing out its values. Instead of using Integer for the MATCHRESULT, I would typically use an Enum with an abstract method for the parsing procedure, which each enum value would implement. The LANE procedure could go back two whitespace sequences, make a string, and split it.
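A rough sketch of that enum idea, in plain Java, might look like the following. The enum name Command, its constants, and the parse method are assumptions made for illustration; the sketch also assumes the caller has buffered the text seen so far, ending at the character where the match was reported.

```java
/**
 * Hypothetical command set: each constant carries its own parsing procedure.
 * 'seen' is assumed to be the buffered input ending where the match was reported.
 */
enum Command {
    START {
        @Override
        String[] parse(String seen) {
            return new String[0];  // this command carries no data
        }
    },
    LANE {
        @Override
        String[] parse(String seen) {
            // Take the last two whitespace-separated tokens: "<lane> <time>"
            String[] tokens = seen.trim().split("\\s+");
            int n = tokens.length;
            if (n < 2) {
                throw new IllegalArgumentException("not enough tokens: " + seen);
            }
            return new String[] { tokens[n - 2], tokens[n - 1] };
        }
    };

    /** Extracts this command's values from the text seen so far. */
    abstract String[] parse(String seen);
}
```

With something like this, the match result type would be Command instead of Integer, and the caller would invoke parse on whatever constant comes back when a match is reported. For the Lane 5 1.2345 case, Command.LANE.parse would yield the lane number and the time as two strings.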