3

This is a sample text: \1f\1e\1d\020028. I cannot modify the input text; I am reading long strings of text from a file.


I want to extract the following: \1f, \1e, \1d, \02

For this, I have written the following regular expression pattern: "\\[a-fA-F0-9]"

I am using the Pattern and Matcher classes, but my matcher is not able to find the pattern using the regular expression above. I have tested this regex against the text on some online regex websites and, surprisingly, it works there.

Where am I going wrong?

Original code:

public static void main(String[] args) {
    String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
    inputText        = inputText.replace("\\", "\\\\");

    String regex     = "\\\\[a-fA-F0-9]{2}";

    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(inputText);

    while (m.find()) {
        System.out.println(m.group());
    }
}

Output: Nothing is printed

  • I'd guess that some of your backslashes are escaping things you don't intend them to. You'd have to show us your actual code for me to be sure, though. Commented Nov 5, 2014 at 22:03
  • \\[a-fA-F0-9] looks for a backslash followed by one letter or digit. I think you want to look for a backslash followed by two letters or digits. I suspect you can figure out how to fix this. Commented Nov 5, 2014 at 22:03
  • Did you format the input String properly? It should be '\\1f\\1e\\1d\\020028', I think. Commented Nov 5, 2014 at 22:04
  • To make helping you easier, post a code example showing how you are using this regex. Commented Nov 5, 2014 at 22:11
  • Is this the text from your input file? Can we see how you read it? Also, what do you see when you print what you read? Commented Nov 6, 2014 at 14:26

4 Answers

2

(answer changed after OP added more details)

Your string

String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";

doesn't actually contain any \ literals. According to the Java Language Specification, section 3.10.6 (Escape Sequences for Character and String Literals), \xxx is interpreted as the character whose index in the Unicode table is the octal (base/radix 8) value given by the xxx part.

Example: \123 = 1*8^2 + 2*8^1 + 3*8^0 = 1*64 + 2*8 + 3*1 = 64 + 16 + 3 = 83, which represents the character S.

If the string you presented in your question is written exactly like that in your text file, then in Java source code you should write it as

String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";

(with each \ escaped, so that it now represents a literal backslash).
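The octal interpretation described above can be checked directly. The following is a minimal sketch; the class name OctalEscapeDemo and the helper containsBackslash are hypothetical names introduced only for illustration.

```java
// A minimal sketch of how the compiler treats octal escapes in string literals.
class OctalEscapeDemo {

    // Returns true if the given string contains a literal backslash character.
    static boolean containsBackslash(String s) {
        return s.indexOf('\\') >= 0;
    }

    public static void main(String[] args) {
        String unescaped = "\1f\1e";  // octal escapes: U+0001, 'f', U+0001, 'e'
        String escaped = "\\1f\\1e";  // literal text: \1f\1e

        System.out.println("\123".equals("S"));           // true
        System.out.println(unescaped.length());           // 4
        System.out.println((int) unescaped.charAt(0));    // 1
        System.out.println(containsBackslash(unescaped)); // false
        System.out.println(escaped.length());             // 6
        System.out.println(containsBackslash(escaped));   // true
    }
}
```

Because the unescaped literal contains no backslash at all, a regex that looks for a backslash has nothing to match in it.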


(older version of my answer)

It is hard to tell what exactly you did wrong without seeing your code. You should be able to find at least \1, \1, \1, \0 since your regex can match one \ and one hexadecimal character placed after it.

Anyway, this is how you can find the results you mentioned in the question:

String text = "\\1f\\1e\\1d\\020028";
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}");
//                                          ^^^--we want to find two hexadecimal 
//                                               characters after \
Matcher m = p.matcher(text);
while (m.find())
    System.out.println(m.group());

Output:

\1f
\1e
\1d
\02

1 Comment

The code you mentioned works. But, when I did something similar as you can see above, it is not working.
1

You need to read the file properly and replace '\' characters with '\\'. Assume that there is a file called test_file.txt in your project with this content:

\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d

Here is the code to read the file and extract values:

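import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) throws IOException {
        Test t = new Test();
        t.test();
    }

    public void test() throws IOException {
        // Compile once: a literal backslash followed by two hex digits.
        Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");

        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br =
                new BufferedReader(
                    new InputStreamReader(
                        getClass().getResourceAsStream("/test_file.txt"), "UTF-8"))) {
            String inputText;

            while ((inputText = br.readLine()) != null) {
                // Doubles each backslash; the pattern then matches the second one.
                inputText = inputText.replace("\\", "\\\\");

                Matcher match = pattern.matcher(inputText);

                while (match.find()) {
                    System.out.println(match.group());
                }
            }
        }
    }
}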
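For comparison, here is a minimal sketch that applies the same pattern to text read from a file without any replacement step. It assumes the file stores the backslashes as ordinary characters, as in the sample content above; under that assumption the pattern appears to match the text as read. The class name FileBackslashDemo, the helper extract, and the temporary file are hypothetical stand-ins for the real input.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileBackslashDemo {

    // A literal backslash followed by exactly two hexadecimal digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\[a-fA-F0-9]{2}");

    // Collects every match of HEX_ESCAPE in the given text.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HEX_ESCAPE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file standing in for the real input file.
        Path file = Files.createTempFile("test_file", ".txt");
        try {
            // Each "\\" in this literal writes one backslash character to the file.
            String content =
                "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));

            // Lines read back from the file already contain literal backslashes.
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                System.out.println(extract(line));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The escaping rules of Java string literals apply only to source code; characters read from a file at run time are not reinterpreted.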

6 Comments

Your code indeed works. But, when I did something similar as you can see above, it is not working.
The problem is escaping the input String. Check the update. I used StringEscapeUtils from Apache Commons Lang.
@bullzeye Explanation: escapeJava returns the Unicode representation instead of the octal one, so instead of \1 or \0 you will get \u0001 or \u0000. That is why replace("\\u000", "\\") is needed (to convert \u0001 to \1, as in your string).
@bullzeye Anyway, this method fails, for instance, in the case of \03, because it relies on the assumption that you will only have the \x form of octal values, not the \xx form. The \xx form could also represent a value greater than 15, which would need two hexadecimal digits, so escaping it would return \u00XX.
@bullzeye Also, this method will not escape characters represented by, for example, \123 (83 in decimal, the 'S' character), because S is a normal character in Java that doesn't require escaping.
0

Try adding a . at the end, like:

\\[a-fA-F0-9].
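To see what the trailing . changes, here is a small sketch comparing it with the {2} quantifier. It assumes the input really contains literal backslashes; the class name TrailingDotDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDotDemo {

    // Returns all matches of the given regex in the given text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "\\1f\\1e\\1d\\020028"; // literal text: \1f\1e\1d\020028

        // Both patterns find the same four matches in the sample text.
        System.out.println(findAll("\\\\[a-fA-F0-9].", text));   // [\1f, \1e, \1d, \02]
        System.out.println(findAll("\\\\[a-fA-F0-9]{2}", text)); // [\1f, \1e, \1d, \02]

        // The dot accepts any character, so it also matches non-hex input.
        System.out.println(findAll("\\\\[a-fA-F0-9].", "\\1z"));   // [\1z]
        System.out.println(findAll("\\\\[a-fA-F0-9]{2}", "\\1z")); // []
    }
}
```

The dot version is therefore looser: it only requires the first character after the backslash to be a hex digit.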

0

If you don't want to modify the input string, you could try something like:

public static void main(String[] argv) {
    String s = "\1f\1e\1d\020028";
    // A control character (U+0000 to U+001F) followed by one hex digit.
    Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
    Matcher match = regex.matcher(s);
    while (match.find()) {
        char[] c = match.group().toCharArray();
        // Print the control character's decimal value, then the hex digit.
        System.out.println(String.format("\\%d%s", c[0] + 0, c[1]));
    }
}

Yes, it's not perfect, but you get the idea.

1 Comment

Thank you! This solution works partially. For the input string I mentioned in my modified question, the output is: '\1f \1e \1d \160 \1f \1e \1d \0d'
