
To put my question simply: I am using a whitelist regex pattern to guard against XSS and SQL injection. The allowed characters in a string are [A-Za-z0-9,()[]{}\"\:./_\s]. I also want to reject any incoming client request that contains --, while still allowing a single -, as in the string jjdfasd-dsfads-12321.

In short, the test cases below should run successfully:

import java.util.regex.Pattern;


public class RegExTest {

private static Pattern xssAttackPattern;

private static final String XSS_ATTACK_REGULAR_EXPRESSION1 = "-?[A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s]*";


public static Pattern getXSSAttackPattern1() {
    xssAttackPattern = Pattern.compile(XSS_ATTACK_REGULAR_EXPRESSION1);
    return xssAttackPattern;
}

public static boolean hasXSSAttackOrSQLInjection1(String value) {
    // Despite the method name, true means the whole value matches the whitelist pattern.
    return getXSSAttackPattern1().matcher(value).matches();
}



public static void main(String arg[]) {

    System.out.println(" :::::: Regular Expression ::::::");
    regexTest();

}

private static void regexTest() {

    String str1 = "-dsfdsfddsfd2112212s";
    String str2 = "--dsfdsfddsfd2112212s";
    String str3 = "-dsfdsfdd-sfd2112212s";
    String str4="http://rss.cnn.com/rss/edition_business.rss?id=121132511$@#$@$@#%242444+gfghgfhg";
    String str5="(.:[]{}";
    String str6="--";
    String str7="-";

    System.out.println("String::" + str1 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str1));
    System.out.println("String::" + str2 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str2));
    System.out.println("String::" + str3 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str3));
    System.out.println("String::" + str4 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str4));
    System.out.println("String::" + str5 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str5));
    System.out.println("String::" + str6 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str6));
    System.out.println("String::" + str7 + "::Result::"
            + hasXSSAttackOrSQLInjection1(str7));
}

}

  • "allow multiple occurrence of A-Z 0-9 and a-z but (-) occurrence only zero or one in string." I can't understand this, the sentence contradicts itself. Commented Apr 30, 2012 at 10:17
  • What's with the backslash orgy in your proposed regexp? Commented Apr 30, 2012 at 10:18
  • @Harshil - so do you want your regex to match the above strings or to discard them ? They are far from A-Za-z0-9 and a dash only... And please explain the slashes. Commented Apr 30, 2012 at 10:44

1 Answer

Your current regex matches

  • a string consisting of a single - character, or
  • a string consisting of a sequence of letters, digits, and some special characters, or
  • an empty string

If you would like to allow zero or one dash - only at the beginning of the string, remove the OR character | from your expression; if you would like to allow at most one dash anywhere in the string, change the expression to

[A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s]*-?[A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s]*
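As a rough illustration of the at-most-one-dash expression above, here is a minimal sketch. The class name SingleDashCheck and the method hasAtMostOneDash are hypothetical names chosen for this example; they are not part of the question's code.

```java
import java.util.regex.Pattern;

public class SingleDashCheck {
    // Whitelisted characters from the question (the dash is deliberately not included).
    private static final String ALLOWED =
            "[A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s]";

    // Allowed characters, then an optional single dash, then more allowed characters.
    private static final Pattern AT_MOST_ONE_DASH =
            Pattern.compile(ALLOWED + "*-?" + ALLOWED + "*");

    // Hypothetical helper: true when the whole value matches the expression.
    public static boolean hasAtMostOneDash(String value) {
        return AT_MOST_ONE_DASH.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-", "abc-def", "a-b-c", "--"}) {
            System.out.println(s + " -> " + hasAtMostOneDash(s));
        }
    }
}
```

Note that this variant rejects any string with two dashes, even when they are not adjacent, so a value such as jjdfasd-dsfads-12321 would not pass.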

EDIT 1: If you need to avoid two consecutive dashes, you can use this expression with negative lookbehind:

([A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s]|(?<!-)-)*

The (?<!-)- part of the expression above matches a dash unless it is preceded by another dash.
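A minimal sketch of how the lookbehind expression could be checked against some of the question's test strings. ConsecutiveDashCheck and isAllowed are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

public class ConsecutiveDashCheck {
    // Either a whitelisted character, or a dash that is not preceded by another dash.
    private static final Pattern NO_DOUBLE_DASH = Pattern.compile(
            "([A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s]|(?<!-)-)*");

    // Hypothetical helper: true when the whole value matches the expression.
    public static boolean isAllowed(String value) {
        return NO_DOUBLE_DASH.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "-dsfdsfddsfd2112212s",
            "--dsfdsfddsfd2112212s",
            "-dsfdsfdd-sfd2112212s",
            "(.:[]{}",
            "--",
            "-"
        };
        for (String s : samples) {
            System.out.println("String::" + s + "::Result::" + isAllowed(s));
        }
    }
}
```

In this sketch the two samples containing -- are rejected, while the strings with single or non-adjacent dashes are accepted.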

EDIT 2: If you have strings of 10000+ characters, a positive regex solution is not as good as a negative one. Instead of testing myString.matches(positiveExpr), it is much more efficient to search the string for anything you do not want and negate the result: !Pattern.compile(negativeExpr).matcher(myString).find(). The search has to use find() rather than matches(), because matches() succeeds only when the entire string matches the expression. In other words, instead of specifying an expression defining the string that you want, you can define a much simpler expression for the parts that you do not want:

[^A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s\\-]|--
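A minimal sketch of the negative check. It relies on Matcher.find(), so the forbidden pattern can occur anywhere in the string, and on the negated class containing \\-, so a lone dash is not itself reported as a disallowed character. Both points are assumptions of this sketch, and NegativeMatchCheck and isAllowed are hypothetical names.

```java
import java.util.regex.Pattern;

public class NegativeMatchCheck {
    // Assumption: "\\-" is part of the negated class, so a single dash is not flagged.
    // The pattern finds one disallowed character or two consecutive dashes.
    private static final Pattern FORBIDDEN = Pattern.compile(
            "[^A-Za-z0-9,\\(\\)\\[\\]\\{\\}\"\\:./_\\s\\-]|--");

    // Hypothetical helper: true when nothing forbidden occurs anywhere in the value.
    public static boolean isAllowed(String value) {
        return !FORBIDDEN.matcher(value).find();
    }

    public static void main(String[] args) {
        // Build a long input to show the check does not depend on a repeated group.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            big.append("ab-");
        }
        System.out.println("long input allowed: " + isAllowed(big.toString()));
        System.out.println("-- allowed: " + isAllowed("--"));
        System.out.println("a$b allowed: " + isAllowed("a$b"));
    }
}
```

Because the pattern is a plain alternation with no repeated group, the search is a single left-to-right scan that stops at the first offending position.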

NOTE: Sanitizing your strings is not the best way to avoid SQL injection attacks; using parameterized statements is.


5 Comments

Thanks for your quick response, but it still does not satisfy my test case class above.
I just ran your code at ideone, and it produces a good sequence of answers: first is a match, second and third are not, because both have two dashes. Fourth has invalid characters (@, #, $, %), fifth does not have illegal characters or dashes, so it matches; sixth is a double dash, and seventh matches fine.
What I want is for my third string to match, as it does not contain two consecutive dashes (--).
@Harshil I did not realize that you were looking for consecutive dashes. Please take a look at my edit.
Hi, I am getting an issue with your regex pattern for a very large string (approx. 14965 characters) that is read from a file. I get the same failure trace as in this link: stackoverflow.com/questions/3681928/…
