
I'm trying to create a regex pattern to match the lines in the following format:

field[bii] = float4:.4f_degree  // Galactic Latitude
field[class] = int2  (index) // Browse Object Classification
field[dec] = float8:.4f_degree (key) // Declination
field[name] = char20  (index) // Object Designation
field[dircos1] = float8   // 1st Directional Cosine

I came up with this pattern, which seemed to work at first and then suddenly stopped working:

field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))

Here is the code I'm trying to use (edit: provided full method instead of excerpt):

private static Map<String, String> createColumnMap(String filename) {

    // create a linked hashmap mapping field names to their column types. Use LHM because I'm picky and
    // would prefer to preserve the order
    Map<String, String> columnMap = new LinkedHashMap<String, String>();

    // define the regex patterns
    Pattern columnNamePattern = Pattern.compile(columnNameRegexPattern);

    try {
        Scanner scanner = new Scanner(new FileInputStream(filename));
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();

            if (line.indexOf("field[") != -1) {
                // get the field name
                Matcher fieldNameMatcher = columnNamePattern.matcher(line);
                String fieldName = null;
                if (fieldNameMatcher.find()) {
                    fieldName = fieldNameMatcher.group(1);
                }

                String columnName = null;
                String columnType = null;
                String columnPrecision = null;
                String columnScale = null;
                //Pattern columnTypePattern = Pattern.compile(".*(float|int|char)([0-9]|[1-9][0-9])");
                Pattern columnTypePattern = Pattern.compile("field\\[(.*)\\] = (float|int|char).*([0-9]|[1-9][0-9]).*(:(\\.([0-9])))");
                Matcher columnTypeMatcher = columnTypePattern.matcher(line);

                System.out.println(columnTypeMatcher.lookingAt());

                if (columnTypeMatcher.lookingAt()) {
                    System.out.println(fieldName + ": " + columnTypeMatcher.groupCount());
                    int count = columnTypeMatcher.groupCount();
                    if (count > 1) {
                        columnName = columnTypeMatcher.group(1);
                        columnType = columnTypeMatcher.group(2);
                    }
                    if (count > 2) {
                        columnScale = columnTypeMatcher.group(3);
                    }
                    if (count >= 6) {
                        columnPrecision = columnTypeMatcher.group(6);
                    }
                }

                int precision = Integer.parseInt(columnPrecision);
                int scale = Integer.parseInt(columnScale);

                if (columnType.equals("int")) {
                    if (precision <= 4) {
                        columnMap.put(fieldName, "INTEGER");
                    } else {
                        columnMap.put(fieldName, "BIGINT");
                    }
                } else if (columnType.equals("float")) {
                    if (columnPrecision==null) {
                        columnMap.put(fieldName,"DECIMAL(8,4)");
                    } else {
                        columnMap.put(fieldName,"DECIMAL(" + columnPrecision + "," + columnScale + ")");
                    }
                } else {
                    columnMap.put(fieldName,"VARCHAR("+columnPrecision+")");
                }
            }

            if (line.indexOf("<DATA>") != -1) {
                scanner.close();
                break;
            }
        }

        scanner.close();
    } catch (FileNotFoundException e) {

    }

    return columnMap;
}

When I call groupCount() on the Matcher, it says there are 6 groups. However, the groups aren't matching the text the way I expect, so I could definitely use some help. Can anyone assist?

  • When I run this code against field[bii] = float4:.4f_degree // Galactic Latitude, I get bii, float, 4 and 5 for the values. What is not working? Commented Nov 3, 2011 at 20:04
  • That's what it's supposed to do, but it's throwing IllegalStateException when trying to access columnTypeMatcher.group(n) Commented Nov 3, 2011 at 20:08
  • Didn't for me, so what line are you running against and which group was it attempting to get? Commented Nov 3, 2011 at 20:10
  • Okay, now it works, but it's failing against the int2 line. Commented Nov 3, 2011 at 20:11
  • Now it works, but it's failing? What does that mean? What works, and what is failing? Commented Nov 3, 2011 at 20:13
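A minimal sketch of the two Matcher behaviours this thread runs into (the class name GroupCountDemo is only for illustration). groupCount() reports how many capturing groups the pattern itself contains, so it is 6 for the question's pattern whether or not a line matched. Calling group(n) after a failed lookingAt() throws IllegalStateException. The int2 line fails because it contains no ':' for the mandatory (:(\.([0-9]))) group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // The pattern from the question's code, with Java string-literal escaping.
    static final Pattern QUESTION_PATTERN = Pattern.compile(
            "field\\[(.*)\\] = (float|int|char).*([0-9]|[1-9][0-9]).*(:(\\.([0-9])))");

    public static void main(String[] args) {
        Matcher m = QUESTION_PATTERN.matcher(
                "field[class] = int2  (index) // Browse Object Classification");

        // groupCount() counts the capturing groups in the pattern itself,
        // so it reports 6 before any match is even attempted.
        System.out.println("groupCount = " + m.groupCount());

        // This line contains no ':', so the mandatory suffix group cannot match.
        boolean matched = m.lookingAt();
        System.out.println("lookingAt = " + matched);

        try {
            // Without a successful match there is no group text to read.
            m.group(1);
        } catch (IllegalStateException e) {
            System.out.println("group(1) threw IllegalStateException");
        }
    }
}
```

Run as is, it prints groupCount = 6 and lookingAt = false, then reports the IllegalStateException. Only call group(n) inside a branch where the match succeeded.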

2 Answers


It's not entirely clear to me what you're after, but I came up with the following pattern, and it accepts all of your input examples:

field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?

using this code:

    String columnName = null;
    String columnType = null;
    String columnPrecision = null;
    String columnScale = null;
    // Pattern columnTypePattern =
    // Pattern.compile(".*(float|int|char)([0-9]|[1-9][0-9])");
    // field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))
    Pattern columnTypePattern = Pattern
            .compile("field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?");
    Matcher columnTypeMatcher = columnTypePattern.matcher(line);

    boolean match = columnTypeMatcher.lookingAt();
    System.out.println("Match: " + match);

    if (match) {
        int count = columnTypeMatcher.groupCount();
        if (count > 1) {
            columnName = columnTypeMatcher.group(1);
            columnType = columnTypeMatcher.group(2);
        }
        if (count > 2) {
            columnScale = columnTypeMatcher.group(3);
        }
        if (count > 4) {
            columnPrecision = columnTypeMatcher.group(5);
        }
        System.out.println("Name=" + columnName + "; Type=" + columnType + "; Scale=" + columnScale + "; Precision=" + columnPrecision);
    }

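I think the problem with your regex was that the scale and precision parts need to be optional. The int2 and char20 lines have no :.Nf suffix, so a pattern that requires (:(\.([0-9]))) cannot match them at all.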
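To make the difference concrete, here is a small sketch that runs the question's pattern (suffix mandatory) and this answer's pattern (width and suffix optional) against the five sample lines. The class name OptionalSuffixDemo is hypothetical; both patterns are copied from the post with Java string escaping.

```java
import java.util.regex.Pattern;

class OptionalSuffixDemo {
    // Pattern from the question: the ":.N" suffix group is mandatory.
    static final Pattern MANDATORY = Pattern.compile(
            "field\\[(.*)\\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\\.([0-9])))");
    // Pattern from this answer: the width and the ":.N" suffix are both optional.
    static final Pattern OPTIONAL = Pattern.compile(
            "field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?");

    static final String[] LINES = {
        "field[bii] = float4:.4f_degree  // Galactic Latitude",
        "field[class] = int2  (index) // Browse Object Classification",
        "field[dec] = float8:.4f_degree (key) // Declination",
        "field[name] = char20  (index) // Object Designation",
        "field[dircos1] = float8   // 1st Directional Cosine"
    };

    public static void main(String[] args) {
        for (String line : LINES) {
            boolean mandatory = MANDATORY.matcher(line).lookingAt();
            boolean optional = OPTIONAL.matcher(line).lookingAt();
            System.out.println(mandatory + " / " + optional + "  " + line);
        }
    }
}
```

The mandatory version matches only the bii and dec lines (the ones containing ':'), while the optional version matches all five.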


1 Comment

That did it. I'm still having trouble with the optional stuff.
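On the optional parts: when an optional group takes no part in the match, group(n) returns null. Passing that null to Integer.parseInt throws NumberFormatException, which is what the question's parseInt(columnPrecision) call would hit on the int2 line. A minimal sketch of null-safe extraction using this answer's pattern follows. FieldLineParser, ColumnSpec, width and decimals are hypothetical names chosen for illustration; the question calls group 3 "scale" and the last group "precision".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldLineParser {
    // The accepted answer's pattern; groups 3 and 5 are optional.
    static final Pattern FIELD = Pattern.compile(
            "field\\[(.*)\\] = (float|int|char)([1-9][0-9]?)?(:\\.([0-9]))?");

    // Hypothetical holder for one parsed line; width/decimals stay null when absent.
    static final class ColumnSpec {
        final String name;
        final String type;
        final Integer width;     // digits right after the type, e.g. 4 in "float4"
        final Integer decimals;  // digit after ":.", e.g. 4 in ":.4f"

        ColumnSpec(String name, String type, Integer width, Integer decimals) {
            this.name = name;
            this.type = type;
            this.width = width;
            this.decimals = decimals;
        }

        @Override
        public String toString() {
            return "Name=" + name + "; Type=" + type + "; Width=" + width + "; Decimals=" + decimals;
        }
    }

    // Convert an optional group only if it actually participated in the match.
    static Integer optionalInt(Matcher m, int group) {
        String text = m.group(group);
        return text == null ? null : Integer.valueOf(text);
    }

    // Returns null for lines that are not field definitions (e.g. "<DATA>").
    static ColumnSpec parse(String line) {
        Matcher m = FIELD.matcher(line);
        if (!m.lookingAt()) {
            return null; // never call group() after a failed match
        }
        return new ColumnSpec(m.group(1), m.group(2), optionalInt(m, 3), optionalInt(m, 5));
    }

    public static void main(String[] args) {
        String[] lines = {
            "field[bii] = float4:.4f_degree  // Galactic Latitude",
            "field[class] = int2  (index) // Browse Object Classification",
            "field[name] = char20  (index) // Object Designation"
        };
        for (String line : lines) {
            System.out.println(parse(line));
        }
    }
}
```

The mapping to SQL types can then branch on type and treat a null width or decimals as "use a default", instead of parsing unconditionally.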
field\[(.*)\] = (float|int|char)([0-9]|[1-9][0-9]).*(:(\.([0-9])))

The .* is overly broad, ([0-9]|[1-9][0-9]) is redundant, and I think the parenthesized group that starts with :, together with the .* before it, should be optional.
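One concrete symptom of that ambiguity: Java tries alternatives left to right, so in ([0-9]|[1-9][0-9]).* the single-digit alternative succeeds first and the trailing .* absorbs the remaining digits. A small sketch (AmbiguityDemo is a hypothetical name; AMBIGUOUS is only the width portion of the question's pattern, with the suffix dropped so the char20 line can match):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmbiguityDemo {
    // Width portion of the question's pattern: alternation followed by a greedy .*
    static final Pattern AMBIGUOUS = Pattern.compile(
            "field\\[(.*)\\] = (float|int|char)([0-9]|[1-9][0-9]).*");
    // Tightened version: [^\]]* stops at the first ']' and nothing after the
    // digits competes with them, so the whole digit run lands in group 3.
    static final Pattern TIGHT = Pattern.compile(
            "field\\[([^\\]]*)\\] = (float|int|char)([1-9][0-9]*)");

    static String width(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.lookingAt() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String line = "field[name] = char20  (index) // Object Designation";
        // [0-9] is tried first; since .* can absorb the "0", group 3 is just "2".
        System.out.println("ambiguous: " + width(AMBIGUOUS, line));
        // [1-9][0-9]* consumes both digits, so group 3 is "20".
        System.out.println("tight:     " + width(TIGHT, line));
    }
}
```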

After removing all the ambiguity, I get

field\[([^\]]*)\] = (float|int|char)(0|[1-9][0-9]+)(?:[^:]*(:(\.([0-9]+))))?

1 Comment

I'm still fighting with it, but with that pattern .lookingAt() returns false.
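A likely reason for that false: (0|[1-9][0-9]+) requires at least two digits for any non-zero width, so float4, int2 and float8 cannot match, and only the char20 line gets through. Changing + to * accepts single-digit widths. Also, when the pattern is pasted into Java source, every backslash must be doubled. The sketch below compares the pattern as posted with that one-character change (PlusVersusStarDemo and FIXED are hypothetical names).

```java
import java.util.regex.Pattern;

class PlusVersusStarDemo {
    // The pattern from this answer, escaped for a Java string literal.
    static final Pattern AS_POSTED = Pattern.compile(
            "field\\[([^\\]]*)\\] = (float|int|char)(0|[1-9][0-9]+)(?:[^:]*(:(\\.([0-9]+))))?");
    // Same pattern with + changed to *, so single-digit widths such as "4" match.
    static final Pattern FIXED = Pattern.compile(
            "field\\[([^\\]]*)\\] = (float|int|char)(0|[1-9][0-9]*)(?:[^:]*(:(\\.([0-9]+))))?");

    static final String[] LINES = {
        "field[bii] = float4:.4f_degree  // Galactic Latitude",
        "field[class] = int2  (index) // Browse Object Classification",
        "field[dec] = float8:.4f_degree (key) // Declination",
        "field[name] = char20  (index) // Object Designation",
        "field[dircos1] = float8   // 1st Directional Cosine"
    };

    public static void main(String[] args) {
        for (String line : LINES) {
            System.out.println(AS_POSTED.matcher(line).lookingAt() + " / "
                    + FIXED.matcher(line).lookingAt() + "  " + line);
        }
    }
}
```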
