I have to identify lines from a CSV file that match certain search criteria. The data in the CSV file looks something like this:

Wilbur Smith,Elephant Song,McMillain,1992,1
Wilbur Smith,Birds of Prey,McMillain,1992,1
George Orwell,Animal Farm,Secker & Warburg,1945,1
George Orwell,1984,Secker & Warburg,1949,1

The search criteria look like this:

Orwell,,,,
,Elephant,,,

The first criteria line matches 2 lines, the second matches 1 line. I'm currently reading the file as follows, but without using the criteria above.

br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
    String[] dataItems = line.split(cvsSplitBy);

    // columns: 0 = author, 1 = title, 2 = publisher
    if (dataItems[0].contains(author) && dataItems[1].contains(title) && dataItems[2].contains(publisher)) {
        bk[i++] = line;
        // stop once the fixed-size array is full
        if (i >= bk.length) {break;}
    }
}

I am adding the matches to a fixed-size array. How can I use the criteria as a regular expression to identify a line?

  • You should use a CSV parser, so it can handle embedded commas. Then you should simply use contains() to see if a column value contains the given text, like you're doing now. No need for regex. Commented May 2, 2017 at 22:34
  • I am no java expert. Would you be satisfied with an answer with just a regex to be used? Or do you need the java code for using it, too? Commented May 2, 2017 at 22:37
  • Why do you think you need regex? Commented May 2, 2017 at 22:41
  • Most of the sub strings are empty. So if a publisher for example is empty then a search on a publisher should not occur. I am no expert in Java, but wouldn't a regex be better than using contains | contains | .. etc. Commented May 2, 2017 at 22:45
  • You should just extract this data into classes and then use .contains() or similar methods to search instead of trying to use Regex. This problem isn't the right candidate for using regex IMO. Commented May 2, 2017 at 22:51
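The approach suggested in the comments (split each line, skip empty criteria fields, and use contains() on the rest) could look roughly like the sketch below. The class and method names (CriteriaFilter, matches, filter) are hypothetical, and the code assumes that no field contains an embedded comma, so a plain split(",") is enough; a real CSV parser would be needed otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CriteriaFilter {

    /**
     * Returns true when every non-empty criteria field is contained,
     * ignoring case, in the column at the same position of the line.
     * Hypothetical helper; assumes fields never contain embedded commas.
     */
    static boolean matches(String criteria, String line) {
        // A limit of -1 keeps trailing empty fields: "Orwell,,,," gives 5 fields.
        String[] wanted = criteria.split(",", -1);
        String[] fields = line.split(",", -1);
        if (wanted.length > fields.length) {
            return false; // the line has fewer columns than the criteria
        }
        for (int i = 0; i < wanted.length; i++) {
            String w = wanted[i].trim().toLowerCase(Locale.ROOT);
            if (w.isEmpty()) {
                continue; // empty criteria field: this column is not searched
            }
            if (!fields[i].toLowerCase(Locale.ROOT).contains(w)) {
                return false;
            }
        }
        return true;
    }

    /** Collects all matching lines in a growable list instead of a fixed-size array. */
    static List<String> filter(List<String> lines, String criteria) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (matches(criteria, line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Wilbur Smith,Elephant Song,McMillain,1992,1",
                "Wilbur Smith,Birds of Prey,McMillain,1992,1",
                "George Orwell,Animal Farm,Secker & Warburg,1945,1",
                "George Orwell,1984,Secker & Warburg,1949,1");
        System.out.println(filter(lines, "Orwell,,,,").size());   // prints 2
        System.out.println(filter(lines, ",Elephant,,,").size()); // prints 1
    }
}
```

Using a List here sidesteps the fixed-size array from the question; if a cap on the number of results is wanted, the loop in filter can stop once the list reaches that size.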

1 Answer

Seems like I'm in a minority here :) but here is a version using a regex in case you are interested.

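// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.regex.Matcher and java.util.regex.Pattern.
BufferedReader br = null;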

String[] searches = new String[]{
            ",Animal Farm,Secker & Warburg,,",
            ",,Secker & Warburg,,",
            "George Orwell,,,,1",
            "Wilbur Smith,,,,",
            ",,,,1",
            "random,,,,1",
            "WILBUR SMITH,Birds of PREY,mcmillain,1992,1",
            ",,,,"
};

try {
    br = new BufferedReader(new FileReader("file.txt"));
    String line = null;

    // to store results of matches for easier output
    String[] matchResult = new String[searches.length];

    while ((line = br.readLine()) != null) {
        // go through all searches
        for (int i = 0; i < searches.length; i++) {

            /*
             *  Replace every comma that is not preceded by a letter or digit,
             *  or not followed by a letter, digit or comma, with ".*,.*",
             *  so that empty search fields match any text.
             */
            String searchPattern = searches[i].replaceAll("(?<![a-zA-Z0-9]),|,(?![a-zA-Z0-9,])", ".*,.*");

            // do the match on the line
            Matcher m = Pattern.compile("^" + searchPattern + "$", Pattern.CASE_INSENSITIVE).matcher(line);

            // store the result
            matchResult[i] = m.matches() ? "matches" : "no match";
        }

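        // Note: the format string has only seven result columns, so matchResult[7]
        // (the ",,,," search) is computed but not printed.
        System.out.println(String.format("%-50s %-10s %-10s %-10s %-10s %-10s %-10s %-10s", line, 
                    matchResult[0], matchResult[1], matchResult[2], matchResult[3], matchResult[4], matchResult[5], matchResult[6], matchResult[7]));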
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // br is still null if the file could not be opened
    if (br != null) {
        try {
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Output

Wilbur Smith,Elephant Song,McMillain,1992,1        no match   no match   no match   matches    matches    no match   no match  
Wilbur Smith,Birds of Prey,McMillain,1992,1        no match   no match   no match   matches    matches    no match   matches   
George Orwell,Animal Farm,Secker & Warburg,1945,1  matches    matches    matches    no match   matches    no match   no match  
George Orwell,1984,Secker & Warburg,1949,1         no match   matches    matches    no match   matches    no match   no match 
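Because `.` also matches commas, the `.*` pieces in the pattern above are not confined to a single column, and any regex metacharacters in a search term are interpreted as regex syntax. A possible variation, sketched below under the assumption that fields never contain embedded commas, builds the pattern field by field: empty fields become `[^,]*`, and non-empty fields are escaped with Pattern.quote and allowed to appear anywhere inside their own column. The class name FieldRegex and its compile method are hypothetical, and this substring-per-column behavior is a deliberate change from the code above, not a reproduction of it.

```java
import java.util.regex.Pattern;

class FieldRegex {

    /**
     * Builds one case-insensitive pattern from a criteria line such as "Orwell,,,,".
     * Hypothetical helper; assumes fields never contain embedded commas.
     */
    static Pattern compile(String criteria) {
        // A limit of -1 keeps trailing empty fields.
        String[] wanted = criteria.split(",", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wanted.length; i++) {
            if (i > 0) {
                sb.append(','); // literal column separator
            }
            sb.append("[^,]*"); // any text that stays inside this column
            if (!wanted[i].isEmpty()) {
                // quote the search text so characters like '.' or '&' are literal
                sb.append(Pattern.quote(wanted[i])).append("[^,]*");
            }
        }
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        String line = "George Orwell,Animal Farm,Secker & Warburg,1945,1";
        System.out.println(compile("Orwell,,,,").matcher(line).matches());            // prints true
        System.out.println(compile(",,secker & warburg,,").matcher(line).matches());  // prints true
        System.out.println(compile(",Elephant,,,").matcher(line).matches());          // prints false
    }
}
```

Since matches() requires the whole line to match, a criteria line with five fields only matches lines that also have exactly five columns.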

1 Comment

Thanks. Whilst I appreciate what the others were saying with regard to performance and complexity, I simply wanted to see the other side of the coin, i.e. regex usage.