4

I want to parse the lines of a file using parsingMethod.

test.csv

 Frank George,Henry,Mary / New York,123456
,Beta Charli,"Delta,Delta Echo
", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln ",Alpha

This is how I read the lines:

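    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class name is a placeholder; the original post omitted the declaration.
    public class CsvTest {

    public static void main(String[] args) throws Exception {
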
        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));   
        String line2;
        while ((line2= reader.readLine()) !=null) {
            String[] tab = parsingMethod(line2, ",");
            for (String i : tab) {
                System.out.println( i );
            }
        }


    }

    public static String[] parsingMethod(String line,String parser) {

        List<String> liste = new LinkedList<String>();
        String patternString ="(([^\"][^"+parser+ "]*)|\"([^\"]*)\")" +parser+"?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher =pattern.matcher(line);

        while (matcher.find()) {
            if(matcher.group(2) != null){
                liste.add(matcher.group(2).replace("\n","").trim());
            }else if(matcher.group(3) != null){
                liste.add(matcher.group(3).replace("\n","").trim());
            }       
        }

        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }
}

Output:

Frank George
Henry
Mary / New York
123456

Beta Charli
Delta
Delta Echo
"
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
"
Alpha
Delta
Delta Echo

I want to remove the stray " from the output. Can anyone help me improve my pattern?


Expected output

Frank George
Henry
Mary / New York
123456
Beta Charli
Delta
Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
Alpha
Delta
Delta Echo

Output for line 3

25/11/1964
15/12/1964

40
000
000.00


0.0975
2

King
Lincoln
12
  • How does this compile? You are assigning an array of strings to patternString: String patternString = "(([^\"][^","]*)|\"([^\"]*)\")","?"; Commented May 15, 2013 at 8:54
  • This code does not compile; there is an error on: String patternString = "(([^\"][^","]*)|\"([^\"]*)\")","?"; Commented May 15, 2013 at 9:09
  • @Joan: I am not good with regex patterns, but this code compiles. I copied the output from the console. Commented May 15, 2013 at 9:41
  • @RicardoCacheira: I am not good with regex patterns, but this code compiles. I copied the output from the console. Commented May 15, 2013 at 9:42
  • As anana says, you must escape the double backslashes, but even if you do that, the program will not return any output. I tried. Commented May 15, 2013 at 9:50

3 Answers

2

Your code didn't compile, but that was because some of the " characters were not escaped.

But this should do the trick:

String patternString = "(?:^.,|)([^\"]*?|\".*?\")(?:,|$)";
Pattern pattern = Pattern.compile(patternString, Pattern.MULTILINE);

(?:^.,|) is a non-capturing group that matches either a single character followed by a comma at the start of a line, or nothing.

([^\"]*?|\".*?\") is a capturing group that matches either a run of characters containing no ", or anything between a pair of ".

(?:,|$) is a non-capturing group that matches a comma or the end of a line.

Note: ^ and $ match at the start and end of each line only when the pattern is compiled with the Pattern.MULTILINE flag. By default they anchor to the input as a whole.
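
To make the note about Pattern.MULTILINE concrete, here is a minimal sketch. The class name MultilineDemo, the helper findAll and the sample input are made up for illustration and are not part of the code in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class MultilineDemo {

    // Collects every match of the regex in the input, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "first\nsecond";
        // No flags: ^ and $ anchor to the input as a whole, so no single word spans it.
        System.out.println(findAll("^\\w+$", 0, input));
        // MULTILINE: ^ and $ also match just after and just before the line break.
        System.out.println(findAll("^\\w+$", Pattern.MULTILINE, input));
    }
}
```

The first call prints [] and the second prints [first, second]. Note that readLine() already strips line terminators, so the flag only matters when a string containing several lines is handed to the matcher.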


6 Comments

I am not good with patterns. I have now corrected my code (see the question). I put your patternString into my code and it gives me the error `java.lang.IndexOutOfBoundsException: No group 2`.
This pattern has only one capturing group, so there is no group 2. To check whether a group exists, look at matcher.groupCount().
And if you really want multiple groups, use this: String patternString = "(?:(?:^.,|)([^\"]*?|\".*?\")(?:,|$))+"; (I didn't test this one, but it should work.)
@B8rede: Not working, same error: java.lang.IndexOutOfBoundsException: No group 2
Use if(matcher.groupCount() >= 2){ liste.add(matcher.group(2).replace("\n","").trim()); }else if(matcher.groupCount() >= 3){ liste.add(matcher.group(3).replace("\n","").trim()); }. It will check whether there is a group 2 and use it if it's there. Same for 3.
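
A side note on groupCount(), as a sketch rather than a fix for the pattern above: according to the Javadoc, Matcher.groupCount() returns the number of capturing groups in the pattern, not the number of groups that took part in the current match. A group that exists but did not participate comes back as null from group(n). The class GroupCountDemo, the helper firstMatchedGroup and the toy pattern (a)|(b) are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class GroupCountDemo {

    // Returns the first capturing group that participated in the current match,
    // or null if none did. Must be called after a successful find().
    static String firstMatchedGroup(Matcher m) {
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                return m.group(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(a)|(b)").matcher("b");
        System.out.println(m.groupCount());   // number of groups in the pattern
        if (m.find()) {
            System.out.println(m.group(1));   // group 1 exists but did not participate
            System.out.println(firstMatchedGroup(m));
        }
    }
}
```

This prints 2, null and b. Asking for a group index larger than groupCount() is what raises the IndexOutOfBoundsException mentioned in the comments.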
1

I can't reproduce your result, but I think you may want to leave the quotes out of the second captured group, like this:

"(([^\"][^"+parser+ "]*)|\"([^\"]*))\"" +parser+"?"

Edit: Sorry, this won't work. Maybe you want to allow any number of non-quote characters in the first group as well, like this: (([^,\"]*)|\"([^\"]*)\"),?
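
One variation that may be worth trying, sketched under two assumptions: every quoted field opens and closes on the same line, and empty fields can be skipped. Listing the quoted alternative first means it is attempted before the plain one, and using + instead of * avoids empty matches. The class QuotedFirstParser and the method parseLine are hypothetical names, and this is not the pattern from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class QuotedFirstParser {

    // Group 1: text between a pair of quotes. Group 2: a run without commas or quotes.
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"|([^,\"]+)");

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            String raw = m.group(1) != null ? m.group(1) : m.group(2);
            fields.add(raw.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"40,000,000.00\",0.0975,2,\"King, Lincoln \",Alpha";
        for (String field : parseLine(line)) {
            System.out.println(field);
        }
    }
}
```

For that sample line it prints 40,000,000.00, then 0.0975, 2, King, Lincoln and Alpha, one field per line. It will still misread a line that starts with the closing quote of a field opened on the previous line, because that quote gets paired with the next one on the same line.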

6 Comments

I'm sorry but I don't understand what you're saying. If my solution didn't work I'm sure someone else will bother to spoonfeed it to you.
Please check my code. With the patternString in my code I cannot get rid of the ". I want to remove the " as well. And with your patternString my output looks like , , ,000.00 ,
The problem is I can't reproduce your result. Can you output line 3 as it is passed to parsingMethod? What if you change it to this (([^,\"]*)|\"([^\"]*)\"),? ?
That makes no sense to me. Try to output line 3 so I can reproduce your results.
I don't mean the output, but the input, the String that is given as an argument to parsingMethod().
1

As far as I can see, the lines are related, so try this:

    public static void main(String[] args) throws Exception {

        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        StringBuilder line = new StringBuilder();
        String lineRead;
        while ((lineRead = reader.readLine()) != null) {
            line.append(lineRead);
        }
        String[] tab = parsingMethod(line.toString());
        for (String i : tab) {
            System.out.println(i);
        }


    }

    public static String[] parsingMethod(String line) {

        List<String> liste = new LinkedList<String>();
        String patternString = "(([^\"][^,]*)|\"([^\"]*)\"),?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);

        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }

        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }

Output:

Frank George
Henry
Mary / New York
123456
Beta Charli
Delta,Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King, Lincoln
Alpha

Since Delta,Delta Echo is inside quotation marks, it should appear on a single line, just like King, Lincoln.
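
If the file still has to be read line by line, one possible middle ground is to keep appending physical lines to the current record until the number of " characters seen is even, and only then split it. The sketch below assumes that quotes are never escaped inside a field (no doubled "") and that a quoted field is meant to stay a single value. The class LogicalRecords and its methods are hypothetical names; a dedicated CSV library would likely be more robust.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original post.
class LogicalRecords {

    static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') n++;
        }
        return n;
    }

    // Joins physical lines into records; a record is complete once its quotes balance.
    static List<String> joinRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean pending = false;   // true while a quoted field is still open
        int quotes = 0;
        for (String line : lines) {
            if (pending) current.append('\n');
            current.append(line);
            quotes += countQuotes(line);
            pending = quotes % 2 != 0;
            if (!pending) {
                records.add(current.toString());
                current.setLength(0);
                quotes = 0;
            }
        }
        if (pending) records.add(current.toString());   // unbalanced tail, kept as-is
        return records;
    }

    // Splits on commas outside quotes; quotes are dropped, newlines removed, fields trimmed.
    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (char c : record.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString().replace("\n", "").trim());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString().replace("\n", "").trim());
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                " Frank George,Henry,Mary / New York,123456",
                ",Beta Charli,\"Delta,Delta Echo",
                "\", 25/11/1964, 15/12/1964,\"40,000,000.00\",0.0975,2,\"King, Lincoln \",Alpha");
        for (String record : joinRecords(lines)) {
            System.out.println(splitRecord(record));
        }
    }
}
```

With the three sample lines this yields two records: the first line on its own, and lines 2 and 3 joined. The second record starts with an empty field because line 2 begins with a comma, and Delta,Delta Echo stays together as one field.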

5 Comments

Your patternString and mine are the same. You get this output only when you treat the whole file content as one String. You have to read the file line by line and pass each line to parsingMethod.
Sorry mate, but you said that the quotation mark at the beginning of line 3 closes the quotation started on line 2, which tells me the lines are related. If not, you don't know what you want!
Yes, the quotation mark at the beginning of line 3 closes the quotation started on line 2, but it's line 3 and I have to parse each line separately. That is why I am reading each line and parsing it with the method. Sorry for not being clear. I hope you understand now and can help me solve this.
It doesn't make sense to me, but OK! Just so I understand, can you tell me what those lines are and what you want to do? I'll try to help.
What produced the lines in test.csv?
