I have the following issue: I am trying to parse a .csv file in Java and store three specific columns of it in a two-dimensional array. The code for the method looks like this:

    public static void parseFile(String filename) throws IOException{
    FileReader readFile = new FileReader(filename); 
    BufferedReader buffer = new BufferedReader(readFile);
    String line; 
    String[][] result = new String[10000][3];
    String[] b = new String[6];

    for(int i = 0; i<10000; i++){
            while((line = buffer.readLine()) != null){
                b = line.split(";",6);
                System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]); // Here is where the outofbounds exception occurs...


                result[i][0] = b[0];
                result[i][1] = b[3];    
                result[i][2] = b[4];
                }
            }
            buffer.close();

}

I feel like I have to specify this: the .csv file is HUGE. It has 32 columns and (almost) 10,000 entries (!). When parsing, I keep getting the following:

    XXXXX CHUNKS OF SUCCESSFULLY EXTRACTED CODE
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:3
    at ParseCSV.parseFile(ParseCSV.java:24)
    at ParseCSV.main(ParseCSV.java:41)

However, I realized that SOME of the content in the file has a strange format: for instance, some of the texts contain what look like line breaks, although no newline character seems to be involved. If I delete those blank lines manually, the output generated (before the error message appears) fills the array up until the next blank line... Does anyone have an idea how to fix this? Any help would be greatly appreciated...

4 Answers

2

Your first problem is that you probably have at least one blank line in your csv file. You need to replace:

    b = line.split(";", 6);

with

    b = line.split(";");
    // length is a field on arrays, not a method
    if (b.length < 5) {
        System.err.println("Warning, line has only " + b.length +
                           " entries, so skipping it:\n" + line);
        continue;
    }

If your input can legitimately have new lines or embedded semi-colons within your entries, that is a more complex parsing problem, and you are probably better off using a third-party parsing library, as there are several very good ones.

If your input is not supposed to have new lines in it, the problem probably is \r. Windows uses \r\n to represent a new line, while most other systems just use \n. If multiple people/programs edited your text file, it is entirely possible to end up with stray \r by themselves, which are not easily handled by most parsers.

An easy way to check whether that's your problem is to do this before you split your line:

    line = line.replace("\r", "");
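
One caveat worth checking, assuming the file is read with BufferedReader.readLine() as in the question: readLine() treats a lone \r as a line terminator in its own right. A stray \r therefore never reaches line, so the replace above may have nothing left to remove. Instead it shows up as an unexpected short line. The sketch below illustrates this; the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StrayCarriageReturnDemo {
    // Hypothetical helper: collects whatever readLine() hands back for the given text.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // One record with a stray \r in the middle, then a normal record.
        String text = "1;a;b;Title;Desc\rtail;x\r\n2;c;d;Title2;Desc2\n";
        for (String line : readLines(text)) {
            // The stray \r splits the first record into two "lines".
            System.out.println(line.split(";").length + " fields: " + line);
        }
    }
}
```

If the short fragments in your output appear where a text editor shows no visible break, stray \r characters are a plausible cause, and the length check on b is what keeps them from crashing the loop.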

If this is a process you are repeating many times, you might need to consider using a Scanner (or library) instead to get more efficient text processing. Otherwise, you can make do with this.


7 Comments

New lines in CSV are possible. For example, these are valid CSV records (copied from here en.wikipedia.org/wiki/…): 1997,Ford,E350,"Go get one now they are going fast" 1997,Ford,E350,"Super, luxurious truck" Also think about what you will do if a column value contains ; or ".
That's a good point. I was assuming he had control over his input, as he mentioned that no new line characters were embedded in the input. As a generic solution, however, it should be noted that it only works when you don't have any extra new lines or semi-colons.
Thank you for your feedback. Your solution works: the output covers the entire .csv file and the warning messages are successfully printed to the console when running!
... What type of regex would I need to use in split() in order to ignore the ; that are inside the strings while keeping the split on the ; between the cells of the .csv file?
Unfortunately, once you have to deal with the escaping rules, you have a context-sensitive parsing problem, and those are not very well handled with reg-exes. If you have to deal with them, your best bet is using a CSV parser, such as the Commons parser mentioned below.
0

When a field in your CSV file contains a line break, then after this line

    while((line = buffer.readLine()) != null){

the variable line will not hold a complete CSV record, only a fragment of text that may contain no ; at all.

For example, if you have this file

    column1;column2;column
    3 value

then after the first iteration line will hold

    column1;column2;column

and after the second iteration it will hold 3 value.

When you call "3 value".split(";",6), it returns an array with one element, and when you later access b[3] it throws the exception.

The CSV format has many small details that take a lot of time to implement correctly. This article gives a good overview of the rules, with examples: http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules_and_examples

I would recommend using a ready-made CSV parser such as this one:

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html
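
To give a feel for why quoted fields push you toward a library, here is a minimal sketch of a quote-aware reader using only the standard library. It assumes fields may be wrapped in double quotes, that "" inside quotes is an escaped quote, and that the whole file has already been loaded into a String. QuotedCsvSketch and parse are made-up names, and this is not how Commons CSV is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSketch {
    // Splits text into records and fields; delimiters and newlines inside quotes are data.
    static List<List<String>> parse(String text, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true while the current record has unflushed content

        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (inQuotes) {
                if (ch != '"') {
                    field.append(ch);                 // plain data, even ';' or '\n'
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"');                // "" is an escaped quote
                    i++;
                } else {
                    inQuotes = false;                 // closing quote
                }
            } else if (ch == '\n') {
                record.add(field.toString());         // end of field and of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                pending = false;
            } else if (ch == '\r') {
                // Simplification: a \r outside quotes is dropped, which covers \r\n endings.
            } else {
                pending = true;
                if (ch == '"') {
                    inQuotes = true;                  // opening quote
                } else if (ch == delimiter) {
                    record.add(field.toString());     // end of field
                    field.setLength(0);
                } else {
                    field.append(ch);
                }
            }
        }
        if (pending) {                                // last record without trailing newline
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1;\"Super; luxurious\ntruck\";x\n2;\"say \"\"hi\"\"\";y";
        for (List<String> rec : parse(text, ';')) {
            System.out.println(rec.size() + " fields, second field: " + rec.get(1));
        }
    }
}
```

Even this small version needs a state flag and one character of lookahead, and it still ignores lone \r line breaks and malformed quoting. Those are the details a maintained parser handles for you.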


0

String's split(pattern, limit) method returns an array sized to the number of tokens found, up to the number specified by the limit parameter. The limit is the maximum, not the minimum, number of array elements returned.

"1,2,3" split with (",", 6) will return an array of 3 elements: "1", "2" and "3".

"1,2,3,4,5,6,7" will return 6 elements: "1", "2", "3", "4", "5" and "6,7". The last element looks goofy because the split method stopped splitting after the fifth element and returned the rest of the source string as the sixth.

An empty line is represented as an empty string (""). Splitting "" will return an array of 1 element, the empty string.
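
The cases above can be checked with a few lines of Java. This is only a sketch: SplitLimitDemo and fieldCount are made-up names. The last two calls add one related detail that matters if you drop the limit argument: with a limit of 0, which is what split(";") uses, trailing empty strings are discarded, while a negative limit keeps them.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: how many elements split() returns for a given line and limit.
    static int fieldCount(String line, String delimiter, int limit) {
        return line.split(delimiter, limit).length;
    }

    public static void main(String[] args) {
        // The limit is an upper bound, so three tokens give three elements.
        System.out.println(Arrays.toString("1,2,3".split(",", 6)));         // [1, 2, 3]
        // Splitting stops after five cuts; the remainder becomes the sixth element.
        System.out.println(Arrays.toString("1,2,3,4,5,6,7".split(",", 6))); // [1, 2, 3, 4, 5, 6,7]
        // An empty line yields a single empty string, so b[3] is out of bounds.
        System.out.println(fieldCount("", ";", 6));                         // 1
        // Limit 0 drops trailing empty cells; a negative limit keeps them.
        System.out.println(fieldCount("a;b;;", ";", 0));                    // 2
        System.out.println(fieldCount("a;b;;", ";", -1));                   // 4
    }
}
```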

In your case, the string array created here

    String[] b = new String[6];

and assigned to b is replaced by the array returned by

    b = line.split(";",6);

and meets its ultimate fate at the hands of the garbage collector, unseen and unloved.

Worse, in the case of the empty lines, it's replaced by a one-element array, so

    System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);

blows up when trying to access b[3].

The suggested solution is to either

    while ((line = buffer.readLine()) != null) {
        if (line.length() != 0) {
            b = line.split(";", 6);
            // can still throw on a non-empty line with fewer than 5 fields
            System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);
            ...
        }
    }

or (better because the previous could trip over a malformed line)

    while ((line = buffer.readLine()) != null) {
        b = line.split(";", 6);
        if (b.length == 6) { // length is a field on arrays, not a method
            System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);
            ...
        }
    }

You might also want to think about the for loop around the while. I don't think it's doing you any good.

    while((line = buffer.readLine()) != null)

is going to read every line in the file, so

    for(int i = 0; i<10000; i++){
        while((line = buffer.readLine()) != null){

is going to read every line in the file on the first pass of the for loop, with i still 0, so every row is written to result[0]. The for loop then makes 9999 more attempts to read the file, finds nothing new each time, and exits the while loop immediately.

The for loop also does not protect you from files with more than 10000 lines: as soon as the row index is advanced inside the while loop, that loop will read a 10001st line and overrun your array. Look into replacing the big array with an ArrayList or Vector, as they will grow to fit your file.
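
Putting these points together, the method could look roughly like the sketch below. It rests on a few assumptions: it returns the rows instead of filling a fixed array, it takes a Reader so it can be tried on a string (for a file, pass new FileReader(filename)), and it accepts any line with at least 5 fields, because only b[0], b[3] and b[4] are used. ParseCsvSketch is a made-up name. Like the original, it still cannot handle quoted fields with embedded ; or line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ParseCsvSketch {
    // Collects ID (column 0), title (column 3) and description (column 4) of each usable line.
    static List<String[]> parse(Reader source) {
        List<String[]> result = new ArrayList<>();   // grows with the file, no 10000 cap
        try (BufferedReader buffer = new BufferedReader(source)) {
            String line;
            // A single loop: one iteration per line, no outer for loop needed.
            while ((line = buffer.readLine()) != null) {
                String[] b = line.split(";", 6);
                if (b.length < 5) {                  // blank or truncated line
                    System.err.println("Skipping line with " + b.length + " field(s): " + line);
                    continue;
                }
                result.add(new String[] { b[0], b[3], b[4] });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "1;x;y;First title;First description;rest;of;line\n"
                      + "\n"
                      + "2;x;y;Second title;Second description;rest\n";
        for (String[] row : parse(new StringReader(sample))) {
            System.out.println("ID: " + row[0] + " Title: " + row[1] + " Description: " + row[2]);
        }
    }
}
```

The try-with-resources block closes the reader even when an exception is thrown, which the original buffer.close() at the end of the method does not guarantee.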


0

Please check b.length>0 before accessing b[].

1 Comment

Are you sure this would be solved just by checking that the array size is greater than 0? He is getting java.lang.ArrayIndexOutOfBoundsException:3. You might want to reconsider your answer.
