I am new to Java 8 and am trying to implement a requirement with Streams. I have a CSV file with thousands of records, in this format:

DepId,GrpId,EmpId,DepLocation,NoofEmployees,EmpType
===
D100,CB,244340,USA,1000,Contract
D101,CB,543126,USA,1900,Permanent
D101,CB,356147,USA,1800,Contract
D100,DB,244896,HK,500,SemiContract
D100,DB,543378,HK,100,Permanent

My requirement is to filter the records on two conditions: a) EmpId starts with "244" or "543", and b) EmpType is "Contract" or "Permanent".

I tried the following:

try (Stream<String> stream = Files.lines(Paths.get(fileAbsolutePath))) {
    list = stream
        .filter(line -> line.contains("244") || line.contains("543"))
        .collect(Collectors.toList());
}

This filters the employees on 244 and 543, but because I am using contains, it matches those digits anywhere in the line, not just at the start of the EmpId column. It may therefore also pick up rows where some other column happens to contain them.

Similarly, because I am reading line by line, I see no way to enforce that EmpType is either "Permanent" or "Contract".

Am I missing any advanced options?

  • Using regex is the neatest way to do this. Otherwise, split each String (representing one line) on "," and compare the substring at index 2. Commented Jul 10, 2018 at 1:11

2 Answers

You can do it like this:

Pattern comma = Pattern.compile(",");
// EmpId must start with 244 or 543 and continue with digits
Pattern empNum = Pattern.compile("(244|543)\\d+");
// EmpType must be exactly Contract or Permanent (so SemiContract is rejected)
Pattern empType = Pattern.compile("(Contract|Permanent)");
try (Stream<String> stream = Files.lines(Paths.get("C:\\data\\sample.txt"))) {
    List<String> result = stream.skip(2)               // header row and the "===" line
            .map(l -> comma.split(l))
            .filter(s -> s.length > 5)                 // ignore blank or short lines
            .filter(s -> empNum.matcher(s[2]).matches())
            .filter(s -> empType.matcher(s[5]).matches())
            .map(s -> String.join(",", s))             // rebuild the original line
            .collect(Collectors.toList());
    System.out.println(result);
} catch (IOException e) {
    e.printStackTrace();
}

First, read the file and skip the 2 header lines. Then split each line on the , character and filter the rows by EmpId and EmpType. Next, join the tokens back together to form the line, and finally collect the lines into a List.
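
The same steps can also be sketched without regex, using startsWith for EmpId and a Set lookup for EmpType. This is only an illustration under some assumptions: the class name EmployeeCsvFilter and its filter method are made up for the example, the input is passed as a List of lines instead of being read from a file, and fields are assumed not to contain quoted commas.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
final class EmployeeCsvFilter {
    // Column positions: DepId,GrpId,EmpId,DepLocation,NoofEmployees,EmpType
    private static final int EMP_ID = 2;
    private static final int EMP_TYPE = 5;
    private static final Set<String> ALLOWED_TYPES =
            new HashSet<>(Arrays.asList("Contract", "Permanent"));

    // Keeps data lines whose EmpId starts with 244 or 543 and whose EmpType is allowed.
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .skip(2)                               // header row and the "===" line
                .filter(line -> {
                    String[] f = line.split(",", -1);  // -1 keeps trailing empty fields
                    if (f.length <= EMP_TYPE) {
                        return false;                  // blank or malformed line
                    }
                    String empId = f[EMP_ID].trim();
                    return (empId.startsWith("244") || empId.startsWith("543"))
                            && ALLOWED_TYPES.contains(f[EMP_TYPE].trim());
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "DepId,GrpId,EmpId,DepLocation,NoofEmployees,EmpType",
                "===",
                "D100,CB,244340,USA,1000,Contract",
                "D101,CB,543126,USA,1900,Permanent",
                "D101,CB,356147,USA,1800,Contract",
                "D100,DB,244896,HK,500,SemiContract",
                "D100,DB,543378,HK,100,Permanent");
        filter(sample).forEach(System.out::println);
    }
}
```

With a real file, the List could be replaced by the Stream from Files.lines inside a try-with-resources block, as in the snippet above. The Set lookup is an exact match, so "SemiContract" is rejected even though it contains "Contract".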

1 Comment

Perfect.. Thank you :)

The elegant way is regex, which I would skip for now. The less elegant way, using the Stream API, is as follows:

list = stream.filter(line -> {
    String empId = line.split(",")[2];
    return empId.startsWith("244") || empId.startsWith("543");
}).collect(Collectors.toList());

The shorter way with the Stream API (pointed out by shmosel) is to use a mini regex:

list = stream.filter(line -> line.split(",")[2].matches("(244|543).*"))
             .collect(Collectors.toList());

2 Comments

Or line -> line.split(",")[2].matches("(244|543).*")
@shmosel or line -> line.matches("([^,]*,){2}(244|543).*")
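
The whole-line regex idea from the comments could be extended to cover the EmpType condition as well, so that no split is needed. The sketch below is one possible way to write it, not something taken from the thread: the class name WholeLineRegexFilter and the constant ROW are invented for the example, and it assumes exactly six comma-separated fields with no quoted commas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class illustrating a single precompiled whole-line pattern.
final class WholeLineRegexFilter {
    // Skip fields 1-2, require field 3 (EmpId) to start with 244 or 543,
    // skip fields 4-5, require field 6 (EmpType) to be exactly Contract or Permanent.
    private static final Pattern ROW = Pattern.compile(
            "(?:[^,]*,){2}(?:244|543)[^,]*,(?:[^,]*,){2}(?:Contract|Permanent)");

    static List<String> filter(List<String> lines) {
        return lines.stream()
                // matches() must consume the entire line, so no ^ or $ is needed
                .filter(line -> ROW.matcher(line).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "DepId,GrpId,EmpId,DepLocation,NoofEmployees,EmpType",
                "===",
                "D100,CB,244340,USA,1000,Contract",
                "D100,DB,244896,HK,500,SemiContract",
                "D100,DB,543378,HK,100,Permanent");
        filter(sample).forEach(System.out::println);
    }
}
```

Because [^,]* cannot cross a comma, the pattern is tied to column positions, and the header and "===" lines simply fail to match, so skip(2) is not required here. Compiling the Pattern once also avoids recompiling the regex for every line, which String.matches would do.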
