I am trying to modify a list of strings so that each element is replaced by a substring of itself. Here is what I'm trying to do:

List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");

paychecks.replaceAll(paycheck -> paycheck.subString("insert here"))

I've tried several things in place of "insert here", but I either get exceptions or the IDE marks the code with red error lines. What I want is the part of each paycheck ID after EMP_ and before the next _. Ideally the result should look like this:

[61299, 5512, 99993, 831]

Update (second attempt):

paychecks.forEach(paycheck -> 
                      paycheck.replaceAll(paycheck, paycheck.substring(paycheck.indexOf("Paycheck_Box_"),
                         paycheck.indexOf("Paycheck_Box_" + "\\[(.*?)\\]" + "_")))))

Error thrown:

    java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 26   
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)
3
  • How married are you to using subString()? It'd probably be cleaner just to use replaceAll() and a regexp pattern. Commented Mar 16, 2021 at 19:07
  • I'm not too married haha, but what is the regex pattern to ignore everything between 2 things? Commented Mar 16, 2021 at 19:08
  • Actually, I find it more readable to break it into two calls, one to trim the front off, and one to trim the back off, e.g. people.replaceAll(person -> person.replaceAll("^.*EMP_", "").replaceAll("_.*$", "")); Commented Mar 16, 2021 at 19:15

5 Answers

2

Personally, I'm bad at writing, and even worse at reading regular expressions, so rather than trying to make the replacement efficient, I'd prioritize human readability.

Unless I'm looking at modifying a really large set of data, I'd do something like:

List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");

paychecks.replaceAll(paycheck -> paycheck
        .replaceFirst("^Paycheck_Box_EMP_", "") // remove prefix
        .replaceFirst("_.*$", ""));             // remove suffix

System.out.println(paychecks);      // [61299, 5512, 99993, 831]

You could further refine the prefix and suffix regexps, depending on how precisely you know the format.

For instance, in your updated question the prefix is always constant, so you could use a simple replace() call instead. Likewise, if you know the suffix is always numeric, you could use [0-9]* instead of .*.
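
As a rough sketch of that refinement, assuming the prefix really is always the literal Paycheck_Box_EMP_ and the suffix is always an underscore followed by digits (the class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixSuffixDemo {

    // Hypothetical helper: strips the constant prefix literally,
    // then removes a trailing "_<digits>" suffix with a narrow regex.
    static String employeeId(String paycheck) {
        return paycheck
                .replace("Paycheck_Box_EMP_", "")  // literal replacement, no regex involved
                .replaceFirst("_[0-9]*$", "");     // drop the numeric suffix at the end
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_5512_221",
                "Paycheck_Box_EMP_99993_881",
                "Paycheck_Box_EMP_831_141"));

        // List.replaceAll swaps every element for the helper's return value
        paychecks.replaceAll(PrefixSuffixDemo::employeeId);

        System.out.println(paychecks);  // [61299, 5512, 99993, 831]
    }
}
```

The narrower suffix pattern means an entry with an unexpected, non-numeric tail is left partly intact rather than silently truncated, which may or may not be what you want.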


2 Comments

Oh are you basically removing everything before EMP and everything after the second _?
Or in a single replacement operation, paychecks.replaceAll(p -> p.replaceFirst("^Paycheck_Box_EMP_(\\d+)_.*", "$1")); You could also think about whether matching the prefix is important here and decide to just extract the first number: paychecks.replaceAll(p -> p.replaceFirst(".*?(\\d+).*", "$1"));
2

If I understand the task correctly, you want to have the ##### from Paycheck_Box_EMP_#####_451.

So you do not want to replace something, what you want is to extract something, right?

This should work like this:

List<String> paychecks = new ArrayList<>();
paychecks.add( "Paycheck_Box_EMP_61299_451" );
paychecks.add( "Paycheck_Box_EMP_5512_221" );
paychecks.add( "Paycheck_Box_EMP_99993_881" );
paychecks.add( "Paycheck_Box_EMP_831_141" );

final var pattern = Pattern.compile( "Paycheck_Box_EMP_(\\d{3,5})_\\d{3}" );
paychecks = paychecks.stream()
  .map( paycheck -> pattern.matcher( paycheck ) )
  .filter( matcher -> matcher.find() )
  .map( matcher -> matcher.group( 1 ) )
  .collect( Collectors.toList() );

Or, if you insist on using List.replaceAll():

List<String> paychecks = new ArrayList<>();
paychecks.add( "Paycheck_Box_EMP_61299_451" );
paychecks.add( "Paycheck_Box_EMP_5512_221" );
paychecks.add( "Paycheck_Box_EMP_99993_881" );
paychecks.add( "Paycheck_Box_EMP_831_141" );

final var pattern = Pattern.compile( "Paycheck_Box_EMP_(\\d{3,5})_\\d{3}" );
paychecks.replaceAll( paycheck -> 
{
  var matcher = pattern.matcher( paycheck );
  matcher.find();
  return matcher.group( 1 );
} );

Fixed the Java based on Alex Rudenko's comments.
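
One caveat: Matcher.group() throws IllegalStateException if find() did not succeed, so a single malformed entry would abort the whole replaceAll(). A minimal sketch of a guarded variant follows. It assumes that leaving non-matching entries unchanged is acceptable, and the class and method names are only illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuardedExtractDemo {

    // Same pattern as above, compiled once and reused for every element
    private static final Pattern PATTERN =
            Pattern.compile("Paycheck_Box_EMP_(\\d{3,5})_\\d{3}");

    // Hypothetical helper: returns the captured ID, or the input unchanged if nothing matches
    static String extractOrKeep(String paycheck) {
        Matcher matcher = PATTERN.matcher(paycheck);
        return matcher.find() ? matcher.group(1) : paycheck;
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_831_141",
                "something_else"));

        paychecks.replaceAll(GuardedExtractDemo::extractOrKeep);

        System.out.println(paychecks);  // [61299, 831, something_else]
    }
}
```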

4 Comments

This does not work, Exception in thread "main" java.lang.IllegalStateException: No match found is thrown. You should get the match results and map them to String via flatMap: paychecks = paychecks.stream().flatMap( paycheck -> pattern.matcher( paycheck ).results().map(mr -> mr.group(1)) ).collect(Collectors.toList() );
The problem is with Java code, not with the regexp which is fine, I guess, Matcher::find or Matcher::matches have to be invoked before calling Matcher::group
Got you! Common mistake here …
Reusing a prepared Pattern is a nice optimization, still, its use doesn’t have to be that complicated. paychecks.replaceAll(paycheck -> pattern.matcher(paycheck).replaceFirst("$1")); will extract the first group.
2

You were almost right in your second attempt. The easiest way to do this:

String prefix = "Paycheck_Box_EMP_"; // or use 17 instead of prefix.length()

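// The lambda's return value replaces the element, so substring() alone is enough;
// wrapping it in String.replaceAll(paycheck, ...) would treat the whole string as a regex.
paychecks.replaceAll(paycheck ->
        paycheck.substring(prefix.length(), paycheck.lastIndexOf('_')));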
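
For context on the exception in the question: String.indexOf() searches for literal text, not a regex, so looking for the pattern string returns -1, and that is the "end -1" passed to substring(). Below is a small sketch that finds both boundaries with indexOf() and guards against a missing marker. The helper name and the choice to return the input unchanged when the marker is absent are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringDemo {

    // Hypothetical helper: returns the text between "EMP_" and the next underscore
    static String employeeId(String paycheck) {
        String marker = "EMP_";
        int start = paycheck.indexOf(marker);    // literal search; -1 if not present
        if (start < 0) {
            return paycheck;                     // assumption: keep unexpected input as-is
        }
        start += marker.length();                // move past the marker itself
        int end = paycheck.indexOf('_', start);  // first underscore after the ID
        if (end < 0) {
            end = paycheck.length();             // no suffix: take the rest of the string
        }
        return paycheck.substring(start, end);
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_5512_221",
                "Paycheck_Box_EMP_99993_881",
                "Paycheck_Box_EMP_831_141"));

        paychecks.replaceAll(SubstringDemo::employeeId);

        System.out.println(paychecks);  // [61299, 5512, 99993, 831]
    }
}
```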


1

You can use the regex, Paycheck_Box_EMP_(\d+).* and replace the string with group(1).

Demo:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>();
        paychecks.add("Paycheck_Box_EMP_61299_451");
        paychecks.add("Paycheck_Box_EMP_5512_221");
        paychecks.add("Paycheck_Box_EMP_99993_881");
        paychecks.add("Paycheck_Box_EMP_831_141");

        List<String> substrs = 
                paychecks.stream()
                        .map(s -> s.replaceAll("Paycheck_Box_EMP_(\\d+).*", "$1"))
                        .collect(Collectors.toList());

        System.out.println(substrs);
    }
}

Output:

[61299, 5512, 99993, 831]

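Explanation of the regex: Paycheck_Box_EMP_ matches the literal prefix, (\d+) captures one or more digits as group 1, and .* consumes the rest of the string. Replacing the whole match with $1 therefore leaves only the captured digits.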


1

Try it this way:

  • Step 1: Remove the prefix of the target substring, i.e. Paycheck_Box_EMP_ from Paycheck_Box_EMP_61299_451, leaving the intermediate result 61299_451.

  • Step 2: Remove the suffix, i.e. _451 from 61299_451, leaving the final result 61299.

    paychecks.replaceAll(x-> x .replaceFirst("^Paycheck_Box_EMP_", "") .replaceFirst("_.*$", ""));
