
I need to do a lot of different preprocessing of some text data. The preprocessing consists of several simple regex functions, all written in a class Filters, that each take a String and return the formatted String. Up until now, in each class that needed some preprocessing, I created a new function containing a bunch of calls to Filters. They look something like this:

private static String filter(String text) {
    text = Filters.removeURL(text);
    text = Filters.removeEmoticons(text);
    text = Filters.removeRepeatedWhitespace(text);
    ....
    return text;
}

Since this is very repetitive (about 90% of the calls are the same, but 2-3 differ for each class), I wonder if there is a better way of doing this. In Python you can for example put function in a list and iterate over that, calling each function, I realize this is not possible in Java, so what is the best way of doing this in Java?

I was thinking of maybe defining an enum with a value for each function and then calling a main function in Filters with an array of the enum values for the functions I want to run, something like this:

enum Filter {
    REMOVE_URL, REMOVE_EMOTICONS, REMOVE_REPEATED_WHITESPACE
}


public static String filter(String text, Filter... filters) {
    for (Filter filter : filters) {
        switch (filter) {
            case REMOVE_URL:
                text = removeURL(text);
                break;

            case REMOVE_EMOTICONS:
                text = removeEmoticons(text);
                break;

            case REMOVE_REPEATED_WHITESPACE:
                text = removeRepeatedWhitespace(text);
                break;
        }
    }

    return text;
}

Then, instead of defining functions like the one shown at the top, I could simply call:

filter("some text", Filter.REMOVE_URL, Filter.REMOVE_EMOTICONS, Filter.REMOVE_REPEATED_WHITESPACE);

Are there any better ways to go about this?

  • "in Python you can for example put function in a list and iterate over that, calling each function, I realize this is not possible in Java" Welcome to the shiny new world of Java 8, where you can put functions into a list and iterate over them. Commented Feb 10, 2016 at 22:57
  • Also, your enum literals can have methods, i.e. you could give each literal a different apply or filter method. Commented Feb 10, 2016 at 22:59
  • Pre Java 8, you can define a Filter interface with an apply() method, and have a list of these instead. Commented Feb 10, 2016 at 23:02
  • Look up anonymous classes - you can declare an implementation in one or two lines. Commented Feb 10, 2016 at 23:06
  • Plus they are easier to test, they are easier to remove, you don't have to change your main program when you add one, etc. Plus it doesn't seem they are one-liners, since you're delegating to methods already. Commented Feb 10, 2016 at 23:08

3 Answers


Given that you have already implemented your Filters utility class, you can easily define a list of filter functions:

List<Function<String,String>> filterList = new ArrayList<>();
filterList.add(Filters::removeURL);
filterList.add(Filters::removeRepeatedWhitespace);
...

and then evaluate:

String text = ...
for (Function<String,String> f : filterList)
    text = f.apply(text);

A variation of this, even easier to handle:

Define

@SafeVarargs // the varargs array is only read, never written or exposed
public static String filter(String text, Function<String,String>... filters) 
{
    for (Function<String,String> f : filters)
        text = f.apply(text);
    return text;
}

and then use

String text = ...
text = filter(text, Filters::removeURL, Filters::removeRepeatedWhitespace);
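
Since the question mentions that about 90% of the calls are shared and only 2-3 differ per class, one possible arrangement is to keep the shared filters in a single list and let each caller append its own. The following is only a sketch under assumptions: the class name SharedFilterLists, the helper commonPlus, and the three regex methods are hypothetical stand-ins, because the real Filters methods are not shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class SharedFilterLists {
    // Hypothetical stand-ins for the asker's Filters methods (real regexes unknown).
    static String removeURL(String text) {
        return text.replaceAll("https?://\\S+", "");
    }

    static String removeRepeatedWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    static String lowerCase(String text) {
        return text.toLowerCase();
    }

    // Filters that every caller wants, in the order they should run.
    static final List<Function<String, String>> COMMON = Arrays.asList(
            SharedFilterLists::removeURL,
            SharedFilterLists::removeRepeatedWhitespace);

    // Builds a new list: the common filters followed by the caller's extras.
    @SafeVarargs
    static List<Function<String, String>> commonPlus(Function<String, String>... extras) {
        List<Function<String, String>> all = new ArrayList<>(COMMON);
        all.addAll(Arrays.asList(extras));
        return all;
    }

    // Applies the filters one after the other.
    static String applyAll(String text, List<Function<String, String>> filters) {
        for (Function<String, String> f : filters) {
            text = f.apply(text);
        }
        return text;
    }

    public static void main(String[] args) {
        List<Function<String, String>> mine = commonPlus(SharedFilterLists::lowerCase);
        System.out.println(applyAll("See http://example.com  NOW", mine)); // prints "see now"
    }
}
```

The order of the list matters here: removing the URL first leaves extra spaces behind, which the whitespace filter then collapses.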

3 Comments

This looks very promising. Is it possible to define this in an array instead of List?
How? I get an error with Function<String,String>[] functions = new Function[]{Filters::removeEmoticons};
This is perfect. Thank you
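
On the array question in the comments above: Java does not allow creating an array of a parameterized type directly, and with a raw Function[] each method reference most likely has no usable target type, which would explain the error. One workaround, sketched here under assumptions (FilterArrayDemo and both filter methods are hypothetical stand-ins for the asker's Filters class), is to cast each method reference and suppress the unchecked warning:

```java
import java.util.function.Function;

class FilterArrayDemo {
    // Hypothetical stand-ins for the asker's Filters methods.
    static String removeEmoticons(String text) {
        return text.replace(":)", "");
    }

    static String trim(String text) {
        return text.trim();
    }

    // The cast gives each method reference a concrete target type;
    // assigning the raw array to a generic array type is an unchecked conversion.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Function<String, String>[] filters() {
        return new Function[] {
            (Function<String, String>) FilterArrayDemo::removeEmoticons,
            (Function<String, String>) FilterArrayDemo::trim
        };
    }

    static String applyAll(String text, Function<String, String>[] filters) {
        for (Function<String, String> f : filters) {
            text = f.apply(text);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(applyAll("hi :) ", filters())); // prints "hi"
    }
}
```

A List, or the varargs filter method from the answer, avoids the cast and the suppressed warning, so an explicit array is probably only worth it if some other API requires one.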

You could do this in Java 8 pretty easily as @tobias_k said, but even without that you could do something like this:

public class FunctionExample {

    public interface FilterFunction {
        String apply(String text);
    }

    public static class RemoveSpaces implements FilterFunction {
        public String apply(String text) {
            return text.replaceAll("\\s+", "");
        }
    }

    public static class LowerCase implements FilterFunction {
        public String apply(String text) {
            return text.toLowerCase();
        }
    }

    static String filter(String text, FilterFunction... filters) {
        for (FilterFunction fn : filters) {
            text = fn.apply(text);
        }
        return text;
    }

    static final FilterFunction LOWERCASE_FILTER = new LowerCase();
    static final FilterFunction REMOVE_SPACES_FILTER = new RemoveSpaces();


    public static void main(String[] args) {
        String s = "Some Text";

        System.out.println(filter(s, LOWERCASE_FILTER, REMOVE_SPACES_FILTER));
    }
}
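
As one of the comments on the question suggests, anonymous classes can save you from writing a named class per filter when you are stuck before Java 8. A minimal sketch, assuming the same FilterFunction interface as above; the class name AnonymousFilterExample and the two sample filters are made up for illustration:

```java
class AnonymousFilterExample {

    interface FilterFunction {
        String apply(String text);
    }

    // Each filter is declared in place as an anonymous class.
    static final FilterFunction STRIP_DIGITS = new FilterFunction() {
        @Override
        public String apply(String text) {
            return text.replaceAll("\\d", "");
        }
    };

    static final FilterFunction TRIM = new FilterFunction() {
        @Override
        public String apply(String text) {
            return text.trim();
        }
    };

    // Applies the given filters from left to right.
    static String filter(String text, FilterFunction... filters) {
        for (FilterFunction fn : filters) {
            text = fn.apply(text);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(filter(" abc123 ", STRIP_DIGITS, TRIM)); // prints "abc"
    }
}
```

Because FilterFunction has a single abstract method, the same constants could later be rewritten as lambdas or method references without touching the filter method.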


Another way would be to add a method to your enum Filter and implement that method for each of the enum literals. This also works with earlier versions of Java. It is closest to your current code, and it means the set of possible filters is fixed.

enum Filter {
    TRIM {
        public String apply(String s) {
            return s.trim();
        }
    }, 
    UPPERCASE {
        public String apply(String s) {
            return s.toUpperCase();
        }
    };
    public abstract String apply(String s);
}

public static String applyAll(String s, Filter... filters) {
    for (Filter f : filters) {
        s = f.apply(s);
    }
    return s;
}

public static void main(String[] args) {
    String s = "   Hello World   ";
    System.out.println(applyAll(s, Filter.TRIM, Filter.UPPERCASE));
}
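
With the enum approach, the shared 90% of filters could also be kept in an EnumSet that each class extends with its own extras. This is a sketch under assumptions: EnumSetFilterExample, common(), and the COLLAPSE_SPACES literal are hypothetical names added for illustration.

```java
import java.util.EnumSet;

class EnumSetFilterExample {

    enum Filter {
        TRIM {
            public String apply(String s) {
                return s.trim();
            }
        },
        COLLAPSE_SPACES {
            public String apply(String s) {
                return s.replaceAll(" +", " ");
            }
        },
        UPPERCASE {
            public String apply(String s) {
                return s.toUpperCase();
            }
        };

        public abstract String apply(String s);
    }

    // Returns a fresh, modifiable set holding the filters every caller wants.
    static EnumSet<Filter> common() {
        return EnumSet.of(Filter.TRIM, Filter.COLLAPSE_SPACES);
    }

    // Applies the filters in iteration order.
    static String applyAll(String s, Iterable<Filter> filters) {
        for (Filter f : filters) {
            s = f.apply(s);
        }
        return s;
    }

    public static void main(String[] args) {
        EnumSet<Filter> mine = common();
        mine.add(Filter.UPPERCASE); // caller-specific extra
        System.out.println(applyAll("  hello   world ", mine)); // prints "HELLO WORLD"
    }
}
```

One caveat: an EnumSet always iterates in the order the literals are declared and cannot hold duplicates, so this only fits when one fixed filter order is acceptable. If the order must vary per caller, the varargs version above is the safer choice.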

However, if you are using Java 8 you can make your code much more flexible by just using a list of Function<String, String> instead. If you don't like writing Function<String, String> all the time, you could also define your own interface, extending it:

interface Filter extends Function<String, String> {}

You can then define those functions in different ways: with method references, single- and multi-line lambda expressions, anonymous classes, or by constructing them from other functions:

Filter TRIM = String::trim; // method reference
Filter UPPERCASE = s -> s.toUpperCase(); // one-line lambda
Filter DO_STUFF = (String s) -> { // multi-line lambda
    // do more complex stuff
    return s + s;
};
Filter MORE_STUFF = new Filter() { // anonymous inner class
    // in case you need internal state
    public String apply(String s) {
        // even more complex calculations
        return s.replace("foo", "bar");
    }
};
Function<String, String> TRIM_UPPER = TRIM.andThen(UPPERCASE); // chain functions

You can then pass those to the applyAll function just like the enums and apply them one after the other in a loop.
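
Instead of looping every time, the functions can also be folded into a single Function once and then reused. A minimal sketch, assuming Java 8 streams; the class name ComposeFiltersExample and the method compose are hypothetical:

```java
import java.util.function.Function;
import java.util.stream.Stream;

class ComposeFiltersExample {

    // Chains the filters left to right with andThen; with no filters
    // the result is the identity function.
    @SafeVarargs
    static Function<String, String> compose(Function<String, String>... filters) {
        return Stream.of(filters).reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        Function<String, String> pipeline = compose(
                String::trim,
                String::toUpperCase,
                s -> s.replace(' ', '_'));
        System.out.println(pipeline.apply("  hello world ")); // prints "HELLO_WORLD"
    }
}
```

The composed pipeline is itself a Function<String, String>, so it can be stored in a constant per class or passed on as one more filter.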
