How to execute multiple queries in parallel instead of sequentially?

Question

I am querying all 10 of my tables to get the user IDs from them, and loading those IDs into a HashSet so that I end up with unique user IDs.

As of now this is done sequentially. We go to one table, extract all the user_id values from it and load them into the hash set, then do the same for the second table, the third, and so on.

    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();

        // is there any way to make this parallel?
        for (int table = 0; table < 10; table++) {
            String sql = "select * from testkeyspace.test_table_" + table + ";";

            try {
                SimpleStatement query = new SimpleStatement(sql);
                query.setConsistencyLevel(ConsistencyLevel.QUORUM);
                ResultSet res = session.execute(query);

                Iterator<Row> rows = res.iterator();
                while (rows.hasNext()) {
                    Row r = rows.next();

                    String user_id = r.getString("user_id");
                    userList.add(user_id);
                }
            } catch (Exception e) {
                System.out.println("error= " + ExceptionUtils.getStackTrace(e));
            }
        }

        return userList;
    }

Is there any way to make this multithreaded so that the data is fetched from each table in parallel? At the end, I need the userList hash set to contain all the unique user IDs from all 10 tables.

I am working with a Cassandra database and the connection is made only once, so I don't need to create multiple connections.

If you just need better performance, I'd start with changing the select to "SELECT DISTINCT user_id" instead of selecting a lot of duplicates and extra columns.
– Geoduck, Commented Feb 28, 2015 at 0:51
There is no DISTINCT as well, but yes, I can use user_id instead of *. That's a good point.
– john, Commented Feb 28, 2015 at 0:54
I believe Cassandra does support the DISTINCT keyword (only in 3.1.1 and on certain keys).
– Geoduck, Commented Feb 28, 2015 at 0:57

Tano · Accepted Answer · 2015-02-28 09:16:47Z

2

If you're able to use Java 8, you could probably do this with a parallel stream over the list of tables: use a lambda to map each table to its set of unique IDs, then merge the per-table results into a single hash set.
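As a rough illustration of the Java 8 idea above, here is a minimal sketch. The class name and the body of `fetchFromTable` are hypothetical stand-ins (made-up IDs instead of a real Cassandra query) so that the example runs without a database; only the stream pipeline is the point.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamSketch {

    // Hypothetical stand-in for the real query against test_table_<table>.
    // It returns made-up IDs; each table overlaps with the next one by one ID.
    static Set<String> fetchFromTable(int table) {
        Set<String> ids = new HashSet<>();
        ids.add("user_" + table);
        ids.add("user_" + (table + 1));
        return ids;
    }

    // Runs fetchFromTable for every table index on a parallel stream and
    // merges the per-table sets into one set of unique IDs.
    static Set<String> fetchFromAllTables(int tableCount) {
        return IntStream.range(0, tableCount)
                .parallel()
                .mapToObj(ParallelStreamSketch::fetchFromTable)
                .flatMap(Set::stream)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Tables 0..9 yield user_0 through user_10
        System.out.println(fetchFromAllTables(10).size()); // prints 11
    }
}
```

One caveat: parallel streams run on the shared common fork/join pool, whose size is tied to the number of CPU cores, so for queries that mostly block on I/O an explicit executor may give more control over how many run at once.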

Without Java 8, I'd use Google Guava's listenable futures and an executor service something like this:

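import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

public static Set<String> fetchFromTable(int table) {
    String sql = "select * from testkeyspace.test_table_" + table + ";";
    Set<String> result = new HashSet<String>();
    // populate result by executing the query and collecting each row's user_id
    // ...
    return result;
}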

public static Set<String> fetchFromAllTables() throws InterruptedException, ExecutionException {
    // Create a ListeningExecutorService (Guava) by wrapping a
    // normal ExecutorService (Java)
    ListeningExecutorService executor =
            MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());
    try {
        List<ListenableFuture<Set<String>>> list =
                new ArrayList<ListenableFuture<Set<String>>>();
        // For each table, submit an independent task that will
        // query just that table and return a set of user IDs from it
        for (int i = 0; i < 10; i++) {
            final int table = i;
            ListenableFuture<Set<String>> future = executor.submit(new Callable<Set<String>>() {
                @Override
                public Set<String> call() throws Exception {
                    return fetchFromTable(table);
                }
            });
            // Add the future to the list
            list.add(future);
        }
        // We want to know when ALL the tasks have completed,
        // so we use a Guava function to turn a list of ListenableFutures
        // into a single ListenableFuture
        ListenableFuture<List<Set<String>>> combinedFutures = Futures.allAsList(list);

        // The get on the combined ListenableFuture will now block until
        // ALL the individual tasks have completed work.
        List<Set<String>> tableSets = combinedFutures.get();

        // Now all we have to do is combine the individual sets into a
        // single result
        Set<String> userList = new HashSet<String>();
        for (Set<String> tableSet : tableSets) {
            userList.addAll(tableSet);
        }

        return userList;
    } finally {
        // Release the pool's threads once the work is done (or has failed)
        executor.shutdown();
    }
}

The Executors and Futures used here are all core Java. Guava's contribution is turning the Futures into ListenableFutures, which Futures.allAsList can then combine into a single future. See here for a discussion of why the latter is better.

There are probably still ways to improve the parallelism of this approach, but if the bulk of your time is being spent in waiting for the DB to respond or in processing network traffic, then this approach may help.
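Because the executor and futures are core Java, the same fan-out can also be sketched without Guava, using `ExecutorService.invokeAll`, which blocks until every submitted task has finished. This is a minimal Java 7 compatible sketch under stated assumptions, not a drop-in replacement: the class name is made up, and `fetchFromTable` is a hypothetical stand-in that returns made-up IDs instead of querying Cassandra.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {

    // Hypothetical stand-in for the real query against test_table_<table>.
    // It returns made-up IDs; each table overlaps with the next one by one ID.
    static Set<String> fetchFromTable(int table) {
        Set<String> ids = new HashSet<String>();
        ids.add("user_" + table);
        ids.add("user_" + (table + 1));
        return ids;
    }

    // tableCount must be at least 1, since a fixed pool needs a positive size.
    static Set<String> fetchFromAllTables(int tableCount)
            throws InterruptedException, ExecutionException {
        // One thread per table, so all queries can be in flight at once
        ExecutorService executor = Executors.newFixedThreadPool(tableCount);
        try {
            List<Callable<Set<String>>> tasks = new ArrayList<Callable<Set<String>>>();
            for (int i = 0; i < tableCount; i++) {
                final int table = i;
                tasks.add(new Callable<Set<String>>() {
                    @Override
                    public Set<String> call() {
                        return fetchFromTable(table);
                    }
                });
            }

            // invokeAll blocks until every task has completed
            List<Future<Set<String>>> futures = executor.invokeAll(tasks);

            // Merge the per-table sets; get() rethrows any task failure
            // wrapped in an ExecutionException
            Set<String> userList = new HashSet<String>();
            for (Future<Set<String>> future : futures) {
                userList.addAll(future.get());
            }
            return userList;
        } finally {
            // Always release the pool's threads
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchFromAllTables(10).size()); // prints 11
    }
}
```

The merge happens on the calling thread after all tasks finish, so a plain HashSet is enough and no concurrent collection is needed.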

edited Feb 28, 2015 at 9:16

answered Feb 28, 2015 at 0:52

Tano


3 Comments

john Over a year ago

I am working with Java 7 as of now. Can you provide an example with Java 7 if possible?

john Over a year ago

This looks awesome. I have never used Guava executors before, so I have learned something new today. Could you add a little bit of explanation of the code as well? It would help me understand it better, just for my own learning.

Tano Over a year ago

I've updated a bit more, but if the Executor stuff isn't clear to you, you should take a look through some basic documentation on Java concurrency.

Geoduck · Accepted Answer · 2015-02-28 00:39:49Z

0

You may be able to make it multithreaded, but with the overhead of thread creation and multiple connections you probably won't see a significant benefit. Instead, use a UNION statement in MySQL and get them all at once, letting the database engine figure out how to fetch them efficiently:

String sql = "select user_id from testkeyspace.test_table_1 UNION select user_id from testkeyspace.test_table_2 UNION select user_id from testkeyspace.test_table_3 ....";

Of course, you'll have to programmatically create the SQL query string. Don't actually put "...." in your query.
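Building that string in a loop could look roughly like the sketch below. The class and method names are made up for illustration, the table numbering starts at 0 to match the loop in the question, and the approach only applies to databases that support UNION.

```java
class UnionQuerySketch {

    // Joins one "select user_id ..." clause per table with UNION.
    // Hypothetical helper; table names follow the question's
    // testkeyspace.test_table_<n> pattern, numbered from 0.
    static String buildUnionQuery(int tableCount) {
        StringBuilder sql = new StringBuilder();
        for (int table = 0; table < tableCount; table++) {
            if (table > 0) {
                sql.append(" UNION ");
            }
            sql.append("select user_id from testkeyspace.test_table_").append(table);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildUnionQuery(3));
    }
}
```

UNION (as opposed to UNION ALL) removes duplicate rows, which matches the goal of ending up with unique user IDs.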

answered Feb 28, 2015 at 0:39

Geoduck


4 Comments

john Over a year ago

I am working with Cassandra, not MySQL, so I doubt this will work. I can edit my question to add this info.

Geoduck Over a year ago

You are right. You can only select from one table at a time in Cassandra. Maybe rethink WHY you have 10 different tables.

scottb Over a year ago

Normalization happens

Vipin Over a year ago

I don't think thread creation overhead (< 1 ms) and the SQL connection should be a concern in this scenario. Removing the UNION may reduce the time, and fetching the results in parallel will also save time.
