1

I am working on an algorithm, using SQL and JAVA, concerning big datasets. In SQL I have a table with all the data and I want to use as much of SQL queries as possible before loading it into JAVA.

I generate random datasets (in Java), consisting exclusively of integers between 1 and 40001 and then insert them into a MySQL table. The rows can be of different lengths, with a maximum of 30 items/records (this includes the ID). So normally the amount of columns is 30 (so COL1, COL2, COL3,......COL30) but this amount will also be random at some point

What I want to do is count the occurrence of every distinct item in a table/dataset and put them in a new table with their count. This however is tricky since I want to count it over the entire table, not just one column. How do I do this?

To specify:

Take this table for example (this is a very small one in comparison with my usual tables):

ID | COL1 | COL2 | COL3 | COL4 | COL5 |
---------------------------------------
1  |   8  |  35  |  42  |  12  |  27  |
2  |  22  |  42  |  35  |  8   | NULL |
3  |  18  |  22  |   8  | NULL | NULL |
4  |  42  |  12  |  27  |  35  |  8   |
5  |  18  |  27  |  12  |  22  | NULL |

What I want to extract from this table is this:

Item | Count
-------------
8    |  3
35   |  3
40   |  1
12   |  3
27   |  3
22   |  3
42   |  2 
43   |  1
18   |  2

It is also the case that an item can't be in the same row more than once, if that helps.

Can anyone help me? Or can it just simply not be done in SQL? Would it be better to do this in JAVA, performance-wise?

Thanks in advance!

1 Answer 1

4

You can do this by unpivoting the data and then aggregating:

select col, count(*)
from (select col1 as col from t union all
      select col2 from t union all
      . . . 
      select col30 from t
     ) t
group by col;

If you don't have a known set of columns, then you will need to use dynamic SQL.

Sign up to request clarification or add additional context in comments.

1 Comment

This is great, works perfectly. I am considering making the amount of columns nonchangeable since it isn't significant to my algorithm. So dynamic queries won't be necessary. Thank you!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.