There is a table with 97972561 rows (records) and 8 columns (attributes). The format looks like:

+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| PREDICATE_ID | PMID | SENTENCE_ID | SUBJECT_ID | SUBJECT_NAME | PREDICATE | OBJECT_ID | OBJECT_NAME |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+

I would like to filter out records whose (subject, predicate, object) combination appears only once. For example, suppose the table contains the four records below. The last record should be excluded from the result because (Bob, is_a, Person) appears only once.

+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| PREDICATE_ID | PMID | SENTENCE_ID | SUBJECT_ID | SUBJECT_NAME | PREDICATE | OBJECT_ID | OBJECT_NAME |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 1            | 100  | 1           | 2          | Bob          | is_born_in| 3         | 1994        |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 1            | 103  | 3           | 2          | Bob          | is_born_in| 3         | 1994        |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 1            | 102  | 5           | 2          | Bob          | is_born_in| 3         | 1994        |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 2            | 104  | 2           | 2          | Bob          | is_a      | 4         | Person      |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+

Any help would be appreciated!

  • Filtered or deleted, as in you want them to be excluded from your query result, or deleted from the table entirely? Commented Nov 12, 2019 at 3:20
  • I want them to be excluded. @MichaelNovello Commented Nov 12, 2019 at 3:25
  • Why would a table store both the subject_id and the subject name, and likewise for predicate and object? Commented Nov 12, 2019 at 7:57

1 Answer

Using aggregation, we can try:

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT SUBJECT_ID, PREDICATE_ID, OBJECT_ID
    FROM yourTable
    GROUP BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID
    HAVING COUNT(*) > 1
) t2
    ON t1.SUBJECT_ID = t2.SUBJECT_ID AND
       t1.PREDICATE_ID = t2.PREDICATE_ID AND
       t1.OBJECT_ID = t2.OBJECT_ID;
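
To check the join query on the four sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. SQLite stands in for MySQL here, and the function name `repeated_triples`, the column types, and the added `ORDER BY` are assumptions made only for this illustration.

```python
import sqlite3

# Sample rows copied from the example table in the question.
ROWS = [
    (1, 100, 1, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 103, 3, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 102, 5, 2, "Bob", "is_born_in", 3, "1994"),
    (2, 104, 2, 2, "Bob", "is_a", 4, "Person"),
]


def repeated_triples(rows):
    """Return rows whose (SUBJECT_ID, PREDICATE_ID, OBJECT_ID) occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the question does not state them.
    conn.execute(
        """
        CREATE TABLE yourTable (
            PREDICATE_ID INTEGER, PMID INTEGER, SENTENCE_ID INTEGER,
            SUBJECT_ID INTEGER, SUBJECT_NAME TEXT, PREDICATE TEXT,
            OBJECT_ID INTEGER, OBJECT_NAME TEXT
        )
        """
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t1.*
        FROM yourTable t1
        INNER JOIN
        (
            SELECT SUBJECT_ID, PREDICATE_ID, OBJECT_ID
            FROM yourTable
            GROUP BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID
            HAVING COUNT(*) > 1
        ) t2
            ON t1.SUBJECT_ID = t2.SUBJECT_ID AND
               t1.PREDICATE_ID = t2.PREDICATE_ID AND
               t1.OBJECT_ID = t2.OBJECT_ID
        ORDER BY t1.PMID  -- added only to make the output order deterministic
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in repeated_triples(ROWS):
        print(row)
```

With the sample data, only the three (Bob, is_born_in, 1994) rows come back; the single (Bob, is_a, Person) row is dropped by the `HAVING COUNT(*) > 1` filter.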

If you are using MySQL 8+, you can use window (analytic) functions for a cleaner-looking query:

WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID) cnt
    FROM yourTable
)

SELECT *
FROM cte
WHERE cnt > 1;
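
The window-function form can be tried the same way on the question's sample rows. This is a rough sketch that assumes Python's sqlite3 module is linked against SQLite 3.25 or newer (the first release with window functions) as a stand-in for MySQL 8+. The function name `repeated_triples_windowed`, the column types, and the `ORDER BY` are hypothetical additions for the example.

```python
import sqlite3

# Sample rows copied from the example table in the question.
ROWS = [
    (1, 100, 1, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 103, 3, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 102, 5, 2, "Bob", "is_born_in", 3, "1994"),
    (2, 104, 2, 2, "Bob", "is_a", 4, "Person"),
]


def repeated_triples_windowed(rows):
    """Keep rows whose (SUBJECT_ID, PREDICATE_ID, OBJECT_ID) group has more than one member."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the question does not state them.
    conn.execute(
        """
        CREATE TABLE yourTable (
            PREDICATE_ID INTEGER, PMID INTEGER, SENTENCE_ID INTEGER,
            SUBJECT_ID INTEGER, SUBJECT_NAME TEXT, PREDICATE TEXT,
            OBJECT_ID INTEGER, OBJECT_NAME TEXT
        )
        """
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT *,
                   COUNT(*) OVER (
                       PARTITION BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID
                   ) AS cnt
            FROM yourTable
        )
        SELECT *
        FROM cte
        WHERE cnt > 1
        ORDER BY PMID  -- added only to make the output order deterministic
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in repeated_triples_windowed(ROWS):
        print(row)
```

Each returned row carries the extra `cnt` column as its last field, which holds the size of the row's (subject, predicate, object) group.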

2 Comments

Where are you getting t2 from? It looks like he's just using one table.
I am running it, and it seems to work. Because the table has so many records, it takes a while. I will accept the answer once the result comes back. Thank you.
