
I have a data table called iso:

> iso
                     variant_id             transcript_id is_NL counts nrows
     1: chr10_129450960_T_C_b38 chr10_129467297_129536240     0  33029   458
     2: chr10_129450960_T_C_b38 chr10_129467297_129536240     1   3477    54
     3: chr10_129450960_T_C_b38 chr10_129467297_129536240     2    130     3
     4: chr10_129450960_T_C_b38 chr10_129536378_129563778     0     51   458
     5: chr10_129450960_T_C_b38 chr10_129536378_129563778     1      8    54
    ---
500148:   chr9_34699703_G_C_b38    chr9_34649082_34649409     1   4214    57
500149:   chr9_34699703_G_C_b38    chr9_34649082_34649409     2    171     2
500150:   chr9_34699703_G_C_b38    chr9_34649565_34650368     0  48713   456
500151:   chr9_34699703_G_C_b38    chr9_34649565_34650368     1   4932    57
500152:   chr9_34699703_G_C_b38    chr9_34649565_34650368     2    208     2

I filtered it so that a row is kept when is_NL == 0 and counts/nrows < 50, or when is_NL is 1 or 2 and counts/nrows > 50:

> iso[with(iso, (is_NL == 0 & counts/nrows < 50) |
+                 (is_NL %in% c(1,2) & counts/nrows > 50)),]
                     variant_id             transcript_id is_NL counts nrows
     1: chr10_129450960_T_C_b38 chr10_129467297_129536240     1   3477    54
     2: chr10_129450960_T_C_b38 chr10_129536378_129563778     0     51   458
     3: chr10_129450960_T_C_b38 chr10_129536378_129707894     1   3847    54
     4: chr10_129450960_T_C_b38 chr10_129701913_129707894     0    188   458
     5: chr10_129450960_T_C_b38 chr10_129708044_129715519     0     17   458
    ---
198076:   chr9_34699703_G_C_b38    chr9_34648908_34648997     0    611   456
198077:   chr9_34699703_G_C_b38    chr9_34649082_34649409     1   4214    57
198078:   chr9_34699703_G_C_b38    chr9_34649082_34649409     2    171     2
198079:   chr9_34699703_G_C_b38    chr9_34649565_34650368     1   4932    57
198080:   chr9_34699703_G_C_b38    chr9_34649565_34650368     2    208     2

However, I now realize that I only want to keep a row if every other row with the same variant_id and transcript_id also meets the criteria. For example:

500150:   chr9_34699703_G_C_b38    chr9_34649565_34650368     0  48713   456
500151:   chr9_34699703_G_C_b38    chr9_34649565_34650368     1   4932    57
500152:   chr9_34699703_G_C_b38    chr9_34649565_34650368     2    208     2

The above demonstrates what I mean. For this variant_id and transcript_id pair, the row for each value of is_NL meets its criterion: counts/nrows < 50 when is_NL == 0, or counts/nrows > 50 when is_NL %in% c(1, 2).

198077:   chr9_34699703_G_C_b38    chr9_34649082_34649409     1   4214    57
198078:   chr9_34699703_G_C_b38    chr9_34649082_34649409     2    171     2

The above is an example of what I do not want. Both rows have matching variant_id and transcript_id values and the correct value for counts/nrows, but the row with is_NL == 0 is missing, presumably because counts/nrows is not < 50 for that row.

I hope I have made myself clear. I just want the groups where variant_id and transcript_id match and every row meets its criterion: counts/nrows < 50 if is_NL == 0, and counts/nrows > 50 if is_NL %in% c(1, 2).

If this is done correctly, I should end up with triplets of rows sharing the same variant_id and transcript_id combination, and each triplet should contain one row for each is_NL value (0, 1 and 2).

1 Answer

Try the following:

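library(data.table)

setDT(iso)
# Keep rows that pass the threshold for their own is_NL value
iso <- iso[(is_NL == 0 & counts/nrows < 50) | (is_NL %in% c(1, 2) & counts/nrows > 50)]
# Count surviving rows per pair, keep complete triplets, drop the helper column
iso <- iso[, triplet := .N, by = .(variant_id, transcript_id)][triplet == 3][, triplet := NULL]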

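It adds a temporary column, triplet, holding the number of rows that survive the filter for each variant_id and transcript_id pair, keeps only the pairs where all three rows survive, and then drops the column again.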
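To see the idea on something small, here is a minimal sketch on made-up data. The table iso_toy, its values, and the helper column ok are invented for illustration and are not from the question. It is a slight variant of the code above: instead of counting rows after filtering, it flags each row and keeps a group only when it has three rows and all of them pass. This assumes each pair has at most one row per is_NL value.

```r
library(data.table)

# Toy data (hypothetical values): one variant, two transcripts, three is_NL rows each.
iso_toy <- data.table(
  variant_id    = rep("v1", 6),
  transcript_id = rep(c("t1", "t2"), each = 3),
  is_NL         = rep(0:2, times = 2),
  counts        = c(4000, 6000, 300, 30000, 6000, 300),
  nrows         = c(100, 100, 5, 100, 100, 5)
)

# Flag rows that satisfy the threshold for their own is_NL value.
iso_toy[, ok := (is_NL == 0 & counts / nrows < 50) |
                (is_NL %in% c(1, 2) & counts / nrows > 50)]

# Keep a pair only if it has all three rows and every one of them passes.
result <- iso_toy[, if (.N == 3 && all(ok)) .SD,
                  by = .(variant_id, transcript_id)]
result[, ok := NULL]  # drop the helper column

print(result)
```

In this toy table the pair t2 is dropped as a whole, because its is_NL == 0 row has counts/nrows = 300, while t1 keeps all three of its rows.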


3 Comments

Hmm.. it's not giving me the desired result. Should I try that line after iso[with(iso, (is_NL == 0 & counts/nrows < 50) | (is_NL %in% c(1,2) & counts/nrows > 50)),]?
Yes, I did not want to change anything in your approach, just add to it.
The answer has now been edited and should work as is.
