3

I have a data.table with sequences and number of reads, like so:

   sequence   num_reads
1: AACCTGCCG          1
2: CGCGCTCAA         12
3: AGTGTGAGC          3
4: TGGGTACAC         11
5: GGCCGCGTG         15
6: CCTTAAGAG          2
7: GCGGAACTG          9
8: GCGTTGTAG         17
9: GTTGTAGCG         20
10: ACACGTGAC        16

I'd like to use data.table to add two new columns to this table, based on the results of applying dpois() with two weights and two lambdas. The correct output should be this (based on using data.frame):

sequence     num_reads                       clus1                       clus2
1  AACCTGCCG         1 2.553269503552647000377e-03 1.610220613932057849571e-03
2  CGCGCTCAA        12 1.053993989051599418361e-02 2.887608256917401083896e-02
3  AGTGTGAGC         3 2.085170094567994833468e-02 1.717568654860860896672e-02
4  TGGGTACAC        11 1.806846838374168498498e-02 4.331412385376097462508e-02
5  GGCCGCGTG        15 1.324248858039188620275e-03 5.415587646672919558410e-03
6  CCTTAAGAG         2 8.936443262434262332916e-03 6.440882455728230530922e-03
7  GCGGAACTG         9 4.056186780023639942838e-02 7.444615037365168164207e-02
8  GCGTTGTAG        17 2.385595369261770803265e-04 1.274255916864215588610e-03
9  GTTGTAGCG        20 1.196285397159046524451e-05 9.538289904012846548518e-05
10 ACACGTGAC        16 5.793588753921446012421e-04 2.707793823336458478163e-03

But when I try to use data.table I can't seem to get the right result. Here is what I tried (based on similar questions asked around this topic):

    pois = function(n, p, l){return(dpois(as.numeric(as.character(n)), l)*p) }
    x = x[, c(paste("clus", seq(1,2), sep = '')) := pois(num_reads, c(0.4,0.6), c(7,8)), by = seq_len(nrow(x))]

And here is the result:

          sequence num_reads                             clus1                      clus2
    1:   AACCTGCCG         1       2.553269503552647000377e-03 2.553269503552647000377e-03
    2:   CGCGCTCAA        12       1.053993989051599418361e-02 1.053993989051599418361e-02
    3:   AGTGTGAGC         3       2.085170094567994833468e-02 2.085170094567994833468e-02
    4:   TGGGTACAC        11       1.806846838374168498498e-02 1.806846838374168498498e-02
    5:   GGCCGCGTG        15       1.324248858039188620275e-03 1.324248858039188620275e-03
    6:   CCTTAAGAG         2       8.936443262434262332916e-03 8.936443262434262332916e-03
    7:   GCGGAACTG         9       4.056186780023639942838e-02 4.056186780023639942838e-02
    8:   GCGTTGTAG        17       2.385595369261770803265e-04 2.385595369261770803265e-04
    9:   GCGTTGTAG        20       1.196285397159046524451e-05 1.196285397159046524451e-05
    10:  ACACGTGAC        16       5.793588753921446012421e-04 5.793588753921446012421e-04

The reason I'm using data.table and not data.frame is that my real data has 100,000s of rows. I studied the answers to this and this but I haven't been able to come up with a solution.

Any tips you have would be much appreciated. Thanks!

1 Answer 1

1

We can try

x[,  paste("clus", seq(1,2), sep = ''):= 
      as.list(pois(num_reads, c(0.4,0.6), c(7,8))), by = seq_len(nrow(x))]
x
#    sequence num_reads         clus1        clus2
#1: AACCTGCCG         1 0.00255326950 0.0016102206
#2: CGCGCTCAA        12 0.01053993989 0.0288760826
#3: AGTGTGAGC         3 0.02085170095 0.0171756865
#4: TGGGTACAC        11 0.01806846838 0.0433141239
#5: GGCCGCGTG        15 0.00132424886 0.0054155876
#6: CCTTAAGAG         2 0.00893644326 0.0064408825
#7: GCGGAACTG         9 0.04056186780 0.0744461504
#8: GCGTTGTAG        17 0.00023855954 0.0012742559
#9: GTTGTAGCG        20 0.00001196285 0.0000953829
#10:ACACGTGAC        16 0.00057935888 0.0027077938
Sign up to request clarification or add additional context in comments.

2 Comments

this worked exactly as intended- thank you for your prompt response. I'm curious though- why is it necessary to coerce the function output to a list?
@sqlck - list objects are converted to columns in data.table - that's just how it works. The same is true of data.frames too - data.frame(list(1,2,3)) for instance.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.