Combine values on matching strings

Question

I have two files and I want to combine (average) the logFC values from File2 for all miRNAs that share the same ID in File1, matching each miRNA by name between the two files. For example, all miRNAs with ID1 should be combined based on their matching entries in File2.

File1:

ID  miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24    hsa-miR-1179
ID24    hsa-miR-7-2
ID24    hsa-miR-3677
ID25    hsa-miR-940
ID25    hsa-miR-4717

File2: 
miRNA           logFC
hsa-miR-512-1   13
hsa-miR-512-2   123 
hsa-miR-1323    53
hsa-miR-498     4.2
hsa-miR-520e    12
hsa-miR-515-1   1
hsa-miR-519e    56
hsa-miR-520f    113
hsa-miR-495     11
hsa-miR-376c    11
hsa-miR-376a-2  113 
hsa-miR-654     13
hsa-miR-376b    123
hsa-miR-376a-1  567
hsa-miR-300     757
hsa-miR-1185-1   6
hsa-miR-1185-2  35
hsa-miR-1179    2
hsa-miR-7-2     2
hsa-miR-3677    1
hsa-miR-940     134
hsa-miR-4717    566


Output:

ID1     Average logFC for all ID1 miRNA
ID2     Average logFC for all ID2 miRNA
...

I can't see how the ID is used when there is no ID column in File2. You might need to provide an example of the expected output. Also, the "miRNA" column in File2 contains "miR" while File1 contains "mir", and the capital "R" will break matching on the "miRNA" column. You'll probably need to convert the names to lower case or upper case.
– AntoniosK, Commented Sep 1, 2015 at 10:40
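A minimal base-R sketch of the case-normalisation suggestion above, assuming (hypothetically) that File1 spells the names with "mir" and File2 with "miR". The small data frames and the helper column `key` are illustrative only, not from the question:

```r
# Hypothetical example data: File1 uses lower-case "mir", File2 uses "miR"
File1 <- data.frame(ID    = c("ID1", "ID1", "ID2"),
                    miRNA = c("hsa-mir-512-1", "hsa-mir-498", "hsa-mir-495"),
                    stringsAsFactors = FALSE)
File2 <- data.frame(miRNA = c("hsa-miR-512-1", "hsa-miR-498", "hsa-miR-495"),
                    logFC = c(13, 4.2, 11),
                    stringsAsFactors = FALSE)

# Build a lower-case join key in both tables so "mir" and "miR" compare equal
File1$key <- tolower(File1$miRNA)
File2$key <- tolower(File2$miRNA)

# Join on the normalised key; only logFC is taken from File2
joined <- merge(File1, File2[, c("key", "logFC")], by = "key")
print(joined)
```

Joining on the raw `miRNA` column instead would return zero rows for this data, because string comparison in R is case-sensitive.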

AntoniosK · Accepted Answer · 2015-09-01 12:10:04Z

As @Heroka mentioned at the start, this is a merge job (that is, joining your tables on the right key column, here miRNA). I'm using a dplyr approach, but there are many other ways/commands to do this:

File1 = read.table(text="ID  miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24    hsa-miR-1179
ID24    hsa-miR-7-2
ID24    hsa-miR-3677
ID25    hsa-miR-940
ID25    hsa-miR-4717", header=TRUE)

File2 = read.table(text="miRNA           logFC
hsa-miR-512-1   13
hsa-miR-512-2   123 
hsa-miR-1323    53
hsa-miR-498     4.2
hsa-miR-520e    12
hsa-miR-515-1   1
hsa-miR-519e    56
hsa-miR-520f    113
hsa-miR-495     11
hsa-miR-376c    11
hsa-miR-376a-2  113 
hsa-miR-654     13
hsa-miR-376b    123
hsa-miR-376a-1  567
hsa-miR-300     757
hsa-miR-1185-1   6
hsa-miR-1185-2  35
hsa-miR-1179    2
hsa-miR-7-2     2
hsa-miR-3677    1
hsa-miR-940     134
hsa-miR-4717    566", header=TRUE)


library(dplyr)

File1 %>% 
  inner_join(File2, by="miRNA") %>%     # join your datasets based on miRNA column
  group_by(ID) %>%                      # group by ID
  summarise(AvgLogFC = mean(logFC))     # calculate average values

#     ID   AvgLogFC
# 1  ID1  46.900000
# 2  ID2 181.777778
# 3 ID24   1.666667
# 4 ID25 350.000000
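For comparison, one of the "other ways" is plain base R with `merge()` (an inner join by default) followed by `aggregate()`. This is a sketch rebuilt from the question's data, written compactly with `rep()` rather than `read.table(text=...)`; it should give the same per-ID means as the dplyr pipeline:

```r
# Same data as the question, built compactly
File1 <- data.frame(
  ID    = rep(c("ID1", "ID2", "ID24", "ID25"), times = c(8, 9, 3, 2)),
  miRNA = c("hsa-miR-512-1", "hsa-miR-512-2", "hsa-miR-1323", "hsa-miR-498",
            "hsa-miR-520e", "hsa-miR-515-1", "hsa-miR-519e", "hsa-miR-520f",
            "hsa-miR-495", "hsa-miR-376c", "hsa-miR-376a-2", "hsa-miR-654",
            "hsa-miR-376b", "hsa-miR-376a-1", "hsa-miR-300", "hsa-miR-1185-1",
            "hsa-miR-1185-2", "hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717"),
  stringsAsFactors = FALSE)
File2 <- data.frame(
  miRNA = File1$miRNA,  # File2 lists the same names in the same order
  logFC = c(13, 123, 53, 4.2, 12, 1, 56, 113,
            11, 11, 113, 13, 123, 567, 757, 6, 35,
            2, 2, 1,
            134, 566),
  stringsAsFactors = FALSE)

# merge() joins on the shared "miRNA" column (inner join by default)
joined <- merge(File1, File2, by = "miRNA")

# Mean logFC within each ID; should match the dplyr result above
avg <- aggregate(logFC ~ ID, data = joined, FUN = mean)
print(avg)
```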

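Note that I'm using inner_join, which keeps only the miRNAs present in both files: any miRNA in File1 with no match in File2 is silently dropped from the averages. This is fine as long as all your miRNA values in File1 exist in File2.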
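If you are not sure every File1 miRNA exists in File2, a quick check is worthwhile. The sketch below uses base R and made-up data (the name "hsa-miR-9999" is hypothetical): `setdiff()` lists the names an inner join would drop, and `merge(..., all.x = TRUE)` acts as a left join that keeps them with `NA`. In dplyr the corresponding verbs would be `anti_join()` and `left_join()`.

```r
# Hypothetical data: "hsa-miR-9999" is in File1 but missing from File2
File1 <- data.frame(ID    = c("ID1", "ID1", "ID2"),
                    miRNA = c("hsa-miR-512-1", "hsa-miR-9999", "hsa-miR-495"),
                    stringsAsFactors = FALSE)
File2 <- data.frame(miRNA = c("hsa-miR-512-1", "hsa-miR-495"),
                    logFC = c(13, 11),
                    stringsAsFactors = FALSE)

# File1 miRNAs with no partner in File2 (an inner join would drop these)
unmatched <- setdiff(File1$miRNA, File2$miRNA)
print(unmatched)

# all.x = TRUE turns merge() into a left join: unmatched rows get logFC = NA
joined <- merge(File1, File2, by = "miRNA", all.x = TRUE)

# Per ID: how many rows actually matched, and the mean of the non-missing logFC
n_matched <- tapply(!is.na(joined$logFC), joined$ID, sum)
avg <- tapply(joined$logFC, joined$ID, mean, na.rm = TRUE)
print(n_matched)
print(avg)
```

Reporting `n_matched` next to the average makes it visible when an ID's mean rests on fewer miRNAs than File1 lists for it.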
