I have two files. For each ID in File1, I want to combine the logFC values from File2 for all miRNAs that share that ID, matching the two files on the miRNA name. For example, all miRNAs with ID1 should be combined based on their matching names in File2.

File1:

ID  miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24    hsa-miR-1179
ID24    hsa-miR-7-2
ID24    hsa-miR-3677
ID25    hsa-miR-940
ID25    hsa-miR-4717

File2: 
miRNA           logFC
hsa-miR-512-1   13
hsa-miR-512-2   123 
hsa-miR-1323    53
hsa-miR-498     4.2
hsa-miR-520e    12
hsa-miR-515-1   1
hsa-miR-519e    56
hsa-miR-520f    113
hsa-miR-495     11
hsa-miR-376c    11
hsa-miR-376a-2  113 
hsa-miR-654     13
hsa-miR-376b    123
hsa-miR-376a-1  567
hsa-miR-300     757
hsa-miR-1185-1   6
hsa-miR-1185-2  35
hsa-miR-1179    2
hsa-miR-7-2     2
hsa-miR-3677    1
hsa-miR-940     134
hsa-miR-4717    566


Output:

ID1     Average logFC for all ID1 miRNA
ID2     Average logFC for all ID2 miRNA
...
  • What have you already tried? Look into merge(). Commented Sep 1, 2015 at 10:01
  • Thanks, I have looked at merge and within Commented Sep 1, 2015 at 10:01
  • similar or the same ID?? Commented Sep 1, 2015 at 10:02
  • It would be straightforward with ?merge Commented Sep 1, 2015 at 10:02
  • Can't see how the ID is used when you don't have an ID column in File2. You might need to provide an output example. Also, the "miRNA" column in File2 uses "miR" whereas File1 uses "mir", and the capital "R" will break your matching on the "miRNA" column. You'll probably need to convert the names to all lower case or all upper case. Commented Sep 1, 2015 at 10:40

1 Answer

As @Heroka mentioned early on, this is a merge job: join the two tables on their shared key column (miRNA), then average logFC within each ID. I'm using a dplyr approach here, but there are many other ways to do this:

File1 = read.table(text="ID  miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24    hsa-miR-1179
ID24    hsa-miR-7-2
ID24    hsa-miR-3677
ID25    hsa-miR-940
ID25    hsa-miR-4717", header=TRUE)

File2 = read.table(text="miRNA           logFC
hsa-miR-512-1   13
hsa-miR-512-2   123 
hsa-miR-1323    53
hsa-miR-498     4.2
hsa-miR-520e    12
hsa-miR-515-1   1
hsa-miR-519e    56
hsa-miR-520f    113
hsa-miR-495     11
hsa-miR-376c    11
hsa-miR-376a-2  113 
hsa-miR-654     13
hsa-miR-376b    123
hsa-miR-376a-1  567
hsa-miR-300     757
hsa-miR-1185-1   6
hsa-miR-1185-2  35
hsa-miR-1179    2
hsa-miR-7-2     2
hsa-miR-3677    1
hsa-miR-940     134
hsa-miR-4717    566", header=TRUE)


library(dplyr)

File1 %>% 
  inner_join(File2, by="miRNA") %>%     # join your datasets based on miRNA column
  group_by(ID) %>%                      # group by ID
  summarise(AvgLogFC = mean(logFC))     # calculate average values

#     ID   AvgLogFC
# 1  ID1  46.900000
# 2  ID2 181.777778
# 3 ID24   1.666667
# 4 ID25 350.000000
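
For comparison, the same join-then-average idea can be sketched in base R with merge() and aggregate(), as suggested in the comments. This is only a minimal sketch: it uses a small subset of the question's data (the ID24 and ID25 rows) to stay short, and the names `merged` and `avg` are chosen here for illustration.

```r
# Subset of the question's data (ID24 and ID25 only), built inline
File1 <- data.frame(
  ID    = c("ID24", "ID24", "ID24", "ID25", "ID25"),
  miRNA = c("hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717"),
  stringsAsFactors = FALSE
)
File2 <- data.frame(
  miRNA = c("hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717"),
  logFC = c(2, 2, 1, 134, 566),
  stringsAsFactors = FALSE
)

# merge() performs an inner join by default: only miRNAs present in both tables are kept
merged <- merge(File1, File2, by = "miRNA")

# Average logFC within each ID, then rename the value column
avg <- aggregate(logFC ~ ID, data = merged, FUN = mean)
names(avg)[2] <- "AvgLogFC"
avg
```

Passing all.x = TRUE to merge() would instead keep File1 rows that have no match in File2, with logFC set to NA.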

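Note that I'm using inner_join, which keeps only the miRNA values present in both files. If a miRNA in File1 has no match in File2, its row is dropped silently and the average is computed over the matched miRNAs only, so this approach assumes that all miRNA values in File1 exist in File2.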
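
If that assumption might not hold, one way to check it is sketched below. It lists the unmatched miRNAs with anti_join() and switches to left_join() so they stay visible as NA. The name "hsa-miR-9999" is a hypothetical entry added here only to show a miss, and the `n_missing` column is likewise an illustrative choice, not part of the original answer.

```r
library(dplyr)

# Subset of the question's data plus one hypothetical miRNA that is absent from File2
File1 <- data.frame(
  ID    = c("ID25", "ID25", "ID25"),
  miRNA = c("hsa-miR-940", "hsa-miR-4717", "hsa-miR-9999"),  # last name is made up
  stringsAsFactors = FALSE
)
File2 <- data.frame(
  miRNA = c("hsa-miR-940", "hsa-miR-4717"),
  logFC = c(134, 566),
  stringsAsFactors = FALSE
)

# Rows of File1 with no match in File2; inner_join() would drop these without warning
unmatched <- anti_join(File1, File2, by = "miRNA")

# left_join() keeps every File1 row; unmatched rows get logFC = NA
avg <- File1 %>%
  left_join(File2, by = "miRNA") %>%
  group_by(ID) %>%
  summarise(AvgLogFC  = mean(logFC, na.rm = TRUE),  # average over matched miRNAs only
            n_missing = sum(is.na(logFC)))          # how many miRNAs had no logFC
```

Reporting the number of missing values alongside the average makes it clear when a group's mean is based on fewer miRNAs than File1 lists.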
