In each gene group (OG1, OG2) I have the same set of organisms.
Each organism has one or more genes in a given group, but the number of genes per organism varies across groups. In the example below, P. fragile has 3 genes in OG1 but only 2 genes in OG2.
To compare all genes against all, I need to rearrange the table: within each group, every gene of an organism should appear in a row with every combination of genes from the other organisms. The desired output is shown below.
The organism name can be omitted from the output because the gene_ID contains part of the organism name. I used the dplyr package to group the data using:
group_by(df, group)
But since each organism has a different number of genes in each gene group, I am stuck on how to generate the combinations.
Input table:
df <- structure(list(gene_ID = c("PF_1", "PF_2", "PF_3", "PI_1", "PI_2",
"PI_3", "PB_1", "PB_2", "PFa_1", "PFa_2", "PIa_1", "PIa_2", "PBa_1",
"PBa_2", "PBa_3"), organism = c("P. fragile", "P. fragile", "P. fragile",
"P. inui", "P. inui", "P. inui", "P. berghei", "P. berghei",
"P. fragile", "P. fragile", "P. inui", "P. inui", "P. berghei",
"P. berghei", "P. berghei"), group = c("OG1", "OG1", "OG1", "OG1",
"OG1", "OG1", "OG1", "OG1", "OG2", "OG2", "OG2", "OG2", "OG2",
"OG2", "OG2")), .Names = c("gene_ID", "organism", "group"), class = "data.frame", row.names = c(NA,
-15L))
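Desired output table (one gene column per organism: P. fragile, P. inui, P. berghei):
group gene_fragile gene_inui gene_berghei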
OG1 PF_1 PI_1 PB_1
OG1 PF_1 PI_1 PB_2
OG1 PF_1 PI_2 PB_1
OG1 PF_1 PI_2 PB_2
OG1 PF_1 PI_3 PB_1
OG1 PF_1 PI_3 PB_2
OG1 PF_2 PI_1 PB_1
OG1 PF_2 PI_1 PB_2
OG1 PF_2 PI_2 PB_1
OG1 PF_2 PI_2 PB_2
OG1 PF_2 PI_3 PB_1
OG1 PF_2 PI_3 PB_2
OG1 PF_3 PI_1 PB_1
OG1 PF_3 PI_1 PB_2
OG1 PF_3 PI_2 PB_1
OG1 PF_3 PI_2 PB_2
OG1 PF_3 PI_3 PB_1
OG1 PF_3 PI_3 PB_2
OG2 PFa_1 PIa_1 PBa_1
OG2 PFa_1 PIa_1 PBa_2
OG2 PFa_1 PIa_1 PBa_3
OG2 PFa_1 PIa_2 PBa_1
OG2 PFa_1 PIa_2 PBa_2
OG2 PFa_1 PIa_2 PBa_3
OG2 PFa_2 PIa_1 PBa_1
OG2 PFa_2 PIa_1 PBa_2
OG2 PFa_2 PIa_1 PBa_3
OG2 PFa_2 PIa_2 PBa_1
OG2 PFa_2 PIa_2 PBa_2
OG2 PFa_2 PIa_2 PBa_3
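For reference, one possible way to build this table is sketched below in base R. It is only a sketch under assumptions: it uses split() and expand.grid() rather than dplyr, it rebuilds the same input data frame so that it runs on its own, and the names combos and result are illustrative, not part of the original data.

```r
# Same data as the input table above, rebuilt compactly so the sketch is self-contained.
df <- data.frame(
  gene_ID  = c("PF_1", "PF_2", "PF_3", "PI_1", "PI_2", "PI_3", "PB_1", "PB_2",
               "PFa_1", "PFa_2", "PIa_1", "PIa_2", "PBa_1", "PBa_2", "PBa_3"),
  organism = rep(c("P. fragile", "P. inui", "P. berghei",
                   "P. fragile", "P. inui", "P. berghei"),
                 times = c(3, 3, 2, 2, 2, 3)),
  group    = rep(c("OG1", "OG2"), times = c(8, 7)),
  stringsAsFactors = FALSE
)

# For each group, build every combination of one gene per organism.
combos <- lapply(split(df, df$group), function(g) {
  # One vector of gene IDs per organism, keeping the order of first appearance.
  genes <- split(g$gene_ID, factor(g$organism, levels = unique(g$organism)))
  # expand.grid() varies its first argument fastest, so reverse the list to
  # make the last organism vary fastest, then restore the column order.
  grid <- expand.grid(rev(genes), KEEP.OUT.ATTRS = FALSE,
                      stringsAsFactors = FALSE)
  grid <- grid[rev(seq_along(grid))]
  cbind(group = g$group[1], grid)
})

# Stack the per-group tables; this assumes every group contains the same
# organisms in the same order, as in the example data.
result <- do.call(rbind, combos)
rownames(result) <- NULL
print(result)
```

Because the combinations are generated separately within each group, the differing number of genes per organism does not matter: each group simply contributes the product of its per-organism gene counts as rows.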