Is it possible to run Python code from R to modify or create tables?
Here is an example of what I currently do in Python.
I start with this dataframe:
   species  family  Events     groups
1  SP1      A       10,22      G1
2  SP1      B       7          G2
3  SP1      C,D     4,5,6,1,3  G3,G4,G5,G6
4  SP2      A       22,10      G1
5  SP2      D,C     6,5,4,3,1  G4,G6,G5,G3
6  SP3      C       4,5,3,6,1  G3,G6,G5
7  SP3      E       7          G2
8  SP3      A       10         G1
9  SP4      C       7,22       G12
With this Python code:
g = df['groups'].apply(lambda x: set(x.split(',')))  # split each entry into a set
# for each set, keep the largest set in g that contains it, and turn it back into a string
g2 = g.apply(lambda s: ','.join(sorted(
    g[g.apply(lambda x: x.issuperset(s))].max())))
# group the rows by that string and merge the comma-separated values of each column
resul = df[['species', 'family', 'Events']].groupby(g2).agg(
    lambda x: ','.join(sorted(set((i for j in x for i in j.split(',')))))
    ).reset_index()
I can transform it into:
   species      family  Events     groups
0  SP1,SP2,SP3  A       10,22      G1
1  SP4          C       22,7       G12
2  SP1,SP3      B,E     7          G2
3  SP1,SP2,SP3  C,D     1,3,4,5,6  G3,G4,G5,G6
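To make the intended logic explicit (and perhaps easier to port), here is a minimal sketch of the same transformation in plain Python, without pandas. The function name `merge_by_superset` and the list-of-dicts row layout are my own assumptions for illustration. When several sets contain the current one, it picks the one with the most elements, which is how I read the `.max()` step above.

```python
def merge_by_superset(rows):
    """Merge rows whose 'groups' set is contained in a larger 'groups' set."""
    group_sets = [frozenset(r["groups"].split(",")) for r in rows]
    merged = {}
    for row, current in zip(rows, group_sets):
        # Largest set that contains the current one (the set itself always
        # qualifies). Ties are broken by the sorted labels for determinism.
        biggest = max(
            (g for g in group_sets if g >= current),
            key=lambda g: (len(g), sorted(g)),
        )
        key = ",".join(sorted(biggest))
        bucket = merged.setdefault(
            key, {"species": set(), "family": set(), "Events": set()}
        )
        # Split each comma-separated cell and pool the values per column.
        for col in bucket:
            bucket[col].update(row[col].split(","))
    # Join the pooled values back into sorted, comma-separated strings.
    return [
        {**{col: ",".join(sorted(vals)) for col, vals in bucket.items()},
         "groups": key}
        for key, bucket in sorted(merged.items())
    ]


# Same data as the dataframe above, one dict per row.
rows = [
    {"species": "SP1", "family": "A", "Events": "10,22", "groups": "G1"},
    {"species": "SP1", "family": "B", "Events": "7", "groups": "G2"},
    {"species": "SP1", "family": "C,D", "Events": "4,5,6,1,3", "groups": "G3,G4,G5,G6"},
    {"species": "SP2", "family": "A", "Events": "22,10", "groups": "G1"},
    {"species": "SP2", "family": "D,C", "Events": "6,5,4,3,1", "groups": "G4,G6,G5,G3"},
    {"species": "SP3", "family": "C", "Events": "4,5,3,6,1", "groups": "G3,G6,G5"},
    {"species": "SP3", "family": "E", "Events": "7", "groups": "G2"},
    {"species": "SP3", "family": "A", "Events": "10", "groups": "G1"},
    {"species": "SP4", "family": "C", "Events": "7,22", "groups": "G12"},
]

result = merge_by_superset(rows)
for r in result:
    print(r)
```

Note that the values are sorted as strings, which is why `22` comes before `7` in the `G12` row.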
Is there a way to call this Python code from R and get the resulting table directly in R?
I need to work in R, but I am much more familiar with Python.
The dataframe as a Python dictionary:
{'species': {1: 'SP1', 2: 'SP1', 3: 'SP1', 4: 'SP2', 5: 'SP2', 6: 'SP3', 7: 'SP3', 8: 'SP3', 9: 'SP4'}, 'family': {1: 'A', 2: 'B', 3: 'C,D', 4: 'A', 5: 'D,C', 6: 'C', 7: 'E', 8: 'A', 9: 'C'}, 'Events': {1: '10,22', 2: '7', 3: '4,5,6,1,3', 4: '22,10', 5: '6,5,4,3,1', 6: '4,5,3,6,1', 7: '7', 8: '10', 9: '7,22'}, 'groups': {1: 'G1', 2: 'G2', 3: 'G3,G4,G5,G6', 4: 'G1', 5: 'G4,G6,G5,G3', 6: 'G3,G6,G5', 7: 'G2', 8: 'G1', 9: 'G12'}}
The equivalent data frame in R:
structure(list(species = structure(c(1L, 1L, 1L, 2L, 2L, 3L,
3L, 3L, 4L), .Label = c("SP1", "SP2", "SP3", "SP4"), class = "factor"),
family = structure(c(1L, 2L, 4L, 1L, 5L, 3L, 6L, 1L, 3L), .Label = c("A",
"B", "C", "C,D", "D,C", "E"), class = "factor"), Events = structure(c(2L,
7L, 5L, 3L, 6L, 4L, 7L, 1L, 8L), .Label = c("10", "10,22",
"22,10", "4,5,3,6,1", "4,5,6,1,3", "6,5,4,3,1", "7", "7,22"
), class = "factor"), groups = structure(c(1L, 3L, 4L, 1L,
6L, 5L, 3L, 1L, 2L), .Label = c("G1", "G12", "G2", "G3,G4,G5,G6",
"G3,G6,G5", "G4,G6,G5,G3"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))