
I have the R code below. How can I write equivalent Python code?

c1 <- c(7.15,7.45,8.15,8.45,9.15,9.45,10.15,10.45,11.15,11.45,12.15,12.45,13.15,13.45,14.15,14.45,15.15,15.45,16.15,16.45,17.15,17.45,18.15,18.45,19.15,19.45,20.15)
numeric_vector <- c(12.15,12.45,13.15,13.45,14.15,14.45,15.15,15.45,16.15,16.45,17.15,17.45,18.15)
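# one-row data frame with one column per value of c1, all set to 0
data <- data.frame(matrix(nrow = 1,ncol = length(c1)))
colnames(data) <- c(c1)
data[1,] <- 0
# add 1 to the columns whose names match a value of numeric_vector
data[1,colnames(data)[(colnames(data) %in% as.character(numeric_vector))]] = data[1,colnames(data)[(colnames(data) %in% as.character(numeric_vector))]] + 1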
df <- tibble::rownames_to_column(data.frame(t(data)), "col1")
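Step by step, the R code builds a one-row "wide" data frame whose column names are the c1 values (stored as text), sets every cell to 0, adds 1 in the columns whose names appear in numeric_vector, and then transposes it into a long table with the names in col1. A literal pandas mirror of those steps might look like this sketch. The X1 column name is an assumption, chosen to match what R's data.frame() typically produces when it converts a transposed row named 1 into a column.

```python
import pandas as pd

c1 = [7.15, 7.45, 8.15, 8.45, 9.15, 9.45, 10.15, 10.45, 11.15, 11.45, 12.15, 12.45,
      13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15, 16.45, 17.15, 17.45, 18.15,
      18.45, 19.15, 19.45, 20.15]
numeric_vector = [12.15, 12.45, 13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15,
                  16.45, 17.15, 17.45, 18.15]

# One-row "wide" frame filled with 0; column labels are the c1 values as text,
# like colnames(data) <- c(c1) followed by data[1, ] <- 0 in R.
data = pd.DataFrame(0, index=[0], columns=[str(x) for x in c1])

# Add 1 wherever the column label matches a value of numeric_vector (as text),
# like the %in% / as.character() line in R.
hits = data.columns.isin([str(x) for x in numeric_vector])
data.loc[0, hits] += 1

# Transpose to long format and move the labels into a "col1" column, roughly what
# tibble::rownames_to_column(data.frame(t(data)), "col1") does.
# "X1" is an assumed name meant to mirror R's default for the transposed row.
df = data.T.rename(columns={0: "X1"}).rename_axis("col1").reset_index()
print(df.head())
```

As in R, col1 holds text here; convert it with df["col1"].astype(float) if numeric values are needed.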

I have tried the following in Python:

import numpy as np
import pandas as pd
data = pd.DataFrame(index=np.arange(0), columns=np.arange(len(c1)))
data.columns = c1
data[0,] = 0
d1 = pd.DataFrame(numeric_vector)
d1.columns = ['col1']
d1['count'] = d1.apply(lambda x: 1, axis=1)
d1['col1'] = d1['col1'].astype('category')
add_col1 = set(c1) - set(d1['col1'].unique())
d1['col1'] = d1['col1'].cat.add_categories(add_col1)
otData = d1['col1'].value_counts().reset_index()

Please help me convert these lines to Python. My attempt gives different output from the R code.
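Some likely reasons the attempt differs: the data frame built in its first three lines is never used afterwards, value_counts() sorts by frequency by default (so the rows do not come back in c1 order), and the column names produced by reset_index() on a value_counts() result differ between pandas versions. A minimal sketch that keeps the counting idea but restores the c1 order; the count column name is an arbitrary choice. Note that value_counts() counts repeats, whereas the R code adds 1 only once per name; the two agree here because numeric_vector has no duplicates.

```python
import pandas as pd

c1 = [7.15, 7.45, 8.15, 8.45, 9.15, 9.45, 10.15, 10.45, 11.15, 11.45, 12.15, 12.45,
      13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15, 16.45, 17.15, 17.45, 18.15,
      18.45, 19.15, 19.45, 20.15]
numeric_vector = [12.15, 12.45, 13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15,
                  16.45, 17.15, 17.45, 18.15]

# Count each value of numeric_vector, then reindex onto c1 so every c1 value
# appears exactly once, in c1's order, with 0 where it never occurs.
counts = pd.Series(numeric_vector).value_counts().reindex(c1, fill_value=0)

# Long table: the c1 value in "col1", its count in "count" (name chosen arbitrarily).
otData = counts.rename_axis("col1").reset_index(name="count")
print(otData)
```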

  • Please make sure to provide an easy-to-reproduce minimal example, format your code, and show your attempts (to avoid downvotes). See how-to-ask. Commented Nov 19, 2020 at 6:41
  • I have included my attempt. Commented Nov 19, 2020 at 6:58

1 Answer


R:

df <- data.frame(col1=c1)
df$col2 <- as.integer(df$col1 %in% numeric_vector)

Python:

import pandas as pd
df = pd.DataFrame({'col1': c1})
df['col2'] = df.col1.isin(numeric_vector).astype(int)
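isin(), like R's %in%, tests exact equality. That is safe here because both lists are typed from the same literals, but if the values in col1 come out of arithmetic, tiny floating-point differences could make matches fail. A sketch of a tolerance-based variant using numpy.isclose; relying on its default tolerances is an assumption, so tune rtol/atol to the data.

```python
import numpy as np
import pandas as pd

c1 = [7.15, 7.45, 8.15, 8.45, 9.15, 9.45, 10.15, 10.45, 11.15, 11.45, 12.15, 12.45,
      13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15, 16.45, 17.15, 17.45, 18.15,
      18.45, 19.15, 19.45, 20.15]
numeric_vector = [12.15, 12.45, 13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15,
                  16.45, 17.15, 17.45, 18.15]

df = pd.DataFrame({"col1": c1})

# Broadcast to a (len(col1), len(numeric_vector)) grid of pairwise comparisons,
# then flag a row if any target is within the (default) floating-point tolerance.
close = np.isclose(df["col1"].to_numpy()[:, None], np.asarray(numeric_vector)[None, :])
df["col2"] = close.any(axis=1).astype(int)
print(df)
```

For this data it yields the same col2 as isin(), since the distinct values are far apart relative to the tolerances.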

Comparing outputs:

First, in R:

c1 <- c(7.15,7.45,8.15,8.45,9.15,9.45,10.15,10.45,11.15,11.45,12.15,12.45,13.15,13.45,14.15,14.45,15.15,15.45,16.15,16.45,17.15,17.45,18.15,18.45,19.15,19.45,20.15)
numeric_vector = c(12.15,12.45,13.15,13.45,14.15,14.45,15.15,15.45,16.15,16.45,17.15,17.45,18.15)

df <- data.frame(col1=c1)
df$col2 <- as.integer(df$col1 %in% numeric_vector)
write.csv(df, 'df.csv', row.names = F)

Then, in Python:

c1 = [7.15,7.45,8.15,8.45,9.15,9.45,10.15,10.45,11.15,11.45,12.15,12.45,13.15,13.45,14.15,14.45,15.15,15.45,16.15,16.45,17.15,17.45,18.15,18.45,19.15,19.45,20.15]
numeric_vector = [12.15,12.45,13.15,13.45,14.15,14.45,15.15,15.45,16.15,16.45,17.15,17.45,18.15]

import pandas as pd
df = pd.DataFrame({'col1': c1})
df['col2'] = df.col1.isin(numeric_vector).astype(int)

# Compare if all values are equal
df_R = pd.read_csv('df.csv')
print((df_R == df).values.all())
True

# Merge and compare outputs:
print(df.add_suffix('_Python').join(df_R.add_suffix('_R')))
    col1_Python  col2_Python  col1_R  col2_R
0          7.15            0    7.15       0
1          7.45            0    7.45       0
2          8.15            0    8.15       0
3          8.45            0    8.45       0
4          9.15            0    9.15       0
5          9.45            0    9.45       0
6         10.15            0   10.15       0
7         10.45            0   10.45       0
8         11.15            0   11.15       0
9         11.45            0   11.45       0
10        12.15            1   12.15       1
11        12.45            1   12.45       1
12        13.15            1   13.15       1
13        13.45            1   13.45       1
14        14.15            1   14.15       1
15        14.45            1   14.45       1
16        15.15            1   15.15       1
17        15.45            1   15.45       1
18        16.15            1   16.15       1
19        16.45            1   16.45       1
20        17.15            1   17.15       1
21        17.45            1   17.45       1
22        18.15            1   18.15       1
23        18.45            0   18.45       0
24        19.15            0   19.15       0
25        19.45            0   19.45       0
26        20.15            0   20.15       0

Comments:

But I am getting different outputs in R and Python.
How so? I just saved the output from R, loaded it in Python, then compared to the df produced with Python, and they're identical.
Yes, I got the same output. Thank you.
Glad to help! I've added some comparison info. Let me know if you need more assistance. Best!
Hi, actually I am doing the above calculation in a for loop, subsetting the data by several columns. How can I use groupby instead of the for loop? Can I post that as a new question?
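For the follow-up, one way to drop the loop is to build every (group, c1 value) pair at once and mark which pairs occur in the data. This sketch swaps in a cross join plus a left merge rather than groupby; the site and batch grouping columns, the time value column, and the sample rows are all hypothetical placeholders for the real data. merge(how="cross") requires pandas 1.2 or later.

```python
import pandas as pd

c1 = [7.15, 7.45, 8.15, 8.45, 9.15, 9.45, 10.15, 10.45, 11.15, 11.45, 12.15, 12.45,
      13.15, 13.45, 14.15, 14.45, 15.15, 15.45, 16.15, 16.45, 17.15, 17.45, 18.15,
      18.45, 19.15, 19.45, 20.15]

# Hypothetical input: "site" and "batch" stand in for the real grouping columns,
# "time" for the values that were checked against numeric_vector inside the loop.
raw = pd.DataFrame({
    "site": ["A", "A", "A", "B", "B"],
    "batch": [1, 1, 2, 1, 1],
    "time": [12.15, 12.45, 7.15, 18.15, 20.15],
})

# Every (group, c1 value) pair, built once instead of looping over subsets.
groups = raw[["site", "batch"]].drop_duplicates()
grid = groups.merge(pd.DataFrame({"col1": c1}), how="cross")

# Mark the pairs that actually occur; unmatched pairs get 0.
hits = raw.drop_duplicates().rename(columns={"time": "col1"}).assign(col2=1)
out = grid.merge(hits, on=["site", "batch", "col1"], how="left")
out["col2"] = out["col2"].fillna(0).astype(int)
print(out.head())
```

The result has one row per group and c1 value, so the per-group output of the loop becomes a filter on site and batch.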