I have a dataset with an ID column and a Values column, and I ran the Shapiro-Wilk test to check the normality of the Values within each ID. What I want to do is create a data frame with 4 columns. The first will be the name of the ID, the second will be the W statistic, and the third will be the p-value. The fourth column will contain "non-normal" or "normal" (that is, whether to accept H1), depending on the p-value. This is the dput output:
dput(df)
structure(list(ID = c("F1", "F1", "F1", "F1", "F1", "F1", "F1",
"F2", "F2", "F2", "F2", "F2", "F2", "F2", "F2", "F3", "F3", "F3",
"F3", "F3", "F3", "F3", "F3", "F3", "F4", "F4", "F4", "F4", "F4",
"F4", "F4", "F4"), Values = c(9.6, NA, 10.2, 9.8, 9.9, 9.9, 9.9,
1.2, 1.2, 1.8, 1.5, 1.5, 1.6, 1.4, NA, 3266, 3256, 7044, 6868,
NA, 3405, 3410, NA, 5567, 59.4, 56, 52.8, 52.4, 55.5, NA, NA,
53.6)), class = "data.frame", row.names = c(NA, -32L))
This is what I have done so far to run the test for every ID, along with my attempt to create the data frame:
install.packages("data.table")  # only needed once per R installation
library(data.table)
uniq_id <- unique(df$ID)
number_loops <- length(uniq_id)
headings <- c("ID", "Wilk", "p_value", "Accept H1")
# Empty results table: one row per ID, one column per heading
df_table <- as.data.frame(matrix(NA, nrow = number_loops, ncol = length(headings)))
names(df_table) <- headings
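setDT(df)  # convert to data.table once, by reference
for (i in uniq_id) {
  u <- df[ID == i]
  s <- shapiro.test(u$Values)  # shapiro.test() drops NA values itself
  print(i)
  print(s)  # the result is only printed here, not stored in df_table
}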
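For reference, here is a rough base-R sketch of the table I am trying to end up with. It is only one possible approach and not necessarily the best one. The 0.05 cutoff (`alpha`) is an assumption on my part, as are the helper names `groups`, `tests`, and `results`, which do not appear in my code above.

```r
# Same data as the dput() output above
df <- structure(list(ID = c("F1", "F1", "F1", "F1", "F1", "F1", "F1",
"F2", "F2", "F2", "F2", "F2", "F2", "F2", "F2", "F3", "F3", "F3",
"F3", "F3", "F3", "F3", "F3", "F3", "F4", "F4", "F4", "F4", "F4",
"F4", "F4", "F4"), Values = c(9.6, NA, 10.2, 9.8, 9.9, 9.9, 9.9,
1.2, 1.2, 1.8, 1.5, 1.5, 1.6, 1.4, NA, 3266, 3256, 7044, 6868,
NA, 3405, 3410, NA, 5567, 59.4, 56, 52.8, 52.4, 55.5, NA, NA,
53.6)), class = "data.frame", row.names = c(NA, -32L))

alpha <- 0.05  # assumed significance level; adjust as needed

# Run one Shapiro-Wilk test per ID (missing values are ignored by shapiro.test)
groups <- split(df$Values, df$ID)
tests <- lapply(groups, shapiro.test)

# Collect the W statistic and p-value of each test into one row per ID
results <- data.frame(
  ID = names(tests),
  Wilk = vapply(tests, function(s) unname(s$statistic), numeric(1), USE.NAMES = FALSE),
  p_value = vapply(tests, function(s) s$p.value, numeric(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Label each ID based on the assumed cutoff
results[["Accept H1"]] <- ifelse(results$p_value < alpha, "non-normal", "normal")
print(results)
```

The idea is to keep each `shapiro.test()` result in a list instead of printing it, so that the `statistic` and `p.value` elements can be pulled out afterwards.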