I have an R script that imports a file, does some data manipulation, fits a logistic regression model, and then saves the results to a txt file. However, when I run the script from the command line, I get the following error message and don't know what's going on.

anonymous@anonymous-Latitude-E6520:~/Downloads$ R --no-save < Auto_Model.r > out.txt
Warning message:
NAs introduced by coercion 
Error in if (x == "\\N") NA else if (x > 1 & x < 6999) "1:6999" else if (x >  : 
  missing value where TRUE/FALSE needed
Calls: bin.value -> do.call -> mapply -> .Call -> <Anonymous>
Execution halted
anonymous@anonymous-Latitude-E6520:~/Downloads$ R --no-save < Auto_Model.r

The R script that produces the error is below:

> ## IMPORT DATA:
> #setwd("~/Desktop")
> library(foreign)
> dat = read.csv("dat.csv", stringsAsFactors=FALSE)
> 
> ## zipcode = 
> dat$zipcode = as.character(dat$zipcode)
> 
> bin.value = Vectorize(function(x) {
+   if (x == "\\N") NA
+   else if (x > 1 & x < 6999) "1:6999"
+   else if (x > 7000 & x < 9999) "7000:9999"
+   else if (x > 10000 & x < 14849) "10000:14849"
+   else if (x > 14850 & x < 19699) "14850:19699"
+   else if (x > 19700 & x < 29999) "19700:29999"
+   else if (x > 30000 & x < 31999) "30000:31999"
+   else if (x > 32000 & x < 34999) "32000:34999"
+   else if (x > 35000 & x < 42999) "35000:42999"
+   else if (x > 43000 & x < 49999) "43000:49999"
+   else if (x > 50000 & x < 59999) "50000:59999"
+   else if (x > 60000 & x < 69999) "60000:69999"
+   else if (x > 70000 & x < 79999) "70000:79999"
+   else if (x > 80000 & x < 89999) "80000:89999"
+   else if (x > 90000 & x < 96999) "90000:96999"
+   else if (x > 97000 & x < 99820) "97000:99820"
+   else NA 
+ })
> 
> dat$zipcode2 = as.character(bin.value(as.integer(dat$zipcode)))
Error in if (x == "\\N") NA else if (x > 1 & x < 6999) "1:6999" else if (x >  : 
  missing value where TRUE/FALSE needed
Calls: bin.value -> do.call -> mapply -> .Call -> <Anonymous>
Execution halted

I assume something is wrong in how I am trying to manipulate the mode of the zipcode variable, but nothing I've tried seems to fix the issue.

> str(dat$zipcode)
 int [1:12635] 76148 33825 61832 11368 98290 92078 44104 62052 55106 20861 ...
> 
1 Answer

It seems to me that what you're trying to do is already done by the function cut:

bin.value <- function(x){
    cut(as.integer(x),
        breaks= c(1,6999,9999,14849,19699,29999,31999,34999,42999,49999,59999,69999,79999,89999,96999,99820),
        labels= c("1:6999", "7000:9999", "10000:14849", "14850:19699", "19700:29999", "30000:31999", "32000:34999", "35000:42999", "43000:49999", "50000:59999", "60000:69999", "70000:79999", "80000:89999", "90000:96999", "97000:99820"))
    }
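
As a rough illustration of how cut behaves here, the sketch below uses a shortened, hypothetical set of breaks and labels (not the full list above) and a made-up input vector. It assumes "\\N" is the missing-value marker in the data. By default cut uses right-closed intervals, so boundary values such as 6999 and 7000 each land in a bin, whereas the strict `>` and `<` comparisons in the question would leave them as NA.

```r
# Hypothetical sample input; "\\N" stands in for the missing-value marker
zips <- c("76148", "\\N", "6999", "7000", "99820")

# as.integer() turns "\\N" into NA (with a coercion warning, silenced here),
# and cut() simply passes NA through instead of raising an error.
# Breaks and labels are shortened for illustration only.
binned <- suppressWarnings(
  cut(as.integer(zips),
      breaks = c(1, 6999, 9999, 99820),
      labels = c("1:6999", "7000:9999", "10000:99820"))
)

as.character(binned)
```

Note that with these defaults a value equal to the lowest break (1) falls outside every interval; passing include.lowest = TRUE would include it.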

Otherwise your specific problem is caused by as.integer:

a <- c("\\N",sample(seq(0,100000,by=1),10))
a
[1] "\\N"   "38987" "50403" "75683" "66706" "27924" "17216" "77539" "80658" "2335"  "53010"
as.integer(a)
[1]    NA 38987 50403 75683 66706 27924 17216 77539 80658  2335 53010

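\\N is therefore converted straight to NA by as.integer, before your function ever sees it. Your function only returns NA in its final else branch, but every if statement before that compares a missing value with something, which yields NA rather than TRUE or FALSE.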

as.integer(a)[1]=="\\N"
[1] NA # Instead of TRUE or FALSE
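
If you would rather keep the if/else approach, one possible fix is to test for NA before any comparison. The following is only a sketch: bin.one is a hypothetical name, only the first two ranges are shown, and the inclusive bounds (`>=`, `<=`) are an assumption chosen to match the labels rather than the strict comparisons in the question.

```r
# Sketch: check is.na() first so that no if() condition evaluates to NA.
bin.one <- function(x) {
  if (is.na(x)) NA_character_
  else if (x >= 1 && x <= 6999) "1:6999"
  else if (x >= 7000 && x <= 9999) "7000:9999"
  else NA_character_  # remaining ranges omitted for brevity
}

# Hypothetical input; "\\N" becomes NA after as.integer()
zips <- c("\\N", "2335", "7000")
x <- suppressWarnings(as.integer(zips))

# Apply the scalar function to each element, always returning a character
result <- vapply(x, bin.one, character(1))
result
```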

1 Comment

Thanks, that's a much more elegant way to bucket the variables.
