
The following is the code I want to run. It comes from a Coursera course. I am unable to run the sqldf() call in this code:

data = read.delim(file = 'purchases.txt', header = FALSE, sep = '\t', dec = '.')
str(data)

colnames(data) = c('customer_id', 'purchase_amount', 'date_of_purchase')
data$date_of_purchase = as.Date(data$date_of_purchase, "%Y-%m-%d")
data$days_since       = as.numeric(difftime(time1 = "2016-01-01",
                                        time2 = data$date_of_purchase,
                                        units = "days"))

head(data)
summary(data)

library(sqldf)

customers = sqldf("SELECT customer_id ,
                      MIN(days_since) AS 'recency',
                      COUNT(*) AS 'frequency',
                      AVG(purchase_amount) AS 'amount'
               FROM data GROUP BY 1")
  • Can I use data.table instead for the above query? How? (Commented Jul 17, 2018 at 10:14)

1 Answer

The sqldf() function is available only when the sqldf package has been installed in R and then loaded into the current session with library().

To install sqldf, call install.packages("sqldf") once; library(sqldf) must then be run in every new R session.
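As a hedged sketch of the install-then-load step: the helper below installs a package only when it is missing. The name `ensure_package` and the choice of CRAN mirror are assumptions made for illustration and are not part of the original code.

```r
# Hypothetical helper: install a package only if it cannot be found,
# then report whether it is available afterwards.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumed mirror; any CRAN mirror would do here.
    install.packages(pkg, repos = repos)
  }
  # TRUE when the package namespace can be loaded, FALSE otherwise.
  requireNamespace(pkg, quietly = TRUE)
}

# A package that ships with R is always available, so nothing is installed.
print(ensure_package("stats"))
```

Calling `ensure_package("sqldf")` and then `library(sqldf)` would avoid reinstalling the package every time the script is run.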

Here is a completely reproducible version of the OP code, including install.packages() to install sqldf:

textFile <- "
001,42.5,2017-01-01
001,38.7,2017-05-02
002,47.9,2017-06-05"

# commented out original data read section
# data = read.delim(file = 'purchases.txt', header = FALSE, sep = '\t', dec = '.')
# str(data)

# replace with inline data and read.csv()
data <- read.csv(text=textFile,header=FALSE,stringsAsFactors=FALSE)

colnames(data) = c('customer_id', 'purchase_amount', 'date_of_purchase')
data$date_of_purchase = as.Date(data$date_of_purchase, "%Y-%m-%d")
data$days_since       = as.numeric(difftime(time1 = "2016-01-01",
                                            time2 = data$date_of_purchase,
                                            units = "days"))

head(data)
summary(data)

# only need to run install.packages() once
install.packages("sqldf")
library(sqldf)

customers = sqldf("SELECT customer_id ,
                      MIN(days_since) AS 'recency',
                      COUNT(*) AS 'frequency',
                      AVG(purchase_amount) AS 'amount'
               FROM data GROUP BY 1")
customers

...and the output:

> customers
  customer_id   recency frequency amount
1           1 -486.7917         2   40.6
2           2 -520.7917         1   47.9
> 
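A comment on the question asks whether data.table could be used instead of the SQL query. The following is a minimal sketch of one way to do that, assuming the data.table package is installed. The names `purchases` and `customers_dt` are made up for this example, and the three rows repeat the inline sample data. It also uses as.Date("2016-01-01") as the reference date, so days_since comes out in whole days rather than carrying the fractional time zone offset visible in the output above.

```r
library(data.table)

# Same three sample rows as the inline data (customer_id read as integer).
purchases <- data.table(
  customer_id      = c(1L, 1L, 2L),
  purchase_amount  = c(42.5, 38.7, 47.9),
  date_of_purchase = as.Date(c("2017-01-01", "2017-05-02", "2017-06-05"))
)

# Whole days from each purchase back to the reference date (negative here,
# because the sample purchases fall after 2016-01-01).
purchases[, days_since := as.numeric(as.Date("2016-01-01") - date_of_purchase)]

# Grouped summary, equivalent in intent to the SQL GROUP BY query.
customers_dt <- purchases[, .(recency   = min(days_since),
                              frequency = .N,
                              amount    = mean(purchase_amount)),
                          by = customer_id]
print(customers_dt)
```

Here `.N` plays the role of COUNT(*), and `by = customer_id` plays the role of GROUP BY 1.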

6 Comments

Thank you Len. I am still getting the same error message. I am pasting the error below: Error in sqldf("SELECT customer_id ,\n MIN(days_since) AS 'recency',\n COUNT(*) AS 'frequency',\n AVG(purchase_amount) AS 'amount'\n FROM data GROUP BY 1") : could not find function "sqldf"
Could it be a problem with my RStudio? Do I need to reinstall RStudio?
What happens when you attempt to use install.packages() to install sqldf from the R console?
The package gets installed without any problems.
Have you tried to run R without RStudio, load sqldf and use it? If so, did you receive the same error message?
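The error `could not find function "sqldf"` means the package is not attached in the current session, which can happen when library(sqldf) itself fails and its error message goes unnoticed. A rough diagnostic sketch along the lines of the questions above, using only base R calls (the variable names `pkg_found` and `pkg_attached` are made up for this example):

```r
# Library folders this R session searches for installed packages.
print(.libPaths())

# TRUE if the sqldf namespace can be loaded from those folders.
pkg_found <- requireNamespace("sqldf", quietly = TRUE)

# TRUE only if library(sqldf) has already succeeded in this session.
pkg_attached <- "package:sqldf" %in% search()

if (!pkg_found) {
  message("sqldf cannot be loaded from any folder listed in .libPaths().")
} else if (!pkg_attached) {
  message("sqldf is installed but not attached; run library(sqldf) ",
          "on its own and read any error it prints.")
} else {
  message("sqldf is attached; sqldf() should be callable.")
}
```

Running the same lines in a plain R console and in RStudio would show whether the two use different library folders.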