
I have a data frame in R that contains the output of previous queries. Doing this step directly in SQL is too slow, so I am using the data.table package instead. The output from data.table is a data frame of 50,000 ids, and I need to pull all records from the database for each id.

# x is a dataframe containing 50,000 ids. 

Usually, I would do something like,

dbGetQuery(con, "Select * from data where id in x") 

but that won't work. An alternative is to do 50,000 queries in a for loop, but I am thinking that there must be a more efficient method to do this.

What is the most efficient way to do this?

  • I don't know what MySQL's limit on the number of items in an IN clause is, but suspect (??) that it's pretty large. Are you sure you can't just put all or most of them in one IN clause? (Another option of course is to push the ids to a temporary table in the db and do a join.) Commented Oct 28, 2015 at 21:42
  • What do you mean by "that won't work"? Commented Oct 28, 2015 at 21:44
  • How about you step back and describe the tables and what you want to achieve, so we don't go down the XY Problem path? Commented Oct 28, 2015 at 21:46
  • dbGetQuery(con, "select * from data where order_id in x"). I get the following error: Error in .local(conn, statement, ...) : could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'x' at line 1 Commented Oct 28, 2015 at 21:47
  • In order to do IN clause queries like this from R, you'll need to explicitly build the query via string concatenation using paste() or sprintf(). Commented Oct 28, 2015 at 21:51

1 Answer

For example,

> x <- 0:3
> q <- "select * from table where id in (%s)"
> sprintf(q,paste(x,collapse = ","))
[1] "select * from table where id in (0,1,2,3)"

As I mentioned in my comment, some databases have limits on the number of items you can put in the IN clause. I'm not familiar enough with MySQL to know what that is, but I'd be willing to bet it's large enough that you could do this in only a handful of queries.
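If one IN clause turns out to be too long for the server, the ids can be split into chunks with one query built per chunk. The following is a minimal sketch of that idea, not a tested recipe for MySQL: `build_in_queries` and its `chunk_size` default are hypothetical, and the table and column names (`data`, `id`) are taken from the question.

```r
# Hypothetical helper: build one IN-clause query per chunk of ids.
# chunk_size is an arbitrary illustrative value, not a documented MySQL limit.
build_in_queries <- function(ids, chunk_size = 10000L) {
  # Label each id with its chunk number: 1,1,...,2,2,...
  groups <- ceiling(seq_along(ids) / chunk_size)
  chunks <- split(ids, groups)
  vapply(chunks, function(chunk) {
    # format() keeps large numeric ids out of scientific notation
    # (as.character(100000) would give "1e+05").
    id_list <- paste(format(chunk, scientific = FALSE, trim = TRUE),
                     collapse = ",")
    paste0("select * from data where id in (", id_list, ")")
  }, character(1), USE.NAMES = FALSE)
}

queries <- build_in_queries(0:6, chunk_size = 3L)

# With the open connection `con` from the question, the chunks could then be
# run and combined (left commented out because it needs a live database):
# res <- do.call(rbind, lapply(queries, function(q) dbGetQuery(con, q)))
```

Each element of `queries` is a complete statement, so the number of round trips is the number of chunks rather than the number of ids.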

And in many cases this will be less efficient (slower) than having the IDs in a table in the database and doing a join, but sometimes people don't have the access to the database required to accomplish that.
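For those who do have write access, the table-and-join route might look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in so that it runs on its own, which assumes the RSQLite package is installed. With the MySQL connection from the question, only the `dbConnect()` call and the fake `data` table would go away. The table name `tmp_ids` is made up for illustration.

```r
library(DBI)

# Stand-in database so the sketch is self-contained (assumes RSQLite is
# installed); in the question, `con` would be the existing MySQL connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "data", data.frame(id = 1:10, value = letters[1:10]))

# x plays the role of the data frame of ids from the question.
x <- data.frame(id = c(2L, 5L, 9L))

# Push the ids into a helper table (hypothetical name), then join against it.
dbWriteTable(con, "tmp_ids", x, overwrite = TRUE)
res <- dbGetQuery(con,
  "select d.* from data d inner join tmp_ids t on d.id = t.id")

# Clean up the helper table and the connection.
dbRemoveTable(con, "tmp_ids")
dbDisconnect(con)
```

The join lets the database match ids itself instead of parsing a very long literal list, which is the reason it is often the faster option on large id sets.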


4 Comments

@quantactuary Out of curiosity, did it accept all 50k in one IN clause, or did you have to split it up?
Good question - it ran all at once, literally in seconds.
@joran How can the same thing be done with string type values rather than numeric?
@VijayBarve e.g. paste(paste0("'",letters[1:3],"'"),collapse = ",") just to add the single quotes; again with the usual warnings that this sort of thing should only be done if your db setup is such that SQL injection is not much of a concern.
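Where SQL injection is a concern, one option is to let DBI do the quoting instead of pasting the single quotes by hand. This is a sketch only: it uses `ANSI()`, DBI's dummy connection with generic quoting rules, so that it runs without a database. Passing the real connection `con` instead would apply that backend's own escaping, which may differ from what is shown here. The values in `vals` are made up.

```r
library(DBI)

# Example string ids, including one with an embedded single quote.
vals <- c("a", "b", "O'Brien")

# dbQuoteString() wraps each value in quotes and escapes embedded quotes.
# ANSI() is a stand-in; with a live connection, use dbQuoteString(con, vals).
quoted <- as.character(dbQuoteString(ANSI(), vals))

in_list <- paste(quoted, collapse = ",")
query <- paste0("select * from data where id in (", in_list, ")")
```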
