3

SQLDF newbie here.

I have a data frame which has about 15,000 rows and 1 column. The data looks like:

cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...

I wanted to use the package sqldf to loop through the column and pick all values which contain "car" anywhere in their value. However, the following code generates an error.

> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"

There is no unexpected symbol, so I'm not sure whats wrong.

so first, I want to know all the values which contain 'car'. then I want to know only those values which contain just 'car' by itself.

Can anyone help.

EDIT:

allright, there was an unexpected symbol, but it only gives me just car and not every row which contains 'car'.

> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
  Keyword
1     car
2
  • 1
    You have at least one set of closing parentheses missing in your statement... Commented Jun 2, 2011 at 14:57
  • you need to escape the quotes sqldf("SELECT Keyword FROM dat WHERE Keyword=\"car\""). Commented Jun 2, 2011 at 14:59

4 Answers 4

7

Using = will only return exact matches.

You should probably use the like operator combined with the wildcards % or _. The % wildcard will match multiple characters, while _ matches a single character.

Something like the following will find all instances of car, e.g. "cars", "motorcar", etc:

sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")

And the following will match "car" or "cars":

sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
Sign up to request clarification or add additional context in comments.

Comments

3

This has nothing to do with sqldf; your SQL statement is the problem. You need:

dat <- data.frame(Keyword=c("cars","autocar","carsinfo",
  "whatisthat","donnadrive","car","telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
#    Keyword
# 1     cars
# 2  autocar
# 3 carsinfo
# 4      car

1 Comment

No problem. A handy-dandy SQL cheat sheet (courtesy of Google) would probably save you some time / headaches.
2

You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE / FALSE) stating whether or not there was a match or not. You can get very sophisticated to match specific items, but a basic query will work in this case:

#Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))

   Keyword
1     cars
2  autocar
3 carsinfo
6      car

Comments

0

Very similar to the solution provided by @Chase. Because we do not use subset we do not need a logical vector and can use both grep or grepl:

df <- data.frame(keyword = c("cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE] # or
df[grepl("car", df$keyword), , drop = FALSE]

   keyword
1     cars
2  autocar
3 carsinfo
6      car

I took the idea from Selecting rows where a column has a string like 'hsa..' (partial string match)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.