I'm new to R and regex but recently decided to take them up. I'm trying to do something which I'm sure should be very simple.

I have two SQL statements that I created using two different methods, so they are set up differently.

The first statement uses aliases for the table names. The second statement uses full table names.

I have two questions.

First, I want an R script that removes all the aliases and replaces them with the full table name taken from the FROM clause, e.g.

SELECT AL1.attr1,AL1.attr2
FROM Table_1 as AL1

would be turned into

SELECT Table_1.attr1,Table_1.attr2
FROM Table_1

As a second part of my little experiment, I want to be able to segment the statement using regex, for example selecting only AL1.attr1,AL1.attr2 and putting that in one column, while a second column would hold Table_1 as AL1.

I think the second part will almost answer the first part.

Any help would be much appreciated

Thanks

1 Answer

Without regex:

library(stringr)  # for str_replace_all(); trimws() is base R, so R.oo is not needed
processMySQLtxt <- function(txt)
{
  # Split the statement into the SELECT part and the FROM part
  fromSplit <- trimws(strsplit(txt, "\\s+FROM\\s+")[[1]])
  # "Table_1 as AL1" -> table name and alias. Splitting on " as " rather than a bare "as"
  # keeps table names such as "Tasks" intact. The alias is NA when there is none.
  tableInfo <- trimws(strsplit(fromSplit[2], "\\s+[Aa][Ss]\\s+")[[1]])
  tableName <- tableInfo[1]
  aliasTable <- tableInfo[2]
  originallySelectedNames <- trimws(sub("^\\s*SELECT\\s+", "", fromSplit[1]))
  selectInfo <- trimws(strsplit(originallySelectedNames, ",", fixed = TRUE)[[1]])
  # Replace only the "alias." prefix, so the alias text inside a column name is left alone
  if (!is.na(aliasTable))
    selectInfo <- str_replace_all(selectInfo, paste0("\\b", aliasTable, "\\."), paste0(tableName, "."))
  newStatement <- paste0("SELECT ", paste(selectInfo, collapse = ","), " FROM ", tableName)
  data.frame("Originally" = originallySelectedNames, "OriginalTableAlias" = fromSplit[2], "newStatement" = newStatement)
}
txt <- "SELECT AL1.attr1,AL1.attr2 FROM Table_1 as AL1"
processMySQLtxt(txt)
txt <- "SELECT attr1,attr2 FROM Table_1"
processMySQLtxt(txt)

Just (s/l)apply the function to your collection of statements. You can use rbindlist from data.table or do.call("rbind", results) to bring it all together.
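As a minimal sketch of that last step in base R: the snippet below applies a per-statement function with lapply and stacks the results with do.call("rbind", ...). The helper parse_one and the column names selectPart and fromPart are hypothetical stand-ins, used only so the example runs on its own; any function that returns a one-row data.frame would work in its place.

```r
# Hypothetical stand-in for a per-statement parser: it returns a one-row
# data.frame, which is all the combining step needs.
parse_one <- function(txt) {
  parts <- trimws(strsplit(txt, "FROM", fixed = TRUE)[[1]])
  data.frame(selectPart = parts[1], fromPart = parts[2], stringsAsFactors = FALSE)
}

statements <- c(
  "SELECT AL1.attr1,AL1.attr2 FROM Table_1 as AL1",
  "SELECT attr1,attr2 FROM Table_1"
)

results  <- lapply(statements, parse_one)  # list of one-row data.frames
combined <- do.call("rbind", results)      # stack them into a single data.frame
print(combined)
```

lapply is used rather than sapply here because sapply would try to simplify the list of data.frames, which is usually not what you want before an rbind.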


4 Comments

@Hans Roggeman This is amazing. I probably need to go through the R documentation a bit more to understand the sapply function, but I understand most of it. A final question on this: what if there are no aliases, for example SELECT attr1,attr2 FROM Table_1? Both statements would be fed into your function, and both should come out using full table names. So if I put both SQL statements into the function, would it return the same answer for both?
You just need to check for NA on aliasTable in that case. I have changed the function in the original post.
One final thing: is it possible to use the strsplit function to split on two words, for example to handle a WHERE clause if it exists? I'm trying to make the function as robust as possible.
The split can be any regular expression: strsplit("SELECT AL1.attr1,AL1.attr2 FROM Table_1 as AL1 WHERE AL1.attr1 == 1",split="WHERE|FROM|SELECT"). Try ?strsplit to get help on the function. If you are new to R, you should get a decent IDE: try RStudio, Eclipse with StatET, RKWard or Revolution R.
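The multi-keyword split from the last comment can be sketched as follows. The helper name split_clauses is hypothetical, and the sketch assumes simple single-table statements in which the keywords appear in upper case and never inside column or table names.

```r
# Hypothetical helper: split a simple one-table statement into its clauses
# using a regular expression with alternation as the split pattern.
split_clauses <- function(txt) {
  parts <- strsplit(txt, split = "SELECT|FROM|WHERE")[[1]]
  parts <- trimws(parts)
  parts[nzchar(parts)]  # drop the empty string left before the leading SELECT
}

with_where    <- split_clauses("SELECT AL1.attr1,AL1.attr2 FROM Table_1 as AL1 WHERE AL1.attr1 = 1")
without_where <- split_clauses("SELECT attr1,attr2 FROM Table_1")

print(with_where)     # select list, FROM clause, WHERE clause
print(without_where)  # select list and FROM clause only
```

Because a statement without a WHERE clause simply yields one element fewer, the caller can check the length of the result to see whether a WHERE clause was present.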
