1

I have a query string which I want to separate out

created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30' AND updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30' AND user_id = 5 AND status = 'closed'

Like this

created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'

updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'

user_id = 5

status = 'closed'

This is just an example string, I want to separate the query string dynamically. I know can't just split with AND because of the pattern like BETWEEN .. AND

7
  • There could be many edge cases in a SQL where clause which your sample string may not make obvious. Are you sure that this example covers all edge cases? Commented Jul 7, 2019 at 10:55
  • Looks like you are looking to create a regex, but do not know where to get started. Please check Reference - What does this regex mean resource, it has plenty of hints. Once you get some expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. Commented Jul 7, 2019 at 11:46
  • @WiktorStribiżew Can you please check the question properly? This isn't duplicate and I know basics of regex, it contains AND and I can't split using 'AND' because, for some AND cases, I don't have to split. Commented Jul 7, 2019 at 11:51
  • So, where is the regex you need help with? Commented Jul 7, 2019 at 11:52
  • @TimBiegeleisen I know this doesn't cover all the cases but I have covered a sample edge case where for some 'AND' conditions I can't split a string. Commented Jul 7, 2019 at 11:55

3 Answers 3

1

You might be able to do this with regex but here's a parser that may work for your use case. It can surely be improved but it should work.

require 'time'

def parse(sql)
  arr = []
  split = sql.split(' ')
  date_counter = 0
  split.each_with_index do |s, i|
    date_counter = 2 if s == 'BETWEEN'
    time = Time.parse(s.strip) rescue nil
    date_counter -= 1 if time
    arr << i+1 if date_counter == 1
  end
  arr.select(&:even?).each do |index|
    split.insert(index + 2, 'SPLIT_ME')
  end
  split = split.join(' ').split('SPLIT_ME').map{|l| l.strip.gsub(/(AND)$/, '')}
  split.map do |line|
    line[/^AND/] ? line.split('AND') : line
  end.flatten.select{|l| !l.empty?}.map(&:strip)
end
Sign up to request clarification or add additional context in comments.

Comments

1

This is not really a regex, but more a simple parser.

  1. This works by matching a regex from the start of the string until it encounters a whitespace followed by either and or between followed by a whitespace character. The result is removed from the where_cause and saved in statement.
  2. If the start of the string now starts with a whitespace followed by between followed by a whitespace. It is added to statement and removed from where_cause with anything after that, allowing 1 and. Matching stops if the end of the string is reached or another and is encountered.
  3. If point 2 didn't match check if the string starts with a whitespace followed by and followed by a whitespace. If this is the case remove this from where_cause.
  4. Finally add statement to the statements array if it isn't an empty string.

All matching is done case insensitive.

where_cause = "created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30' AND updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30' AND user_id = 5 AND status = 'closed'"

statements = []
until where_cause.empty?
  statement = where_cause.slice!(/\A.*?(?=[\s](and|between)[\s]|\z)/mi)

  if where_cause.match? /\A[\s]between[\s]/i
    between = /\A[\s]between[\s].*?[\s]and[\s].*?(?=[\s]and[\s]|\z)/mi
    statement << where_cause.slice!(between)
  elsif where_cause.match? /\A[\s]and[\s]/i
    where_cause.slice!(/\A[\s]and[\s]/i)
  end

  statements << statement unless statement.empty?
end

pp statements
# ["created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'",
#  "updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'",
#  "user_id = 5",
#  "status = 'closed'"]

Note: Ruby uses \A to match the start of the string and \z to match the end of a string instead of the usual ^ and $, which match the beginning and ending of a line respectively. See the regexp anchor documentation.

You can replace every [\s] with \s if you like. I've added them in to make the regex more readable.

Keep in mind that this solution isn't perfect, but might give you an idea how to solve the issue. The reason I say this is because it doesn't account for the words and/between in column name or string context.

The following where cause:

where_cause = "name = 'Tarzan AND Jane'"

Will output:

#=> ["name = 'Tarzan", "Jane'"]

This solution also assumes correctly structured SQL queries. The following queries don't result in what you might think:

where_cause = "created_at = BETWEEN AND"
# TypeError: no implicit conversion of nil into String
# ^ does match /\A[\s]between[\s]/i, but not the #slice! argument 
where_cause = "id = BETWEEN 1 AND 2 BETWEEN 1 AND 3"
#=> ["id = BETWEEN 1 AND 2 BETWEEN 1", "3"]

1 Comment

Also note that this solution mutates the string. If you don't want this make sure to #dup it beforehand.
0

I'm not certain if I understand the question, particularly in view of the previous answers, but if you simply wish to extract the indicated substrings from your string, and all column names begin with lowercase letters, you could write the following (where str holds the string given in the question):

str.split(/ +AND +(?=[a-z])/)
  #=> ["created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'",
  #    "updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'",
  #    "user_id = 5",
  #    "status = 'closed'"]

The regular expression reads, "match one or more spaces, followed by 'AND', followed by one or more spaces, followed by a positive lookahead that contains a lowercase letter". Being in a positive lookahead, the lowercase letter is not part of the match that is returned.

1 Comment

Datt, I'd be interested in knowing if this is what you wanted.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.