The regular expression I am looking for has to handle several different patterns.

These are the three patterns:

"10.1234/altetric55,Awesome Steel Chair,1011-2513"
"\"Sporer, Kihn and Turner\",2885-6503"
"Bartell-Collins,1167-8230"

I need to pass this regular expression to Ruby's split method:

line.split(/regular_expression/)

The idea is to split the text at each comma, except (as in the second example) when the comma is part of the quoted text.

thanks

  • See Regex to pick commas outside of quotes. It should solve your issue. Commented Nov 2, 2015 at 21:09
  • Please show your desired output for each of the three strings. Commented Nov 2, 2015 at 21:17
  • What is wrong with a CSV parser? See this IDEONE demo or this one. Commented Nov 2, 2015 at 21:33
  • @stribizhev The expected output is ["10.1234/altetric55", "Awesome Steel Chair", "1011-2513"] ["Sporer, Kihn and Turner", "2885-6503"] ["Bartell-Collins", "1167-8230"] Commented Nov 2, 2015 at 21:34
  • Using Ruby's built-in CSV class is my recommendation. It's designed to handle the sort of comma-separated values you show, including those with embedded commas inside quotes. Don't try to do it with a regex; instead, rely on the pre-written, well-tested code. Commented Nov 2, 2015 at 21:57

2 Answers

In this case, don't try to split on each comma that is not enclosed in quotes. Instead, find everything that is either not a comma or is content between quotes, using this pattern:

"10.1234/altetric55,Awesome Steel Chair,1011-2513".scan(/[^,"]*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^,"]*)*/)

or to avoid empty items:

"10.1234/altetric55,Awesome Steel Chair,1011-2513".scan(/[^,"]+(?:"[^"\\]*(?:\\.[^"\\]*)*"[^,"]*)*|(?:"[^"\\]*(?:\\.[^"\\]*)*")+/)
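As a rough sketch of how the second pattern behaves on all three sample lines from the question: scan keeps the enclosing quotes on a quoted field, so the example below strips one pair of them afterwards. The constant FIELD and the helper scan_fields are hypothetical names introduced only for this illustration, and the quote-stripping step is an assumption about the desired output, not part of the pattern itself.

```ruby
# Pattern copied from the "avoid empty items" variant above.
FIELD = /[^,"]+(?:"[^"\\]*(?:\\.[^"\\]*)*"[^,"]*)*|(?:"[^"\\]*(?:\\.[^"\\]*)*")+/

# Hypothetical helper: collect the fields, then drop one pair of
# enclosing double quotes from any field that is fully quoted.
def scan_fields(line)
  line.scan(FIELD).map { |field| field.sub(/\A"(.*)"\z/m, '\1') }
end

lines = [
  "10.1234/altetric55,Awesome Steel Chair,1011-2513",
  "\"Sporer, Kihn and Turner\",2885-6503",
  "Bartell-Collins,1167-8230"
]

lines.each { |line| p scan_fields(line) }
# ["10.1234/altetric55", "Awesome Steel Chair", "1011-2513"]
# ["Sporer, Kihn and Turner", "2885-6503"]
# ["Bartell-Collins", "1167-8230"]
```

Note that this variant skips empty fields entirely, so a line such as a,,b would yield two items rather than three.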

But you can avoid this complexity by using the CSV class:

require 'csv'
CSV.parse("\"Sporer, Kihn and Turner\",2885-6503")
# => [["Sporer, Kihn and Turner", "2885-6503"]]
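
For lines that arrive one at a time, as in the question, a minimal sketch using CSV.parse_line (which parses a single line into one flat array) could look like the following. The variable names lines and rows are only illustrative.

```ruby
require 'csv'

lines = [
  "10.1234/altetric55,Awesome Steel Chair,1011-2513",
  "\"Sporer, Kihn and Turner\",2885-6503",
  "Bartell-Collins,1167-8230"
]

# Parse each line independently; quoted fields may contain commas.
rows = lines.map { |line| CSV.parse_line(line) }

rows.each { |row| p row }
# ["10.1234/altetric55", "Awesome Steel Chair", "1011-2513"]
# ["Sporer, Kihn and Turner", "2885-6503"]
# ["Bartell-Collins", "1167-8230"]
```

This matches the expected output given in the comments on the question, with the enclosing quotes already removed.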
10 Comments

Did you reopen the question? Why?
@stribizhev: Yes, I did, because every solution in the answers to the linked question has the form ,(?=stupid pattern to check that I am not between quotes until the end of the string), which stops working if the string is a bit long.
You answered almost the same question a day or two ago, so why not link to YOUR answer? Questions like this are annoying; there must be a good original to use as the duplicate target.
@CasimiretHippolyte In pry, "10.1234/altetric55,Awesome Steel Chair,1011-2513".split(/[^,"]*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^,"]*)*/) returns ["", ",", ","]. Am I missing something?
Using a CSV parser is my suggestion.
Here's another way, using recursion:

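def split_it(str)
  outside_quotes = true
  # Find the index of the first comma that is not inside double quotes.
  pos = str.size.times.find do |i|
    case str[i]
    when '"'
      # Each double quote toggles the inside/outside state.
      outside_quotes = !outside_quotes
      false
    when ','
      outside_quotes
    else
      false
    end
  end
  # Split at that comma and recurse on the rest; with no such comma, return the whole string.
  pos ? [str[0, pos], *split_it(str[pos + 1..-1])] : [str]
end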

["10.1234/altetric55,Awesome Steel Chair,1011-2513",
"\"Sporer, Kihn and Turner\",2885-6503\",,,3\"",
"Bartell-Collins,1167-8230"].map { |s| split_it(s) }
  #=> [["10.1234/altetric55", "Awesome Steel Chair", "1011-2513"],
  #    ["\"Sporer, Kihn and Turner\"", "2885-6503\",,,3\""],
  #    ["Bartell-Collins", "1167-8230"]]
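
For long lines with many fields, one call per field may be undesirable. The following is a sketch of the same quote-toggling idea written as a single pass; split_iteratively is a hypothetical name, and like split_it it keeps the enclosing quotes in the output.

```ruby
# Hypothetical single-pass variant of split_it: walk the string once,
# starting a new field at every comma seen outside double quotes.
def split_iteratively(str)
  fields = [String.new]
  outside_quotes = true
  str.each_char do |ch|
    if ch == ',' && outside_quotes
      fields << String.new
    else
      # A double quote toggles the state and is kept in the field.
      outside_quotes = !outside_quotes if ch == '"'
      fields.last << ch
    end
  end
  fields
end

p split_iteratively("\"Sporer, Kihn and Turner\",2885-6503\",,,3\"")
# ["\"Sporer, Kihn and Turner\"", "2885-6503\",,,3\""]
```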
