1

I'm trying to parse a text file. Occurrences of the following format are buried within continuous text (so they are never at the start of a line, for example):

"name":"Fred Flintstone","neighborhood":  ...
... "name":"Barney Rubble","address":

I need to find each occurrence of "name":. The word name also appears in other places, so only name with the surrounding quotes and the trailing colon should match. Then I need to print or store the text inside the first pair of quotes that follows. I'd like the output to be clean, with just Barney Rubble on one line and Fred Flintstone on another.

This is what I've come up with:

File.open('textfile.txt','r') do |s|
  s.each_line do |eachline|
    wordmatch = eachline.match(/"name":"(.*?)(?=["])/)
    puts wordmatch
  end
end

but it doesn't work. The results appear like:

(lots of space)
"name":"random"
(lots of space)
"name":"Barney Rubble

It prints lots of spaces. It also is not showing all results. I don't see why.

Apologies if this is confusing. Just to clarify: after the parser finds "name":, everything inside the first, immediately following set of quotes needs to be selected/stored/printed. In the first example only Fred Flintstone should be selected, and nothing else until the next "name": is encountered. Any characters and any amount of space inside the quotes are legitimate.

1
  • 2
    It looks like you have a JSON string. If it is, I'd rather use a JSON parser for this task. If it is not, you can try using .scan(/"name":"([^"]+)/). The captured texts are the ones you must be looking for. Commented Dec 5, 2015 at 9:58

3 Answers

3

You could do it with a non-greedy expression:

s = '"name":"Fred Flintstone","neighborhood":"foo","name":"Barney Rubble","address":"bar"'
s.scan(/"name":"(.*?)"/).flatten  #  => ["Fred Flintstone", "Barney Rubble"]

3 Comments

Thanks. When I use this with the text file I get an undefined method error: parse.rb:26:in `block in <main>': undefined method `scan' for #<File:textfile.txt (closed)> (NoMethodError)
boggle, what do you mean by "When I use this with the text file"? Here s is a string consisting of the contents of the file, e.g., s = File.read("my_file.txt").
@CarySwoveland Sorry, just to clarify: I used s = File.open with "r", and that caused the error. Using File.read (as you suggested) worked and didn't give the error above, but unfortunately the result is still the same.
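
The error discussed in these comments comes from calling scan on a File object rather than on a String. A minimal sketch of the difference, assuming a small temporary file stands in for textfile.txt (the sample contents and the variable names are made up for illustration):

```ruby
require 'tempfile'

# Hypothetical sample data standing in for the real textfile.txt.
sample = %q(xx "name":"Fred Flintstone","neighborhood":"foo" yy) + "\n" +
         %q(zz "name":"Barney Rubble","address":"bar") + "\n"

file = Tempfile.new('textfile')
file.write(sample)
file.close

# File.open returns a File object, which has no scan method.
handle = File.open(file.path, 'r')
handle_can_scan = handle.respond_to?(:scan)
handle.close

# File.read returns the whole file as a String, which does have scan.
text = File.read(file.path)
names = text.scan(/"name":"(.*?)"/).flatten
puts names

file.unlink
```

Reading the whole file into one string also means matches are found no matter how the text is split across lines.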
2

match only finds the first occurrence on a line; it sounds like you may have multiple matches per line, in which case you need to use scan with a loop body:

File.read('textfile.txt').scan(/"name":"([^"]*)"/) do |wordmatch|
  puts wordmatch
end
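
To see why the original loop printed blank lines and partial text, here is a small sketch on made-up input (the lines array is hypothetical). match returns nil for a line with no match, and puts nil prints an empty line; when there is a match, puts on the MatchData prints the whole matched text, not just the capture group.

```ruby
# Hypothetical input: one line without a match, one line with two matches.
lines = [
  'no match on this line',
  'a "name":"Fred Flintstone","x":1 b "name":"Barney Rubble","y":2'
]

pattern = /"name":"([^"]*)"/

# match: nil when nothing matches, otherwise only the first occurrence.
first_only = lines.map { |line| line.match(pattern) }
miss = first_only[0]             # nil for the line without a match
whole_match = first_only[1][0]   # entire matched text, including "name":
capture = first_only[1][1]       # just the first capture group

# scan: every occurrence. With one capture group each result is a
# one-element array, so flatten gives a flat list of names.
all_names = lines.flat_map { |line| line.scan(pattern).flatten }
puts all_names
```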

But that format looks suspiciously JSON-like, and if it's JSON, you should treat it as such:

require 'json'
require 'pp'

obj = JSON.parse(File.read 'textfile.txt')
pp obj

Then look at the structure, which is probably an array of hashes, so what you want is

puts obj.map { |o| o['name'] }

or similar.
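
A minimal sketch of the JSON approach, assuming the file really is a JSON array of objects that each have a "name" key. The sample string below is invented, since the question does not show the full file:

```ruby
require 'json'

# Assumed structure: an array of hashes. The real file may differ.
json_text = <<~JSON
  [
    {"name": "Fred Flintstone", "neighborhood": "foo"},
    {"name": "Barney Rubble", "address": "bar"}
  ]
JSON

records = JSON.parse(json_text)
names = records.map { |record| record['name'] }
puts names
```

If JSON.parse raises JSON::ParserError, the file is not valid JSON as a whole, and the scan approach is the fallback.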


1

You can use this regex pattern

/(?<="name":")([\w\s]+)/

Meaning:

(?<="name":") is a positive look-behind: it looks for occurrences of "name":" but does not include them in the result

([\w\s]+) will match a run of word characters (letters, digits, underscore) and whitespace; in your case it stops at the character ", so what it captures is the name

You can also check these sites: Rubular and Regex101. They can help you build and test your regex

3 Comments

unfortunately it's doing exactly the same thing with this regular expression. I'm starting to wonder if there's something wrong with the text file, although I can pick out every mention of 'name:' without problems. It just seems to be problematic when trying to find the following text.
You also need a positive lookahead for a double quote. If one is not found, it's not a match. The question does not specify what characters are permitted between the double quotes, so you have no basis for limiting them to word characters and spaces.
@CarySwoveland I just edited the original question and tagged on a clarification. I've tried everything suggested so far, to varying levels of success but nothing is giving me more than one or two of the values required.
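
Following the suggestion in the comments above, here is a sketch of a look-around version that accepts any character except a double quote and requires a closing quote. The sample string is made up, and it includes a name with punctuation to show where [\w\s]+ falls short:

```ruby
# Hypothetical sample; the second name contains characters outside \w and \s.
s = %q("name":"Fred Flintstone","neighborhood":"foo","name":"Barney O'Rubble-Jr.","address":"bar")

# Look-behind for "name":", then any run of non-quote characters,
# then a look-ahead that requires the closing quote.
strict = s.scan(/(?<="name":")[^"]*(?=")/)

# The narrower character class stops at the first apostrophe.
narrow = s.scan(/(?<="name":")[\w\s]+/)

puts strict
```

Because neither pattern has a capture group, scan returns a flat array of strings and no flatten is needed.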
