Splitting a String at Multiple Points Using Ruby on Rails

Question

I have a string in my DB that represents notes for a user. I want to split this string up so I can separate each note into the content, user, and date.

Here is the format of the String:

"Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br>  Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

I need to break this into an array of

["Example Note", "Josh Test", "12:53 PM 8/14/12", "Another example note", "John Doe", "12:00 PM 9/15/12", "Last Example Note", "Joe Smoe", "1:00 AM 10/12/12"]

I am still experimenting with this. Any ideas are very welcome, thank you! :)

That's not the format of the string, it's an example. How much variation is there? Asked another way, what criteria do you use to split?
– Mark Thomas, Commented May 31, 2013 at 19:24
There is no variation. Each note begins right away; the content always ends with ' <i>'; the name always ends with a space ' ' followed by a number; the time and date are separated by ' on '; and the whole note always ends with '</i><br><br>'.
– user1977840, Commented May 31, 2013 at 19:34

Marcelo De Polli · Accepted Answer · 2013-05-31 20:53:43Z

3

You could use regex for a simpler approach.

s = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br>  Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>" 
s.split(/\s+<i>|<\/i><br><br>\s?|(?<!on) (?=\d)/)
=> ["Example Note", "Josh Test", "12:53 PM on 8/14/12", "Another example note", "John Doe", "12:00 PM on 9/15/12", " Last Example Note", "Joe Smoe", "1:00 AM on 10/12/12"]

The datetime elements still contain " on " (and the last note keeps a leading space), so they do not match the requested format exactly, but it may be acceptable to reformat them in a separate step.
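One way that separate formatting step might look is sketched below. It assumes the split always yields groups of three (content, user, timestamp), which holds for the sample string but would break if a note's content contained a space followed by a digit. The names `parts` and `cleaned` are illustrative only.

```ruby
s = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br>  Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

# Same split as above: yields content, user, timestamp in repeating groups of three.
parts = s.split(/\s+<i>|<\/i><br><br>\s?|(?<!on) (?=\d)/)

# Clean each group: strip stray whitespace from the content and
# drop the " on " separator from the timestamp only.
cleaned = parts.each_slice(3).flat_map do |content, user, stamp|
  [content.strip, user, stamp.sub(' on ', ' ')]
end

p cleaned
```

Restricting the `sub` to the third element of each group avoids touching an " on " that happens to appear inside a note's content.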

Edit: Removed unnecessary + character.

edited May 31, 2013 at 20:53

answered May 31, 2013 at 20:24

Marcelo De Polli


1 Comment

user1977840 Over a year ago

This is what I was looking for in the first place, thank you. I am horrible with Regexp. Definitely going to have to study up on that.

Noz · Accepted Answer · 2013-05-31 19:47:47Z

1

You can use Nokogiri to parse out the required text using XPath or CSS selectors. As a simple, bare-bones example to get you started, the following maps the content of every <i> tag to an element of an array:

require 'nokogiri'

html = Nokogiri::HTML("Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br>  Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>")

my_array = html.css('i').map {|text| text.content}
#=> ["Josh Test 12:53 PM on 8/14/12", "John Doe 12:00 PM on 9/15/12", "Joe Smoe 1:00 AM on 10/12/12"]

With the CSS selector you could just as easily do something like:

require 'nokogiri'

html = Nokogiri::HTML("<h1>My Message</h1><p>Hi today's date is: <time>Friday, May 31st</time></p>")
message_header = html.css('h1').first.content #=> "My Message"
# content includes the text of nested tags, so the <time> text appears here too
message_body = html.css('p').first.content #=> "Hi today's date is: Friday, May 31st"
message_sent_at = html.css('p > time').first.content #=> "Friday, May 31st"

edited May 31, 2013 at 19:47

answered May 31, 2013 at 19:37

Noz


2 Comments

user1977840 Over a year ago

Are you saying the HTML tags should already exist? I am unable to edit the database's data. It will always be the way I showed it, because that's how it was saved, unfortunately, for hundreds of thousands of users. I'm trying to fix someone's mistake.

Noz Over a year ago

@user1977840 That was just an example to get you started. So long as there's some common pattern to the way the HTML data is structured in the database (e.g., date and name data will always be after tag X and before tag Y), you can tailor your Nokogiri selector as needed to select and parse the relevant portions of data. If the HTML isn't well formed you might be better off using XPath selectors instead.

ajt · Accepted Answer · 2013-05-31 19:59:18Z

0

Maybe this could be useful:

require 'date'
require 'time'

text = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br>  Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

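# Each note ends with "<br><br>"; split drops the trailing empty piece.
notes = text.split('<br><br>')

pro_notes = notes.map do |note|
  # Each note looks like "content <i>name time AM/PM on date</i>".
  # content may carry leading whitespace; call strip on it if needed.
  content, meta = note.split('<i>')
  words = meta.sub('</i>', '').split(' ')

  # Assumes a two-word name, as in the sample data.
  full_name = words[0] + ' ' + words[1]

  # Parse with an explicit format and include AM/PM (words[3]);
  # the time alone would read "1:00 PM" as 01:00.
  dt = DateTime.strptime("#{words[5]} #{words[2]} #{words[3]}", '%m/%d/%y %I:%M %p')

  [full_name, content, dt]
end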
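As an alternative sketch, the fixed format described in the question's comments could be captured with one regular expression and `String#scan`, which avoids counting words and so should also tolerate names longer than two words. The names `note_pattern` and `records`, and the hash layout, are illustrative assumptions rather than part of the answer above.

```ruby
require 'date'

text = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br>  Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

# One note: content, then "<i>name time on date</i><br><br>".
# The name is matched lazily, so it ends at the space before the time.
note_pattern = %r{\s*(?<content>.*?)\s+<i>(?<name>.*?) (?<time>\d{1,2}:\d{2} [AP]M) on (?<date>\d{1,2}/\d{1,2}/\d{2})</i><br><br>}

# scan returns one array of captures per note, in group order.
records = text.scan(note_pattern).map do |content, name, time, date|
  {
    content: content,
    name: name,
    # Two-digit years are read as 20xx by %y for values below 69.
    at: DateTime.strptime("#{date} #{time}", '%m/%d/%y %I:%M %p')
  }
end

records.each { |r| puts "#{r[:name]}: #{r[:content]} (#{r[:at]})" }
```

Because the pattern anchors on the literal `<i>`, ` on `, and `</i><br><br>` markers, a note that deviates from the stated format is simply skipped by `scan` rather than raising an error, so it may be worth comparing `records.length` against the expected note count.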

answered May 31, 2013 at 19:59

ajt


1 Comment

user1977840 Over a year ago

Perfect. I added a strip in there to get rid of the whitespace and it worked, thank you! :)
