0

This is easier to show with an example than to explain:

<li id="l_f6a1ok3n4d4p" class="online"> <div class="link"> <a href="javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com');%20" onclick="visited('f6a1ok3n4d4p');" style="float:left;">random strings - 4</a> <a style="float:left; display:block; padding-top:3px;" href="http://www.webtrackerplus.com/?page=flowplayerregister&amp;a_aid=&amp;a_bid=&amp;chan=flow"><img border="0" src="/resources/img/fdf.gif"></a> <!-- a class="none" href="#">random strings - 4  site2.com - # - </a --> </div> <div class="params"> <span>Submited: </span>7 June 2015  | <span>Host: </span>site2.com </div> <div class="report"> <a title="" href="javascript:report(3191274,%203,%202164691,%201)" class="alert"></a> <a title="" href="javascript:report(3191274,%203,%202164691,%200)" class="work"></a> <b>100% said work</b> </div> <div class="clear"></div> </li> <li id="l_zsgn82c4b96d" class="online"> <div class="link"> <a href="javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20" onclick="visited('zsgn82c4b96d');" style

In the above content I want to extract from

javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')

the strings "f6a1ok3n4d4p" and "site2.com", then combine them into

http://site2.com/f6a1ok3n4d4p

and same for

javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com')

to become

http://site1.com/zsgn82c4b96d

I need this to be done with a Ruby regex.

1
  • 2
    Welcome to Stack Overflow. What have you tried toward solving this problem? There are several things you're asking for, but you haven't shown any code you've written, so it really sounds like you're asking us to write a solution for you, which isn't how Stack Overflow works. Also, why does it need to be done using a regular expression, and which part? Finally, please reduce your sample HTML to the bare minimum necessary to demonstrate what you're working on. Anything beyond that wastes our time as we try to help you. Commented Jun 30, 2015 at 6:18

3 Answers

1

You can proceed like this:

require 'uri'
str = "javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')"

# regex scan to get values within javascript:show
vals = str.scan(/javascript:show\((.*)\)/)[0][0].split(',')
# => ["'f6a1ok3n4d4p'", "'random%20strings%204'", "%20'site2.com'"]

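# join the first and last elements of vals to build the URL
# (URI.decode was removed in Ruby 3.0; URI.decode_www_form_component decodes the %20)
url = "http://" + URI.decode_www_form_component(vals.last).tr("'", '').strip + "/" + vals.first.tr("'", '')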
# => "http://site2.com/f6a1ok3n4d4p"

Obviously, this answer is not foolproof. You can make it more robust by adding checks, for example for the case where scan returns [].
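One way to add the guard mentioned above is sketched here. The helper name `build_url` and the constant `SHOW_CALL` are hypothetical, and the pattern assumes the call always has three single-quoted arguments, as in the question's sample. It returns nil instead of raising when the input has no `javascript:show(...)` call.

```ruby
require 'uri'

# Assumed shape: javascript:show('<id>','<title>',%20'<host>')
# Whitespace or literal %20 before the last argument is tolerated.
SHOW_CALL = /javascript:show\('([^']*)',\s*'[^']*',\s*(?:%20)*'([^']*)'\)/

# Hypothetical helper: returns the rebuilt URL, or nil when nothing matches.
def build_url(str)
  m = SHOW_CALL.match(str)
  return nil unless m

  id = m[1]
  # Decode any percent-escapes in the host and trim stray whitespace.
  host = URI.decode_www_form_component(m[2]).strip
  "http://#{host}/#{id}"
end

puts build_url("javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')")
# => http://site2.com/f6a1ok3n4d4p
p build_url("javascript:report(3191274,%203,%202164691,%201)")
# => nil
```

Using `match` with a nil check avoids the `NoMethodError` that `scan(...)[0][0]` raises when there is no match.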


1

This should do the trick, though the regexp isn't particularly flexible.

js_link_regex = /href=\"javascript:show\('([^']+)','[^']+',%20'([^']+)'\)/
link = <<eos
  <li id="l_f6a1ok3n4d4p" class="online"> <div class="link"> <a href="javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com');%20" onclick="visited('f6a1ok3n4d4p');" style="float:left;">random strings - 4</a> <a style="float:left; display:block; padding-top:3px;" href="http://www.webtrackerplus.com/?page=flowplayerregister&amp;a_aid=&amp;a_bid=&amp;chan=flow"><img border="0" src="/resources/img/fdf.gif"></a> <!-- a class="none" href="#">random strings - 4  site2.com - # - </a --> </div> <div class="params"> <span>Submited: </span>7 June 2015  | <span>Host: </span>site2.com </div> <div class="report"> <a title="" href="javascript:report(3191274,%203,%202164691,%201)" class="alert"></a> <a title="" href="javascript:report(3191274,%203,%202164691,%200)" class="work"></a> <b>100% said work</b> </div> <div class="clear"></div> </li> <li id="l_zsgn82c4b96d" class="online"> <div class="link"> <a href="javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20" onclick="visited('zsgn82c4b96d');" style
eos

matches = link.scan(js_link_regex)
matches.each do |match|
  puts "http://#{match[1]}/#{match[0]}"
end
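If the markup varies slightly, the pattern can be loosened. The following is only a sketch: which variations actually occur on the page is an assumption (here, optional whitespace or literal `%20` around the commas and parentheses), and the names `SEP`, `ARG`, `FLEXIBLE_SHOW`, and `extract_urls` are made up for illustration. It collects the URLs into an array rather than printing them.

```ruby
# Assumed variations: spaces or literal %20 between arguments.
SEP = /(?:\s|%20)*/
# One single-quoted argument; the contents are captured.
ARG = /'([^']*)'/
FLEXIBLE_SHOW = /javascript:show\(#{SEP}#{ARG}#{SEP},#{SEP}#{ARG}#{SEP},#{SEP}#{ARG}#{SEP}\)/

# Hypothetical helper: returns every rebuilt URL found in the HTML string.
def extract_urls(html)
  # Each scan result is [id, title, host]; the title is not needed.
  html.scan(FLEXIBLE_SHOW).map { |id, _title, host| "http://#{host}/#{id}" }
end

sample = %q{<a href="javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com');%20">x</a>}
p extract_urls(sample)
# => ["http://site2.com/f6a1ok3n4d4p"]
```

For anything beyond small variations like these, an HTML parser is usually a safer way to locate the `href` attributes before applying a regex to their values.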


1

To just match your case,

str = "javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')"

# inside a character class, | and . are literal, so [\w.] is enough
parts = str.scan(/'([\w.]+)'/).flatten # => ["f6a1ok3n4d4p", "site2.com"]

puts "http://#{parts[1]}/#{parts[0]}" # => http://site2.com/f6a1ok3n4d4p
