10

I am new to Ruby and to regular expressions. How can I remove https, http and www from a string?

server= http://france24.miles.com
server= https://seloger.com

From these URLs I want to remove http, https and www, so that I am left with:

france24.miles.com
seloger.com

I used the following code, but it is not working for me:

server = server.(/^https?\:\/\/(www.)?/,'')

5 Answers

18
server = server.(/^https?\:\/\/(www.)?/,'')

This didn't work because you aren't calling a method on the string server. Make sure you call the sub method, and escape the dot in www\. so that it matches a literal period rather than any character:

server = server.sub(/^https?\:\/\/(www\.)?/,'')

Example

> server = "http://www.stackoverflow.com"
> server = server.sub(/^https?\:\/\/(www\.)?/,'')
stackoverflow.com
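
As a quick check, here is a minimal sketch that applies the same substitution to the two URLs from the question plus the one above. The variable names servers and stripped are only illustrative, and the dot in www\. is escaped here.

```ruby
# Sample inputs: the two URLs from the question plus this answer's example.
servers = [
  "http://france24.miles.com",
  "https://seloger.com",
  "http://www.stackoverflow.com"
]

# sub replaces only the first match, which is all we need for a leading prefix.
stripped = servers.map { |s| s.sub(/^https?:\/\/(www\.)?/, '') }

puts stripped
```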

If, as requested, you also need to accept the malformed prefix http:\\ use the following regex instead:

server.sub(/^https?\:(\\\\|\/\/)(www\.)?/,'')

2 Comments

Updated answer to make it work with https:\\ as well
@Psl, http:\\ is simply wrong. You are doing your users a disservice if you accept such a URL.
8

The standard library's URI module is built for this kind of work. Using it is simpler and likely to be more reliable:

require 'uri'

uri = URI.parse("http://www.ruby-lang.org/")

uri.host
=> "www.ruby-lang.org"

uri.host.sub(/\Awww\./, '')
=> "ruby-lang.org"
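
The two steps above can be wrapped in a small helper. This is only a sketch: bare_host is a hypothetical name, and returning nil for input that URI cannot parse (such as a URL written with backslashes) is an assumption about what the caller wants. It uses the safe navigation operator, so it needs Ruby 2.3 or later.

```ruby
require 'uri'

# Hypothetical helper: returns the host without a leading "www.",
# or nil when the input cannot be parsed or contains no host.
def bare_host(url)
  host = URI.parse(url).host
  host&.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  # Raised for strings URI cannot parse; here we choose to return nil.
  nil
end

puts bare_host("http://france24.miles.com")
puts bare_host("https://www.seloger.com")
```

Note that a string with no scheme, such as "seloger.com", is parsed as a path, so host is nil and the helper returns nil as well.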

3 Comments

I used uri = URI.parse(server), then uri.host, then content = Net::HTTP.get(uri.host.gsub('\Awww\.', ''), '/status'), but I am getting bad URI(is not URI?): http:\\ingen.twosmiles.com
@Psi, Net::HTTP.get expects a full URL including the protocol (http, https, etc.). If this is your use case, you should not remove 'http'.
This doesn't appear to work - I still get the full www.ruby-lang.org string back unless I do uri.host.sub(/\Awww\./, '')
6

See the String#sub(...) method.

Also, consider using the %r{...} literal notation for Regexp objects so that forward-slashes (/) are easier to recognize:

def trim_url(str)
  str.sub %r{^https?:(//|\\\\)(www\.)?}i, ''
end

trim_url 'https://www.foo.com' # => "foo.com"
trim_url 'http://www.foo.com'  # => "foo.com"
trim_url 'http://foo.com'      # => "foo.com"
trim_url 'http:\\\\foo.com'    # => "foo.com"

Here is what each part of the regular expression means:

%r{^https?:(//|\\\\)(www\.)?}
#  │├──┘├┘│├───────┘ ├─┘├┘ └── everything in the group (...), or nothing.
#  ││   │ ││         │  └── the period character "."
#  ││   │ ││         └── the letters "www".
#  ││   │ │└── the characters "//" or "\\".
#  ││   │ └── the colon character ":".
#  ││   └── the letter "s", or nothing.
#  │└── the letters "http".
#  └── the beginning of the line.
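
The same breakdown can also live inside the pattern itself using Ruby's extended (x) flag, which ignores whitespace and allows comments. The sketch below is a variation rather than the code above: the constant name TRIM_PREFIX is made up, the groups are non-capturing, and \A (start of string) is swapped in for ^ (start of any line).

```ruby
# Extended-mode version of the pattern; whitespace and comments are ignored.
TRIM_PREFIX = %r{
  \A             # start of the string
  https?         # http, optionally followed by s
  :              # the colon
  (?://|\\\\)    # two forward slashes or two backslashes
  (?:www\.)?     # optional www followed by a literal period
}xi

def trim_url(str)
  str.sub(TRIM_PREFIX, '')
end

puts trim_url('https://www.foo.com')
puts trim_url('HTTP://foo.com')
```

With ^ the pattern would also match a URL at the start of a later line in a multi-line string; with \A only the very start of the string is considered.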

2 Comments

Very nice, how do you generate lines to description like that?
@MartinKonecny: thanks, I just copy/paste UTF-8 box-drawing characters.
2
def strip_url(url)
  # Remove a leading http: or https: (followed by zero, one, or two slashes)
  # and a leading "www.", anchored at the start of the string.
  url.to_s.sub(/\A(?:https?:\/{0,2})?(?:www\.)?/, '')
end

This returns a copy of the provided url, stripped of any leading http(s) or www. (the original string is left unchanged). It covers the following formats:

  • http://www.example.com
  • http:/www.example.com
  • http:www.example.com
  • https://www.example.com
  • https:/www.example.com
  • https:www.example.com
  • http://example.com
  • http:/example.com
  • http:example.com
  • https://example.com
  • https:/example.com
  • https:example.com
  • www.example.com
  • example.com

You'll end up with example.com using this method.
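
One way to check the list above is to run every format through the substitution and confirm that they all collapse to the same value. The sketch below assumes an anchored, non-mutating variant; strip_prefix and formats are illustrative names, not part of the answer.

```ruby
# Sketch of an anchored, non-mutating variant (an assumption for illustration).
def strip_prefix(url)
  url.to_s.sub(/\A(?:https?:\/{0,2})?(?:www\.)?/, '')
end

# Every format from the list above.
formats = %w[
  http://www.example.com  http:/www.example.com  http:www.example.com
  https://www.example.com https:/www.example.com https:www.example.com
  http://example.com      http:/example.com      http:example.com
  https://example.com     https:/example.com     https:example.com
  www.example.com         example.com
]

results = formats.map { |u| strip_prefix(u) }

# All inputs should reduce to a single distinct value.
puts results.uniq.inspect
```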

1

With this regex: server\s*=\s*\Khttps?://(?:www\.)?

In Ruby 2.0+

result = subject.gsub(/server\s*=\s*\Khttps?:\/\/(?:www\.)?/, '')

Explanation

  • server\s*=\s* matches server= with optional spaces, to make sure we are looking at the right strings
  • The \K tells the engine to drop what was matched so far from the final match
  • https? matches http with an optional s
  • :// matches these literal characters
  • (?:www\.)? matches an optional www.
  • we replace the match with an empty string
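
To make the steps above concrete, here is a small sketch in which subject is simply a string holding the two lines from the question (that choice of input is an assumption). Because of \K, only the part after "server= " is replaced, so the prefix survives. It needs Ruby 2.0 or later.

```ruby
# subject holds the text to process: here, the two lines from the question.
subject = "server= http://france24.miles.com\nserver= https://seloger.com\n"

# \K discards "server= " from the match, so only the scheme and www. are removed.
result = subject.gsub(/server\s*=\s*\Khttps?:\/\/(?:www\.)?/, '')

puts result
```

A bare URL such as "http://seloger.com" with no leading server= is left untouched by this pattern, since the server\s*=\s* part must match first.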

Earlier Versions of Ruby

\K is only supported from Ruby 2.0+. Earlier versions have to use a lookbehind:

result = subject.gsub(/(?:(?<=server=)|(?<=server= ))https?:\/\/(?:www\.)?/, '')

6 Comments

FYI, added demo and explanation. :)
but in my code it is not working..and what is subject??
What version of Ruby?
i have this string http:\\integration.twosmiles.com ` server = ARGV[0] puts server result = server.gsub(/server\s*=\s*\Khttps?:\/\/(?:www\.)?/, '') puts result`
\K is only supported from Ruby 2.0+. Added a second option for earlier versions. :)