178

How do I URI::encode a string like:

\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a

to get it in a format like:

%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A

as per RFC 1738?

Here's what I tried:

irb(main):123:0> URI::encode "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `gsub'
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `escape'
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:505:in `escape'
    from (irb):123
    from /usr/local/bin/irb:12:in `<main>'

Also:

irb(main):126:0> CGI::escape "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
    from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `gsub'
    from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `escape'
    from (irb):126
    from /usr/local/bin/irb:12:in `<main>'

I looked all about the internet and haven't found a way to do this, although I am almost positive that the other day I did this without any trouble at all.

1

8 Answers 8

208
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts CGI.escape str


=> "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Sign up to request clarification or add additional context in comments.

5 Comments

force_encoding('binary') might be a more self-documenting choice.
They deprecated that method, use * CGI.escape * instead. -> http://www.ruby-forum.com/topic/207489#903709. You should also be able to use URI.www_form_encode * URI.www_form_encode_component *, but I have never used those
No need to require 'open-uri' here. Did you mean require 'uri'?
@J-Rou, CGI.escape can escape whole URL, it does not selectively escapes query parameters, for instance, if you pass 'a=&!@&b=&$^' to CGI.escape it will escape whole thing with query separators & so this could be used only to query values. I suggest using addressable gem , it is more intellectual working with urls.
I needed to access files on remote server. Encoding with CGI didn't work, but URI.encode did the work just fine.
123

Nowadays, you should use ERB::Util.url_encode or CGI.escape. The primary difference between them is their handling of spaces:

>> ERB::Util.url_encode("foo/bar? baz&")
=> "foo%2Fbar%3F%20baz%26"

>> CGI.escape("foo/bar? baz&")
=> "foo%2Fbar%3F+baz%26"

CGI.escape follows the CGI/HTML forms spec and gives you an application/x-www-form-urlencoded string, which requires spaces be escaped to +, whereas ERB::Util.url_encode follows RFC 3986, which requires them to be encoded as %20.

See "What's the difference between URI.escape and CGI.escape?" for more discussion.

2 Comments

Neither of those commands are available in Ruby 2.7.3 and up. I think nowadays has passed.
@karatedog They're still around! You might need to require 'erb' or require 'cgi'.
75
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
require 'cgi'
CGI.escape(str)
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

Taken from @J-Rou's comment

Comments

15

I was originally trying to escape special characters in a file name only, not on the path, from a full URL string.

ERB::Util.url_encode didn't work for my use:

helper.send(:url_encode, "http://example.com/?a=\11\15")
# => "http%3A%2F%2Fexample.com%2F%3Fa%3D%09%0D"

Based on two answers in "https://stackoverflow.com/questions/34274838/why-is-uri-escape-marked-as-obsolete-and-where-is-this-regexpunsafe-constant", it looks like URI::RFC2396_Parser#escape is better than using URI::Escape#escape. However, they both are behaving the same to me:

URI.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
URI::Parser.new.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"

UPDATE: I think it's from Ruby 3.0, URI.escape does not work any more. I have not found replacement except URI::Parser.new.escape yet.

3 Comments

The only actual answer that I could find. Thank you.
This whole issue is a mess! Thanks for shedding some real light on it. I wasted at least a day chasing my tail before finding this!
I cannot believe that Ruby team have taken out a function without putting a replacement in place for it. URI.encode took a whole URL, encoded query string parts without messing up the whole of the URL and replacing :// too. With URI.www_encode_form_component doesn't have the same behaviour!
12

You can use Addressable::URI gem for that:

require 'addressable/uri'   
string = '\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a'
Addressable::URI.encode_component(string, Addressable::URI::CharacterClasses::QUERY)
# "%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a%5Cxbc%5Cxde%5Cxf1%5Cx23%5Cx45%5Cx67%5Cx89%5Cxab%5Cxcd%5Cxef%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a" 

It uses more modern format, than CGI.escape, for example, it properly encodes space as %20 and not as + sign, you can read more in "The application/x-www-form-urlencoded type" on Wikipedia.

2.1.2 :008 > CGI.escape('Hello, this is me')
 => "Hello%2C+this+is+me" 
2.1.2 :009 > Addressable::URI.encode_component('Hello, this is me', Addressable::URI::CharacterClasses::QUERY)
 => "Hello,%20this%20is%20me" 

1 Comment

Also can do like this: CGI.escape('Hello, this is me').gsub("+", "%20") => Hello%2C%20this%20is%20me" if don't want to use any gems
8

Code:

str = "http://localhost/with spaces and spaces"
encoded = URI::encode(str)
puts encoded

Result:

http://localhost/with%20spaces%20and%20spaces

2 Comments

If the receiving server is old, it might not respond well to CGI.escape. This is still a valid alternative.
too old solution
6

I created a gem to make URI encoding stuff cleaner to use in your code. It takes care of binary encoding for you.

Run gem install uri-handler, then use:

require 'uri-handler'

str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".to_uri
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

It adds the URI conversion functionality into the String class. You can also pass it an argument with the optional encoding string you would like to use. By default it sets to encoding 'binary' if the straight UTF-8 encoding fails.

Comments

2

If you want to "encode" a full URL without having to think about manually splitting it into its different parts, I found the following worked in the same way that I used to use URI.encode:

URI.parse(my_url).to_s

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.