How to decode URL-encoded string in shell?

Question

I have a file with a list of user-agents which are encoded. E.g.:

Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

I want a shell script which can read this file and write to a new file with decoded strings.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

I have been trying to use this example to get it going but it is not working so far.

$ echo -e "$(echo "%31+%32%0A%33+%34" | sed 'y/+/ /; s/%/\\x/g')"

My script looks like:

#!/bin/bash
for f in *.log; do
  echo -e "$(cat $f | sed 'y/+/ /; s/%/\x/g')" > y.log
done

line 5: 'x' should be double escaped (s/%/\x/g -> s/%/\\x/g)
– barti_ddu, Commented Jun 6, 2011 at 10:38
@F. Hauri What about the last answer (sorted by highest score)?
– Victor Lee, Commented May 23, 2022 at 7:01
@VictorLee This kind of function is mostly used to populate variables. Forking each time you have to urldecode is overkill and counterproductive.
– F. Hauri - Give Up GitHub, Commented May 25, 2022 at 4:19

netDesign8 · Accepted Answer · 2020-05-06 12:19:50Z

193

Here is a simple one-line solution.

$ function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }

It may look like perl :) but it is just pure bash. No awks, no seds ... no overheads. It uses the : builtin, special parameters, pattern substitution and the echo builtin's -e option to translate hex codes into characters. See bash's manpage for further details. You can use this function as a separate command:

$ urldecode https%3A%2F%2Fgoogle.com%2Fsearch%3Fq%3Durldecode%2Bbash
https://google.com/search?q=urldecode+bash

or in variable assignments, like so:

$ x="http%3A%2F%2Fstackoverflow.com%2Fsearch%3Fq%3Durldecode%2Bbash"
$ y=$(urldecode "$x")
$ echo "$y"
http://stackoverflow.com/search?q=urldecode+bash
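
To make the two pattern substitutions easier to follow, here is a small sketch that performs the same steps with named variables instead of $_ and prints each intermediate value. The variable names (encoded, spaced, escaped) are illustrative and not part of the answer above.

```bash
#!/bin/bash
encoded='q%3Durldecode%2Bbash+here'

spaced="${encoded//+/ }"       # step 1: every '+' becomes a space
escaped="${spaced//%/\\x}"     # step 2: every '%' becomes '\x'

printf '%s\n' "$spaced"        # q%3Durldecode%2Bbash here
printf '%s\n' "$escaped"       # q\x3Durldecode\x2Bbash here
printf '%b\n' "$escaped"       # q=urldecode+bash here
```

The '+' must be turned into a space before the \xNN escapes are interpreted; otherwise a literal plus sign that was encoded as %2B would also end up as a space.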

edited May 6, 2020 at 12:19

netDesign8

answered Jun 15, 2016 at 16:30

guest

10 Comments

Justin Putney Over a year ago

Would love some more explanation on the pattern substitution. This function works for me, but it changes the characters in a way that makes file paths not work with the unzip function.

Matthieu Over a year ago

@JustinPutney ${*//+/ } will replace all + with space and ${_//%/\\x} will replace all % with \x.

ThorSummoner Over a year ago

just want to mention this is horrendously slow for me; for 50k urls, bash: 0m3.767s python: 0m0.200s (python one liner below: stackoverflow.com/a/21693459/1695680)

Adam Katz Over a year ago

@nhed – : is a no-op in bash, but this code plays on the value of $_, which “expands to the last argument to the previous simple command” (which is to say this is a perl-level obfuscation). It’d be more legible as urldecode() { local i="${*//+/ }"; echo -e "${i//%/\\x}"; } (replace each + with a space, then replace each % with \x so bash knows to interpret the escape sequences properly).

netDesign8 Over a year ago

@8bitjunkie, you need to insert "function" ahead of the example like this: function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; } And then you can execute the function like so: urldecode "Current%3A+7995%28Mbps%29+%2F+Limit+5000%28Mbps%29" Result: Current: 7995(Mbps) / Limit 5000(Mbps)

Rory O'Kane · Accepted Answer · 2023-04-30 16:29:21Z

70

If you are a Python developer, this may be preferable:

For Python 3.x (default):

echo -n "%21%20" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"

For Python 2.x (deprecated):

echo -n "%21%20" | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());"

urllib is really good at handling URL parsing.

edited Apr 30, 2023 at 16:29

Rory O'Kane

answered Feb 11, 2014 at 4:55

Jay

6 Comments

Rodrigo Over a year ago

Nice, but I would change it a little bit to use argv and use it as an alias. Here is an example for encoding: alias encode='python2 -c "import sys, urllib as ul; print ul.quote(sys.argv[1]);"'

jakebrinkmann Over a year ago

Modified for python 3: echo "%21%20" | python -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"

jk2K Over a year ago

if you want to convert + to blank, for python2, you can use unquote_plus()

Sridhar Sarnobat Over a year ago

Unlike all the other solutions, this is practical in an interactive shell

Taras Sereda Over a year ago

Love this solution! A slight modification to take files from stdin will look like this python3 -c "import sys; from urllib.parse import unquote_plus; print(unquote_plus(sys.stdin.read()));" < input_file

brendan · Accepted Answer · 2018-11-28 12:38:47Z

26

With BASH, to read the per cent encoded URL from standard in and decode:

while read; do echo -e ${REPLY//%/\\x}; done

Press CTRL-D to signal the end of file(EOF) and quit gracefully.

You can decode the contents of a file by setting the file to be standard in:

while read; do echo -e ${REPLY//%/\\x}; done < file

You can also decode input from a pipe, for example:

echo 'a%21b' | while read; do echo -e ${REPLY//%/\\x}; done

The read built in command reads standard in until it sees a Line Feed character. It sets a variable called REPLY equal to the line of text it just read.
${REPLY//%/\\x} replaces all instances of '%' with '\x'.
echo -e interprets \xNN as the ASCII character with hexadecimal value of NN.
while repeats this loop until the read command fails, eg. EOF has been reached.

The above does not change '+' to ' '. To change '+' to ' ' also, like guest's answer:

while read; do : "${REPLY//%/\\x}"; echo -e ${_//+/ }; done

: is a BASH builtin command. Here it just takes in a single argument and does nothing with it.
The double quotes make everything inside one single parameter.
_ is a special parameter that is equal to the last argument of the previous command, after argument expansion. This is the value of REPLY with all instances of '%' replaced with '\x'.
${_//+/ } replaces all instances of '+' with ' '.

This uses only BASH and doesn't start any other process, similar to guest's answer.
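
For the original task of decoding a whole file, the loop can be wrapped in a small function. The following is a sketch rather than the code from this answer: the function name decode_stream and the file names in the usage line are made up for illustration. It adds IFS= and -r so that read leaves each line untouched, quotes the expansions so that whitespace and glob characters survive, and also handles a last line that has no trailing newline.

```bash
#!/bin/bash
# decode_stream: URL-decode standard input line by line (hypothetical helper).
decode_stream() {
  local line
  # '|| [ -n "$line" ]' keeps a final line that lacks a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line//+/ }"                 # '+' -> space
    printf '%b\n' "${line//%/\\x}"      # '%XX' -> '\xXX', interpreted by %b
  done
}

# usage (file names are placeholders):
#   decode_stream < encoded.log > decoded.log
```

Like the loops above, this assumes the input contains no literal backslashes, because %b would interpret them as escape sequences.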

edited Nov 28, 2018 at 12:38

answered Mar 6, 2017 at 22:22

brendan

3 Comments

Robin A. Meade Over a year ago

The decoding of + to SPACE should occur before the percent decoding. See guest's answer for an example of the correct order.

brendan Over a year ago

: "${REPLY//%/\\x}"; echo -e ${_//+/ } The order here is replace % with \x, replace + with ' ', interpret characters with \xNN notation. Why do you think it matters whether you replace % characters first or + characters @RobinA.Meade ?

Robin A. Meade Over a year ago

You're right, sorry. I see now your answer does replace + with SPACE before the \xNN are evaluated. My test string was The time is 2013-12-31T14:00:00+00:00 which I encoded at meyerweb.com/eric/tools/dencoder .Your answer correctly decodes it with the + in the time stamp preserved.

Mr. Lance E Sloan · Accepted Answer · 2017-07-18 07:29:33Z

21

This is what seems to be working for me.

#!/bin/bash
urldecode(){
  echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;')"
}

for f in /opt/logs/*.log; do
    name=${f##/*/}
    cat "$f" | urldecode > "/opt/logs/processed/$HOSTNAME.$name"
done

Replacing '+'s with spaces, and % signs with '\x' escapes, and letting echo interpret the \x escapes using the '-e' option was not working. For some reason, the cat command was printing the % sign as its own encoded form %25. So sed was simply replacing %25 with \x25. When the -e option was used, it was simply evaluating \x25 as % and the output was same as the original.

Trace:

Original: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

sed: Mozilla\x252F5.0\x2520\x2528Macintosh\x253B\x2520U\x253B\x2520Intel\x2520Mac\x2520OS\x2520X\x252010.6\x253B\x2520en

echo -e: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

Fix: Basically ignore the 2 characters after the % in sed.

sed: Mozilla\x2F5.0\x20\x28Macintosh\x3B\x20U\x3B\x20Intel\x20Mac\x20OS\x20X\x2010.6\x3B\x20en

echo -e: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

Not sure what complications this would result in, after extensive testing, but works for now.

edited Jul 18, 2017 at 7:29

Mr. Lance E Sloan

answered Jun 7, 2011 at 12:42

user785717

3 Comments

svante Over a year ago

Works, but there should be a \1 after \\x like echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;')"

Mr. Lance E Sloan Over a year ago

As @svante wrote, the \1 is missing. I've edited the answer to include it. (Plus a couple small formatting/grammar changes to meet the 6-character minimum edit requirement.)

jocap Over a year ago

For a POSIX-compatible version of this, use printf '%b\n' instead of echo -e.

Robin A. Meade · Accepted Answer · 2022-05-22 20:46:40Z

bash idiom for url-decoding

Here is a bash idiom for url-decoding a string held in variable x and assigning the result to variable y:

: "${x//+/ }"; printf -v y '%b' "${_//%/\\x}"

Unlike the accepted answer, it preserves trailing newlines during assignment. (Try assigning the result of url-decoding v%0A%0A%0A to a variable.)

It is also fast: about 67 times faster than the accepted answer at assigning the result of url-decoding to a variable (see the benchmark below).

Caveat: It is not possible for a bash variable to contain a NUL. For example, any bash solution attempting to decode %00 and assign the result to a variable will not work.
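
The trailing-newline difference can be checked directly. This is a small sketch; the variable names tmp, via_subst and via_printf are illustrative. Command substitution strips every trailing newline from the captured output, while printf -v stores the decoded text as it is.

```bash
#!/bin/bash
x='v%0A%0A%0A'
tmp="${x//+/ }"

via_subst=$(printf '%b' "${tmp//%/\\x}")    # $(...) drops the trailing newlines
printf -v via_printf '%b' "${tmp//%/\\x}"   # assigned directly, nothing stripped

printf '%d %d\n' "${#via_subst}" "${#via_printf}"   # prints: 1 4
```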

Benchmark details

function.sh

#!/bin/bash
urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
x=%21%20
for (( i=0; i<5000; i++ )); do
  y=$(urldecode "$x")
done

idiom.sh

#!/bin/bash
x=%21%20
for (( i=0; i<5000; i++ )); do
  : "${x//+/ }"; printf -v y '%b' "${_//%/\\x}"
done

$ hyperfine --warmup 5 ./function.sh ./idiom.sh
Benchmark #1: ./function.sh
  Time (mean ± σ):      2.844 s ±  0.036 s    [User: 1.728 s, System: 1.494 s]
  Range (min … max):    2.801 s …  2.907 s    10 runs
 
Benchmark #2: ./idiom.sh
  Time (mean ± σ):      42.4 ms ±   1.0 ms    [User: 40.7 ms, System: 1.1 ms]
  Range (min … max):    40.5 ms …  44.8 ms    64 runs
 
Summary
  './idiom.sh' ran
   67.06 ± 1.76 times faster than './function.sh'

If you really want a function ...

If you really want a function, say for readability reasons, I suggest the following:

# urldecode [-v var ] argument
#
#   Urldecode the argument and print the result.
#   It replaces '+' with SPACE and then percent decodes.
#   The output is consistent with https://meyerweb.com/eric/tools/dencoder/
#
# Options:
#   -v var    assign the output to shell variable VAR rather than
#             print it to standard output
#
urldecode() {
  local assign_to_var=
  local OPTIND opt
  while getopts ':v:' opt; do
    case $opt in
      v)
        local var=$OPTARG
        assign_to_var=Y
        ;;
      \?)
        echo "$FUNCNAME: error: -$OPTARG: invalid option" >&2
        return 1
        ;;
      :)
        echo "$FUNCNAME: error: -$OPTARG: this option requires an argument" >&2
        return 1
        ;;
      *)
        echo "$FUNCNAME: error: an unexpected execution path has occurred." >&2
        return 1
        ;;
    esac
  done
  shift "$((OPTIND - 1))"
  # Convert all '+' to ' '
  : "${1//+/ }"
  # We exploit that the $_ variable (last argument to the previous command
  # after expansion) contains the result of the parameter expansion
  if [[ $assign_to_var ]]; then
    printf -v "$var" %b "${_//%/\\x}"
  else
    printf %b "${_//%/\\x}"
  fi
}

Example 1: Printing the result to stdout

x='v%0A%0A%0A'
urldecode "$x" | od -An -tx1

Result:

 76 0a 0a 0a

Example 2: Assigning the result of decoding to a shell variable:

x='v%0A%0A%0A'
urldecode -v y "$x"
echo -n "$y" | od -An -tx1

(same result)

This function, while not as fast as the idiom above, is still 1300% faster than the accepted answer at doing assignments due to no subshell being involved. In addition, as shown in the example's output, it preserves trailing newlines due to no command substitution being involved.

Nice! Consider urldecode() { local -n var=$1;shift;: "${*//+/ }"; printf -v var %b "${_//%/\\x}"; }!
Or this: urldecode() { local ret=($'\n');: "${1//+/ }";printf ${2+-v} $2 %b%s "${_//%/\\x}" "${ret[$#-1]}";}. Here you have to quote (or double-quote) the URL; the optional second argument is a variable name. (If it is omitted, the decoded URL is printed with a newline.)

jamp · Accepted Answer · 2022-05-20 14:51:27Z

9

+100

Just wanted to share this other solution, pure bash:

encoded_string="Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en"
printf -v decoded_string "%b" "${encoded_string//\%/\\x}"
echo $decoded_string
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

edited May 20, 2022 at 14:51

answered Jun 2, 2014 at 20:56

jamp

2 Comments

Léa Gris Over a year ago

Or as a function streaming output (so it can handle %00): url::decode(){ printf %b "${@//\%/\\x}";}

Zlemini Over a year ago

Shortened to printf "%b\n" "${encoded_string//\%/\\x}"

Janus Troelsen · Accepted Answer · 2013-06-03 14:38:23Z

8

Bash script for doing it in native Bash (original source):

LANG=C

urlencode() {
    local l=${#1}
    for (( i = 0 ; i < l ; i++ )); do
        local c=${1:i:1}
        case "$c" in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf + ;;
            *) printf '%%%.2X' "'$c"
        esac
    done
}

urldecode() {
    local data=${1//+/ }
    printf '%b' "${data//%/\x}"
}

If you want to urldecode file content, just put the file content as an argument.

Here's a test that will halt if the decoded encoded file content differs from the original (if it runs for a few seconds, the script probably works correctly):

while true
  do cat /dev/urandom | tr -d '\0' | head -c1000 > /tmp/tmp;
     A="$(cat /tmp/tmp; printf x)"
     A=${A%x}
     A=$(urlencode "$A")
     urldecode "$A" > /tmp/tmp2
     cmp /tmp/tmp /tmp/tmp2
     if [ $? != 0 ]
       then break
     fi
done

edited Jun 3, 2013 at 14:38

answered May 2, 2013 at 9:55

Janus Troelsen

2 Comments

Stephane Chazelas Over a year ago

Note that your urldecode assumes the data contains no backslash.

MestreLion Over a year ago

@StephaneChazelas: I believe backslashes are not allowed in properly %-encoded strings

Oleg Bondar' · Accepted Answer · 2013-11-06 04:49:09Z

7

If you have php installed on your server, you can "cat" or even "tail" any file, with url encoded strings very easily.

tail -f nginx.access.log | php -R 'echo urldecode($argn)."\n";'

answered Nov 6, 2013 at 4:49

Oleg Bondar'

2 Comments

i336_ Over a year ago

Never looked at -R before, TIL about $argn (and $argi)! Reference (^F -R): php.net/manual/en/features.commandline.options.php

Sarke Over a year ago

You can also pass it in as an argument: php -r 'echo urldecode($argv[1]),"\n";' Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

Zombo · Accepted Answer · 2014-11-04 13:17:22Z

7

perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/pack H2,$1/gie' ./*.log

-i updates the files in place (some sed implementations have borrowed that from perl), with .back as the backup extension.

s/x/y/e substitutes x with the evaluation of the y perl code.

The perl code in this case uses pack to pack the hex number captured in $1 (first parentheses pair in the regexp) as the corresponding character.

An alternative to pack is to use chr(hex($1)):

perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/chr hex $1/gie' ./*.log

If available, you could also use uri_unescape() from URI::Escape:

perl -pi.back -MURI::Escape -e 'y/+/ /;$_=uri_unescape$_' ./*.log

edited Nov 4, 2014 at 13:17

Zombo

answered May 1, 2014 at 12:03

Stephane Chazelas

Community · Accepted Answer · 2017-05-23 12:34:15Z

6

As @barti_ddu said in the comments, \x "should be [double-]escaped".

% echo -e "$(echo "Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en" | sed 'y/+/ /; s/%/\\x/g')"
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

Rather than mixing up Bash and sed, I would do this all in Python. Here's a rough cut of how:

#!/usr/bin/env python3

import glob
import os
from urllib.parse import unquote  # Python 3 home of unquote()

# Write a decoded copy of every *.log file next to the original
for logfile in glob.glob(os.path.join('.', '*.log')):
    with open(logfile) as current:
        new_log_filename = logfile + '.new'
        with open(new_log_filename, 'w') as new_log_file:
            for url in current:
                unquoted = unquote(url.strip())
                new_log_file.write(unquoted + '\n')

edited May 23, 2017 at 12:34

CommunityBot

answered Jun 6, 2011 at 11:39

johnsyweb

2 Comments

user785717 Over a year ago

Thanks for the script. Will give the sed version one more shot and take the python route if that fails. I read somewhere that the sed approach was faster. Is that true. I will have a few GB sized files to process every hour and can use all the advantage.

johnsyweb Over a year ago

@user785717: Hopefully that will work for you. I've no idea which will perform better on your data. time is your friend.

frcn · Accepted Answer · 2020-06-22 06:45:50Z

5

Building upon some of the other answers, but for the POSIX world, you could use the following function:

url_decode() {
    printf '%b\n' "$(sed -E -e 's/\+/ /g' -e 's/%([0-9a-fA-F]{2})/\\x\1/g')"
}

It uses printf '%b\n' because there is no echo -e and breaks the sed call to make it easier to read, forcing -E to be able to use references with \1. It also forces what follows % to look like some hex code.

edited Jun 22, 2020 at 6:45

answered Jun 22, 2020 at 6:13

frcn

2 Comments

Bastian Bittorf Over a year ago

sadly, the %b is not enforced by POSIX: pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html "An additional conversion specifier character, b, shall be supported ..." shall is not must and here (ubuntu 20.04.4 LTS/dash) it does not work.

Ionic Aug 15 at 16:24

And even if %b was mandatory to be supported, the escape sequence \x is not supported by it - merely \0ddd, which would need conversion from hexadecimal to octal.
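
Given the portability concerns raised in these comments, one way to avoid relying on printf %b or on \x escapes is to do the hex conversion in awk. The following is a sketch under stated assumptions: the function name url_decode_awk is made up, it uses only awk features that POSIX specifies (match, substr, toupper, sprintf, gsub), and it runs awk with LC_ALL=C so that %c emits single bytes. It has not been checked against every awk implementation, and it deliberately drops %00 because many awks cannot hold a NUL in a string.

```sh
# url_decode_awk: decode URL-encoded lines from stdin (hypothetical helper).
url_decode_awk() {
    LC_ALL=C awk '
        BEGIN {
            # Lookup table: two-digit uppercase hex string -> byte.
            # Starts at 1, so %00 maps to the empty string.
            for (i = 1; i < 256; i++)
                byte[sprintf("%02X", i)] = sprintf("%c", i)
        }
        {
            s = $0
            gsub(/[+]/, " ", s)      # "+" means space in form encoding
            out = ""
            # Consume one %XX sequence per iteration; decoded output is
            # kept in "out" so it is never scanned a second time.
            while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
                hex = toupper(substr(s, RSTART + 1, 2))
                out = out substr(s, 1, RSTART - 1) byte[hex]
                s = substr(s, RSTART + 3)
            }
            print out s
        }'
}
```

It reads from standard input, so a file can be decoded with url_decode_awk < encoded.txt > decoded.txt (file names are placeholders).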

Daniel Cambría · Accepted Answer · 2021-07-10 05:50:49Z

With sed:

#!/bin/bash
URL_DECODE="$(echo "$1" | sed -E 's/%([0-9a-fA-F]{2})/\\x\1/g;s/\+/ /g')"
echo -e "$URL_DECODE"

s/%([0-9a-fA-F]{2})/\\x\1/g replaces each %XX with \xXX, turning the URL encoding into hexadecimal escapes
s/\+/ /g replaces each + with a space ' ', in case + is used in the query string

Just save it to decodeurl.sh and make it executable with chmod +x decodeurl.sh

If you need a way to encode too, this complete code will help:

#!/bin/bash
#
# URL encoding and decoding with sed
#
# By Daniel Cambría
# [email protected]
#
# jul/2021

function url_decode() {
echo "$@" \
    | sed -E 's/%([0-9a-fA-F]{2})/\\x\1/g;s/\+/ /g'
}

function url_encode() {
    # Per RFC 3986
    echo "$@" \
    | sed \
    -e 's/ /%20/g' \
    -e 's/:/%3A/g' \
    -e 's/,/%2C/g' \
    -e 's/\?/%3F/g' \
    -e 's/#/%23/g' \
    -e 's/\[/%5B/g' \
    -e 's/\]/%5D/g' \
    -e 's/@/%40/g' \
    -e 's/!/%21/g' \
    -e 's/\$/%24/g' \
    -e 's/&/%26/g' \
    -e "s/'/%27/g" \
    -e 's/(/%28/g' \
    -e 's/)/%29/g' \
    -e 's/\*/%2A/g' \
    -e 's/\+/%2B/g' \
    -e 's/,/%2C/g' \
    -e 's/;/%3B/g' \
    -e 's/=/%3D/g'
}

echo -e "URL decode: " $(url_decode "$1")
echo -e "URL encode: " $(url_encode "$1")

malhal · Accepted Answer · 2023-08-02 22:26:38Z

4

$ UENC='H%C3%B6he %C3%BCber%20dem%20Meeresspiegel'
$ UTF8=$(echo -e "${UENC//%/\\x}")
$ echo $UTF8
Höhe über dem Meeresspiegel
$

-e makes echo interpret backslash escapes such as \xNN.

edited Aug 2, 2023 at 22:26

malhal

answered Apr 21, 2016 at 4:38

guest

1 Comment

Toby Speight Over a year ago

Although this code may answer the question, providing additional context regarding why and/or how it answers the question would significantly improve its long-term value. Please edit your answer to add some explanation.

Stephane Chazelas · Accepted Answer · 2020-10-12 06:46:32Z

3

With the zsh shell (instead of bash), the only shell whose variables can hold any byte value including NUL (encoded as %00):

set -o extendedglob +o multibyte
string='Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en'
decoded=${${string//+/ }//(#b)%([[:xdigit:]](#c2))/${(#):-0x$match[1]}}

${var//pattern/replacement}: ksh-style parameter expansion operator to expand to the value of $var with every string matching pattern replaced with replacement.
(#b) activate back references so every part inside brackets in the pattern can be accessed as corresponding $match[n] in the replacement.
(#c2): equivalent of ERE {2}
${(#)param-expansion}: parameter expansion where the # flag causes the result to be interpreted as an arithmetic expression and the corresponding byte value to be returned.
${var:-value}: expands to value if $var is empty, here applied to no variable at all, so we can just specify an arbitrary string as the subject of a parameter expansion.

To make it a function that decodes the contents of a variable in-place:

uridecode_var() {
  emulate -L zsh
  set -o extendedglob +o multibyte
  eval $1='${${'$1'//+/ }//(#b)%([[:xdigit:]](#c2))/${(#):-0x$match[1]}}'
}

$ string='Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en'
$ uridecode_var string
$ print -r -- $string
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

edited Oct 12, 2020 at 6:46

answered Oct 12, 2020 at 5:22

Stephane Chazelas

4 Comments

Dannid Over a year ago

I'm running zsh. This worked for me and is much faster than the python solutions. (Unfortunately, most of the bash solutions don't work in zsh.) I'm going to try to adjust this so it doesn't replace the variable in-place, but instead takes the input string and returns the decoded output.

Dannid Over a year ago

Ah: changing the eval line to eval echo instead of eval $1= merely outputs the decoded string, rather than changing the variable in-place.

Stephane Chazelas Over a year ago

@Dannid, a function that decodes its argument and prints the result on stdout would just be decode() print -r -- ${${1//+/ }//(#b)%([[:xdigit:]](#c2))/${(#):-0x$match[1]}} (used as decode $var). No need for eval (you'll still need the extendedglob and nomultibyte though)

Dannid Over a year ago

Yes, thanks @stephane-chazelas - that worked, and in my .zshrc it looks like

function decode() {
  set -o extendedglob +o multibyte
  print -r -- ${${1//+/ }//(#b)%([[:xdigit:]](#c2))/${(#):-0x$match[1]}}
}

yemiteliyadu · Accepted Answer · 2018-08-27 19:54:14Z

2

Updating Jay's answer for Python 3.5+:
echo "%31+%32%0A%33+%34" | python -c "import sys; from urllib.parse import unquote ; print(unquote(sys.stdin.read()))"

Still, brendan's bash solution with explanation seems more direct and elegant.

answered Aug 27, 2018 at 19:54

yemiteliyadu

Stephane Chazelas · Accepted Answer · 2020-10-12 05:03:59Z

2

With GNU awk:

LC_ALL=C gawk -vRS='%[[:xdigit:]]{2}' '
  RT {RT = sprintf("%c",strtonum("0x" substr(RT, 2)))}
  {gsub(/\+/," ");printf "%s", $0 RT}'

This takes URI-encoded input on stdin and prints the decoded output on stdout.

We set the record separator to a regexp that matches a %XX sequence. In GNU awk, the input that matched it is stored in the RT special variable. We extract the hex digits from there and append them to "0x" for strtonum() to turn into a number, which is passed in turn to sprintf("%c"), which in the C locale converts it to the corresponding byte value.

edited Oct 12, 2020 at 5:03

answered May 1, 2014 at 13:58

Stephane Chazelas

CodeFarmer · Accepted Answer · 2021-07-29 05:25:35Z

2

python, for zshrc

# Usage: decodeUrl %3A%2F%2F
function decodeUrl(){
    echo "$1" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"    
}

# Usage: encodeUrl https://google.com/search?q=urldecode+bash
#          return: https://google.com/search\?q\=urldecode+bash
function encodeUrl(){
    echo "$1" | python3 -c "import sys; from urllib.parse import quote; print(quote(sys.stdin.read()));"
}

edited Jul 29, 2021 at 5:25

answered Jul 29, 2021 at 5:13

CodeFarmer

2 Comments

RARE Kpop Manifesto Over a year ago

In my own codes somehow I have "quote_plus" / "unquote_plus" instead - do you think the difference is significant ?

ninhjs.dev Over a year ago

You might add end='' param to get rid of extra newlines print(unquote(sys.stdin.read()),end='') and echo -n "$1".

seqwait · Accepted Answer · 2022-04-18 02:52:09Z

1

Use gridsite-clients:

1. yum install gridsite-clients / or apt-get install gridsite-clients
2. grep -a 'http' access.log | xargs urlencode -d

answered Apr 18, 2022 at 2:52

seqwait

adroste · Accepted Answer · 2022-04-23 22:01:10Z

1

Just a quick hint for others who are searching for a busybox-compatible solution. In the busybox shell you can use

httpd -d "$ENCODED_URL"

Example use case for busybox:

Download a file with wget and save it with the original decoded filename:

wget --no-check-certificate "$ENCODED_URL" -O "$(basename "$(httpd -d "$ENCODED_URL")")"

answered Apr 23, 2022 at 22:01

adroste

Adrian Miranda · Accepted Answer · 2024-08-15 17:20:47Z

1

People are using python or perl, you could use node too:

echo 'some%20encoded%20string' | node -e "console.log(decodeURIComponent(require('fs').readFileSync(0, 'utf-8').trim()))"

Put it in your .bash_aliases:

function decodeURIComponent() {
    node -e "console.log(decodeURIComponent(require('fs').readFileSync(0, 'utf-8').trim()))"
}

and then

echo 'some%20encoded%20string' | decodeURIComponent

answered Aug 15, 2024 at 17:20

Adrian Miranda

nevertooloud · Accepted Answer · 2014-03-07 15:21:28Z

0

Here is a solution that is done in pure bash where input and output are bash variables. It will decode '+' as a space and handle the '%20' space, as well as other %-encoded characters.

#!/bin/bash
#here is text that contains both '+' for spaces and a %20
text="hello+space+1%202"
decoded=$(echo -e `echo $text | sed 's/+/ /g;s/%/\\\\x/g;'`)
echo decoded=$decoded

answered Mar 7, 2014 at 15:21

nevertooloud

1 Comment

tricasse Over a year ago

sed is not pure Bash; this spawns another process.

Calvin Kim · Accepted Answer · 2017-06-10 19:35:14Z

0

Expanding on https://stackoverflow.com/a/37840948/8142470
to work with HTML entities

$ htmldecode() { : "${*//+/ }"; echo -e "${_//&#x/\x}" | tr -d ';'; }
$ htmldecode "http://google.com/search&?q=urldecode+bash"
http://google.com/search&?q=urldecode+bash

(argument must be quoted)

answered Jun 10, 2017 at 19:35

Calvin Kim

RARE Kpop Manifesto · Accepted Answer · 2022-04-24 13:41:42Z

If you prefer gawk, there's absolutely no need to force LC_ALL=C or gawk -b just to decode URL-encoded -

here's a fully functional proof-of-concept showcasing how gawk-unicode mode could directly decode purely binary files like MP3-audio or MP4-video files that were URL-encoded,and get back the exact same file, as confirmed by hashing.

It uses FS | OFS to handle the spaces that were set to +, similar to python3's quote-plus in their urllib :

( fg && fg && fg ) 2>/dev/null; 
gls8x "${f}"
echo
pvE0 < "${f}" | xxh128sum | lgp3
echo ; echo
pvE0 < "${f}" | urlencodeAWKchk \
\
| gawk -ne '
  BEGIN { 
     RS="[%][[:xdigit:]]{2}"; 
     FS="[+]"
       _=(4^5)*54  # if this offset doesn-t 
                   # work, try
                   #           8^7 
                   #               instead
  
  } (NF+="_"*(ORS = sprintf("%.*s", RT != "",
                    sprintf("%c",\
                         _+("0x"  \     
                            substr( RT, 2 ))))))~""' |pvE9|xxh128sum|lgp3

  1 -rwxrwxrwx 1 5555 staff 9290187 May 27  2021 genieaudio_16277926_.lossless.mp3*
   

      in0: 8.86MiB 0:00:00 [3.56GiB/s] [3.56GiB/s][=================>] 100%            
5d43c221bf6c85abac80eea8dbb412a1  stdin


      in0: 8.86MiB 0:00:00 [3.47GiB/s] [3.47GiB/s] [=================>] 100%            
     out9: 8.86MiB 0:00:05 [1.72MiB/s] [1.72MiB/s] [ <=>  ]

5d43c221bf6c85abac80eea8dbb412a1  stdin


     1  -rw-r--r-- 1 5555 staff 215098877 Feb  8 17:30 vg3.mp4


      in0:  205MiB 0:00:00 [2.66GiB/s] [2.66GiB/s] [=================>] 100% 
          
2778670450b08cee694dcefc23cd4d93  stdin


      in0:  205MiB 0:00:00 [3.31GiB/s] [3.31GiB/s] [=================>] 100%            
     out9:  205MiB 0:02:01 [1.69MiB/s] [1.69MiB/s] [ <=> ]
2778670450b08cee694dcefc23cd4d93  stdin

F. Hauri - Give Up GitHub · Accepted Answer · 2022-05-22 06:06:06Z

Minimalistic `uridecode [-v varname]` bash function:

Coming late to this SO question (11 years on), I see:

The first answer suggesting the use of printf -v varname %b... was offered by jamp, nearly 3 years after the question was asked.
The first answer offering a function for doing this was posted 10 years and 6 months after the question, by Robin A. Meade.
Here is my smaller function:

uridecode() {
    if [[ $1 == -v ]];then local -n _res="$2"; shift 2; else local _res; fi
    : "${*//+/ }"; printf -v _res %b "${_//%/\\x}"
    [[ ${_res@A} == _res=* ]] && echo "$_res"
}

Or less condensed:

uridecode() {
    if [[ $1 == -v ]];then           # If 1st argument is ``-v''
        local -n _res="$2"           # _res is a nameref to ``$2''
        shift 2                      # drop 1st two arguments
    else
        local _res                   # _res is a local variable
    fi
    : "${*//+/ }"                    # _ holds arguments having ``+'' replaced by spaces
    printf -v _res %b "${_//%/\\x}"  # store in _res rendered string
    [[ ${_res@A} == _res=* ]] &&     # print _res if local
        echo "$_res"
}

Usage:

uridecode Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

uridecode -v myvar Hell%6f w%6Frld%21
echo $myvar 
Hello world!

As I use $* instead of $1, and because URI doesn't hold special characters, there is no need to quote arguments.

Guest · Accepted Answer · 2023-09-16 16:10:43Z

Here is my modified version for you:

source code

File urldecode.sh :

#!/usr/bin/env -S bash -euo pipefail

# @author : Unregistered Guest (https://stackoverflow.com/users/6470696/guest)
# @author : netDesign8 (https://stackoverflow.com/users/1988310/netdesign8)
# @license : CC BY-SA 4.0
# @see : https://stackoverflow.com/a/37840948
# @modified
urldecode() {
  : "${*//%/\\x}"
  printf '%b' "${_}"
}
urldecode_plus() {
  : "${*//+/ }"
  : "${_//%/\\x}"
  printf '%b' "${_}"
}
urldecode_restfull() {
  : "${*// /+}"
  : "${_//%20/+}"
  : "${_//%/\\x}"
  printf '%b' "${_}"
}

# @author : Jay (https://stackoverflow.com/users/448081/jay)
# @author : Rory O'Kane (https://stackoverflow.com/users/578288/rory-okane)
# @license : CC BY-SA 4.0
# @see https://stackoverflow.com/a/21693459
# @modified
urldecode_py3() {
  python3 \
    -c \
    "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));" \
    <<< "${*}"
}
urldecode_plus_py3() {
  python3 \
    -c \
    "import sys; from urllib.parse import unquote_plus; print(unquote_plus(sys.stdin.read()));" \
    <<< "${*}"
}
urldecode_py2() {
  #XXX Python 2.x deprecated
  >&2 echo 'warning: Python 2.x deprecated!'
  python2 \
    -c \
    "import sys, urllib as ul; print ul.unquote(sys.stdin.read());" \
    <<< "${*}"
}
urldecode_plus_py2() {
  #XXX Python 2.x deprecated
  >&2 echo 'warning: Python 2.x deprecated!'
  python2 \
    -c \
    "import sys, urllib as ul; print ul.unquote_plus(sys.stdin.read());" \
    <<< "${*}"
}

testing

#!/usr/bin/env -S bash -euo pipefail

echo ===============

source urldecode.sh

url='https%3A%2F%2Fgoogle.com%2Fgo/sss+kkk/hjo%20kop/kk jj/search%3Fq%3Dko%20ddd fff+urldecode%2Bbash'

echo

echo -----bash
urldecode "${url}" "${url}"

echo

echo -----bash plus
urldecode_plus "${url}" "${url}"

echo

echo -----bash restfull
urldecode_restfull "${url}" "${url}"

echo

echo -----py3
urldecode_py3 "${url}" "${url}"

echo

echo -----py3 plus
urldecode_plus_py3 "${url}" "${url}"

echo

echo -----py2
urldecode_py2 "${url}" "${url}"

echo

echo -----py2 plus
urldecode_plus_py2 "${url}" "${url}"

echo

${_} works in interactive shell but not sure if it always works in script ?

Peter · Accepted Answer · 2014-08-07 21:06:18Z

-1

A slightly modified version of the Python answer that accepts an input and output file in a one liner.

cat inputfile.txt | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());" > outputfile.txt

answered Aug 7, 2014 at 21:06

Peter

guest · Accepted Answer · 2016-04-21 04:32:09Z

-4

$ uenc='H%C3%B6he %C3%BCber%20dem%20Meeresspiegel'
$ utf8=$(printf "${uenc//%/\\x}")
$ echo $utf8
Höhe über dem Meeresspiegel
$

answered Apr 21, 2016 at 4:32

guest

1 Comment

Toby Speight Over a year ago

Although this code may answer the question, providing additional context regarding why and/or how it answers the question would significantly improve its long-term value. Please edit your answer to add some explanation.
