URL Encoding and Decoding
What is a URL?
A URL (Uniform Resource Locator) is a standard way of writing the addresses of resources on the Internet.
The URL was originally designed as a natural way to indicate the location of resources on a network. The locator had to be easily extensible and use only a limited set of ASCII characters (for example, a literal space is never used in a URL).
Typically, a URL consists of a protocol, a domain, a port number (the default depends on the protocol), the location of a directory or page, and query parameters. It may also contain a username and password for access to the server.
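The parts listed above can be inspected with a short Python sketch. It uses `urlsplit` from the standard library; the URL itself, including its user name and password, is a made-up value chosen only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show where each component sits.
url = "https://user:secret@example.com:8443/docs/page.html?lang=en&q=test"
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.username)  # optional user name
print(parts.password)  # optional password
print(parts.hostname)  # domain
print(parts.port)      # port number (None when the URL does not specify one)
print(parts.path)      # location of the directory or page
print(parts.query)     # query parameters, still as a single string
```

When the port is omitted, `parts.port` is `None` and the client falls back to the default for the protocol.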
The URL standard uses the US-ASCII character set. This is a serious limitation, since only Latin letters, digits, and some punctuation marks may be used directly. All other characters must be encoded. The encoding procedure is described in RFC 3986 and is called URL encoding or percent-encoding.
The transformation occurs in two stages:
- Each character that must be encoded is converted to UTF-8, which gives a sequence of one to four bytes.
- Each byte of this sequence is written as two hexadecimal digits preceded by a percent sign (%).
Because the percent (“%”) character serves as the indicator for percent-encoded octets, it must itself be percent-encoded as “%25”.
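The two stages can be sketched in a few lines of Python. The helper name `percent_encode` is hypothetical, and the sketch assumes the strictest policy: every character outside the RFC 3986 "unreserved" set is encoded. Real encoders often leave additional characters such as "/" untouched, depending on the URL component.

```python
from urllib.parse import quote

# Characters that RFC 3986 calls "unreserved"; they are copied unchanged.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: encode everything except unreserved characters."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Stage 1: convert the character to its UTF-8 bytes (1 to 4 of them).
            # Stage 2: write each byte as "%" plus two uppercase hex digits.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

print(percent_encode("é"))          # %C3%A9
print(percent_encode("100% sure"))  # 100%25%20sure

# The standard library produces the same result when no extra
# characters are marked as safe.
print(quote("100% sure", safe=""))  # 100%25%20sure
```

In practice `urllib.parse.quote` is the better choice; the manual loop is shown only to make the two stages visible.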
Decoding looks for percent signs (“%”) in the URL. Every combination of a percent sign followed by two hexadecimal digits is converted back into a byte, and the resulting byte sequence is interpreted as UTF-8. Decoding is therefore the reverse of encoding.
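A minimal decoding sketch in Python follows. The helper name `percent_decode` is hypothetical, and leaving a malformed sequence such as "%zz" unchanged is an assumption of this sketch, not a requirement stated above.

```python
from urllib.parse import unquote

HEX_DIGITS = "0123456789ABCDEFabcdef"

def percent_decode(text: str) -> str:
    """Hypothetical helper: turn %XX sequences back into UTF-8 text."""
    raw = bytearray()
    i = 0
    while i < len(text):
        pair = text[i + 1:i + 3]
        if text[i] == "%" and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            # "%" followed by two hex digits becomes one byte.
            raw.append(int(pair, 16))
            i += 3
        else:
            # Any other character is copied through as its UTF-8 bytes.
            raw.extend(text[i].encode("utf-8"))
            i += 1
    # Interpret the collected bytes as UTF-8.
    # This raises UnicodeDecodeError if the bytes are not valid UTF-8.
    return raw.decode("utf-8")

print(percent_decode("%C3%A9"))         # é
print(percent_decode("100%25%20sure"))  # 100% sure
print(unquote("100%25%20sure"))         # 100% sure
```

The bytes are collected first and decoded as a whole because a single character may be spread over several %XX groups. The standard `urllib.parse.unquote` gives the same result for valid input, but by default it substitutes a replacement character for invalid UTF-8 instead of raising an error.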