Learn About Base64

What is Base64 and Why Do We Need It?

At its core, Base64 is a way to represent binary data (like images, files, or even raw bytes) as plain text. This is crucial because many systems, especially older ones or those built on text-based protocols (like email via SMTP or data in JSON/XML), are designed to handle only text characters.

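Sending raw binary data through these systems is risky. Arbitrary byte values can be interpreted as control characters, stripped by channels that only carry 7-bit ASCII, or altered by line-ending conversion, and any of these corrupts the data. Base64 avoids this by converting the binary data into a small set of 64 printable ASCII characters (plus = for padding) that text-based systems pass through unchanged.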

How It Works: The 3-to-4 Method

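Base64 works by taking 3 bytes (24 bits) of binary data and representing them as 4 text characters (4 × 6 bits = 24 bits). Because every 3 input bytes become 4 output characters, the encoded text is about one third larger than the original data.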

  1. Grouping: The input data is read in chunks of 3 bytes (24 bits).
  2. Splitting: This 24-bit chunk is then split into four 6-bit groups.
  3. Mapping: Each 6-bit group corresponds to a number from 0 to 63. This number is used as an index into the Base64 alphabet. In the standard alphabet, indices 0-25 are A-Z, 26-51 are a-z, 52-61 are 0-9, and 62 and 63 are + and /.
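The three steps can be sketched in Python for a single complete group. This is a minimal illustration rather than a reference implementation: it assumes the standard alphabet and exactly 3 input bytes, and the names ALPHABET and encode_group are hypothetical, chosen only for this example.

```python
# Standard Base64 alphabet (RFC 4648): A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("encode_group expects exactly 3 bytes")
    # Grouping: pack the three bytes into one 24-bit integer.
    n = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Splitting: extract four 6-bit values, most significant first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Mapping: each 6-bit value (0-63) indexes the alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))  # TWFu
```

Masking with 0x3F (binary 111111) keeps only the lowest 6 bits after each shift, which is exactly the "split into four 6-bit groups" step.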

Example: Encoding the word "Man"

  • The ASCII values are: M (77), a (97), n (110).
  • In binary, this is: 01001101 01100001 01101110.
  • Combined, the 24 bits are: 010011010110000101101110.
  • Split into four 6-bit groups: 010011 (19), 010110 (22), 000101 (5), 101110 (46).
  • Looking these up in the Base64 alphabet gives us: T (19), W (22), F (5), u (46).

Thus, "Man" becomes "TWFu".
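The same worked example can be checked step by step in Python. The intermediate values come from plain string and integer operations; only the last line relies on the standard library's base64.b64encode.

```python
import base64

data = b"Man"

# ASCII values of each byte.
print(list(data))  # [77, 97, 110]

# Each byte as 8 binary digits, concatenated into one 24-bit string.
bits = "".join(format(byte, "08b") for byte in data)
print(bits)  # 010011010110000101101110

# Split into four 6-bit groups and convert each to its index.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
print(groups, [int(g, 2) for g in groups])
# ['010011', '010110', '000101', '101110'] [19, 22, 5, 46]

# The standard library produces the same encoding.
print(base64.b64encode(data))  # b'TWFu'
```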

Handling Padding

What happens if the input data isn't a perfect multiple of 3 bytes? This is where padding comes in.

  • 2 Bytes Left: The remaining 16 bits form two full 6-bit groups and one 4-bit group. Two zero bits are appended to fill the third group to 6 bits, and a single equals sign (=) stands in for the missing fourth character.
  • 1 Byte Left: The remaining 8 bits form one full 6-bit group and one 2-bit group. Four zero bits are appended to fill the second group, and two equals signs (==) stand in for the missing third and fourth characters.
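A padding-aware encoder can reuse the same 3-to-4 bit arithmetic: zero-fill a short final chunk (which supplies the extra zero bits), compute the four indices as usual, then overwrite the unused characters with =. This is a hedged sketch for learning purposes; the function name b64_encode is hypothetical, and the loop compares each result with Python's base64.b64encode as a sanity check.

```python
import base64

# Standard alphabet (RFC 4648): A-Z, a-z, 0-9, '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode arbitrary bytes as standard Base64 with '=' padding."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        # Zero-fill the final chunk; this supplies the extra zero bits.
        padded = chunk + b"\x00" * missing
        n = (padded[0] << 16) | (padded[1] << 8) | padded[2]
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 2 bytes left: replace the last char with "=".
        # 1 byte left: replace the last two chars with "==".
        if missing:
            chars[-missing:] = ["="] * missing
        out.append("".join(chars))
    return "".join(out)


for sample in (b"Man", b"Ma", b"M"):
    encoded = b64_encode(sample)
    # Sanity check against the standard library.
    assert encoded == base64.b64encode(sample).decode("ascii")
    print(sample, encoded)
# b'Man' TWFu
# b'Ma' TWE=
# b'M' TQ==
```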

Common Variants

The encoding logic is always the same, but the last two characters of the 64-character alphabet differ between variants.

| Variant | Characters 62 and 63 | Use case |
|---|---|---|
| Standard (RFC 4648, section 4) | + and / | General purpose; used in email (MIME). |
| URL and filename safe (RFC 4648, section 5) | - and _ | Used where + and / have special meaning, such as in URLs and file paths. |
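Python's standard base64 module exposes both alphabets, which makes the difference easy to see. The two input bytes below are an arbitrary choice for illustration: their 6-bit groups happen to land on indices 62 and 63.

```python
import base64

# 0xFB 0xFF -> 6-bit groups 111110 (62), 111111 (63), 111100 (60) + padding.
data = b"\xfb\xff"

standard = base64.b64encode(data)          # index 62 -> '+', 63 -> '/'
url_safe = base64.urlsafe_b64encode(data)  # index 62 -> '-', 63 -> '_'
print(standard, url_safe)  # b'+/8=' b'-_8='

# Decoding must use the matching variant.
assert base64.urlsafe_b64decode(url_safe) == data
```

Note that urlsafe_b64encode still emits = padding; formats such as JWT strip it before use.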

Common Use Cases

  • Data URIs: Embedding images or other files directly in HTML or CSS using the format data:image/png;base64,....
  • Email Attachments: The MIME standard uses Base64 to attach files to emails.
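  • JSON Web Tokens (JWT): The header, payload, and signature of a JWT are each Base64URL encoded, with the trailing = padding removed.
  • Basic HTTP Authentication: The username and password are joined with a colon (username:password) and the result is Base64 encoded. Because Base64 is an encoding and not encryption, anyone who sees the header can decode the credentials.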
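Several of these use cases can be sketched with Python's base64 module. The helper names are hypothetical, and the sketch assumes UTF-8 for Basic authentication credentials and padding-free Base64URL for JWT segments. It only builds strings; it does not sign tokens or send HTTP requests.

```python
import base64


def data_uri(payload: bytes, mime_type: str) -> str:
    """Build a data URI of the form data:<mime>;base64,<data>."""
    return f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"


def basic_auth_header(username: str, password: str) -> str:
    """Value for an HTTP Authorization header using the Basic scheme."""
    # Assumption: credentials are encoded as UTF-8 before Base64.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def b64url_nopad(data: bytes) -> str:
    """Base64URL with trailing '=' removed, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


print(data_uri(b"hello", "text/plain"))  # data:text/plain;base64,aGVsbG8=
print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(b64url_nopad(b'{"alg":"HS256","typ":"JWT"}'))
```

To decode a padding-free segment, re-append "=" until the length is a multiple of 4 before calling base64.urlsafe_b64decode.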