10 min read · Security

Understanding Hash Functions: SHA-256, MD5, and When to Use What

What Is a Hash Function?

A hash function takes an input of any size and produces a fixed-size output called a hash, digest, or checksum. The same input always produces the same output, while even a tiny change to the input produces a completely different output. This is called the avalanche effect: flip one bit in the input and roughly half the bits in the output change.
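The avalanche effect is easy to observe directly. The following is a minimal sketch using Python's standard hashlib module; the helper name differing_bits and the sample message are illustrative choices, not part of any library.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

original = b"hello world"
# Flip the lowest bit of the last byte: 'd' (0x64) becomes 'e' (0x65).
flipped = original[:-1] + bytes([original[-1] ^ 0x01])

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(flipped).hexdigest())
print(differing_bits(original, flipped), "of 256 bits differ")
```

The exact count varies from input to input, but it should land near 128, which is half of the 256 output bits.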

Hash functions are one-way: you can easily compute the hash from the input, but you cannot reconstruct the input from the hash. This is not encryption, where the process is reversible with a key. A hash permanently discards information. A 10 GB file and a 10-byte string both produce a 256-bit hash with SHA-256, and there is no way to recover the original data from those 256 bits.
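To make the fixed-size property concrete, here is a small sketch that hashes a 10-byte input and a much larger one. The 10 MiB buffer of zero bytes is only a stand-in for a large file, chosen so the example runs quickly.

```python
import hashlib

small = b"0123456789"                  # 10 bytes
large = b"\x00" * (10 * 1024 * 1024)   # 10 MiB stand-in for a much larger file

for label, data in (("small", small), ("large", large)):
    digest = hashlib.sha256(data).digest()
    # The digest length depends only on the algorithm, never on the input size.
    print(label, len(data), "bytes in ->", len(digest) * 8, "bits out")
```

Both lines report 256 bits out, which is why the larger input cannot possibly be rebuilt from its digest.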

You encounter hashes constantly, even if you do not realize it. When you download a file and the website lists a SHA-256 checksum, that is a hash for verifying the download was not corrupted or tampered with. When you log into a website, your password is compared against a stored hash, not the plain text password. Git uses SHA-1 hashes to identify every commit, file, and tree in a repository.

Properties of Cryptographic Hashes

Not all hash functions are cryptographic. A hash table in a programming language uses a hash function for data lookup, but that function does not need to be secure. Cryptographic hash functions have three additional properties that make them suitable for security applications.

First, preimage resistance: given a hash output, it should be computationally infeasible to find any input that produces that output. This is what makes password hashing work. Even if an attacker obtains the hash, they cannot reverse it mathematically to find the password; their only option is to guess candidate passwords and hash each one.

Second, second preimage resistance: given an input and its hash, it should be infeasible to find a different input that produces the same hash. This is critical for file integrity verification. If someone could find a malicious file with the same hash as a legitimate one, integrity checks would be useless.

Third, collision resistance: it should be infeasible to find any two different inputs that produce the same hash. Since hash outputs are fixed-size, collisions must mathematically exist (there are more possible inputs than possible outputs). The requirement is that finding one should take an impractical amount of computation. When researchers find practical collision attacks against a hash function, it is considered broken for security purposes.
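Collisions become easy to find once the output is small enough. The sketch below deliberately truncates SHA-256 to its first 2 bytes (16 bits) and searches for two different messages with the same truncated value. The truncation and the counter-based messages are assumptions made purely for illustration; nothing here attacks full SHA-256.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a deliberately weak hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Hash the messages b'0', b'1', b'2', ... until two share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1

first, second, attempts = find_collision(2)
print(first, second, "collide after", attempts, "messages")
```

For an n-bit output, a search like this needs on the order of 2^(n/2) messages (the birthday bound). That is a few hundred messages for 16 bits, but about 2^128 for the full 256-bit SHA-256 output, which is far beyond practical reach.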

MD5: History, Weaknesses, and Still-Valid Uses

MD5 was designed by Ronald Rivest in 1991 and produces a 128-bit hash. It was widely used for file checksums, password storage, and digital signatures throughout the 1990s and early 2000s. It is fast and simple, which contributed to its popularity.

MD5 is cryptographically broken. In 2004, a research team led by Xiaoyun Wang demonstrated practical collision attacks. In 2008, researchers used MD5 collisions to create a rogue CA certificate, proving the real-world impact. Today, generating two files with the same MD5 hash takes seconds on a modern computer. Any security application that relies on MD5 collision resistance is vulnerable.

Despite this, MD5 still has legitimate uses where collision resistance is not required. Checking file integrity against accidental corruption (not malicious tampering) is one example. If you download a file and want to verify it transferred correctly, MD5 is fine because accidental corruption will not produce a matching hash. It is also used as a fast checksum in internal systems, for deduplication in storage systems, and as a non-security hash in caching layers.
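As a rough sketch of the non-security use, the snippet below groups byte strings by MD5 digest to spot duplicates. The function names and sample data are hypothetical. The usedforsecurity=False argument requires Python 3.9 or later and signals that the hash is not being relied on for security. This pattern is only reasonable when the inputs are not attacker-controlled.

```python
import hashlib
from collections import defaultdict

def content_key(data: bytes) -> str:
    # usedforsecurity=False (Python 3.9+) marks this as a non-security checksum.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()

def group_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Return lists of names whose contents share the same MD5 digest."""
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[content_key(data)].append(name)
    return [names for names in groups.values() if len(names) > 1]

blobs = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"different"}
print(group_duplicates(blobs))  # [['a.txt', 'b.txt']]
```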

The rule is simple: never use MD5 for anything security-related. Do not use it for password hashing, digital signatures, certificate verification, or any context where an adversary might try to forge a collision. For signatures and integrity checks, use SHA-256 or better; for passwords, use a dedicated password-hashing function such as bcrypt, scrypt, or Argon2.

Key Takeaway

MD5 is broken for security purposes. Use it only as a fast checksum against accidental corruption, never where an adversary could forge a collision.

The SHA Family: SHA-1, SHA-256, and SHA-3

The Secure Hash Algorithm family is published by NIST. SHA-1 and SHA-2 were designed by the NSA; SHA-3 was not. SHA-1 (1995) produces a 160-bit hash. SHA-2 (2001) is a family that includes SHA-224, SHA-256, SHA-384, and SHA-512. SHA-3 (2015) is a completely different design based on the Keccak sponge construction, created by an independent team of cryptographers.

SHA-1 is deprecated for security purposes. Researchers at CWI Amsterdam and Google demonstrated a practical collision attack (SHAttered) in 2017 by producing two different PDF files with the same SHA-1 hash. Git still uses SHA-1 by default but has been adding SHA-256 support. Certificate authorities stopped issuing SHA-1 certificates in 2016. Like MD5, SHA-1 remains usable for non-security checksums but should not be trusted for authentication or integrity against adversaries.

SHA-256 is the current workhorse. It produces a 256-bit hash and has no known practical attacks. It is used in TLS certificates, Bitcoin mining, code signing, and most modern security applications. When someone says "SHA" without specifying a version in a security context, they usually mean SHA-256.

SHA-3 was selected through a public competition as an alternative to SHA-2, not a replacement. Its internal design (sponge construction) is fundamentally different from SHA-2 (Merkle-Damgård construction), which means a breakthrough against one is unlikely to carry over to the other. SHA-3 offers comparable security but is less widely adopted because SHA-2 remains unbroken. It is a solid choice for applications that want defense-in-depth or compliance with the latest standards.
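The output sizes mentioned in this section can be checked with Python's hashlib, which exposes each algorithm by name. This is a small sketch; the helper digest_bits is an illustrative name, not a library function.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output size in bits of the named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

for name in ("sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"):
    sample = hashlib.new(name, b"abc").hexdigest()
    print(f"{name:9s} {digest_bits(name):4d} bits  {sample[:16]}...")
```

Note that sha256 and sha3_256 have the same output size but produce unrelated digests for the same input, because the underlying constructions differ.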

Use Cases: File Integrity, Passwords, and Digital Signatures

File integrity verification is the most visible use of hashing. Software distributors publish checksums alongside downloads. After downloading, you compute the hash locally and compare. If they match, the file is intact. Linux distributions, security tools, and firmware updates all use this pattern. SHA-256 is the standard choice here.
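The download-and-compare pattern might look like the following sketch. The function names, the 1 MiB chunk size, and the idea that the published checksum arrives as a hex string are assumptions for illustration. Reading in chunks keeps memory use flat even for multi-gigabyte files.

```python
import hashlib
import hmac

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path: str, published_hex: str) -> bool:
    """Compare the local SHA-256 with the checksum published by the distributor."""
    actual = sha256_file(path)
    # Normalize case and whitespace, since checksums are often copy-pasted.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

A call such as verify_download("installer.iso", checksum_from_website) returns True only when the two digests match. The check is only as trustworthy as the channel the published checksum came from.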

Password storage uses hashing to avoid storing plain text passwords. When you create an account, the system hashes your password and stores the hash. When you log in, it hashes your input and compares. However, raw SHA-256 is a poor choice for passwords because it is too fast: attackers can compute billions of SHA-256 hashes per second with GPU hardware. Password-specific hashing functions like bcrypt, scrypt, and Argon2 deliberately slow down computation, making brute-force attacks far more expensive. They also add a random salt to each password, preventing precomputed rainbow table attacks.
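Here is a minimal sketch of salted password hashing with scrypt, which Python exposes as hashlib.scrypt when built against a recent OpenSSL. The function names and the cost parameters (n=2**14, r=8, p=1) are illustrative assumptions, not a recommendation for any particular system. Production code would normally rely on a maintained password-hashing library.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumption); tune them for your own hardware.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the account."""
    salt = os.urandom(16)  # a fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("wrong guess", salt, stored))                   # False
```

Because each call to hash_password draws a new salt, two users with the same password end up with different stored values.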

Digital signatures use hashing as a preprocessing step. Instead of signing an entire document (which would be slow with asymmetric cryptography), the signer hashes the document and signs the hash. The verifier hashes the document independently and checks the signature against their hash. This is why collision resistance matters so much for signatures: if two documents produce the same hash, a signature for one is valid for both.
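The hash-then-sign flow can be sketched with a toy textbook RSA key. Everything about the key here is an assumption made for illustration: the modulus is built from two well-known Mersenne primes, it is far too small to be secure, and there is no padding scheme. Real systems use a vetted cryptography library. The point is only to show that the signature is computed over the hash, not the document.

```python
import hashlib

# TOY KEY, NOT SECURE: small public primes, no padding. Illustration only.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(document: bytes) -> int:
    """Hash the document and map the digest to an integer below N."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N

def sign(document: bytes) -> int:
    """Sign the hash of the document with the private exponent."""
    return pow(digest_as_int(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Hash the document independently and check it against the signature."""
    return pow(signature, E, N) == digest_as_int(document)

contract = b"Pay Alice 10 dollars"
signature = sign(contract)
print(verify(contract, signature))                  # True
print(verify(b"Pay Alice 1000 dollars", signature)) # False
```

The verifier only ever compares hashes, so any two documents with the same hash would verify under the same signature. That is exactly why a collision attack on the hash function breaks the signature scheme.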

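Key Takeaway

Match the hash to the job: SHA-256 for file integrity and digital signatures, and a deliberately slow, salted function such as bcrypt, scrypt, or Argon2 for passwords.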

Hash vs. Encryption vs. Encoding: Choosing the Right Tool

These three concepts are frequently confused but serve completely different purposes. Hashing is one-way: it produces a fixed-size fingerprint and you cannot recover the original data. It is used for verification, integrity, and password storage.

Encryption is two-way: it transforms data so only someone with the correct key can read it. It is used for confidentiality. AES encrypts your disk, TLS encrypts your web traffic, PGP encrypts your email. The original data is fully recoverable if you have the key.

Encoding is a format transformation with no security properties. Base64 encodes binary data as ASCII text for transmission. URL encoding replaces special characters with percent-encoded values. Anyone can decode encoded data because there is no key and no secret. Encoding is for compatibility, not security.
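The difference between encoding and hashing shows up immediately in code. In this sketch, which uses only the Python standard library and arbitrary sample values, Base64 and URL encoding both round-trip without any key, while the SHA-256 digest has no inverse operation to call.

```python
import base64
import hashlib
from urllib.parse import quote, unquote

data = b"hello"

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(data)
print(encoded)                    # b'aGVsbG8='
print(base64.b64decode(encoded))  # b'hello'

url_encoded = quote("a b&c")
print(url_encoded)                # a%20b%26c
print(unquote(url_encoded))       # a b&c

# Hashing: one-way. hashlib offers no function to turn this back into b"hello".
print(hashlib.sha256(data).hexdigest())
```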

Choosing the right hash function depends on your use case. For general file integrity and digital signatures, SHA-256 is the safe default. For password storage, use bcrypt, scrypt, or Argon2, never raw SHA-256 or MD5. For non-security checksums where speed matters, MD5 or CRC32 are acceptable. If you want a hedge against a future break in SHA-2, SHA-3 is a sound alternative. When in doubt, SHA-256 is almost always the right answer.
