What is a Hash Function?
How hash functions work, what MD5, SHA-1, and SHA-256 are used for, and why some are no longer safe.
What is a hash function?
A hash function is an algorithm that takes any input — a word, a file, an entire database — and produces a fixed-length string of characters called a hash, digest, or checksum. Think of it as a fingerprint for data.
SHA-256 always produces a 256-bit digest, written as exactly 64 hexadecimal characters, regardless of whether the input is one byte or one gigabyte:
SHA-256("hello")
→ 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA-256("hello world")
→ b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
Notice how changing the input completely changes the output. That is by design, and it is one of the properties that make hash functions so useful.
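A minimal sketch of computing such digests, assuming Python's standard `hashlib` module; the helper name `sha256_hex` is made up for illustration. It prints the input length next to the digest length to show that the output size never changes.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for message in ("hello", "hello world", "x" * 1_000_000):
    digest = sha256_hex(message)
    # The digest is 64 hex characters whatever the input length.
    print(len(message), len(digest), digest)
```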
Key properties of a good hash function
- Deterministic — the same input always produces the same output, every single time. Hash functions are not random; they are mathematical.
- One-way (pre-image resistant) — given a hash output, it is computationally infeasible to reverse it back to the original input. You cannot "un-hash" data.
- Collision-resistant — it should be computationally infeasible to find two different inputs that produce the same hash. This is where older algorithms like MD5 and SHA-1 fail.
- Avalanche effect — a tiny change to the input (even flipping a single bit) produces a completely different hash. This is not an accident; it is engineered so that similar inputs cannot be linked by comparing their hashes.
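The determinism and avalanche properties can be observed directly. The sketch below, which assumes Python's `hashlib` and uses illustrative helper names, flips one bit of the input and counts how many of the 256 output bits change. For a well-behaved hash the count should land somewhere near half of them.

```python
import hashlib

def sha256_int(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its last byte flipped."""
    return data[:-1] + bytes([data[-1] ^ 0x01])

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

original = b"hello"
modified = flip_lowest_bit(original)  # b"helln": the inputs differ by one bit

# Deterministic: hashing the same input twice gives the same digest.
print(sha256_int(original) == sha256_int(original))

# Avalanche: a one-bit input change alters a large share of the output bits.
print(differing_bits(original, modified), "of 256 bits differ")
```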
MD5 — still useful, but broken for security
MD5 (Message Digest 5) was designed by Ron Rivest in 1991 and produces a 128-bit (32-character hex) hash. For nearly a decade it was the standard for checksums and digital signatures.
The problem: in 2004, researchers demonstrated practical collision attacks against MD5 — they could engineer two different files that produced the same MD5 hash. By 2008, attackers exploited this to forge SSL certificates. MD5 is now fully broken for any security-sensitive purpose.
Safe uses for MD5
- File download checksums (non-security)
- Deduplication / cache keys
- Non-cryptographic hash maps
- Legacy system interoperability
Do not use MD5 for
- Password hashing
- Digital signatures
- TLS/SSL certificates
- Any security-critical purpose
If you see an MD5 hash used as a file download checksum, it still tells you whether the file was corrupted in transit. It just cannot protect against a malicious actor who deliberately crafted a second file with the same hash.
SHA-1 — deprecated since 2017
SHA-1 (Secure Hash Algorithm 1) was designed by the NSA and published in 1995 as a more secure successor to MD5, producing a 160-bit (40-character hex) hash. It was widely adopted in SSL/TLS, PGP, Git, and many other systems.
In February 2017, researchers from CWI Amsterdam and Google published SHAttered, the first practical collision attack against SHA-1. They produced two different PDF files with identical SHA-1 hashes. The attack took the equivalent of roughly 6,500 years of single-CPU computation plus about 110 years of single-GPU computation, a cost that is within reach of well-resourced attackers.
Publicly trusted CAs stopped issuing SHA-1 certificates in 2016, and major browsers stopped accepting them in early 2017. SHA-1 is officially deprecated for all security-sensitive uses. It remains in use in Git (for historical reasons, not security) and in some legacy systems, but any new system should use SHA-256 or better.
Git and SHA-1: Git historically identified commits and objects using SHA-1. This was primarily for content addressing rather than security; Git was not designed to rely on SHA-1 alone to prevent malicious tampering. Git is migrating to SHA-256 object IDs, and experimental support for SHA-256 repositories has shipped since Git 2.29.
SHA-256 and SHA-512 — the current safe choices
SHA-256 and SHA-512 are part of the SHA-2 family, designed by the NSA and published by NIST in 2001. SHA-256 produces a 256-bit (64-character hex) hash; SHA-512 produces a 512-bit (128-character hex) hash.
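The digest sizes quoted in this article can be checked with a short sketch, assuming Python's `hashlib` and a build that has not disabled MD5 (some FIPS-restricted builds do). The helper name `digest_bits` is illustrative.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output size, in bits, of the named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

for name in ("md5", "sha1", "sha256", "sha512"):
    bits = digest_bits(name)
    # Each hex character encodes 4 bits, so hex length = bits / 4.
    print(f"{name:<7} {bits:>3} bits  {bits // 4:>3} hex characters")
```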
No practical collision or pre-image attacks against either algorithm are publicly known as of 2025. SHA-256 is the dominant choice for:
- TLS/HTTPS certificates — virtually all certificates issued since 2016 use SHA-256.
- Bitcoin — the proof-of-work algorithm is a double SHA-256 hash.
- Code signing — macOS, Windows, and most package managers use SHA-256 to sign and verify software.
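The "double SHA-256" mentioned for Bitcoin simply means hashing the 32-byte digest a second time. The sketch below shows only that composition, assuming Python's `hashlib`. The payload is a placeholder, not a real block header, and nothing here performs actual proof-of-work.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Hash data with SHA-256, then hash the resulting 32-byte digest again."""
    first_pass = hashlib.sha256(data).digest()
    return hashlib.sha256(first_pass).digest()

payload = b"placeholder payload"  # hypothetical input, for illustration only
print(double_sha256(payload).hex())
```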
SHA-512 offers a larger safety margin and can be faster than SHA-256 on 64-bit architectures, because it operates on 64-bit words and processes the input in 128-byte blocks. For most applications SHA-256 is the right choice; SHA-512 is useful when you need extra margin or are on hardware where it benchmarks faster.
Common real-world uses
File integrity checksums
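Download a file, compute its hash, and compare against the publisher's expected value. If they match, the file was not corrupted in transit. If the expected value also came from a channel you trust and the hash is collision-resistant, a match shows that the file was not tampered with either. Linux ISO images, npm packages, and Docker layers all use this pattern.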
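The download-and-compare pattern might look like the following sketch, assuming Python's `hashlib` and a SHA-256 value copied from the publisher. The function names and the 1 MiB chunk size are illustrative choices. The file is read in chunks so that large images never need to fit in memory.

```python
import hashlib
import hmac
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published value, ignoring case and whitespace."""
    actual = sha256_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A call such as `matches_expected(Path("image.iso"), published_value)` returns `True` only when the digests agree; both the file name and `published_value` are placeholders here.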
Password storage
Databases should never store plaintext passwords. Instead, store a hash: on login, hash the provided password and compare to the stored hash. If an attacker steals the database, they get hashes, not passwords. Do not use raw SHA-256 for passwords — use bcrypt, Argon2, or scrypt, which are slow and salted by design.
Git commit IDs
Every Git commit, tree, and blob is identified by the SHA-1 (or SHA-256) hash of its contents. This is content addressing — the hash is not just a name, it is a proof that the content is exactly what it claims to be.
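As a rough illustration of content addressing, the sketch below computes the ID Git assigns to a blob in a SHA-1 repository: the SHA-1 of a short header (`blob`, a space, the content length, and a NUL byte) followed by the content itself. It assumes Python's `hashlib`, and the function name `git_blob_id` is made up for this example. Trees and commits use the same scheme with different headers and are not covered here.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every SHA-1 Git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the bytes, two files with identical content share one object, and any change to the content produces a different ID.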
Digital signatures
Signing a large document directly with a private key is slow. Instead, sign the hash of the document. The hash is fast to compute, fixed in size, and, as long as the hash function is collision-resistant, stands in for exactly one document in practice. Under that condition, verifying the signature on the hash is equivalent to verifying the signature on the full document.
What not to do
Do not use MD5 or SHA-1 for passwords — ever
Even ignoring collision attacks, MD5 and SHA-1 are fast. An attacker with a GPU can compute billions of MD5 hashes per second, making brute-force and rainbow table attacks trivial. Use bcrypt, Argon2, or scrypt: all three are designed to be slow, and Argon2 and scrypt are memory-hard as well.
Do not use unsalted hashes for passwords
If two users have the same password and you hash without a salt, they get the same hash. An attacker can precompute a rainbow table of common password hashes and look up cracked hashes in bulk. A salt is a random value stored alongside the hash that makes each hash unique even for identical passwords. Modern password hashing libraries (bcrypt, Argon2) handle salting automatically.
Do not use SHA-256 directly for passwords either
SHA-256 is secure as a general-purpose hash, but it is optimized to be fast — hardware can compute hundreds of millions of SHA-256 hashes per second. Password hashing needs to be slow. bcrypt with a cost factor of 12+, or Argon2id with appropriate memory settings, is the right tool for passwords.
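To make the salted, deliberately slow approach concrete, here is a minimal sketch using scrypt, chosen only because Python's standard `hashlib` exposes it (on builds linked against a recent OpenSSL); bcrypt and Argon2 need third-party packages. The cost parameters and function names are illustrative assumptions, not recommendations. In production, a maintained password-hashing library that manages salts and parameters for you is the safer route.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumed, not tuned); adjust for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key) for a new password, using a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)

# Two users with the same password still end up with different stored keys.
salt_a, key_a = hash_password("correct horse")
salt_b, key_b = hash_password("correct horse")
print(key_a != key_b)
print(verify_password("correct horse", salt_a, key_a))
```

The salt is stored next to the key; it is not secret, it only has to be unique per password.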
Frequently asked questions
Is MD5 safe to use?
For non-security purposes (deduplication, cache keys, checksums where an adversary is not involved), MD5 is fine. For anything security-related (passwords, digital signatures, certificate fingerprints), MD5 is broken and should not be used. The key question is: does it matter if an attacker can engineer two inputs with the same hash? For a download checksum verifying file integrity from a trusted server, probably not. For a password, absolutely yes.
What's the difference between hashing and encryption?
Hashing is one-way: you can hash data but you cannot reverse a hash back to its original input (without brute force). Encryption is two-way: you can encrypt data with a key and decrypt it back to the original with the right key. Use hashing when you need to verify that data matches something you have seen before (passwords, file integrity). Use encryption when you need to store or transmit data confidentially and recover the original value later.
Why is salting important for password hashes?
A salt is a random value that is generated uniquely for each password and combined with the password before hashing. Without a salt, identical passwords produce identical hashes: an attacker who gets your database can crack thousands of accounts at once by looking up precomputed hashes in a rainbow table. With a unique salt per password, every hash is different even if two users share the same password, and precomputed tables become useless. Good password hashing libraries (bcrypt, Argon2) handle salting automatically, so never implement it manually.