March 24, 2026 · 5 min read

File Compression Explained — Why Files Get Smaller (And Sometimes Can't)

How file compression actually works. Lossless vs lossy, ZIP vs RAR vs 7z, why some files compress 90% and others barely budge.


The Magic Trick That Isn't Magic

File compression feels like magic: a 100 MB file becomes 10 MB, and when you decompress it, everything's still there. How?

The short answer: patterns. Every file has patterns — repeated sequences, predictable structures, redundant information. Compression algorithms find these patterns and replace them with shorter representations.

Think of it like this: instead of writing "the the the the the," you write "5×the." Same information, fewer characters. Real compression is vastly more sophisticated, but the principle is the same.
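The "5×the" idea is known as run-length encoding. Here is a minimal sketch of it in Python, assuming the input is already split into words; the function names `rle_encode` and `rle_decode` are made up for this illustration, and real archivers use far more elaborate schemes.

```python
from itertools import groupby

def rle_encode(items):
    """Collapse each run of identical items into a (count, item) pair."""
    return [(len(list(run)), item) for item, run in groupby(items)]

def rle_decode(pairs):
    """Expand (count, item) pairs back into the original sequence."""
    return [item for count, item in pairs for _ in range(count)]

words = "the the the the the".split()
encoded = rle_encode(words)
print(encoded)  # [(5, 'the')]

# Decoding gives back exactly what went in.
print(rle_decode(encoded) == words)  # True
```

Run-length encoding only helps when identical items sit next to each other, which is why practical formats combine several techniques.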

Lossless vs Lossy: The Fundamental Split

Lossless Compression

Every single bit is preserved. Decompress the file, and you get the exact original. Used for:
  • ZIP, RAR, 7z, GZIP archives
  • PNG images
  • FLAC audio
  • Text, code, databases — anything where every bit matters
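The "exact original" guarantee can be checked directly. The sketch below uses Python's standard `zlib` module (DEFLATE, the method used inside ZIP and GZIP); the sample bytes are invented for the demonstration.

```python
import zlib

# Hypothetical sample data: a repeated SQL statement.
original = b"SELECT id, name FROM users WHERE id = 42;\n" * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print("bit-for-bit identical:", restored == original)
```

If even one byte differed after decompression, the comparison would report False, which is what "lossless" rules out.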

Lossy Compression

Some data is permanently discarded to achieve smaller files. The discarded data is chosen to minimize perceptible quality loss. Used for:
  • JPG images (discards visual details humans can't easily see)
  • MP3/AAC audio (discards sounds masked by louder sounds)
  • H.264/H.265 video (discards fine detail the eye is unlikely to notice, on top of removing redundancy between frames)

You cannot convert lossy to lossless and recover the lost data. Converting an MP3 to FLAC doesn't restore the audio that MP3 threw away. It just puts the remaining data in a bigger container.
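Why the discarded data can't come back is easier to see with a toy example. The sketch below rounds numbers to a coarser grid (quantization, one of the steps lossy codecs rely on). It is a deliberately simplified illustration, not how JPG or MP3 actually work; the `quantize` helper and the sample values are assumptions made for this example.

```python
def quantize(samples, step):
    """Round each sample to the nearest multiple of step. Detail is lost."""
    return [round(s / step) * step for s in samples]

samples = [3, 12, 17, 18, 101, 103, 250]
lossy = quantize(samples, step=16)

print(lossy)  # [0, 16, 16, 16, 96, 96, 256]
```

After quantizing, 12, 17, and 18 have all become 16. Nothing in the output says which was which, so no later conversion can restore them.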

Why Some Files Compress Well and Others Don't

File Type | Typical ZIP Compression | Why
Plain text (.txt, .csv) | 60-90% smaller | Highly repetitive patterns
Word documents (.docx) | 0-5% smaller | Already ZIP-compressed internally
Source code | 70-85% smaller | Repetitive keywords and structure
BMP images | 80-95% smaller | Tons of redundancy
PNG images | 0-2% smaller | Already compressed
JPG images | 0-3% smaller | Already compressed
MP3 audio | 0-2% smaller | Already compressed
MP4 video | 0-1% smaller | Already compressed
Database files | 50-80% smaller | Structured, repetitive data
Executable (.exe) | 20-40% smaller | Some patterns, less than text

The pattern: files that are already compressed don't compress further. A ZIP of JPGs is barely smaller than the JPGs themselves. A ZIP of BMPs is dramatically smaller.

This is why zipping a folder of photos barely reduces its size — the photos are already compressed.
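A quick way to see the gap is to compress two inputs of the same size. In this sketch, random bytes stand in for already-compressed data such as a JPG (an assumption: compressed files look close to random to a general-purpose compressor), and the CSV rows are made up.

```python
import os
import zlib

def percent_smaller(data):
    """Return how much smaller zlib (DEFLATE) makes data, as a percentage."""
    compressed = zlib.compress(data, 6)
    return 100 * (1 - len(compressed) / len(data))

# Hypothetical repetitive CSV content.
text_like = b"timestamp,user_id,status\n" + b"2026-03-24,1001,OK\n" * 5000
# Stand-in for already-compressed data: random bytes of the same length.
random_like = os.urandom(len(text_like))

print(f"repetitive CSV: {percent_smaller(text_like):.1f}% smaller")
print(f"random bytes:   {percent_smaller(random_like):.1f}% smaller")
```

The second figure typically comes out at or just below zero, because the compressor finds nothing to exploit and still has to add a small header.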

ZIP vs RAR vs 7z vs GZIP

Format | Compression Ratio | Speed | Compatibility | Encryption
ZIP | Good | Fast | Universal | AES-256
RAR | Very Good | Medium | Needs WinRAR/7-Zip | AES-256
7z (LZMA2) | Excellent | Slow | Needs 7-Zip | AES-256
GZIP | Good | Fast | Linux/web standard | No
Brotli | Very Good | Slow | Web (HTTP) | No
Zstandard | Excellent | Fast | Growing (Facebook) | No

  • For general use: ZIP. Everyone can open it, it's fast, and it's "good enough."
  • For maximum compression: 7z. Archives are often 20-30% smaller than ZIP, though the gain depends on the content.
  • For web servers: Brotli or Gzip. Most web servers apply these automatically once compression is enabled.
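The ratio difference between the ZIP/GZIP family and the 7z family can be tried with Python's standard library alone: `zlib` implements DEFLATE and `lzma` implements the LZMA family that 7z uses. This is a rough sketch on invented log-style data, not a benchmark, and the numbers will vary with the input.

```python
import lzma
import zlib

# Hypothetical sample data: log-style lines with some repetition.
data = "".join(
    f"2026-03-24 12:{i % 60:02d}:00 INFO request id={i} status=200\n"
    for i in range(20000)
).encode()

def compressed_sizes(payload):
    """Return the compressed size in bytes for DEFLATE and LZMA."""
    return {
        "DEFLATE (zlib, level 9)": len(zlib.compress(payload, 9)),
        "LZMA (lzma, default preset)": len(lzma.compress(payload)),
    }

for name, size in compressed_sizes(data).items():
    print(f"{name}: {size} bytes, {100 * size / len(data):.1f}% of original")
```

Running it on your own files gives a better sense of whether the extra compression time of 7z is worth it for your data.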

Compression Levels

Most tools let you choose a compression level (1-9 or similar). Higher levels don't always help:

Level | Speed | Size Reduction | Best For
1 (fastest) | Very fast | Minimal | Quick archiving, temporary files
5-6 (default) | Moderate | Good | General use
9 (maximum) | Very slow | Slightly better than 6 | Long-term archival

Going from level 6 to 9 typically saves only 2-5% more while taking 3-10x longer. It's rarely worth it.
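The trade-off is easy to measure yourself. The sketch below times `zlib` at levels 1, 6, and 9 on made-up data; the `measure` helper is an assumption for this example, and the exact sizes and timings depend on your machine and input.

```python
import time
import zlib

# Hypothetical sample data: numbered rows with a repeating structure.
data = "".join(f"row {i}: value={i * 7 % 1000}\n" for i in range(50000)).encode()

def measure(level):
    """Compress data at the given zlib level; return (size in bytes, seconds)."""
    start = time.perf_counter()
    size = len(zlib.compress(data, level))
    return size, time.perf_counter() - start

for level in (1, 6, 9):
    size, seconds = measure(level)
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Comparing the last two lines of output shows how much extra time level 9 costs for how little extra saving.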

Real-World Compression Scenarios

  • Sending a folder of documents via email: ZIP the folder → often 50-70% smaller. MyPDF's Create ZIP handles this quickly online.
  • Archiving project files: 7z with LZMA2 compression → smallest possible archive. Use 7-Zip (free desktop software).
  • Reducing a PDF for email: Don't ZIP it. Use PDF compression instead, which re-compresses the images inside the PDF and is much more effective than generic ZIP compression on an already-compressed PDF.
  • Backing up a code repository: GZIP or ZIP → excellent compression on source code. Git already stores objects compressed, so compressing a .git folder yields minimal additional savings.
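For the first scenario, zipping a folder takes only a few lines with Python's standard `zipfile` module. This is a minimal sketch: the `zip_folder` helper, the folder name, and the file contents are all invented for the example.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_folder(folder, archive_path):
    """Write every file under folder into a DEFLATE-compressed ZIP archive."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder so the archive unpacks cleanly.
                archive.write(path, arcname=path.relative_to(folder).as_posix())
    return Path(archive_path)

# Demo in a temporary directory with two hypothetical documents.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    (docs / "notes.txt").write_text("meeting notes\n" * 2000)
    (docs / "data.csv").write_text("id,value\n" + "1,42\n" * 2000)

    archive = zip_folder(docs, Path(tmp) / "docs.zip")
    original_size = sum(p.stat().st_size for p in docs.iterdir())
    print(f"{original_size} bytes -> {archive.stat().st_size} bytes")
```

The archive is written next to the folder rather than inside it, so it never ends up trying to include itself.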

The Diminishing Returns of Compression

A common misconception: "If I compress it twice, it'll be even smaller." Wrong. Compressing a ZIP file typically produces a ZIP that's about the same size (or slightly larger, due to archive overhead). The patterns have already been found and replaced, so there's almost nothing left to optimize.

This applies to all compression: re-encoding an MP3 at the same bitrate doesn't make it smaller. Re-saving a JPG at the same quality makes it look worse, not smaller. ZIP of a ZIP adds overhead.
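You can watch the second pass do nothing. This sketch compresses some pseudo-random "word soup" once and then compresses the result again with `zlib`. The vocabulary and seed are arbitrary choices for the demonstration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
vocabulary = ["alpha", "beta", "gamma", "delta", "error", "ok", "user", "file"]
text = " ".join(rng.choice(vocabulary) for _ in range(50000)).encode()

once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)

print(f"original:         {len(text)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

On typical data like this, the second figure and the third are nearly equal, since the first pass has already removed the redundancy the second pass would look for.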

Frequently Asked Questions

Does compression speed up file transfers?

Yes, if the compression ratio is significant. A 100 MB text file compressed to 10 MB transfers 10x faster (minus compression/decompression time). A 100 MB JPG compressed to 99 MB? Not worth it.
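The arithmetic behind that answer can be sketched as follows, assuming a 10 Mbit/s link and a flat 3 seconds spent compressing and decompressing. Both figures are illustrative assumptions, not measurements.

```python
def transfer_seconds(size_mb, link_mbps, overhead_seconds=0.0):
    """Seconds to send size_mb megabytes over a link_mbps megabit/s link."""
    return size_mb * 8 / link_mbps + overhead_seconds

LINK_MBPS = 10   # assumed link speed
OVERHEAD = 3.0   # assumed compress + decompress time, in seconds

print(transfer_seconds(100, LINK_MBPS))            # 80.0  (uncompressed)
print(transfer_seconds(10, LINK_MBPS, OVERHEAD))   # 11.0  (text: 100 MB -> 10 MB)
print(transfer_seconds(99, LINK_MBPS, OVERHEAD))   # about 82.2 (JPG: 100 MB -> 99 MB)
```

With these assumptions, compressing the text file saves over a minute, while compressing the JPG makes the whole transfer slower than sending it as-is.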

Can corrupted compressed files be recovered?

Partially. ZIP files can sometimes recover uncorrupted portions. RAR files with "recovery records" (a RAR-specific feature) can reconstruct damaged sections. 7z files are harder to recover from corruption.
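Part of the reason ZIP archives degrade gracefully is that each file inside carries its own CRC-32 checksum. The sketch below builds a small archive in memory, flips one byte in one member, and asks Python's `zipfile` module which member is damaged. It uses `ZIP_STORED` (no compression) as a simplifying assumption so the damage lands in a predictable place; with compressed members, the failure can surface differently.

```python
import io
import zipfile

# Build a two-file archive in memory (file names and contents are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("good.txt", "this file stays intact")
    archive.writestr("bad.txt", "this file will be damaged")

# Flip one byte inside bad.txt's stored data to simulate corruption.
raw = bytearray(buffer.getvalue())
position = raw.index(b"will be damaged")
raw[position] ^= 0xFF

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as damaged:
    first_bad = damaged.testzip()        # name of the first member failing its CRC
    survivor = damaged.read("good.txt")  # the undamaged member still reads fine

print("damaged member:", first_bad)
print("recovered:", survivor.decode())
```

The checksum only detects the damage. Actually repairing a damaged member needs redundant data, which is what RAR's recovery records provide.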

What's the most compressed a file can get?

It depends entirely on the content. A file of all zeros compresses to nearly nothing. Truly random data can't be compressed on average: a simple counting argument shows that no lossless algorithm can shrink every possible input. Real files fall somewhere between.
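Both extremes can be checked in a few lines. The counting part below just tallies bit strings: there are more inputs of a given length than there are shorter outputs to map them to. The helper names are invented for this sketch.

```python
import zlib

def inputs_of_length(bits):
    """Number of distinct bit strings that are exactly `bits` long."""
    return 2 ** bits

def shorter_outputs(bits):
    """Number of distinct bit strings shorter than `bits` (lengths 0 to bits-1)."""
    return sum(2 ** k for k in range(bits))

# 256 possible 8-bit inputs, but only 255 strictly shorter outputs:
# at least one input cannot shrink without colliding with another.
print(inputs_of_length(8), shorter_outputs(8))  # 256 255

# The other extreme: a megabyte of zeros collapses to almost nothing.
zeros = bytes(1_000_000)
print(len(zeros), "->", len(zlib.compress(zeros, 9)), "bytes")
```

The same gap of one exists at every length, which is why a compressor that shrinks some files must leave others the same size or larger.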