Python File Handling: The Right Way (and the Common Mistakes)
Context managers, reading strategies, pathlib vs os.path, CSV and JSON handling, and the file-handling mistakes that will burn you eventually.
File I/O in Python is straightforward -- until it isn't. The basics are simple, but there are enough foot-guns around encoding, path handling, and resource cleanup that it's worth getting the patterns right from the start.
Always Use Context Managers
This is rule number one. Don't do this:
f = open("data.txt")
content = f.read()
f.close()
If an exception happens between open() and close(), the file never gets closed. You'll leak file handles, and on Windows you might not be able to delete or modify the file until your program exits.
Do this instead:
with open("data.txt") as f:
    content = f.read()
# File is closed automatically, even if an exception occurs
The with statement guarantees cleanup. It's not just best practice -- it's the only way you should open files in Python. Every time.
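To see the guarantee in action, here's a small self-contained sketch (the temp directory and file name are just for the demo) that raises an exception mid-block and then checks that the file was still closed:

```python
import tempfile, os

# Set up a throwaway file to read from
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hello")

try:
    with open(path) as f:
        raise RuntimeError("something went wrong mid-read")
except RuntimeError:
    pass

print(f.closed)  # True -- the context manager closed it despite the exception
```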
Read/Write/Append Modes
with open("file.txt", "r") as f:   # read (default)
    data = f.read()

with open("file.txt", "w") as f:   # write (TRUNCATES the file first!)
    f.write("new content")

with open("file.txt", "a") as f:   # append (adds to end)
    f.write("more content\n")

with open("file.txt", "r+") as f:  # read AND write
    data = f.read()
    f.seek(0)
    f.write("overwrite from start")
The big trap: "w" mode destroys the existing file contents immediately on open. If you meant to append, you just nuked your data. There's no undo.
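If you want open() itself to protect you, "x" (exclusive creation) mode raises FileExistsError instead of truncating an existing file. A small sketch, with a throwaway temp file standing in for your real data:

```python
import tempfile, os

# Pretend this is data you care about
path = os.path.join(tempfile.mkdtemp(), "precious.txt")
with open(path, "w") as f:
    f.write("existing data")

try:
    with open(path, "x") as f:  # "x" refuses to open an existing file
        f.write("would clobber")
except FileExistsError:
    print("refused to overwrite")

with open(path) as f:
    print(f.read())  # "existing data" -- untouched
```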
For binary files (images, PDFs, anything non-text), add b:
with open("image.png", "rb") as f:
    raw_bytes = f.read()

with open("copy.png", "wb") as f:
    f.write(raw_bytes)
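For binary files too large to hold in memory at once, you can copy in fixed-size chunks instead. A sketch using temp files and an arbitrary 64 KiB chunk size (the := operator requires Python 3.8+):

```python
import tempfile, os

# Create a source file with some random bytes to copy
src = os.path.join(tempfile.mkdtemp(), "src.bin")
dst = src.replace("src.bin", "copy.bin")
with open(src, "wb") as f:
    f.write(os.urandom(100_000))

# Read and write up to 64 KiB at a time -- constant memory use
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while chunk := fin.read(64 * 1024):
        fout.write(chunk)
```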
Reading Strategies
Three main approaches, each suited to different situations:
# read() -- entire file as one string. Fine for small files.
with open("config.txt") as f:
    content = f.read()

# readlines() -- entire file as a list of lines (each line includes \n)
with open("data.txt") as f:
    lines = f.readlines()

# Iterate line by line -- best for large files, memory efficient
with open("huge_log.txt") as f:
    for line in f:
        process(line.rstrip("\n"))
For large files, line-by-line iteration is the way to go. It only holds one line in memory at a time. If you call .read() on a 4GB log file, you're loading 4GB into RAM.
Quick tip: lines from readlines() and iteration include the trailing newline. Use .rstrip("\n") or .strip() to remove it.
pathlib: The Modern Way
Stop concatenating paths with string operations. os.path.join() was the old solution; pathlib is the current one:
from pathlib import Path
# Create path objects
project = Path("/home/user/project")
config = project / "config" / "settings.json" # / operator joins paths
# Common operations
print(config.name) # settings.json
print(config.stem) # settings
print(config.suffix) # .json
print(config.parent) # /home/user/project/config
# Check existence
if config.exists():
    content = config.read_text()  # built-in read, no open() needed
# List files
for py_file in project.rglob("*.py"):  # recursive glob
    print(py_file)
# Create directories
output = project / "output"
output.mkdir(parents=True, exist_ok=True)
pathlib handles path separators across operating systems automatically. No more hand-assembled path strings that break on another platform because the separator was hardcoded.
The .read_text() and .write_text() methods are convenient for small files -- they handle opening and closing internally:
from pathlib import Path
# One-liner read
data = Path("config.json").read_text(encoding="utf-8")
# One-liner write
Path("output.txt").write_text("result data", encoding="utf-8")
Working with CSV
The csv module handles quoting, escaping, and delimiters correctly. Don't try to parse CSV by splitting on commas -- it breaks the moment a field contains a comma.
import csv
# Reading
with open("data.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["name"], row["email"])

# Writing
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerow({"name": "Alice", "score": 95})
    writer.writerow({"name": "Bob", "score": 87})
Note the newline="" argument -- without it, you get blank lines between rows on Windows. This is a classic gotcha that confuses people for hours.
DictReader and DictWriter are almost always what you want. They map rows to dictionaries using the header, which is much cleaner than working with index-based lists.
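To see why splitting on commas fails where the csv module succeeds, here's a round-trip of a field that itself contains a comma, using an in-memory io.StringIO in place of a real file:

```python
import csv, io

# Write a row whose "quote" field contains a comma
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "quote"])
writer.writeheader()
writer.writerow({"name": "Alice", "quote": "Hello, world"})

# The writer quoted the tricky field, so the reader recovers it intact
buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0]["quote"])                # Hello, world
print(buf.getvalue().splitlines()[1])  # Alice,"Hello, world"
```

A naive line.split(",") on that data row would have produced three fields instead of two.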
Working with JSON
import json
# Reading JSON from a file
with open("data.json") as f:
    data = json.load(f)  # parse file -> Python dict/list

# Writing JSON to a file
with open("output.json", "w") as f:
    json.dump(data, f, indent=2)  # Python dict/list -> file
# Working with JSON strings (not files)
json_str = '{"name": "Alice", "age": 30}'
parsed = json.loads(json_str) # string -> dict
back = json.dumps(parsed) # dict -> string
Watch the naming: load/dump work with files, loads/dumps work with strings (the "s" stands for string).
Common mistake: trying to serialize objects that aren't JSON-compatible. json.dump handles dicts, lists, strings, numbers, booleans, and None. For dates, custom objects, or sets, you need a custom encoder or convert them first:
import json
from datetime import datetime
data = {"timestamp": datetime.now()}
# json.dump(data, f) # TypeError!
# Fix: convert before serializing
data["timestamp"] = data["timestamp"].isoformat()
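Another option, if you'd rather not mutate your data: json.dump and json.dumps accept a default= callable that's invoked for any object they can't serialize natively. A sketch with a hypothetical encode_extra helper:

```python
import json
from datetime import datetime

def encode_extra(obj):
    # Called by json only for objects it can't serialize itself
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj)}")

payload = {"timestamp": datetime(2024, 1, 15, 12, 30)}
print(json.dumps(payload, default=encode_extra))
# {"timestamp": "2024-01-15T12:30:00"}
```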
Common Mistakes
Forgetting encoding. Python 3's open() defaults to your system's locale encoding, which varies by platform. If you're working with anything that might contain non-ASCII characters, be explicit:
with open("data.txt", encoding="utf-8") as f:
    content = f.read()
This saves you from mysterious UnicodeDecodeError on someone else's machine.
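A quick sketch of the failure mode, using a temp file: write UTF-8, then try to read it back with a mismatched encoding:

```python
import tempfile, os

path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve café")

try:
    with open(path, encoding="ascii") as f:
        f.read()  # blows up: these bytes aren't valid ASCII
except UnicodeDecodeError:
    print("decode failed -- encoding mismatch")

with open(path, encoding="utf-8") as f:
    print(f.read())  # naïve café
```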
Hardcoding path separators. Building paths by hand like "data\\file.txt" or "data/file.txt" breaks cross-platform. Use pathlib.Path or os.path.join().
Not handling FileNotFoundError. If the file might not exist, handle it:
from pathlib import Path
config_path = Path("settings.json")
if config_path.exists():
    config = json.loads(config_path.read_text())
else:
    config = default_config
Or use try/except if you prefer asking forgiveness over permission (EAFP -- very Pythonic):
try:
    config = json.loads(Path("settings.json").read_text())
except FileNotFoundError:
    config = default_config
File handling is one of those things you'll do in almost every project. If you want to practice these patterns with hands-on exercises -- reading data files, parsing CSV, manipulating paths -- CodeUp has Python challenges that build up from basic I/O through real-world file processing scenarios.