gRPC: Build High-Performance APIs That REST Can't Match
Learn gRPC from scratch -- Protocol Buffers, service definitions, streaming RPCs, Python implementation, error handling, and when to choose gRPC over REST.
REST has been the default for web APIs for over a decade. It's simple, well-understood, and works with any HTTP client. But it has real limitations: JSON is slow to parse, there's no built-in schema enforcement, and streaming data requires awkward workarounds like WebSockets or server-sent events.
gRPC solves these problems by using Protocol Buffers for serialization (binary, not text), HTTP/2 for transport (multiplexed, bidirectional), and a strict contract between client and server. It's what Google uses internally for virtually all service-to-service communication, and it's why your microservices can talk to each other 10x faster than with REST.
What gRPC Actually Is
gRPC is a Remote Procedure Call framework. Instead of thinking in terms of URLs and HTTP methods (GET /users/123), you think in terms of function calls: GetUser(user_id=123). The framework handles serialization, transport, and deserialization.
The stack:
- Protocol Buffers (protobuf) -- a language-neutral schema definition and serialization format
- HTTP/2 -- the transport protocol (supports multiplexing and bidirectional streaming)
- Code generation -- you define a .proto file, and gRPC generates client and server code in your language

There's no hand-written serialization or routing code -- everything is generated from the .proto definition.
Protocol Buffers: The Schema
Everything starts with a .proto file. This defines your data structures and service methods:
// user_service.proto
syntax = "proto3";

package userservice;

// Data structures
message User {
  int32 id = 1;
  string username = 2;
  string email = 3;
  repeated string roles = 4;
  UserStatus status = 5;
}

enum UserStatus {
  ACTIVE = 0;
  INACTIVE = 1;
  BANNED = 2;
}

message GetUserRequest {
  int32 id = 1;
}

message GetUserResponse {
  User user = 1;
}

message CreateUserRequest {
  string username = 1;
  string email = 2;
}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message ListUsersResponse {
  repeated User users = 1;
  string next_page_token = 2;
}

// Service definition
service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc CreateUser(CreateUserRequest) returns (GetUserResponse);
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
}
Key things to understand:
- Field numbers (= 1, = 2) identify fields in the binary encoding -- they are not values. Never reuse them.
- repeated means a list/array.
- message is like a struct or class.
- service defines the RPC methods -- what clients can call.
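To see why field numbers matter more than field names, here's a hand-rolled sketch of proto3's wire format in plain Python (no protobuf library needed; encode_varint and encode_field are illustrative helpers, not a gRPC API). Each field is prefixed with a tag combining its number and a wire type, so only the number -- never the name -- appears on the wire:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_field(field_number: int, wire_type: int, payload: bytes) -> bytes:
    """Prefix a payload with its tag: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type) + payload

# User { int32 id = 1; string username = 2; } with id=150, username="alice"
encoded = (
    encode_field(1, 0, encode_varint(150))             # wire type 0 = varint
    + encode_field(2, 2, encode_varint(5) + b"alice")  # wire type 2 = length-delimited
)
print(encoded.hex())  # 089601 for id=150, then 1205616c696365 for "alice"
```

This is also why renaming a field is a safe change but renumbering one is not: the wire format only ever carries the number.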
Code Generation
Install the tools:
pip install grpcio grpcio-tools
Generate Python code from the .proto file:
python -m grpc_tools.protoc \
    -I. \
    --python_out=. \
    --grpc_python_out=. \
    user_service.proto
This produces two files:
- user_service_pb2.py -- the message classes (User, GetUserRequest, etc.)
- user_service_pb2_grpc.py -- the service stubs and base classes

When the schema changes, you update the .proto file and regenerate.
Building the Server
# server.py
import grpc
from concurrent import futures

import user_service_pb2 as pb2
import user_service_pb2_grpc as pb2_grpc

# In-memory store for this example
users_db = {
    1: {"id": 1, "username": "alice", "email": "alice@example.com", "roles": ["admin"], "status": 0},
    2: {"id": 2, "username": "bob", "email": "bob@example.com", "roles": ["user"], "status": 0},
}
next_id = 3

class UserServiceServicer(pb2_grpc.UserServiceServicer):
    """Implements the UserService defined in the .proto file."""

    def GetUser(self, request, context):
        user_data = users_db.get(request.id)
        if not user_data:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"User {request.id} not found")
            return pb2.GetUserResponse()
        user = pb2.User(
            id=user_data["id"],
            username=user_data["username"],
            email=user_data["email"],
            roles=user_data["roles"],
            status=user_data["status"],
        )
        return pb2.GetUserResponse(user=user)

    def CreateUser(self, request, context):
        global next_id
        user_data = {
            "id": next_id,
            "username": request.username,
            "email": request.email,
            "roles": ["user"],
            "status": 0,
        }
        users_db[next_id] = user_data
        next_id += 1
        user = pb2.User(**user_data)
        return pb2.GetUserResponse(user=user)

    def ListUsers(self, request, context):
        page_size = request.page_size or 10
        all_users = list(users_db.values())
        users = [
            pb2.User(
                id=u["id"],
                username=u["username"],
                email=u["email"],
                roles=u["roles"],
                status=u["status"],
            )
            for u in all_users[:page_size]
        ]
        return pb2.ListUsersResponse(users=users, next_page_token="")

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb2_grpc.add_UserServiceServicer_to_server(
        UserServiceServicer(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    print("Server started on port 50051")
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
The server uses a thread pool to handle concurrent requests. add_insecure_port is for development -- in production, you'd use TLS credentials.
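One loose end: the ListUsers handler above ignores page_token and always returns an empty next_page_token. A minimal sketch of a token scheme, assuming an offset-based store (offset_to_token, token_to_offset, and paginate are hypothetical helpers, not part of gRPC):

```python
import base64

def offset_to_token(offset: int) -> str:
    """Wrap an integer offset in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(str(offset).encode()).decode()

def token_to_offset(token: str) -> int:
    """Recover the offset; an empty token means start from the beginning."""
    return int(base64.urlsafe_b64decode(token.encode())) if token else 0

def paginate(items, page_token: str, page_size: int):
    """Return one page of items plus the token for the next page."""
    start = token_to_offset(page_token)
    page = items[start:start + page_size]
    next_start = start + len(page)
    next_token = offset_to_token(next_start) if next_start < len(items) else ""
    return page, next_token

# First page of 2 from 5 items; the returned token points at offset 2
page, token = paginate(list(range(5)), "", 2)
```

Keeping the token opaque (rather than a raw integer) lets you change the encoding later without breaking clients that echo the token back verbatim.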
Building the Client
# client.py
import grpc

import user_service_pb2 as pb2
import user_service_pb2_grpc as pb2_grpc

def run():
    # Create a channel to the server
    channel = grpc.insecure_channel("localhost:50051")
    stub = pb2_grpc.UserServiceStub(channel)

    # Get a user
    try:
        response = stub.GetUser(pb2.GetUserRequest(id=1))
        print(f"Got user: {response.user.username} ({response.user.email})")
    except grpc.RpcError as e:
        print(f"Error: {e.code()} - {e.details()}")

    # Create a user
    response = stub.CreateUser(
        pb2.CreateUserRequest(
            username="charlie",
            email="charlie@example.com",
        )
    )
    print(f"Created user: {response.user.id} - {response.user.username}")

    # List users
    response = stub.ListUsers(pb2.ListUsersRequest(page_size=10))
    for user in response.users:
        print(f"  - {user.username} ({user.email})")

if __name__ == "__main__":
    run()
Notice how the client code looks like calling local functions. stub.GetUser(request) feels like a method call, not an HTTP request. The framework handles all the network details.
Streaming RPCs
This is where gRPC truly separates from REST. There are four types of RPC:
- Unary -- client sends one request, server sends one response (like REST)
- Server streaming -- client sends one request, server streams back multiple responses
- Client streaming -- client streams multiple messages, server sends one response
- Bidirectional streaming -- both sides stream simultaneously
service UserService {
  // Unary
  rpc GetUser(GetUserRequest) returns (GetUserResponse);

  // Server streaming -- stream all users matching a filter
  rpc WatchUsers(WatchUsersRequest) returns (stream User);

  // Client streaming -- upload a batch of users
  rpc BulkCreateUsers(stream CreateUserRequest) returns (BulkCreateResponse);

  // Bidirectional -- real-time chat-style
  rpc UserChat(stream ChatMessage) returns (stream ChatMessage);
}

message WatchUsersRequest {
  string role_filter = 1;
}

message BulkCreateResponse {
  int32 created_count = 1;
}

message ChatMessage {
  string sender = 1;
  string text = 2;
}
Server streaming implementation:
class UserServiceServicer(pb2_grpc.UserServiceServicer):
    def WatchUsers(self, request, context):
        """Server streaming -- yields users one at a time."""
        for user_data in users_db.values():
            if not request.role_filter or request.role_filter in user_data["roles"]:
                user = pb2.User(
                    id=user_data["id"],
                    username=user_data["username"],
                    email=user_data["email"],
                    roles=user_data["roles"],
                    status=user_data["status"],
                )
                yield user
        # In a real app, you might yield new users as they're created
        # using an event-driven approach

    def BulkCreateUsers(self, request_iterator, context):
        """Client streaming -- receives multiple create requests."""
        global next_id
        count = 0
        for request in request_iterator:
            users_db[next_id] = {
                "id": next_id,
                "username": request.username,
                "email": request.email,
                "roles": ["user"],
                "status": 0,
            }
            next_id += 1
            count += 1
        return pb2.BulkCreateResponse(created_count=count)
Client streaming usage:
def generate_users():
    """Generator that yields CreateUserRequest messages."""
    names = ["dave", "eve", "frank", "grace"]
    for name in names:
        yield pb2.CreateUserRequest(
            username=name,
            email=f"{name}@example.com",
        )

# Client streaming call
response = stub.BulkCreateUsers(generate_users())
print(f"Created {response.created_count} users")
Error Handling
gRPC has a standard set of status codes (similar to HTTP status codes but different):
# Server-side error handling
def GetUser(self, request, context):
    if request.id <= 0:
        context.abort(
            grpc.StatusCode.INVALID_ARGUMENT,
            "User ID must be positive",
        )
    user_data = users_db.get(request.id)
    if not user_data:
        context.abort(
            grpc.StatusCode.NOT_FOUND,
            f"User {request.id} not found",
        )
    # ... return user
Common status codes:
| gRPC Code | HTTP Equivalent | When to use |
|---|---|---|
| OK | 200 | Success |
| INVALID_ARGUMENT | 400 | Bad input |
| NOT_FOUND | 404 | Resource doesn't exist |
| ALREADY_EXISTS | 409 | Duplicate creation |
| PERMISSION_DENIED | 403 | Not authorized |
| UNAUTHENTICATED | 401 | No valid credentials |
| INTERNAL | 500 | Server error |
| UNAVAILABLE | 503 | Service temporarily down |
| DEADLINE_EXCEEDED | 504 | Timeout |
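If you ever bridge gRPC to HTTP -- say, in a REST gateway -- the table above reduces to a simple lookup. A sketch using status names as plain strings so it stands alone (GRPC_TO_HTTP and http_status are illustrative names, not a gRPC API):

```python
# gRPC status name -> closest HTTP status, per the table above
GRPC_TO_HTTP = {
    "OK": 200,
    "INVALID_ARGUMENT": 400,
    "UNAUTHENTICATED": 401,
    "PERMISSION_DENIED": 403,
    "NOT_FOUND": 404,
    "ALREADY_EXISTS": 409,
    "INTERNAL": 500,
    "UNAVAILABLE": 503,
    "DEADLINE_EXCEEDED": 504,
}

def http_status(grpc_code_name: str) -> int:
    """Fall back to 500 for any unmapped code."""
    return GRPC_TO_HTTP.get(grpc_code_name, 500)

print(http_status("NOT_FOUND"))  # 404
```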
On the client side, catch grpc.RpcError and branch on the status code:

try:
    response = stub.GetUser(
        pb2.GetUserRequest(id=999),
        timeout=5.0,  # 5-second deadline
    )
except grpc.RpcError as e:
    status_code = e.code()
    if status_code == grpc.StatusCode.NOT_FOUND:
        print("User not found")
    elif status_code == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("Request timed out")
    else:
        print(f"RPC failed: {status_code} - {e.details()}")
Deadlines
Every gRPC call should have a deadline. Without one, a hung server means a hung client -- forever.
# Client sets a 5-second deadline
response = stub.GetUser(
    pb2.GetUserRequest(id=1),
    timeout=5.0,
)

# Server can check remaining time
def GetUser(self, request, context):
    remaining = context.time_remaining()
    if remaining < 0.1:
        context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, "Too late")
    # ... do work
Deadlines propagate through service chains. If Service A calls Service B with a 5-second deadline, and 2 seconds are spent in Service A, Service B only gets 3 seconds. This prevents cascading timeouts from consuming resources.
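The arithmetic can be sketched with absolute deadlines: each hop converts its remaining budget into the timeout it passes downstream. A minimal illustration (remaining_budget is a hypothetical helper; real gRPC propagates the deadline for you):

```python
import time

def remaining_budget(deadline, now=None):
    """Seconds left before an absolute (monotonic-clock) deadline."""
    now = time.monotonic() if now is None else now
    return deadline - now

# Service A receives a call with a 5-second budget at t=100.0
deadline = 100.0 + 5.0
# ...after 2 seconds of work, A calls Service B with what's left
timeout_for_b = remaining_budget(deadline, now=102.0)
print(timeout_for_b)  # 3.0
```

The key point is that the deadline is a fixed point in time, not a per-hop duration: no matter how many services the call traverses, the whole chain gives up at the same moment.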
Interceptors (Middleware)
Interceptors are gRPC's version of middleware. They wrap every RPC call:
import time
import logging

class LoggingInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        # continuation() returns the RpcMethodHandler, not the response,
        # so we wrap the handler to time the actual call
        # (unary-unary shown here for brevity)
        handler = continuation(handler_call_details)
        if handler is None or handler.unary_unary is None:
            return handler
        method = handler_call_details.method
        inner = handler.unary_unary

        def wrapped(request, context):
            start = time.time()
            logging.info(f"RPC started: {method}")
            response = inner(request, context)
            duration = time.time() - start
            logging.info(f"RPC completed: {method} ({duration:.3f}s)")
            return response

        return grpc.unary_unary_rpc_method_handler(
            wrapped,
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )

# Add interceptor when creating the server
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[LoggingInterceptor()],
)
Client-side interceptors work similarly:
class AuthInterceptor(grpc.UnaryUnaryClientInterceptor):
    def __init__(self, token):
        self.token = token

    def intercept_unary_unary(self, continuation, client_call_details, request):
        metadata = list(client_call_details.metadata or [])
        metadata.append(("authorization", f"Bearer {self.token}"))
        new_details = client_call_details._replace(metadata=metadata)
        return continuation(new_details, request)

# Use with an intercept_channel
channel = grpc.insecure_channel("localhost:50051")
auth_interceptor = AuthInterceptor("my-secret-token")
channel = grpc.intercept_channel(channel, auth_interceptor)
stub = pb2_grpc.UserServiceStub(channel)
When gRPC Beats REST (and When It Doesn't)
Use gRPC when:

- Services talk to each other (microservices, backend-to-backend)
- You need real-time streaming (live data feeds, chat, monitoring)
- Performance matters (protobuf is 5-10x faster than JSON to serialize)
- You have polyglot services (one proto file generates code for every language)
- You want a strict contract between client and server
Stick with REST when:

- Browsers are the primary client (gRPC-Web exists but adds complexity)
- You need human-readable payloads for debugging
- Your API is public-facing (REST is universal, gRPC requires generated stubs)
- The team is small and simplicity beats performance
- You're building a CRUD API where latency doesn't matter much
Common Mistakes
Not versioning your proto files. Once a proto file is in production, changing field numbers or removing fields breaks clients. Add new fields with new numbers. Use reserved to prevent accidental reuse of old field numbers.
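For example, if the roles field (number 4) were ever deleted from the User message, reserving its number and name keeps future edits from reusing either:

```proto
message User {
  reserved 4;        // old field number can never be reassigned
  reserved "roles";  // old field name can never be reused
  int32 id = 1;
  string username = 2;
  string email = 3;
  UserStatus status = 5;
}
```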
Blocking in streaming RPCs. If your server streaming method does a slow database query for every message, the stream stalls. Use async patterns or batch data.
Forgetting deadlines. No deadline means the client waits forever. Always set timeouts on the client side.
Sending too much data in one message. gRPC has a default message size limit of 4MB. For large payloads, use streaming instead of stuffing everything into one message.
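A common workaround is to chunk a large payload across a stream of small messages. This generator sketch shows the idea (chunk_bytes is an illustrative helper; the UploadFile RPC and FileChunk message in the comment are hypothetical, not part of the service above):

```python
def chunk_bytes(data: bytes, chunk_size: int = 64 * 1024):
    """Yield data in fixed-size chunks, each comfortably under the 4MB limit."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

# In a client-streaming RPC you'd wrap each chunk in a message, e.g.:
#   stub.UploadFile(pb2.FileChunk(data=c) for c in chunk_bytes(payload))
sizes = [len(c) for c in chunk_bytes(b"x" * 150_000)]
print(sizes)  # [65536, 65536, 18928]
```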
Not using health checks. Load balancers need to know if your gRPC server is alive. Implement the standard gRPC health checking protocol.
What's Next
Once you have the basics down, explore:
- gRPC-Web for browser clients
- TLS/mTLS for encrypted, authenticated connections
- Load balancing strategies (client-side vs proxy-based)
- Reflection for dynamic service discovery and debugging tools
- buf as a modern replacement for protoc with linting and breaking change detection
For more backend development tutorials and architecture guides, check out CodeUp.