gRPC: Build High-Performance APIs That REST Can't Match
Learn gRPC from scratch -- Protocol Buffers, service definitions, streaming RPCs, Python implementation, error handling, and when to choose gRPC over REST.
REST has been the default for web APIs for over a decade. It's simple, well-understood, and works with any HTTP client. But it has real limitations: JSON is slow to parse, there's no built-in schema enforcement, and streaming data requires awkward workarounds like WebSockets or server-sent events.
gRPC solves these problems by using Protocol Buffers for serialization (binary, not text), HTTP/2 for transport (multiplexed, bidirectional), and a strict contract between client and server. It's what Google uses internally for virtually all service-to-service communication, and it's why your microservices can talk to each other 10x faster than with REST.
What gRPC Actually Is
gRPC is a Remote Procedure Call framework. Instead of thinking in terms of URLs and HTTP methods (GET /users/123), you think in terms of function calls: GetUser(user_id=123). The framework handles serialization, transport, and deserialization.
The stack:
- Protocol Buffers (protobuf) -- a language-neutral schema definition and serialization format
- HTTP/2 -- the transport protocol (supports multiplexing and bidirectional streaming)
- Code generation -- you define a .proto file, and gRPC generates client and server code in your language

There's no hand-written serialization or routing code -- everything is generated from the .proto definition.
Protocol Buffers: The Schema
Everything starts with a .proto file. This defines your data structures and service methods:
// user_service.proto
syntax = "proto3";

package userservice;

// Data structures
message User {
  int32 id = 1;
  string username = 2;
  string email = 3;
  repeated string roles = 4;
  UserStatus status = 5;
}

enum UserStatus {
  ACTIVE = 0;
  INACTIVE = 1;
  BANNED = 2;
}

message GetUserRequest {
  int32 id = 1;
}

message GetUserResponse {
  User user = 1;
}

message CreateUserRequest {
  string username = 1;
  string email = 2;
}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message ListUsersResponse {
  repeated User users = 1;
  string next_page_token = 2;
}

// Service definition
service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc CreateUser(CreateUserRequest) returns (GetUserResponse);
  rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
}
Key things to understand:
- Field numbers (= 1, = 2) identify fields in the binary encoding -- they are not values. Never reuse them.
- repeated means a list/array.
- message is like a struct or class.
- service defines the RPC methods -- what clients can call.
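To see why field numbers matter more than field names, here's a hand-rolled sketch of proto3's wire format in plain Python (no protobuf library needed; encode_varint and encode_field are illustrative helpers, not a gRPC API). Each field is prefixed with a tag combining its number and a wire type, so only the number -- never the name -- appears on the wire:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_field(field_number: int, wire_type: int, payload: bytes) -> bytes:
    """Prefix a payload with its tag: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type) + payload

# User { int32 id = 1; string username = 2; } with id=150, username="alice"
encoded = (
    encode_field(1, 0, encode_varint(150))             # wire type 0 = varint
    + encode_field(2, 2, encode_varint(5) + b"alice")  # wire type 2 = length-delimited
)
print(encoded.hex())  # 089601 for id=150, then 1205616c696365 for "alice"
```

This is also why renaming a field is a safe change but renumbering one is not: the wire format only ever carries the number.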
Code Generation
Install the tools:
pip install grpcio grpcio-tools
Generate Python code from the .proto file:
python -m grpc_tools.protoc \
    -I. \
    --python_out=. \
    --grpc_python_out=. \
    user_service.proto
This produces two files:
- user_service_pb2.py -- the message classes (User, GetUserRequest, etc.)
- user_service_pb2_grpc.py -- the service stubs and base classes

When the schema changes, you update the .proto file and regenerate.
Building the Server
# server.py
import grpc
from concurrent import futures

import user_service_pb2 as pb2
import user_service_pb2_grpc as pb2_grpc

# In-memory store for this example
users_db = {
    1: {"id": 1, "username": "alice", "email": "alice@example.com", "roles": ["admin"], "status": 0},
    2: {"id": 2, "username": "bob", "email": "bob@example.com", "roles": ["user"], "status": 0},
}
next_id = 3

class UserServiceServicer(pb2_grpc.UserServiceServicer):
    """Implements the UserService defined in the .proto file."""

    def GetUser(self, request, context):
        user_data = users_db.get(request.id)
        if not user_data:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"User {request.id} not found")
            return pb2.GetUserResponse()
        user = pb2.User(
            id=user_data["id"],
            username=user_data["username"],
            email=user_data["email"],
            roles=user_data["roles"],
            status=user_data["status"],
        )
        return pb2.GetUserResponse(user=user)

    def CreateUser(self, request, context):
        global next_id
        user_data = {
            "id": next_id,
            "username": request.username,
            "email": request.email,
            "roles": ["user"],
            "status": 0,
        }
        users_db[next_id] = user_data
        next_id += 1
        user = pb2.User(**user_data)
        return pb2.GetUserResponse(user=user)

    def ListUsers(self, request, context):
        page_size = request.page_size or 10
        all_users = list(users_db.values())
        users = [
            pb2.User(
                id=u["id"],
                username=u["username"],
                email=u["email"],
                roles=u["roles"],
                status=u["status"],
            )
            for u in all_users[:page_size]
        ]
        return pb2.ListUsersResponse(users=users, next_page_token="")

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb2_grpc.add_UserServiceServicer_to_server(
        UserServiceServicer(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    print("Server started on port 50051")
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
The server uses a thread pool to handle concurrent requests. add_insecure_port is for development -- in production, you'd use TLS credentials.
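One loose end: the ListUsers handler above ignores page_token and always returns an empty next_page_token. A minimal sketch of a token scheme, assuming an offset-based store (offset_to_token, token_to_offset, and paginate are hypothetical helpers, not part of gRPC):

```python
import base64

def offset_to_token(offset: int) -> str:
    """Wrap an integer offset in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(str(offset).encode()).decode()

def token_to_offset(token: str) -> int:
    """Recover the offset; an empty token means start from the beginning."""
    return int(base64.urlsafe_b64decode(token.encode())) if token else 0

def paginate(items, page_token: str, page_size: int):
    """Return one page of items plus the token for the next page."""
    start = token_to_offset(page_token)
    page = items[start:start + page_size]
    next_start = start + len(page)
    next_token = offset_to_token(next_start) if next_start < len(items) else ""
    return page, next_token

# First page of 2 from 5 items; the returned token points at offset 2
page, token = paginate(list(range(5)), "", 2)
```

Keeping the token opaque (rather than a raw integer) lets you change the encoding later without breaking clients that echo the token back verbatim.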
Building the Client
# client.py
import grpc

import user_service_pb2 as pb2
import user_service_pb2_grpc as pb2_grpc

def run():
    # Create a channel to the server
    channel = grpc.insecure_channel("localhost:50051")
    stub = pb2_grpc.UserServiceStub(channel)

    # Get a user
    try:
        response = stub.GetUser(pb2.GetUserRequest(id=1))
        print(f"Got user: {response.user.username} ({response.user.email})")
    except grpc.RpcError as e:
        print(f"Error: {e.code()} - {e.details()}")

    # Create a user
    response = stub.CreateUser(
        pb2.CreateUserRequest(
            username="charlie",
            email="charlie@example.com",
        )
    )
    print(f"Created user: {response.user.id} - {response.user.username}")

    # List users
    response = stub.ListUsers(pb2.ListUsersRequest(page_size=10))
    for user in response.users:
        print(f"  - {user.username} ({user.email})")

if __name__ == "__main__":
    run()
Notice how the client code looks like calling local functions. stub.GetUser(request) feels like a method call, not an HTTP request. The framework handles all the network details.
Streaming RPCs
This is where gRPC truly separates from REST. There are four types of RPC:
- Unary -- client sends one request, server sends one response (like REST)
- Server streaming -- client sends one request, server streams back multiple responses
- Client streaming -- client streams multiple messages, server sends one response
- Bidirectional streaming -- both sides stream simultaneously
service UserService {
  // Unary
  rpc GetUser(GetUserRequest) returns (GetUserResponse);

  // Server streaming -- stream all users matching a filter
  rpc WatchUsers(WatchUsersRequest) returns (stream User);

  // Client streaming -- upload a batch of users
  rpc BulkCreateUsers(stream CreateUserRequest) returns (BulkCreateResponse);

  // Bidirectional -- real-time chat-style
  rpc UserChat(stream ChatMessage) returns (stream ChatMessage);
}

message WatchUsersRequest {
  string role_filter = 1;
}

message BulkCreateResponse {
  int32 created_count = 1;
}

message ChatMessage {
  string sender = 1;
  string text = 2;
}
Server streaming implementation:
class UserServiceServicer(pb2_grpc.UserServiceServicer):
    def WatchUsers(self, request, context):
        """Server streaming -- yields users one at a time."""
        for user_data in users_db.values():
            if not request.role_filter or request.role_filter in user_data["roles"]:
                user = pb2.User(
                    id=user_data["id"],
                    username=user_data["username"],
                    email=user_data["email"],
                    roles=user_data["roles"],
                    status=user_data["status"],
                )
                yield user
        # In a real app, you might yield new users as they're created
        # using an event-driven approach

    def BulkCreateUsers(self, request_iterator, context):
        """Client streaming -- receives multiple create requests."""
        global next_id
        count = 0
        for request in request_iterator:
            users_db[next_id] = {
                "id": next_id,
                "username": request.username,
                "email": request.email,
                "roles": ["user"],
                "status": 0,
            }
            next_id += 1
            count += 1
        return pb2.BulkCreateResponse(created_count=count)
Client streaming usage:
def generate_users():
    """Generator that yields CreateUserRequest messages."""
    names = ["dave", "eve", "frank", "grace"]
    for name in names:
        yield pb2.CreateUserRequest(
            username=name,
            email=f"{name}@example.com",
        )

# Client streaming call
response = stub.BulkCreateUsers(generate_users())
print(f"Created {response.created_count} users")
Error Handling
gRPC has a standard set of status codes (similar to HTTP status codes but different):
# Server-side error handling
def GetUser(self, request, context):
    if request.id <= 0:
        context.abort(
            grpc.StatusCode.INVALID_ARGUMENT,
            "User ID must be positive",
        )
    user_data = users_db.get(request.id)
    if not user_data:
        context.abort(
            grpc.StatusCode.NOT_FOUND,
            f"User {request.id} not found",
        )
    # ... return user
Common status codes:
| gRPC Code | HTTP Equivalent | When to use |
|---|---|---|
| OK | 200 | Success |
| INVALID_ARGUMENT | 400 | Bad input |
| NOT_FOUND | 404 | Resource doesn't exist |
| ALREADY_EXISTS | 409 | Duplicate creation |
| PERMISSION_DENIED | 403 | Not authorized |
| UNAUTHENTICATED | 401 | No valid credentials |
| INTERNAL | 500 | Server error |
| UNAVAILABLE | 503 | Service temporarily down |
| DEADLINE_EXCEEDED | 504 | Timeout |
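If you ever bridge gRPC to HTTP -- say, in a REST gateway -- the table above reduces to a simple lookup. A sketch using status names as plain strings so it stands alone (GRPC_TO_HTTP and http_status are illustrative names, not a gRPC API):

```python
# gRPC status name -> closest HTTP status, per the table above
GRPC_TO_HTTP = {
    "OK": 200,
    "INVALID_ARGUMENT": 400,
    "UNAUTHENTICATED": 401,
    "PERMISSION_DENIED": 403,
    "NOT_FOUND": 404,
    "ALREADY_EXISTS": 409,
    "INTERNAL": 500,
    "UNAVAILABLE": 503,
    "DEADLINE_EXCEEDED": 504,
}

def http_status(grpc_code_name: str) -> int:
    """Fall back to 500 for any unmapped code."""
    return GRPC_TO_HTTP.get(grpc_code_name, 500)

print(http_status("NOT_FOUND"))  # 404
```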
On the client side, catch grpc.RpcError and branch on the status code:

try:
    response = stub.GetUser(
        pb2.GetUserRequest(id=999),
        timeout=5.0,  # 5-second deadline
    )
except grpc.RpcError as e:
    status_code = e.code()
    if status_code == grpc.StatusCode.NOT_FOUND:
        print("User not found")
    elif status_code == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("Request timed out")
    else:
        print(f"RPC failed: {status_code} - {e.details()}")
Deadlines
Every gRPC call should have a deadline. Without one, a hung server means a hung client -- forever.
# Client sets a 5-second deadline
response = stub.GetUser(
    pb2.GetUserRequest(id=1),
    timeout=5.0,
)

# Server can check remaining time
def GetUser(self, request, context):
    remaining = context.time_remaining()
    if remaining < 0.1:
        context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, "Too late")
    # ... do work
Deadlines propagate through service chains. If Service A calls Service B with a 5-second deadline, and 2 seconds are spent in Service A, Service B only gets 3 seconds. This prevents cascading timeouts from consuming resources.
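The arithmetic can be sketched with absolute deadlines: each hop converts its remaining budget into the timeout it passes downstream. A minimal illustration (remaining_budget is a hypothetical helper; real gRPC propagates the deadline for you):

```python
import time

def remaining_budget(deadline, now=None):
    """Seconds left before an absolute (monotonic-clock) deadline."""
    now = time.monotonic() if now is None else now
    return deadline - now

# Service A receives a call with a 5-second budget at t=100.0
deadline = 100.0 + 5.0
# ...after 2 seconds of work, A calls Service B with what's left
timeout_for_b = remaining_budget(deadline, now=102.0)
print(timeout_for_b)  # 3.0
```

The key point is that the deadline is a fixed point in time, not a per-hop duration: no matter how many services the call traverses, the whole chain gives up at the same moment.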
Interceptors (Middleware)
Interceptors are gRPC's version of middleware. They wrap every RPC call:
import time
import logging

class LoggingInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        # continuation() returns the RpcMethodHandler, not the response,
        # so we wrap the handler to time the actual call
        # (unary-unary shown here for brevity)
        handler = continuation(handler_call_details)
        if handler is None or handler.unary_unary is None:
            return handler
        method = handler_call_details.method
        inner = handler.unary_unary

        def wrapped(request, context):
            start = time.time()
            logging.info(f"RPC started: {method}")
            response = inner(request, context)
            duration = time.time() - start
            logging.info(f"RPC completed: {method} ({duration:.3f}s)")
            return response

        return grpc.unary_unary_rpc_method_handler(
            wrapped,
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )

# Add interceptor when creating the server
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[LoggingInterceptor()],
)
Client-side interceptors work similarly:
class AuthInterceptor(grpc.UnaryUnaryClientInterceptor):
    def __init__(self, token):
        self.token = token

    def intercept_unary_unary(self, continuation, client_call_details, request):
        metadata = list(client_call_details.metadata or [])
        metadata.append(("authorization", f"Bearer {self.token}"))
        new_details = client_call_details._replace(metadata=metadata)
        return continuation(new_details, request)

# Use with an intercept_channel
channel = grpc.insecure_channel("localhost:50051")
auth_interceptor = AuthInterceptor("my-secret-token")
channel = grpc.intercept_channel(channel, auth_interceptor)
stub = pb2_grpc.UserServiceStub(channel)
When gRPC Beats REST (and When It Doesn't)
Use gRPC when:

- Services talk to each other (microservices, backend-to-backend)
- You need real-time streaming (live data feeds, chat, monitoring)
- Performance matters (protobuf is 5-10x faster than JSON to serialize)
- You have polyglot services (one proto file generates code for every language)
- You want a strict contract between client and server
Stick with REST when:

- Browsers are the primary client (gRPC-Web exists but adds complexity)
- You need human-readable payloads for debugging
- Your API is public-facing (REST is universal, gRPC requires generated stubs)
- The team is small and simplicity beats performance
- You're building a CRUD API where latency doesn't matter much
Common Mistakes
Not versioning your proto files. Once a proto file is in production, changing field numbers or removing fields breaks clients. Add new fields with new numbers. Use reserved to prevent accidental reuse of old field numbers.
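For example, if the roles field (number 4) were ever deleted from the User message, reserving its number and name keeps future edits from reusing either:

```proto
message User {
  reserved 4;        // old field number can never be reassigned
  reserved "roles";  // old field name can never be reused
  int32 id = 1;
  string username = 2;
  string email = 3;
  UserStatus status = 5;
}
```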
Blocking in streaming RPCs. If your server streaming method does a slow database query for every message, the stream stalls. Use async patterns or batch data.
Forgetting deadlines. No deadline means the client waits forever. Always set timeouts on the client side.
Sending too much data in one message. gRPC has a default message size limit of 4MB. For large payloads, use streaming instead of stuffing everything into one message.
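A common workaround is to chunk a large payload across a stream of small messages. This generator sketch shows the idea (chunk_bytes is an illustrative helper; the UploadFile RPC and FileChunk message in the comment are hypothetical, not part of the service above):

```python
def chunk_bytes(data: bytes, chunk_size: int = 64 * 1024):
    """Yield data in fixed-size chunks, each comfortably under the 4MB limit."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

# In a client-streaming RPC you'd wrap each chunk in a message, e.g.:
#   stub.UploadFile(pb2.FileChunk(data=c) for c in chunk_bytes(payload))
sizes = [len(c) for c in chunk_bytes(b"x" * 150_000)]
print(sizes)  # [65536, 65536, 18928]
```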
Not using health checks. Load balancers need to know if your gRPC server is alive. Implement the standard gRPC health checking protocol.
What's Next
Once you have the basics down, explore:
- gRPC-Web for browser clients
- TLS/mTLS for encrypted, authenticated connections
- Load balancing strategies (client-side vs proxy-based)
- Reflection for dynamic service discovery and debugging tools
- buf as a modern replacement for protoc with linting and breaking change detection
For more backend development tutorials and architecture guides, check out CodeUp.