Gemini API — common patterns
Multi-turn chat, streaming, system instructions (a separate parameter, not the first message), Files API for inputs over 100 MB, context caching for cost savings on repeated context, structured output with JSON schemas, and safety settings.
What this page covers#
Seven patterns you’ll reach for in almost every project:
- Multi-turn chat (the
contentslist shape) - Streaming responses
- System instructions (a separate parameter)
- The Files API (inputs over 100 MB, PDFs over 50 MB)
- Context caching (cost savings for repeated large context)
- Structured output (JSON with constrained decoding)
- Safety settings
Plus a brief note on the Live API for real-time voice/video.
1. Multi-turn chat#
The simplest path uses the chats helper:
from google import genai
client = genai.Client()
chat = client.chats.create(model="gemini-2.5-flash")
print(chat.send_message("My name is Sush.").text)
print(chat.send_message("What's my name?").text)
Under the hood, chats accumulates contents and resends them each turn. For more control, build the contents list yourself:
from google.genai import types
contents = [
types.Content(role="user", parts=[types.Part(text="Hi")]),
types.Content(role="model", parts=[types.Part(text="Hello!")]),
types.Content(role="user", parts=[types.Part(text="What did I just say?")]),
]
response = client.models.generate_content(
model="gemini-2.5-flash", contents=contents
)
role is "user" or "model" (not "assistant" like OpenAI). System messages don’t go in contents — see pattern 3.
2. Streaming responses#
for chunk in client.models.generate_content_stream(
model="gemini-2.5-flash",
contents="Tell me a story in 300 words."
):
print(chunk.text, end="", flush=True)
Underlying transport is Server-Sent Events for the REST API. The SDK gives you a Python iterator (or async iterator). For real-time voice / video, you want the Live API instead — see the bottom of this page.
3. System instructions#
System instructions go in a dedicated parameter, not as the first message in contents:
from google.genai import types
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Why is the sky blue?",
config=types.GenerateContentConfig(
system_instruction="Respond in exactly one sentence.",
temperature=0.3,
max_output_tokens=100,
),
)
Why this matters:
- It persists across turns in a chat without taking a slot in
contents - It’s not subject to user-injected prompt mutation (cleaner separation)
- Switching system instructions mid-conversation is a single config change
This contrasts with OpenAI’s pattern (system message as the first item in messages); migrating from OpenAI requires lifting the system message out into the config.
4. Files API (over 100 MB / PDFs over 50 MB)#
Inline base64 encoding is fine for small files. For large media, use the Files API:
client = genai.Client()
# Upload once
file = client.files.upload(file="big-presentation.pdf")
# file.uri is now a gs:// reference Gemini can read
response = client.models.generate_content(
model="gemini-2.5-pro",
contents=[file, "Summarise the key arguments."],
)
print(response.text)
Key facts:
- Files persist for 48 hours then auto-delete; or call
client.files.delete(file.name) - Per-file max: 2 GB (general Files API limit); 50 MB threshold for inline PDFs; 100 MB total request size threshold
- Supported types: PDFs, images, audio, video, generic documents
For repeated reuse of a large input, also see context caching (next).
5. Context caching (cost optimisation for repeated context)#
If you’re feeding the same large input (a 200K-token codebase, a long PDF, a 1M-token book) into many prompts, cache it once:
from google.genai import types
cache = client.caches.create(
model="gemini-2.5-flash",
config=types.CreateCachedContentConfig(
contents=[file], # the big context
system_instruction="You are a helpful book reviewer.",
ttl="3600s", # 1 hour
),
)
# Subsequent calls reference the cache
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="What's the main argument of chapter 3?",
config=types.GenerateContentConfig(cached_content=cache.name),
)
Pricing for gemini-2.5-flash (as of mid-2026):
- Cache storage: $1.00 per 1M tokens per hour
- Cache read: $0.03 per 1M tokens (text/image/video) / $0.10 per 1M tokens (audio)
For workloads that re-read 100K+ tokens of context per request, caching can drop your bill by an order of magnitude.
6. Structured output (JSON with guaranteed schema)#
Set response_mime_type and response_json_schema. Gemini uses constrained decoding — the docs describe this as returning schema-valid JSON (rather than best-effort prompt-based JSON).
Python with Pydantic (cleanest)#
from google.genai import types
from pydantic import BaseModel
from typing import List, Optional
class Ingredient(BaseModel):
name: str
quantity: str
class Recipe(BaseModel):
recipe_name: str
prep_time_minutes: Optional[int]
ingredients: List[Ingredient]
instructions: List[str]
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Extract this recipe text: ...",
config=types.GenerateContentConfig(
response_mime_type="application/json",
response_json_schema=Recipe.model_json_schema(),
),
)
recipe = Recipe.model_validate_json(response.text)
TypeScript with Zod#
import { GoogleGenAI } from "@google/genai";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const recipeSchema = z.object({
recipe_name: z.string(),
ingredients: z.array(z.object({ name: z.string(), quantity: z.string() })),
instructions: z.array(z.string()),
});
const response = await ai.models.generateContent({
model: "gemini-2.5-flash",
contents: "Extract this recipe: ...",
config: {
responseFormat: {
text: {
mimeType: "application/json",
schema: zodToJsonSchema(recipeSchema),
},
},
},
});
const recipe = recipeSchema.parse(JSON.parse(response.text));
Without a schema (just JSON mode)#
config=types.GenerateContentConfig(response_mime_type="application/json")
Output is JSON; structure is up to the model. Useful when the shape varies per request.
7. Safety settings#
Four adjustable categories:
HARM_CATEGORY_HARASSMENTHARM_CATEGORY_HATE_SPEECHHARM_CATEGORY_SEXUALLY_EXPLICITHARM_CATEGORY_DANGEROUS_CONTENT
Five block thresholds:
| API value | AI Studio label | Blocks |
|---|---|---|
OFF | Off | Nothing |
BLOCK_NONE | Block none | Always show |
BLOCK_ONLY_HIGH | Block few | Only HIGH probability |
BLOCK_MEDIUM_AND_ABOVE | Block some | MEDIUM + HIGH |
BLOCK_LOW_AND_ABOVE | Block most | LOW + MEDIUM + HIGH |
from google.genai import types
config = types.GenerateContentConfig(
safety_settings=[
types.SafetySetting(
category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
),
],
)
Key default: for Gemini 2.5 and 3 models, the adjustable filters default to OFF. Built-in non-adjustable blocks for child-safety harms remain on at all times. Filtering is probability-based, not severity-based — adjust if you find the defaults too lenient or too strict for your domain.
Live API (real-time voice + video)#
For low-latency conversational apps (think Gemini app’s voice mode), the Live API uses a WebSocket connection rather than HTTP:
- Input: 16-bit PCM audio at 16 kHz, JPEG images at ≤1 fps, text
- Output: 16-bit PCM audio at 24 kHz, text
- Supports interruption (“barge-in”) mid-response
- Function calling and Google Search grounding work over Live too
- 70+ languages
Models: gemini-3.1-flash-live-preview and gemini-2.5-flash-native-audio-preview-12-2025 are the current Live API models. (An earlier gemini-live-2.5-flash-preview was shut down in December 2025.) Test in AI Studio’s Realtime streaming tab before wiring it into your app.
What’s next#
- §GAPI.5 Tool use — function calling, Google Search grounding, code execution
- §GAPI.3 Models — pick the right model for your pattern
- §GAPI.2 Getting started — if you skipped here directly
Sources
- https://ai.google.dev/gemini-api/docs/quickstart
- https://ai.google.dev/gemini-api/docs/structured-output
- https://ai.google.dev/gemini-api/docs/files
- https://ai.google.dev/gemini-api/docs/long-context
- https://ai.google.dev/gemini-api/docs/safety-settings
- https://ai.google.dev/gemini-api/docs/live-api
- https://github.com/googleapis/python-genai