By Ryan Calloway. Updated May 2026.
Verdict at a glance
- Best for dataclasses: internal value objects, config you build in code, hot paths, zero-dependency libraries, Python 3.13’s new __replace__ protocol
- Best for Pydantic v2: anything crossing a network or file boundary, such as FastAPI request models, JSON config, LLM output, webhook payloads
- Watch out for: Pydantic models are 4–6x larger in memory than slotted dataclasses; BaseModel import time is non-trivial; the v1→v2 migration still bites legacy projects
- Use both for: Pydantic at the edges, dataclasses inside the domain. This is the hybrid pattern most production codebases land on after one cycle
Quick answer
Use a @dataclass when the data is built in code from types you already trust. Use a BaseModel when the data crosses a trust boundary — HTTP, files, environment, queues, LLM output, anything you did not just create in this process. Pydantic v2 is fast enough that “Pydantic is slow” is no longer a reason to pick dataclasses for API code; reach for dataclasses inside the domain because they are 5–15x faster to instantiate and ship in the standard library, not because v2 is slow. The rest of this post is the decision tree, the syntax side-by-side, the v2.10 features that move the line, and the hybrid pattern I keep landing on.
The short comparison
| Dimension | @dataclass (stdlib) | Pydantic v2.10 BaseModel |
|---|---|---|
| Install | stdlib (Python 3.7+) | pip install pydantic (v2.10+) |
| Validates types at runtime | No | Yes (Rust core) |
| Coerces "5" to 5 | No | Yes (configurable) |
| JSON in/out | asdict() + json.dumps | model_dump_json() / model_validate_json() |
| Instantiation speed | 5–15x faster than Pydantic | 5–15x slower than dataclass |
| JSON serialization speed | Slower (asdict() walks the object in Python, then json.dumps) | Faster on non-trivial payloads (Rust serializer) |
| Memory per instance | Lower (especially with slots=True) | 4–6x larger |
| Custom validators | __post_init__ by hand | @field_validator, @model_validator |
| Discriminated unions | Manual | First-class |
| JSON Schema / OpenAPI | Not built in | Built in |
| Python 3.13 __replace__ | Yes (generated by @dataclass) | Yes (added in v2.10) |
Two columns, one philosophy difference. @dataclass is a typed tuple with a pretty __repr__. BaseModel is a parser that builds a typed object if and only if the input matches the schema.
The minimal syntax, side by side
```python
from dataclasses import dataclass

@dataclass(slots=True)
class User:
    id: int
    email: str
    is_active: bool = True

print(User(id=1, email="ryan@example.com"))
# User(id=1, email='ryan@example.com', is_active=True)
print(User(id="1", email="ryan@example.com"))
# User(id='1', email='ryan@example.com', is_active=True)
# Works! id is now the string "1". No validation.
```

```python
# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, EmailStr

class User(BaseModel):
    id: int
    email: EmailStr
    is_active: bool = True

print(User(id=1, email="ryan@example.com"))
# id=1 email='ryan@example.com' is_active=True
print(User(id="1", email="ryan@example.com"))
# id=1 email='ryan@example.com' is_active=True
# Coerced "1" to 1; raises if it cannot.
User(id="not-an-int", email="ryan@example.com")
# pydantic.ValidationError: 1 validation error for User
# id: Input should be a valid integer...
```
The dataclass stored the string "1" in self.id unchanged, and it would store "not-an-int" just as happily. Pydantic coerced "1" to 1 and rejected "not-an-int" with a structured error that carries field paths. FastAPI turns that same error into a 422 response without you writing a line of validation code.
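The table lists coercion as configurable. Here is a minimal sketch of the main switch, assuming Pydantic v2. ConfigDict(strict=True) turns off lax coercion for a whole model, so the "1" that User accepted above is rejected. LaxUser and StrictUser are illustrative names.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxUser(BaseModel):
    id: int  # lax mode (the default): "1" is coerced to 1


class StrictUser(BaseModel):
    # strict=True disables type coercion for every field on this model
    model_config = ConfigDict(strict=True)

    id: int


print(LaxUser(id="1").id)  # 1

try:
    StrictUser(id="1")
except ValidationError as exc:
    # One error, located at the "id" field
    print(exc.errors()[0]["loc"])  # ('id',)
```

If you only want one field to be picky, Field(strict=True) applies the same rule to that field alone.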
When to pick dataclasses
- Internal value objects. A Point(x, y), a CacheKey(user_id, tenant), a DateRange(start, end). Data you build in code from data you already typed.
- Config at module level. Typed constants used across the file or package.
- Hot paths where 5–15x instantiation overhead actually shows up. This is rare in web code and common in numeric and game loops. Profile first.
- Libraries that want zero third-party dependencies. dataclasses is stdlib; Pydantic is a pin.
- Pattern matching. Dataclasses generate __match_args__, so they work cleanly with positional match/case destructuring (PEP 634). Pydantic models work with keyword patterns too, but the dataclass case is the one teachers reach for.
```python
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class CacheKey:
    user_id: int
    tenant: str
    flags: tuple[str, ...] = ()

CacheKey(1, "acme", ("beta", "billing"))
```
frozen=True makes instances immutable and hashable, so you can use them as dict keys and in sets. slots=True drops the per-instance __dict__, which cuts memory and shaves attribute access. Use tuple for collection fields: a list field makes hash() fail on an otherwise frozen instance. The classic mutable-default footgun is at least loud here. dataclasses rejects a bare list default with a ValueError when the class is defined and points you at field(default_factory=list).
Python 3.13 added copy.replace(obj, field=value) and the __replace__ protocol behind it. @dataclass generates __replace__ for you, so copy.replace works out of the box on dataclasses. Pydantic added the same protocol in v2.10, so you can swap dataclasses and Pydantic models without changing the call site.
When to pick Pydantic v2
- Anything crossing a network boundary. FastAPI request and response models, webhooks, inter-service RPC, gRPC-translated payloads.
- Config loaded from YAML, TOML, JSON, or environment variables. The pydantic-settings companion library handles env-var overlays cleanly.
- ORM or document rows that need validation on read. Pydantic + SQLAlchemy, or SQLModel (Pydantic-based) where you want one class for both.
- LLM structured output. v2.10 added an experimental_allow_partial flag to TypeAdapter’s validate methods, so you can validate streaming JSON before the model finishes generating it. The Pydantic team called out this use case at launch.
- Custom validators, discriminated unions, field aliases, JSON Schema. All native in Pydantic, painful to bolt onto dataclasses.
```python
# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, EmailStr, Field, field_validator

class CreateUser(BaseModel):
    email: EmailStr
    password: str = Field(min_length=12, max_length=128)
    age: int = Field(ge=13, le=130)

    @field_validator("password")
    @classmethod
    def must_have_digit(cls, v: str) -> str:
        # Runs after the built-in length constraints have passed
        if not any(ch.isdigit() for ch in v):
            raise ValueError("password must contain at least one digit")
        return v

CreateUser.model_validate_json(b'{"email":"r@ex.com","password":"hunter12abcd","age":34}')
```
Three constraints, one custom validator, structured errors with field paths. The dataclass equivalent is a __post_init__ with five if branches that you have to keep in sync with the type hints by hand. By the third field you want Pydantic.
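For comparison, here is roughly what that __post_init__ looks like for the same CreateUser shape. Treat it as a sketch. The "@" check is a deliberately crude stand-in for EmailStr, which is an assumption. Unlike Pydantic, this version does no coercion, stops at the first failure, and reports no field paths.

```python
from dataclasses import dataclass


@dataclass
class CreateUser:
    email: str
    password: str
    age: int

    def __post_init__(self) -> None:
        # Nothing enforces the annotations at runtime, so every check is by hand.
        if not isinstance(self.email, str) or "@" not in self.email:
            # Crude stand-in for EmailStr; real address validation is far stricter.
            raise ValueError("email must be an address")
        if not isinstance(self.password, str) or not 12 <= len(self.password) <= 128:
            raise ValueError("password must be 12-128 characters")
        if not any(ch.isdigit() for ch in self.password):
            raise ValueError("password must contain at least one digit")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(self.age, bool) or not isinstance(self.age, int):
            raise ValueError("age must be an integer")
        if not 13 <= self.age <= 130:
            raise ValueError("age must be between 13 and 130")


user = CreateUser(email="r@ex.com", password="hunter12abcd", age=34)
```

Every new field means another branch to keep in sync with the annotations. That maintenance burden is the real cost, more than the line count.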
Performance: the numbers that survive in 2026
“Pydantic is slow” was true in 2022. Pydantic v2 rewrote the validation core in Rust as pydantic-core; the Pydantic team’s own benchmarks claim 5–50x speedup over v1 depending on the operation. That gap is real. The numbers that still matter in 2026:
- Instantiation: stdlib @dataclass is roughly 5–15x faster than BaseModel for the same shape, depending on field count and types. Independent benchmarks land in this range; treat any specific multiplier with suspicion until you re-run on your hardware.
- JSON deserialization: Pydantic’s model_validate_json goes from JSON bytes straight to a typed object in Rust. For non-trivial payloads it can match or beat json.loads plus manual dataclass construction. json.loads parses in C, but it hands back plain dicts, and building and checking each dataclass from those dicts then runs in Python.
- JSON serialization: Pydantic v2’s model_dump_json uses the same Rust core. On payloads with nested models, it is faster than dataclasses.asdict() followed by json.dumps.
- Memory: Pydantic instances carry a regular __dict__ plus per-instance bookkeeping, such as which fields were explicitly set. Expect 4–6x the per-instance footprint of a slotted dataclass. On a hundred objects this is noise; on ten million it is the whole problem.
- Import time: from pydantic import BaseModel loads the compiled pydantic-core extension plus a sizable pure-Python package. CLI tools that re-spawn Python on every invocation pay for this on every run; long-lived processes pay once. python -X importtime shows the cost on your machine.
If raw speed is the requirement and you do not need a real schema, msgspec is the third option. It validates, serializes JSON and MessagePack, and on its own benchmarks beats Pydantic and orjson by a meaningful margin. Real-world adoption is thinner; reach for it when profiling makes you reach for it.
Pydantic dataclasses: the third option you usually do not need
pydantic.dataclasses.dataclass wraps the standard @dataclass decorator and adds Pydantic validation. The shape is dataclass; the engine is Pydantic.
```python
from pydantic import Field
from pydantic.dataclasses import dataclass

@dataclass
class Item:
    name: str
    price: float = Field(gt=0)  # Field() constraints work on Pydantic dataclasses too
    tags: tuple[str, ...] = ()

print(Item(name="widget", price="9.99"))
# Item(name='widget', price=9.99, tags=())  # coerced
Item(name="widget", price=-1)
# pydantic.ValidationError: 1 validation error for Item
# price: Input should be greater than 0 ...
```
Useful when you have an existing dataclass-shaped codebase and want to add validation at one or two boundaries without rewriting every model as BaseModel. Less useful in greenfield projects; if you want Pydantic features, use BaseModel directly. The v2.10 release added defer_build support and Python 3.13’s __replace__ protocol on Pydantic dataclasses; full notes in the v2.10.0 release.
The hybrid pattern most teams land on
Pydantic at the edges, dataclasses inside the domain. Validation runs once, at the boundary; everything past the boundary works against types it can trust.
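```python
from dataclasses import dataclass
from itertools import count

from pydantic import BaseModel, EmailStr  # EmailStr needs "pydantic[email]"

# Hypothetical ID source for this sketch; a real app would use its database.
_ids = count(1)

def next_id() -> int:
    return next(_ids)

# Edge: validates HTTP input
class CreateUserRequest(BaseModel):
    email: EmailStr
    age: int

# Domain: trusted types, fast, immutable
@dataclass(frozen=True, slots=True)
class User:
    id: int
    email: str
    age: int

def create_user(req: CreateUserRequest) -> User:
    # Validation already ran when req was built; just copy the fields across.
    new_id = next_id()
    return User(id=new_id, email=req.email, age=req.age)
```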
This is the pattern most FastAPI plus hexagonal-architecture projects converge on after one rewrite cycle. It also matches how pydantic-settings wants to be used: validate the YAML or environment once at startup, hand a frozen typed object to the rest of the application.
Migrating between dataclass and Pydantic (or back)
The constructor APIs are close enough that the swap is mostly mechanical. The differences that actually bite:
- dataclasses.asdict(u) becomes u.model_dump(). For dataclasses, json.dumps wraps the dict; Pydantic gives you u.model_dump_json() directly.
- Mutation rules are closer than they look. Both are mutable by default and opt into immutability: model_config = ConfigDict(frozen=True) on a BaseModel, @dataclass(frozen=True) on a dataclass. The catch is that assignment on a mutable BaseModel is not validated unless you also set validate_assignment=True.
- Default factories: Pydantic uses Field(default_factory=list), dataclasses use field(default_factory=list). Same idea, different import.
- Pydantic raises a structured ValidationError; dataclasses raise TypeError at most. Adapt the call site if it catches one.
- Equality and hashing rules differ. @dataclass(eq=True) generates value-based equality; Pydantic models also compare by class and field values, but private attributes and extra fields take part in the comparison too. Neither is hashable unless frozen.
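Here is a side-by-side sketch of those differences, assuming Pydantic v2. It defines the same order shape as a dataclass and as a BaseModel, with matching default factories, then shows dict/JSON output and the two error styles. OrderDC and OrderPD are illustrative names. validate_assignment=True is switched on only to show assignment being checked.

```python
import dataclasses
import json
from dataclasses import dataclass, field

from pydantic import BaseModel, ConfigDict, Field, ValidationError


@dataclass
class OrderDC:
    id: int
    items: list[str] = field(default_factory=list)  # dataclasses.field


class OrderPD(BaseModel):
    # Opt-in: without this, attribute assignment skips validation
    model_config = ConfigDict(validate_assignment=True)

    id: int
    items: list[str] = Field(default_factory=list)  # pydantic.Field


dc_order = OrderDC(id=1, items=["a"])
pd_order = OrderPD(id=1, items=["a"])

# asdict() vs model_dump(): the same dict for this flat shape
print(dataclasses.asdict(dc_order))  # {'id': 1, 'items': ['a']}
print(pd_order.model_dump())         # {'id': 1, 'items': ['a']}

# JSON: json.dumps wraps the dict vs. model_dump_json() directly
print(json.dumps(dataclasses.asdict(dc_order)))  # {"id": 1, "items": ["a"]}
print(pd_order.model_dump_json())                # {"id":1,"items":["a"]}

# Errors: the dataclass only checks the call signature...
try:
    OrderDC()  # missing "id"
except TypeError as exc:
    print("dataclass raised", type(exc).__name__)

# ...while Pydantic raises ValidationError, here on assignment too.
try:
    pd_order.id = "not-an-int"
except ValidationError as exc:
    print("pydantic raised", type(exc).__name__, "with", exc.error_count(), "error")
```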
FAQ
Does FastAPI require Pydantic?
Yes. Pydantic is a hard dependency, and it does the request validation. You can declare request and response bodies as stdlib dataclasses, but FastAPI converts them to Pydantic under the hood to validate input and generate the OpenAPI schema. Internal helpers, services, and domain types can be dataclasses, plain classes, or anything else. The Pydantic dependency is the request boundary, not the whole app.
Are NamedTuple and TypedDict a third and fourth option?
Sometimes. NamedTuple is immutable and positional, useful for small fixed records that benefit from tuple-style indexing. TypedDict is a static type hint over a dict; it documents shape for the type checker but does not validate at runtime. Both are lightweight; neither runs validators. Use them for narrow cases.
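Here is a quick stdlib illustration of the difference. NamedTuple gives you tuple indexing and unpacking, while a TypedDict annotation is invisible at runtime. Span and UserDict are illustrative names.

```python
from typing import NamedTuple, TypedDict


class Span(NamedTuple):
    start: int
    end: int


class UserDict(TypedDict):
    id: int
    email: str


s = Span(3, 7)
print(s[0], s.end)  # 3 7: tuple indexing and attribute access
start, end = s      # unpacks like any tuple

# A type checker flags this line; the runtime does not. TypedDict builds a
# plain dict, and the wrong type for "id" goes through without complaint.
u: UserDict = {"id": "not-an-int", "email": "ryan@example.com"}
print(type(u).__name__)  # dict
```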
Should I use attrs instead?
attrs is the library that inspired stdlib dataclasses and stayed ahead in features (converters, more slots control, richer validator hooks, attrs.define with smart defaults). I would still only reach for it on a project already using it. New projects land on dataclasses for the stdlib reason or Pydantic for the validation reason; attrs sits between, and “between” is a hard sell when both ends are good.
Can Pydantic validate a stdlib dataclass?
Yes. TypeAdapter(MyDataclass).validate_python(data) takes any dict and produces a validated dataclass instance, applying type coercion and the same validation rules a BaseModel would. Useful for adding a single validation point on top of an existing dataclass-heavy codebase.
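Here is a minimal sketch, assuming Pydantic v2 and an illustrative User dataclass. Build the TypeAdapter once and reuse it, because constructing it compiles the validation schema.

```python
from dataclasses import dataclass

from pydantic import TypeAdapter, ValidationError


@dataclass
class User:  # a plain stdlib dataclass; its module never imports Pydantic
    id: int
    email: str


adapter = TypeAdapter(User)  # build once at import time, reuse everywhere

user = adapter.validate_python({"id": "42", "email": "ryan@example.com"})
print(user)  # User(id=42, email='ryan@example.com')

user_from_json = adapter.validate_json(b'{"id": 7, "email": "a@example.com"}')

try:
    adapter.validate_python({"id": "forty-two", "email": "ryan@example.com"})
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```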
What about LLM streaming output?
v2.10’s experimental_allow_partial on TypeAdapter validates incomplete JSON. Useful when an LLM streams a structured response and you want to update the UI as fields arrive. Pydantic-only feature; dataclasses cannot do this without rolling your own partial parser.
How do I migrate from Pydantic v1 to v2?
Run bump-pydantic for the mechanical pass. The footguns that survive: .dict() renamed to .model_dump(), parse_obj renamed to model_validate, Config inner class replaced by model_config = ConfigDict(...), and @validator replaced by @field_validator with stricter signatures. The official migration guide covers the rest.
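For orientation, here is a v2-style model with the v1 spellings in comments, assuming Pydantic v2. Account and the lowercase validator are illustrative. anystr_strip_whitespace is the v1 Config option that became str_strip_whitespace.

```python
from pydantic import BaseModel, ConfigDict, field_validator


class Account(BaseModel):
    # v1: class Config: anystr_strip_whitespace = True
    model_config = ConfigDict(str_strip_whitespace=True)

    handle: str

    # v1: @validator("handle"); v2 expects a classmethod here
    @field_validator("handle")
    @classmethod
    def lowercase(cls, v: str) -> str:
        # Runs after core validation, so whitespace is already stripped
        return v.lower()


acct = Account.model_validate({"handle": "  Ryan "})  # v1: Account.parse_obj(...)
print(acct.model_dump())                               # v1: acct.dict()
# {'handle': 'ryan'}
```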
Sources and further reading
- Python dataclasses reference: stdlib documentation
- PEP 557, Data Classes: the original specification
- Python copy module documentation: copy.replace() and the __replace__ protocol, new in 3.13
- Pydantic v2 documentation: the canonical reference
- Pydantic v2.10 release announcement: partial validation, default factories, Python 3.13 support
- pydantic-core on GitHub: the Rust validator core
- Pydantic performance guide: benchmarks and optimisation tips
- msgspec documentation: the high-performance third option
- attrs documentation: the library that inspired dataclasses
For the data-loading side of this — fetching JSON from an API or parsing scraped HTML before it ever reaches a model — see the Python web scraping with BeautifulSoup tutorial.