2. Core Concepts¶
2.1 Document Definition¶
Butty documents are Pydantic models that inherit from Document[ID_T], where the generic parameter specifies the
identity type (e.g., int, str, or ObjectId). These behave as standard Pydantic models while adding MongoDB
persistence capabilities through the generic CRUD API built into the Document base class, parameterized with identity
type and concrete Document type. This gives IDE and type tooling the ability to check function calls and infer result
types, whether for single documents or sequences. The class definition should include an identity field declaration,
corresponding with the generic parameter, annotated with IdentityField().
When acting as a base class for all concrete Documents in the application, it should be marked as ABC to prevent
inclusion in the document registry for autobinding.
Example of custom str identity Document:
    class BaseDocument(Document[str], ABC):
        id: Annotated[str | None, IdentityField(identity_provider=lambda: str(uuid4()))] = None
Example of MongoDB ObjectId identity Document:
    class BaseDocument(Document[ObjectId], ABC):
        id: Annotated[ObjectId | None, IdentityField(alias="_id")] = None

        model_config = ConfigDict(
            arbitrary_types_allowed=True,
        )
Note: The arbitrary_types_allowed option is set in the model config (v2 syntax is used here) because ObjectId is not supported by Pydantic by default. Full model configuration to support, for example, JSON serialization of ObjectId is up to developers and out of scope of this guide.
The collection name for a bound Document is generated automatically; by default it is the name of the class itself.
Collection naming can be customized either through the collection_name_format callable during engine setup or via the
collection_name and collection_name_from_model fields of the DocumentConfig class (see 2.7 Document Config).
Example of DocumentConfig with collection_name:
    class User(BaseDocument):
        name: str

        class DocumentConfig(DocumentConfigBase):
            collection_name = "users_collection"
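The exact signature of the collection_name_format callable is not shown in this guide. As a minimal sketch, assuming the callable receives the class name as a string and returns the collection name, a snake_case pluralizing formatter could look like this (snake_case_plural is a hypothetical helper, not part of Butty):

```python
import re


def snake_case_plural(class_name: str) -> str:
    """Turn a class name such as 'OrderItem' into 'order_items'."""
    # Insert an underscore at every lowercase/digit -> uppercase boundary.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name).lower()
    # Naive pluralization: append "s" unless the name already ends with one.
    return snake if snake.endswith("s") else snake + "s"


print(snake_case_plural("User"))       # users
print(snake_case_plural("OrderItem"))  # order_items
```

A formatter like this keeps collection names consistent across the application, while collection_name in DocumentConfig remains available for per-class overrides.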
2.2 Identity Management¶
Butty offers flexible identity management for both MongoDB-native and custom identifiers. Each document must declare
exactly one identity field marked with IdentityField(), with its type matching the document’s ID_T generic
parameter. This field acts as the primary key for all persistence operations.
The identity lifecycle differentiates between transient documents (with None identity) and persisted ones, requiring
the identity field to be optional in most cases. When saving a transient document, the system generates a new identity
based on the configured strategy and inserts the document. For documents with existing identities, save operations
perform updates by default, though this can be configured via the save() method’s mode parameter.
Three primary identity strategies are supported through IdentityField() configuration:
Native MongoDB _id¶
The strategy is automatically detected when the _id alias is present in the identity field declaration. While offering
optimal database performance, this requires additional Pydantic configuration for JSON serialization and ties the
application to MongoDB’s identifier format.
Example of native MongoDB identity field declaration:
    class BaseDocument(Document[ObjectId], ABC):
        id: Annotated[ObjectId | None, IdentityField(alias="_id")] = None
Custom identity¶
Supports synchronous and asynchronous providers through identity_provider and identity_provider_factory parameters.
The factory pattern enables class-aware identity sequences, maintaining JSON compatibility while supporting
application-controlled identifiers like auto-incremented integers.
Example of an identity provider factory that returns an async function for serial ID generation:
    class SerialIDCounter(Document[str]):
        name: Annotated[str, IdentityField()]
        count: int


    def _serial(doc_model: DocModel) -> Callable[[], Awaitable[int]]:
        async def identity_provider() -> int:
            # Atomically increment the counter named after the target collection
            return (
                await SerialIDCounter.update_document(
                    doc_model.__collection__.name,
                    Inc({F(SerialIDCounter.count): 1}),
                    upsert=True,
                )
            ).count

        return identity_provider


    class SerialIDDocument(Document[int], ABC):
        id: Annotated[int | None, IdentityField(identity_provider_factory=_serial)] = None
Provided identities¶
Identity can be managed outside the Butty engine. For example, the identity field can use Pydantic's default_factory
field option. Documents with existing identities are treated as persistent by default, so new documents with a
provided identity must be saved with the insert mode specified explicitly.
Example of provided identity field:
    class SerialIDCounter(Document[str]):
        name: Annotated[str, IdentityField()]
        count: int
2.3 Document Linking¶
Direct Links¶
Butty implements document linking through standard Pydantic model references without requiring proprietary wrapper
types. When a field is annotated with another Document type, the system automatically establishes a relationship
between documents. This approach maintains full type safety while allowing flexible relationship configurations.
By default, linked documents are stored as references (using the target document’s identity value). To store a document
as an embedded object instead, the link must be marked with LinkField(link_ignore=True). The link value is
automatically determined from the destination document’s identity field.
The MongoDB document field name representing the link can be configured either:
- Per field using LinkField(link_name=...)
- Globally via the engine's link_name_format callable
When unspecified, the field alias becomes the storage field name. This allows semantic naming patterns like storing as
user_id while declaring the field as user: User.
Special collection types are supported for document linking:
- list[Document] or tuple[Document] - Represents arrays of references
- dict[str, Document] - Represents dictionaries of references
All other field types, including nested Pydantic BaseModel instances, are always stored as embedded documents - even
when they contain Document-typed fields themselves.
Example of linked documents:
    class Department(BaseDocument):
        name: str  # a bare Annotated[str] is invalid: Annotated needs at least one metadata argument


    class User(BaseDocument):
        department: Department
        name: str
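To make the storage shape concrete, here is a minimal sketch of how a linked document could be reduced to a reference on write. It uses plain dataclasses instead of Butty documents, and to_storage is a hypothetical helper; the assumption that link_name_format receives the field name and returns the stored name is also illustrative.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Callable


@dataclass
class Department:
    id: int
    name: str


@dataclass
class User:
    id: int
    name: str
    department: Department


def to_storage(
    doc: Any, link_name_format: Callable[[str], str] = lambda name: name
) -> dict[str, Any]:
    """Serialize `doc`, replacing each linked document with its identity."""
    stored: dict[str, Any] = {}
    for f in fields(doc):
        value = getattr(doc, f.name)
        if is_dataclass(value):
            # Linked document: store only its identity under the link name.
            stored[link_name_format(f.name)] = value.id
        else:
            stored[f.name] = value
    return stored


user = User(id=7, name="Ann", department=Department(id=3, name="R&D"))
plain = to_storage(user)
renamed = to_storage(user, lambda name: f"{name}_id")
print(plain)    # {'id': 7, 'name': 'Ann', 'department': 3}
print(renamed)  # {'id': 7, 'name': 'Ann', 'department_id': 3}
```

The second call mirrors the user / user_id naming pattern described above: the model keeps the semantic field name while storage uses the reference-style name.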
Backlinks¶
While automatic linking and LinkField() handle forward relationships, backlinks enable reverse document traversal. For
example, when OrderItem references its parent Order, backlinks allow querying all OrderItems belonging to a
specific Order.
Backlinks require explicit declaration with BackLinkField() and must be:
- Optional fields (marked with | None)
- Typed as collections (list[Document] or tuple[Document])
- Paired with exactly one forward reference from the target document type
The system automatically maintains backlink integrity without duplicating reference data in storage.
Example of backlinks definition:
    class Order(BaseDocument):
        order_items: Annotated[list[OrderItem] | None, BackLinkField()] = None


    class OrderItem(BaseDocument):
        order: Order
        product_name: str
        amount: float


    # Resolve the forward reference to OrderItem
    Order.model_rebuild()
Note: Backlinks require ForwardRef updates for mutual model references. In Pydantic v1 (with from __future__ import annotations), due to issue #10509, the syntax order_items: Annotated[list[OrderItem] | None, BackLinkField()] = None isn't supported - use order_items: list[OrderItem] | None = BackLinkField(None) instead.
Automatic Pipeline Generation¶
By analyzing the complete document relationship graph during initialization, Butty automatically generates MongoDB aggregation pipelines with nested lookups. This forms the core value of the engine, enabling efficient joins across related documents while maintaining type safety. The pipelines are generated statically and cannot be dynamically adjusted during read operations - all nesting levels and lookup paths are predetermined. This design means the relationship graph cannot contain circular references or self-references, ensuring all document connections can be resolved through a finite series of lookups.
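The idea of deriving lookups from a static relationship graph can be sketched as follows. This is an illustration only: the LINKS table, the build_pipeline function, and the exact stage layout are assumptions, and the pipelines Butty actually emits may be shaped differently.

```python
from typing import Any

# Hypothetical relationship graph: collection -> {link field: target collection}.
LINKS: dict[str, dict[str, str]] = {
    "order_items": {"order": "orders", "product": "products"},
    "orders": {"customer": "customers"},
    "products": {},
    "customers": {},
}


def build_pipeline(collection: str, prefix: str = "") -> list[dict[str, Any]]:
    """Emit $lookup/$unwind stages for every link reachable from `collection`."""
    stages: list[dict[str, Any]] = []
    for field, target in LINKS[collection].items():
        path = f"{prefix}{field}"
        # Replace the stored reference at `path` with the referenced document.
        stages.append(
            {"$lookup": {"from": target, "localField": path, "foreignField": "_id", "as": path}}
        )
        stages.append({"$unwind": f"${path}"})
        # Recurse into the target's own links; this terminates only for acyclic graphs.
        stages.extend(build_pipeline(target, prefix=f"{path}."))
    return stages


pipeline = build_pipeline("order_items")
for stage in pipeline:
    print(stage)
```

The recursion also shows why circular references and self-references are ruled out: with a cycle in LINKS, the traversal would never reach a collection without outgoing links, so no finite pipeline could be produced.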
2.4 Query Building¶
The query builder provides type-safe construction of MongoDB queries through operator overloading and logical
composition. During engine setup, standard attributes of Document models are replaced with ButtyField instances that
maintain original field aliases and document nesting structure. These references support attribute-style navigation
through nested documents and collections while maintaining proper MongoDB field path notation. The system preserves
original field names from Pydantic models while handling alias translation for database operations.
Comparison operations implement standard Python comparison operators (==, >, >=, <, <=, !=) which generate
appropriate MongoDB query operators ($eq, $gt, $gte, $lt, $lte, $ne). The modulo operator (%) provides
regular expression matching support, translating to MongoDB’s $regex operator with configurable options. Each
comparison operation produces a query leaf node containing the field reference, operator, and comparison value.
Logical operators combine query components using & (AND) and | (OR) operators, building nested query structures that
translate to MongoDB’s $and and $or operators. The query builder maintains proper operator precedence through
explicit grouping, ensuring logical expressions evaluate as intended. Complex queries can combine multiple levels of
logical operations with various comparison conditions.
The query interface requires all components to implement conversion to native MongoDB query syntax through the
to_mongo_query() method. This polymorphic design allows mixing raw dictionary queries with builder-constructed queries
while maintaining consistent output format. The system automatically handles field alias substitution when converting
builder queries to database syntax.
Utility functions F() and Q() provide explicit type conversion points for static type checkers. F() casts model
fields to ButtyField instances for query building, while Q() finalizes query construction by converting builder
objects or hybrid dictionaries to pure MongoDB query syntax. These functions serve as integration points between the
type-safe builder and raw dictionary queries.
Update operations follow a similar pattern with dedicated operators like Set and Inc that construct MongoDB update
documents. These operations maintain field reference integrity while supporting both direct values and nested update
expressions. The update builder ensures proper syntax generation for atomic update operations.
Queries example:
    class Product(BaseDocument):
        name: str
        price: float


    class Order(BaseDocument):
        order_items: Annotated[list[OrderItem] | None, BackLinkField()] = None


    class OrderItem(BaseDocument):
        order: Order
        product: Product


    Order.model_rebuild()


    async def main():
        await Product.find(F(Product.name) == "Chair")
        await Product.find({F(Product.name): "Chair"})
        await Product.find(F(Product.price) > 100)
        await Product.find((F(Product.price) > 100) & (F(Product.name) == "Chair"))
        await Order.find(F(Order.order_items[...].product.name) == "Chair")
Note: The array query syntax [...] primarily serves to satisfy IDE attribute validation. The expression can alternatively be written as F(Order.order_items).product.name - this bypasses IDE attribute resolution while remaining fully valid MongoDB query syntax.
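How operator overloading can be turned into MongoDB query dictionaries is easiest to see in a stripped-down model. The sketch below is not Butty's implementation: FieldRef, Leaf, and Group are hypothetical names, and only the to_mongo_query() method name is taken from the description above.

```python
from typing import Any


class Query:
    """Base for query nodes; & and | build $and / $or groups."""

    def to_mongo_query(self) -> dict[str, Any]:
        raise NotImplementedError

    def __and__(self, other: "Query") -> "Query":
        return Group("$and", [self, other])

    def __or__(self, other: "Query") -> "Query":
        return Group("$or", [self, other])


class Leaf(Query):
    """A single comparison: field path, operator, value."""

    def __init__(self, path: str, op: str, value: Any) -> None:
        self.path, self.op, self.value = path, op, value

    def to_mongo_query(self) -> dict[str, Any]:
        return {self.path: {self.op: self.value}}


class Group(Query):
    """A logical combination of sub-queries."""

    def __init__(self, op: str, parts: list[Query]) -> None:
        self.op, self.parts = op, parts

    def to_mongo_query(self) -> dict[str, Any]:
        return {self.op: [part.to_mongo_query() for part in self.parts]}


class FieldRef:
    """A field reference whose operators produce query leaves."""

    def __init__(self, path: str) -> None:
        self._path = path

    def __getattr__(self, name: str) -> "FieldRef":
        if name.startswith("_"):
            raise AttributeError(name)
        # Attribute navigation extends the dotted MongoDB field path.
        return FieldRef(f"{self._path}.{name}")

    def __eq__(self, value: Any) -> Leaf:  # type: ignore[override]
        return Leaf(self._path, "$eq", value)

    def __ne__(self, value: Any) -> Leaf:  # type: ignore[override]
        return Leaf(self._path, "$ne", value)

    def __gt__(self, value: Any) -> Leaf:
        return Leaf(self._path, "$gt", value)

    def __ge__(self, value: Any) -> Leaf:
        return Leaf(self._path, "$gte", value)

    def __lt__(self, value: Any) -> Leaf:
        return Leaf(self._path, "$lt", value)

    def __le__(self, value: Any) -> Leaf:
        return Leaf(self._path, "$lte", value)

    def __mod__(self, pattern: str) -> Leaf:
        return Leaf(self._path, "$regex", pattern)


price, name = FieldRef("price"), FieldRef("name")
combined = ((price > 100) & (name == "Chair")).to_mongo_query()
nested = (FieldRef("order_items").product.name == "Chair").to_mongo_query()
print(combined)  # {'$and': [{'price': {'$gt': 100}}, {'name': {'$eq': 'Chair'}}]}
print(nested)    # {'order_items.product.name': {'$eq': 'Chair'}}
```

The parentheses around each comparison are required in Python, because & and | bind more tightly than comparison operators. This is the explicit grouping the section refers to.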
2.5 Collection Views¶
Butty supports MongoDB collection views through the document configuration system. Multiple document types can reference the same underlying MongoDB collection while presenting different schemas and validation rules. This enables scenarios where different application components need varying perspectives on the same data.
Collection views are configured by specifying the source document class in the collection_name_from_model field of the
DocumentConfig. The view document inherits the collection binding of its source while maintaining independent schema
validation and field definitions. All documents sharing a collection must use compatible identity types.
When saving documents through a view, only fields explicitly defined in that view document will be modified in the database. All other fields in the underlying collection remain untouched.
Example of collection view:
    class User(BaseDocument):
        name: str
        password: str


    class UserView(BaseDocument):
        name: str

        class DocumentConfig(DocumentConfigBase):
            collection_name_from_model = User
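The effect of saving through a view can be pictured as an update restricted to the view's own fields. The sketch below is an in-memory illustration under that assumption; save_through_view, VIEW_FIELDS, and the dictionary standing in for the collection are hypothetical and do not reflect how Butty builds its update internally.

```python
from typing import Any

# Stand-in for the collection shared by User and UserView.
collection: dict[int, dict[str, Any]] = {
    1: {"name": "ann", "password": "s3cret"},
}

VIEW_FIELDS = ("name",)  # fields declared on the view document


def save_through_view(doc_id: int, view_doc: dict[str, Any]) -> dict[str, Any]:
    """Build a $set update limited to the view's fields and apply it."""
    update = {"$set": {key: view_doc[key] for key in VIEW_FIELDS}}
    # Only keys present in $set are touched; everything else is left as stored.
    collection[doc_id].update(update["$set"])
    return update


update = save_through_view(1, {"name": "Ann"})
print(update)         # {'$set': {'name': 'Ann'}}
print(collection[1])  # {'name': 'Ann', 'password': 's3cret'}
```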
2.6 Versioning for Optimistic Concurrency Control¶
Butty implements optimistic concurrency control through configurable version fields. The version field can use any
comparable type (integer, string, etc.) annotated with VersionField, with custom logic for generating new version
values. During save operations, the system performs an atomic check comparing the document’s current version against the
stored value, rejecting the update if they don’t match.
Version value generation is fully customizable through the version provider function, which receives the current value
and returns the next version. This allows implementations ranging from simple counters to UUID-based schemes or
timestamp versions. Failed updates due to version conflicts raise DocumentNotFound, while successful updates
atomically persist both the new document state and its updated version identifier.
Example of BaseDocument with version field:
    class BaseDocument(OIDDocument):
        version: Annotated[int | None, VersionField(version_provider=lambda v: 0 if v is None else v + 1)] = None
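The version check can be modeled in a few lines. This is a minimal in-memory sketch, not Butty code: the store dictionary, the save function, and the local DocumentNotFound class are stand-ins, and with MongoDB the comparison and the write would typically happen in one atomic update filtered on both identity and version.

```python
from typing import Any, Optional


class DocumentNotFound(Exception):
    """Local stand-in for the error raised when id + version match nothing."""


store: dict[int, dict[str, Any]] = {1: {"name": "Chair", "version": 0}}


def next_version(current: Optional[int]) -> int:
    # Same counter scheme as the version_provider in the example above.
    return 0 if current is None else current + 1


def save(doc_id: int, loaded_version: Optional[int], changes: dict[str, Any]) -> int:
    """Apply `changes` only if the stored version still equals `loaded_version`."""
    stored = store.get(doc_id)
    if stored is None or stored["version"] != loaded_version:
        raise DocumentNotFound(f"document {doc_id} with version {loaded_version}")
    new_version = next_version(loaded_version)
    stored.update(changes, version=new_version)
    return new_version


# Two writers both loaded the document at version 0.
first = save(1, 0, {"name": "Armchair"})  # succeeds, version becomes 1
try:
    save(1, 0, {"name": "Stool"})         # stale version, rejected
    conflict = False
except DocumentNotFound:
    conflict = True

print(first, conflict, store[1])  # 1 True {'name': 'Armchair', 'version': 1}
```

The second writer has to reload the document and reapply its change, which is the usual recovery path under optimistic concurrency control.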
2.7 Document Config¶
Butty provides document configuration through a nested DocumentConfig class that should inherit from
DocumentConfigBase. This inheritance enables IDE autocompletion and type checking for the available configuration
options:
- collection_name: Explicit MongoDB collection name
- collection_name_from_model: Document class whose collection should be reused (creates a view)
2.8 Fields Declaration¶
Butty supports two syntax variations for field definitions, both using native Pydantic field declarations enhanced with specialized field information:
- Annotated style: Annotated[FieldType, FieldInformation(...)]
- Default value style: FieldType = FieldInformation(...)
Available field information types provide document-specific capabilities:
- IdentityField(): Designates the primary key field. Supports custom identity providers through identity_provider and identity_provider_factory parameters.
- VersionField(): Enables optimistic concurrency control. Requires a version_provider function to generate version values.
- LinkField(): Defines document relationships. Configurable with:
    - link_name: Custom storage field name
    - on_delete: Cascade behavior ("nothing", "cascade", "propagate")
    - link_ignore: Skip link processing
- BackLinkField(): Creates reverse references from linked documents.
- IndexedField(): Specifies fields for MongoDB indexing. Supports unique constraint flag.
All field information types maintain compatibility with standard Pydantic field arguments while extending functionality for MongoDB operations.
Example of IndexedField declarations:
    class Department(BaseDocument):
        name: Annotated[str, IndexedField()] = "unknown"

    # or

    class Department(BaseDocument):
        name: str = IndexedField("unknown")
2.9 Error Handling¶
Butty uses a hierarchy of exceptions while also propagating relevant MongoDB driver exceptions for operational
integrity. All Butty-specific errors inherit from ButtyError, providing consistent error handling while maintaining
separation from other exception types. Certain operations may raise native MongoDB exceptions like DuplicateKeyError
alongside Butty’s exception types to reflect database-level constraints.
Key error types include:
- ButtyError: Base class for all Butty-specific exceptions
- ButtyValueError: Indicates invalid field values or operation parameters
- DocumentNotFound: Signals missing documents during get/update operations (contains doc_model, op, and query attributes)
- MongoDB driver exceptions: Including DuplicateKeyError for identity conflicts during insert operations
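A typical handling pattern is to catch DocumentNotFound where a missing document is an expected outcome and let other errors propagate. The sketch below defines local stand-in classes with the same names and the three attributes listed above, so it runs without Butty installed; the constructor signature and the get_user helpers are assumptions, not the library's API.

```python
from typing import Any, Optional


class ButtyError(Exception):
    """Stand-in for the library's base exception."""


class DocumentNotFound(ButtyError):
    """Stand-in carrying the doc_model, op, and query attributes."""

    def __init__(self, doc_model: type, op: str, query: dict[str, Any]) -> None:
        super().__init__(f"{doc_model.__name__}: {op} matched nothing for {query}")
        self.doc_model, self.op, self.query = doc_model, op, query


class User:
    """Placeholder model class."""


def get_user(user_id: int) -> dict[str, Any]:
    # Always fails here, to exercise the error path.
    raise DocumentNotFound(User, "get", {"_id": user_id})


def get_user_or_none(user_id: int) -> Optional[dict[str, Any]]:
    try:
        return get_user(user_id)
    except DocumentNotFound as exc:
        # The attributes identify which model, operation, and query failed.
        print(f"missing {exc.doc_model.__name__} during {exc.op}: {exc.query}")
        return None


result = get_user_or_none(5)
```

Catching the base ButtyError instead would also cover value errors, while driver exceptions such as DuplicateKeyError sit outside this hierarchy and need their own except clause.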