Data modeling best practices
This guide covers best practices for modeling data in Fauna.
Use indexes for commonly accessed data
Indexes are the most important and effective tool to increase performance and reduce the cost of your queries.
Avoid uncovered queries whenever possible. To reduce document reads, include any frequently queried fields in indexes.
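As a minimal sketch of a covered query, assume a hypothetical `Product` collection with `category`, `name`, and `price` fields (these names are illustrative and don't come from this guide). The index lists the lookup field as a term and the returned fields as values:

```fsl
// Hypothetical collection. Collection, field, and index names are assumptions.
collection Product {
  index byCategory {
    // Field used for exact-match lookups.
    terms [.category]
    // Fields stored in the index so that queries can return them
    // without reading the underlying documents.
    values [.name, .price]
  }
}
```

With an index like this, a query such as `Product.byCategory("produce") { name, price }` projects only indexed fields, so it should not need a document read. Projecting a field that isn't in the index's terms or values would make the query uncovered.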
See Indexes.
Use computed fields to reduce storage
If you’re storing a large amount of data, you can use computed fields to reduce storage where applicable.
Computed fields aren’t part of the original document or persistently stored. Instead, the field’s value is computed on each read.
To avoid unneeded computes on read, use projection to only request computed fields when needed in queries.
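The following sketch shows one way a computed field might be declared. The `Customer` collection and its `firstName`, `lastName`, and `fullName` fields are assumptions for illustration:

```fsl
// Hypothetical schema. Field names are assumptions.
collection Customer {
  firstName: String
  lastName: String

  // Not stored with the document. The value is derived from
  // the stored fields each time the field is read.
  compute fullName: String = (.firstName + " " + .lastName)
}
```

A query such as `Customer.all() { firstName }` leaves `fullName` out of the projection, so the value doesn't need to be computed. Requesting `Customer.all() { fullName }` computes it for each returned document.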
See FSL collection schema: Computed field definitions.
Use schema to progressively enforce document types
Use a collection schema’s document type to check the presence and type of document field values on write. You can also use document types to enforce enumerated field values and to allow arbitrary ad hoc fields.
If your application’s data model changes, you can use zero-downtime migrations to add field definitions for ad hoc fields and normalize field values. This lets you move from a permissive document type to a strict one (or the reverse).
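A permissive document type might look like the following sketch. The `Customer` fields and the `status` values are assumptions chosen for illustration:

```fsl
// Hypothetical schema. Field names and enumerated values are assumptions.
collection Customer {
  // Field definitions: presence and type are checked on write.
  name: String
  email: String

  // Enumerated values expressed as a union of string literals.
  status: "active" | "inactive"

  // Optional field: the value can be absent.
  age: Int?

  // Wildcard constraint: permits arbitrary ad hoc fields of any type.
  *: Any
}
```

Removing the wildcard constraint later, through a migration, would move the collection toward a strict document type that only accepts the defined fields.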
See Schema.
Validate data with constraints
Use constraints to validate field values using predefined rules.
For example, you can use a unique constraint to ensure each end user has a unique email address.
Similarly, you can use a check constraint to apply other business logic. For example, you can ensure:
- `age` field values are greater than zero
- Projects are scheduled in the future
- Purchases don’t reduce a user’s `balance` to a negative number
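The email, `age`, and `balance` rules above could be sketched as follows. The collection name and the constraint names (`ageIsPositive`, `balanceNotNegative`) are assumptions:

```fsl
// Hypothetical schema. Collection and constraint names are assumptions.
collection Customer {
  email: String
  age: Int
  balance: Number

  // Unique constraint: no two documents can share an email value.
  unique [.email]

  // Check constraints: a write is rejected if a predicate is not true.
  check ageIsPositive (.age > 0)
  check balanceNotNegative (.balance >= 0)
}
```

Because the checks run on write, a purchase that would push `balance` below zero fails at the database level instead of relying on application code to catch it.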
See FSL collection schema: Unique constraint definitions and FSL collection schema: Check constraint definitions.
Use `ttl` for document retention
Use the optional `ttl` (time-to-live) document metadata field to automatically clean up completed or obsolete documents. `ttl` sets an expiration timestamp for the document.
Set a default retention period for a collection’s documents using the collection schema’s `ttl_days` field.
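As a sketch, a collection-level default might be declared like this. The `Session` collection and the 30-day period are assumptions:

```fsl
// Hypothetical collection. The name and retention period are assumptions.
collection Session {
  // Default retention period, in days, for documents in this collection.
  ttl_days 30
}
```

An individual document can also set its own expiration on write, for example `Session.create({ user: "jdoe", ttl: Time.now().add(7, "days") })`, where the `user` field is likewise illustrative.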
See Document time-to-live (TTL).
For multi-tenant apps, use a child database per tenant
You can use FQL queries or the Fauna CLI to programmatically create a child database per tenant. Databases are instantly allocated.
Using child databases lets you build multi-tenant applications with strong isolation guarantees. Each database is logically isolated from its peers, with separate access controls. You can manage all tenants from a single parent database.
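A minimal sketch of creating a tenant database with FQL follows. It assumes the query runs in the parent database with a key that is allowed to create databases, and `tenant_acme` is a hypothetical tenant name:

```fql
// Run in the parent database.
// "tenant_acme" is a hypothetical name for the tenant's child database.
Database.create({
  name: "tenant_acme"
})
```

An onboarding workflow could run a query like this once per new tenant, then issue keys scoped to that child database.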
Avoid creating a collection for each tenant. Collections don’t offer strong isolation or separate access controls, and a database can contain at most 1,024 collections.
Use CI/CD to manage schema across databases
An FSL schema is scoped to a single database and doesn’t apply to its peer or child databases.
If you have a multi-tenant application, you can copy and deploy schema across databases using FSL and a CI/CD pipeline. See Manage schema with a CI/CD pipeline.
Avoid concurrent schema changes
Concurrent unstaged schema changes can cause contended transactions, even if the changes affect different resources. This includes unstaged changes made using:
- The Fauna CLI
- The Fauna Dashboard
- The Fauna Core HTTP API’s Schema endpoints
A schema change triggers a transaction that validates the entire database schema. To avoid errors, perform unstaged schema changes sequentially rather than concurrently.
Modeling relational data
Relational data represents connections between different pieces of data in your database. In Fauna, you can model these relationships in two ways:
- Storing a reference to a related document, similar to a foreign key in a traditional relational database.
- Embedding related data directly in a parent document.
What is embedding?
Embedding means storing non-scalar data directly inside a document, rather than in separate documents. This can include Arrays, Objects, or any composition of those two structures.
For example, instead of creating separate documents for a customer’s address details, you might embed them directly in the customer document:
Customer.create({
  name: "Jane Doe",
  email: "jdoe@example.com",
  // Instead of creating a separate `Address` collection
  // document, address information is embedded directly
  // in the `address` field.
  address: {
    street: "5 Troy Trail",
    city: "Washington",
    state: "DC",
    postalCode: "20220",
    country: "US"
  }
})
When to use document references
Generally, we recommend using document references when:
- The data is referenced across many documents.
- The referenced data is frequently updated.
- The relationship(s) may change.
In most cases, using document references optimizes for faster, less expensive writes at the cost of slower, more expensive reads. See Comparing document references and embedding.
When to use embedding
We recommend using embedding when:
- The referenced data is small.
- The referenced data is tightly coupled to its parent document.
- The parent document(s) and the referenced data are typically accessed together.
In most cases, embedding optimizes for faster, less expensive reads at the cost of slower, more expensive writes. See Comparing document references and embedding.
Mixing approaches
The choice to embed related data or use document references doesn’t affect your ability to constrain a document type using schema. You can mix and match, such as using field definitions to constrain embedded data or storing document references in schemaless documents.
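As a sketch of mixing both approaches in one document type, the following schema constrains the embedded `address` object from the earlier example and also stores a typed document reference. The `Account` collection and the `account` field are assumptions, and `Account` would need to exist in the same database:

```fsl
// Hypothetical schema. `Account` and the `account` field are assumptions.
collection Customer {
  name: String
  email: String

  // Embedded data, constrained with a nested field definition.
  address: {
    street: String
    city: String
    state: String
    postalCode: String
    country: String
  }

  // Optional document reference, limited to `Account` documents.
  account: Ref<Account>?
}
```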
Comparing document references and embedding
The following table outlines the major differences between using document references and embedding to model relational data.
Difference | Document references | Embedding |
---|---|---|
Reads and indexing | Potentially slower and more expensive. Traversing document references requires a read of the document. You can’t index a referenced document’s field values. | Potentially faster and less expensive. Embedded field values can be indexed and retrieved without a document read. See Indexes. |
Writes | Potentially faster and less expensive. Updating a referenced document doesn’t affect documents that contain the reference. | Potentially slower and more expensive. Updating embedded data requires a rewrite of the entire parent document. |
Referential integrity | Easier to maintain referential integrity. The referenced document acts as a single source of truth. However, deleting the referenced document can create a dangling reference. | Risks violating referential integrity if the embedded data is duplicated across many documents and not kept in sync. |
Storage | Typically more efficient if the referenced data is shared across multiple documents. | Typically more efficient if the referenced data is tightly coupled with its parent document(s) and not duplicated across multiple documents. |
Embedding examples
Embed an Array of objects on one side of the relation
To model one-to-many or many-to-many relationships, you can embed data as an array of objects:
film.create({
  title: "Academy Dinosaur",
  actors: [
    {
      name: {
        first: "Penelope",
        last: "Guinness"
      }
    },
    {
      name: {
        first: "Johnny",
        last: "Lollobrigida"
      }
    },
  ]
})
Replacing an association table or collection with an embedded Array keeps the query simple:
// Assumes you have an index named `byTitle`.
film.byTitle("Academy Dinosaur").first() {
  actors
}
This pattern satisfies the need for a many-to-many relationship. You can also query from the other side of the relationship (starting with actors, not films), but a filter like the following isn’t backed by an index:
// Unoptimized query
film.where(.actors.map(.name.first).includes("Penelope"))
To optimize filtering on fields inside the Array of objects, add an index on a computed field:
collection film {
  compute actorsByFirstName = (.actors.map(item => item.name.first))
  index filmsByActor {
    terms [mva(.actorsByFirstName)]
  }
}
Query to find all the films by an actor:
// Optimized query
film.filmsByActor("Penelope") {
  title
}
Advantages | Disadvantages |
---|---|
Efficient for both reads and writes. Lowest latency and simplified operations for both. | Storage likely increases due to data duplication, although storage is cheap and the duplicated fields are small. |
Querying is flexible. You can start a query from either side (film or actor) and filter by either. All fields for the returned Set are available from both sides. | Increased effort for updating values. If an actor’s name changes, you need to apply the change in every document that embeds it. |
The least number of operations for both reads and writes (1), and the least compute effort. | Write concurrency. Updating both actors and films with high concurrency could cause contention. |
Embed an Array of references on one side of the relation
Instead of embedding entire documents in the parent, store references to them in the parent document.
film.create({
  title: "Giant",
  actors: [
    actor.byId("406683323649228873"),
    actor.byId("416683323649229986")
  ]
})
Querying the data is similar to the previous example:
// Finds films whose `actors` Array contains the given actor.
film.where(.actors.any(a => a == actor.byId("406683323649228873"))) {
  title
}
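To return actor data along with a film, the references need to be resolved at read time. The following sketch assumes the `byTitle` index mentioned earlier and a `name` field on `actor` documents, neither of which is defined in this example:

```fql
// Assumes a `byTitle` index on `film` and a `name` field on `actor`.
// Projecting fields of each referenced actor resolves the reference,
// which costs an additional document read per actor.
film.byTitle("Giant").first() {
  title,
  actors {
    name
  }
}
```

This is the read-side cost of the pattern: one read for the film plus one read for each actor included in the result.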
Advantages | Disadvantages |
---|---|
Less modeling complexity than association tables. | Requires more read Input/Output operations to gather query data. In this case each of the actors selected into the result Set would need an additional IO to gather their data. This would inflate the number of reads for a query from 1 per film to 1 per film plus 1 per actor. |
Overall storage should be about the same as an association table. | Indexing is no longer available on the embedded items' raw values, increasing query complexity. In this case a query starting from the actor side would need to use a nested query pattern (sub-queries). |
Updating the foreign record (actor in this example) is independent and fast. | |
Data duplication is less than the basic embedded pattern. | |
Changing the Array of values in the parent document (list of actors in a film in our example) is optimized as it would be far less data to transfer and update. | |
Allows for the foreign data to change in the future (if we wanted to add fields to the actor’s data, like middle name, place of birth, etc) compared to the basic embedded pattern. | |
Embed on both sides of the relation
Another potential pattern for modeling many-to-many relationships with Fauna is to embed Arrays of references into the documents on both sides of the relationship.
This approach is most suitable when relationships are relatively static. One main drawback of this pattern is data redundancy.
Your document structure would look like this:
// film document
{
  "id": "12323",
  "title": "The Great Adventure",
  "release_year": 2024,
  "genre": "Adventure",
  "actors": [
    Ref<actor>("122"),
    Ref<actor>("123"),
  ]
}
// actor document
{
  "id": "222",
  "name": "John Smith",
  "birthdate": "1980-05-20",
  "films": [
    Ref<film>("12323"),
    Ref<film>("12324"),
  ]
}
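Because the relationship is stored twice, both documents need to change together whenever it changes. A minimal sketch of adding an actor to a film, using the sample IDs above and assuming both documents already exist with `actors` and `films` Arrays:

```fql
// IDs are taken from the sample documents and are illustrative only.
let f = film.byId("12323")!
let a = actor.byId("222")!

// Both updates run in the same query, and therefore the same
// transaction, so the two Arrays can't be left out of sync.
f.update({ actors: f.actors.append(a) })
a.update({ films: a.films.append(f) })
```

Keeping both writes in a single query is one way to limit the risk that comes with the redundancy. Removing a relationship would need the same two-sided treatment.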