Data modeling best practices

This guide covers best practices for modeling data in Fauna.

Use indexes for commonly accessed data

Indexes are the most important and effective tool to increase performance and reduce the cost of your queries.

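Avoid uncovered queries whenever possible. A query is uncovered when it requests fields that aren’t stored in the index’s values, so Fauna must read each matching document to retrieve them. To reduce document reads, include frequently queried fields as index values.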
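To illustrate why covered queries save reads, here is a minimal sketch under simplifying assumptions. It is a hypothetical in-memory model, not Fauna's storage engine or billing. `IndexEntry` and `run_query` are illustrative names. Each index entry stores some field values, and a query whose requested fields are all present in those values never touches the underlying document.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IndexEntry:
    term: str       # value matched by the index term
    values: dict    # field values stored (covered) in the index entry
    doc_id: str     # document the entry points to


def run_query(entries: List[IndexEntry], docs: Dict[str, dict],
              term: str, fields: List[str]) -> Tuple[List[dict], int]:
    """Return projected rows plus the number of full document reads needed."""
    rows, doc_reads = [], 0
    for entry in entries:
        if entry.term != term:
            continue
        if all(f in entry.values for f in fields):
            # Covered: every requested field is stored in the index entry.
            rows.append({f: entry.values[f] for f in fields})
        else:
            # Uncovered: read the whole document to fetch the missing fields.
            doc_reads += 1
            doc = docs[entry.doc_id]
            rows.append({f: doc[f] for f in fields})
    return rows, doc_reads
```

Requesting only `price` from an index that covers `price` costs no document reads. Adding `stock`, which the index doesn't cover, costs one read per matching document.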

Avoid storing unneeded history

A collection schema's history_days setting defines the number of days of history to retain as document snapshots. You can use these historical snapshots to run temporal queries or replay events in event feeds and event streams.

Avoid storing unnecessary history. A high history_days setting has several impacts:

Increased read ops:

To support temporal queries, indexes cover field values from both current documents and their historical document snapshots.

To enable quicker sorting and range searches, current and historical index entries are stored together, sorted by index values. All indexes implicitly include an ascending document id as the index’s last value.

When you read data from an index, including the collection.all() index, Fauna must read from both current and historical index entries to determine if they apply to the query. Fauna then filters out any data not returned by the query.

You are charged for any Transactional Read Operations (TROs) used to read current or historical index data, including data not returned by the query.

You are not charged for any historical data older than the retention period set by the history_days setting.
Longer index build times: Because indexes include historical data, a high history_days setting can increase the index build times.
Increased query latency on indexes: If an indexed field value changes frequently, the index must retain more historical data. A high history_days setting can increase query latency on the index.
Increased storage: More document snapshots and historical index data are retained, which consumes additional database storage and increases storage costs.
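The effect of history_days on index reads can be sketched with a simplified, hypothetical model. It is not Fauna's internal format or its TRO accounting. In this model, current and historical entries share one value-sorted index. Entries older than the retention window are dropped, and a range read scans every retained entry in range before filtering down to current documents.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    value: int                # indexed field value
    doc_id: int
    valid_from: int           # day the value was written
    valid_to: Optional[int]   # day it was overwritten (None = current)


def retained(entries: List[Entry], today: int, history_days: int) -> List[Entry]:
    """Keep current entries plus historical ones inside the retention window."""
    cutoff = today - history_days
    return [e for e in entries if e.valid_to is None or e.valid_to > cutoff]


def range_read(entries: List[Entry], lo: int, hi: int) -> Tuple[List[int], int]:
    """Scan a value range; return current doc IDs and how many entries were scanned."""
    in_range = sorted((e for e in entries if lo <= e.value <= hi),
                      key=lambda e: (e.value, e.doc_id))
    current = [e.doc_id for e in in_range if e.valid_to is None]
    return current, len(in_range)
```

Suppose a single document's indexed value changes once a day for 30 days. The same query returns the same one document whether history_days is 0, 7, or 30. The number of entries scanned, however, grows from 1 to 8 to 30.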

Use computed fields to reduce storage

If you’re storing a large amount of data, you can use computed fields to reduce storage for values that can be derived from other fields.

Computed fields aren’t part of the original document or persistently stored. Instead, the field’s value is computed on each read.

To avoid unnecessary computation on reads, use projection to request computed fields only when a query needs them.
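The trade-off between storage and compute-on-read can be sketched as follows. This is a hypothetical model and not FSL. The `Collection` class and the `total` field are illustrative. Computed fields are functions of the stored document that are evaluated only when a projection asks for them, and they are never written back.

```python
from typing import Callable, Dict, Iterable


class Collection:
    """Hypothetical collection with computed fields evaluated on read."""

    def __init__(self, computed: Dict[str, Callable[[dict], object]]):
        self.computed = computed  # field name -> function of the stored document
        self.compute_calls = 0    # how many times any computed field was evaluated

    def read(self, doc: dict, projection: Iterable[str]) -> dict:
        result = {}
        for field in projection:
            if field in self.computed:
                # Computed on every read that requests it; never stored.
                self.compute_calls += 1
                result[field] = self.computed[field](doc)
            else:
                result[field] = doc[field]
        return result


orders = Collection({"total": lambda d: d["price"] * d["quantity"]})
```

A projection that omits `total` costs no computation. A projection that includes it computes the value from `price` and `quantity`, and the stored document stays unchanged.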

See FSL collection schema: Computed field definitions

Use schema to progressively enforce document types

Use a collection schema’s document type to check the presence and type of document field values on write. You can also use document types to enforce enumerated field values and to allow arbitrary ad hoc fields.

If your application’s data model changes, you can use zero-downtime migrations to add field definitions for ad hoc fields and normalize field values. This lets you move from a permissive document type to a strict one (or the reverse).
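The write-time checks described above can be sketched with a small validator. This is a hypothetical illustration, not Fauna's schema engine. `validate` is an illustrative name, and the `allow_ad_hoc` flag stands in for a permissive document type that accepts undeclared fields. The validator checks field presence, field type, and enumerated values.

```python
from typing import Any, Dict, List, Optional, Tuple

# Each declared field maps to (allowed Python types, optional set of enum values).
FieldDef = Tuple[tuple, Optional[set]]


def validate(doc: Dict[str, Any], fields: Dict[str, FieldDef],
             allow_ad_hoc: bool) -> List[str]:
    """Return a list of violations; an empty list means the write is accepted."""
    errors = []
    for name, (types, enum) in fields.items():
        if name not in doc:
            errors.append(f"missing field: {name}")
        elif not isinstance(doc[name], types):
            errors.append(f"wrong type: {name}")
        elif enum is not None and doc[name] not in enum:
            errors.append(f"not an allowed value: {name}")
    if not allow_ad_hoc:
        # Strict document type: reject any field that isn't declared.
        errors += [f"undeclared field: {k}" for k in doc if k not in fields]
    return errors
```

Migrating from permissive to strict corresponds to declaring the ad hoc fields your documents actually use and then switching `allow_ad_hoc` off.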

See Schema

Validate data with constraints

Use constraints to validate field values using predefined rules.

For example, you can use a unique constraint to ensure each end user has a unique email address.

Similarly, you can use a check constraint to apply other business logic. For example, you can ensure:

age field values are greater than zero
Projects are scheduled in the future
Purchases don’t reduce a user’s balance to a negative number
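To show how the two constraint kinds differ, here is a minimal in-memory sketch. `Users` and `ConstraintError` are hypothetical names, and Fauna enforces its constraints inside its own transactions, not like this. A check constraint evaluates a rule against the document being written. A unique constraint compares that document against the other documents in the collection.

```python
class ConstraintError(Exception):
    pass


class Users:
    """Hypothetical in-memory collection with one check and one unique constraint."""

    def __init__(self):
        self.docs = []

    def create(self, doc: dict) -> dict:
        # Check constraint: a business rule evaluated against the new document.
        if not doc["age"] > 0:
            raise ConstraintError("check constraint failed: age must be > 0")
        # Unique constraint: no existing document may share the same email.
        if any(d["email"] == doc["email"] for d in self.docs):
            raise ConstraintError("unique constraint failed: email")
        self.docs.append(doc)
        return doc
```

A rejected write leaves the collection unchanged, which mirrors how a failed constraint aborts the write.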

See FSL collection schema: Unique constraint definitions, FSL collection schema: Check constraint definitions

Use `ttl` for document retention

Use the optional ttl (time-to-live) document metadata field to automatically clean up completed or obsolete documents. ttl sets an expiration timestamp for the document.

Set a default retention period for a collection’s documents using the collection schema's ttl_days field.
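The relationship between a ttl_days default and a document's ttl timestamp can be sketched with two hypothetical helpers. The sketch assumes the default is applied from the document's creation time. When Fauna actually removes an expired document is governed by Fauna, not by this code.

```python
from datetime import datetime, timedelta
from typing import Optional


def default_ttl(created: datetime, ttl_days: Optional[int]) -> Optional[datetime]:
    """Hypothetical: derive an expiration timestamp from a collection's ttl_days."""
    return created + timedelta(days=ttl_days) if ttl_days is not None else None


def is_expired(ttl: Optional[datetime], now: datetime) -> bool:
    # Documents without a ttl never expire.
    return ttl is not None and now >= ttl
```

With `ttl_days` set to 30, a document created on 2024-01-01 gets a ttl of 2024-01-31. A collection with no default leaves ttl unset, so its documents never expire.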

See Document time-to-live (TTL)

For multi-tenant apps, use a child database per tenant

You can use FQL queries or the Fauna CLI to programmatically create a child database per tenant. Databases are instantly allocated.

Using child databases lets you build multi-tenant applications with strong isolation guarantees. Each database is logically isolated from its peers, with separate access controls. You can manage all tenants from a single parent database.

Avoid creating a collection for each tenant. Collections don’t offer strong isolation or separate access controls. A database can only contain 1,024 collections.

Use CI/CD to manage schema across databases

An FSL schema is scoped to a single database and doesn’t apply to its peer or child databases.

If you have a multi-tenant application, you can copy and deploy schema across databases using FSL and a CI/CD pipeline. See Manage schema with a CI/CD pipeline.

Avoid concurrent schema changes

Concurrent unstaged schema changes can cause contended transactions, even if the changes affect different resources. This includes unstaged changes made using:

The Fauna CLI
The Fauna Dashboard
The Fauna Core HTTP API’s Schema endpoints
FQL schema methods

A schema change triggers a transaction that validates the entire database schema. To avoid errors, do one of the following instead:

Run staged schema changes
Perform unstaged schema changes sequentially
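If an automated process issues unstaged schema changes, one way to keep them sequential is to serialize them per database. The sketch below does this with a lock. `SchemaChangeRunner` is hypothetical, and `apply_change` is a placeholder for whatever actually pushes the schema (CLI, Dashboard, HTTP API, or FQL), not a real Fauna API. It only serializes changes made within a single process, so separate pipelines still need their own coordination.

```python
import threading
from collections import defaultdict
from typing import Any, Callable


class SchemaChangeRunner:
    """Hypothetical: allow at most one unstaged schema change per database at a time."""

    def __init__(self, apply_change: Callable[[str, Any], Any]):
        self._apply = apply_change              # placeholder for the real push step
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()          # protects creation of per-database locks

    def run(self, database: str, change: Any) -> Any:
        with self._guard:
            lock = self._locks[database]
        with lock:
            # Changes to the same database wait for the previous one to finish.
            return self._apply(database, change)
```

Even if five threads submit changes to the same database at once, the placeholder apply step never runs more than once at a time.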

Modeling relational data

Relational data represents connections between different pieces of data in your database. In Fauna, you can model these relationships in two ways:

Storing a reference to a related document, similar to a foreign key in a traditional relational database.
Embedding related data directly in a parent document.

What is embedding?

Embedding means storing non-scalar data directly inside a document, rather than in separate documents. This can include Arrays, Objects, or any composition of those two structures.

For example, instead of creating separate documents for a customer’s address details, you might embed them directly in the customer document:

Customer.create({
  name: "Jane Doe",
  email: "jdoe@example.com",
  // Instead of creating a separate `Address` collection
  // document, address information is embedded directly
  // in the `address` field.
  address: {
    street: "5 Troy Trail",
    city: "Washington",
    state: "DC",
    postalCode: "20220",
    country: "US"
  }
})

When to use document references

Generally, we recommend using document references when:

The data is referenced across many documents.
The referenced data is frequently updated.
The relationship(s) may change.

In most cases, using document references optimizes for faster, less expensive writes at the cost of slower, more expensive reads. See Comparing document references and embedding.

When to use embedding

We recommend using embedding when:

The referenced data is small.
The referenced data is tightly coupled to its parent document.
The parent document(s) and the referenced data are typically accessed together.

In most cases, embedding optimizes for faster, less expensive reads at the cost of slower, more expensive writes. See Comparing document references and embedding.

Mixing approaches

The choice to embed related data or use document references doesn’t affect your ability to constrain a document type using schema. You can mix and match, such as using field definitions to constrain embedded data or storing document references in schemaless documents.

Comparing document references and embedding

The following table outlines the major differences between using document references and embedding to model relational data.

Difference	Document references	Embedding
Reads and indexing	Potentially slower and more expensive. Resolving document references requires a read of the document. You can’t index a referenced document’s field values.	Potentially faster and less expensive. Embedded field values can be indexed and retrieved without a document read. See Indexes.
Writes	Potentially faster and less expensive. Updating a referenced document doesn’t affect documents that contain the reference.	Potentially slower and more expensive. Updating embedded data requires a rewrite of the entire parent document.
Referential integrity	Easier to maintain referential integrity. The referenced document acts as a single source of truth. However, deleting the referenced document can create a dangling reference.	Risks violating referential integrity if the embedded data is duplicated across many documents and not kept in sync.
Storage	Typically more efficient if the referenced data is shared across multiple documents.	Typically more efficient if the referenced data is tightly coupled with its parent document(s) and not duplicated across multiple documents.
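The read and write rows of the table can be made concrete with a toy store that counts operations. This is a hypothetical sketch, not Fauna's cost model. `Store` and the helper functions are illustrative. In the store, string entries in `actors` are references and dict entries are embedded copies.

```python
import copy


class Store:
    """Hypothetical in-memory store that counts document reads and writes."""

    def __init__(self):
        self.docs, self.reads, self.writes = {}, 0, 0

    def get(self, key):
        self.reads += 1
        return self.docs[key]

    def put(self, key, doc):
        self.writes += 1
        self.docs[key] = doc


def load_film(store, key):
    film = dict(store.get(key))
    # References (strings) cost one extra read each; embedded dicts cost none.
    film["actors"] = [store.get(a) if isinstance(a, str) else a for a in film["actors"]]
    return film


def rename_actor_embedded(store, film_keys, old, new):
    # Every film holding a copy of the actor must be rewritten.
    for key in film_keys:
        film = copy.deepcopy(store.get(key))
        for actor in film["actors"]:
            if actor["name"] == old:
                actor["name"] = new
        store.put(key, film)


def rename_actor_referenced(store, actor_key, new):
    # Only the single source-of-truth actor document changes.
    actor = dict(store.get(actor_key))
    actor["name"] = new
    store.put(actor_key, actor)
```

With three films that each list two actors, the results diverge. Loading one film takes 1 read when actors are embedded and 3 reads with references. Renaming one actor takes 3 writes when embedded and 1 write with references.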

Embedding examples

Embed an Array of objects on one side of the relation

To model one-to-many or many-to-many relationships, you can embed data as an array of objects:

film.create({
  title: "Academy Dinosaur",
  actors: [
    {
      name: {
        first: "Penelope",
        last: "Guinness"
      }
    },
    {
      name: {
        first: "Johnny",
        last: "Lollobrigida"
      }
    },
  ]
})

By replacing an association table or collection with an embedded Array, querying the data becomes simple:

// Assumes you have an index named `byTitle` with `title` as its term

film.byTitle("Academy Dinosaur").first() {
  actors
}

This pattern satisfies the need for a many-to-many relationship. You can also optimize queries that start from the other side of the relationship (starting with actors rather than films). Without an index, such a query must filter every document in the collection:

// Unoptimized query
film.where(.actors.map(.name.first).includes("Penelope"))

To avoid ad hoc filtering of the fields inside the Array of objects, define an index on a computed field:

collection film {

  compute actorsByFirstName = (.actors.map(item => item.name.first))

  index filmsByActor {
    terms [mva(.actorsByFirstName)]
  }
}

To find all films featuring an actor with a given first name, query the index:

// Optimized query
film.filmsByActor("Penelope") {
  title
}
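The `mva()` term indexes each element of the computed Array separately, which is why a single first name can match a film. The following sketch shows that idea as a plain inverted index. It is a hypothetical model, not Fauna's index implementation. `build_mva_index` is an illustrative name, and `actors_by_first_name` mirrors the computed field above.

```python
from collections import defaultdict
from typing import Dict, List


def actors_by_first_name(film: dict) -> List[str]:
    # Mirrors the computed field: one value per embedded actor.
    return [a["name"]["first"] for a in film["actors"]]


def build_mva_index(films: Dict[str, dict]) -> Dict[str, List[str]]:
    """Each Array element becomes its own index term pointing back at the film."""
    index = defaultdict(list)
    for film_id, film in films.items():
        for first in set(actors_by_first_name(film)):
            index[first].append(film_id)
    return dict(index)
```

A lookup such as `index["Penelope"]` goes straight to the matching film IDs. The unoptimized `where()` query, by contrast, inspects every film.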

Advantages	Disadvantages
Efficient for both reads & writes. Lowest latency and simplified operations for both.	Storage likely increases due to data duplication, although storage is cheap and the duplicated fields are small.
Querying is flexible. You can start a query from either side (film or actor) and filter by either. All fields for the returned Set are available from both sides.	Increased effort for updating values. If an actor’s name changes, you need to update every location that stores it.
The fewest operations for both reads and writes (1), and the least compute effort.	Write concurrency. Updating both actors and films with high concurrency could cause contention.

Embed an Array of references on one side of the relation

Instead of embedding the entire related document in the parent, store references to the related documents in the parent document.

film.create({
  title: "Giant",
  actors: [
    actor.byId("406683323649228873"),
    actor.byId("416683323649229986")
  ]
})

Querying the data is similar to the previous example:

// Find films whose `actors` Array contains the given actor reference
film.where(.actors.includes(actor.byId("406683323649228873"))) {
  title
}

Advantages	Disadvantages
Less modeling complexity than association tables.	Requires more read I/O operations to gather query data. Each actor selected into the result Set needs an additional read to gather its data, which inflates the number of reads for a query from 1 per film to 1 per film plus 1 per actor.
Overall storage should be about the same as an association table.	Indexing is no longer available on the embedded items’ raw values, which increases query complexity. For example, a query starting from the actor side needs a nested query pattern (subqueries).
Updating the foreign record (actor in this example) is independent and fast.
Data duplication is lower than in the basic embedded pattern.
Changing the Array of values in the parent document (the list of actors in a film, in this example) is efficient because far less data must be transferred and updated.
Allows the foreign data to change in the future (for example, adding fields such as middle name or place of birth to the actor’s data), unlike the basic embedded pattern.

Embed on both sides of the relation

Another potential pattern for modeling many-to-many relationships with Fauna is to embed Arrays of references into the documents on both sides of the relationship.

This approach is most suitable when relationships are relatively static. Its main drawback is data redundancy: each relationship is stored on both sides, so both documents must be updated to keep them in sync.

Your document structure would look like this:

// film document
{
  "id": "12323",
  "title": "The Great Adventure",
  "release_year": 2024,
  "genre": "Adventure",
  "actors": [
    Ref<actor>("122"),
    Ref<actor>("123")
  ]
}

// actor document
{
  "id": "122",
  "name": "John Smith",
  "birthdate": "1980-05-20",
  "films": [
    Ref<film>("12323"),
    Ref<film>("12324")
  ]
}
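Because each relationship is recorded twice, writes must update both sides together. The following sketch shows one way to do that and to detect one-sided links. `link` and `find_mismatches` are hypothetical helpers operating on plain dicts keyed by document ID. In Fauna, you would typically make both updates in the same query, since each query runs as a single transaction.

```python
from typing import Dict, List, Tuple


def link(films: Dict[str, dict], actors: Dict[str, dict], film_id: str, actor_id: str) -> None:
    """Record the relationship on both sides, skipping sides that already have it."""
    if actor_id not in films[film_id]["actors"]:
        films[film_id]["actors"].append(actor_id)
    if film_id not in actors[actor_id]["films"]:
        actors[actor_id]["films"].append(film_id)


def find_mismatches(films: Dict[str, dict], actors: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (film_id, actor_id) pairs that are recorded on only one side."""
    problems = []
    for fid, film in films.items():
        for aid in film["actors"]:
            if fid not in actors.get(aid, {}).get("films", []):
                problems.append((fid, aid))
    for aid, actor in actors.items():
        for fid in actor["films"]:
            if aid not in films.get(fid, {}).get("actors", []):
                problems.append((fid, aid))
    return problems
```

A write that appends a film to an actor without updating the film shows up as a mismatch. Calling `link` for the same pair repairs it.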