The ugly mug of the author

By Kai Sassnowski

GithubTwitter

I’m writing a book!

If you learned something from my blog or enjoy my style of writing you should check out the book I’m writing. It’s called Developer’s Guide to Database Indexing and will be stock full of all the things you should know about database indexing as a developer.

Sign up for the mailing list and be the first to know when the first chapters go online!

Check it out

System Versioning in MariaDB

Keeping track of changes to a resource is something a lot of applications have to deal with. It might be to help with debugging, allowing the user to jump between different revisions of a document, or keeping an audit log.

Document revisions, the manual way

When we implement an audit log, or any sort of change log, we're usually interested in three things:

  1. Who made the change
  2. What was changed
  3. When did it happen

One possible way to implement this in a Laravel application would be to fire an event after updating the resource in question. We could then register an event listener to update the audit log.

class DocumentsController
{
    public function update(
        UpdateDocumentRequest $request,
        Document $document
    ): RedirectResponse {
        $document->update($request->validated());

        event(new DocumentUpdated($document, $request->user(), now()));

        return redirect()
          ->back()
          ->with('success', __('Document updated successfully.'));
    }
}

Here's what the DocumentUpdated event looks like.

final class DocumentUpdated
{
    public function __construct(
        public Document $document,
        public User $user,
        public Carbon $changedAt
    ) {}
}

And here’s what a potential event listener might look like:

use App\Events\DocumentUpdated;
use App\Models\DocumentRevisions;
use Illuminate\Events\Dispatcher;

final class DocumentSubscriber
{
    public function subscribe(Dispatcher $events): void
    {
        $events->listen(
            DocumentUpdated::class,
            [DocumentSubscriber::class, 'onDocumentUpdated']
        );
    }

    public function onDocumentUpdated(DocumentUpdated $event): void
    {
        $document = $event->document;

        DocumentRevision::create([
            'user_id' => $event->user->id,
            'document_id' => $document->id,
            'title' => $document->title,
            'content' => $document->content,
            'changedAt' => $event->changedAt,
        ]);
    }
}

Whenever a document gets updated, the DocumentSubscriber adds another row to the document_revisions table. I only implemented a listener for the update method, but it would be easy to add listeners for the create and delete methods as well.

With this in place, we can query the database for past revisions of a document like this:

SELECT *
  FROM document_revisions
 WHERE document_id = 1;

Viewing all revisions made by a particular user is also really straight-forward. We simply filter on the user_id column.

SELECT *
  FROM document_revisions
 WHERE user_id = 1;

These are pretty boring queries, however. The cool thing about having a revision history like this is that it allows us to time travel! We can see what a document looked like at a given point in time. Unfortunately, this is also where our queries start to become a bit more finicky. Say we want to view the state of a document as it was on 12th February, 2021 at 1 PM. Here's what that query would look like:

  SELECT *
    FROM document_revisions
   WHERE document_id = 1
     AND changedAt <= '2021-02-12T13:00:00'
ORDER BY changedAt DESC
   LIMIT 1;

What we end up having to do is fetching all revisions that happened either before or exactly on that date. That’s not really what we want, however. We want to know the valid revision of a document at that point in time. There will only ever be at most one revision for any given point in time. So we order all revisions by their changedAt timestamp in descending order and pick the first one.

This is annoying because a LIMIT doesn't actually limit the number of rows the database as to look at. It still does the work of finding all records matching the WHERE clause. Then, at the very end, it discards everything but the first row. Talk about inefficient! Since we usually won't have thousands of revisions for a document, this probably won't be a problem. Still, it doesn't exactly feel elegant.

The other annoying thing is that the document_revisions table ends being basically a copy of the documents table. We also have to remember to update it every time some part of our code creates, changes or deletes a document.

If only there was a better way!

System-versioned tables

Turns out, if you're using MariaDB, there is a better solution! Well, at least a different solution. Starting from version 10.3.4, MariaDB supports System-Versioned Tables. System-Versioned tables are a kind of temporal table, that not only store the data which is currently valid, but also a history of all changes that were made to this table. It's basically like our homegrown audit log from above, but built into the database!

Here's how to create a table with system versioning.

CREATE TABLE documents (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `text` longtext NOT NULL,
  `last_edited_by` bigint(20) unsigned DEFAULT NULL,
  `start_timestamp` timestamp(6) GENERATED ALWAYS AS ROW START
  `end_timestamp` timestamp(6) GENERATED ALWAYS AS ROW END
  PERIOD FOR SYSTEM_TIME(start_timestamp, end_timestamp)
) WITH SYSTEM VERSIONING;

The table must have two generated columns, a period and a special WITH SYSTEM VERSIONING table option clause. The two generated columns in our example would be start_timestamp and end_timestamp. Since this is quite verbose, MariaDB also provides a handy shortcut:

CREATE TABLE documents (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `text` longtext NOT NULL,
  `last_edited_by` bigint(20) unsigned DEFAULT NULL
) WITH SYSTEM VERSIONING;

If we specify WITH SYSTEM VERSIONING, MariaDB will automatically create the required columns and period. These columns won't be visible in the table output however.

SELECT *
  FROM documents;
***************** 1. row *****************
            id: 1
         title: System Versioning with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:00:00

1 row in set (0.000 sec)

The table seemingly didn't change. Let’s make a change to a record in this table and see what happens.

UPDATE documents
   SET title = 'System-Versioned Tables with MariaDB'
 WHERE id = 1;

Now we select our record from the database again and see if anything cool happened.

SELECT *
  FROM documents
 WHERE id = 1;
***************** 1. row *****************
            id: 1
         title: System-Versioned Tables with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:15:00

1 row in set (0.000 sec)

Our row was changed but other than that nothing seems to be different to before we added system versioning. That’s because normal queries don’t work any differently on a system-versioned table. But we can now query the table for all versions of a particular record like so:

SELECT *
  FROM documents FOR SYSTEM_TIME ALL
 WHERE id = 1;
***************** 1. row *****************
            id: 1
         title: System Versioning with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:00:00
***************** 2. row *****************
            id: 1
         title: System-Versioned Tables with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:15:00

2 rows in set (0.000 sec)

By adding FOR SYSTEM_TIME ALL to our query, we get back all versions of a given document. Notice that even in this output the generated columns do not appear. If we want to include them in the output, we have to explicitly add them to the SELECT clause.

SELECT *, ROW_START, ROW_END
  FROM documents FOR SYSTEM_TIME ALL
 WHERE id = 1;
***************** 1. row *****************
            id: 1
         title: System Versioning with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:00:00
     ROW_START: 2021-02-24 07:00:00.092448
       ROW_END: 2021-02-24 07:15:00.841388
***************** 2. row *****************
            id: 1
         title: System-Versioned Tables with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:15:00
     ROW_START: 2021-02-24 07:15:00.841388
       ROW_END: 2038-01-19 04:14:07.999999

2 rows in set (0.000 sec)

Now we can see the ROW_START and ROW_END times for each row, telling us for which time period a version was valid.

Querying for a specific point in time is also incredibly easy. In fact, that’s one of the main use-cases of temporal tables! Say we want to know what this document looked like on the 24th of February, 2021 at 7:10am.

SELECT *
  FROM documents FOR SYSTEM_TIME AS OF TIMESTAMP '2021-02-24 07:10:00'
 WHERE id = 1;
***************** 1. row *****************
            id: 1
         title: System Versioning with MariaDB
          text: Keeping track of changes…
last_edited_by: 1
    created_at: 2021-02-24 07:00:00
    updated_at: 2021-02-24 07:00:00

1 row in set (0.000 sec)

Isn't this cool? We can now query the state of this document at any point in time without having to keep track of it manually! This means we can get rid of the document_revisions table entirely since MariaDB has got our back.

What happens if we query for a date before the document was even created?

SELECT *
  FROM documents FOR SYSTEM_TIME AS OF TIMESTAMP '2021-02-24 05:00:00'
 WHERE id = 1;
Empty set (0.000 sec)

No result, makes sense. Another cool thing is that this works even after we delete a record. It only deletes the record from the table but not the history so we can still query for historical data. If we query for a time after the record was deleted, we also get back an empty set.

Versioning specific columns

The example above kept track of changes to every column in the documents table. This might not always be desireable, however. If you want more control over when a revision gets created, you can version only specific columns using WITH SYSTEM VERSIONING in the column definition.

CREATE TABLE documents (
  `title` varchar(255) NOT NULL WITH SYSTEM VERSIONING,
  `text` longtext NOT NULL WITH SYSTEM VERSIONING,
  /* ... */
);

This would only create a new revision when either the title or text of a document changes. It is also possible to exclude certain columns using WITHOUT SYSTEM VERSIONING. Say we don't care if the title of a document gets changed.

CREATE TABLE documents (
  `title` varchar(255) NOT NULL WITHOUT SYSTEM VERSIONING,
  /* ... */
) WITH SYSTEM VERSIONING;

Note that when excluding columns, we still have to add the WITH SYSTEM VERSIONING option to the table declaration. When including specific columns, on the other hand, the table is implicitly system-versioned as soon as one column is versioned.

Removing history

Since keeping a record of all changes to records in a table can require a large amount of storage, we might want to clean up old history from time to time. To do so, we can write a query that deletes all history before a certain point in time.

DELETE
  FROM documents BEFORE SYSTEM_TIME '2021-02-01 00:00:00';

Note that this operation requires the DELETE HISTORY privilege.

It is, of course, also possible to completely remove system versioning from a table.

ALTER TABLE documents
       DROP SYSTEM VERSIONING;

A word on performance

While we no longer have to explicitly keep track of document revisions, they obviously still get stored somewhere. In fact, we saw that all revisions are part of the documents table itself. All queries by default only operate on the most recent version of a record so usually we don't have to worry about this.

One thing to keep in mind, however, is that adding system versioning to a table implicitly adds the row_end column to the table's primary key.

SHOW INDEX
FROM documents;
***************** 1. row *****************
        Table: articles
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 6
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
***************** 2. row *****************
        Table: articles
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 2
  Column_name: row_end
    Collation: A
  Cardinality: 6
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:

This means that queries on the document table—both table scans and index searches—take more time because they have to look through more data. To mitigate this, the documenation suggests storing the data separately using partitions. Check out the documentions if you’re interested in the details about this.

There’s always a catch, isn’t there?

If something sounds too good to be true, it unfortunately usually is and system-versioned tables are no exception to this. There are actually quite a few drawbacks and trade-offs you should be aware of before deciding to use system-versioned tables.

  1. MariaDB (not MySQL!) is the only FOSS database that supports this and only since version 10.3.4.
  2. By default, the history is not included in the database dump. So if you're used to running mysqldump to back up your tables, you need to look at other options. The MariaDB documentation mentions mariadb-enterprise-backup as a solution. The regular, non-enterprise version of mariabackup might also work, but I haven’t tested it.
  3. As mentioned above, performance might be impacted because we're keeping a lot of additional data around.

There is also the more general question about keeping this logic inside the code instead of the database. But that’s a whole other can of worms that I am not going to open here.

Conclusion

While not a silver bullet, system versioning in MariaDB is a really cool way to add an audit log to tables in our database. Be sure to check out the official documentation on this feature.

That’s it folks, that’s the post. I hope you found this useful or at least interesting. If you have any comments, feel free to reach out to me on Twitter.