The GENERATED column constraint
We saw certain limitations of the DEFAULT
column constraint clause, that it can't generate dynamic values based on other columns or some other dynamic popularities. For overcomming those kind of constraints, there is one more specific constraint, the GENERATED column constraint.
You can think of it as a DEFAULT
but having more than static values, it is evaluated and computed each time a new row is inserted. However they come with one catch, you can't later update the values, you need to update the values of the dependent columns first, so it will generate ALWAYS a update is made to the row(relevant columns in that row).
Let's create a simple table to understand the GENERATED
column constraint:
CREATE TABLE posts (
id INTEGER PRIMARY KEY,
content TEXT,
word_count INTEGER GENERATED ALWAYS AS (length(content) - length(replace(content, ' ', '')) + 1),
read_time_minutes INTEGER GENERATED ALWAYS AS (ceil(word_count / 200.0))
);
The syntax of the GENERATED
constraint is
column_name <type> GENERATED ALWAYS AS <expression>
Nothing fancy, just a content field as a text, then based on that value, the word_count
and read_time_minutes
is always calculated/generated based on the value of the content
and the read_time_minutes
is inturn dependent on the word_count
value. So, the change in content
will make a update on the word_count
that will inturn update the read_time_minutes
, like a chain reaction of sorts.
Let's insert a sample post, note, we can't insert a generated value.
INSERT INTO posts(content) VALUES('A sample post');
This will create a record with the id
as 1, the content
as A sample post
, and the word_count
will be generated/calculated as the (length(content) - length(replace(content, ' ', ''))+1)
as the number of spaces + 1. The read_time_minutes
is generated with the word_count
divided by 200 which is the average word read in a minute by a human (let's not go into the nity-grity) but you can get the point of this. These are just mathematical values, and those are dependent on the value of the other columns of the same record/row.
SELECT * FROM posts;
As, you can see here, the value of word_count
is 3
and the value of read_time_minutes
is 1
.
Let's insert a few more values and check how these generated values behave.
Let's insert a bit longer text.
INSERT INTO posts(content) VALUES('A SQLITE Post for the generated column constraint.');
SELECT * FROM posts;
As, you can see the value of the word_count
is 8
and the value of read_time_minutes
is 1
.
Let's update the content
and check if the word_count
and the read_time_minutes
are updated.
UPDATE posts SET content = content || 'New content
some more content' where id = 2;
SELECT * FROM posts;
As you can see the value of the word_count
is 11
and the value of read_time_minutes
is 1
still. That is because, until we write 200
words it will ceil off to 1.
Let's add a record with more than 200
words, and check if the read_time_minutes
is updated.
INSERT INTO posts(content) VALUES('New post');
SELECT * FROM posts;
The current value of the word_count
is 2
and the read_time_minutes
is 1
, let's update it by adding the value of Adding some extra words here.
.
UPDATE posts
SET content = content || replace(
printf('%200c', ' '),
' ',
' Adding some extra words here.'
)
WHERE id = 3;
Ok, this might be too much, but let's break it down:
- First, we are updating the
content
by adding the value ofAdding some extra words here.
to the end of the existing value ofcontent
. - In the replace function, the parameters are original string, the value to replace, and the value to replace it with.
- So in this case, we are taking the string
printf('%200c', ' ')
, which is a string of 200 spaces, basically adding a single space character, but it will be right-justified within a field of 200 characters. - The second parameter is the character to replace, in this case it is single white space or
' '
character. - The third parameter is the text to replace that
' '
space character with, here it isAdding some extra words here
. - So, essentially we are taking
200
white spaces and addingAdding some extra words here.
for each space to the end of the existing value ofcontent
.
Hence we will have around ~200 * 5
words i.e. 1000 words, this way we can check the value of the read_time_minutes
is updated.
SELECT * FROM posts;
As you can see the value of the word_count
is 1002
and the value of read_time_minutes
is 6
. That is perfect. And it is getting updated for each updation to the content
column.
Here's a step by step output log: Its a SQLog :)
sqlite> CREATE TABLE posts (
id INTEGER PRIMARY KEY,
content TEXT,
word_count INTEGER GENERATED ALWAYS AS (length(content) - length(replace(content, ' ', '')) + 1),
read_time_minutes INTEGER GENERATED ALWAYS AS (ceil(word_count / 200.0))
);
sqlite> .mode table
sqlite> INSERT INTO posts(content) VALUES('A sample post');
sqlite> SELECT * FROM posts;
+----+---------------+------------+-------------------+
| id | content | word_count | read_time_minutes |
+----+---------------+------------+-------------------+
| 1 | A sample post | 3 | 1 |
+----+---------------+------------+-------------------+
sqlite> INSERT INTO posts(content) VALUES('A SQLITE Post for the generated column constraint.');
sqlite> SELECT * FROM posts;
+----+----------------------------------------------------+------------+-------------------+
| id | content | word_count | read_time_minutes |
+----+----------------------------------------------------+------------+-------------------+
| 1 | A sample post | 3 | 1 |
| 2 | A SQLITE Post for the generated column constraint. | 8 | 1 |
+----+----------------------------------------------------+------------+-------------------+
sqlite> UPDATE posts SET content = content || 'New content
' ...> some more content' where id = 2;
sqlite> SELECT * FROM posts;
+----+--------------------------------------------------------------+------------+-------------------+
| id | content | word_count | read_time_minutes |
+----+--------------------------------------------------------------+------------+-------------------+
| 1 | A sample post | 3 | 1 |
+----+--------------------------------------------------------------+------------+-------------------+
| 2 | A SQLITE Post for the generated column constraint.New conten | 11 | 1 |
| | t | | |
| | some more content | | |
+----+--------------------------------------------------------------+------------+-------------------+
sqlite> INSERT INTO posts(content) VALUES('New post');
sqlite> SELECT * FROM posts;
+----+--------------------------------------------------------------+------------+-------------------+
| id | content | word_count | read_time_minutes |
+----+--------------------------------------------------------------+------------+-------------------+
| 1 | A sample post | 3 | 1 |
+----+--------------------------------------------------------------+------------+-------------------+
| 2 | A SQLITE Post for the generated column constraint.New conten | 11 | 1 |
| | t | | |
| | some more content | | |
+----+--------------------------------------------------------------+------------+-------------------+
| 3 | New post | 2 | 1 |
+----+--------------------------------------------------------------+------------+-------------------+
sqlite> UPDATE posts SET content = content || replace(printf('%200c', ' '), ' ', ' Adding some extra words here.') WHERE id = 3;
sqlite> SELECT * FROM posts;
+----+--------------------------------------------------------------+------------+-------------------+
| id | content | word_count | read_time_minutes |
+----+--------------------------------------------------------------+------------+-------------------+
| 1 | A sample post | 3 | 1 |
+----+--------------------------------------------------------------+------------+-------------------+
| 2 | A SQLITE Post for the generated column constraint.New conten | 11 | 1 |
| | t | | |
| | some more content | | |
+----+--------------------------------------------------------------+------------+-------------------+
| 3 | New post Adding some extra words here. Adding some extra wor | 1002 | 6 |
| | ds here. Adding some extra words here. Adding some extra wor | | |
| | ds here. Adding some extra words here. Adding some extra wor | | |
| |..............................................................| | |
| | ds here. Adding some extra words here. Adding some extra wor | | |
| | ds here. | | |
+----+--------------------------------------------------------------+------------+-------------------+
Generated columns cannot be updated manually
If you tried to update the columns with GENERATED ALWAYS
constraint, you will get a constraint failed
error. As the constraint itself says, it is GENERATED ALWAYS, hence not to be updated or inserted.
UPDATE posts SET word_count = 10 WHERE id = 1;
It will fail with cannot UPDATE generated column "word_count"
as mentioned.
UPDATE posts SET read_time_minutes = 10 WHERE id = 1;
Similarly, this will also fail, as it is also a GENERATED ALWAYS
constrained column.
sqlite> UPDATE posts SET word_count = 10 WHERE id = 1;
Parse error: cannot UPDATE generated column "word_count"
sqlite> UPDATE posts SET read_time_minutes = 10 WHERE id = 1;
Parse error: cannot UPDATE generated column "read_time_minutes"
STORED VS VIRTUAL
Now, the Generated columns can be STORED
as in stored in the database to the disk file, or they could be VIRTUAL
as in computed (generated,calculated each time). There are trade-offs and I will definitely add the meme here IT ACTUALLY DEPENDS
!
The STORED
column will take up space but the queries will be quick, since it doesn't need to calculate it each time, only it needs to do it at each updation (that can't be avoided).
The VIRTUAL
column will not take up space but the queries will be a little slower, depending on the number of data to compute, since it doesn't store the column values, it will have to compute each time and it as well need to update it each time.
The default is
VIRTUAL
if not specified.
The commonality of STORED
and VIRTUAL
is that they are both updated at the updation of the column it relates to or the updation of the row, the only difference is that one stores it the other calculates it for each query made to it.
We'll see the difference how is that the case, it generates each time in the case of the VIRTUAL
column GENERATED
constraint.
I'll show the most basic example continuing with the same table, but with the VIRTUAL
clause on the GENERATED ALWAYS
constraint.
Virtual generated column
By default, it is VIRTUAL
it doesn't matter if you add it or not (let's add it to make it clear).
I'll specify the word_count
as the VIRTUAL
column with the GENERATED ALWAYS
constraint with the same formula to get the word count and at the end specify the VIRTUAL
keyword (its optional).
Let's skip the
read_time_minutes
for now, you can add it won't make it any difference (it was a little extreme example to use 200 words to update its value)
CREATE TABLE posts_virtual (
id INTEGER PRIMARY KEY,
content TEXT,
word_count INTEGER GENERATED ALWAYS AS (
length(content) - length(replace(content, ' ', '')) + 1
) VIRTUAL
);
Now, we'll add some data to the table and check the value of the word_count
column.
INSERT INTO posts_virtual(content) VALUES('A sample post');
INSERT INTO posts_virtual(content) VALUES('A SQLITE Post for the virtual generated column constraint.');
SELECT * FROM posts_virtual;
This we already know, the plain old word_count
is 3
and 9
for the two posts respectively.
But let's see how is the word_count
column is actually stored and how slow or fast it actually is.
PRAGMA table_info(posts_virtual);
HH? There is no word_count
column here, it is not stored, it is a VIRTUAL
column.
sqlite> PRAGMA table_info(posts_virtual);
+-----+---------+---------+---------+------------+----+
| cid | name | type | notnull | dflt_value | pk |
+-----+---------+---------+---------+------------+----+
| 0 | id | INTEGER | 0 | | 1 |
| 1 | content | TEXT | 0 | | 0 |
+-----+---------+---------+---------+------------+----+
sqlite>
What about extra info?
PRAGMA table_xinfo(posts_virtual);
Yes, that is indeed there, and its hidden? Nope, its generated
a special type.
0
is not hidden1
is hidden2
is generated
sqlite> PRAGMA table_xinfo(posts_virtual);
+-----+------------+---------+---------+------------+----+--------+
| cid | name | type | notnull | dflt_value | pk | hidden |
+-----+------------+---------+---------+------------+----+--------+
| 0 | id | INTEGER | 0 | | 1 | 0 |
| 1 | content | TEXT | 0 | | 0 | 0 |
| 2 | word_count | INTEGER | 0 | | 0 | 2 |
+-----+------------+---------+---------+------------+----+--------+
sqlite>
It's time to benchmark it then, let's bulk insert 100000 posts.
WITH RECURSIVE cnt(x) AS (
SELECT 1
UNION ALL
SELECT x+1 FROM cnt WHERE x < 100000
)
INSERT INTO posts_virtual(content)
SELECT 'This is a sample post number ' || x || ' with some words repeated multiple times.'
FROM cnt;
If we see now the posts_virtual
table has over 100000 rows.
SELECT COUNT(*) FROM posts_virtual;
We already had 2
records, and we added 100000
more, so the total is 100002
.
Let's check the sum of the word_count
to be sure it inserted it right.
SELECT SUM(word_count) FROM posts_virtual;
Ok that shows the sum of the word_count
is 1300012
.
Let's check with the timer on and evaluate the query.
sqlite> WITH RECURSIVE cnt(x) AS (
SELECT 1
UNION ALL
SELECT x+1 FROM cnt WHERE x < 100000
)
INSERT INTO posts_virtual(content)
SELECT 'This is a sample post number ' || x || ' with some words repeated multiple times.'
FROM cnt;
sqlite> PRAGMA table_xinfo(posts_virtual);
+-----+------------+---------+---------+------------+----+--------+
| cid | name | type | notnull | dflt_value | pk | hidden |
+-----+------------+---------+---------+------------+----+--------+
| 0 | id | INTEGER | 0 | | 1 | 0 |
| 1 | content | TEXT | 0 | | 0 | 0 |
| 2 | word_count | INTEGER | 0 | | 0 | 2 |
+-----+------------+---------+---------+------------+----+--------+
sqlite> .timer on
sqlite> SELECT COUNT(*) FROM posts_virtual;
+----------+
| COUNT(*) |
+----------+
| 100002 |
+----------+
Run Time: real 0.001 user 0.000352 sys 0.000103
sqlite> SELECT SUM(word_count) FROM posts_virtual;
+-----------------+
| SUM(word_count) |
+-----------------+
| 1300012 |
+-----------------+
Run Time: real 0.041 user 0.040682 sys 0.000418
sqlite> PRAGMA page_count;
+------------+
| page_count |
+------------+
| 2086 |
+------------+
Run Time: real 0.000 user 0.000000 sys 0.000078
sqlite> PRAGMA page_size;
+-----------+
| page_size |
+-----------+
| 4096 |
+-----------+
Run Time: real 0.000 user 0.000025 sys 0.000006
sqlite> UPDATE posts_virtual SET content = content || ' extra text' WHERE id % 1000 = 0;
Run Time: real 0.010 user 0.009353 sys 0.000161
I think we'll leave it here and move onto creating the STORED
column as a generated value
Stored generated column
Similar to the VIRTUAL
column, we'll create the STORED
column with the GENERATED ALWAYS
constraint.
This will be the same table named as posts_stored
but with the word_count
being a STORED
column rather than a VIRTUAL
column.
CREATE TABLE posts_stored (
id INTEGER PRIMARY KEY,
content TEXT,
word_count INTEGER GENERATED ALWAYS AS (
length(content) - length(replace(content, ' ', '')) + 1
) STORED
);
Now, we'll add some data to the table.
INSERT INTO posts_stored(content) VALUES('A sample post');
INSERT INTO posts_stored(content) VALUES('A SQLITE Post for the virtual generated column constraint.');
SELECT * FROM posts_stored;
This is cool, working as expected, we have already seen it, we need to take a look at how this values are stored now.
Let's take a look at the table_info
for the posts_stored
table.
PRAGMA table_info(posts_stored);
It still doesn't have the GENERATED
column word_count
.
Let's take a look at the table_xinfo
for the posts_stored
table.
PRAGMA table_xinfo(posts_stored);
Ok, wiredly it has the GENERATED ALWAYS
column word_count
as well but the hidden
value is 3
.
We knew it was 2
for hidden but 3
is for? You guessed it right, stored and generated column.
Now, let's also populate the posts_stored
table with some data, bulk inserting the data.
Let's add 100000
rows in it.
WITH RECURSIVE cnt(x) AS (
SELECT 1
UNION ALL
SELECT x+1 FROM cnt WHERE x < 100000
)
INSERT INTO posts_stored(content)
SELECT 'This is a sample post number ' || x || ' with some words repeated multiple times.'
FROM cnt;
This is the same query except its inserting in the posts_stored
table.
Let's check if it has inserted the right number of rows.
SELECT COUNT(*) FROM posts_stored;
Indeed there are 100000
rows in the posts_stored
table.
Now, let's verify that the inserted values are rightly populated.
SELECT SUM(word_count) FROM posts_stored;
This gives 1300012
as the sum of the word_count
from the posts_stored
table. Which matches the sum of the word_count
from the posts_virtual
table.
sqlite> CREATE TABLE posts_stored (
id INTEGER PRIMARY KEY,
content TEXT,
word_count INTEGER GENERATED ALWAYS AS (
length(content) - length(replace(content, ' ', '')) + 1
) STORED
);
Run Time: real 0.000 user 0.000258 sys 0.000027
sqlite> INSERT INTO posts_stored(content) VALUES('A sample post');
sqlite> INSERT INTO posts_stored(content) VALUES('A SQLITE Post for the virtual generated column constraint.');
SELECT * FROM posts_stored;
+----+------------------------------------------------------------+------------+
| id | content | word_count |
+----+------------------------------------------------------------+------------+
| 1 | A sample post | 3 |
| 2 | A SQLITE Post for the virtual generated column constraint. | 9 |
+----+------------------------------------------------------------+------------+
Run Time: real 0.000 user 0.000224 sys 0.000000
sqlite> PRAGMA table_info(posts_stored);
+-----+---------+---------+---------+------------+----+
| cid | name | type | notnull | dflt_value | pk |
+-----+---------+---------+---------+------------+----+
| 0 | id | INTEGER | 0 | | 1 |
| 1 | content | TEXT | 0 | | 0 |
+-----+---------+---------+---------+------------+----+
sqlite> PRAGMA table_xinfo(posts_stored);
+-----+------------+---------+---------+------------+----+--------+
| cid | name | type | notnull | dflt_value | pk | hidden |
+-----+------------+---------+---------+------------+----+--------+
| 0 | id | INTEGER | 0 | | 1 | 0 |
| 1 | content | TEXT | 0 | | 0 | 0 |
| 2 | word_count | INTEGER | 0 | | 0 | 3 |
+-----+------------+---------+---------+------------+----+--------+
Run Time: real 0.000 user 0.000013 sys 0.000322
sqlite> SELECT COUNT(*) FROM posts_stored;
+----------+
| COUNT(*) |
+----------+
| 2 |
+----------+
Run Time: real 0.000 user 0.000108 sys 0.000012
sqlite> WITH RECURSIVE cnt(x) AS (
SELECT 1
UNION ALL
SELECT x+1 FROM cnt WHERE x < 100000
)
INSERT INTO posts_stored(content)
SELECT 'This is a sample post number ' || x || ' with some words repeated multiple times.'
FROM cnt;
Run Time: real 0.071 user 0.067858 sys 0.002929
sqlite> SELECT COUNT(*) FROM posts_stored;
+----------+
| COUNT(*) |
+----------+
| 100002 |
+----------+
Run Time: real 0.000 user 0.000367 sys 0.000000
sqlite> SELECT SUM(word_count) FROM posts_stored;
+-----------------+
| SUM(word_count) |
+-----------------+
| 1300012 |
+-----------------+
Run Time: real 0.008 user 0.006492 sys 0.000941
This is all setup for the comparison of both the tables.
The difference
Now, let's check the difference between the STORED
and VIRTUAL
column.
Let's sum the word_count
from each of the tables posts_stored
and posts_virtual
First, lets set the timer on, which will give the breakdown of the time for each ran query.
.timer on
The timer will give three time durations, the real, user and the system. The breakdown is like this:
- real time : The full clock time form the start to the end of the query
- user time : The CPU time in the user space (I/O) operations on the sqlite side, computations on the obtained data, or preprocessing the data before inserting.
- system time: The CPU time in the kernel space (I/O) operations on the host side, it could be reading or writing to the actual db file, etc.
And then run the query for the posts_stored
table.
SELECT SUM(word_count) FROM posts_stored;
This is giving the result in around ~0.008
seconds might be approximately 0.01
in certain cases.
We can even get the breakdown of the time for each ran query.
- full query execution time:
0.008
- user time:
0.006492
- system time:
0.000941
So, here in this case, the system and user time are around the same, however there is some system time for reading from the database file.
And then run the query for the posts_virtual
table.
SELECT SUM(word_count) FROM posts_virtual;
This is giving the result in around ~0.039
seconds might be approximately 0.04
in certain cases.
We can even get the breakdown of the time for each ran query.
- full query execution time:
0.039
- user time:
0.039366
- system time:
0.000000
this can go sometimes to0.000652
, almost negligible, but if it happens, in that case the full query time will increase by a bit.
So, it is taking a quite lot of time in the user space by the CPU right? And almost 0 time in the kernel level (disk operation) to read from the database file.
That is so clear, as this is a virtual column, and we are only reading a single column, and that too is a virtual column, we don't need to hit a read to the database file. Clever right? But that costs some CPU to compute the sum for the 100002
rows.
This is the trade-off
- Time spent on the CPU on the computation vs the time spent on the disk operation to read the data.
- These can be different depending on the query.
sqlite> SELECT SUM(word_count) FROM posts_stored;
+-----------------+
| SUM(word_count) |
+-----------------+
| 1300012 |
+-----------------+
Run Time: real 0.008 user 0.006492 sys 0.000941
sqlite> SELECT SUM(word_count) FROM posts_virtual;
+-----------------+
| SUM(word_count) |
+-----------------+
| 1300012 |
+-----------------+
Run Time: real 0.039 user 0.039366 sys 0.000000
You can clearly see the difference, the STORED
column is almost 4
to 5
times faster than the VIRTUAL
column.
Key observations
- If you have lot of processing to do in the generated column, better to go with the
STORED
column. - If you have a lot of data to compute and not necessarily complex, better to go with the
VIRTUAL
column. - It really depends on the data too, and we cannot say which one is better over the other without knowing the structure and the design of the queries.
There are lot of details as well beyond this, it will require specific guides on how to do certain things and busting certain myths in the documentation.