The PRIMARY KEY column constraint
The PRIMARY KEY constraint is not actually a column constraint, it is a table constraint. But in this section, we'll only learn about how to handle single column-level PRIMARY KEY
.
The PRIMARY KEY column constraint is a constraint that ensures that a column contains unique values and is the PRIMARY
way to distinguish between all the rows of that table.
From the documentation:
Each table in SQLite may have at most one PRIMARY KEY. If the keywords PRIMARY KEY are added to a column definition, then the primary key for the table consists of that single column. Or, if a PRIMARY KEY clause is specified as a table-constraint, then the primary key of the table consists of the list of columns specified as part of the PRIMARY KEY clause. The PRIMARY KEY clause must contain only column names — the use of expressions in an indexed-column of a PRIMARY KEY is not supported. An error is raised if more than one PRIMARY KEY clause appears in a CREATE TABLE statement. The PRIMARY KEY is optional for ordinary tables but is required for WITHOUT ROWID tables. If a table has a single column primary key and the declared type of that column is "INTEGER" and the table is not a WITHOUT ROWID table, then the column is known as an INTEGER PRIMARY KEY. See below for a description of the special properties and behaviors associated with an INTEGER PRIMARY KEY. Each row in a table with a primary key must have a unique combination of values in its primary key columns. For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs. If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows have identical primary key values, that is a constraint violation. According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY or the table is a WITHOUT ROWID table or a STRICT table or the column is declared NOT NULL, SQLite allows NULL values in a PRIMARY KEY column. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most PRIMARY KEY columns.
Here is what I think in short:
- Used to identify the rows of a table. Hence must be unique
- At most one primary key per table (default rowid, multiple primary key not allowed)
- Primary key can be a combination of multiple columns or even single column
- Only column names allowed (not expressions)
- Required for WITHOUT ROWID
- INTEGER PRIMARY KEY special behavior
- NULL behavior (SQLite quirk vs SQL standard)
Creating a simple table
By default, if you don't create a column in a table as a PRIMARY KEY
, it is automatically assumed that rowid
will be used as the unique identifier for that table.
Let's create a simple table with columns like username
and email
, non of which are PRIMARY KEY
explicitly mentioned.
CREATE TABLE users (
username TEXT NOT NULL,
email TEXT NOT NULL
);
Now, let's insert a few columns and check what we can observe.
INSERT INTO users(username, email) VALUES('jim', 'jim@abc.com');
INSERT INTO users(username, email) VALUES('jack', 'jack@abc.com');
SELECT * FROM users;
As, you can see, we have populated the table with username
and email
columns.
But wait, there should be rowid
right? It is there but its a hidden column (do not touch it).
SELECT rowid, * FROM users;
You can see the rowid
column in the output.
sqlite> CREATE TABLE users (
username TEXT NOT NULL,
email TEXT NOT NULL
);
sqlite> .mode table
sqlite> INSERT INTO users(username, email) VALUES('jim', 'jim@abc.com');
sqlite> INSERT INTO users(username, email) VALUES('jack', 'jack@abc.com');
sqlite> SELECT * FROM users;
+----------+--------------+
| username | email |
+----------+--------------+
| jim | jim@abc.com |
| jack | jack@abc.com |
+----------+--------------+
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 2 | jack | jack@abc.com |
+-------+----------+--------------+
sqlite>
The rowid is used as a primary key for the table, this means it will take the max of the current state of the table and use the next number as its rowid.
We can verify that if we try to insert a duplicate value/record/row in the users
table:
Let's insert the jack
as the user and check if it inserts it, as it would have errored if the username or the email was the primary key, as this value is duplicated. But the username
and email
are not the PRIMARY KEY
in this table, hence it will happily insert it.
INSERT INTO users(username, email) VALUES('jack', 'jack@abc.com');
SELECT rowid, * FROM users;
As, you can see it inserted the value and the rowid
is now 3
.
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 2 | jack | jack@abc.com |
+-------+----------+--------------+
sqlite> INSERT INTO users(username, email) VALUES('jack', 'jack@abc.com');
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 2 | jack | jack@abc.com |
| 3 | jack | jack@abc.com |
+-------+----------+--------------+
sqlite>
What would happen if the middle values are deleted?
let's delete the row with rowid
2
:
DELETE FROM users WHERE rowid = 2;
SELECT rowid, * FROM users;
As, you can see, the rowid
2
is deleted. The rowid
will still take the max of the current state of the table, and will use the next number as the rowid
which in this state should be 4
.
Let's insert a new record and check if the rowid
is the max of the current state of the table.
INSERT INTO users(username, email) VALUES('jill', 'jill@abc.com');
SELECT rowid, * FROM users;
As you can see, the rowid
is now 4
.
Let's delete that and insert one more.
DELETE FROM users WHERE rowid = 4;
SELECT rowid, * FROM users;
Now, that the max row of the table is 3
but we had 4
earlier, we need to check what could be the next rowid?
INSERT INTO users(username, email) VALUES('jill', 'jill@abc.com');
SELECT rowid, * FROM users;
Ok, as you can see, the rowid
is now 4
. It is reusing the previous rowid
value, which might be an issue, very rarely but this is a expected behaviour.
Which gives us the answer that there is maybe a quirk or a limitation of rowid
that it can insert values which are max of the rowid in the current state.
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 2 | jack | jack@abc.com |
| 3 | jack | jack@abc.com |
+-------+----------+--------------+
sqlite> DELETE FROM users WHERE rowid = 2;
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 3 | jack | jack@abc.com |
+-------+----------+--------------+
sqlite> INSERT INTO users(username, email) VALUES('jill', 'jill@abc.com');
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 3 | jack | jack@abc.com |
| 4 | jill | jill@abc.com |
+-------+----------+--------------+
sqlite> DELETE FROM users WHERE rowid = 4;
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 3 | jack | jack@abc.com |
+-------+----------+--------------+
sqlite> INSERT INTO users(username, email) VALUES('joe', 'joe@abc.com');
sqlite> SELECT rowid, * FROM users;
+-------+----------+--------------+
| rowid | username | email |
+-------+----------+--------------+
| 1 | jim | jim@abc.com |
| 3 | jack | jack@abc.com |
| 4 | joe | joe@abc.com |
+-------+----------+--------------+
sqlite>
Ok, that was a wired tangent, but worth noting.
We will check how to fix it with the AUTOINCREMENT
clause in the future posts.
Now, let's create a PRIMARY KEY
column that we can safely call a primary key.
Creating a column as primary key
To create a column as a PRIMARY KEY
we need to simply add the PRIMARY KEY
constraint to the column.
Let's create a simple table with columns like username
and email
, and here email
is the primary key.
CREATE TABLE users (
username TEXT NOT NULL,
email TEXT PRIMARY KEY
);
Mentioning PRIMARY KEY
with email means that each record needs to have a unique email
.
If you attach a PRIMARY KEY
to a column, the UNIQUE
constraint is also added automatically. So, saying email TEXT PRIMARY KEY UNIQUE
is redundant.
Let's try inserting default values to the users.
INSERT INTO users DEFAULT VALUES;
As you can see, this will fail as the username
is a NOT NULL
column.
OK, let's just satisfy that NOT NULL
constraint for username.
INSERT INTO users(username) VALUES('james');
OK, that inserted the value.
SELECT * FROM users;
But, the email is empty! Becuase we didn't specify it? Yes, it is a text field, and a primary key, so you need to handle that. What about the rowid?
SELECT rowid, * FROM users;
As you can see, the rowid is 1
and email is empty.
That's worth noting that rowid
will be populated independently of the PRIMARY KEY
column. But if speicifed as WITHOUT ROWID
in the table creation, it won't be there.
Now, let's insert the same record again. We have said the email
is a PRIMARY KEY
, so it will be unique right? RIGHT?
INSERT INTO users(username) VALUES('james');
Surprisingly, it inserted that too.
SELECT * FROM users;
The email is empty for both the rows, but the username is same, so the email
is NULL, and you guessed it right, NULL
is identified as UNIQUE
. That is a bad schema design. The users table had a text key as a primary key, and we need to add NOT NULL
to the PRIMARY KEY
column.
sqlite> CREATE TABLE users (
username TEXT NOT NULL,
email TEXT PRIMARY KEY
);
sqlite> INSERT INTO users DEFAULT VALUES;
Runtime error: NOT NULL constraint failed: users.username (19)
sqlite> INSERT INTO users(username) VALUES('james');
sqlite> SELECT * FROM users;
+----------+-------+
| username | email |
+----------+-------+
| james | |
+----------+-------+
sqlite> INSERT INTO users(username) VALUES('james');
sqlite> SELECT * FROM users;
+----------+-------+
| username | email |
+----------+-------+
| james | |
| james | |
+----------+-------+
sqlite> INSERT INTO users(username) VALUES('jack');
sqlite> SELECT * FROM users;
+----------+-------+
| username | email |
+----------+-------+
| james | |
| james | |
| jack | |
+----------+-------+
sqlite>
NOTE: NOT NULL is only added automatically for
INTEGER PRIMARY KEY
column, which means the column becomes an alias for therowid
table.
PRIMARY KEY with NOT NULL
It turns out that NOT NULL
is only added automatically for INTEGER PRIMARY KEY
column, so for rest of the combination of PRIMARY KEY
columns we need to explicitly add the NOT NULL
constraint.
So, let's add that:
DROP TABLE users;
CREATE TABLE users (
username TEXT NOT NULL,
email TEXT PRIMARY KEY NOT NULL
);
Now, let's try inserting default values to the users.
INSERT INTO users DEFAULT VALUES;
As you can see, this will fail as the username
is a NOT NULL
column. As we knew in the first attempt.
INSERT INTO users(username) VALUES('james');
OK, that is what we expected, it should fail, as have not specified the email
column, as that is a primary key and it cannot be null. I mean it could be if we wanted to, but that is not expected to behave.
INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
Ok, this should insert it successfully.
SELECT rowid, * FROM users;
As you can see, we have populated the table with username
and email
columns. This is the perfect thing to setup.
The rowid
will still be there as discussed, the email
now is the PRIMARY KEY
which means any duplicate string email will be constraint violation.
INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
As you can see, this will fail, as we have already inserted a record with the same email. This is expected as we noted that the PRIMARY KEY
automatically adds the UNIQUE
constraint to the column.
Similary, we can check a few other things.
Like if we try to insert a record with the same username but different email:
INSERT INTO users(username, email) VALUES('james', 'james_new@abc.com');
This will succeed, as we have not specified the username
as primary key, and it is ok to not have unique values for it.
sqlite> CREATE TABLE users (
username TEXT NOT NULL,
email TEXT PRIMARY KEY NOT NULL
);
sqlite> INSERT INTO users DEFAULT VALUES;
Runtime error: NOT NULL constraint failed: users.username (19)
sqlite> INSERT INTO users(username) VALUES('james');
Runtime error: NOT NULL constraint failed: users.email (19)
sqlite> INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
sqlite> SELECT rowid, * FROM users;
+-------+----------+---------------+
| rowid | username | email |
+-------+----------+---------------+
| 1 | james | james@abc.com |
+-------+----------+---------------+
sqlite> INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
Runtime error: UNIQUE constraint failed: users.email (19)
sqlite> INSERT INTO users(username, email) VALUES('james', 'james_new@abc.com');
sqlite> SELECT rowid, * FROM users;
+-------+----------+-------------------+
| rowid | username | email |
+-------+----------+-------------------+
| 1 | james | james@abc.com |
| 2 | james | james_new@abc.com |
+-------+----------+-------------------+
sqlite> INSERT INTO users(username, email) VALUES('jill', 'james_new@abc.com');
Runtime error: UNIQUE constraint failed: users.email (19)
sqlite> INSERT INTO users(username, email) VALUES('jill', 'jill@abc.com');
sqlite> SELECT rowid, * FROM users;
+-------+----------+-------------------+
| rowid | username | email |
+-------+----------+-------------------+
| 1 | james | james@abc.com |
| 2 | james | james_new@abc.com |
| 3 | jill | jill@abc.com |
+-------+----------+-------------------+
sqlite>
That is how we primary key works, atleast the basic of it.
INTEGER PRIMARY KEY
This is a special case where the column defined as PRIMARY KEY
with INTEGER type becomes an alias to the rowid
and hence we will have both UNIQUE
and NOT NULL
constraints added to it.
Let's create a table with INTEGER PRIMARY KEY
column.
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
username TEXT NOT NULL,
email TEXT NOT NULL
);
In this table users
we have user_id
which is INTEGER PRIMARY KEY
column. It is an alias for the rowid
table as we'll see.
Now, let's insert some data and check what we can observe.
INSERT INTO users DEFAULT VALUES;
Obivously, this will fail, as we have not specified the username
and email
as NOT NULL
, and we haven't specified in the insertion.
Let's specify both and check if it can violate any constraints.
INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
As you can see, we successfully inserted the record, as we have specified the username
and email
as NOT NULL
and the primary key which is user_id
and it will automatically get populated. The user_id
will be automatically populated as it will be a alias to the rowid
column, and we know it will populate automatically for each inserted value for each row, by incrementing the max row id at that point.
SELECT * FROM users;
As you can see, the user_id
is 1
and username
is james
and email
is james@abc.com
. But what is rowid
?
Let's check that too.
SELECT rowid, * FROM users;
As you can see, the user_id
is 1
but the rowid has become the user_id
which is basically an alias for the rowid
.
Let's see if we can add more records.
Let's try adding the same record again.
INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
Interestingly, this works, because we have not specified the user_id
as PRIMARY KEY
and inserting duplicate username and email should not violate any constraints unless we have specified as UNIQUE
or others.
sqlite> CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
username TEXT NOT NULL,
email TEXT NOT NULL
);
sqlite> INSERT INTO users DEFAULT VALUES;
Runtime error: NOT NULL constraint failed: users.username (19)
sqlite> INSERT INTO users(username) VALUES('james');
Runtime error: NOT NULL constraint failed: users.email (19)
sqlite> INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
sqlite> SELECT * FROM users;
+---------+----------+---------------+
| user_id | username | email |
+---------+----------+---------------+
| 1 | james | james@abc.com |
+---------+----------+---------------+
sqlite> SELECT rowid, * FROM users;
+---------+---------+----------+---------------+
| user_id | user_id | username | email |
+---------+---------+----------+---------------+
| 1 | 1 | james | james@abc.com |
+---------+---------+----------+---------------+
sqlite> INSERT INTO users(username, email) VALUES('james', 'james@abc.com');
sqlite> SELECT rowid, * FROM users;
+---------+---------+----------+---------------+
| user_id | user_id | username | email |
+---------+---------+----------+---------------+
| 1 | 1 | james | james@abc.com |
| 2 | 2 | james | james@abc.com |
+---------+---------+----------+---------------+
sqlite> INSERT INTO users(username, email) VALUES('jill', 'jill@abc.com');
sqlite> SELECT rowid, * FROM users;
+---------+---------+----------+---------------+
| user_id | user_id | username | email |
+---------+---------+----------+---------------+
| 1 | 1 | james | james@abc.com |
| 2 | 2 | james | james@abc.com |
| 3 | 3 | jill | jill@abc.com |
+---------+---------+----------+---------------+
sqlite>
So that is how INTEGER PRIMARY KEY
works.
You can add PRIMARY KEY
to any column, but you need to ensure that it doesn't violate the constraint of allowing duplicate enties.
We know that the PRIMARY KEY
is not a column specific constraint. It is table wide constraint, we'll check out the table wide constraint that is combinational constraint in the next section.