The MySQL DISTINCT
clause (or keyword) is used to remove duplicate values from the result set of your query, making sure that all rows in the result set are unique records.
To use the keyword, you just need to specify the columns next to the DISTINCT
keyword as follows:
SELECT DISTINCT [column], [column] FROM [table];
You can query as many columns as you need just like a normal SELECT
statement.
Let’s see an example of DISTINCT
in action. Suppose you have a table called members
with the following data:
+----+------------+------+
| id | first_name | age |
+----+------------+------+
| 1 | Jessie | 23 |
| 2 | Ann | 25 |
| 3 | Joe | 22 |
| 4 | Ann | 23 |
| 5 | Peter | 29 |
+----+------------+------+
As you can see, there are some duplicates in the first_name
and age
columns.
You can use the DISTINCT
keyword to filter the duplicate values as shown below:
SELECT DISTINCT first_name FROM members;
-- +------------+
-- | first_name |
-- +------------+
-- | Jessie |
-- | Ann |
-- | Joe |
-- | Peter |
-- +------------+
The first_name
query above returns no duplicate row. The DISTINCT
keyword can be used with columns of any data type.
The keyword will also take NULL
value into account when filtering your result table.
Here’s an example of filtering the age
column with INT
type:
+------+
| age |
+------+
| 23 |
| 25 |
| 22 |
| 29 |
+------+
The DISTINCT
keyword will make sure that each row in the result set of your query is unique.
This means that when you query more than one column, some of the columns can still be duplicates.
Consider the following example where both first_name
and age
are selected:
SELECT DISTINCT first_name, age FROM members;
-- +------------+------+
-- | first_name | age |
-- +------------+------+
-- | Jessie | 23 |
-- | Ann | 25 |
-- | Joe | 22 |
-- | Ann | 23 |
-- | Peter | 29 |
-- +------------+------+
The above example shows that some of the column values still have duplicates. There are two Ann
and two 23
values.
This is because DISTINCT
takes into account the whole row for filtering the result, and it considers Ann
of age 25
different from Ann
of age 23
.
Let’s add one more row of Peter
with age 29
in the table:
+----+------------+------+
| id | first_name | age |
+----+------------+------+
| 1 | Jessie | 23 |
| 2 | Ann | 25 |
| 3 | Joe | 22 |
| 4 | Ann | 23 |
| 5 | Peter | 29 |
| 6 | Peter | 29 |
+----+------------+------+
The two Peter
above are completely identical. Only one will be shown when you use the DISTINCT
keyword without selecting the id
column:
SELECT DISTINCT first_name, age FROM members;
-- +------------+------+
-- | first_name | age |
-- +------------+------+
-- | Jessie | 23 |
-- | Ann | 25 |
-- | Joe | 22 |
-- | Ann | 23 |
-- | Peter | 29 |
-- +------------+------+
But if you also select the id
column, then there will be two Peter
rows because they have different ids:
SELECT DISTINCT id,first_name, age FROM members;
-- +----+------------+------+
-- | id | first_name | age |
-- +----+------------+------+
-- | 1 | Jessie | 23 |
-- | 2 | Ann | 25 |
-- | 3 | Joe | 22 |
-- | 4 | Ann | 23 |
-- | 5 | Peter | 29 |
-- | 6 | Peter | 29 |
-- +----+------------+------+
And that’s how the DISTINCT
keyword works in MySQL.
Please note that the DISTINCT
keyword can only be used with SELECT
statements, so you can’t use it with other statements like UPDATE
or DELETE
statements.
But the DISTINCT
statement can still be used with a SELECT
subquery as follows:
SELECT first_name, age
FROM members
WHERE age IN (SELECT DISTINCT age FROM members)
The WHERE
clause above will have no effect on the result set at all, but it gives you an idea of how to use the DISTINCT
keyword in subqueries.