Joins in SQL

We perform joins in SQL when we combine and integrate data from multiple tables. In order for a join to be possible, the tables you want to join must have rows that share the same fields.

The output will be represented as a table but it is virtual, not an actual table. (If you wish to actually create a table or a view off the back of a join operation you should combine the join with the CREATE TABLE command etc.)

Inner joins

An inner join combines rows that have matching values in two or more tables. This join is based on the existence of a common field between the tables and is used to retrieve data that appears in both tables.

The following is a good scenario for creating an inner join:

We want to create a list of the name of all computers that have been sold and when they were sold.

We will be working from the following tables.

The sales table:

saleIdmodelIdsaleDateemployeeIdprice
1442020-07-27tbishop40.99
2222021-02-05tbishop986.99
3142022-11-16tgnomay1248.99

The model table:

modelIdmodelName
44Raspberry Pi Model A
22Apple MacBook Air
14Lenovo ThinkPad P15

The data points we need are not provided in a single table. sales gives us the saleDate but not the modelName. In order to meet our requirements we must perform a join that combines modelName from the model table and saleDate from the sale table to give us a table like the following:

modelNamesale_date
Raspberry Pi Model A2015-02-01
Apple Macbook Air2018-11-01
Lenovo ThinkPad P152018-11-01

We can do this by utilising the field they have in common: modelId.

Implementation

The general syntax for creating an inner join from two tables that share a common field is as follows:

SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.columnName = table2.columnName

So basically, look at table1 and table2 return the rows where the fields for table1 and table2 match, where these fields are the ones specified in the SELECT.

Applied to our example:

SELECT model.modelName, sales.saleDate -- Name the fields
FROM model -- Specify the first table
INNER JOIN sales -- Specify the second table
ON model.model_id = sales.model_id; -- Specify the match criteria

It is important to note that the match must be a match. Both tables must have a row with the same value for modelId for the join to be applied. If there is a modelId of X in sales but no corresponding X for modelId in model, nothing will be returned for that row.

We can represent the logical relationship that obtains between the sales and model tables as follows:

Outer joins

Outer joins are joins that return matched values and unmatched values from either or both tables.

To explain outer joins we need to expand our data set:

The sales table:

saleIdmodelIdsaleDateemployeeIdprice
1442020-07-27tbishop40.99
2222021-02-05tbishop986.99
3142022-11-16tgnomay1248.99
42020-03-01msmelk699.99

The model table:

modelIdmodelName
44Raspberry Pi Model A
22Apple MacBook Air
14Lenovo ThinkPad P15
66Google Chromebook

We will assume that sales is our left table and model our right.

Left outer join

A left outer join returns all records from the left or first table and the matching records from the right or second table. If there are not matches on the right then only the left data is returned.

In the context of our dataset, in the sales database we have a row where there is a sale with no corresponding modelId. If we performed a left outer join this would give us the following table in return:

modelNamesale_date
Raspberry Pi Model A2015-02-01
Apple Macbook Air2018-11-01
Lenovo ThinkPad P152018-11-01
2020-03-01

The logical relationship sustained between sales and model by a left inner join is represented in the following diagram:

Implementation

The schematic form:

SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

Applied to our data set:

SELECT model.modelName, sales.saleDate
FROM sales.saleDate
LEFT JOIN model
ON sales.modelId = model.modelId

Right outer join

This is the inverse of the right outer join. A right outer join returns all records from the right or second table and the matching records from the left or first table. If there are not matches on the left then only the right data is returned.

In the context of our dataset, in the model database we have a row where there is a modelId without a corresponding entry in the sales table. If we performed a right outer join this would give us the following table in return:

modelNamesale_date
Raspberry Pi Model A2015-02-01
Apple Macbook Air2018-11-01
Lenovo ThinkPad P152018-11-01
Google Chromebook

The logical relationship sustained between sales and model by a right inner join is represented in the following diagram:

Implementation

The schematic form:

SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

Applied to our data set:

SELECT model.modelName, sales.saleDate
FROM sales.saleDate
RIGHT JOIN model
ON sales.modelId = model.modelId

Full outer join

The full outer join returns all rows when there is a match in either the left or the right table. So, for table A and table B the full join returns all rows in A and all rows in B including those where there is a match between A and B. This is the broadest type of join and can often result in large data sets.

In the context of our dataset it would result in the following table being generated:

modelNamesale_date
Raspberry Pi Model A2015-02-01
Apple Macbook Air2018-11-01
Lenovo ThinkPad P152018-11-01
2020-03-01
Google Chromebook

Represented by the following diagram:

Implementation

The schematic form:

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;

Applied to our data set:

SELECT model.modelName, sales.saleDate
FROM sales.saleDate
FULL OUTER JOIN model
ON sales.modelId = model.modelId

This type of join is used when you want to discern when there is not a match between two fields across tables. For example: imagine that you wanted a list of computers that had never been sold. In this case, you would be interested in rows where there is a model_id without a corresponding sales_id .

In SQL this would be achieved with:

SELECT model.name, sales.sale_date
FROM model
LEFT JOIN sales on model.model_id = sales.model_id;

Note that this would return all the model names but where there isn’t a sale data, NULL would be returned. This is an important distinction : the outer join method doesn’t just return the rows with a NULL value for sale_date as we might expect. It returns all models along with those that have not been sold. This is because it is oriented to the “left” table, equivalent to the table in the SQL that we cited first with the on keyword.

A left outer join returns all the records from the left (model) table and those that match in the right (sales) table. Where there are no matching records in the right (sales) table, a NULL value is returned.

A right outer join, often referred to as a right join, is the opposite of a left outer; it returns all the records from the right table and those that match in the left table. In our scenario this would be all the models that had a sale_date including models that didn’t have a sale_date , i.e which returned NULL

Finally, a full outer join returns all the records from both tables, and where a record cannot be matched, a NULL value is returned. So this would mean there could be NULLs in both fields of the returned rows.

We can combine multiple types of join in the same SQL query:

SELECT model.name, sales.sale_date, manufacturer.url
FROM model
LEFT JOIN sales on model.model_id = sales.model_id
INNER JOIN manufacturer on model.manufacturer_id = manufacturer.manufacturer_id;