Practical SELECT Queries in SQL


Few SUPER IMPORTANT things before we start:

  • We’ll use SQL Sakila database to run queries against, please make sure to download it here and import it and familiarise yourself with the tables.
  • Read the database overview section BELOW before you write any code.
  • Remember it’s totally nomal for a database to look confusing at first glance.., so make sure you spend some good time trying to make sense of it in general, it’s actually really, really important you do that. If you don’t understand the database and how things are connected then each query will seem confusing. To get an understanding of how database is designed and the tables are connected read each table’s description here. ATLEASE ONCE!

How to make this post ACTUALLY useful for YOURSELF:

To make the most of this post please do yourself a favor and make sure you have basic understand of how this database is organised. I’d also recommdend to avoid looking at the queries. Instead read the problem, open your SQL editor and see if you can solve it yourself first. Please spend atleast 30 mins before you look at the code.

LETS GOO!!

Database Overview – Renting Movies

It’s a database to store the information about Renting Movie(s).

Imagine this..

John Doe has just returned home after a loooong 12 hours shift. It’s Saturday night and it’s cold outside and he doesn’t have any plans either.. After spending an hour or so binge eating he decided to pay a visit to a local movie rental store.

Movie rental store as the name implies rents movies in a form of CD/DVD for agreed time.

John browses through different categories for 10 minutes and finds a few movies that seem interesting.. He walks up to the till, pays £2.50 and rents “Terminator” for a month.

Meanwhile the cashier notes down the details about the payment. For example, the movie that’s rented, who rented it, their details, the amount paid, cashier’s (or staff) own details, time of renting, store details, city etc..

Unforunately cashier has to enter all these details manually into the system.

Once the information is submitted into the system Cashier waves and say “Thank you and enjoy the movie!”.

John Doe heads back home.

Note: Sakila database comes with tables design file called “sakila.mwb” and it can be imported into Mysql Workbench. Once you’ve imported the file you’ll be able to see how the tables are connected to each other in a form of diagrams.

Real Problems & Solutions

Run a query to join two tables, customers and payments. I want to visually see customers who got no records in the payment table. Not just the matching values.

Screenshot below shows us the customers who have payments but also customers who got no payments at all.

There’s going to be a lot of null values once you run your query.

Please delete some records from payment table before writing your query. You can delete records like so:

delete from payment limit 500

Solution:

SELECT 
    *
FROM
    customer
        LEFT JOIN
    payment ON customer.customer_id = payment.customer_id

Get the Total Amount a Customer made in payments. Group by Customer id. Default currency is USD. Add another column that shows GBP. Use GBP rate at the time of writing query.

Expected result:

SELECT 
    customer.first_name AS 'First Name',
    customer.email AS 'Email',
    SUM(amount) AS 'total (USD)',
    SUM(amount) * 0.833 AS 'Total (GBP)'
FROM
    sakila.payment
        JOIN
    customer ON payment.customer_id = customer.customer_id
GROUP BY customer.customer_id

Get a list of customers who never returned the films.

SELECT 
    customer.first_name, customer.last_name, customer.email, rental.rental_date
FROM
    customer
        JOIN
    rental ON customer.customer_id = rental.customer_id
WHERE
    return_date IS NULL

We want to see how much money a store has taken in payment each month.

In other words..

Display the total amount of payment for the store at “47 MySakila Drive” has taken. It must be grouped and ordered by latest month.

Using JOIN

SELECT 
    SUM(payment.amount) AS total,
    DATE_FORMAT(payment.payment_date, '%Y-%m') date
FROM
    payment
        JOIN
    staff ON payment.staff_id = staff.staff_id
        JOIN
    store ON staff.store_id = store.store_id
        JOIN
    address ON store.address_id = address.address_id
WHERE
    address.address = '47 MySakila Drive'
GROUP BY date
ORDER BY date DESC

Using SELECT Subqueries

SELECT 
    SUM(amount) total, DATE_FORMAT(payment_date, '%Y-%m') date
FROM
    payment
WHERE
    staff_id IN (SELECT 
            staff_id
        FROM
            staff
        WHERE
            store_id IN (SELECT 
                    store_id
                FROM
                    store
                WHERE
                    address_id IN (SELECT 
                            address_id
                        FROM
                            sakila.address
                        WHERE
                            address = '47 MySakila Drive')))
GROUP BY date
ORDER BY date DESC;

We want to see the store that has performed better than anyone.

In othe words:

Display the store’s address which has the generated the most money along with the total.

SELECT 
    SUM(amount) total, address.address
FROM
    payment
        JOIN
    staff ON payment.staff_id = staff.staff_id
        JOIN
    store ON staff.store_id = store.store_id
        JOIN
    address ON store.address_id = address.address_id
GROUP BY address.address
ORDER BY total DESC
LIMIT 1

The above query should return total of: 33927.04 and address as: 28 MySQL Boulevard. Pick this address and run another query to verify that above query’s total.

SELECT 
    SUM(amount)
FROM
    payment
WHERE
    staff_id IN (SELECT 
            staff_id
        FROM
            staff
        WHERE
            staff.store_id IN (SELECT 
                    staff.store_id
                FROM
                    staff
                        JOIN
                    store ON staff.store_id = store.store_id
                        JOIN
                    address ON store.address_id = address.address_id
                WHERE
                    address.address = '28 MySQL Boulevard'))

How much money EACH store made so far.

In other words.. we want to see the payment received by each store in 2005 and 2006.

addressamountdate
47 MySakila Drive33255.382005
47 MySakila Drive234.092006
28 MySQL Boulevard33646.952005
28 MySQL Boulevard280.092006
SELECT 
    address.address address,
    SUM(amount) amount,
    DATE_FORMAT(payment_date, '%Y') AS date
FROM
    payment
        JOIN
    staff ON payment.staff_id = staff.staff_id
        JOIN
    store ON staff.store_id = store.store_id
        JOIN
    address ON store.address_id = address.address_id
GROUP BY address , date;

The only problem with above result is that it’s not readable. It would make more sense if ‘date’ was columns. Like in table below..

address20052006
47 MySakila Drive33255.38234.09
28 MySQL Boulevard33646.95280.09
SELECT 
    address.address address,
    SUM(CASE
        WHEN DATE_FORMAT(payment_date, '%y') = '05' THEN payment.amount
        ELSE 0
    END) AS '2005',
    SUM(CASE
        WHEN DATE_FORMAT(payment_date, '%y') = '06' THEN payment.amount
        ELSE 0
    END) AS '2006'
FROM
    payment
        JOIN
    staff ON payment.staff_id = staff.staff_id
        JOIN
    store ON staff.store_id = store.store_id
        JOIN
    address ON staff.address_id = address.address_id
GROUP BY address

Get the list of customers who didn’t return the DVDs on time at ’47 MySakila Drive’ store.

In other words.. I want we want to find the customers who don’t return DVDs on time..

SELECT 
    customer.first_name `First name`,
    customer.last_name `Last name`,
    DATE_FORMAT(rental_date, '%Y-%m-%d') as `Rental date`,
    DATE_FORMAT(rental.return_date, '%Y-%m-%d') `Returned on`,
    DATEDIFF(return_date, rental_date) - film.rental_duration `Overdue by`,
    film.rental_duration `Rental Duration`
FROM
    sakila.rental
        JOIN
    customer ON rental.customer_id = customer.customer_id
        JOIN
    inventory ON rental.inventory_id = inventory.inventory_id
        JOIN
    film ON inventory.film_id = film.film_id
        JOIN
    store ON inventory.store_id = store.store_id
        JOIN
    address ON store.address_id = address.address_id
        AND address.address = '47 MySakila Drive'
WHERE
    DATEDIFF(return_date, rental_date) > film.rental_duration

A store owner is trying to increase the sales. He’s come up with the following strategy to bring back old customers and enocurage new ones to rent more.

Offer discount to anyone who’s rented atleast 5 DVDs in past. Store owner should be able to look up whether discount is applicable by entering their email. Lets use following addresses for now: [email protected], [email protected]

Note: Realistically we’d need to keep track of customers have already used their discount so we don’t end up giving discount each time they rent a movie. But let’s ignore that bit and focus on the selection of the data.

Aim for result like this. Tell whether they’re eligible or not, a short comment.. and how many DVDs they’ve rented so far.

EligibilityReasonRented
Not eligibleCustomer must rent 3 more DVDs to be eligible for discount17
SELECT 
    CASE
        WHEN COUNT(*) >= 20 THEN 'Eligible'
        ELSE 'Not eligible'
    END AS 'Eligibility',
    CASE
        WHEN
            COUNT(*) < 20
        THEN
            CONCAT('Customer must rent ',
                    20 - COUNT(*),
                    ' more DVDs to be eligible for discount')
    END AS 'Reason',
    COUNT(*) 'Rented'
FROM
    rental
WHERE
    customer_id IN (SELECT 
            customer_id
        FROM
            customer
        WHERE
            email = '[email protected]');

The store has changed their policy to not provide discount to anyone who didn’t return DVDs on time in past. We can achieve this by adding few more queries to above queries. Note the bit where I sum the total rental duration and subtract the difference of return date and rental date.

SUM(CAST(film.rental_duration AS SIGNED)) - SUM(DATEDIFF(rental.return_date, rental.rental_date)) AS `Rented vs returned difference`,

Complete query:

SELECT 
    SUM(CAST(film.rental_duration AS SIGNED)) - SUM(DATEDIFF(rental.return_date, rental.rental_date)) AS `Rented vs returned difference`,
    CASE
        WHEN
            (COUNT(rental.rental_id) >= 20
                AND (SUM(CAST(film.rental_duration AS SIGNED)) - SUM(DATEDIFF(rental.return_date, rental.rental_date)) >= 0))
        THEN
            'Eligible'
        ELSE 'Not eligible'
    END AS 'Eligibility',
    COUNT(rental.rental_id) AS 'Total movies rented'
FROM
    rental
        JOIN
    inventory ON rental.inventory_id = inventory.inventory_id
        JOIN
    film ON inventory.film_id = film.film_id
WHERE
    customer_id IN (SELECT 
            customer_id
        FROM
            customer
        WHERE
            email = '[email protected]');

Find the most popular category and order it by popularity.

The result may look like below:

nametotal rented
Sports1179
Animation1166
Action1112
Sci-Fi1101
Family1096
SELECT 
    category.name,
    COUNT(inventory.inventory_id) AS `Total rented`
FROM
    sakila.film
        JOIN
    film_category ON film.film_id = film_category.film_id
        JOIN
    category ON film_category.category_id = category.category_id
        JOIN
    inventory ON film.film_id = inventory.film_id
        JOIN
    rental ON inventory.inventory_id = rental.inventory_id
GROUP BY film_category.category_id
ORDER BY COUNT(inventory.inventory_id) DESC;

Lets now verify if our result is correct by writing simple subqueries and hardcoding category_id.

-- category_id = 1 is for Action movies
-- category_id = 15 is for Sports movies
-- Try them both and match with the join above.
SELECT 
    COUNT(rental.inventory_id)
FROM
    rental
WHERE
    inventory_id IN (SELECT 
            inventory.inventory_id
        FROM
            inventory
        WHERE
            film_id IN (SELECT 
                    film_id
                FROM
                    film_category
                WHERE
                    category_id = 1));

Find the most popular actor

SELECT 
    actor.first_name, actor.last_name
FROM
    actor
        JOIN
    film_actor ON actor.actor_id = (SELECT 
            actor_id
        FROM
            film_actor
        GROUP BY actor_id
        ORDER BY COUNT(*) DESC
        LIMIT 1)
LIMIT 1;

Find the most popular actor in each category

Result should look something like this:

category_idmax(actor.actor_id)
16199
15200
14200
13200
12200
11197
10198
SELECT 
    film_category.category_id, MAX(actor.actor_id)
FROM
    category
        JOIN
    film_category ON category.category_id = film_category.category_id
        JOIN
    film_actor ON film_category.film_id = film_actor.film_id
        JOIN
    actor ON film_actor.actor_id = actor.actor_id
GROUP BY film_category.category_id
ORDER BY COUNT(actor.actor_id) DESC

There is going to be a campaign to let customers know about upcoming movies.

We want a list of customer by spend. For more context, we want to maybe pick first 100 customers from this list and send them promotional emails. So lets get customers list by spend and order it by most spend.

Aim for result like this:

first_namelast_nameemailspend
KARLSEAL[email protected]221.55
ELEANORHUNT[email protected]216.54
CLARASHAW[email protected]195.58
RHONDAKENNEDY[email protected]194.61
MARIONSNYDER[email protected]194.61
SELECT 
    customer.first_name,
    customer.last_name,
    customer.email,
    SUM(payment.amount) AS spend
FROM
    payment
        JOIN
    customer ON payment.customer_id = customer.customer_id
WHERE
    payment.customer_id IN (SELECT 
            customer_id
        FROM
            rental)
GROUP BY payment.customer_id
ORDER BY spend DESC;

Due to Covid-19 regulation the store owner has decided to inform the customers to avoid visiting the store during the busiest hours.

Find out the busiest hours for store: 23 Workhaven Lane. Limit the results by 5.

Aim for result like this

hourpayments taken
15436
03372
00368
08359
07354
SELECT 
    DATE_FORMAT(payment_date, '%H') `hour`,
    COUNT(*) `payments taken`
FROM
    sakila.payment
WHERE
    staff_id IN (SELECT 
            store_id
        FROM
            staff
        WHERE
            address_id IN (SELECT 
                    address_id
                FROM
                    address
                WHERE
                    address = '23 Workhaven Lane'))
GROUP BY DATE_FORMAT(payment_date, '%H')
ORDER BY COUNT(*) DESC
LIMIT 5

How much money “47 MySakila Drive store made each month in 2005. Show the increase in percentage comparing each month with next one.

For example, what was the percentage increase between Feb and May?

Aim for result like this:

monthpercent
0582.176571
06193.184350
07-15.352475
08

SELECT 
    DATE_FORMAT(payment_date, '%m') AS month,
    ((((SELECT 
            SUM(p2.amount) AS p2_amount
        FROM
            payment p2
        WHERE
            DATE_FORMAT(p2.payment_date, '%Y') = 2005
                AND DATE_FORMAT(p2.payment_date, '%m') > month
                AND staff_id = (SELECT 
                    staff.staff_id
                FROM
                    staff
                        JOIN
                    store ON staff.store_id = store.store_id
                        JOIN
                    address ON store.address_id = address.address_id
                        AND address.address = '47 MySakila Drive'
                LIMIT 1)
        GROUP BY DATE_FORMAT(p2.payment_date, '%m')
        ORDER BY month
        LIMIT 1) - SUM(p1.amount)) / SUM(p1.amount)) * 100) AS percent
FROM
    payment p1
WHERE
    DATE_FORMAT(p1.payment_date, '%Y') = 2005
        AND staff_id = (SELECT 
            staff.staff_id
        FROM
            staff
                JOIN
            store ON staff.store_id = store.store_id
                JOIN
            address ON store.address_id = address.address_id
                AND address.address = '47 MySakila Drive'
        LIMIT 1)
GROUP BY month
ORDER BY month

Get all the movies that were NEVER rented

-- Get the grouped by film_id of all the movies that have been rented.
-- Then get the film_ids of all movies that aren't in "have been rented" list above. This will give us a list of movies that were never rented
-- Note: Some movies, even though they have been rented, will not be in 'rental' table.. simply because a film is referenced by it's UNIQUE inventory_id in the 'rental' table, not by 'film_id'.
SELECT 
    *
FROM
    film
WHERE
    film.film_id NOT IN (SELECT 
            film.film_id
        FROM
            inventory
                JOIN
            film ON inventory.film_id = film.film_id
                JOIN
            rental ON inventory.inventory_id = rental.inventory_id
        GROUP BY film_id);

Search film ‘title’ by a keyword for example “alien” and order the results by popularity.

Aim for result like this:

film_idtitlecount(rental.rental_id)
418HOBBIT ALIEN31
15ALIEN CENTER22
223DESIRE ALIEN8
SELECT 
    film.film_id, film.title, COUNT(rental.rental_id)
FROM
    sakila.film
        JOIN
    inventory ON film.film_id = inventory.film_id
        JOIN
    rental ON inventory.inventory_id = rental.inventory_id
WHERE
    film.title LIKE '%alien%'
GROUP BY film_id , title
ORDER BY COUNT(rental.rental_id) DESC;


Leave a Reply

Your email address will not be published. Required fields are marked *