Few SUPER IMPORTANT things before we start:
- We’ll use SQL Sakila database to run queries against, please make sure to download it here and import it and familiarise yourself with the tables.
- Read the database overview section BELOW before you write any code.
- Remember it’s totally nomal for a database to look confusing at first glance.., so make sure you spend some good time trying to make sense of it in general, it’s actually really, really important you do that. If you don’t understand the database and how things are connected then each query will seem confusing. To get an understanding of how database is designed and the tables are connected read each table’s description here. ATLEASE ONCE!
How to make this post ACTUALLY useful for YOURSELF:
To make the most of this post please do yourself a favor and make sure you have basic understand of how this database is organised. I’d also recommdend to avoid looking at the queries. Instead read the problem, open your SQL editor and see if you can solve it yourself first. Please spend atleast 30 mins before you look at the code.
LETS GOO!!
Database Overview – Renting Movies
It’s a database to store the information about Renting Movie(s).
Imagine this..
John Doe has just returned home after a loooong 12 hours shift. It’s Saturday night and it’s cold outside and he doesn’t have any plans either.. After spending an hour or so binge eating he decided to pay a visit to a local movie rental store.
Movie rental store as the name implies rents movies in a form of CD/DVD for agreed time.
John browses through different categories for 10 minutes and finds a few movies that seem interesting.. He walks up to the till, pays £2.50 and rents “Terminator” for a month.
Meanwhile the cashier notes down the details about the payment. For example, the movie that’s rented, who rented it, their details, the amount paid, cashier’s (or staff) own details, time of renting, store details, city etc..
Unforunately cashier has to enter all these details manually into the system.
Once the information is submitted into the system Cashier waves and say “Thank you and enjoy the movie!”.
John Doe heads back home.
Note: Sakila database comes with tables design file called “sakila.mwb” and it can be imported into Mysql Workbench. Once you’ve imported the file you’ll be able to see how the tables are connected to each other in a form of diagrams.
Real Problems & Solutions
Run a query to join two tables, customers and payments. I want to visually see customers who got no records in the payment table. Not just the matching values.
Screenshot below shows us the customers who have payments but also customers who got no payments at all.
There’s going to be a lot of null values once you run your query.
Please delete some records from payment table before writing your query. You can delete records like so:
delete from payment limit 500
Solution:
SELECT * FROM customer LEFT JOIN payment ON customer.customer_id = payment.customer_id
Get the Total Amount a Customer made in payments. Group by Customer id. Default currency is USD. Add another column that shows GBP. Use GBP rate at the time of writing query.
Expected result:
SELECT customer.first_name AS 'First Name', customer.email AS 'Email', SUM(amount) AS 'total (USD)', SUM(amount) * 0.833 AS 'Total (GBP)' FROM sakila.payment JOIN customer ON payment.customer_id = customer.customer_id GROUP BY customer.customer_id
Get a list of customers who never returned the films.
SELECT customer.first_name, customer.last_name, customer.email, rental.rental_date FROM customer JOIN rental ON customer.customer_id = rental.customer_id WHERE return_date IS NULL
We want to see how much money a store has taken in payment each month.
In other words..
Display the total amount of payment for the store at “47 MySakila Drive” has taken. It must be grouped and ordered by latest month.
Using JOIN
SELECT SUM(payment.amount) AS total, DATE_FORMAT(payment.payment_date, '%Y-%m') date FROM payment JOIN staff ON payment.staff_id = staff.staff_id JOIN store ON staff.store_id = store.store_id JOIN address ON store.address_id = address.address_id WHERE address.address = '47 MySakila Drive' GROUP BY date ORDER BY date DESC
Using SELECT Subqueries
SELECT SUM(amount) total, DATE_FORMAT(payment_date, '%Y-%m') date FROM payment WHERE staff_id IN (SELECT staff_id FROM staff WHERE store_id IN (SELECT store_id FROM store WHERE address_id IN (SELECT address_id FROM sakila.address WHERE address = '47 MySakila Drive'))) GROUP BY date ORDER BY date DESC;
We want to see the store that has performed better than anyone.
In othe words:
Display the store’s address which has the generated the most money along with the total.
SELECT SUM(amount) total, address.address FROM payment JOIN staff ON payment.staff_id = staff.staff_id JOIN store ON staff.store_id = store.store_id JOIN address ON store.address_id = address.address_id GROUP BY address.address ORDER BY total DESC LIMIT 1
The above query should return total of: 33927.04 and address as: 28 MySQL Boulevard. Pick this address and run another query to verify that above query’s total.
SELECT SUM(amount) FROM payment WHERE staff_id IN (SELECT staff_id FROM staff WHERE staff.store_id IN (SELECT staff.store_id FROM staff JOIN store ON staff.store_id = store.store_id JOIN address ON store.address_id = address.address_id WHERE address.address = '28 MySQL Boulevard'))
How much money EACH store made so far.
In other words.. we want to see the payment received by each store in 2005 and 2006.
address | amount | date |
47 MySakila Drive | 33255.38 | 2005 |
47 MySakila Drive | 234.09 | 2006 |
28 MySQL Boulevard | 33646.95 | 2005 |
28 MySQL Boulevard | 280.09 | 2006 |
SELECT address.address address, SUM(amount) amount, DATE_FORMAT(payment_date, '%Y') AS date FROM payment JOIN staff ON payment.staff_id = staff.staff_id JOIN store ON staff.store_id = store.store_id JOIN address ON store.address_id = address.address_id GROUP BY address , date;
The only problem with above result is that it’s not readable. It would make more sense if ‘date’ was columns. Like in table below..
address | 2005 | 2006 |
47 MySakila Drive | 33255.38 | 234.09 |
28 MySQL Boulevard | 33646.95 | 280.09 |
SELECT address.address address, SUM(CASE WHEN DATE_FORMAT(payment_date, '%y') = '05' THEN payment.amount ELSE 0 END) AS '2005', SUM(CASE WHEN DATE_FORMAT(payment_date, '%y') = '06' THEN payment.amount ELSE 0 END) AS '2006' FROM payment JOIN staff ON payment.staff_id = staff.staff_id JOIN store ON staff.store_id = store.store_id JOIN address ON staff.address_id = address.address_id GROUP BY address
Get the list of customers who didn’t return the DVDs on time at ’47 MySakila Drive’ store.
In other words.. I want we want to find the customers who don’t return DVDs on time..
SELECT customer.first_name `First name`, customer.last_name `Last name`, DATE_FORMAT(rental_date, '%Y-%m-%d') as `Rental date`, DATE_FORMAT(rental.return_date, '%Y-%m-%d') `Returned on`, DATEDIFF(return_date, rental_date) - film.rental_duration `Overdue by`, film.rental_duration `Rental Duration` FROM sakila.rental JOIN customer ON rental.customer_id = customer.customer_id JOIN inventory ON rental.inventory_id = inventory.inventory_id JOIN film ON inventory.film_id = film.film_id JOIN store ON inventory.store_id = store.store_id JOIN address ON store.address_id = address.address_id AND address.address = '47 MySakila Drive' WHERE DATEDIFF(return_date, rental_date) > film.rental_duration
A store owner is trying to increase the sales. He’s come up with the following strategy to bring back old customers and enocurage new ones to rent more.
Offer discount to anyone who’s rented atleast 5 DVDs in past. Store owner should be able to look up whether discount is applicable by entering their email. Lets use following addresses for now: [email protected], [email protected]
Note: Realistically we’d need to keep track of customers have already used their discount so we don’t end up giving discount each time they rent a movie. But let’s ignore that bit and focus on the selection of the data.
Aim for result like this. Tell whether they’re eligible or not, a short comment.. and how many DVDs they’ve rented so far.
Eligibility | Reason | Rented |
Not eligible | Customer must rent 3 more DVDs to be eligible for discount | 17 |
SELECT CASE WHEN COUNT(*) >= 20 THEN 'Eligible' ELSE 'Not eligible' END AS 'Eligibility', CASE WHEN COUNT(*) < 20 THEN CONCAT('Customer must rent ', 20 - COUNT(*), ' more DVDs to be eligible for discount') END AS 'Reason', COUNT(*) 'Rented' FROM rental WHERE customer_id IN (SELECT customer_id FROM customer WHERE email = '[email protected]');
The store has changed their policy to not provide discount to anyone who didn’t return DVDs on time in past. We can achieve this by adding few more queries to above queries. Note the bit where I sum the total rental duration and subtract the difference of return date and rental date.
SUM(CAST(film.rental_duration AS SIGNED)) - SUM(DATEDIFF(rental.return_date, rental.rental_date)) AS `Rented vs returned difference`,
Complete query:
SELECT SUM(CAST(film.rental_duration AS SIGNED)) - SUM(DATEDIFF(rental.return_date, rental.rental_date)) AS `Rented vs returned difference`, CASE WHEN (COUNT(rental.rental_id) >= 20 AND (SUM(CAST(film.rental_duration AS SIGNED)) - SUM(DATEDIFF(rental.return_date, rental.rental_date)) >= 0)) THEN 'Eligible' ELSE 'Not eligible' END AS 'Eligibility', COUNT(rental.rental_id) AS 'Total movies rented' FROM rental JOIN inventory ON rental.inventory_id = inventory.inventory_id JOIN film ON inventory.film_id = film.film_id WHERE customer_id IN (SELECT customer_id FROM customer WHERE email = '[email protected]');
Find the most popular category and order it by popularity.
The result may look like below:
name | total rented |
Sports | 1179 |
Animation | 1166 |
Action | 1112 |
Sci-Fi | 1101 |
Family | 1096 |
SELECT category.name, COUNT(inventory.inventory_id) AS `Total rented` FROM sakila.film JOIN film_category ON film.film_id = film_category.film_id JOIN category ON film_category.category_id = category.category_id JOIN inventory ON film.film_id = inventory.film_id JOIN rental ON inventory.inventory_id = rental.inventory_id GROUP BY film_category.category_id ORDER BY COUNT(inventory.inventory_id) DESC;
Lets now verify if our result is correct by writing simple subqueries and hardcoding category_id.
-- category_id = 1 is for Action movies -- category_id = 15 is for Sports movies -- Try them both and match with the join above. SELECT COUNT(rental.inventory_id) FROM rental WHERE inventory_id IN (SELECT inventory.inventory_id FROM inventory WHERE film_id IN (SELECT film_id FROM film_category WHERE category_id = 1));
Find the most popular actor
SELECT actor.first_name, actor.last_name FROM actor JOIN film_actor ON actor.actor_id = (SELECT actor_id FROM film_actor GROUP BY actor_id ORDER BY COUNT(*) DESC LIMIT 1) LIMIT 1;
Find the most popular actor in each category
Result should look something like this:
category_id | max(actor.actor_id) |
16 | 199 |
15 | 200 |
14 | 200 |
13 | 200 |
12 | 200 |
11 | 197 |
10 | 198 |
SELECT film_category.category_id, MAX(actor.actor_id) FROM category JOIN film_category ON category.category_id = film_category.category_id JOIN film_actor ON film_category.film_id = film_actor.film_id JOIN actor ON film_actor.actor_id = actor.actor_id GROUP BY film_category.category_id ORDER BY COUNT(actor.actor_id) DESC
There is going to be a campaign to let customers know about upcoming movies.
We want a list of customer by spend. For more context, we want to maybe pick first 100 customers from this list and send them promotional emails. So lets get customers list by spend and order it by most spend.
Aim for result like this:
first_name | last_name | spend | |
KARL | SEAL | [email protected] | 221.55 |
ELEANOR | HUNT | [email protected] | 216.54 |
CLARA | SHAW | [email protected] | 195.58 |
RHONDA | KENNEDY | [email protected] | 194.61 |
MARION | SNYDER | [email protected] | 194.61 |
SELECT customer.first_name, customer.last_name, customer.email, SUM(payment.amount) AS spend FROM payment JOIN customer ON payment.customer_id = customer.customer_id WHERE payment.customer_id IN (SELECT customer_id FROM rental) GROUP BY payment.customer_id ORDER BY spend DESC;
Due to Covid-19 regulation the store owner has decided to inform the customers to avoid visiting the store during the busiest hours.
Find out the busiest hours for store: 23 Workhaven Lane. Limit the results by 5.
Aim for result like this
hour | payments taken |
15 | 436 |
03 | 372 |
00 | 368 |
08 | 359 |
07 | 354 |
SELECT DATE_FORMAT(payment_date, '%H') `hour`, COUNT(*) `payments taken` FROM sakila.payment WHERE staff_id IN (SELECT store_id FROM staff WHERE address_id IN (SELECT address_id FROM address WHERE address = '23 Workhaven Lane')) GROUP BY DATE_FORMAT(payment_date, '%H') ORDER BY COUNT(*) DESC LIMIT 5
How much money “47 MySakila Drive“ store made each month in 2005. Show the increase in percentage comparing each month with next one.
For example, what was the percentage increase between Feb and May?
Aim for result like this:
month | percent |
05 | 82.176571 |
06 | 193.184350 |
07 | -15.352475 |
08 |
SELECT DATE_FORMAT(payment_date, '%m') AS month, ((((SELECT SUM(p2.amount) AS p2_amount FROM payment p2 WHERE DATE_FORMAT(p2.payment_date, '%Y') = 2005 AND DATE_FORMAT(p2.payment_date, '%m') > month AND staff_id = (SELECT staff.staff_id FROM staff JOIN store ON staff.store_id = store.store_id JOIN address ON store.address_id = address.address_id AND address.address = '47 MySakila Drive' LIMIT 1) GROUP BY DATE_FORMAT(p2.payment_date, '%m') ORDER BY month LIMIT 1) - SUM(p1.amount)) / SUM(p1.amount)) * 100) AS percent FROM payment p1 WHERE DATE_FORMAT(p1.payment_date, '%Y') = 2005 AND staff_id = (SELECT staff.staff_id FROM staff JOIN store ON staff.store_id = store.store_id JOIN address ON store.address_id = address.address_id AND address.address = '47 MySakila Drive' LIMIT 1) GROUP BY month ORDER BY month
Get all the movies that were NEVER rented
-- Get the grouped by film_id of all the movies that have been rented. -- Then get the film_ids of all movies that aren't in "have been rented" list above. This will give us a list of movies that were never rented -- Note: Some movies, even though they have been rented, will not be in 'rental' table.. simply because a film is referenced by it's UNIQUE inventory_id in the 'rental' table, not by 'film_id'. SELECT * FROM film WHERE film.film_id NOT IN (SELECT film.film_id FROM inventory JOIN film ON inventory.film_id = film.film_id JOIN rental ON inventory.inventory_id = rental.inventory_id GROUP BY film_id);
Search film ‘title’ by a keyword for example “alien” and order the results by popularity.
Aim for result like this:
film_id | title | count(rental.rental_id) |
418 | HOBBIT ALIEN | 31 |
15 | ALIEN CENTER | 22 |
223 | DESIRE ALIEN | 8 |
SELECT film.film_id, film.title, COUNT(rental.rental_id) FROM sakila.film JOIN inventory ON film.film_id = inventory.film_id JOIN rental ON inventory.inventory_id = rental.inventory_id WHERE film.title LIKE '%alien%' GROUP BY film_id , title ORDER BY COUNT(rental.rental_id) DESC;