top of page

Super Market Sales Analysis - MySQL Project

I generated a database for sales data of a supermarket in order to practice determining KPI's using MySQL queries.


See the Creation Queries.sql or sales.csv to view or download the data used for this analysis, those are provided on my Kaggle profile here.


My analysis is separated into 3 categories of analysis style:

  • Sales and Revenue Analysis

  • Customer Behavior Analysis

  • Product Performance Analysis


Sales and Revenue Analysis

This analysis includes queries for the following categories:

  • Total Revenue

    • The total amount of money earned from sales.

  • Average Basket Size

    • The average amount spent by customers per transaction.


Total Revenue

First I want to find the total revenue by city and branch for this fictional super market.

To find the total revenue by city, I select the city and total_price columns, then I group by the city column. I then repeat this same convention for finding the total revenue by branch, subbing out the city column with the branch column.

SELECT 
    city,
    ROUND(SUM(total_price),0) AS total_revenue_per_location
FROM Sales
GROUP BY city;

# Findings 
# Total Revenue for New York = 40,227
# Total Revenue for Los Angeles = 35,772
# Total Revenue for Chicago = 42,585
SELECT 
    branch,
    ROUND(SUM(total_price),0) AS total_revenue_per_branch
FROM Sales
GROUP BY branch;

# Findings 
# Total Revenue for Branch A = 82,812
# Total Revenue for Branch B = 35,772
Average Customer Basket Size

Now I want to find the average customer basket size per transaction for each city and each branch. To find the average customer basket size by city, I select the city column and the quantity column and the group by the city column. I then repeat this same convention for finding the total revenue by branch, subbing out the city column with the branch column.

SELECT 
    city,
    ROUND(AVG(quantity),1) AS average_quantity_per_location
FROM Sales
GROUP BY city;

# Findings
# Average basket size per customer in New York = 9.7
# Average basket size per customer in Los Angeles = 10.3
# Average basket size per customer in Chicago = 11.0
SELECT 
    branch,
    ROUND(AVG(quantity),1) AS average_quantity_per_branch
FROM Sales
GROUP BY branch;

# Findings
# Average basket size per customer in Branch A = 10.4
# Average basket size per customer in Branch B = 10.3
Revenue and Sales Analysis Conclusions
  • Chicago is the best performing store by total revenue and also has the highest average items per customer.

  • Despite being the only store in Branch B the Los Angeles location has around the same average basket size as both stores in Branch A.

    • This finding shows, that if the company was thinking of expanding to two stores in Branch B then they can expect a slightly below average performing story.


Customer Behavior Analysis

This analysis includes queries for the following categories:

  • Customer Footfall

    • Query to find the number of customers visiting each store.

  • Loyalty

    • Query to find the percentage of customers that are members for each store.

  • Gender Bias

    • Queries to find the percentage split of Male to Female customers.

  • Big Spenders

    • Query to find the total number of transactions over $100 per store.

Customer Footfall

I want to find the amount of customer traffic for each location. To find the customer traffic for each location, I select the city column and the count of sales and then group by the city column.

SELECT 
    city,
    COUNT(DISTINCT sale_id) AS total_customers
FROM Sales
GROUP BY city;

# Findings
# Total Customers for Chicago location = 330
# Total Customers for Los Angeles location = 326
# Total Customers for New York location = 344

Loyalty

The goal of this query is to find the percentage of customers that are part of the loyalty program per location. I do this by grabbing the city column and then use a CASE statement to parse whether each row is a member (1) or not a member (0) and then divide that by the count of all rows multiplied by 100. Then I group by the city column.

SELECT 
    city,
    ROUND((SUM(CASE WHEN customer_type = 'Member' THEN 1 ELSE 0 END) / COUNT(*)) * 100, 1) AS percentage_members
FROM Sales
GROUP BY city;

# Findings
# Percentage of Customers that are Members for New York = 51.5%
# Percentage of Customers that are Members for Los Angeles = 49.4%
# Percentage of Customers that are Members for Chicago = 53.9%

Gender Bias

For this query, I want to find the male and female split for each location. I do this by grabbing the city column, and then use a Case statement to parse when a row has 'male' in the gender column (1) or when the row has 'female' in the gender column (0). I then use this same method but vice versa for the percentage of females. Finally I group by the city column.

SELECT 
    city,
    ROUND((SUM(CASE WHEN gender = 'Male' THEN 1 ELSE 0 END) / COUNT(*)) * 100, 1) AS percentage_of_male_customers,
    ROUND((SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) / COUNT(*)) * 100, 1) AS percentage_of_female_customers
FROM Sales
GROUP BY city;

# Findings
# Male vs Female Customers for New York location = 50% Male and 50% Female
# Male vs Female Customers for Los Angeles location = 52.8% Male and 47.2% Female
# Male vs Female Customers for Chicago location = 55.8% Male and 44.2% Female

Big Spenders

The Big Spenders query is to find the amount of transactions over $100 for each location. I do this by selecting the city column and the count of all rows, then in the WHERE clause I only look for rows with over 100 in the total_price column. Then I group by the city column.

SELECT 
    city,
    COUNT(*) AS orders_over_100_dollars
FROM Sales
WHERE total_price > 100
GROUP BY city;

# Findings
# Total Big Spender Transactions for Chicago location = 170
# Total Big Spender Transactions for Los Angeles location = 136
# Total Big Spender Transactions for New York location = 160

Customer Behavior Analysis Conclusions

New York Location Customer Behavior KPIs Conclusions
  • New York has the highest amount of customers out of all locations, but is behind Chicago in Membership Percentage and Big Spenders.

    • If I were to recommend improvements for New York, it would be to increase their Membership push, more Members = more returning customers.

Chicago Location Customer Behavior KPIs Conclusions
  • Chicago rates the highest in Membership Percentage and Big Spenders, but is second behind New York in total customers.

  • If I were to recommend improvements for Chicago, it would be to increase the amount of female centric products.

    • The numbers show that only 44.2% of customers are female, so the store can gain more female customers with these type of products. More female products = more total customers.

Los Angeles Location Customer Behavior KPIs Conclusions
  • Los Angeles is the lowest performing store, so there is tons of room for improvement.

  • Los Angeles biggest area of improvement is with the amount of Big Spenders.

  • Despite only having 4 less total customers than Chicago, Los Angeles has 24 less Big Spenders than Chicago.

    • To improve on this, Los Angeles could advertise more expensive items at the front of the store.

    • However I think the answer is to boost their Membership count. That way when Members need to buy one of these big ticket items, they would be more likely to choose this store so they can earn more rewards points.


Product Performance Analysis

This analysis includes queries for the following categories:

  • Top Selling Products

    • A query to show the top selling products by total sales for this company.

  • Top Selling Products Per Store

    • A query to show the top selling products by total sales for each store location of this company.

  • Top Selling Product Categories

    • A query to show the top selling product categories by total sales for this company.

  • Top Selling Product Categories Per Store

    • A query to show the top selling product categories by total sales for each store location of this company.

Top Selling Products

I want to find the top three selling products for all stores combined. To do this I select the product_name column and the sum of all the values in the total_price column. Then I group by the product_name column and order by the total_sales column descending and limit by 3 to get the top three selling products.

SELECT 
    product_name, 
    SUM(total_price) AS total_sales
FROM Sales
GROUP BY product_name
ORDER BY total_sales DESC LIMIT 3;

# Findings
# Highest sales contribution: Shampoo - $27,041.36
# 2nd Highest sales contribution: Notebook - $24,792.98
# 3rd Highest sales contribution: Orange Juice - $24,686.46
Top Selling Products by Store

For this query I want to find the top two selling products per location. To do this I select the city column, product_name column, and then get the sum of all values from the total_price column. Then I group by the city column and product_name column. Finally I used order by on the city column and total_sales column descending.

SELECT 
    city,
    product_name,
    SUM(total_price) AS total_sales
FROM Sales
GROUP BY city, product_name
ORDER BY city, total_sales DESC;

# Findings
# Top 2 Sales Contributors for Chicago
# Orange Juice - $10,123.48
# Shampoo - $10,112.73

# Top 2 Sales Contributors for Los Angeles
# Orange Juice - $8,564.29
# Notebook - $7,607.34

# Top 2 Sales Contributors for New York
# Shampoo - $9,879.68
# Notebook - $9,251.19
Top Selling Product Categories

I want to find the top three selling categories for all stores combined. To do this, I select the product_category column and the sum of all the values in the total_price column. Then I group by the product_category column and order by the total_sales column descending.

SELECT 
    product_category, 
    SUM(total_price) AS total_sales
FROM Sales
GROUP BY product_category
ORDER BY total_sales DESC;

# Findings 
# Top Category: Personal Care - $27,050.18
# 2nd Top Category: Fruits - $26,197.45
# 3rd Top Category: Beverages - $22,983.32
Top Selling Product Categories by Store

For this query I want to find the top selling products per location. To do this I select the city column, product_category column, and then the total_sales column. I select those columns from a sub-query that I will use to find the rank of the categories by total_price.


In the sub-query I select the city column, product_category column, the sum of all values in the total_price column, and then use the ROW_NUMBER() function. I use the ROW_NUMBER() function over a partition by city, and then order by the sum of the total_price column descending. This will give me the rank of the highest selling category by location. In the WHERE clause I find the category ranked 1st and order by location.

SELECT city, product_category, total_sales
FROM (
    SELECT 
        city, 
        product_category, 
        SUM(total_price) AS total_sales,
        ROW_NUMBER() OVER (PARTITION BY city ORDER BY SUM(total_price) DESC) AS category_rank
    FROM Sales
    GROUP BY city, product_category
) AS ranked
WHERE category_rank = 1
ORDER BY city;

# Findings
# Chicago highest contributing category is Stationary at $9,810.67
# Los Angeles highest contributing category is Fruits at $8,864.57
# New York highest contributing category is Personal Care at $10,175.68
Product Performance Analysis Conclusions
Overall Company Product Performance KPIs
  • Shampoo is the companies highest performing product and the highest performing category is Personal Care

    • To capitalize off of this, I suggest the company order conditioners, body wash, and possibly shower caps.

    • If customers are buying shampoo, they will likely need these similar products as well. This will bring return customers for this type of product.

    • It would be useful to pursue Personal Care products outside of bath wear, such as fragrances or dental products.

Product Performances KPIs by Store Locations
  • Orange Juice is the top product in both the Chicago and Los Angeles locations.

    • Given the above finding, I suggest expanding to other juice beverages, such as lemonade or grape juice.

    • If the company needs to test out new beverage products, these locations would be a good suitor for that testing.

  • Personal Care is the highest contributing category for the New York location.

    • If the company intends to expand their Personal Care products, the New York location would be a good suitor for that testing.

Comments


bottom of page