Case Study: Sales or Healthcare Dataset

A case study shows how the concepts of data analysis apply to realistic data. In this example, we explore a small sample sales dataset to demonstrate basic exploratory data analysis, data manipulation, and visualization in R.

Suppose we have a sales dataset in which each row records a product, its category, the quantity sold, and the revenue earned.

# Create a sample sales dataset
sales_data <- data.frame(
  product = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics", 
               "Clothing", "Electronics", "Clothing", 
               "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue = c(500, 300, 400, 250, 800, 150, 900, 200)
)

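The first step is to inspect the dataset. str() shows each column's name, type, and first few values, while summary() reports descriptive statistics such as the minimum, median, mean, and maximum of each numeric column.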

str(sales_data)
summary(sales_data)
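str() and summary() can be complemented by a few quick data-quality checks. The sketch below uses only base R: it counts the rows, tests for missing values with anyNA(), and tabulates how many records fall into each category. The object names are illustrative.

```r
# Recreate the sample dataset so this snippet runs on its own
# (rep() reproduces the same product and category columns as above)
sales_data <- data.frame(
  product  = rep(c("A", "B", "C"), length.out = 8),
  category = rep(c("Electronics", "Clothing"), times = 4),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

n_rows <- nrow(sales_data)                      # number of records
has_missing <- anyNA(sales_data)                # TRUE if any cell is NA
category_counts <- table(sales_data$category)   # records per category

print(n_rows)
print(has_missing)
print(category_counts)
```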

Next, we can calculate total revenue by category using dplyr.

library(dplyr)

sales_data %>%
  group_by(category) %>%
  summarise(total_revenue = sum(revenue))
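The same totals can be cross-checked in base R without dplyr. This sketch uses aggregate() with a formula, which groups revenue by category and applies sum to each group; the result is sorted alphabetically by category.

```r
# Recreate the sample dataset so this snippet runs on its own
sales_data <- data.frame(
  product  = rep(c("A", "B", "C"), length.out = 8),
  category = rep(c("Electronics", "Clothing"), times = 4),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# Sum revenue within each category using base R only
category_totals <- aggregate(revenue ~ category, data = sales_data, FUN = sum)
print(category_totals)
```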

We can also calculate the average quantity sold for each product.

sales_data %>%
  group_by(product) %>%
  summarise(avg_quantity = mean(quantity))
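summarise() can compute several statistics in one pass. As a sketch, the call below also counts the sales per product with n() and totals the quantity, then sorts products by average quantity with arrange(); the column names n_sales, total_quantity, and avg_quantity are illustrative choices.

```r
library(dplyr)

# Recreate the sample dataset so this snippet runs on its own
sales_data <- data.frame(
  product  = rep(c("A", "B", "C"), length.out = 8),
  category = rep(c("Electronics", "Clothing"), times = 4),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

product_stats <- sales_data %>%
  group_by(product) %>%
  summarise(
    n_sales        = n(),             # number of rows per product
    total_quantity = sum(quantity),   # units sold per product
    avg_quantity   = mean(quantity)   # average units per sale
  ) %>%
  arrange(desc(avg_quantity))         # highest average first

print(product_stats)
```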

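Visualization helps reveal patterns in the data. For example, a bar chart can show total revenue by category. Because each category has several rows, ggplot2 stacks their revenues, so the height of each bar equals the category total.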

library(ggplot2)

ggplot(sales_data, aes(x = category, y = revenue, fill = category)) +
  geom_col() +  # shorthand for geom_bar(stat = "identity")
  theme_minimal()
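An alternative is to compute the category totals first and draw one bar per row with geom_col(). Summarising explicitly makes the plotted values easy to inspect, and reorder() sorts the bars by total revenue. This is a sketch; the object names and axis labels are illustrative.

```r
library(dplyr)
library(ggplot2)

# Recreate the sample dataset so this snippet runs on its own
sales_data <- data.frame(
  product  = rep(c("A", "B", "C"), length.out = 8),
  category = rep(c("Electronics", "Clothing"), times = 4),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# One row per category with its total revenue
category_totals <- sales_data %>%
  group_by(category) %>%
  summarise(total_revenue = sum(revenue))

# Bars ordered from lowest to highest total; the legend is redundant with the x axis
p <- ggplot(category_totals,
            aes(x = reorder(category, total_revenue), y = total_revenue, fill = category)) +
  geom_col(show.legend = FALSE) +
  labs(x = "Category", y = "Total revenue") +
  theme_minimal()

print(p)
```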

A scatter plot can also be used to examine the relationship between quantity and revenue.

ggplot(sales_data, aes(x = quantity, y = revenue)) +
  geom_point(color = "blue") +
  theme_minimal()
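To put a number on the relationship in the scatter plot, the Pearson correlation can be computed with cor(), and geom_smooth(method = "lm") overlays a least-squares line. With only eight rows, this is an illustration rather than a reliable estimate.

```r
library(ggplot2)

# Recreate the sample dataset so this snippet runs on its own
sales_data <- data.frame(
  product  = rep(c("A", "B", "C"), length.out = 8),
  category = rep(c("Electronics", "Clothing"), times = 4),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# Pearson correlation between quantity and revenue (ranges from -1 to 1)
r <- cor(sales_data$quantity, sales_data$revenue)
print(round(r, 2))

# Scatter plot with a fitted straight line and no confidence band
p <- ggplot(sales_data, aes(x = quantity, y = revenue)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()

print(p)
```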

This case study demonstrates how data can be explored, summarized, and visualized using R. The same workflow can be applied to other datasets, such as healthcare data, to analyze patient records, treatment outcomes, or resource utilization.
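As an illustration of that point, the sketch below applies the same group-and-summarise pattern to a small patient table. The dataset, its column names (treatment, length_of_stay, readmitted), and all values are hypothetical and invented purely for demonstration.

```r
library(dplyr)

# Hypothetical patient records (invented values, for illustration only)
patients <- data.frame(
  patient_id     = 1:6,
  treatment      = c("Drug X", "Drug Y", "Drug X", "Drug Y", "Drug X", "Drug Y"),
  length_of_stay = c(4, 6, 3, 7, 5, 5),                       # days in hospital
  readmitted     = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Average length of stay and readmission rate per treatment group
outcomes <- patients %>%
  group_by(treatment) %>%
  summarise(
    n_patients       = n(),
    avg_stay_days    = mean(length_of_stay),
    readmission_rate = mean(readmitted)   # share of TRUE values
  )

print(outcomes)
```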