Case Study: Sales or Healthcare Dataset
A case study helps apply the concepts of data analysis to a real-world dataset. In this example, we will explore a simple sales dataset to demonstrate how to perform basic exploratory data analysis, data manipulation, and visualization in R.
Suppose we have a sales dataset that contains information about products, categories, quantity sold, and revenue.
# Create a sample sales dataset (one row per sale)
sales_data <- data.frame(
  product  = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics",
               "Clothing", "Electronics", "Clothing",
               "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)
The first step is to inspect the structure and summary of the dataset.
str(sales_data)      # column names, types and first few values
summary(sales_data)  # per-column summary statistics
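A few extra checks are often worth running at this stage. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so `summary()` reports only Length/Class/Mode for the `product` and `category` columns. The sketch below, which repeats the sample data so it runs on its own, checks the shape, counts missing values, and converts the text columns to factors so that `summary()` shows counts per level.

```r
# Same sample data as in the example above
sales_data <- data.frame(
  product  = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics", "Clothing",
               "Electronics", "Clothing", "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# Shape of the data: rows and columns
dim(sales_data)  # → 8 4
head(sales_data, 3)

# Number of missing values in each column
colSums(is.na(sales_data))

# Treat the text columns as categorical variables
sales_data$product  <- factor(sales_data$product)
sales_data$category <- factor(sales_data$category)

# summary() now reports how many rows fall into each level
summary(sales_data)
```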
Next, we can calculate total revenue by category using dplyr.
library(dplyr)

# Group rows by category, then add up revenue within each group
sales_data %>%
  group_by(category) %>%
  summarise(total_revenue = sum(revenue))
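The same grouped total can be computed in base R without any packages, which is a handy cross-check on the dplyr result. This sketch uses `tapply()` and `aggregate()`; it repeats the sample data so it runs standalone.

```r
# Same sample data as in the example above
sales_data <- data.frame(
  product  = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics", "Clothing",
               "Electronics", "Clothing", "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# tapply() splits revenue by category and applies sum() to each piece
rev_by_cat <- tapply(sales_data$revenue, sales_data$category, sum)
rev_by_cat  # Clothing: 900, Electronics: 2600

# aggregate() returns the same totals as a data frame, similar to summarise()
agg <- aggregate(revenue ~ category, data = sales_data, FUN = sum)
agg
```

Both approaches should agree with the dplyr output: Clothing earns 300 + 250 + 150 + 200 = 900 and Electronics earns 500 + 400 + 800 + 900 = 2600.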
We can also calculate the average quantity sold for each product.
# Mean quantity per sale for each product
sales_data %>%
  group_by(product) %>%
  summarise(avg_quantity = mean(quantity))
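Several summaries can be computed in a single `summarise()` call, and `mutate()` can add derived columns before grouping. The sketch below adds a per-sale unit price (revenue divided by quantity, an illustrative derived column not in the original data) and ranks products by total revenue. It repeats the sample data so it runs on its own.

```r
library(dplyr)

# Same sample data as in the example above
sales_data <- data.frame(
  product  = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics", "Clothing",
               "Electronics", "Clothing", "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

product_summary <- sales_data %>%
  # Derived column: implied price per unit for each individual sale
  mutate(unit_price = revenue / quantity) %>%
  group_by(product) %>%
  summarise(
    n_sales        = n(),
    total_quantity = sum(quantity),
    avg_quantity   = mean(quantity),
    total_revenue  = sum(revenue),
    avg_unit_price = mean(unit_price)
  ) %>%
  # Highest-revenue product first
  arrange(desc(total_revenue))

product_summary
```

Note that `avg_unit_price` is the plain mean of the per-sale prices; a quantity-weighted price would instead be `total_revenue / total_quantity`, and the two can differ when sales sizes vary.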
Visualization helps in understanding patterns in the data. For example, a bar chart can be used to show total revenue by category.
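library(ggplot2)

# geom_col() uses the y values as bar heights (same as geom_bar(stat = "identity")).
# Each category has several rows, so the bars stack and each height equals the category total.
ggplot(sales_data, aes(x = category, y = revenue, fill = category)) +
  geom_col() +
  labs(x = "Category", y = "Revenue") +
  theme_minimal()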
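An alternative is to aggregate first and plot one row per category, which makes the totals explicit and lets you label each bar with its value. This is a sketch combining the dplyr summary with `geom_col()` and `geom_text()`; the sample data is repeated so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Same sample data as in the example above
sales_data <- data.frame(
  product  = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics", "Clothing",
               "Electronics", "Clothing", "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# Collapse to one row per category before plotting
category_totals <- sales_data %>%
  group_by(category) %>%
  summarise(total_revenue = sum(revenue))

p <- ggplot(category_totals,
            aes(x = category, y = total_revenue, fill = category)) +
  geom_col(show.legend = FALSE) +           # fill already matches the x labels
  geom_text(aes(label = total_revenue),     # print each total above its bar
            vjust = -0.5) +
  labs(x = "Category", y = "Total revenue",
       title = "Total revenue by category") +
  theme_minimal()

p
```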
A scatter plot can also be used to examine the relationship between quantity and revenue.
# One point per sale: units sold vs. revenue earned
ggplot(sales_data, aes(x = quantity, y = revenue)) +
  geom_point(color = "blue") +
  theme_minimal()
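The scatter plot can be backed up with a number. This sketch computes the Pearson correlation with `cor()`, fits a simple linear model with `lm()`, and overlays the fitted line using `geom_smooth(method = "lm")`. With only eight rows and prices that differ by product, treat the result as a rough description of this sample, not evidence of a causal effect. The sample data is repeated so the block runs on its own.

```r
library(ggplot2)

# Same sample data as in the example above
sales_data <- data.frame(
  product  = c("A", "B", "C", "A", "B", "C", "A", "B"),
  category = c("Electronics", "Clothing", "Electronics", "Clothing",
               "Electronics", "Clothing", "Electronics", "Clothing"),
  quantity = c(10, 15, 8, 12, 20, 5, 18, 9),
  revenue  = c(500, 300, 400, 250, 800, 150, 900, 200)
)

# Pearson correlation between units sold and revenue
r <- cor(sales_data$quantity, sales_data$revenue)
r  # about 0.79 for this sample

# Least-squares fit: revenue = intercept + slope * quantity
fit <- lm(revenue ~ quantity, data = sales_data)
coef(fit)

# Same scatter plot with the fitted straight line overlaid
ggplot(sales_data, aes(x = quantity, y = revenue)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()
```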
This case study demonstrates how data can be explored, summarized, and visualized using R. The same workflow can be applied to other datasets, such as healthcare data, to analyze patient records, treatment outcomes, or resource utilization.
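As an illustration of reusing the workflow, the sketch below applies the same group-and-summarise pattern to a small patient table. The dataset, its column names (`treatment`, `length_of_stay`, `recovered`) and all values are hypothetical, invented only to show the mechanics; the summary describes this toy table and says nothing about real treatment effects.

```r
library(dplyr)

# Hypothetical patient records (illustrative values only)
patients <- data.frame(
  patient_id     = 1:8,
  treatment      = c("Drug", "Placebo", "Drug", "Placebo",
                     "Drug", "Placebo", "Drug", "Placebo"),
  length_of_stay = c(4, 7, 5, 6, 3, 8, 4, 6),   # days in hospital
  recovered      = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Same pattern as the sales example: group, then summarise
outcomes <- patients %>%
  group_by(treatment) %>%
  summarise(
    n_patients    = n(),
    avg_stay_days = mean(length_of_stay),
    recovery_rate = mean(recovered)   # mean of a logical = share of TRUE
  )

outcomes
```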
