Descriptive Statistics
Descriptive statistics are used to summarize and describe the main features of a dataset. Instead of examining every individual data value, descriptive statistics provide simple numerical summaries that help in understanding the overall structure and behavior of the data.
Common descriptive statistics include measures of central tendency and measures of dispersion. Measures of central tendency describe the typical or average value in the dataset, while measures of dispersion show how spread out the data is.
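Both kinds of measure are needed because datasets can share a centre and still differ a lot in spread. This sketch uses two illustrative vectors, chosen here for the example, that have the same mean but very different standard deviations:

```r
# Two illustrative vectors with the same centre but different spread
narrow <- c(18, 19, 20, 21, 22)
wide   <- c(10, 15, 20, 25, 30)

mean(narrow)  # 20
mean(wide)    # 20
sd(narrow)    # about 1.58
sd(wide)      # about 7.91
```

The mean alone cannot tell these vectors apart; the standard deviation can.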
In R, descriptive statistics can be calculated using built-in functions.
```r
# Example numeric vector
data <- c(10, 15, 20, 25, 30)
```
The mean (arithmetic mean) is the sum of all values divided by the number of values.
```r
mean(data)  # 20
```
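Because the mean is defined as the sum divided by the count, it can be checked by hand with sum() and length(). A minimal sketch using the same example vector:

```r
data <- c(10, 15, 20, 25, 30)

# Arithmetic mean from its definition: sum of values / number of values
manual_mean <- sum(data) / length(data)
manual_mean                         # 20
all.equal(manual_mean, mean(data))  # TRUE
```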
The median is the middle value when the data are sorted in order. If the number of values is even, it is the average of the two middle values.
```r
median(data)  # 20
```
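The example vector has an odd number of values, so its median is a single data point. With an even count, median() averages the two middle values of the sorted data. The four-value vector below is illustrative and deliberately unsorted, to show that the ordering step matters:

```r
even_data <- c(25, 10, 20, 15)   # illustrative vector with an even count

median(even_data)                # (15 + 20) / 2 = 17.5

# Same result by hand: sort, then average the two middle values
sorted <- sort(even_data)
n <- length(sorted)
(sorted[n / 2] + sorted[n / 2 + 1]) / 2   # 17.5
```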
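The mode is the most frequently occurring value. R has no built-in function for the statistical mode (its mode() function reports an object's type, such as "numeric"), but the mode can be calculated manually. If several values tie for the highest count, the code below returns only the first of them. In the example vector every value occurs exactly once, so there is no meaningful mode and the result is simply the smallest value, 10.
```r
# Count each value; table() stores the values as character names
counts <- table(data)
mode_value <- as.numeric(names(counts)[which.max(counts)])  # first value with the highest count
mode_value
```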
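To report ties instead of silently keeping the first value, one option is to return every value that shares the highest count. The helper name stat_mode() is hypothetical, not part of base R; this is a sketch for numeric vectors:

```r
# Hypothetical helper: returns every value that shares the highest count
stat_mode <- function(x) {
  counts <- table(x)  # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])
}

stat_mode(c(2, 3, 3, 5, 7))       # 3
stat_mode(c(2, 2, 3, 3, 5))       # 2 3 (a tie: two modes)
stat_mode(c(10, 15, 20, 25, 30))  # 10 15 20 25 30 (no value repeats)
```

When every value is returned, as in the last call, no value occurs more often than the others, so the data have no single mode.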
Measures of dispersion help describe the spread of the data.
| Measure | Description | R Function |
|---|---|---|
| Range | Difference between the maximum and minimum values (range(data) itself returns both values, not their difference) | diff(range(data)) |
| Variance | Sum of squared deviations from the mean, divided by n - 1 (sample variance) | var(data) |
| Standard Deviation | Square root of the variance | sd(data) |
Example calculations for measures of dispersion:
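```r
range(data)        # minimum and maximum: 10 30
diff(range(data))  # range as a single number: 20
var(data)          # sample variance (n - 1 denominator): 62.5
sd(data)           # sample standard deviation: about 7.91
```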
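var() and sd() compute the sample variance and standard deviation, which divide the sum of squared deviations by n - 1 rather than n. Working through the definitions on the example vector makes the difference visible; pop_var is included only for comparison:

```r
data <- c(10, 15, 20, 25, 30)
n <- length(data)

# Squared deviations from the mean: 100, 25, 0, 25, 100 (sum = 250)
sq_dev <- (data - mean(data))^2

sample_var <- sum(sq_dev) / (n - 1)  # 250 / 4 = 62.5, what var() returns
pop_var    <- sum(sq_dev) / n        # 250 / 5 = 50, population variance

sqrt(sample_var)                       # about 7.91, what sd() returns
all.equal(sample_var, var(data))       # TRUE
all.equal(sqrt(sample_var), sd(data))  # TRUE
```

var() always uses the n - 1 denominator, so a population variance has to be computed by hand as above if it is needed.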
Descriptive statistics are an essential first step in data analysis. They help in understanding the distribution, central values, and variability of the data before applying more advanced statistical methods or models.
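As a first pass over a numeric variable, the measures above can be gathered into one named vector. The helper name describe_numeric() is hypothetical and only combines the base R functions already covered:

```r
# Hypothetical helper: collects the measures covered above for a numeric vector
describe_numeric <- function(x) {
  c(
    n      = length(x),
    mean   = mean(x),
    median = median(x),
    min    = min(x),
    max    = max(x),
    range  = diff(range(x)),  # max - min as a single number
    var    = var(x),          # sample variance (n - 1 denominator)
    sd     = sd(x)            # sample standard deviation
  )
}

data <- c(10, 15, 20, 25, 30)
round(describe_numeric(data), 2)
```

For the example vector this gives n = 5, mean = 20, median = 20, min = 10, max = 30, range = 20, var = 62.5 and sd = 7.91. Base R's summary(data) offers a related overview (minimum, quartiles, median, mean and maximum).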
