Correlation and Regression Analysis
Correlation and regression are statistical techniques used to study relationships between variables. They help analysts understand how changes in one variable are associated with changes in another.
Correlation measures the strength and direction of the linear relationship between two numerical variables. The most common measure is the Pearson correlation coefficient, which ranges from -1 to +1.
| Correlation Value | Interpretation |
|---|---|
| +1 | Perfect positive relationship |
| 0 | No relationship |
| -1 | Perfect negative relationship |
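The coefficient described here, and the default reported by R's cor() function, is the Pearson correlation. It can be written as the covariance of the two variables divided by the product of their standard deviations:

r = cov(x, y) / (sd(x) × sd(y))

Values between the extremes indicate relationships of intermediate strength in the corresponding direction.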
In R, correlation can be calculated using the cor() function.
# Example data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
# Calculate correlation
cor(x, y)
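Because y here is exactly 2 × x, the call above returns a perfect +1. As a rough sketch of how random noise pulls the coefficient below +1 (reusing the x and y vectors defined above; the seed and noise level are arbitrary illustrative choices):
# Add random noise to y, weakening the linear relationship
set.seed(123)  # arbitrary seed, for reproducibility
y_noisy <- y + rnorm(5, mean = 0, sd = 1)
# The correlation is still strongly positive, but no longer exactly 1
cor(x, y_noisy)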
Regression analysis is used to model the relationship between a dependent variable and one or more independent variables. It helps predict values and understand how variables influence each other.
The most common type is linear regression, which fits a straight line to the data.
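Concretely, simple linear regression finds the intercept a and slope b of the line

y = a + b·x

by choosing the values that minimize the sum of squared vertical distances between the line and the observed points (the ordinary least squares criterion used by lm()).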
# Create a linear regression model
model <- lm(y ~ x)
# View model summary
summary(model)
The output of the regression model includes the coefficients, their p-values, and the R-squared value. The coefficients give the intercept and slope of the fitted line, the p-values indicate whether each coefficient is statistically distinguishable from zero, and the R-squared value indicates how much of the variation in the dependent variable the model explains.
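These quantities can also be pulled out of the fitted object directly, and the model can be used for prediction. A minimal sketch, assuming the model object created above (the new x value of 6 is purely an illustrative choice):
# Intercept and slope of the fitted line
coef(model)
# Proportion of the variation in y explained by the model
summary(model)$r.squared
# Predicted y for a new value of x
predict(model, newdata = data.frame(x = 6))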
A regression line can also be visualized using ggplot2.
# Scatter plot with a straight-line (lm) fit overlaid
library(ggplot2)
ggplot(data = data.frame(x, y), aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
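Here geom_smooth(method = "lm") overlays the same straight-line fit produced by lm(), and se = FALSE hides the confidence band that would otherwise be drawn around the line.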
Correlation and regression analysis are essential tools in data science. They help identify relationships, build predictive models, and support data-driven decision-making.
