Correlation and Regression Analysis

Correlation and regression are statistical techniques used to study relationships between variables. They help analysts understand how changes in one variable are associated with changes in another.

Correlation measures the strength and direction of the linear relationship between two numerical variables. The most common measure is the Pearson correlation coefficient, which ranges from -1 to +1.

Correlation Value    Interpretation
+1                   Perfect positive linear relationship
0                    No linear relationship
-1                   Perfect negative linear relationship
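
The three cases in the table can be reproduced with a small sketch. The data below is hypothetical, chosen only so that each coefficient lands at one extreme or at zero. The quadratic case illustrates that a coefficient of 0 rules out a linear relationship, not every kind of relationship.

```r
# Hypothetical data, not taken from the text above
x <- c(-2, -1, 0, 1, 2)

y_pos  <- 3 * x + 1    # rises with x along a straight line
y_neg  <- -2 * x + 5   # falls with x along a straight line
y_quad <- x^2          # fully determined by x, but not linearly

r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_quad <- cor(x, y_quad)

# Expect values at (or within rounding error of) +1, -1 and 0
print(c(positive = r_pos, negative = r_neg, quadratic = r_quad))
```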

In R, correlation can be calculated using the cor() function, which computes the Pearson coefficient by default.

# Example data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Calculate correlation (y is exactly 2 * x, so the result is 1)
cor(x, y)
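
To show what cor() is computing, the following sketch evaluates the Pearson formula by hand and compares it with the built-in result. The y values here are hypothetical noisy numbers introduced for illustration, since the perfectly linear data above would hide any difference. The method = "spearman" argument is a standard option of cor() that switches to rank-based correlation.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical values, not from the text

# Deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson coefficient: co-variation divided by the product of the spreads
r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(x, y)

# Rank-based alternative, less sensitive to outliers and non-linearity
r_spearman <- cor(x, y, method = "spearman")

print(c(manual = r_manual, builtin = r_builtin, spearman = r_spearman))
```

The manual and built-in values should agree up to floating-point rounding.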

Regression analysis is used to model the relationship between a dependent variable and one or more independent variables. It helps predict values and estimate how the dependent variable changes as the independent variables change.

The most common type is linear regression, which fits a straight line to the data by minimizing the sum of squared differences between the observed and fitted values.

# Create a linear regression model
model <- lm(y ~ x)

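# View model summary (coefficients, p-values, R-squared)
# Note: y is exactly 2 * x here, so R warns that the fit is essentially perfect
summary(model)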
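
Once a model is fitted, its coefficients can be read off and used for prediction. The sketch below uses hypothetical noisy data (not from the text) so that the fit is not perfect. coef() and predict() are standard base R functions, and the new x values 6 and 7 are arbitrary illustrations.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical values, not from the text

model <- lm(y ~ x)

# Named vector with the intercept "(Intercept)" and the slope "x"
coefs <- coef(model)
print(coefs)

# Predict y for x values outside the original data;
# the column name must match the predictor name used in the formula
new_x <- data.frame(x = c(6, 7))
predicted <- predict(model, newdata = new_x)
print(predicted)
```

Predictions outside the observed range of x assume the linear pattern continues to hold there.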

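The output of the regression model includes coefficients, p-values, and the R-squared value. The coefficients give the intercept and the slope of the fitted line, where the slope is the estimated change in y for a one-unit increase in x. The p-values indicate whether each coefficient differs significantly from zero, and the R-squared value is the proportion of the variance in y that the model explains.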
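
These quantities can also be pulled out of the summary object directly rather than read from the printed output. The sketch below again assumes hypothetical noisy data. It checks R-squared against its definition, one minus the residual sum of squares over the total sum of squares.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical values, not from the text

model <- lm(y ~ x)
model_summary <- summary(model)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(model_summary)
slope_p <- coef_table["x", "Pr(>|t|)"]

# R-squared as reported by summary()
r_squared <- model_summary$r.squared

# R-squared from its definition: 1 - SS_residual / SS_total
ss_res <- sum(residuals(model)^2)
ss_tot <- sum((y - mean(y))^2)
r_squared_manual <- 1 - ss_res / ss_tot

print(c(reported = r_squared, manual = r_squared_manual, slope_p = slope_p))
```

For a regression with a single predictor, R-squared also equals the square of the Pearson correlation between x and y.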

A regression line can also be visualized using ggplot2.

library(ggplot2)

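# Scatter plot of the data with the fitted least-squares line overlaid
ggplot(data = data.frame(x, y), aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # se = FALSE hides the confidence band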

Correlation and regression analysis are essential tools in data science. They help identify relationships, build predictive models, and support data-driven decision-making.