Introduction to Machine Learning in R
Machine learning is a branch of artificial intelligence, closely tied to data science, that builds models which learn patterns from data and make predictions or decisions without being explicitly programmed for each task. It is widely used in fields such as healthcare, finance, marketing, and pharmaceutical research.
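In R, machine learning can be performed using packages that provide tools for data preparation, model building, and evaluation. Commonly used packages include caret (a unified interface for training and tuning many model types), randomForest (random forest models), and e1071 (support vector machines, naive Bayes, and related methods).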
Most machine learning problems fall into one of two main categories, depending on whether the training data is labeled.
| Type | Description | Example |
|---|---|---|
| Supervised Learning | Models are trained using labeled data | Predicting disease outcome |
| Unsupervised Learning | Models find patterns in unlabeled data | Customer segmentation |
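As a rough illustration of the unsupervised case, the sketch below groups the cars in the built-in mtcars dataset by weight and horsepower using k-means from base R. The choice of three clusters and of these two variables is an assumption made for the example, not something the data dictates.

```r
# Unsupervised sketch: group cars without using any outcome labels
features <- scale(mtcars[, c("wt", "hp")])  # standardise so both variables count equally

set.seed(123)  # k-means starts from random centres
clusters <- kmeans(features, centers = 3, nstart = 20)  # 3 clusters is an arbitrary choice

# Cluster sizes, then the cluster assigned to the first few cars
table(clusters$cluster)
head(clusters$cluster)
```

No column is treated as the "right answer" here; the algorithm only looks for cars that sit close together in the chosen variables.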
Supervised learning includes tasks such as classification and regression. Classification predicts categories, while regression predicts numerical values.
| Task | Description | Example |
|---|---|---|
| Classification | Predicts a category or class | Spam or not spam |
| Regression | Predicts a continuous value | House price prediction |
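The difference between the two tasks can be sketched with base R and the built-in mtcars dataset. Logistic regression via glm() stands in for classification and lm() for regression; the variables and the 0.5 probability cutoff are assumptions chosen for illustration.

```r
# Classification: predict transmission type (am: 0 = automatic, 1 = manual) from weight
clf <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(clf, type = "response")  # predicted probability of a manual gearbox
pred_class <- ifelse(prob_manual > 0.5, "manual", "automatic")  # 0.5 cutoff is an assumption

# Regression: predict a continuous value (miles per gallon) from weight
reg <- lm(mpg ~ wt, data = mtcars)
pred_mpg <- predict(reg)

head(pred_class)  # categories
head(pred_mpg)    # numeric values
```

The classifier returns a label for each car, while the regression model returns a number on the same scale as mpg.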
Before building a machine learning model, the data must be prepared. This usually involves cleaning the data, handling missing values, and splitting the dataset into training and testing sets.
```r
# Example dataset
data <- mtcars

# Split data into training (70%) and testing (30%) sets
set.seed(123)  # make the random split reproducible
train_index <- sample(seq_len(nrow(data)), floor(0.7 * nrow(data)))
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
```
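The mtcars dataset has no missing values, so the split above works as is. For data that does have gaps, a minimal sketch of two common options is shown below. The missing values are introduced artificially here, and median imputation is only one possible strategy.

```r
# Copy mtcars and blank out a few hp values purely for illustration
data_na <- mtcars
data_na$hp[c(3, 10, 17)] <- NA

# Count missing values per column
colSums(is.na(data_na))

# Option 1: drop every row that contains a missing value
data_complete <- na.omit(data_na)

# Option 2: replace missing hp values with the median of the observed ones
data_imputed <- data_na
data_imputed$hp[is.na(data_imputed$hp)] <- median(data_na$hp, na.rm = TRUE)
```

When imputing, it is usually safer to compute the replacement value from the training set only and then apply it to the test set, so that no information from the test data leaks into the model.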
A simple example of a linear regression model in R is shown below.
```r
# Train model
model <- lm(mpg ~ wt + hp, data = train_data)

# View model summary
summary(model)
```
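The summary prints everything at once. Individual pieces can also be pulled out directly, as in the sketch below. To keep the snippet self-contained it refits the same formula on the full mtcars data (an assumption for this example), so the numbers will differ slightly from a model fitted on the training split.

```r
# Refit on the full dataset so the snippet runs on its own
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)                   # intercept and slopes for wt and hp
summary(fit)$r.squared      # share of the variance in mpg explained by the model
confint(fit, level = 0.95)  # 95% confidence intervals for the coefficients
```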
Predictions can then be made using the test dataset.
```r
# Make predictions
predictions <- predict(model, newdata = test_data)
```
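To judge how good the predictions are, they can be compared with the observed values in the test set. The sketch below recreates the split and model so it runs on its own, then computes two common error measures. RMSE and MAE are chosen here as examples; other metrics may suit a given problem better.

```r
# Recreate the split and model from above so the snippet runs on its own
data <- mtcars
set.seed(123)
train_index <- sample(seq_len(nrow(data)), floor(0.7 * nrow(data)))
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
model <- lm(mpg ~ wt + hp, data = train_data)
predictions <- predict(model, newdata = test_data)

# Compare predictions with the observed mpg values in the test set
errors <- test_data$mpg - predictions
rmse <- sqrt(mean(errors^2))  # root mean squared error, penalises large misses more
mae <- mean(abs(errors))      # mean absolute error, in the same units as mpg

c(RMSE = rmse, MAE = mae)
```

Both values are in miles per gallon, so lower is better. With a test set this small, the estimates will vary noticeably from one random split to another.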
Machine learning in R provides powerful tools for building predictive models and discovering patterns in data. It is widely used in research, business analytics, and scientific studies to support data-driven decision-making.
