R Programming for Data Analysis and Pharmaceutical Research Tutorial

Factors and Categorical Data


In R, categorical data represents values that belong to a specific group or category rather than numeric measurements. Examples of categorical data include gender, colors, education levels, product types, or survey responses like “Yes” and “No.” To handle such data efficiently, R uses a special data type called a factor.

A factor is used to store categorical variables. Instead of treating categories as simple text, factors store them as levels. These levels represent the different possible categories within the data. For example, if you have a variable for size with values like “Small,” “Medium,” and “Large,” R stores these as factor levels.

Factors are created using the `factor()` function. For example, if you write `size <- factor(c("Small", "Medium", "Large", "Small"))`, R creates a factor variable whose levels are “Large,” “Medium,” and “Small.” By default the levels are sorted alphabetically, not in the order they first appear. Internally, R stores each value as an integer code that points to one of these levels, but it displays the category names for better readability.
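As a minimal sketch of the point above, the following uses the same `size` vector from the text and inspects both the levels and the underlying integer codes. Only base R functions are used.

```r
# Create a factor from a character vector (values taken from the text above)
size <- factor(c("Small", "Medium", "Large", "Small"))

# Levels are sorted alphabetically by default
levels(size)       # "Large" "Medium" "Small"

# Each element is stored as an integer index into levels(size)
as.integer(size)   # 3 2 1 3

# Number of distinct levels
nlevels(size)      # 3
```

Because "Small" is the third level alphabetically, it is stored as code 3, which is why the codes should not be read as a ranking unless the levels were set deliberately.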

Factors can also be ordered when the categories have a meaningful sequence. For example, education levels such as “High School,” “Bachelor,” “Master,” and “PhD” have a natural order. In such cases, you can create an ordered factor by listing the levels in rank order and setting `ordered = TRUE`, so that comparisons such as `<` and functions such as `max()` respect the ranking. This is useful in statistical analysis and modeling, where ordered and unordered categories are handled differently.
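The education example can be sketched as follows. The sample values in the vector are invented for illustration; the level names come from the text above.

```r
# Illustrative values only; the levels are listed from lowest to highest rank
education <- factor(
  c("Bachelor", "PhD", "High School", "Master", "Bachelor"),
  levels  = c("High School", "Bachelor", "Master", "PhD"),
  ordered = TRUE
)

is.ordered(education)          # TRUE

# Comparisons follow the level order, not the alphabet
education[1] < education[2]    # TRUE: Bachelor ranks below PhD

# The highest category present in the data
max(education)

# Counts are reported in level order
table(education)
```

Without the explicit `levels` argument, R would rank the categories alphabetically, which would place “PhD” below nothing but also put “Bachelor” below “High School”.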

Below is a table showing common operations with factors:

Operation	Description	Example
Create Factor	Convert data into a factor	`gender <- factor(c("Male","Female","Male"))`
Check Levels	View all categories	`levels(gender)`
Count Categories	Count frequency of each level	`table(gender)`
Ordered Factor	Create factor with an explicit order	`factor(size, levels=c("Small","Medium","Large"), ordered=TRUE)`
Change Levels	Rename categories (in the order shown by `levels(gender)`, here "Female" then "Male")	`levels(gender) <- c("F","M")`
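The operations in the table can be run in sequence as a short sketch. One detail worth noting is that renaming levels is positional: the new names are matched to the existing order of `levels(gender)`, which is alphabetical by default.

```r
# Create the factor
gender <- factor(c("Male", "Female", "Male"))

# View the categories (alphabetical by default)
levels(gender)         # "Female" "Male"

# Count how often each level occurs
table(gender)

# Rename levels: the first new name replaces "Female", the second "Male"
levels(gender) <- c("F", "M")
as.character(gender)   # "M" "F" "M"
```

Supplying the new names in the wrong order would silently swap the labels, so it helps to print `levels()` before reassigning them.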

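Factors are widely used in statistical modeling, data analysis, and visualization. Many statistical functions in R treat factors differently from numeric or character data because they represent categories: `summary()` reports counts per level instead of a mean, and modeling functions such as `lm()` expand a factor into indicator (dummy) variables rather than fitting a single slope. Storing categorical variables as factors therefore keeps R from mistaking category codes for measurements.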
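A small sketch of this difference, using a hypothetical variable `group` whose values are invented for illustration. It compares the design matrix that R builds when the same codes are treated as numbers versus as a factor, assuming R's default treatment contrasts.

```r
# Hypothetical category codes, invented for illustration only
group <- c(1, 2, 3, 1, 2, 3)

# Treated as a number: one column for the intercept, one for a single slope
as_number <- model.matrix(~ group)
ncol(as_number)    # 2

# Treated as a factor: intercept plus one indicator column per non-baseline level
as_factor <- model.matrix(~ factor(group))
ncol(as_factor)    # 3

# summary() also changes: counts per level instead of min/mean/max
summary(factor(group))
```

The numeric version assumes that moving from category 1 to 2 has the same effect as moving from 2 to 3, which is rarely what is intended for labels.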

Understanding factors and categorical data is important because many real-world datasets contain categories such as gender, region, product type, or customer segment. Proper use of factors ensures correct analysis and meaningful interpretations in R.
