# Categorical

## Definition of Categorical

A categorical variable is a variable that takes one of a limited set of possible values, called categories. The values can be text labels or numbers; when they are numbers, they act as labels for the categories rather than as measured quantities. Categorical variables are often used in data science to analyze the relationships between different factors.

## What is Categorical used for?

Categorical data is used throughout machine learning and data science and refers to values that fall into specific categories. The categories can be inferred from the data itself or defined manually by the user. Categorical data is either ordinal or nominal. Ordinal data has a meaningful order, such as a rating scale from 1 to 10, whereas nominal data has no inherent order, such as gender or zip code.
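As a minimal sketch of the nominal/ordinal distinction, the example below uses the `pandas` library's `Categorical` type. The choice of pandas and the sample values (`colors`, `ratings`) are assumptions made for illustration; other libraries represent categories differently.

```python
import pandas as pd

# Nominal: categories are inferred from the data and have no order.
colors = pd.Categorical(["red", "green", "red", "blue"])
inferred_categories = colors.categories.tolist()

# Ordinal: categories are defined manually, and their order is meaningful.
ratings = pd.Categorical(
    ["low", "high", "medium", "low"],
    categories=["low", "medium", "high"],
    ordered=True,
)

# Each value is stored as an integer code pointing into the category list.
rating_codes = ratings.codes.tolist()

# Order-based operations such as max() are only defined for ordered categoricals.
highest_rating = ratings.max()
```

Here the nominal categories come from the data itself, while the ordinal categories are listed explicitly so that `low < medium < high` holds.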

Categorical data is used in many applications of machine learning and data science because it can capture relationships that a single continuous value may obscure. For example, when predicting a customer’s risk profile, it can be useful to include their age group and income level, which are both categorical variables, rather than relying only on the raw numerical values of “age” or “income”. Similarly, in natural language processing, words can be categorized into classes by their part of speech (e.g., verb, noun).

Categorical variables can also make trends in a dataset easier to interpret. For example, grouping people by country and comparing the average age in each country may give more insight than looking at the population figures for each country on their own.
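The age-group idea can be sketched as follows, again assuming `pandas`. The customer table, the `spend` column, and the bin edges are hypothetical values invented for this example, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical customer data, invented for illustration only.
customers = pd.DataFrame({
    "age": [23, 37, 45, 61, 29, 52],
    "spend": [120.0, 200.0, 180.0, 90.0, 150.0, 210.0],
})

# Convert the continuous "age" column into an ordinal "age_group" column.
# Bins are right-inclusive: (18, 30], (30, 50], (50, 70].
customers["age_group"] = pd.cut(
    customers["age"],
    bins=[18, 30, 50, 70],
    labels=["18-30", "31-50", "51-70"],
)

# Summarize a numeric column per category.
mean_spend = customers.groupby("age_group", observed=True)["spend"].mean()
mean_spend_by_group = mean_spend.to_dict()
```

Grouping by the categorical column yields one summary value per age group, which is often easier to compare than the individual ages. Whether binning helps a predictive model depends on the data, since it also discards information within each bin.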