{"id":25590,"date":"2021-03-08T07:14:00","date_gmt":"2021-03-08T01:44:00","guid":{"rendered":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/"},"modified":"2025-12-01T07:28:19","modified_gmt":"2025-12-01T01:58:19","slug":"understanding-categorical-data","status":"publish","type":"post","link":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/","title":{"rendered":"Understanding Categorical Data"},"content":{"rendered":"\n<p>You are likely here because your dataset is full of text labels (like cities, colors, or product names), and your <a href=\"https:\/\/www.mygreatlearning.com\/blog\/machine-learning-models\/\">machine learning model<\/a> is throwing errors because it only understands numbers.<\/p>\n\n\n\n<p>Before we fix your model errors, you need to understand what you are actually feeding it. In the world of analytics, <strong>data is simply recorded reality.<\/strong><\/p>\n\n\n\n<p>However, that reality is recorded in two very different languages:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Quantitative Data (Numbers):<\/strong> This is the language machines speak natively. \"Temperature is 25 \u00b0C\" or \"Salary is $50,000.\"<\/li>\n\n\n\n<li><strong>Qualitative Data (Categories):<\/strong> This is the language humans speak. \"The car is Red,\" \"The city is London,\" or \"The customer is Satisfied.\"<\/li>\n<\/ol>\n\n\n\n<p><strong>Here is the conflict:<\/strong> You cannot feed a word into a mathematical equation. You cannot calculate 'Red \u00d7 5'. This is where <strong>Categorical Data<\/strong> becomes one of the most critical, and frustrating, parts of the data science pipeline. If you don't translate it correctly, your analysis fails.<\/p>\n\n\n\n<p>You can read more about <a href=\"https:\/\/www.mygreatlearning.com\/blog\/types-of-data\/\">Types of Data<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-categorical-data-the-basics\">What is Categorical Data? 
(The Basics)<\/h2>\n\n\n\n<p>Categorical data represents groups or qualities. Unlike numerical data (height, weight, salary), you cannot do math on it directly. You cannot subtract \"New York\" from \"London.\"<\/p>\n\n\n\n<p>There are only two types you need to care about. Knowing the difference changes how you process them.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full zoomable\" data-full=\"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/nominal-ordinal-data.png\"><img decoding=\"async\" width=\"699\" height=\"480\" src=\"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/nominal-ordinal-data.png\" alt=\"Image of nominal vs ordinal data examples\" class=\"wp-image-113790\" srcset=\"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/nominal-ordinal-data.png 699w, https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/nominal-ordinal-data-300x206.png 300w, https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/nominal-ordinal-data-150x103.png 150w\" sizes=\"(max-width: 699px) 100vw, 699px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"a-nominal-data-no-order\">A. Nominal Data (No Order)<\/h4>\n\n\n\n<p>These are variables with no natural ranking.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Examples:<\/strong>\u00a0Colors (Red, Blue, Green), Cities (Paris, Tokyo, Delhi), Payment Method (Card, Cash).<\/li>\n\n\n\n<li><strong>The Trap:<\/strong>\u00a0If you assign numbers to these (say Red = 1, Blue = 2, Green = 3), the machine treats 3 as \"greater\" than 1, as if \"Green\" were three times \"Red.\"\u00a0<strong>This can badly hurt model accuracy, especially for linear and distance-based models.<\/strong><\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"b-ordinal-data-ordered\">B. 
Ordinal Data (Ordered)<\/h4>\n\n\n\n<p>These have a clear, logical rank.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Examples:<\/strong>\u00a0T-Shirt Size (S, M, L, XL), Education Level (High School, Bachelor, PhD), Satisfaction (Low, Medium, High).<\/li>\n\n\n\n<li><strong>The Strategy:<\/strong>\u00a0The order matters here. You\u00a0<em>want<\/em>\u00a0the model to know that PhD > High School.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-to-translate-it-for-machines-encoding\">How to \"Translate\" It for Machines (Encoding)<\/h2>\n\n\n\n<p>We call the translation process\u00a0<strong><a href=\"https:\/\/www.mygreatlearning.com\/blog\/label-encoding-in-python\/\">Encoding<\/a><\/strong>. Here are the three industry-standard methods we use in real-world pipelines.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"method-1-one-hot-encoding-ohe\">Method 1: One-Hot Encoding (OHE)<\/h4>\n\n\n\n<p>This is the most common method for&nbsp;<strong>Nominal Data<\/strong>. 
You create a new column for every unique category and fill it with 0s and 1s.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>How it works:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Original:\u00a0<code>[\"Red\", \"Blue\"]<\/code><\/li>\n\n\n\n<li>New Column \"Is_Red\":\u00a0<code>[1, 0]<\/code><\/li>\n\n\n\n<li>New Column \"Is_Blue\":\u00a0<code>[0, 1]<\/code><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>When to use:<\/strong>\u00a0When you have a small number of categories (e.g., Gender, Payment Type).<\/li>\n\n\n\n<li><strong>The Pain Point:<\/strong>\u00a0If you have a column like \"Zip Code\" with 5,000 unique codes, OHE will create 5,000 new columns. The result is a massive, sparse dataset that is slow to train on, a problem known as the \"<a href=\"https:\/\/www.mygreatlearning.com\/blog\/understanding-curse-of-dimensionality\/\">Curse of Dimensionality<\/a>\".<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"method-2-label-ordinal-encoding\">Method 2: Label \/ Ordinal Encoding<\/h4>\n\n\n\n<p>This assigns an integer to each category.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>How it works:<\/strong>\u00a0Small = 1, Medium = 2, Large = 3.<\/li>\n\n\n\n<li><strong>When to use:<\/strong>\u00a0Strictly for\u00a0<strong>Ordinal Data<\/strong>\u00a0where rank matters.<\/li>\n\n\n\n<li><strong>Warning:<\/strong>\u00a0Do not use this for Nominal data (like cities), or your model will find patterns that don't exist.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"method-3-target-encoding-the-experts-choice\">Method 3: Target Encoding (The Expert's Choice)<\/h4>\n\n\n\n<p>This is what we use when One-Hot Encoding creates too many columns.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>How it works:<\/strong>\u00a0You replace the category name with the\u00a0<em>average value of the target<\/em>\u00a0(what you are predicting) for that category.\n<ul class=\"wp-block-list\">\n<li><em>Example:<\/em>\u00a0If you are predicting House Price, and \"Neighborhood A\" has an 
average house price of $500k, you replace \"Neighborhood A\" with the number 500,000.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Benefit:<\/strong>\u00a0It keeps your dataset small: one numeric column, however many categories there are.<\/li>\n\n\n\n<li><strong>Risk:<\/strong>\u00a0It can lead to \"overfitting.\" Because each encoding is computed from the target itself, the model can memorize the training data (target leakage). Common mitigations are computing the encoding on training folds only, adding slight random noise, or \"smoothing\" each category's average toward the global average.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-hidden-problems-no-one-tells-you-about\">The \"Hidden\" Problems No One Tells You About<\/h2>\n\n\n\n<p>Tutorials usually stop at One-Hot Encoding. In the real world, that is where the problems start.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"the-high-cardinality-nightmare\">The \"High Cardinality\" Nightmare<\/h4>\n\n\n\n<p>\"Cardinality\" just means the number of unique categories.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Scenario:<\/strong>\u00a0You have a \"User_ID\" or \"Product_ID\" column with 1 million unique values.<\/li>\n\n\n\n<li><strong>Solution:<\/strong>\u00a0Do not encode these. Usually, unique IDs are noise and should be dropped. If the ID contains info (like a prefix\u00a0<code>US-123<\/code>), extract the prefix (<code>US<\/code>) and encode that instead.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"the-rare-label-issue\">The \"Rare Label\" Issue<\/h4>\n\n\n\n<p>Some categories appear only once or twice (e.g., a specific typo like \"Calfornia\" instead of \"California\").<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Fix:<\/strong>\u00a0Correct obvious typos first. Then, before encoding, group any category that appears less than 1-2% of the time into a new category called \"Other.\" This helps stabilize your model.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"the-new-data-crash\">The \"New Data\" Crash<\/h4>\n\n\n\n<p>You train your model on data containing cities \"A\" and \"B\". 
In production, a user enters city \"C\".<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The Crash:<\/strong>\u00a0Your code will fail because it doesn't know how to encode \"C\".<\/li>\n\n\n\n<li><strong>The Fix:<\/strong>\u00a0Set <code>handle_unknown='ignore'<\/code> on Scikit-Learn's <code>OneHotEncoder<\/code> (an unseen category is then encoded as all zeros), or explicitly map unseen categories to \"Other.\"<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"comparison-table-which-to-choose\">Comparison Table: Which to Choose?<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Data Type<\/th><th>Cardinality (Count)<\/th><th>Recommended Method<\/th><th>Python Tool<\/th><\/tr><\/thead><tbody><tr><td><strong>Nominal<\/strong><\/td><td>Low (&lt;10)<\/td><td>One-Hot Encoding<\/td><td><code>pd.get_dummies<\/code>&nbsp;or&nbsp;<code>OneHotEncoder<\/code><\/td><\/tr><tr><td><strong>Nominal<\/strong><\/td><td>High (&gt;100)<\/td><td>Target Encoding \/ Frequency Encoding<\/td><td><code>category_encoders<\/code>&nbsp;library<\/td><\/tr><tr><td><strong>Ordinal<\/strong><\/td><td>Any<\/td><td>Ordinal Encoding<\/td><td><code>OrdinalEncoder<\/code><\/td><\/tr><tr><td><strong>Tree Models<\/strong><\/td><td>High<\/td><td>Native Support<\/td><td><code>CatBoost<\/code>&nbsp;\/&nbsp;<code>LightGBM<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"summary-for-your-workflow\">Summary for Your Workflow<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Identify:<\/strong>\u00a0Is it Nominal (Names) or Ordinal (Ranks)?<\/li>\n\n\n\n<li><strong>Clean:<\/strong>\u00a0Group rare categories into \"Other.\"<\/li>\n\n\n\n<li><strong>Choose:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Use\u00a0<strong>One-Hot<\/strong>\u00a0for nominal data with few options.<\/li>\n\n\n\n<li>Use\u00a0<strong>Ordinal Encoding<\/strong>\u00a0for ranked data.<\/li>\n\n\n\n<li>Use\u00a0<strong>Target Encoding<\/strong>\u00a0if you have hundreds of 
categories.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Automate:<\/strong>\u00a0Use pipelines (like Scikit-Learn\u00a0<code>ColumnTransformer<\/code>) so your production code handles new data automatically.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"faqs\">FAQs<\/h2>\n\n\n\n<p>These are the edge cases that tutorials rarely cover, but we encounter frequently in production environments.<\/p>\n\n\n\n<p><strong>Q1: Can data look like a number but act like a category?<\/strong><\/p>\n\n\n\n<p><strong>A:<\/strong> Yes, and this is a common trap. <strong>Zip Codes<\/strong> (e.g., 90210) and <strong>Phone Area Codes<\/strong> look like numbers, but they are Nominal Categories. You cannot add two zip codes to get a \"better\" zip code. Always check if the math makes sense; if adding the numbers is meaningless, treat it as categorical.<\/p>\n\n\n\n<p><strong>Q2: How do I handle \"Missing Values\" in categorical data?<\/strong><\/p>\n\n\n\n<p><strong>A:<\/strong> You cannot fill missing text with an \"average\" (mean) like you do with numbers. You have two options:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Imputation:<\/strong> Fill it with the <strong>Mode<\/strong> (the most frequent category).<\/li>\n\n\n\n<li><strong>Explicit Labeling:<\/strong> Fill it with a new category called \"Unknown.\" In many real-world scenarios, the fact that data is missing is actually a signal in itself (e.g., a user hiding their income bracket).<\/li>\n<\/ol>\n\n\n\n<p><strong>Q3: What if I am using Deep Learning (<a href=\"https:\/\/www.mygreatlearning.com\/blog\/types-of-neural-networks\/\">Neural Networks<\/a>)?<\/strong><\/p>\n\n\n\n<p><strong>A:<\/strong> Deep Learning uses a specialized technique called <strong>Entity Embeddings<\/strong>. Instead of simple 0s and 1s, the network learns a multi-dimensional vector representation for each category. 
It can learn that \"Paris\" and \"London\" are similar (both European capitals) while \"Tokyo\" is different, purely based on the data context. With enough data, this can be more expressive than one-hot or ordinal encoding.<\/p>\n\n\n\n<p><strong>Q4: Does the specific algorithm I choose change how I handle categories?<\/strong><\/p>\n\n\n\n<p><strong>A:<\/strong> Absolutely.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><a href=\"https:\/\/www.mygreatlearning.com\/blog\/linear-regression-in-machine-learning\/\">Linear Regression<\/a> \/ Logistic Regression:<\/strong> Strict requirements. You <em>must<\/em> encode to numbers (usually One-Hot) and avoid the \"Dummy Variable Trap\" (multicollinearity), typically by dropping one dummy column per feature.<\/li>\n\n\n\n<li><strong>Tree-Based Models (<a href=\"https:\/\/www.mygreatlearning.com\/blog\/random-forest-algorithm\/\">Random Forest<\/a>, <a href=\"https:\/\/www.mygreatlearning.com\/blog\/xgboost-algorithm\/\">XGBoost<\/a>):<\/strong> More forgiving. Splits depend only on the ordering of values, not their magnitude, so integer codes are often workable even when one-hot encoding would be too wide.<\/li>\n\n\n\n<li><strong>CatBoost \/ LightGBM:<\/strong> These gradient-boosting libraries can handle categorical columns natively once you mark them as categorical, without manual one-hot encoding. CatBoost accepts string categories directly through its <code>cat_features<\/code> parameter; LightGBM expects integer codes or a pandas <code>category<\/code> dtype. They can outperform manual methods because they calculate statistics on the categories during training.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to encode categorical data for machine learning correctly. 
This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.<\/p>\n","protected":false},"author":41,"featured_media":113791,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[9],"tags":[],"content_type":[],"class_list":["post-25590","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Understanding Categorical Data<\/title>\n<meta name=\"description\" content=\"Learn how to encode categorical data for machine learning correctly. 
This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Understanding Categorical Data\" \/>\n<meta property=\"og:description\" content=\"Learn how to encode categorical data for machine learning correctly. This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/\" \/>\n<meta property=\"og:site_name\" content=\"Great Learning Blog: Free Resources what Matters to shape your Career!\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/GreatLearningOfficial\/\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-08T01:44:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-01T01:58:19+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1408\" \/>\n\t<meta property=\"og:image:height\" content=\"768\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Great Learning Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/Great_Learning\" \/>\n<meta name=\"twitter:site\" content=\"@Great_Learning\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Great Learning Editorial Team\" \/>\n\t<meta 
name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/\"},\"author\":{\"name\":\"Great Learning Editorial Team\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/person\\\/6f993d1be4c584a335951e836f2656ad\"},\"headline\":\"Understanding Categorical Data\",\"datePublished\":\"2021-03-08T01:44:00+00:00\",\"dateModified\":\"2025-12-01T01:58:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/\"},\"wordCount\":1132,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/categorical-data.webp\",\"articleSection\":[\"Data Science and Analytics\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/\",\"name\":\"Understanding Categorical 
Data\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/categorical-data.webp\",\"datePublished\":\"2021-03-08T01:44:00+00:00\",\"dateModified\":\"2025-12-01T01:58:19+00:00\",\"description\":\"Learn how to encode categorical data for machine learning correctly. This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/categorical-data.webp\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/categorical-data.webp\",\"width\":1408,\"height\":768,\"caption\":\"Categorical Data\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/understanding-categorical-data\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog\",\"item\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science and 
Analytics\",\"item\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/data-science\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Understanding Categorical Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\",\"name\":\"Great Learning Blog\",\"description\":\"Learn, Upskill &amp; Career Development Guide and Resources\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\"},\"alternateName\":\"Great Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\",\"name\":\"Great Learning\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/GL-Logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/GL-Logo.jpg\",\"width\":900,\"height\":900,\"caption\":\"Great 
Learning\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/GreatLearningOfficial\\\/\",\"https:\\\/\\\/x.com\\\/Great_Learning\",\"https:\\\/\\\/www.instagram.com\\\/greatlearningofficial\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/school\\\/great-learning\\\/\",\"https:\\\/\\\/in.pinterest.com\\\/greatlearning12\\\/\",\"https:\\\/\\\/www.youtube.com\\\/user\\\/beaconelearning\\\/\"],\"description\":\"Great Learning is a leading global ed-tech company for professional training and higher education. It offers comprehensive, industry-relevant, hands-on learning programs across various business, technology, and interdisciplinary domains driving the digital economy. These programs are developed and offered in collaboration with the world's foremost academic institutions.\",\"email\":\"info@mygreatlearning.com\",\"legalName\":\"Great Learning Education Services Pvt. Ltd\",\"foundingDate\":\"2013-11-29\",\"numberOfEmployees\":{\"@type\":\"QuantitativeValue\",\"minValue\":\"1001\",\"maxValue\":\"5000\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/person\\\/6f993d1be4c584a335951e836f2656ad\",\"name\":\"Great Learning Editorial Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"caption\":\"Great Learning Editorial Team\"},\"description\":\"The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. 
Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.\",\"sameAs\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/\",\"https:\\\/\\\/in.linkedin.com\\\/school\\\/great-learning\\\/\",\"https:\\\/\\\/x.com\\\/https:\\\/\\\/twitter.com\\\/Great_Learning\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCObs0kLIrDjX2LLSybqNaEA\"],\"award\":[\"Best EdTech Company of the Year 2024\",\"Education Economictimes Outstanding Education\\\/Edtech Solution Provider of the Year 2024\",\"Leading E-learning Platform 2024\"],\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/author\\\/greatlearning\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Understanding Categorical Data","description":"Learn how to encode categorical data for machine learning correctly. This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/","og_locale":"en_US","og_type":"article","og_title":"Understanding Categorical Data","og_description":"Learn how to encode categorical data for machine learning correctly. 
This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.","og_url":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/","og_site_name":"Great Learning Blog: Free Resources what Matters to shape your Career!","article_publisher":"https:\/\/www.facebook.com\/GreatLearningOfficial\/","article_published_time":"2021-03-08T01:44:00+00:00","article_modified_time":"2025-12-01T01:58:19+00:00","og_image":[{"width":1408,"height":768,"url":"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp","type":"image\/webp"}],"author":"Great Learning Editorial Team","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/Great_Learning","twitter_site":"@Great_Learning","twitter_misc":{"Written by":"Great Learning Editorial Team","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#article","isPartOf":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/"},"author":{"name":"Great Learning Editorial Team","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/person\/6f993d1be4c584a335951e836f2656ad"},"headline":"Understanding Categorical Data","datePublished":"2021-03-08T01:44:00+00:00","dateModified":"2025-12-01T01:58:19+00:00","mainEntityOfPage":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/"},"wordCount":1132,"commentCount":0,"publisher":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#primaryimage"},"thumbnailUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp","articleSection":["Data Science and 
Analytics"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/","url":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/","name":"Understanding Categorical Data","isPartOf":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#primaryimage"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#primaryimage"},"thumbnailUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp","datePublished":"2021-03-08T01:44:00+00:00","dateModified":"2025-12-01T01:58:19+00:00","description":"Learn how to encode categorical data for machine learning correctly. This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.","breadcrumb":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#primaryimage","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp","width":1408,"height":768,"caption":"Categorical 
Data"},{"@type":"BreadcrumbList","@id":"https:\/\/www.mygreatlearning.com\/blog\/understanding-categorical-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog","item":"https:\/\/www.mygreatlearning.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Data Science and Analytics","item":"https:\/\/www.mygreatlearning.com\/blog\/data-science\/"},{"@type":"ListItem","position":3,"name":"Understanding Categorical Data"}]},{"@type":"WebSite","@id":"https:\/\/www.mygreatlearning.com\/blog\/#website","url":"https:\/\/www.mygreatlearning.com\/blog\/","name":"Great Learning Blog","description":"Learn, Upskill &amp; Career Development Guide and Resources","publisher":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization"},"alternateName":"Great Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.mygreatlearning.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization","name":"Great Learning","url":"https:\/\/www.mygreatlearning.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/06\/GL-Logo.jpg","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/06\/GL-Logo.jpg","width":900,"height":900,"caption":"Great 
Learning"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/GreatLearningOfficial\/","https:\/\/x.com\/Great_Learning","https:\/\/www.instagram.com\/greatlearningofficial\/","https:\/\/www.linkedin.com\/school\/great-learning\/","https:\/\/in.pinterest.com\/greatlearning12\/","https:\/\/www.youtube.com\/user\/beaconelearning\/"],"description":"Great Learning is a leading global ed-tech company for professional training and higher education. It offers comprehensive, industry-relevant, hands-on learning programs across various business, technology, and interdisciplinary domains driving the digital economy. These programs are developed and offered in collaboration with the world's foremost academic institutions.","email":"info@mygreatlearning.com","legalName":"Great Learning Education Services Pvt. Ltd","foundingDate":"2013-11-29","numberOfEmployees":{"@type":"QuantitativeValue","minValue":"1001","maxValue":"5000"}},{"@type":"Person","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/person\/6f993d1be4c584a335951e836f2656ad","name":"Great Learning Editorial Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","caption":"Great Learning Editorial Team"},"description":"The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. 
Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.","sameAs":["https:\/\/www.mygreatlearning.com\/","https:\/\/in.linkedin.com\/school\/great-learning\/","https:\/\/x.com\/https:\/\/twitter.com\/Great_Learning","https:\/\/www.youtube.com\/channel\/UCObs0kLIrDjX2LLSybqNaEA"],"award":["Best EdTech Company of the Year 2024","Education Economictimes Outstanding Education\/Edtech Solution Provider of the Year 2024","Leading E-learning Platform 2024"],"url":"https:\/\/www.mygreatlearning.com\/blog\/author\/greatlearning\/"}]}},"uagb_featured_image_src":{"full":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp",1408,768,false],"thumbnail":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-150x150.webp",150,150,true],"medium":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-300x164.webp",300,164,true],"medium_large":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-768x419.webp",768,419,true],"large":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-1024x559.webp",1024,559,true],"1536x1536":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp",1408,768,false],"2048x2048":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data.webp",1408,768,false],"web-stories-poster-portrait":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-640x768.webp",640,768,true],"web-stories-publisher-logo":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-96x96.webp",96,96,true],"web-stories-thumbnail":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2021\/03\/categorical-data-150x82.webp",150,82,tru
e]},"uagb_author_info":{"display_name":"Great Learning Editorial Team","author_link":"https:\/\/www.mygreatlearning.com\/blog\/author\/greatlearning\/"},"uagb_comment_info":0,"uagb_excerpt":"Learn how to encode categorical data for machine learning correctly. This guide covers nominal vs ordinal, one-hot, target encoding, and real-world encoding issues.","_links":{"self":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/25590","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/comments?post=25590"}],"version-history":[{"count":9,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/25590\/revisions"}],"predecessor-version":[{"id":113792,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/25590\/revisions\/113792"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/media\/113791"}],"wp:attachment":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/media?parent=25590"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/categories?post=25590"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/tags?post=25590"},{"taxonomy":"content_type","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/content_type?post=25590"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}