{"id":22022,"date":"2020-11-11T13:51:28","date_gmt":"2020-11-11T08:21:28","guid":{"rendered":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/"},"modified":"2024-10-24T19:21:07","modified_gmt":"2024-10-24T13:51:07","slug":"simplified-reinforcement-learning-q-learning","status":"publish","type":"post","link":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/","title":{"rendered":"Simplified Reinforcement Learning: Q Learning"},"content":{"rendered":"\n<p>Q Learning, a model-free reinforcement learning algorithm, aims to learn the quality of actions, telling an agent which action to take under which circumstances. In this blog, we will learn more about Q Learning and its learning process with the help of an example.<\/p>\n\n\n\n<p><em><strong>Contributed by: <a href=\"https:\/\/www.linkedin.com\/in\/rahul-purohit-3644a623\/\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" aria-label=\"Rahul Purohit (opens in a new tab)\">Rahul Purohit<\/a> <\/strong><\/em><\/p>\n\n\n\n<p>Richard S. Sutton, in his book \u201cReinforcement Learning: An Introduction\u201d (co-authored with Andrew G. Barto), widely considered the gold standard, gives a very intuitive definition: \u201cReinforcement learning is learning what to do\u2014how to map situations to actions\u2014so as to maximize a numerical reward signal.\u201d The field of reinforcement learning (RL from now on) is not new; it dates back as early as the 1960s (it was earlier referred to as a \u201chedonistic\u201d learning system). However, it was overshadowed by Supervised Learning (SL), which attracted the interest of a much larger group of researchers. Only in the last decade or so have researchers come to realize the untapped potential of RL. 
DeepMind\u2019s AlphaGo and AlphaZero are brilliant examples of the power of RL, and this is just the beginning.<\/p>\n\n\n\n<p>This is also a good place to understand how RL differs from SL. Put simply, SL is learning from a set of examples provided by an external supervisor, where each example describes a phenomenon (the independent variables) together with its associated result (the dependent variable). The objective of SL is to exploit the knowledge gained from the training examples to predict the result for unseen data. This works well for many problems, but it falls short for interactive problems (e.g. games, robotic manoeuvres), where gathering a representative and exhaustive set of examples is not feasible. This is where RL systems come to the rescue: they can learn without examples explicitly provided by an external supervisor. Instead, the agent interacts with the environment itself and figures out a combination of actions that leads to the desired outcome.<\/p>\n\n\n\n<p>Also Read: <a href=\"https:\/\/www.mygreatlearning.com\/blog\/what-is-machine-learning\/\">What is Machine Learning?<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"there-are-two-popular-learning-approaches\"><strong>There are two popular learning approaches<\/strong><\/h2>\n\n\n\n<p><strong>1. Policy-Based<\/strong><\/p>\n\n\n\n<p>In this approach, we optimize a policy, i.e. a function that maps each state to the best action. 
Once we have a well-defined policy, the agent can determine the best action to take by passing the current state as input to the policy.<\/p>\n\n\n\n<p>Policies can be further divided into two types:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Deterministic \u2013 <\/strong>At a given state, the policy returns a single, unique action.<\/li>\n<\/ul>\n\n\n\n<p>Policy \u27a1 a = \u03c0(s)<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Stochastic \u2013<\/strong> Instead of returning a unique action, the policy returns a probability distribution over actions at a given state.<\/li>\n<\/ul>\n\n\n\n<p>Policy \u27a1 \u03c0(a | s) = P(A = a | S = s)<\/p>\n\n\n\n<p><strong>2. Value-Based<\/strong><\/p>\n\n\n\n<p>In value-based RL, the objective is to optimize a value function: a function (which can be thought of as a simple lookup table) that maps each state to the maximum future reward the agent can obtain from it. The value of a state is the total amount of reward an RL agent can expect to receive from that state until it reaches the goal.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"q-learning\"><strong>Q Learning<\/strong><\/h2>\n\n\n\n<p>Q Learning is a value-based learning algorithm: the objective is to optimize a value function suited to a given problem\/environment. The \u2018Q\u2019 stands for quality. The Q value of a state-action pair measures how good it is to take that action in that state, which helps the agent find the next action leading to the highest-quality outcome. This approach is simple and intuitive, which makes it a very good place to start the RL journey. The values are stored in a table called a Q Table.<\/p>\n\n\n\n<p>Let us devise a simple 2D game environment of size 4 x 4 and see how Q Learning can be used to arrive at the best solution.<\/p>\n\n\n\n<p><strong>Goal:<\/strong> Guide the kid to the park.<\/p>\n\n\n\n<p><strong>Reward System:<\/strong> <br>A. Get candy = +10 points<br>B. 
Encounter Dog = -50 points<br>C. Reach Park = +50 points<\/p>\n\n\n\n<p><strong>End of an Episode:<\/strong> <br>A. Encounter Dog<br>B. Reach Park<\/p>\n\n\n\n<p>Now let us see how a typical Q Learning agent plays this game. First, we create a Q Table in which we keep track of the values associated with each state and action. The Q Table has one row per state in the problem, i.e. 16 in our case, and one column per action the agent can take, i.e. 4 (Up, Down, Left &amp; Right).<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>STATES \/ ACTIONS<\/strong><\/td><td><strong>UP<\/strong><\/td><td><strong>DOWN<\/strong><\/td><td><strong>LEFT<\/strong><\/td><td><strong>RIGHT<\/strong><\/td><\/tr><tr><td><strong>1 (START)<\/strong><\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td><strong>\u2026<\/strong><\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td><strong>16<\/strong><\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><em>Sample Q-Table for the 4 x 4 2D game environment<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"learning-process\"><strong>Learning Process<\/strong><\/h2>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"step-1-initialization\"><strong>Step 1: Initialization<\/strong><\/h4>\n\n\n\n<p>When the agent plays the game for the first time, it has no prior knowledge, so we initialize every entry of the table with zeros.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"step-2-exploitation-or-exploration\"><strong>Step 2: Exploitation or Exploration<\/strong><\/h4>\n\n\n\n<p>The agent can now interact with the environment in two ways: it can use the information already gained in the Q Table (exploit), or it can venture into uncharted territory (explore). Exploitation becomes very useful once the agent has played many episodes and gathered information about the environment, whereas exploration is important while the agent is na\u00efve and does not have much experience. This trade-off is commonly handled with an epsilon-greedy strategy: with probability epsilon (\u03b5), the agent explores by taking a random action; otherwise, it exploits by taking the action with the highest Q value for its current state. 
Ideally, we give more preference to exploration in the initial stages (a high \u03b5) and to exploitation in the later stages, which is typically achieved by gradually decaying \u03b5 as the agent plays more episodes.<\/p>\n\n\n\n<p>In Step 2, the agent takes an action (exploit or explore).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"step-3-measure-reward\"><strong>Step 3: Measure Reward<\/strong><\/h4>\n\n\n\n<p>After the agent performs the action chosen in Step 2, it reaches the next state, say <strong>s\u2019<\/strong>. At state <strong>s\u2019<\/strong>, the four actions can again be performed, each leading to a different reward score.<\/p>\n\n\n\n<p>For example, suppose the kid moves from state 1 to state 5; from there, either state 6 or state 9 can be selected. To find the reward value for state 5, we find the reward values of all the possible next states, i.e. 6 &amp; 9, and select the maximum.<\/p>\n\n\n\n<p>At 5, there are two options (for simplicity, retracing steps is not allowed):<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to 9: end of episode (the kid encounters the dog).<\/li>\n\n\n\n<li>Go to 6: at state 6, there are again three options:\n<ol class=\"wp-block-list\">\n<li>Go to 7: end of episode.<\/li>\n\n\n\n<li>Go to 2: repeat this step until the episode ends, and find out the reward.<\/li>\n\n\n\n<li>Go to 10: repeat this step until the episode ends, and find out the reward.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"sample-calculation\"><strong>Sample Calculation<\/strong><\/h4>\n\n\n\n<p>Consider two paths from state 6 onwards:<\/p>\n\n\n\n<p>Path A reward = 10 + 50 = 60<\/p>\n\n\n\n<p>Path B reward = 50<\/p>\n\n\n\n<p>Max reward from state 6 onwards = 60 (Path A)<\/p>\n\n\n\n<p>Total rewards available at state 5: -50 (facing the dog at 9), or 10 + 60 (10 plus the max reward from state 6 onwards)<\/p>\n\n\n\n<p>Value of reward at 5 = max(-50, 10 + 60) = 70<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"step-4-update-the-q-table\"><strong>Step 4: Update the Q Table<\/strong><\/h4>\n\n\n\n<p>The reward value calculated in 
step 3 is then used to update the Q value of state 5 (for the action taken) using the Q Learning update rule, which is based on the Bellman equation:<\/p>\n\n\n\n<p>New Q(s, a) = Q(s, a) + \u03b1 [R(s, a) + \u03b3 max<sub>a\u2019<\/sub> Q(s\u2019, a\u2019) \u2212 Q(s, a)]<\/p>\n\n\n\n<p>Here, \u03b1 is the learning rate: a constant (between 0 and 1) that determines how much weight to give to the new value versus the old value.<\/p>\n\n\n\n<p>\u03b3 is the discount rate: a constant (typically 0.8 to 0.99) that discounts the effect of future rewards, i.e. balances the effect of future rewards in the new values. R(s, a) is the reward received for taking action a in state s, and s\u2019 is the resulting next state.<\/p>\n\n\n\n<p>The agent iterates over these steps for many episodes, gradually refining the values in the Q Table. Using the final Q Table is then as simple as using a map: in each state, select the action with the maximum Q value.<\/p>\n\n\n\n<p>If you found this helpful and wish to learn more such concepts, you can enrol with <a href=\"https:\/\/www.mygreatlearning.com\/academy\" target=\"_blank\" rel=\"noreferrer noopener\">Great Learning Academy's free online courses<\/a> today.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Q Learning, a model-free reinforcement learning algorithm, aims to learn the quality of actions, telling an agent which action to take under which circumstances. In this blog, we will learn more about Q Learning and its learning process with the help of an example. 
Contributed by: Rahul Purohit Richard [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":22028,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[2],"tags":[],"content_type":[],"class_list":["post-22022","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>What is Reinforcement Learning | Everything about Q Learning<\/title>\n<meta name=\"description\" content=\"Reinforcement Learning or Q Learning: A model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" 
\/>\n<link rel=\"canonical\" href=\"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Simplified Reinforcement Learning: Q Learning\" \/>\n<meta property=\"og:description\" content=\"Reinforcement Learning or Q Learning: A model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Great Learning Blog: Free Resources what Matters to shape your Career!\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/GreatLearningOfficial\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-11-11T08:21:28+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-24T13:51:07+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1254\" \/>\n\t<meta property=\"og:image:height\" content=\"836\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Great Learning Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/Great_Learning\" \/>\n<meta name=\"twitter:site\" content=\"@Great_Learning\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Great Learning Editorial Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/\"},\"author\":{\"name\":\"Great Learning Editorial Team\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/person\\\/6f993d1be4c584a335951e836f2656ad\"},\"headline\":\"Simplified Reinforcement Learning: Q Learning\",\"datePublished\":\"2020-11-11T08:21:28+00:00\",\"dateModified\":\"2024-10-24T13:51:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/\"},\"wordCount\":1269,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/iStock-1193832036-1.jpg\",\"articleSection\":[\"AI and Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/\",\"name\":\"What is Reinforcement Learning | Everything about Q 
Learning\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/iStock-1193832036-1.jpg\",\"datePublished\":\"2020-11-11T08:21:28+00:00\",\"dateModified\":\"2024-10-24T13:51:07+00:00\",\"description\":\"Reinforcement Learning or Q Learning: A model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/iStock-1193832036-1.jpg\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/iStock-1193832036-1.jpg\",\"width\":1254,\"height\":836,\"caption\":\"q learning\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/simplified-reinforcement-learning-q-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog\",\"item\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI and Machine 
Learning\",\"item\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/artificial-intelligence\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Simplified Reinforcement Learning: Q Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\",\"name\":\"Great Learning Blog\",\"description\":\"Learn, Upskill &amp; Career Development Guide and Resources\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\"},\"alternateName\":\"Great Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#organization\",\"name\":\"Great Learning\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/GL-Logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/GL-Logo.jpg\",\"width\":900,\"height\":900,\"caption\":\"Great 
Learning\"},\"image\":{\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/GreatLearningOfficial\\\/\",\"https:\\\/\\\/x.com\\\/Great_Learning\",\"https:\\\/\\\/www.instagram.com\\\/greatlearningofficial\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/school\\\/great-learning\\\/\",\"https:\\\/\\\/in.pinterest.com\\\/greatlearning12\\\/\",\"https:\\\/\\\/www.youtube.com\\\/user\\\/beaconelearning\\\/\"],\"description\":\"Great Learning is a leading global ed-tech company for professional training and higher education. It offers comprehensive, industry-relevant, hands-on learning programs across various business, technology, and interdisciplinary domains driving the digital economy. These programs are developed and offered in collaboration with the world's foremost academic institutions.\",\"email\":\"info@mygreatlearning.com\",\"legalName\":\"Great Learning Education Services Pvt. Ltd\",\"foundingDate\":\"2013-11-29\",\"numberOfEmployees\":{\"@type\":\"QuantitativeValue\",\"minValue\":\"1001\",\"maxValue\":\"5000\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/#\\\/schema\\\/person\\\/6f993d1be4c584a335951e836f2656ad\",\"name\":\"Great Learning Editorial Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"contentUrl\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/unnamed.webp\",\"caption\":\"Great Learning Editorial Team\"},\"description\":\"The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. 
Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.\",\"sameAs\":[\"https:\\\/\\\/www.mygreatlearning.com\\\/\",\"https:\\\/\\\/in.linkedin.com\\\/school\\\/great-learning\\\/\",\"https:\\\/\\\/x.com\\\/https:\\\/\\\/twitter.com\\\/Great_Learning\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCObs0kLIrDjX2LLSybqNaEA\"],\"award\":[\"Best EdTech Company of the Year 2024\",\"Education Economictimes Outstanding Education\\\/Edtech Solution Provider of the Year 2024\",\"Leading E-learning Platform 2024\"],\"url\":\"https:\\\/\\\/www.mygreatlearning.com\\\/blog\\\/author\\\/greatlearning\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"What is Reinforcement Learning | Everything about Q Learning","description":"Reinforcement Learning or Q Learning: A model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/","og_locale":"en_US","og_type":"article","og_title":"Simplified Reinforcement Learning: Q Learning","og_description":"Reinforcement Learning or Q Learning: A model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken.","og_url":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/","og_site_name":"Great Learning Blog: Free Resources what Matters to shape your 
Career!","article_publisher":"https:\/\/www.facebook.com\/GreatLearningOfficial\/","article_published_time":"2020-11-11T08:21:28+00:00","article_modified_time":"2024-10-24T13:51:07+00:00","og_image":[{"width":1254,"height":836,"url":"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg","type":"image\/jpeg"}],"author":"Great Learning Editorial Team","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/Great_Learning","twitter_site":"@Great_Learning","twitter_misc":{"Written by":"Great Learning Editorial Team","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#article","isPartOf":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/"},"author":{"name":"Great Learning Editorial Team","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/person\/6f993d1be4c584a335951e836f2656ad"},"headline":"Simplified Reinforcement Learning: Q Learning","datePublished":"2020-11-11T08:21:28+00:00","dateModified":"2024-10-24T13:51:07+00:00","mainEntityOfPage":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/"},"wordCount":1269,"commentCount":0,"publisher":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg","articleSection":["AI and Machine 
Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/","url":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/","name":"What is Reinforcement Learning | Everything about Q Learning","isPartOf":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg","datePublished":"2020-11-11T08:21:28+00:00","dateModified":"2024-10-24T13:51:07+00:00","description":"Reinforcement Learning or Q Learning: A model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken.","breadcrumb":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#primaryimage","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg","width":1254,"height":836,"caption":"q 
learning"},{"@type":"BreadcrumbList","@id":"https:\/\/www.mygreatlearning.com\/blog\/simplified-reinforcement-learning-q-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog","item":"https:\/\/www.mygreatlearning.com\/blog\/"},{"@type":"ListItem","position":2,"name":"AI and Machine Learning","item":"https:\/\/www.mygreatlearning.com\/blog\/artificial-intelligence\/"},{"@type":"ListItem","position":3,"name":"Simplified Reinforcement Learning: Q Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.mygreatlearning.com\/blog\/#website","url":"https:\/\/www.mygreatlearning.com\/blog\/","name":"Great Learning Blog","description":"Learn, Upskill &amp; Career Development Guide and Resources","publisher":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization"},"alternateName":"Great Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.mygreatlearning.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.mygreatlearning.com\/blog\/#organization","name":"Great Learning","url":"https:\/\/www.mygreatlearning.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/06\/GL-Logo.jpg","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/06\/GL-Logo.jpg","width":900,"height":900,"caption":"Great 
Learning"},"image":{"@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/GreatLearningOfficial\/","https:\/\/x.com\/Great_Learning","https:\/\/www.instagram.com\/greatlearningofficial\/","https:\/\/www.linkedin.com\/school\/great-learning\/","https:\/\/in.pinterest.com\/greatlearning12\/","https:\/\/www.youtube.com\/user\/beaconelearning\/"],"description":"Great Learning is a leading global ed-tech company for professional training and higher education. It offers comprehensive, industry-relevant, hands-on learning programs across various business, technology, and interdisciplinary domains driving the digital economy. These programs are developed and offered in collaboration with the world's foremost academic institutions.","email":"info@mygreatlearning.com","legalName":"Great Learning Education Services Pvt. Ltd","foundingDate":"2013-11-29","numberOfEmployees":{"@type":"QuantitativeValue","minValue":"1001","maxValue":"5000"}},{"@type":"Person","@id":"https:\/\/www.mygreatlearning.com\/blog\/#\/schema\/person\/6f993d1be4c584a335951e836f2656ad","name":"Great Learning Editorial Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","url":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","contentUrl":"https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2022\/02\/unnamed.webp","caption":"Great Learning Editorial Team"},"description":"The Great Learning Editorial Staff includes a dynamic team of subject matter experts, instructors, and education professionals who combine their deep industry knowledge with innovative teaching methods. 
Their mission is to provide learners with the skills and insights needed to excel in their careers, whether through upskilling, reskilling, or transitioning into new fields.","sameAs":["https:\/\/www.mygreatlearning.com\/","https:\/\/in.linkedin.com\/school\/great-learning\/","https:\/\/x.com\/https:\/\/twitter.com\/Great_Learning","https:\/\/www.youtube.com\/channel\/UCObs0kLIrDjX2LLSybqNaEA"],"award":["Best EdTech Company of the Year 2024","Education Economictimes Outstanding Education\/Edtech Solution Provider of the Year 2024","Leading E-learning Platform 2024"],"url":"https:\/\/www.mygreatlearning.com\/blog\/author\/greatlearning\/"}]}},"uagb_featured_image_src":{"full":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg",1254,836,false],"thumbnail":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1-150x150.jpg",150,150,true],"medium":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1-768x512.jpg",768,512,true],"large":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1-1024x683.jpg",1024,683,true],"1536x1536":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg",1254,836,false],"2048x2048":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg",1254,836,false],"web-stories-poster-portrait":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg",640,427,false],"web-stories-publisher-logo":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/10\/iStock-1193832036-1.jpg",150,100,f
alse]},"uagb_author_info":{"display_name":"Great Learning Editorial Team","author_link":"https:\/\/www.mygreatlearning.com\/blog\/author\/greatlearning\/"},"uagb_comment_info":0,"uagb_excerpt":"Q Learning, a model-free reinforcement learning algorithm, aims to learn the quality of actions and telling an agent what action is to be taken under which circumstance. Through the course of this blog, we will learn more about Q Learning, and it's learning process with the help of an example. Contributed by: Rahul Purohit Richard&hellip;","_links":{"self":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/22022","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/comments?post=22022"}],"version-history":[{"count":17,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/22022\/revisions"}],"predecessor-version":[{"id":104973,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/posts\/22022\/revisions\/104973"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/media\/22028"}],"wp:attachment":[{"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/media?parent=22022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/categories?post=22022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/tags?post=22022"},{"taxonomy":"content_type","embeddable":true,"href":"https:\/\/www.mygreatlearning.com\/blog\/wp-json\/wp\/v2\/content_type?post=22022"}],"curies":[{"name":"wp","href":
"https:\/\/api.w.org\/{rel}","templated":true}]}}