{"id":20142,"date":"2020-09-19T16:00:06","date_gmt":"2020-09-19T10:30:06","guid":{"rendered":"https:\/\/www.mygreatlearning.com\/blog\/the-vanishing-gradient-problem\/"},"modified":"2025-04-28T03:17:29","modified_gmt":"2025-04-27T21:47:29","slug":"the-vanishing-gradient-problem","status":"publish","type":"post","link":"https:\/\/www.mygreatlearning.com\/blog\/the-vanishing-gradient-problem\/","title":{"rendered":"The Vanishing Gradient Problem"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"what-is-the-vanishing-gradient\">What is the Vanishing Gradient?<\/h2>\n\n\n\n<p>The vanishing gradient is a phenomenon in which the gradients used to update the weights in <a href=\"https:\/\/www.mygreatlearning.com\/blog\/types-of-neural-networks\/\">neural networks<\/a> become very small, approaching zero. This becomes a problem when dealing with deep networks that have multiple hidden layers.<\/p>\n\n\n\n<p>In <a href=\"https:\/\/www.mygreatlearning.com\/blog\/backpropagation-algorithm\/\">backpropagation<\/a>, we pass the gradients of the loss function backward from the output layer all the way to the input layer, and at each layer the gradients are multiplied by the derivatives of the activation functions.<\/p>\n\n\n\n<p>For deep networks, this can lead to gradients that vanish as they propagate through multiple layers, especially if the activation function has a region with a low derivative. As these gradients tend to zero, the weights in the earlier layers, closer to the input, barely change, and the model cannot learn from the data effectively.<\/p>\n\n\n\n<p>This causes the earlier layers of the model to become stuck, preventing them from learning.
Training slows dramatically or stops altogether, leading to a model with poor performance even when ample data and large architectures are available.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-vanishing-gradient-problem-explained\">The Vanishing Gradient Problem Explained<\/h2>\n\n\n\n<p>The vanishing gradient problem is one of the most significant issues in training deep neural networks. It limits the model's ability to learn, effectively capping the depth of networks that can be trained in practice. The issue is most prevalent in deep networks, i.e. networks with multiple hidden layers.<\/p>\n\n\n\n<p>The primary cause is the choice of activation function, specifically the sigmoid and tanh functions used in early neural networks, whose derivatives become very small over large regions of their input. During backpropagation, these derivatives get multiplied repeatedly, and the values of the gradients shrink and converge toward zero.<\/p>\n\n\n\n<p>As the number of layers in the network increases, the problem becomes more pronounced.
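<\/p>\n\n\n\n<p>To make the repeated multiplication concrete, note that the sigmoid derivative s(x) * (1 - s(x)) never exceeds 0.25, so every sigmoid layer scales a backpropagated gradient by at most a quarter. The toy calculation below (plain Python, illustrative only; real gradients also involve the weight matrices) shows how quickly that best-case factor collapses with depth:<\/p>

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    # derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# The sigmoid derivative peaks at x = 0, where it equals exactly 0.25.
best_case = max(sigmoid_deriv(x / 100.0) for x in range(-500, 501))

# Even at that best case, the per-layer factor of 0.25 compounds:
# after n sigmoid layers the gradient is scaled by at most 0.25**n.
scale_after = {depth: 0.25 ** depth for depth in (1, 5, 10, 20)}
```

After ten sigmoid layers the best-case scale factor is already below one in a million, which is why the earliest layers of a deep sigmoid network receive essentially no learning signal.

<p>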
In deep nets, the gradients backpropagate through multiple layers, and by the time they arrive at the earliest layers, they're often so small that they have little or no effect on the weight updates.<\/p>\n\n\n\n<p>This prevents the model from learning the features represented by the first layers, which is damaging not only in image recognition but also in tasks like <a href=\"https:\/\/www.mygreatlearning.com\/blog\/natural-language-processing-tutorial\/\">natural language processing<\/a>, where the first layers can capture essential concepts in the data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-role-of-gradients-in-neural-networks\">The Role of Gradients in Neural Networks<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Neural Networks<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Consist of multiple layers of neurons that process and pass on data.<\/li>\n\n\n\n<li>The goal is to set the correct weights and biases for accurate predictions or classifications.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Training with Backpropagation<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Backpropagation is used to update weights and biases by propagating errors backwards through the network.<\/li>\n\n\n\n<li><strong>Gradients<\/strong> are crucial in this process, as they help determine how much to adjust the weights based on prediction errors.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Understanding Gradients<\/strong>:\n<ul class=\"wp-block-list\">\n<li>A gradient indicates the direction and magnitude of weight adjustments required to minimize error.<\/li>\n\n\n\n<li>It is calculated as the change in the loss function with respect to the weights.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Iterative Process<\/strong>:\n<ul class=\"wp-block-list\">\n<li>The gradient-based updates are performed iteratively until the model\u2019s weights are optimized for accurate
predictions.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Vanishing Gradient Issue<\/strong>:\n<ul class=\"wp-block-list\">\n<li>If gradients become too small, learning slows down, leading to the vanishing gradient problem, which makes optimization harder.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-impact-of-vanishing-gradients-on-deep-neural-networks\">The Impact of Vanishing Gradients on Deep Neural Networks<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Slow or No Learning<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Vanishing gradients cause weights in earlier layers not to update correctly.<\/li>\n\n\n\n<li>This leads to failure to converge, or extremely slow convergence, preventing the model from learning effectively.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Training Deep Networks<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Deep networks learn hierarchical representations, but vanishing gradients make it difficult for the earlier layers to learn.<\/li>\n\n\n\n<li>As layers are added, gradients get smaller, leading to poor optimization and limited practical use in real-world applications.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Poor Feature Representation<\/strong>:\n<ul class=\"wp-block-list\">\n<li>When gradients shrink, earlier layers, which capture low-level features, cannot update their weights.<\/li>\n\n\n\n<li>This results in the loss of both local and global information, leading to weak feature representation and overall poor performance.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Difficulty with Nonlinear Activation Functions<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Activation functions like sigmoid and tanh can worsen the vanishing gradient problem.<\/li>\n\n\n\n<li>Very large or very small input values push these functions into their saturated regions, squeezing gradients to very small values and exacerbating the issue.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"causes-of-vanishing-gradients\">Causes of Vanishing Gradients<\/h2>\n\n\n\n<ul
class=\"wp-block-list\">\n<li><strong>Choice of Activation Function<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Functions like sigmoid and tanh cause vanishing gradients when their inputs are large in magnitude, i.e. in their saturated regions.<\/li>\n\n\n\n<li>The derivatives of these functions approach zero for large positive or negative input values, making gradients very small and hindering learning, especially in deeper layers.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Deep Neural Network Architectures<\/strong>:\n<ul class=\"wp-block-list\">\n<li>As more layers are added, gradients are multiplied by activation function derivatives at each layer.<\/li>\n\n\n\n<li>This results in exponentially smaller gradients as depth increases, making it harder for the network to learn.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Weight Initialization<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Poor weight initialization can worsen vanishing gradients.<\/li>\n\n\n\n<li>Small initial weights result in small gradients, and improper initialization can make training unstable, thereby slowing it down.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Poorly Chosen Learning Rates<\/strong>:\n<ul class=\"wp-block-list\">\n<li>A high learning rate may cause the network to skip over optimal weight values.<\/li>\n\n\n\n<li>A low learning rate results in extremely small updates, preventing the network from converging in a reasonable time.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"rectified-linear-unit-activation-function-relus\">Rectified Linear Unit Activation Function (ReLU)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"demonstrating-the-vanishing-gradient-problem-in-tensorflow\">Demonstrating the vanishing gradient problem in TensorFlow<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"creating-the-model\"><strong>Creating the model<\/strong><\/h4>\n\n\n\n<p>In the TensorFlow code that follows, we\u2019ll create a seven-layer densely connected network (including the input and output
layers) and use TensorFlow summary operations and TensorBoard visualization to visualize the gradients.<\/p>\n\n\n\n<p>The code utilizes the TensorFlow layers (tf.layers) framework, which enables the quick and easy construction of networks. The data we will be training the network on is the MNIST handwritten digit recognition dataset that comes packaged with the TensorFlow installation. To create the dataset, we can run the following:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nmnist = input_data.read_data_sets(&quot;MNIST_data\/&quot;, one_hot=True)\n<\/pre><\/div>\n\n\n<p>Batches of training data can then be extracted by calling mnist.train.next_batch(batch_size). In this case, we\u2019ll just be looking at the training data, but you can also extract a test dataset from the same object.<\/p>\n\n\n\n<p>In this example, I\u2019ll be using the feed_dict methodology and placeholder variables to feed in the training data, which isn\u2019t the optimal method, but it will do for these purposes.<\/p>\n\n\n\n<p>Set up the data placeholders:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nself.input_images = tf.placeholder(tf.float32, shape=&#x5B;None, self._input_size])\nself.labels = tf.placeholder(tf.float32, shape=&#x5B;None, self._label_size])\n<\/pre><\/div>\n\n\n<p>The MNIST data input size (<em>self._input_size<\/em>) is equal to the 28 x 28 image pixels, i.e. 784 pixels. The number of associated labels, <em>self._label_size,<\/em> is equal to the 10 possible hand-written digit classes in the MNIST dataset.<\/p>\n\n\n\n<p>We\u2019ll be creating a moderately deep, fully connected network \u2013 a network with seven total layers, including input and output layers.
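<\/p>\n\n\n\n<p>As an aside, placeholders and tf.layers are TensorFlow 1.x idioms. For readers on TensorFlow 2.x, a sketch of an equivalent model in Keras looks as follows (this is an assumption-laden aside, not the article\u2019s code; it assumes TF 2.x is installed, and the sizes simply mirror the ones used here):<\/p>

```python
import tensorflow as tf

# Sketch of an equivalent model in TensorFlow 2.x / Keras (hypothetical helper,
# not the article's code). Sizes mirror the article: 784 inputs, six trainable
# layers of 10 nodes each, and a 10-class linear output.
def build_model(num_layers=6, hidden_size=10, activation='sigmoid'):
    layers = [tf.keras.Input(shape=(784,))]
    for i in range(num_layers - 1):
        layers.append(tf.keras.layers.Dense(hidden_size, activation=activation,
                                            name='layer{}'.format(i + 1)))
    # no activation on the final layer - softmax is applied inside the loss
    layers.append(tf.keras.layers.Dense(10, name='layer{}'.format(num_layers)))
    return tf.keras.Sequential(layers)

model = build_model()
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```

As in the TF 1.x code below, the final layer is left linear and the softmax is folded into the loss (here via from_logits=True).

<p>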
To create these densely connected layers easily, we\u2019ll be using TensorFlow\u2019s handy tf.layers API and a simple Python loop, as follows:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n# create self._num_layers dense layers as the model\ninput = self.input_images\nfor i in range(self._num_layers - 1):\n    input = tf.layers.dense(input, self._hidden_size, activation=self._activation,\n                                    name=&#039;layer{}&#039;.format(i+1))\n<\/pre><\/div>\n\n\n<p>First, the generic <em>input <\/em>variable is initialized to be equal to the input images (fed via the placeholder).<\/p>\n\n\n\n<p>Next, the code runs through a loop where multiple dense layers are created, each named \u2018layerX\u2019 where X is the layer number.<\/p>\n\n\n\n<p>The number of nodes in each layer is set equal to the class property <em>self._hidden_size,<\/em> and the <a href=\"https:\/\/www.mygreatlearning.com\/blog\/activation-functions\/\">activation function<\/a> is also supplied via the property <em>self._activation<\/em>.<\/p>\n\n\n\n<p>Next, we create the final output layer (you\u2019ll note that the loop above terminates before it gets to creating the final layer), and we don\u2019t supply an activation to this layer. In the tf.layers API, a linear activation (i.e. <em>f(x) = x<\/em>) is applied by default if no activation argument is supplied.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n# don&#039;t supply an activation for the final layer - the loss definition will\n# supply softmax activation. This defaults to a linear activation i.e.
f(x) = x\nlogits = tf.layers.dense(input, 10, name=&#039;layer{}&#039;.format(self._num_layers))\n<\/pre><\/div>\n\n\n<p>Next, the loss operation is set up and logged:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n# use softmax cross entropy with logits - no need to apply softmax activation to\n# logits\nself.loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,\n                                                                             labels=self.labels))\n# add the loss to the summary\ntf.summary.scalar(&#039;loss&#039;, self.loss)\n<\/pre><\/div>\n\n\n<p>The loss used in this instance is the handy TensorFlow softmax_cross_entropy_with_logits_v2 (the original version is soon to be deprecated). This loss function will apply the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Softmax_function\" target=\"_blank\" rel=\"noreferrer noopener\">softmax<\/a> operation to the unactivated output of the network, then apply the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cross_entropy\" target=\"_blank\" rel=\"noreferrer noopener\">cross entropy loss<\/a> to this outcome.<\/p>\n\n\n\n<p>After this loss operation is created, its output value is added to the tf.summary framework. This framework allows scalar values to be logged and subsequently visualized in the TensorBoard web-based visualization page. 
It can also log histogram information, along with audio and images \u2013 all of these can be observed through the aforementioned TensorBoard visualization.<\/p>\n\n\n\n<p>Next, the program calls a method to log the gradients, which we will visualize to examine the vanishing gradient problem:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nself._log_gradients(self._num_layers)\n<\/pre><\/div>\n\n\n<p>This method looks like the following:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ndef _log_gradients(self, num_layers):\n    gr = tf.get_default_graph()\n    for i in range(num_layers):\n        weight = gr.get_tensor_by_name(&#039;layer{}\/kernel:0&#039;.format(i + 1))\n        grad = tf.gradients(self.loss, weight)&#x5B;0]\n        mean = tf.reduce_mean(tf.abs(grad))\n        tf.summary.scalar(&#039;mean_{}&#039;.format(i + 1), mean)\n        tf.summary.histogram(&#039;histogram_{}&#039;.format(i + 1), grad)\n        tf.summary.histogram(&#039;hist_weights_{}&#039;.format(i + 1), weight)\n<\/pre><\/div>\n\n\n<p>In this method, the TensorFlow computational graph is first extracted so that weight variables can be accessed. Then, a loop is entered to cycle through all the layers. For each layer, the weight tensor is grabbed using the handy function get_tensor_by_name. (Note that the weight histogram is logged from the <em>weight<\/em> tensor; logging <em>grad<\/em> there would just duplicate the gradient histogram.)<\/p>\n\n\n\n<p>You will recall that each layer was named \u201clayerX\u201d where X is the layer number. This is supplied to the function, along with \u201c\/kernel:0\u201d \u2013 this tells the function that we are trying to access the weight variable (also called a kernel) as opposed to the bias value, which would be \u201c\/bias:0\u201d.<\/p>\n\n\n\n<p>On the following line, the tf.gradients() function is used.
This will calculate gradients of the form \u2202y\/\u2202x where the first argument supplied to the function is <em>y<\/em> and the second is <em>x<\/em>. In the gradient descent step, the weight update is made in proportion to \u2202loss\/\u2202W, so in this case, the first argument supplied to tf.gradients() is the loss, and the second is the weight tensor.<\/p>\n\n\n\n<p>Next, the mean absolute value of the gradient is calculated and logged as a scalar in the summary. Then, histograms of the gradients and the weight values are also logged in the summary. The flow now returns to the primary method in the class.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nself.optimizer = tf.train.AdamOptimizer().minimize(self.loss)\nself.accuracy = self._compute_accuracy(logits, self.labels)\ntf.summary.scalar(&#039;acc&#039;, self.accuracy)\n<\/pre><\/div>\n\n\n<p>The code above is reasonably standard TensorFlow usage \u2013 defining an optimizer, in this case the flexible and powerful AdamOptimizer(), and also a generic accuracy operation, the outcome of which is also added to the summary (see the <a href=\"https:\/\/github.com\/adventuresinML\/adventures-in-ml-code\">Github code<\/a> for the accuracy method called).<\/p>\n\n\n\n<p>Finally, a summary merge operation is created, which will gather up all the summary data ready for export to the TensorBoard file whenever it is executed:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nself.merged = tf.summary.merge_all()\n<\/pre><\/div>\n\n\n<p>An initialization operation is also created.
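<\/p>\n\n\n\n<p>As an aside for TensorFlow 2.x users, where tf.gradients, sessions, and graph-mode summaries no longer exist in this form, the same per-layer gradient inspection can be sketched with tf.GradientTape. This is a minimal sketch under the assumption of TF 2.x\/Keras; the model is a simplified stand-in for the one built above, and the random batch merely exercises the tape:<\/p>

```python
import tensorflow as tf

# Stand-in for the article's model: five sigmoid hidden layers + linear output.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(784,))]
    + [tf.keras.layers.Dense(10, activation='sigmoid',
                             name='layer{}'.format(i + 1)) for i in range(5)]
    + [tf.keras.layers.Dense(10, name='layer6')])
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# A random batch in place of MNIST, just to exercise the tape.
images = tf.random.uniform((32, 784))
labels = tf.one_hot(tf.random.uniform((32,), maxval=10, dtype=tf.int32), depth=10)

# Record the forward pass, then pull per-variable gradients from the tape.
with tf.GradientTape() as tape:
    loss = loss_fn(labels, model(images))
grads = tape.gradient(loss, model.trainable_variables)

# Mean absolute gradient per variable - the analogue of the scalar summaries.
mean_abs = {v.name: float(tf.reduce_mean(tf.abs(g)))
            for v, g in zip(model.trainable_variables, grads)}
```

With sigmoid activations, the mean for layer1\u2019s kernel will typically sit orders of magnitude below layer6\u2019s, which is the vanishing gradient made visible.

<p>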
Now all that is left is to run the main training loop.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"training-the-model\"><strong>Training the model<\/strong><\/h4>\n\n\n\n<p>The main training loop of this experimental model is shown in the code below:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ndef run_training(model, mnist, sub_folder, iterations=2500, batch_size=30):\n    with tf.Session() as sess:\n        sess.run(model.init_op)\n        train_writer = tf.summary.FileWriter(base_path + sub_folder,\n                                             sess.graph)\n        for i in range(iterations):\n            image_batch, label_batch = mnist.train.next_batch(batch_size)\n            l, _, acc = sess.run(&#x5B;model.loss, model.optimizer, model.accuracy],\n                                 feed_dict={model.input_images: image_batch, model.labels: label_batch})\n            if i % 200 == 0:\n                summary = sess.run(model.merged, feed_dict={model.input_images: image_batch,\n                                                            model.labels: label_batch})\n                train_writer.add_summary(summary, i)\n                print(&quot;Iteration {} of {}, loss: {:.3f}, train accuracy: &quot;\n                      &quot;{:.2f}%&quot;.format(i, iterations, l, acc * 100))\n<\/pre><\/div>\n\n\n<p>This is a pretty standard TensorFlow training loop \u2013 however, one non-standard addition is the tf.summary.FileWriter() operation and its associated uses. 
This operation takes two arguments: the location to store the files and the session graph.<\/p>\n\n\n\n<p>Note that it is a good idea to set up a separate subfolder for each of your TensorFlow runs when using summaries, as this enables better visualization and comparison of the various runs within TensorBoard.<\/p>\n\n\n\n<p>Every 200 iterations, we run the <em>merged<\/em> operation, which is defined in the class instance model \u2013 as mentioned previously, this gathers up all the logged summary data ready for writing. The train_writer.add_summary() operation is then run on this output, which writes the data into the chosen location (optionally along with the iteration\/epoch number).<\/p>\n\n\n\n<p>The summary data can then be visualized using TensorBoard. To run TensorBoard, using the command prompt, navigate to the base directory where all the subfolders are stored, and run the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ntensorboard --logdir=whatever_your_folder_path_is\n<\/pre><\/div>\n\n\n<p>Upon running this command, you will see startup information in the prompt, telling you the address to type into your browser to bring up the TensorBoard interface.
Note that the TensorBoard page will update itself dynamically during training, so you can visually monitor the progress.<\/p>\n\n\n\n<p>Now, to run this whole experiment, we can run the following code, which cycles through each of the activation functions:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nscenarios = &#x5B;&quot;sigmoid&quot;, &quot;relu&quot;, &quot;leaky_relu&quot;]\nact_funcs = &#x5B;tf.sigmoid, tf.nn.relu, tf.nn.leaky_relu]\nassert len(scenarios) == len(act_funcs)\n# collect the training data\nmnist = input_data.read_data_sets(&quot;MNIST_data\/&quot;, one_hot=True)\nfor i in range(len(scenarios)):\n    tf.reset_default_graph()\n    print(&quot;Running scenario: {}&quot;.format(scenarios&#x5B;i]))\n    model = Model(784, 10, act_funcs&#x5B;i], 6, 10)\n    run_training(model, mnist, scenarios&#x5B;i])\n<\/pre><\/div>\n\n\n<p>This should be pretty self-explanatory. Three scenarios are investigated, one for each activation function: sigmoid, ReLU, and Leaky ReLU.
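<\/p>\n\n\n\n<p>Before looking at the TensorBoard output, the gap between these three activations can be previewed with a back-of-the-envelope comparison of per-layer derivative factors (plain Python, illustrative only; real gradients also involve the weight matrices):<\/p>

```python
import math

def sigmoid_deriv(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_deriv(x):
    return 1.0 if x > 0 else 0.0

def leaky_relu_deriv(x):
    return 1.0 if x > 0 else 0.2  # tf.nn.leaky_relu's default alpha is 0.2

def chain(deriv, depth=6, x=1.0):
    """Product of derivative factors across a 6-layer chain at a fixed input."""
    factor = 1.0
    for _ in range(depth):
        factor *= deriv(x)
    return factor

sig = chain(sigmoid_deriv)               # roughly 0.2**6: vanishing
relu = chain(relu_deriv)                 # 1.0 for positive inputs: preserved
leaky = chain(leaky_relu_deriv, x=-1.0)  # 0.2**6: small but non-zero
```

ReLU keeps the factor at exactly 1 while a unit is active, which is precisely why it mitigates vanishing gradients; Leaky ReLU additionally keeps a small non-zero factor for negative inputs, avoiding dead units.

<p>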
Note that, in this experiment, I\u2019ve set up a densely connected model with 6 layers (including the output layer but excluding the input layer), each with a layer size of 10 nodes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"solutions-to-the-vanishing-gradient-problem\">Solutions to the Vanishing Gradient Problem<\/h2>\n\n\n\n<p>Several strategies have been developed to mitigate the vanishing gradient problem, enabling the training of deeper networks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.mygreatlearning.com\/blog\/relu-activation-function\/\"><strong>ReLU Activation Function<\/strong>:<\/a>\n<ul class=\"wp-block-list\">\n<li>ReLU returns the input value if it is positive and zero if it is negative.<\/li>\n\n\n\n<li>Unlike sigmoid and tanh, it doesn't saturate in the positive domain, ensuring gradients don\u2019t shrink to zero.<\/li>\n\n\n\n<li><strong>Downside<\/strong>: The 'dying ReLU' problem, where neurons can get stuck and stop learning when the input is negative.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Leaky ReLU and Parametric ReLU<\/strong>:\n<ul class=\"wp-block-list\">\n<li>These variants solve the 'dying ReLU' issue by allowing small, non-zero gradients for negative inputs.<\/li>\n\n\n\n<li>Helps maintain neuron activity, especially in deep networks.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Weight Initialization Techniques<\/strong>:\n<ul class=\"wp-block-list\">\n<li><strong>Xavier Initialization<\/strong>: Ensures weights have variance proportional to the number of input and output units, reducing small gradients.<\/li>\n\n\n\n<li><strong>He Initialization<\/strong>: Optimized for ReLU networks, preventing vanishing gradients by using larger initial weights.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Batch Normalization<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Normalizes the inputs of each layer, stabilizing the distribution of activations during training.<\/li>\n\n\n\n<li>Helps maintain a consistent range 
for activations, improving gradient flow and mitigating vanishing gradients.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Gradient Clipping<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Limits the magnitude of gradients during backpropagation, primarily to stop them from growing too large.<\/li>\n\n\n\n<li>Mainly guards against exploding gradients, but the resulting stability also helps training converge.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Skip Connections and Residual Networks (<a href=\"https:\/\/www.mygreatlearning.com\/blog\/resnet\/\">ResNets<\/a>)<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Skip connections allow gradients to bypass one or more layers, preventing them from becoming too small.<\/li>\n\n\n\n<li>Especially useful for training very deep networks with hundreds or thousands of layers, helping to alleviate vanishing gradients.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"conclusion\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p>The vanishing gradient problem is a significant barrier to training deep neural networks, particularly those built on classical activation functions such as sigmoid or tanh. As neural networks become deeper, the gradients may shrink towards zero, the network's weights stop updating correctly, and model performance suffers.<\/p>\n\n\n\n<p>However, the problem has been largely mitigated by the development of new activation functions such as ReLU, better weight initialization methods, and batch normalization. Furthermore, architectures such as ResNets use skip connections, which enable much deeper networks to be trained by letting gradients flow through the network more smoothly.<\/p>\n\n\n\n<p>If you want to learn deep learning, it's essential to understand the vanishing gradient problem and its solutions.
As the field advances, neural networks can become deeper and more complex, enabling them to solve increasingly difficult problems in various fields.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is the Vanishing Gradient? The vanishing gradient is a phenomenon in which the\u2002gradients used to update the weights in neural networks become very small, tending closely towards zero. This becomes a problem when dealing with deep networks that have multiple hidden layers. In backpropagation, we pass the gradients of the loss function backward from [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":20147,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[2],"tags":[36827],"content_type":[],"class_list":["post-20142","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","tag-neural-network"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>What is Vanishing Gradient Problem?<\/title>\n<meta 
name=\"description\" content=\"Understand the vanishing gradient problem, its causes, impacts, and solutions.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.mygreatlearning.com\/blog\/the-vanishing-gradient-problem\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The Vanishing Gradient Problem\" \/>\n<meta property=\"og:description\" content=\"Understand the vanishing gradient problem, its causes, impacts, and solutions.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.mygreatlearning.com\/blog\/the-vanishing-gradient-problem\/\" \/>\n<meta property=\"og:site_name\" content=\"Great Learning Blog: Free Resources what Matters to shape your Career!\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/GreatLearningOfficial\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-09-19T10:30:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-27T21:47:29+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.mygreatlearning.com\/blog\/wp-content\/uploads\/2020\/09\/iStock-1193832036.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1254\" \/>\n\t<meta property=\"og:image:height\" content=\"836\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Great Learning Editorial Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/Great_Learning\" \/>\n<meta name=\"twitter:site\" content=\"@Great_Learning\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Great Learning Editorial Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 