  1. What is Tensorflow?
     – What are Tensors?
     – How to install Tensorflow
     – Tensorflow Basics
       – Shape
       – Type
       – Graph
       – Session
       – Operators
  2. Tensorflow Python Simplified
     – Creating a Graph and Running it in a Session
  3. Linear Regression with TensorFlow
     – What is Linear Regression?
     – Predict Prices for California Houses
     – Linear Classification with Tensorflow
     – What is Linear Classification?
     – How to Measure the performance of Linear Classifier?
     – Linear Model
  4. Visualizing the Graph
  5. What is Artificial Neural Network?
  6. Architecture Example of Neural Network in TensorFlow
  7. Tensorflow Graphs
  8. Difference between RNN & CNN
  9. Libraries
  10. What are the Applications of TensorFlow?
  11. What is Machine Learning?
  12. What makes TensorFlow popular?
  13. Specific Applications


What is TensorFlow?

Tensorflow is an open-source library for numerical computation and large-scale machine learning, developed by Google Brain, that eases the process of acquiring data, training models, serving predictions, and refining future results.


Tensorflow bundles together Machine Learning and Deep Learning models and algorithms. It uses Python as a convenient front-end and runs it efficiently in optimized C++.

Tensorflow allows developers to create a graph of computations to perform. Each node in the graph represents a mathematical operation and each connection represents data. Hence, instead of dealing with low-details like figuring out proper ways to hitch the output of one function to the input of another, the developer can focus on the overall logic of the application.

Google Brain, the deep learning artificial intelligence research team at Google, developed TensorFlow in 2015 for Google's internal use. This open-source software library is used by the research team to perform several important tasks.
TensorFlow is at present one of the most popular deep learning libraries. There are several real-world applications of deep learning that make TensorFlow popular. Being an open-source library for deep learning and machine learning, TensorFlow finds a role to play in text-based applications, image recognition, voice search, and many more. DeepFace, Facebook’s image recognition system, uses TensorFlow for image recognition. It is used by Apple’s Siri for voice recognition. Every Google app that you use has made good use of TensorFlow to make your experience better.

What are Tensors?

All the computations associated with TensorFlow involve the use of tensors.

A tensor is a vector/matrix of n dimensions representing types of data. Values in a tensor hold identical data types with a known shape. This shape is the dimensionality of the matrix. A vector is a one-dimensional tensor; a matrix is a two-dimensional tensor; and a scalar is a zero-dimensional tensor.

In the graph, computations are made possible through interconnections of tensors. The mathematical operations are carried out by the nodes of the graph, whereas the edges describe the input-output relationships between nodes.
Thus TensorFlow takes an input in the form of an n-dimensional array/matrix (known as tensors) which flows through a system of several operations and comes out as output. Hence the name TensorFlow. A graph can be constructed to perform necessary operations at the output.
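
To make rank and shape concrete, here is a minimal sketch that builds a scalar, a vector, and a matrix tensor and prints their shapes and data types:

import tensorflow as tf

scalar = tf.constant(7)                  # zero-dimensional tensor, shape ()
vector = tf.constant([1.0, 2.0, 3.0])    # one-dimensional tensor, shape (3,)
matrix = tf.constant([[1, 2], [3, 4]])   # two-dimensional tensor, shape (2, 2)

print(scalar.shape, scalar.dtype)        # ()      int32
print(vector.shape, vector.dtype)        # (3,)    float32
print(matrix.shape, matrix.dtype)        # (2, 2)  int32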

How to Install Tensorflow?

Assuming you have python and jupyter-notebook set up, tensorflow can be installed directly via pip.

pip3 install --upgrade tensorflow

If you need GPU support, you will have to install tensorflow-gpu instead of tensorflow.

To test your installation, simply run the following: 

$ python -c "import tensorflow; print(tensorflow.__version__)"
2.0.0

Tensorflow Basics

Tensorflow’s name is directly derived from its core component: the Tensor. A tensor is a vector or matrix of n dimensions that represents all types of data.

Shape 

Shape is the dimensionality of the matrix. In the image above, the shape of the tensor is (2,2,2).

Type 

Type represents the kind of data (integers, strings, floating-point values, etc). All values in a tensor hold an identical data type.

Graph

A graph is a set of computations that take place successively on input tensors. Basically, a graph is just an arrangement of nodes that represent the operations in your model.

Session 

Session encapsulates the environment in which the evaluation of the graph takes place.

Operators 

Operators are pre-defined basic mathematical operations. Examples: 

tf.add(a, b)
tf.subtract(a, b)

Tensorflow also allows users to define custom operators, e.g., increment by 5, which is an advanced use-case and out of scope for this article. 
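
As a quick illustration, the sketch below (TF 1.x style, consistent with the rest of this article) applies a few of these pre-defined operators and evaluates them in a session:

import tensorflow as tf

a = tf.constant(8)
b = tf.constant(3)

add = tf.add(a, b)         # 11
sub = tf.subtract(a, b)    # 5
mul = tf.multiply(a, b)    # 24

with tf.Session() as sess:
    print(sess.run([add, sub, mul]))   # [11, 5, 24]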

Tensorflow Python Simplified 

Creating a Graph and Running it in a Session 

A tensor is an object with three properties: 

  • A unique label (name)
  • A dimension (shape)
  • A data type (dtype) 

Each operation you will do with TensorFlow involves the manipulation of a tensor. There are four main tensors that you can create: 

  • tf.Variable
  • tf.constant
  • tf.placeholder
  • tf.SparseTensor

Constants are (guess what!) constants. As their name states, their value doesn’t change. We’d usually need our network parameters to be updated, and that’s where the variable comes into play.

Following code creates the graph represented in Figure-1:

import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = ((x * x) * y) + (y + 2)

The most important thing to understand is that this code does not actually perform any computation, even though it looks like it does (especially the last line). It just creates a computation graph. In fact, even the variables are not initialized yet. To evaluate this graph, you need to open a TensorFlow session and use it to initialize the variables and evaluate f. A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.

The following code creates a session, initializes the variables, evaluates f, and then closes the session (which frees up resources):

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)  # 42
sess.close()

There is also a better way:

with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

Inside the with block, the session is set as the default session. Calling x.initializer.run() is equivalent to calling tf.get_default_session().run(x.initializer), and similarly f.eval() is equivalent to calling tf.get_default_session().run(f). This makes the code easier to read. Moreover, the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, you can use the global_variables_initializer() function. Note that it does not actually perform the initialization immediately, but rather creates a node in the graph that will initialize all variables when it is run:

init = tf.global_variables_initializer()  # prepare an init node

with tf.Session() as sess:
    init.run()  # actually initialize all the variables
    result = f.eval()

Linear Regression with TensorFlow

What is Linear Regression?

Imagine you have two variables, x and y, and your task is to predict the value of y knowing the value of x. If you plot the data, you can see a positive relationship between your independent variable, x, and your dependent variable, y.

You may observe, if x=1, y will roughly be equal to 6 and if x=2, y will be around 8.5.

This is not a very accurate method and is prone to error, especially with a dataset of hundreds of thousands of points.

Linear regression is evaluated with an equation. The variable y is explained by one or many covariates. In your example, there is only one covariate. If you have to write this equation, it will be:

y = α + βX + ε

With:

  • α the bias, i.e., if x=0 then y=α
  • β the weight associated with x, i.e., if x=1 then y=α+β
  • ε the residual or error of the model. It includes what the model cannot learn from the data.

Imagine you fit the model and you find the following solution for α and β:

α = 3.8
β = 2.78

You can substitute those numbers in the equation and it becomes: y= 3.8 + 2.78x 

You have now a better way to find the values for y. That is, you can replace x with any value you want to predict y. In the image below, we have replaced x in the equation with all the values in the dataset and plot the result.

The red line represents the fitted value, that is the value of y for each value of x. You don’t need to see the value of x to predict y, for each x there is a y that belongs to the red line. You can also predict for values of x higher than 2.

The algorithm will choose a random number for each α and β and replace the value of x to get the predicted value of y. If the dataset has 100 observations, the algorithm computes 100 predicted values.

We can compute the error, noted ε, of the model, which is the difference between the predicted value and the real value. A positive error means the model underestimates the prediction of y, and a negative error means the model overestimates the prediction of y.

ε = y – ypred

Your goal is to minimize the square of the error. The algorithm computes the mean of the squared errors. This step is called the minimization of the error. For linear regression it is the Mean Square Error, also called MSE. Mathematically, it is:

MSE(θ) = (1/m) Σᵢ (θᵀxⁱ – yⁱ)²

Where: 

  • θ is the vector of weights, so θᵀxⁱ refers to the predicted value for observation i
  • yⁱ is the real value
  • m is the number of observations

The goal is to find the best θ that minimizes the MSE.

If the average error is large, it means the model performs poorly and the weights are not chosen properly. To correct the weights, you need to use an optimizer. The traditional optimizer is called Gradient Descent.

Gradient descent takes the derivative and decreases or increases the weight. If the derivative is positive, the weight is decreased; if the derivative is negative, the weight is increased. The model will update the weights and recompute the error. This process is repeated until the error does not change anymore. Each pass is called an iteration. Besides, the gradients are multiplied by a learning rate, which indicates the speed of the learning.

If the learning rate is too small, it will take a very long time for the algorithm to converge (i.e., it requires lots of iterations). If the learning rate is too high, the algorithm might never converge.
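
To make the role of the learning rate concrete, here is a minimal NumPy sketch (the data is made up, not the housing dataset used below) that fits y = α + βx by gradient descent on the MSE:

import numpy as np

# Made-up data exactly following y = 3.8 + 2.78x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.8 + 2.78 * x

alpha, beta = 0.0, 0.0        # starting values for the bias and the weight
learning_rate = 0.05          # too small -> very slow convergence; too large -> divergence

for iteration in range(1000):
    y_pred = alpha + beta * x
    error = y_pred - y
    # Gradients of the MSE with respect to alpha and beta
    grad_alpha = 2 * error.mean()
    grad_beta = 2 * (error * x).mean()
    alpha -= learning_rate * grad_alpha
    beta -= learning_rate * grad_beta

print(alpha, beta)            # converges close to 3.8 and 2.78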

Predict Prices for California Houses

scikit-learn provides tools to load larger datasets, downloading them if necessary. We’ll be using the California Housing Dataset for Regression Problem. 

We fetch the dataset, add an extra bias input feature to all training instances, and scale the features (gradient descent converges much faster on scaled data); the training code below uses the scaled copy.

import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

# Scale the features and add the bias column; the training code below
# expects scaled_housing_data_plus_bias
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

Following is the code for performing a linear regression on the dataset

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

The main loop executes the training step over and over again (n_epochs times), and every 100 iterations it prints out the current Mean Squared Error (mse). 

TensorFlow’s autodiff feature can automatically and efficiently compute the gradients for you. The gradients() function takes an op (in this case mse) and a list of variables (in this case just theta), and it creates a list of ops (one per variable) to compute the gradients of the op with regards to each variable. So the gradients node will compute the gradient vector of the MSE with regards to theta.
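
As a tiny standalone illustration of autodiff (separate from the housing example above), the sketch below asks tf.gradients() for the derivative of f = x² + 3x with respect to x; at x = 2 the analytical derivative 2x + 3 equals 7:

import tensorflow as tf

x = tf.Variable(2.0, name="x")
f = x * x + 3 * x                  # f(x) = x^2 + 3x

grad = tf.gradients(f, [x])[0]     # node computing df/dx = 2x + 3

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))          # 7.0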

Linear Classification with Tensorflow

What is Linear Classification?

Classification aims at predicting the probability of each class given a set of inputs. The label (i.e., the dependent variable) is a discrete value, called a class. 

  1. If the label has only two classes, the learning algorithm is a binary classifier.
  2. The multiclass classifier tackles labels with more than two classes.

For instance, a typical binary classification problem is to predict the likelihood that a customer makes a second purchase. Predicting the type of animal displayed in a picture is a multiclass classification problem, since there are more than two varieties of animals.

For a binary task, the label can have two possible integer values. In most cases, it is either [0,1] or [1,2]. For instance, the objective is to predict whether a customer will buy a product or not. The label is defined as follows:

Y = 1 (customer purchased the product)
Y = 0 (customer does not purchase the product) 

The model uses the features X to classify each customer into the most likely class he belongs to, namely, a potential buyer or not. The probability of success is computed with logistic regression. The algorithm computes a probability based on the features X and predicts a success when this probability is above 50 percent. More formally, the probability is calculated as follows:

P(y = 1 | x) = σ(θᵀx + b)

where θ is the set of weights, x the features, and b the bias.

The function can be decomposed into two parts: 

  • The linear model
  • The logistic function 

Linear model 

You are already familiar with the way the weights are computed. Weights are computed using a dot product: the linear score θᵀx + b is a linear function of all the features xᵢ. If the model does not have features, the prediction is equal to the bias, b.

The weights indicate the direction of the correlation between the features xᵢ and the label y. A positive correlation increases the probability of the positive class, while a negative correlation leads the probability closer to 0 (i.e., the negative class).

The linear model returns only a real number, which is inconsistent with the probability measure of range [0,1]. The logistic function is required to convert the linear model output to a probability.

Logistic function

The logistic function, or sigmoid function, has an S-shape and the output of this function is always between 0 and 1.

It is easy to substitute the output of the linear regression into the sigmoid function. It results in a new number with a probability between 0 and 1. 

The classifier can transform the probability into a class:

  • Values between 0 and 0.49 become class 0
  • Values between 0.5 and 1 become class 1
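
Here is a minimal NumPy sketch of these two steps (the weights, bias, and features are made up for illustration): a linear score is squashed by the sigmoid and then thresholded at 0.5:

import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and one observation's features
theta = np.array([0.8, -0.4])
b = 0.1
x = np.array([1.5, 2.0])

score = np.dot(theta, x) + b         # linear model output (any real number)
probability = sigmoid(score)         # probability between 0 and 1
predicted_class = 1 if probability >= 0.5 else 0

print(score, probability, predicted_class)   # 0.5  ~0.62  1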

How to Measure the performance of Linear Classifier? 

Accuracy 

The overall performance of a classifier is measured with the accuracy metric. Accuracy is the number of correct predictions divided by the total number of observations. For instance, an accuracy value of 80 percent means the model is correct in 80 percent of the cases.

You can note a shortcoming with this metric, especially for imbalanced classes. An imbalanced dataset occurs when the number of observations per group is not equal. Let’s say you try to classify a rare event with a logistic function. Imagine the classifier tries to estimate the death of a patient following a disease. In the data, 5 percent of the patients pass away. You can train a classifier to predict the number of deaths and use the accuracy metric to evaluate the performance. If the classifier predicts 0 deaths for the entire dataset, it will be correct in 95 percent of the cases.
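
The following minimal sketch reproduces this pitfall on made-up labels: a classifier that always predicts "no death" on a dataset where only 5 percent of the patients pass away still reaches 95 percent accuracy:

import numpy as np

# Hypothetical labels: 5% positive (death), 95% negative
y_true = np.array([1] * 5 + [0] * 95)
y_pred = np.zeros_like(y_true)          # this classifier always predicts 0

accuracy = (y_true == y_pred).mean()
print(accuracy)                          # 0.95, although it never detects a single death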

Confusion matrix 

A better way to assess the performance of a classifier is to look at the confusion matrix.

Precision & Recall

  • Recall: the ability of a classification model to identify all relevant instances
  • Precision: the ability of a classification model to return only relevant instances
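
As a small illustration (hypothetical labels, using scikit-learn, which this article already relies on), the sketch below computes the confusion matrix, precision, and recall for an imbalanced toy example:

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Imbalanced toy labels, with a classifier that catches some of the positives
y_true = np.array([1] * 5 + [0] * 95)
y_pred = np.array([1, 1, 1, 0, 0] + [0] * 93 + [1, 1])

print(confusion_matrix(y_true, y_pred))
# [[93  2]   rows: actual 0/1, columns: predicted 0/1
#  [ 2  3]]
print(precision_score(y_true, y_pred))   # 3 / (3 + 2) = 0.6
print(recall_score(y_true, y_pred))      # 3 / (3 + 2) = 0.6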

Classification on Income Level using Census Dataset 

Load Data. The data stored online are already divided between a train set and a test set.

import tensorflow as tf
import pandas as pd

## Define path data
COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital',
           'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
           'hours_week', 'native_country', 'label']
PATH = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
PATH_test = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test"

df_train = pd.read_csv(PATH, skipinitialspace=True, names=COLUMNS, index_col=False)
df_test = pd.read_csv(PATH_test, skiprows=1, skipinitialspace=True, names=COLUMNS, index_col=False)

Tensorflow requires a Boolean value to train the classifier. You need to cast the values from string to integer. The label is stored as an object; however, you need to convert it into a numeric value. The code below creates a dictionary with the values to convert and loops over the column items. Note that you perform this operation twice, once for the train set and once for the test set.

label = {'<=50K': 0, '>50K': 1}
df_train.label = [label[item] for item in df_train.label]

label_t = {'<=50K.': 0, '>50K.': 1}
df_test.label = [label_t[item] for item in df_test.label]

Define the model. The estimator expects a list of TensorFlow feature columns rather than raw column names, so we first build numeric columns for the continuous features (categorical feature columns are shown in the census case study later in this article).

# Minimal choice: numeric feature columns for the continuous features only
CONTI_FEATURES = ['age', 'fnlwgt', 'capital_gain', 'education_num', 'capital_loss', 'hours_week']
continuous_features = [tf.feature_column.numeric_column(k) for k in CONTI_FEATURES]

model = tf.estimator.LinearClassifier(
    n_classes=2,
    model_dir="ongoing/train",
    feature_columns=continuous_features)

Train the model.

LABEL = 'label'

def get_input_fn(data_set, num_epochs=None, n_batch=128, shuffle=True):
    return tf.estimator.inputs.pandas_input_fn(
        x=pd.DataFrame({k: data_set[k].values for k in COLUMNS}),
        y=pd.Series(data_set[LABEL].values),
        batch_size=n_batch,
        num_epochs=num_epochs,
        shuffle=shuffle)

model.train(input_fn=get_input_fn(df_train,
                                  num_epochs=None,
                                  n_batch=128,
                                  shuffle=False),
            steps=1000)

Evaluate the model.

model.evaluate(input_fn=get_input_fn(df_test,
                                     num_epochs=1,
                                     n_batch=128,
                                     shuffle=False),
               steps=1000)

Visualizing the Graph

So now we have a computation graph that trains a Linear Regression model using Mini-batch Gradient Descent, and we are saving checkpoints at regular intervals. However, we are still relying on the print() function to visualize progress during training. There is a better way: enter TensorBoard. If you feed it some training stats, it will display nice interactive visualizations of these stats in your web browser (e.g., learning curves). You can also provide it the graph’s definition and it will give you a great interface to browse through it. This is very useful to identify errors in the graph, to find bottlenecks, and so on.

The first step is to tweak your program a bit so it writes the graph definition and some training stats – for example, the training error (MSE) – to a log directory that TensorBoard will read from. You need to use a different log directory every time you run your program, or else TensorBoard will merge stats from different runs, which will mess up the visualizations. The simplest solution for this is to include a timestamp in the log directory name. Add the following code at the beginning of the program:

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

Next, add the following code at the very end of the construction phase:

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

The first line creates a node in the graph that will evaluate the MSE value and write it to a TensorBoard-compatible binary log string called a summary. The second line creates a FileWriter that you will use to write summaries to logfiles in the log directory. The first parameter indicates the path of the log directory (in this case something like tf_logs/run-20200229130405/, relative to the current directory). The second (optional) parameter is the graph you want to visualize. Upon creation, the FileWriter creates the log directory if it does not already exist (and its parent directories if needed), and writes the graph definition in a binary logfile called an events file. Next you need to update the execution phase to evaluate the mse_summary node regularly during training (e.g., every 10 mini-batches). This will output a summary that you can then write to the events file using the file_writer. Finally, the file_writer needs to be closed at the end of the program. Here is the updated code:

for batch_index in range(n_batches):
    X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
    if batch_index % 10 == 0:
        summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
        step = epoch * n_batches + batch_index
        file_writer.add_summary(summary_str, step)
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

file_writer.close()

Now when you run the program, it will create the log directory tf_logs/run-20200229130405 and write an events file in this directory, containing both the graph definition and the MSE values. If you run the program again, a new directory will be created under the tf_logs directory, e.g., tf_logs/run-20200229130526. Now that we have the data, let’s fire up the TensorBoard server. To do so, simply run the tensorboard command pointing it to the root log directory. This starts the TensorBoard web server, listening on port 6006 (which is “goog” written upside down):

$ tensorboard --logdir tf_logs/
Starting TensorBoard on port 6006
(You can navigate to http://0.0.0.0:6006)

What is Artificial Neural Network?

An Artificial Neural Network(ANN) is composed of four principal objects: 

Layers: all the learning occurs in the layers. There are 3 layers 

1. Input
2. Hidden
3. Output 

  • Feature and Label: Input data to the network(features) and output from the network (labels)
  • Loss function: Metric used to estimate the performance of the learning phase
  • Optimizer: Improve the learning by updating the knowledge in the network.

A neural network will take the input data and push it into an ensemble of layers. The network needs to evaluate its performance with a loss function. The loss function gives the network an idea of the path it needs to take before it masters the knowledge. The network improves its knowledge with the help of an optimizer.

If you take a look at the figure below, you will understand the underlying mechanism.

The program takes some input values and pushes them into two fully connected layers. Imagine you have a math problem, the first thing you do is to read the corresponding chapter to solve the problem. You apply your new knowledge to solve the problem. There is a high chance you will not score very well. It is the same for a network. The first time it sees the data and makes a prediction, it will not match perfectly with the actual data. 

To improve its knowledge, the network uses an optimizer. In our analogy, an optimizer can be thought of as rereading the chapter. You gain new insights/lesson by reading again. Similarly, the network uses the optimizer, updates its knowledge, and tests its new knowledge to check how much it still needs to learn. The program will repeat this step until it makes the lowest error possible. 

In our math problem analogy, it means you read the textbook chapter many times until you thoroughly understand the course content. Even after reading multiple times, if you keep making errors, it means you have reached the knowledge capacity of the current material. You need to use a different textbook or test a different method to improve your score. For a neural network, it is the same process: if the error is far from 100 percent but the curve is flat, it means that with the current architecture it cannot learn anything else. The network has to be better optimized to improve the knowledge.

Neural Network Architecture

Layers 

A layer is where all the learning takes place. Inside a layer, there are a large number of weights (neurons). A typical neural network is often processed by densely connected layers (also called fully connected layers). It means all the inputs are connected to all the outputs. 

A typical neural network takes a vector of input and a scalar that contains the labels. The most comfortable set up is a binary classification with only two classes: 0 and 1. 

The network takes an input, sends it to all connected nodes and computes the signal with an activation function.


The figure above plots this idea. The first layer holds the input values; the second layer, called the hidden layer, receives the weighted input from the previous layer.

  1. The first node is the input values.
  2. The neuron is decomposed into the input part and the activation function. The left part receives all the input from the previous layer. The right part is the sum of the inputs passed into an activation function.
  3. The output value is computed from the hidden layers and used to make a prediction. For classification, it is equal to the number of classes. For regression, only one value is predicted.

Activation function 

The activation function of a node defines the output given a set of inputs. You need an activation function to allow the network to learn non-linear patterns. A common activation function is the ReLU (Rectified Linear Unit). This function gives zero for all negative values.

The other activation functions are: 

  • Piecewise Linear
  • Sigmoid
  • Tanh
  • Leaky Relu 
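
For reference, here is a small NumPy sketch of these activation functions:

import numpy as np

def relu(z):
    # Zero for all negative values, identity for positive values
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Small non-zero slope for negative values instead of a hard zero
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into (-1, 1)
    return np.tanh(z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))         # [0.  0.  0.  0.5 2. ]
print(leaky_relu(z))   # [-0.02  -0.005  0.  0.5  2. ]
print(sigmoid(z))      # values between 0 and 1
print(tanh(z))         # values between -1 and 1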

The critical decisions to make when building a neural network are:

  • How many layers in the neural network
  • How many hidden units for each layer 

Neural network with lots of layers and hidden units can learn a complex representation of the data, but it makes the network’s computation very expensive. 

Loss function

After you have defined the hidden layers and the activation function, you need to specify the loss function and the optimizer. 

For binary classification, it is common practice to use a binary cross entropy loss function. In the linear regression, you use the mean square error. 

The loss function is an important metric to estimate the performance of the optimizer. During the training, this metric will be minimized. You need to select this quantity carefully depending on the type of problem you are dealing with. 
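
For reference, here is a minimal NumPy sketch (with made-up predictions, not tied to any dataset above) of the two loss functions just mentioned:

import numpy as np

def mse(y_true, y_pred):
    # Mean Square Error, used for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Binary cross entropy, used for binary classification;
    # y_prob are the predicted probabilities in (0, 1)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))               # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))  # ~0.16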

Optimizer 

The loss function is a measure of the model’s performance. The optimizer will help improve the weights of the network in order to decrease the loss. There are different optimizers available, but the most common one is the Stochastic Gradient Descent. 

The conventional optimizers are: 

  • Momentum optimization,
  • Nesterov Accelerated Gradient,
  • AdaGrad,
  • Adam optimization 
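
In the TF 1.x style used throughout this article, switching between these optimizers is essentially a one-line change. Here is a minimal, self-contained sketch on a toy loss (the variable names are made up for illustration):

import tensorflow as tf

# A toy problem so the sketch runs on its own: fit w so that w*x matches y
x = tf.constant(2.0)
y = tf.constant(10.0)
w = tf.Variable(0.0)
loss = tf.square(w * x - y)

learning_rate = 0.01

# Pick one optimizer; the alternatives are shown commented out
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
# optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
# optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9, use_nesterov=True)  # Nesterov
# optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)
# optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)

training_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        sess.run(training_op)
    print(sess.run(w))   # approaches 5.0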

Example Neural Network in TensorFlow 

We will use the MNIST dataset to train your first neural network. Training a neural network with Tensorflow is not very complicated. The preprocessing step looks exactly the same as in the previous tutorials. You will proceed as follows:

  • Step 1: Import the data
  • Step 2: Transform the data
  • Step 3: Construct the tensor
  • Step 4: Build the model
  • Step 5: Train and evaluate the model
  • Step 6: Improve the model

import numpy as np
import tensorflow as tf

np.random.seed(42)

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata(' /Users/Thomas/Dropbox/Learning/Upwork/tuto_TF/data/mldata/MNIST original')
print(mnist.data.shape)
print(mnist.target.shape)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.target, test_size=0.2, random_state=42)
y_train = y_train.astype(int)
y_test = y_test.astype(int)
batch_size = len(X_train)
print(X_train.shape, y_train.shape, y_test.shape)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
X_test_scaled = scaler.fit_transform(X_test.astype(np.float64))

feature_columns = [tf.feature_column.numeric_column('x', shape=X_train_scaled.shape[1:])]

estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[300, 100],
    n_classes=10,
    model_dir='/train/DNN')

Train and evaluate the model

# Train the estimator
train_input = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train_scaled},
    y=y_train,
    batch_size=50,
    shuffle=False,
    num_epochs=None)
estimator.train(input_fn=train_input, steps=1000)

eval_input = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test_scaled},
    y=y_test,
    shuffle=False,
    batch_size=X_test_scaled.shape[0],
    num_epochs=1)
estimator.evaluate(eval_input, steps=None)

Tensorflow Graphs

TensorFlow graphs are generally sets of connected nodes, sometimes referred to as vertices, and the connections are referred to as edges. Each node takes inputs, performs an operation, and produces an output.

In the above diagram, n1 and n2 are the two nodes having values 1 and 2 respectively and an adding operation which happens at node n3 will help us get the output. We will try to perform the same operation using Tensorflow in Python.

We will import tensorflow and define the nodes n1 and n2 first.

import tensorflow as tf
node1 = tf.constant(1)
node2 = tf.constant(2)

Now we perform adding operation which will be the output

node3 = node1 + node2

Now remember we have to run a tensorflow session in order to get the output. We will use the ‘with’ statement in order to auto-close the session after executing the output.

with tf.Session() as sess:
    result = sess.run(node3)
print(result)
Output: 3

This is how tensorflow graph works.

After a quick overview of tensor graph, it is essential to know the objects used in a tensor graph. Basically, there are two types of objects used in a tensor graph.

a) Variables

b) Placeholders.

Variables and Placeholders.

Variables

During the optimization process, tensorflow tends to tune the model by taking care of the parameters present in the model. Variables are a part of the tensor graph that are capable of holding the values of weights and biases obtained throughout the session. They need proper initialization, which we will cover throughout the coding session.

Placeholders

Placeholders are also objects of the tensor graph which are typically empty, and they are used to feed in actual training examples. They require a declared expected data type, such as tf.float32, with an optional shape argument.

Let’s jump into the example for explaining these two objects.
First, we import tensorflow

import tensorflow as tf

Always it is important to run a session when we use tensorflow. So, we will be running an interactive session in order to perform the further task.

sess = tf.InteractiveSession()

In order to define a variable, we can take some random numbers ranging from 0 to 1 in a 4×4 matrix.

my_tensor = tf.random_uniform((4,4),0,1)
my_variable = tf.Variable(initial_value=my_tensor)

In order to see the variables, we need to initialize a global variable and run it to get the actual variables. Let us do that

init = tf.global_variables_initializer()
init.run()
sess.run(my_variable)

Now sess.run() runs the operation in the session, and it is time to see the output, i.e., the variable values.

array ([[ 0.18764639, 0.76903498, 0.88519645, 0.89911747],
       [ 0.18354201, 0.63433743, 0.42470503, 0.27359927],
       [ 0.45305872, 0.65249109, 0.74132109, 0.19152677],
       [ 0.60576665, 0.71895587, 0.69150388, 0.33336747]], dtype=float32)

So, these are the variables ranging from 0 to 1 in a shape of 4 by 4.
Now it is time to run a simple placeholder.
In order to define and initialize a placeholder, we need to do the following:

Place_h = tf.placeholder(tf.float64)

It is common to use the float64 data type, but we can also use the float32 data type, which takes less memory.

Here we can put None for the number of samples and the number of features in the shape argument, because None can be filled by the number of samples in the data.
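
Here is a minimal sketch of feeding such a placeholder at run time (it continues the interactive session sess created above; the batch of 3 samples with 4 features is made up):

import numpy as np

# Placeholder with an unknown number of samples and 4 features
place_h = tf.placeholder(tf.float32, shape=(None, 4))
doubled = place_h * 2

# The actual data is supplied only when the graph is run, via feed_dict
batch = np.random.rand(3, 4)
print(sess.run(doubled, feed_dict={place_h: batch}))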

Case Study

Now we will be using case studies which will perform both regression as well as classification.

Regression using Tensorflow

Let us deal with the regression first. In order to perform regression, we will use California Housing data where we will be predicting the value of the blocks using data such as, income, population, number of bedrooms etc.

Let us jump into the data for a quick overview.

import pandas as pd
housing_data = pd.read_csv('cal_housing_clean.csv')
housing_data.head()

Let us have a quick summary of the data

housing_data.describe().transpose()

Let us select the features and the target variable in order to perform splitting. Splitting is done for training and testing the model.  We can take 70% for training and rest for testing.

x_data = housing_data.drop(['medianHouseValue'],axis=1)
y_val = housing_data['medianHouseValue']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split (x_data, y_val,test_size=0.3,random_state=101)

Now scaling is necessary for this type of data as they contain continuous variables.

So, we will apply MinMaxScaler from sklearn library. We will apply for both training and testing data.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)

X_train=pd.DataFrame(data=scaler.transform(X_train),columns= X_train.columns,index=X_train.index)
X_test=pd.DataFrame(data=scaler.transform(X_test),columns= X_test.columns,index=X_test.index)

So, from the above commands, the scaling is done. Now as we are using Tensorflow, it is necessary to convert all the feature columns into continuous numeric columns for the estimators. In order to do that we use a command called tf.feature_column.

Let us import tensorflow and assign each operation to a variable.

import tensorflow as tf
house_age = tf.feature_column.numeric_column('housingMedianAge')
total_rooms = tf.feature_column.numeric_column('totalRooms')
total_bedrooms=tf.feature_column.numeric_column('totalBedrooms')
population_total= tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
total_income = tf.feature_column.numeric_column('medianIncome')
feature_cols= [house_age,total_rooms, total_bedrooms, population_total, households,total_income]

Now let us create an input function for the estimator object. Parameters such as batch size and number of epochs can be tuned as we wish, as increasing the epochs and batch size tends to improve the accuracy of the model. We will use a DNN Regressor in order to predict the house values of California.

input_function=tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train ,batch_size=10,num_epochs=1000,shuffle=True)
regressor=tf.estimator.DNNRegressor(hidden_units=[6,6,6],feature_columns=feature_cols)

While fitting the data, we have used 3 hidden layers in order to build the model. We can also increase the number of layers, but notice that adding more hidden layers can give us an overfitting issue, which should be prevented. So, 3 hidden layers are a reasonable choice for building this neural network.

Now for prediction, we need to create a predict input function and then use the .predict() method, which will create a list of predictions on the test data.

predict_input_function=tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=10,num_epochs=1,shuffle=False)
pred_gen =regressor.predict(predict_input_function)

Here pred_gen will basically be a generator which will generate the predictions. In order to look into the predictions, we have to put them into a list.

predictions = list(pred_gen)

Now after the prediction is done, we have to evaluate the model. RMSE or Root Mean Squared Error is a great choice for evaluating regression problem. Let us look into that.

final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'])
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test,final_preds)**0.5

Now after we execute, we get an RMSE of 97921.93181985477, which is expected, as RMSE is in the same units as the median house value. So here we go. The regression task is over. Now it is time for classification.

Classification using tensorflow. 

Classification is used for data having classes as target variables. Now we will take California Census data and classify whether a person earns more than 50000 dollars or less depending on data such as education, age, occupation, marital status, gender, etc.

Let us look into the data for an overview

import pandas as pd
census_data = pd.read_csv("census_data.csv")	
census_data.head()

Here we can see lots of columns which are categorical and need to be taken care of. On the other hand, the income column, which is the target variable, contains strings. As tensorflow is unable to understand strings as labels, we have to build a custom function that converts the strings to binary labels, 0 and 1.

def labels(income_class):
    if income_class == ' <=50K':
        return 0
    else:
        return 1

census_data['income_bracket'] = census_data['income_bracket'].apply(labels)

There are other ways to do that, but this one is simple and interpretable.

We will start splitting the data for training and testing.

from sklearn.model_selection import train_test_split
x_data = census_data.drop('income_bracket',axis=1)
y_labels = census_data ['income_bracket']
X_train, X_test, y_train, y_test=train_test_split(x_data, y_labels,test_size=0.3,random_state=101)

After that, we must take care of the categorical variables and numeric features.

gender_data=tf.feature_column.categorical_column_with_vocabulary_list("gender", ["Female", "Male"])
occupation_data=tf.feature_column.categorical_column_with_hash_bucket("occupation", hash_bucket_size=1000)
marital_status_data=tf.feature_column.categorical_column_with_hash_bucket("marital_status", hash_bucket_size=1000)
relationship_data=tf.feature_column.categorical_column_with_hash_bucket("relationship", hash_bucket_size=1000)
education_data=tf.feature_column.categorical_column_with_hash_bucket("education", hash_bucket_size=1000)
workclass_data=tf.feature_column.categorical_column_with_hash_bucket("workclass", hash_bucket_size=1000)
native_country_data=tf.feature_column.categorical_column_with_hash_bucket("native_country", hash_bucket_size=1000)

Now we will take care of the feature columns containing numeric values.

age_data = tf.feature_column.numeric_column("age")
education_num_data=tf.feature_column.numeric_column("education_num")
capital_gain_data=tf.feature_column.numeric_column("capital_gain")
capital_loss_data=tf.feature_column.numeric_column("capital_loss")
hours_per_week_data=tf.feature_column.numeric_column("hours_per_week")

Now we will combine all these variables and put these into a list.

feature_cols=[gender_data,occupation_data,marital_status_data,relationship_data,education_data,workclass_data,native_country_data,age_data,education_num_data,capital_gain_data,capital_loss_data,hours_per_week_data]

Now all the preprocessing part is done and our data is ready. Let us create an input function and fit the model.

input_func=tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=100,num_epochs=None,shuffle=True)
classifier=tf.estimator.LinearClassifier(feature_columns=feature_cols)

Let us train the model for at least 5000 steps.

classifier.train(input_fn=input_func,steps=5000)

After the training, it is time to predict the outcome

pred_fn=tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)

This will produce a generator, which needs to be converted to a list in order to look into the predictions.

predicted_data = list(classifier.predict(input_fn=pred_fn))

The prediction is done. Now let us take a single test data to look into the predictions

predicted_data[0]
{'class_ids': array([0], dtype=int64),
 'classes': array([b'0'], dtype=object),
 'logistic': array([ 0.21327116], dtype=float32),
 'logits': array([-1.30531931], dtype=float32),
 'probabilities': array([ 0.78672886,  0.21327116], dtype=float32)}

From the above dictionary, we need only class_ids in order to make a comparison with the real test data. Let us extract that.

final_predictions = []
for pred in predicted_data:
    final_predictions.append(pred['class_ids'][0])
final_predictions[:10]

This will give the first 10 predictions.

[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

 As it is less intuitive to make an inference, we will evaluate it. 

from sklearn.metrics import classification_report
print(classification_report(y_test,final_predictions))

Now we can look into the metrics such as precision and recall to evaluate how our model performed.

The model performed quite well for people whose income is less than 50K dollars compared to those earning more than 50K dollars. That’s it for now. This is how tensorflow is used when we perform regression and classification.

Saving and Loading a Model

Tensorflow provides a feature to save and load a model. After saving a model, we can execute any piece of code without running the entire code in tensorflow. Let us illustrate the concept with an example.

We will be using a regression example with some made up data. For that, let us import all the necessary libraries.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.random.seed(101)
tf.set_random_seed(101)

Now the regression works on a straight-line equation which is y=mx+b

We will create some made up data for x and y.

x = np.linspace(0,10,10) + np.random.uniform(-1.5,1.5,10)
x
array([ 0.04919588,  1.32311387,  0.8076449 ,  2.3478983 ,  5.00027539,
        6.55724614, 6.08756533, 8.95861702, 9.55352047, 9.06981686])
y = np.linspace(0,10,10) + np.random.uniform(-1.5,1.5,10)

Now it is time to plot the data to see whether it is linear or not.

plt.plot(x,y,'*')

Let us now add the variables, which are the coefficient and the bias.

m = tf.Variable(0.39)
c = tf.Variable(0.2)

Now we have to define a cost function, which is nothing but the squared error in our case.

error = tf.reduce_mean(tf.square(y - (m*x + c)))

Now let us define an optimizer to tune a model and train the model minimizing the error.

optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = optimizer.minimize(error)

Now before saving, in tensorflow we have already discussed that we need to initialize the global variable.

init = tf.global_variables_initializer()

Now let us save the model

saver = tf.train.Saver()

Now we will create and run the session, and use the saver variable to save the model at the end of training.

with tf.Session() as sess:
    sess.run(init)
    epochs = 100
    for i in range(epochs):
        sess.run(train)
    # fetching back the Results
    final_slope , final_intercept = sess.run([m,c])
    saver.save(sess,'new_models/my_second_model.ckpt')

Now the model is saved to a checkpoint. Now let us evaluate the result.

x_test = np.linspace(-1,11,10)
y_prediction_plot = final_slope*x_test + final_intercept
plt.plot(x_test,y_prediction_plot,'r')
plt.plot(x,y,'*')

Now it’s time to load the model. Let us load the model and restore the checkpoint to see whether we get the result or not.

with tf.Session() as sess:
    # For restoring the model
    saver.restore(sess,'new_models/my_second_model.ckpt')
    # Let us fetch back the result
    restore_slope , restore_intercept = sess.run([m,c])

Now let us plot again with the restored parameters.

x_test = np.linspace(-1,11,10)
y_prediction_plot = restore_slope*x_test + restore_intercept
plt.plot(x_test,y_prediction_plot,'r')
plt.plot(x,y,'*')

Optimizers: an Overview

When we set out to build a deep learning model, it is necessary to understand the concept of optimizers. Optimizers help us to reduce the value of the cost function used in the model. The cost function is nothing but the error function which we want to reduce during model building, and it largely depends on the internal parameters of the model. For example, every regression equation contains a weight and a bias in order to build a model, and for these parameters the optimizers play a crucial role in finding the optimal values in order to increase the accuracy of the model.

Optimizers generally fall into two categories.

  1. First Order Optimizers
  2. Second Order Optimizers.

First-order optimizers use gradient values to adjust their parameters. A gradient tells us the rate at which the loss function changes with respect to the parameters. A commonly used first-order optimizer is the Gradient Descent Optimizer.

On the other hand, second-order optimizers increase or decrease the loss function by using second-order derivatives. They are much more time-consuming and take much more computing power compared to first-order optimizers, and hence are less used.

Some of the commonly used optimizers are:

SGD (Stochastic Gradient Descent)

If we have 50000 data points with 10 features, then it is necessary to compute 50000*10 operations on each iteration. So, consider 500 iterations for building a model, which would take 50000*10*500 computations in total to complete the process. For this huge processing, SGD or stochastic gradient descent comes into play. It generally takes a single data point (or a small mini-batch) per iteration to reduce the computing work while still decreasing the loss function of the model.

Adam

Adam stands for Adaptive Moment Estimation, which adapts a unique learning rate for each parameter. In some optimizers the learning rate keeps decreasing because squared gradients accumulate, and it tends to decay at some point. Adam takes care of that: it prevents high variance of the parameter updates and the disappearing learning rates, also known as decaying learning rates.

Adagrad

This optimizer is suitable for sparse data as it deals with the learning rates based on the parameters. We do not need to tune the learning rate manually. But it has a demerit of vanishing learning rate because of the gradient accumulation at every iteration.

RMSprop

It is similar to Adagrad, but instead of accumulating all past squared gradients it uses a moving average of the squared gradients at every step, which keeps the learning rate from decaying too quickly.

Let us compare these optimizers using Keras. If you are confused, Keras is a high-level deep learning API that ships with tensorflow and is used to build advanced deep learning models. So, you see, everything is linked.

We will be using a logistic regression model which involves only two classes. We will just focus on the optimizers without going deep into the entire model.

Let us import the libraries and set a learning rate

import pandas as pd
import matplotlib.pyplot as plt
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, Adam, Adagrad, RMSprop

dflist = []
optimizers = ['SGD(lr=0.01)',
              'SGD(lr=0.01, momentum=0.3)',
              'SGD(lr=0.01, momentum=0.3, nesterov=True)',
              'Adam(lr=0.01)',
              'Adagrad(lr=0.01)',
              'RMSprop(lr=0.01)']

Now we will build and fit a small model with each optimizer and record the results.

# X_train and y_train are assumed to be a prepared dataset with 4 numeric
# features and binary labels (e.g., a scaled train/test split from earlier)
for opt_name in optimizers:
    K.clear_session()
    model = Sequential()
    model.add(Dense(1, input_shape=(4,), activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=eval(opt_name),
                  metrics=['accuracy'])
    h = model.fit(X_train, y_train, batch_size=16, epochs=5, verbose=0)
    dflist.append(pd.DataFrame(h.history, index=h.epoch))

historydf = pd.concat(dflist, axis=1)
metrics_reported = dflist[0].columns
idx = pd.MultiIndex.from_product([optimizers, metrics_reported],
                                 names=['optimizers', 'metric'])

Now we will plot and look the performances of the optimizers.

historydf.columns = idx
ax = plt.subplot(211)
historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)
plt.title("Loss")

If we look at the graph, we can see that the Adam optimizer performed the best and SGD the worst, although this still depends on the data.

ax = plt.subplot(212)
historydf.xs('acc', axis=1, level='metric').plot(ylim=(0,1), ax=ax)
plt.title("Accuracy")
plt.tight_layout()

Now in terms of accuracy too, we can see Adam Optimizer performed the best. This is how we can play around with the optimizers to build the best model.

Difference between RNN & CNN

  • CNN is suitable for spatial data such as images, whereas RNN is suitable for temporal data, also called sequential data.
  • CNN is considered to be more powerful than RNN, whereas RNN includes less feature compatibility when compared to CNN.
  • CNN takes fixed-size inputs and generates fixed-size outputs, whereas RNN can handle arbitrary input/output lengths.
  • CNN is a type of feed-forward artificial neural network with variations of multi-layer perceptrons designed to use minimal amounts of preprocessing, whereas RNN, unlike feed-forward networks, can use its internal memory to process arbitrary sequences of inputs.
  • CNN uses the connectivity pattern between the neurons, inspired by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field, whereas recurrent neural networks use time-series information – what a user spoke last will impact what he/she will speak next.
  • CNN is ideal for images and video processing, whereas RNN is ideal for text and speech analysis.

Libraries & Extensions

Tensorflow has the following libraries and extensions to build advanced models or methods. 
1. Model optimisation
2. TensorFlow Graphics
3. Tensor2Tensor
4. Lattice
5. TensorFlow Federated
6. Probability
7. TensorFlow Privacy
8. TensorFlow Agents
9. Dopamine
10. TRFL
11. Mesh TensorFlow
12. Ragged Tensors
13. Unicode Ops
14. TensorFlow Ranking
15. Magenta
16. Nucleus
17. Sonnet
18. Neural Structured Learning
19. TensorFlow Addons
20. TensorFlow I/O

What are the Applications of TensorFlow?

  • Google uses Machine Learning in almost all of its products: Google has the most exhaustive database in the world, and they obviously would be more than happy if they could make the best use of it by exploiting it to the fullest. Also, if all the different kinds of teams — researchers, programmers, and data scientists — working on artificial intelligence could work using the same set of tools and thereby collaborate with each other, all their work could be made much simpler and more efficient. As technology developed and our needs widened, such a toolset became a necessity. Motivated by this necessity, Google created TensorFlow — a solution they had long been waiting for.
  • TensorFlow bundles together Machine Learning models and algorithms, and Google uses it to enhance the efficiency of its products — by improving their search engine, by giving us recommendations, by translating to any of the 100+ languages, and more.

What is Machine Learning?

A computer can perform various functions and tasks relying on inference and patterns as opposed to the conventional methods like feeding explicit instructions, etc. The computer employs statistical models and algorithms to perform these functions. The study of such algorithms and models is termed as Machine Learning.
Deep learning is another term that one has to be familiar with. A subset of Machine Learning, deep learning is a class of algorithms that can extract higher-level features from the raw input. Or, in simple words, they are algorithms that teach a machine to learn from examples and previous experiences. 
Deep learning is based on the concept of Artificial Neural Networks, ANN. Developers use TensorFlow to create many multiple layered neural networks. Artificial Neural Networks, ANN, is an attempt to mimic the human nervous system to a good extent by using silicon and wires. The intention behind this system is to help develop a system that can interpret and solve real-world problems like a human brain would. 

What makes TensorFlow popular?

  • It is free and open-sourced: TensorFlow is an Open-Source Software released under the Apache License. An Open Source Software, OSS, is a kind of computer software where the source code is released under a license that enables anyone to access it. This means that the users can use this software library for any purpose — distribute, study and modify — without actually having to worry about paying royalties.
  • When compared to other such Machine Learning Software Libraries — Microsoft’s CNTK, or Theano — TensorFlow is relatively easy to use. Thus, even new developers with no significant understanding of machine learning can now access a powerful software library instead of building their models from scratch.
  • Another factor that adds to its popularity is the fact that it is based on graph computation. Graph computation allows the programmer to visualize his/her development with the neural networks. This can be achieved through the use of Tensor Board. This comes in handy while debugging the program. The Tensor Board is an important feature of TensorFlow as it helps monitor the activities of TensorFlow– both visually and graphically. Also, the programmer is given an option to save the graph for a later use.  

Applications

Below are listed a few of the use cases of TensorFlow:

  • Voice and speech recognition: The real challenge put before programmers was that merely hearing the words is not enough. Since words change meaning with context, a clear understanding of what the word represents with respect to the context is necessary. This is where deep learning plays a significant role. With the help of Artificial Neural Networks or ANNs, such an act has been made possible by performing word recognition, phoneme classification, etc.

Thus, with the help of TensorFlow, artificial-intelligence-enabled machines can now be trained to receive human voice as input, decipher and analyze it, and perform the necessary tasks. A number of applications make use of this feature. They need this feature for voice search, automatic dictation, and more.
Let us take the case of Google’s search engine as an example. While you are using Google’s search engine, it applies machine learning using TensorFlow to predict the next word that you are about to type. Considering the fact that how accurate they often are, one can understand the level of sophistication and complexity involved in the process.

  • Image recognition: Apps that use image recognition technology are probably the ones that popularized deep learning among the masses. The technology was developed with the intention to train and develop computers to see, identify, and analyze the world like a human would. Today, a number of applications find this useful — the artificial-intelligence-enabled camera on your mobile phone, the social networking sites you visit, your telecom operators, to name a few.

In image recognition, Deep Learning trains the system to identify a certain image by exposing it to a number of images that are labelled manually. It is to be noted that the system learns to identify an image by learning from examples that are previously shown to it and not with the help of instructions saved in it on how to identify that particular image.
Take the case of Facebook’s image recognition system, DeepFace. It was trained in a similar way to identify human faces. When you tag someone in a photo that you have uploaded on Facebook, this technology is what makes it possible.
Another commendable development is in the field of Medical Science. Deep learning has made great progress in the field of healthcare — especially in the field of Ophthalmology and Digital Pathology. By developing a state of the art computer vision system, Google was able to develop computer-aided diagnostic screening that could detect certain medical conditions that would otherwise have required a diagnosis from an expert. Even with significant expertise in the area, considering the amount of tedious work one has to go through, chances are that the diagnosis vary from person to person. Also, in some cases, the condition might be too dormant to be detected by a medical practitioner. Such an occasion won’t arise here because the computer is designed to detect complex patterns that may not be visible to a human observer.    
TensorFlow is required for deep learning to efficiently use image recognition. The main advantage of using TensorFlow is that it helps to identify and categorize arbitrary objects within a larger image. This is also used for the purpose of identifying shapes for modelling purposes. 

  • Time series: The most common application of time series is in recommendations. If you are someone using Facebook, YouTube, Netflix, or any other entertainment platform, then you may be familiar with this concept. For those who do not know, it is a list of videos or articles that the service provider believes suits you best. TensorFlow time-series algorithms are what they use to derive meaningful statistics from your history.

Another example is how PayPal uses the TensorFlow framework to detect fraud and offer secure transactions to its customers. PayPal has successfully been able to identify complex fraud patterns and have increased their fraud decline accuracy with the help of TensorFlow. The increased precision in identification has enabled the company to offer an enhanced experience to its customers. 

A Way Forward

With the help of TensorFlow, Machine Learning has already surpassed the heights that we once thought to be unattainable. There is hardly a domain in our life where a technology built with the help of this framework has no impact.
From healthcare to the entertainment industry, the applications of TensorFlow have widened the scope of artificial intelligence in every direction in order to enhance our experiences. Since TensorFlow is an open-source software library, it is just a matter of time for new and innovative use cases to catch the headlines.
