You might have heard of Machine Learning, and you may even know the basics, but somehow you find it hard to turn what you understood in those long Machine Learning lectures into code. With Microsoft Azure Machine Learning Studio, you can apply the concepts you have learnt without writing a single line of code. It provides an easy drag-and-drop interface and comes with a lot of functionality. In this tutorial, we will learn how to use Microsoft Azure Machine Learning Studio by solving a simple Machine Learning problem. Here are the contents of this tutorial:
- What is Microsoft Azure?
- What is Azure Machine Learning Studio?
- Working of Azure Machine Learning Studio
- Creating your first Experiment in ML Studio
What is Microsoft Azure?
Microsoft Azure is Microsoft's cloud platform. It provides cloud computing services covering computation, storage, security and many other domains. Microsoft is one of the global leaders when it comes to cloud solutions and global cloud infrastructure. Microsoft Azure provides services in 60+ global regions and serves 140 countries. It offers services in the form of Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). It even provides serverless computing: you just deploy your code, and all the backend activities are managed by Microsoft Azure.
It integrates easily with other Microsoft products, which makes it very popular among people who already use them. The platform is now 10 years old and has grown to compete with the best of the best.
What is Azure Machine Learning Studio?
Microsoft Azure Machine Learning Studio, also known as MAML, is a web-based integrated development environment (IDE) that features collaborative drag-and-drop tools for building Machine Learning models. Because it is closely knit with the rest of Azure's cloud services, it simplifies the development and deployment of Machine Learning models and services.
Coding Machine Learning algorithms in languages like Python or R can be overwhelming for someone who has just started with Machine Learning. MAML encapsulates the most commonly used Machine Learning algorithms as modules and lets you build learning models graphically from your dataset. That makes it a good starting point for beginners. At the same time, MAML also gives advanced users real power. For example, they can easily fine-tune the hyperparameters of an algorithm and test the effect on the accuracy of the model.
After successfully testing and evaluating your models, you can easily deploy a model as a web service so that custom apps or BI tools, such as Excel, can use it. Azure also provides a service known as Azure Machine Learning Service, which supports embedding your Python or R scripts within your models. This gives advanced users the opportunity to write custom Machine Learning algorithms.
Working of Azure Machine Learning Studio
By now we know that Azure Machine Learning Studio lets us use drag-and-drop operations instead of writing code manually, which eases our work considerably. But we still have to figure out which of the algorithms available in Azure Machine Learning Studio are suitable for the problem we are dealing with.
In most problems, we want to do one of three things: predict a particular number, predict a class (group), or find abnormalities in our data. Here are the algorithms for each of these tasks.
Problems in which we have to predict a particular number are called Regression problems. Examples include stock price prediction, sales prediction, house price prediction and determining equipment servicing priorities. As you can guess, all these predictions are real numbers. The algorithm options for these problems are:
1. Ordinal regression
2. Poisson regression
3. Fast Forest Quantile regression
4. Linear regression
5. Bayesian regression
6. Neural network regression
7. Decision forest regression
8. Boosted decision tree regression
If our problem consists of predicting a particular class or group, it is termed classification. For example, given a fruit, we might classify whether it is an apple or not. This sort of problem is called a two-class classification problem. Here are the algorithms that are suitable for these kinds of problems:
1. Two-class SVM
2. Two-class averaged perceptron
3. Two-class Bayes point machine
4. Two-class decision forest
5. Two-class logistic regression
6. Two-class boosted decision tree
7. Two-class decision jungle
8. Two-class locally deep SVM
9. Two-class neural network
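To make the two-class idea concrete, here is a minimal from-scratch sketch of two-class logistic regression, one of the algorithms listed above, trained with batch gradient descent. It is not how Studio implements its module. The fruit features, learning rate, epoch count and 0.5 threshold are all illustrative assumptions.

```python
import math

# Hypothetical toy data: each fruit is [redness, roundness]; label 1 = apple, 0 = not an apple.
X_TRAIN = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.95], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
Y_TRAIN = [1, 1, 1, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(w: list[float], b: float, x: list[float]) -> float:
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            # For log loss, d(loss)/d(score) is (predicted probability - label).
            err = sigmoid(score(w, b, x)) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the average gradient over all training rows.
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict(w, b, x, threshold=0.5) -> int:
    """Return 1 (apple) if P(apple | x) >= threshold, else 0."""
    return int(sigmoid(score(w, b, x)) >= threshold)


w, b = train_logistic(X_TRAIN, Y_TRAIN)
print(predict(w, b, [0.95, 0.9]), predict(w, b, [0.15, 0.15]))  # 1 0
```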
A classification problem can also have more than two classes. For example, given a fruit, we might classify it as an apple, an orange or a banana. The two-class algorithms above cannot be used directly here, so we need multiclass algorithms instead:
1. Multiclass logistic regression
2. Multiclass neural network
3. Multiclass decision forest
4. Multiclass decision jungle
5. One-vs-all multiclass
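One-vs-all works differently from the other entries in this list. It is a wrapper that turns any two-class algorithm into a multiclass one. It trains one binary model per class (that class versus everything else) and predicts the class whose model gives the highest score. The sketch below uses a from-scratch averaged perceptron as the two-class learner and made-up fruit features. It illustrates the idea only and is not Studio's implementation.

```python
# Hypothetical toy data: two made-up numeric features per fruit.
X = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.0],   # apples
     [0.0, 1.0], [0.1, 0.9], [0.0, 0.8],   # oranges
     [0.0, 0.0], [0.1, 0.1], [0.2, 0.0]]   # bananas
LABELS = ["apple"] * 3 + ["orange"] * 3 + ["banana"] * 3


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_averaged_perceptron(xs, ys, epochs=20):
    """Two-class averaged perceptron; ys are 1 (positive) or 0 (negative)."""
    w, b = [0.0] * len(xs[0]), 0.0
    w_sum, b_sum, steps = [0.0] * len(xs[0]), 0.0, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t = 1 if y == 1 else -1
            if t * (dot(w, x) + b) <= 0:  # misclassified or on the boundary: update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
            # Accumulate after every example so the returned weights are an average.
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def train_one_vs_all(xs, labels):
    """Train one binary model per class: this class (1) versus all others (0)."""
    return {
        c: train_averaged_perceptron(xs, [1 if lab == c else 0 for lab in labels])
        for c in sorted(set(labels))
    }


def predict_one_vs_all(models, x):
    """Pick the class whose binary model produces the highest score."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


models = train_one_vs_all(X, LABELS)
print(predict_one_vs_all(models, [0.95, 0.05]))  # apple
```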
To find abnormalities in the data, a task known as anomaly detection, we can use the following algorithms:
1. One-class SVM
2. PCA-based anomaly detection
Fraud detection and spotting abnormal equipment readings are typical examples of anomaly detection.
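The intuition behind PCA-based anomaly detection fits in a few lines. First, learn the main direction of variation (the first principal component) from normal data. Then flag new points that are poorly reconstructed from that direction. The sketch below is limited to two features so that the principal component has a closed form. The equipment readings and the "3x the worst normal error" threshold are illustrative assumptions. Studio's PCA-Based Anomaly Detection module is more elaborate than this.

```python
import math

# Hypothetical "normal" equipment readings: [temperature, pressure], roughly on a line.
NORMAL = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.9], [5.0, 10.1]]


def fit_first_component(points):
    """Return the mean and the unit first principal component of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, then its eigenvector.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    vx, vy = (lam - c, b) if b != 0 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm)


def reconstruction_error(point, mean, v):
    """Squared distance between a point and its projection onto the component."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    along = dx * v[0] + dy * v[1]
    return dx * dx + dy * dy - along * along


mean, v = fit_first_component(NORMAL)
# Assumed rule: anything 3x worse than the worst normal reading is an anomaly.
threshold = 3 * max(reconstruction_error(p, mean, v) for p in NORMAL)


def is_anomaly(point):
    return reconstruction_error(point, mean, v) > threshold


print(is_anomaly([2.5, 5.0]), is_anomaly([3.0, 1.0]))  # False True
```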
Creating your first Experiment in ML Studio
You should be familiar with the general steps of Machine Learning. If you are not, I would highly suggest that you go through this Machine Learning Tutorial first.
In this experiment, we shall train a model that can predict the price of an automobile from its various features, such as body style, rpm, fuel type and many more. Here are the steps we are going to follow:
- Get the data
- Prepare the data
- Select features
- Choose and apply an algorithm
- Predict new automobile prices
You can click here to be redirected to Azure ML Studio (classic).
Get the data
Like automobiles, Machine Learning also needs fuel to work, and we often say that data is the fuel for Machine Learning. So the first thing we need is data. Luckily, several sample datasets are included with ML Studio. We can either use these datasets or import data from an external source. For this example, we'll use the sample dataset Automobile price data (Raw). This dataset contains records giving the price of various automobiles along with their features, such as make, model and technical specifications.
Here’s how to get the dataset into your experiment.
- Create a new experiment by clicking +NEW at the bottom of the Machine Learning Studio (classic) window, then click Blank Experiment.
- Next, name the experiment according to your preference. This should be the window that appears in front of you:
- On the left side, you can see the various datasets. Search for the one named Automobile price data (Raw) and drag it to the experiment canvas.
After dragging the dataset onto the canvas, you can visualize it. Just right-click it and click Visualize:
Here is the visualization that should appear in front of you:
Prepare the data
Now that we have the data, we need to preprocess it before using it further. This is required because datasets often have missing values in some columns of various rows. Sometimes entries may also contain abnormal values, such as a string in a numeric column or a value tending to infinity. There can be other reasons to preprocess and clean the data as well, and this stage often takes the most time in a Machine Learning project. Such values need to be removed or substituted so that the model can analyze the data correctly.
In this example, we will do two preprocessing steps. The first removes the column named normalized-losses, because it has 41 missing values, which is a lot for a dataset of this size. The second removes any rows that still have missing values.
First, we add a module that drops the normalized-losses column completely. You can drop other columns as well if you wish. Then we add another module that removes all rows with missing data.
Note that datasets and modules have input and output ports, represented by small circles. Input ports are at the top and output ports are at the bottom. Usually, we connect the output port of one module to the input port of another, as you'll see below. You can also see what the data looks like at any point in the data flow by clicking the output port of a dataset or module. Here is what we are going to do in this step:
- In the search bar, type Select Columns in Dataset and drag the module to the experiment canvas. This module allows us to select the columns of data that we want to include in or exclude from the model.
- Connect the output port of the Automobile price data (Raw) dataset to the input port of Select Columns in Dataset as shown below. Then click the option marked with a red rectangle below to choose which column to remove.
- After clicking it, you have various ways to include or exclude columns. In this case, I would suggest selecting the option “with rules”. Begin with all columns and then exclude the column you want to remove, as shown below:
- Next, we remove the rows that contain any missing value. To do this, drag the Clean Missing Data module to the experiment canvas as we did in the previous step and connect it to the Select Columns in Dataset module. Then open its properties and select Remove entire row under Cleaning mode. You can also handle missing values in other ways, for example by substituting them with the mean of the column, as shown below:
- Now we run the experiment by clicking RUN at the bottom of the page. When the experiment has finished running, all the modules have a green checkmark to indicate that they finished successfully.
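Here is a rough plain-Python equivalent of what these two modules do to the data. It first excludes normalized-losses, then applies Remove entire row, and also shows the mean-substitution alternative. The handful of rows are made up, and only a few columns are kept. Missing values are assumed to appear as empty fields. This is an illustration, not how Studio processes the dataset internally.

```python
import csv
import io

# Hypothetical toy rows; empty fields stand for missing values (an assumed export format).
RAW_CSV = """normalized-losses,make,horsepower,price
,alfa-romero,111,13495
164,audi,102,13950
,audi,115,
158,bmw,,16430
192,bmw,101,16925
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_column(rows, column):
    """Exclude one column, similar in spirit to Select Columns in Dataset."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def remove_rows_with_missing(rows):
    """'Remove entire row': keep only rows in which no field is empty."""
    return [row for row in rows if all(v != "" for v in row.values())]


def substitute_mean(rows, column):
    """Alternative: replace missing values in a numeric column with its mean."""
    present = [float(r[column]) for r in rows if r[column] != ""]
    mean = sum(present) / len(present)
    return [{**r, column: r[column] if r[column] != "" else str(mean)} for r in rows]


rows = drop_column(load_rows(RAW_CSV), "normalized-losses")
cleaned = remove_rows_with_missing(rows)
print(len(cleaned))  # 3
```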
Select features
Not all features in our dataset help us predict the outcome accurately. Some features may have no effect on the price, so removing them is a good step. So how do we find the important features? Finding a good set of features for a predictive model requires experimentation and domain knowledge about the problem you are solving. Also, two features can be highly correlated. For example, in this dataset city-mpg and highway-mpg are closely related, so we can keep just one of them without heavily impacting the predictions.
To do this step, we again use the Select Columns in Dataset module, as we did in the previous step, and select just the columns we need.
It is important to know that sometimes we can create features of our own by using the existing features but we are not going to do that here.
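One simple way to spot redundant pairs such as city-mpg and highway-mpg is to compute the Pearson correlation between feature columns and flag pairs above a cutoff. The values below are made up, and the 0.9 cutoff is an arbitrary assumption. In the tutorial itself, the columns are chosen by hand in Select Columns in Dataset.

```python
import math
from itertools import combinations

# Hypothetical feature columns (made-up values for six cars).
FEATURES = {
    "city-mpg": [21, 19, 24, 18, 30, 26],
    "highway-mpg": [27, 25, 30, 22, 37, 32],
    "peak-rpm": [5000, 5500, 4800, 5200, 5100, 4900],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def highly_correlated_pairs(features, cutoff=0.9):
    """Return feature pairs whose absolute correlation exceeds the cutoff."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) > cutoff
    ]


print(highly_correlated_pairs(FEATURES))  # [('city-mpg', 'highway-mpg')]
```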
Choose and apply an algorithm
So what kind of problem is this? Is this a Regression problem or a Classification problem? This is clearly a Regression problem as we have to predict the price of an automobile which is a real number. In the previous section, we discussed the algorithms that can be applied to Regression problems, so here we will use Linear Regression.
We will also divide the data into two parts: a training part and a testing part. The model is trained only on the training part, and then we test it on the testing part. For this, we use a module called Split Data, in which we set the ratio for dividing the data. Here, 70 per cent of the data goes to training and 30 per cent to testing.
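Conceptually, a randomized 70/30 split amounts to shuffling the rows and cutting the list at 70 per cent. The sketch below uses a fixed seed so that the split is reproducible. The function name and its defaults are illustrative, not the Split Data module's actual parameters.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(range(100))
print(len(train), len(test))  # 70 30
```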
Now that we have split the data into training and testing parts, drag and drop two more modules, Linear Regression and Train Model, as we did with the other modules. Next, click the Train Model module to specify our label (the value we have to predict), as shown below:
Now click on the run button below and we should see green ticks on all modules as below:
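Under the hood, linear regression fits a straight line (or hyperplane) that minimizes the squared error on the training rows. The sketch below shows the closed-form ordinary least squares fit for a single feature (horsepower) so that the formula stays readable. The real experiment uses several features. The prices here are made up so that the expected slope and intercept are easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(slope, intercept, horsepower):
    """Score a new row with the fitted line."""
    return slope * horsepower + intercept


# Hypothetical training rows: price = 100 * horsepower + 2000 exactly.
horsepower = [70, 90, 110, 150]
price = [9000, 11000, 13000, 17000]
slope, intercept = fit_simple_linear_regression(horsepower, price)
print(slope, intercept)  # 100.0 2000.0
```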
Predict new automobile prices
So far, we have trained the model using 70 per cent of the data. Now we can see how good our model is by testing it on the other 30 per cent. To do this, follow the steps below:
- Search for Score Model in the search bar and drag it onto the canvas. Connect the output of Train Model to the left input port of Score Model. Then connect the test data output of the Split Data module to the right input port of Score Model, as shown below:
- Run it and view the output as shown below. The output shows the predicted values for price and the known values from the test data.
The result shows the predicted price of each automobile (marked by the red rectangle) alongside its actual price (marked in green).
- Search and drag the Evaluate Model module to the experiment canvas, and connect the output of the Score Model module to the left input of the Evaluate Model. The final experiment should look something like this:
After running the above experiment, click the Evaluate Model module and visualize its output:
The following statistics are shown for our model:
- Mean Absolute Error (MAE): The average of the absolute errors (an error is the difference between the predicted value and the actual value).
- Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.
- Relative Absolute Error: The total absolute error relative to the total absolute difference between the actual values and the average of all actual values.
- Relative Squared Error: The total squared error relative to the total squared difference between the actual values and the average of all actual values.
- Coefficient of Determination: Also known as the R squared value, this is a statistical metric indicating how well a model fits the data.
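Written out with the usual textbook definitions, the statistics are as follows. Here y_i is the actual price of the i-th test row, \hat{y}_i its predicted price, \bar{y} the mean of the actual prices and n the number of test rows. These are the standard forms. This page does not spell out Studio's exact formulas, so treat them as an assumption.

```latex
\begin{aligned}
\text{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
\text{RAE}  &= \frac{\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert}{\sum_{i=1}^{n} \lvert y_i - \bar{y} \rvert} \\
\text{RSE}  &= \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2 &= 1 - \text{RSE}
\end{aligned}
```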
For the error statistics, smaller values are better, because they indicate that the predictions more closely match the actual values. For the Coefficient of Determination, the closer its value is to one, the better the predictions.
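To check your understanding of these statistics, the small function below computes all five from lists of actual and predicted values. It uses the standard textbook definitions, which is an assumption about how Studio computes them. The three prices are made up so that the results can be verified by hand: the absolute errors sum to 7 and the squared errors to 17.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, relative absolute/squared error and R^2."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(y - p) for y, p in zip(actual, predicted))
    sq_err = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Spread of the actual values around their mean (the "naive" baseline).
    abs_dev = sum(abs(y - mean_actual) for y in actual)
    sq_dev = sum((y - mean_actual) ** 2 for y in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "relative_absolute_error": abs_err / abs_dev,
        "relative_squared_error": sq_err / sq_dev,
        "r2": 1 - sq_err / sq_dev,
    }


metrics = regression_metrics([10, 20, 30], [12, 18, 33])
print(metrics)
```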
If you no longer need the resources you created in this article, delete them to avoid incurring any charges. To delete an experiment, select it in the experiments list, and you will see a delete option at the bottom, as shown here:
This brings us to the end of this article where we learned how to use Microsoft Azure Machine Learning Studio and solved a simple Regression problem.