ssis tutorial

What is SSIS?

SQL Server Integration Services is a tool developed by Microsoft, and it is available in Microsoft SQL Server database software. It is a tool for the ETL (Extract, Transform and Load) process which is extremely useful in Data warehousing applications and to perform a wide range of data integration applications.

It performs operations like reshaping, cleaning, loading data, supporting the necessity, performing different transformations on the info and to define a workflow to perform some tasks for the day to day activities. SSIS underpins various information types like Excel accounting pages, comma-isolated qualities (CSV) records, text documents, directory services and different sources. 

Prior to the application of SSIS, Data Transformation Services (DTS) in SQL Server 2000 was utilized to perform comparative assignments; however, it had limited functionality. With the introduction of SSIS in SQL Server 2005, many new features are available to use, such as graphical tools, windows wizard workflow function for sending email messages and FTP operations.

To start and to build up SSIS packages, it is mandatory to install SQL Server Business Intelligence Development Studio, which will be accessible as a client tool when installing SQL Server Management Studio (SSMS). 

What is data integration?

Every corporate computing environment consists of hundreds or maybe thousands of different and changing information source systems built, purchased, and used. Information data from these multiple sources need to be stored and processed for reporting and analysis or converted to business valued data and make system compatible format to another if old systems are replaced, and new systems are acquired. An effective way of managing the information passing between two systems may be a major challenge and a priority for each information technology organization. 

Most of the available data management tools focus on data stored in structures like databases and files, and a smaller focus is on the info flowing between and around the data structures. Management of the information interfaces from different sources in organizations is a big challenge for business and knowledge technology (IT) management. As we know, technology is evolving, and the number of systems is getting added to an organization’s portfolio day by day. Hence the quantity and complexity of the interfaces between the systems are growing rapidly, making management of those interfaces overwhelming. The number of interfaces between applications and systems may become exponential and impractical. In practice, not every system must interface with every other, but there could also be multiple interfaces between systems for various sorts of data or needs. So for a corporation with ten applications, there could also be something like 100 interfaces. So, a portfolio of 100 applications may provide half 1,0000 interfaces to manage.

Data Integration is one of the major keys in Data Science and Data mining. Data integration (DI) is the practice of consolidating data from multiple sources into a single dataset with the ultimate goal to provide users with consistent access and delivery of data across the spectrum of subjects and structure types and to satisfy the knowledge needs of all applications and business processes.

Data integration solution architects develop data integration software applications that facilitate and automate data integration processes for connecting and driving data from source systems to the Destination systems. This can be done through a variety of data integration techniques, including:

Extract Transform and Load: The copies of datasets from various sources are gathered, centralized, and loaded into a knowledge warehouse or database.

Extract, Load and Transform: The data is loaded as-is into an enormous information system and transformed later for analytics uses

Change Data Capture: This identifies data changes in databases in real-time and applies them to a knowledge warehouse or other repositories.

Data Replication: The data in one database is replicated to other databases to stay the knowledge synchronized to operational uses and for backup

Data Virtualization: Data from different systems are virtually combined to form a unified view rather than loading data into a replacement repository

Streaming Data Integration: It is a real-time data integration process in which different streams of data sources are continuously integrated and fed into analytics systems and data stores

Why SSIS?

All data that businesses receive from outside of their systems are not compatible with storing, modifying, or applying some process over it for business decisions. Also, most of the real-world data is bulk data, and it can be in excel file, XML file, CSV file or text file so to store/insert these bulk data efficiently, we require some software or application which can fulfil all of these above needs.

SSIS can handle most data situations from different databases or flat files. SSIS has an enormous set of data-transformation tasks to perform a good range of data-cleaning, converting, and applying available functions. You can change values or get descriptions from code values using exact or fuzzy lookups present within the SSIS package. Identifying records duplicated by using SSIS grouping transformations helps to successfully remove them before loading the destination system.

How does SSIS work?

Before we start to understand the working of SSIS, it’s very important to know several terms you’ll often hear employed by SSIS professionals. Here are a few of the key terms and phrases you’ll get to know to achieve success with SSIS.

Package: the package is the heart of the SSIS code, and it’s the area on which you’ll spend most of your development time. An SSIS package not just single it could also be a set of multiple operations that are invoked together.

Task: A task may be a single operation within a package. There are dozens of various sorts of tasks available in SSIS.

Component: A component is a component of a knowledge pipeline, representing either a source from which data is retrieved, a destination to which data is written, or a change that manipulates or reshapes the data.

Execution: This is often the act of invoking, or running, the logic in an SSIS package. Packages are often executed from within the SQL Server Data Tools (SSDT) development environment or directly on a properly configured instance of SQL Server.

Deployment: A deployment occurs when the fully developed SSIS project is pushed from the event workstation to an instance of SQL Server, where it can then be executed either manually or through a scheduling tool like SQL Server Agent. Deployment is typically more complex than copying code from one machine to another, although SQL Server does an honest job of hiding that complexity for many deployments.

Project: actual source code in SSIS is arranged into functional units called projects. A project can contain one package or more than one package. When deploying SSIS code, the whole project will be deployed to the server to perform the required action.

Solution: A solution is a logical way of grouping related projects together.

The SSIS runtime engine: this is often the logic that permits a package to run. When we’re working with SSIS packages in SSDT, the packages will be executed using the SSIS runtime on your visual studio development machine, and After the code is deployed to SQL Server, any execution runs there on the server will use the server’s SSIS runtime.

SSIS has two major function i.e. Data integration and workflow. Each activity is done by the SSIS package. SSIS package consist of three components: 

Data Sources: Business data coming from different sources such as data from Excel or CSV file are examples of data sources. These all data is handled and supplied to ETL in SSIS.

ETL: ETL Stands for Extract, Transform, and Load. It is the most important Process in the SSIS tool, which is responsible for extracting out data from different -different sources and transforming or modifying it into useful data, also storing these data in the data warehouse for processing application and business use.   

Data Warehouse: it is a destination in SSIS that stores all the data coming from different data sources after the ETL process. The data warehouse is used for collecting and processing data to provide meaningful business insights and data analysis.

Requirements for SQL Server Integration Services:

The following two application needs to install to run SQL Server Integration Services:

  • Install the SQL Server
  • Install the SQL Server Data Tools

Use below steps to install the SQL Server Data tools:

Step 1: Click on the link https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-ver15 to download the SQL Server data tools.

Step 2: on click of the above link, below screen appears:

In the above screen, we can see all the latest versions of SSDT available. based on system compatibility and visual studio version select the version to install.

Step 3: Launch Visual Studio installer in your machine

Step 4: Select SQL Server Data Tools under Data storage and processing in the list of workloads and install.

Step 5: for Integration services you have to install appropriate extension from visual studio Extensions🡪Manage Extensions

Step 6: Click on Next to complete the Installation.

What are the SSIS packages?

The package is a collection of units that is retrieved, executed and saved. An SSIS package is a well-ordered collection of control flow, data flow, event handlers, variables, parameters, connections, and configurations that we build using either the graphical design tools that SQL Server Integration Services provides, or it is made programmatically.

Mainly in SSIS Control Flow element and Data Flow Element is a vital part to create any package.

Control flow Element: – It consists of one or more tasks and containers that run when packages are executed. To control the order or sequence of tasks to be executed next in the package, we use precedence constraints to connect the task and containers in the package.

  • Precedence Constraints: – It connects executables, containers and tasks within a package and specifies the condition which determines when the executable will run. An executable could be a For Loop, Foreach Loop, an easy task or an event handler.

Precedence constraint has two parts first is Precedence executable, and second is constrained executable. At the time of execution, precedence executable executes first and then constrained executable, and therefore the result of precedence executable determines whether the constrained executable is going to be executed or not.

Data Flow Element: this element of SSIS transforms source data. A data flow Compose of the sources and destinations that extract and load data, the transformations that modify data. SSIS has three different types of data flow component.

  • Sources: it extracts data from data files such as spreadsheet,csv,xml ,text file. Also, sometimes data from table and view of relational databases.
  • Transformations: it modifies and cleans data.
  • Destination: it loads data into a data store or in memory dataset.

SSIS Task:

In the SSIS package, A task might be a unit of labor and that will have different types of tasks to perform different sorts of work. There are different types of tasks, but we will be taking the most common tasks used in SSIS:

Data Flow Task: The task that runs data flows to extract data, apply column level transformations, and cargo data.

Data Preparation Tasks: These tasks do the following processes: download files and data, copy files and directories, run Web methods, apply operations to XML documents, and profile data for cleansing.

Workflow Tasks: it includes tasks that communicate with different processes to run packages, send and receive messages between packages, run programs or batch files,, send e-mail messages, read Windows Management Instrumentation (WMI) data, and await WMI events.

SQL Server Tasks: The tasks that copy, insert, access, copy, delete, and modify SQL Server objects and data.

Scripting Tasks: The tasks that reach package functionality by using scripts.

Analysis Services Tasks: The tasks that make, modify, delete, and process Analysis Services objects.

Maintenance Tasks: The tasks that perform administrative work like shrinking SQL Server databases, backup of database periodically, rebuilding and reorganizing indexes, and running SQL Server Agent jobs.

Custom Tasks: Additionally, you’ll write custom tasks employing a programing language that supports COM, like Visual Basic, or a .NET programing language, like C#. If you would like to access your custom task within the SSIS Designer, you’ll create and register an interface for the task

Example of Data Flow Task:

Step 1: First, we will create an excel sheet in Microsoft Excel for storing our row data. For Illustration purposes I am creating excel sheets for Student data in any institute. Columns are Student_id,Student_Name,Department,Gender,Passing_year

Step 2: Open the visual studio and click on Create a New Project. Type SSIS in search option

Step 3: Now Select Integration Services Project from search result and click on Next Button

Step 4: give some meaning full name to your SSIS project and click on Create

In the above screenshot, we can see it has control flow, Data Flow, Parameter, Event handler and package Explorer. We have discussed the control flow and data flow in the above SSIS package section. 

Step 5: To import the information data, we’d like first to make the database in which we create the table in SQL Server database. As we all know that student data is inside the Excel file and that we want to import this info into the SQL Server database. In order to achieve this, we need to first create the table in SQL Server. Open the SQL Server Management studio.

Step 6: Create the student database and Right-click on the database, and then click on the New Database option

Step 7: Enter the database name in the below screenshot, Here I have given database name as StudentDB 

Step 8: Click on the Ok button.

Step 9: Now, we will create a table in a StudentDB database (Destination). In this we can create tables by graphical method or directly write create table script as usual. Here I am using Create script to get the table in StudentDB

In the table we are adding the same column which is present in our excel sheet.

USE [StudentDB]

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[StudentInfo](

[Student_id] [int] NULL,

[Student_Name] [nchar](10) NULL,

[Department] [nchar](10) NULL,

[Gender] [nchar](10) NULL,

[Passing_Year] [nchar](10) NULL

) ON [PRIMARY]

GO

With this step table structure/schema is ready in SQL database without any data in StudentInfo table.

Step 10: Now, Go back to visual Studio and drag and drop Data Flow Task  inside main window from SSIS Toolbox 

Step 11: Now double-click on the Data Flow Task, the control automatically goes to the Data Flow from the Control Flow, so we can say that Control Flow is a container of Data Flow.

Step 12: Now, we will perform the transformations in Data Flow. We want to extract the data from Excel file that we created earlier, so drag and drop the Excel source from the other sources appearing at the leftmost side of toolbox

From the above screen, we observe that the red cross appears inside the Excel Source, it means that the component is configured with the Excel Source.

Step 13: To configure the component, right-click on the Excel source and click on the Edit option as shown in the below screenshot

Step 14: Fill the details shown in the below screenshot:

Step 15: After entering all the details, the screen appears shown as below:

Step 16: click on the Excel source

The above screenshot shows two arrows, a red and blue arrow. The blue arrow defines the particular data that we receive from the Excel file, and therefore the red arrow denotes the errors. Now we’ve to perform the transformations, and before performing the transformations, we’d like to load the database.

Step 17: To load the database, click on the opposite Destinations then click on the Destination

Step 18: After adding the destination (OLE DB), connect the source(Excel) to the Destination 

Step 19: with the above step, we noticed the Red Cross in OLE DB Destination, which suggests that it’s not configured with a component. Right-click on the OLE DB Destination and then click on the Edit option below screen appear:

Step 20: Click on the New button to add the connection manager. When you click on the New button, the screen appears shown below:

Step 21: Add the Server name and select the database name from the SQL Server

Step 22: Now click on OK

Step 23: Add the table name in which you want to load the data

Step 24:  In the below screen, map the field of input source to the field of the destination source

Step 25: After adding a mapping, the source is connected to the Destination, as shown in the below screenshot

Step 26: Click on the Start button to run the Integration Services Project present above . When you run the project, the data gets automatically loaded to the destination, i.e., SQL Server

In the above screenshot we can see that it is showing as 10 rows because we had 10 row of data in excel sheet and with this step all 10 rows is copied to the Database Table StudentInfo as shown in below screenshot

Example of Execute SQL Task:

In Execute SQL Task, user can execute any of the SQL query or perform require action from SQL Task. for the illustration purpose here we are taking example where we will add one Entry in table through Execute SQL Task (Insert data in SQL table)

Step 1: Follow the same step from Step 2 to Step 5 from previous Section example

Step 2: Drag and drop the Execute SQL Task appearing at the leftmost panel to the main       window.

Step 3: To add the Connection Manager, Right-click and select Edit below screenshot appears

Select Connection as per below screenshot and under SQLStatement  write down the query that we want to execute

Step 4: Click on Ellipses in front of SQL Statement below window open where we need to write the SQL query to be Executed

Step 5: Click on OK button and click on Start to run 

Step 6: Output of this task is shown below

If you found this SSIS Tutorial helpful and wish to learn more such concepts, you can check out our various free online courses available on Great Learning Academy.

0

LEAVE A REPLY

Please enter your comment!
Please enter your name here

6 − 1 =