SQL Server Integration Services: SQL Server 2005 s New ETL Platform

Microsoft SQL Server 2005 provides a completely new enterprise extraction, transformation, and loading ETL- platform called SQL Server Integration Services SSIS- that ships with the features, tools, and functionality to build both classic and innovative kinds of ETL-based applications.

Back in the days when Microsoft SQL Server 6.5 was still a hot product, the bulk copy program was the only way to back up or export databases from SQL Server to other data sources, such as other database servers or text files. However, in subsequent SQL Server versions, Microsoft added a large number of features that were aimed at improving the productivity of SQL Server developers and DBAs. One such feature is DTS Data Transformation Services-. By creating DTS packages, you can combine several tasks into one process and use any programming language that supports automation such as Visual Basic.NET, Visual C#, or Managed C++- to execute these packages and monitor their progress for errors. With the upcoming SQL Server 2005 release, Microsoft has raised the bar by introducing SSIS, the brand new ETL tool. Similar to DTS, it provides functions for moving data from one place to another and manipulating that data at run time. However, SSIS has been completely redesigned from scratch to be an enterprise ETL platform. SSIS provides the breadth of features ?and very high-level performance? necessary to build enterprise-class ETL applications. SSIS is fully programmable, embeddable, and extensible, which makes it an ideal ETL platform. It has a great development environment hosted in a Visual Studio shell with cool capabilities for building workflows and pipelines through a rich set of pre-built or custom components. SQL Server 2005 ships with the following development and management environments for designing and managing packages: The SQL Server Management Studio is an environment for managing the storage and execution of deployed packages. It has special features for doing this, including integration with the DTS Service, and the ability to enumerate packages on remote servers. But it is not a design environment. The Business Intelligence Design Studio is an environment for designing packages, organizing them in Solutions and Projects, debugging them, and managing source and version control for multi-user projects. These two environments offer powerful facilities for deploying, debugging, and monitoring deployed packages. In addition, SQL Server 2005 provides rich workflow capabilities that you can use for performing sophisticated data manipulations. There are too many SSIS features to cover in a single article, so this piece highlights some of the important ones and then moves on to creating a simple package with Business Intelligence Development Studio. The following are the most notable features: You can use SSIS to transfer millions of rows of data to and from heterogeneous data sources, but SSIS functionality doesn t stop there. The tool leverages the end-to-end BI suite by offering complete data integration, movement, and shaping, which means that SSIS provides data cleansing, extensibility, and interoperability. SSIS comes with a lot of pre-built data-cleansing functionality, including completely integrated functions, such as fuzzy matching and fuzzy grouping, that use algorithms to match or group disparate data to a configurable degree of accuracy. SSIS offers broad extensibility points for third-party component vendors, meaning that if functionality is not available out of the box, SSIS lets you build your own components or add a third-party component to solve your problem. SSIS can load data directly into Analysis Services cubes, and it also offers robust data-mining features for including scalable data-mining model creation, training, and predictions. SSIS is seamlessly integrated with SQL Server Reporting Services integration, which lets you treat an SSIS package as the data source for reporting. SSIS also has greatly improved performance and scalability that allow you to host complex, high-volume ETL applications on lightweight servers, enabling you to scale down. SSIS can also help reduce ETL data staging areas and help minimize performance costs associated with data staging disk I/O and serial processing-. This is made possible by the ability to perform complex data transformations, data cleansing, and high-volume lookups ?all inline from source to destination. SSIS also provides a new feature, the Slowly Changing Dimension SCD- wizard. Through the SCD interface, you can rapidly generate all the steps and required code to add unique handling of history to multiple attributes in a given dimension. The development environment for SSIS, known as Business Intelligence Development Studio, is hosted in Visual Studio, enabling scripting and other programming tasks that take advantage of that enterprise development environment. SSIS now fully supports the Microsoft .NET Framework, allowing developers to program SSIS in their choice of .NET-compliant languages, as well as native code. The Data Transformation run-time engine is exposed both as a native COM object model and as an entirely managed object model. Although the Data Transformation engine is written in native code, it is available though a signed Primary Interop Assembly PIA- that enables full, managed access to it. Source and full article: www.developer.coma>