As you will see in this and upcoming posts, I am a big fan of anything that improves the process of software development. This can range from efficient daily stand-up meetings to customizing your development toolset. In this post, I will describe how we implemented a number of Continuous Integration practices in the enterprise reporting solution we built for Terasen Gas; see the following case study for more detail about the project.
For those of you not familiar with Continuous Integration or its benefits, I highly recommend reading Martin Fowler's article on this subject. As in any project with multiple developers working on the same area of a system, you want to integrate their changes as soon as possible. This provides quick feedback on new features and more importantly any issues; the sooner issues are identified, the easier they are to fix.
When developing the ETL (Extract Transform and Load) portion of a reporting solution, you want to make sure each new feature added to the solution "works" and doesn't break any existing features. Accomplishing this with the SSIS (SQL Server Integration Services) platform can be a little tricky, but can be done to the appropriate level of depth for your project. I will describe the steps we took to ease some of the SSIS development pains on the aforementioned reporting project.
One of the key practices of Continuous Integration is automating the build. In this example, we setup Cruise Control .NET at our client site to run after anything is checked into the source code repository. We went with Cruise Control .NET since it is free and our client did not have TFS licensing. Having the build server setup on at our client site was important since we needed to connect to a number of data sources during the build. MSBuild was used for all of the scripting; all logic was setup to run on developers workstations AND on the build server. Our main goal for the build process on this project was to identify any it works on my machine issues immediately. So we wanted our build process to test the following:
- Drop and recreate our reporting database on the build server
- Validate all SSIS configuration settings on the build server
- Ensure all SSIS source and destination connections are valid against the real data sources and the newly created data warehouse
Now that the stage has been set, lets see how we did it. First, we need to setup an MSBuild ItemGroup that contains all of the SSIS packages. When these packages are executed on the build server, you cannot use the relative path syntax that works in your development environment since the execution of the MSBuild script occurs in a different container. You can easily get around this issue with the following script:
<!-- Sets the EtlReportingPackage property ONLY if the CCNetWorkingDirectory is NOT empty (this will be part of the CCNET build) --> <ItemGroup Condition="'$(CCNetWorkingDirectory)' != ''"> <EtlReportingPackage Include="$(CCNetWorkingDirectory)\src\Integration\*.dtsx" /> </ItemGroup> <!-- Sets the EtlReportingPackage property ONLY if the CCNetWorkingDirectory IS empty (this will happen when you manually execute an msbuild command) --> <ItemGroup Condition="'$(CCNetWorkingDirectory)' == ''"> <EtlReportingPackage Include="Integration\*.dtsx" /> </ItemGroup>
Now that you have your ETL packages ItemGroup, we can get to the heart of the example — the process of rebuilding your databases and validating your SSIS packages. In order to keep things simple, we wanted to leverage the SQL scripts we were building to create the reporting database along with its various objects. To ensure the data warehouse scripts were inline with our SSIS packages, the following steps were executed on every check-in: drop the database, create the database (with the appropriate file paths and sizes), create the database objects, and verify all of the source and destination sources in the SSIS packages are valid. This was accomplished with the following MSBuild script:
<PropertyGroup> <Osql>osql.exe</Osql> <DtExec>dtexec.exe</DtExec> <DtExecValidateArgs>/Va /Rep E /F</DtExecValidateArgs> ... </PropertyGroup> <Target Name="DropReportingDb"> <Exec ContinueOnError="false" Command="osql -Q "DROP DATABASE [$(ReportingDbName)]" -E -S $(ServerName) -n -b" /> </Target> <Target Name="CreateReportingDb"> <MakeDir Directories="$(ReportingDbPath)" /> <Exec ContinueOnError="false" Command="osql -E -S $(ServerName) -n -b -Q "CREATE DATABASE [$(ReportingDbName)] ON PRIMARY ( NAME = N'$(ReportingDbName)_Data', FILENAME = N'$(ReportingDbPath)\$(ReportingDbName)_Data.MDF', SIZE = $(ReportingDbSize) , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ) LOG ON ( NAME = N'$(ReportingDbName)_Log', FILENAME = N'$(ReportingDbPath)\$(ReportingDbName)_Log.LDF', SIZE = $(ReportingDbLogSize) , MAXSIZE = 2048GB , FILEGROWTH = 10% )"" /> <Exec ContinueOnError="false" Command="osql -E -S $(ServerName) -n -b -Q "ALTER DATABASE [$(ReportingDbName)] SET RECOVERY SIMPLE "" /> </Target> <Target Name="CreateReportingDbObjects"> <Exec ContinueOnError="false" Command="osql -i "SQL Scripts\CreateWarehouse.sql" -E -S $(ServerName) -d $(ReportingDbName) -n -b" /> </Target>Drops and recreates all of the databases, then validates all of the ssis packages using either the default or specified package configuration file
<Target Name="ValidatePackages" DependsOnTargets="DropReportingDb;CreateReportingDb;CreateReportingDbObjects"> <Message Condition="'$(PackageConfigFile)' == ''" Text="We are about to validate the ETL reporting packages" /> <Exec Condition="'$(PackageConfigFile)' == ''" ContinueOnError="false" Command="$(DtExec) $(DtExecValidateArgs) %(EtlReportingPackage.Identity)" /> <Message Condition="'$(PackageConfigFile)' != ''" Text="We are about to validate the ETL reporting packages with the package config file $(PackageConfigFile)" /> <Exec Condition="'$(PackageConfigFile)' != ''" ContinueOnError="false" Command="$(DtExec) $(DtExecValidateArgs) %(EtlReportingPackage.Identity) /Conf $(PackageConfigFile)" /> </Target>
There are a couple of things you should note in the example above. First, the /Va switch (defined on line 4 but used in lines 46 and 54) tells the SSIS engine to only validate your package, not execute it. Another important technique used in this example is the ability to use a different SSIS configuration file during the package validation — this is done by using the /Conf switch (line 54). We use the indirect XML configuration setting for all of our SSIS packages, but the /Conf switch gave us the flexibility to override some of our typical configuration settings during the build process. Hopefully these tips will help you introduce a few Continuous Integration practices into your reporting solutions.