Building a report based on an Analysis Services cube using Reporting Services 2014

Written by Luis Miguel – Originally Published on Jan. 25, 2015

I built a cube based on the data in my fictitious auto sales data mart. The cube is structured as a star schema, with a fact table called fact_sales and three dimensions: dim_buyer, dim_model, and dim_location. I want to build a report that shows total sales by make per region. Make is an attribute of dim_model and Region is an attribute of dim_location. For this report I will not use any attributes of dim_buyer; if I needed to see sales data by Gender or Income Level, I would use those attributes of dim_buyer in my SSRS project.
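For context, the report's result set is equivalent to a relational aggregation over the star schema. Here is a minimal T-SQL sketch of that query, assuming the fact table carries dim_model_key and dim_location_key foreign keys (the actual key names in the data mart may differ):

SELECT m.make,
       l.region,
       COUNT(*) AS total_sales
FROM   fact_sales f
       JOIN dim_model m ON f.dim_model_key = m.dim_model_key
       JOIN dim_location l ON f.dim_location_key = l.dim_location_key
GROUP  BY m.make, l.region
ORDER  BY m.make, total_sales DESC

The "Fact Sales Count" measure used in the report corresponds to this row count, aggregated by the cube.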

The automobile sales cube

Step-by-Step Instructions

Open SQL Server Data Tools for Visual Studio 2013 and add a new project, then add a new report to the project.

A new Reporting Services project with a new report

Add a Microsoft SQL Server Analysis Services data source to the report. Using the Edit button of the new data source wizard, select the server and cube where the source data resides.

new data source

Add a Dataset to the report; it will be based on the Data Source you added in the previous step. Click Query Designer and drag the measure and dimension attributes into the large empty area at the bottom right of the dataset designer. In this case I’m using the measure “Fact Sales Count” and the Make and Region dimension attributes.

Setting up a new dataset

The new dataset query designer

Add a table to the body of the report

A new table

Drag the fields from the dataset into the Data area of the table

Adding fields from the dataset to the new table

Apply grouping by Make to the table

Add group - Parent group

Group by Make

When you apply grouping to a tablix column, a new (grouped) column is added but the existing column is not removed. The report now has two Make columns, one with grouping applied to it and another one without grouping. Remove the extra column left in the table without grouping.

Deleting the extra column left after the parent group was added

Format the column names: rename Fact Sales Count to Total Sales, and make the font bold and centered

Improving the appearance of the report

Click the Preview tab to see how the report looks so far

Previewing the report

Sort the data by Total Sales so that the Regions are ordered by sales with the largest quantities on top

Open the Tablix properties

Add sorting by fact_sales_count

Set the Target Server URL for your project

From the menu select: Project - Properties

Deploy the project

From the menu select: Build - Deploy

View the report

Fire up a web browser and enjoy the report

Dynamic Flat File Source – SQL Server Integration Services 2012

Written by Luis Miguel – Originally Published on Jan. 13, 2015

Requirements: Import/load data from a CSV file into SQL Server on a daily basis. The CSV file has a file name pattern that consists of the name of the target table and a date stamp in the format yyyymmdd, for example "product_20150111.csv".
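As a quick illustration of the date stamp, T-SQL's CONVERT style 112 returns the date as yyyymmdd, which is how the package below builds the file name in an Execute SQL Task:

-- On January 11, 2015 this returns 'c:\data\product_20150111.csv'
SELECT 'c:\data\product_' + CONVERT(VARCHAR(10), GETDATE(), 112) + '.csv'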

The CSV file

product_yyyymmdd.csv

Step by step instructions

1 – Create an OLE DB connection to the destination table

OLE DB connection

2 – Add a string variable that will hold the Flat File Source name

String variable to build the dynamic file name

3 – Create a Flat File Connection to the source CSV file

Flat file connection

3a – Select the Flat File Connection to retrieve its properties
3b – Click the ellipsis button next to Expressions
3c – Set the property Connection String to the expression @[User::product_yyyymmdd]

Set the Connection String property to an expression

4 – Add an Execute SQL Task that will build the Flat File Source name

4a – Set its properties as follows
4ai – General – SQL Statement: SELECT 'c:\data\product_' + CONVERT(VARCHAR(10), GETDATE(), 112) + '.csv'
4aii – General – Result Set – Single Row
4aiii – Result Set – Result Name: 0 – Variable name: User::product_yyyymmdd

Execute SQL task

5 – Add a Data Flow Task

Add a Data Flow task

5a – Add a Flat File Source to the data flow task and point its connection manager to the flat file source created previously

Flat File source

6 – Add an OLEDB Destination

OLE DB destination

The Data Flow task

A successful execution

Package execution result

The table after the data from the CSV file was imported

SELECT * FROM products

Using Surrogate Keys to load the Data Warehouse Dimensions

Written by Luis Miguel – Originally Published on Sept. 14, 2014

One of the most practical reasons for the use of surrogate keys when loading dimensions in a data warehouse is the need to aggregate data from disparate systems.

Implementing a data warehouse works well because it brings together data from two or more systems that exist independently of each other but need to be analyzed together.

Very frequently, companies acquire other businesses or are a conglomerate of small companies that operate independently. Modern data management techniques allow us to see a global perspective through the use of a data warehouse.

One of the challenges for a data warehouse architect is to build a system that homogenizes those autonomous data sources. In the example I use in this post, we see that a holdings company owns two paint distribution/retail operations. One of the entities in both of the transactional systems is named “colors”, and color is a dimension in the data warehouse because we need to analyze sales and other performance indicators by color.

The color tables in each of the transactional systems use a primary key (color_id); in this case, the primary keys are integers. As expected, these keys cannot be reused in the dimension because they overlap/conflict, but we need to resolve this inconsistency to populate the color dimension and the sales fact table.

To demonstrate, the database for the Colorama store has a color named “White” with a primary key of “1”, while the primary key for “White” in the Paintmart system is “2”.

In this very simple example I'm going to use an intermediate "keymap" table to do the mapping (and translation) from the transactional systems' primary keys (color_id) to the dimension table surrogate key (dim_color_key), so that we have a consistent, unique key for every color no matter where it comes from. We then use this surrogate key in both the color dimension and the sales fact table.
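A minimal T-SQL sketch of the idea follows. Apart from color_id and dim_color_key, the names (source_system, colorama_sales, quantity) are assumptions for illustration, and the real load would typically run inside an SSIS package or a stored procedure:

-- One row per (source system, source key) pair, pointing at the surrogate key in dim_color
CREATE TABLE keymap_color (
    source_system VARCHAR(32) NOT NULL,   -- 'Colorama' or 'Paintmart'
    color_id      INT         NOT NULL,   -- primary key in the source system
    dim_color_key INT         NOT NULL,   -- surrogate key in dim_color
    PRIMARY KEY (source_system, color_id)
)

-- Loading fact_sales from one source: translate color_id into the surrogate key through the keymap
INSERT INTO fact_sales (dim_color_key, quantity)
SELECT k.dim_color_key, s.quantity
FROM   colorama_sales s
       JOIN keymap_color k ON k.source_system = 'Colorama'
                          AND k.color_id = s.color_id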

In the COLORS table for the transactional system of Colorama; Red is color_id 4

In the COLORS table for the transactional system of Paintmart; Red is color_id 3

In dim_color in the data warehouse the surrogate key for Red is 4

The keymap table in the data warehouse does the translation from color_id in the transactional entities into dim_color_key in the fact table

detail_id 1-2 and 23-1 are sales of Red paint


line_id 5-2, 11-1, and 15-1 are sales of Red paint

PAINTDW.DBO.FACT_SALES

Observe how in fact_sales all the rows for sales of Red paint have been translated into the surrogate key dim_color_key with a value of 4

This same technique can be used to translate other entity data into dimension data; in this scenario, for example, the same situation is present in the "size" table.

Rename a file using SSIS 2008

Written by: Luis Miguel – Originally Published on: August 17, 2013

In my previous post I explained how to unzip a file using SSIS. The scenario presented unzipped a single CSV file and the file name was static, so if you executed the package more than once the CSV file would be overwritten.

In today’s post I’m going to present a method to rename the unzipped file so you can preserve the data when the package is executed multiple times.

Steps used to implement this solution

1 – Create a string variable to hold a daily timestamp

The complete package with the new_name variable

2 – Add an Execute SQL Task to populate the variable

Populate the new_name variable


Enter this T-SQL script in the Execute SQL Task to build the new file name (its single-row result set is mapped to the new_name package variable)
DECLARE @import_timestamp CHAR(8)
DECLARE @new_file VARCHAR(64)
-- Style 112 returns the date as yyyymmdd
SET @import_timestamp = CONVERT(VARCHAR, GETDATE(), 112)
SET @new_file = 'c:\unzip\sale_orders_' + @import_timestamp + '.csv'
SELECT @new_file

3 – Add a Script Task component and enter the C# code used to rename the file
The "rename csv file" Script Task

Enter this C Sharp code to rename the file
// Note: File.Move requires "using System.IO;" at the top of the script
public void Main()
{
    // Static name produced by the unzip step
    string old_file = "c:\\unzip\\sale_orders.csv";
    // Timestamped name built by the Execute SQL Task
    string new_file = Dts.Variables["User::new_name"].Value.ToString();
    File.Move(old_file, new_file);
}

Unzipping a file with SSIS 2008

Written by: Luis Miguel – Originally Published on: August 11, 2013

The Business Intelligence Development Studio set of tools does not include the ability to unzip a compressed file natively.

If your project requires this functionality, you will have to resort to third-party tools or some other technique to achieve the objective.

In this post I'm going to show how to unzip a compressed file using the JAR program, which is part of the Java Development Kit.

Here is the detailed list of steps I used to perform this task
1 – Download and install the Java Development Kit
2 – Create a batch file that sets the path to the Bin folder and executes the JAR program
3 – Build an SSIS 2008 package that calls the batch file

Step number 1, downloading and installing the Java Development Kit, should be a very simple and smooth process; I have done this on several servers without any issues at all.

For step number 2, use Notepad to create a file with the extension CMD with the following code
SETLOCAL
SET PATH="C:\Program Files\Java\jdk1.7.0_25\bin"
CD c:\data\orders
jar xf sale_orders.zip

For the string used in the SET PATH command, just look up where Java is installed on your system

In this case I'm setting the working directory to c:\data\orders because that is where I have placed the zipped file and the batch file, and that is where the unzipped file will land

The working directory

In step number 3 we are going to use an Execute Process Task to call the batch file from the SSIS package

The Execute Process Task


The unzipped CSV file

The normal behavior of the JAR program is to overwrite the destination file, so if you execute the batch file more than once and the target (unzipped) file already exists, it will be overwritten. This could be the desired behavior, and in a way it is a good thing because it means you don't have to worry about the program failing because the uncompressed file already exists.

But it is also possible that you don't want to lose the data (CSV files) from previous runs. To achieve this it will be necessary to rename the CSV file before the package is executed again; in my next post I'm going to build a package that does that.

Analysis Services 2008 Workshop

Written by: Luis Miguel – Originally published on: April 21, 2012

The folks from Pragmatic Works were back in So. Fla. about three weeks ago, this time with a workshop based on Analysis Services 2008. After the positive experience with their previous session, I could not miss this one.

To sum it up, again, I was not disappointed. This was a two-day class and the instructor was Devin (Brian’s brother).

The good thing about these workshops is that even though they use Microsoft's sample database AdventureWorks, Pragmatic Works always sprinkles in real-world examples, as realistic as possible, if only to put the matter at hand in the perspective of scenarios we come across at work.

For an introductory class it was well worth the time and money spent, no regrets :) . I wish they had shown an example of building cubes from the ground up, starting with the definition of the Dimensions, instead of using the wizard, but I guess that could be considered advanced stuff. They did go into Data Mining with an excellent explanation of basic concepts, which at this level was a nice bonus.

Plus, another shirt and another book. From what I heard from other participants in the workshop, the anticipation from the audience now is for an SSRS class.

Exporting a dataset to a Flat File Destination in SSIS 2008 [DRAFT]

Written by Luis Miguel – Originally Published on Feb. 6, 2012

Here is a fairly detailed description of the request coming from management:

1 – Gather a list of new records from the Customers table and save it to a flat file on a local disk nightly
2 – Generate a fixed-width ASCII file with the following structure:
Attribute (Field length)
- Customer Number (16)
- Customer Name (128)
- Address1 (128)
- Address2 (128)
- City (64)
- State (2)
- Zip (10)
- Company Id (16)

3 – Fill shorter values with spaces so all columns stay aligned in the flat file
4 – Segregate the datasets by accounting company and put them in separate text files
5 – Generate a file name with the following scheme: companyname_yyyymmdd.txt

The following is a step-by-step guide of the process implemented using Integration Services 2008 to fulfill the requirements

1 – Create a table with CHAR data type columns

There are two tables involved in this process
a – CUSTOMER_EXPORT is the list of companies to process, since management does not want to include all companies in this process; the stored procedure reads from this table and performs the extract/save for the records marked export_ = 1 (True)

The Customer Export table where accounting company databases are flagged for processing

b – CUSTOMERS is the complete list of customers from all the companies. Every single column on every single row in this table needs to have a value (a space for empty fields; empty fields cannot be NULL) so that in the query the empty fields become a string of spaces. A sketch of such a table appears after the screenshots below.

The structure of the Customer table

The Customer table in design view, all relevant fields are CHAR type and do not allow nulls

The Customer table data

The Customer table data, the field 'updated_on' is used to identify (new) records to be exported
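Here is a minimal sketch of a table like this, using the field lengths from the requirements; the column names match the query used later in the package, updated_on drives the incremental extract, and the real table would likely carry a primary key and other columns not shown. Because CHAR columns are space-padded to their declared length, concatenating them produces a fixed-width record:

CREATE TABLE customers (
    customer_number CHAR(16)  NOT NULL,
    name            CHAR(128) NOT NULL,
    address1        CHAR(128) NOT NULL,
    address2        CHAR(128) NOT NULL,
    city            CHAR(64)  NOT NULL,
    state           CHAR(2)   NOT NULL,
    zip             CHAR(10)  NOT NULL,
    company_id      CHAR(16)  NOT NULL,
    updated_on      DATETIME  NOT NULL
)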

2 – Build a package to extract the data and save it to disk
3 – Write a stored procedure to drive the SSIS package
4 – Add an SQL Server job that executes the stored procedure every night

1 – Create a new SSIS package and set its security (ProtectionLevel) to "EncryptSensitiveWithPassword"

2 – Create 3 package variables and set their data type to STRING
a – flat_file, company_id, export_date

The three package variables that will be passed as arguments when executing the package from the stored procedure


3 – Connection Managers
a – Add an OLE DB connection to the source database and use SQL Server authentication; in the real world your Windows credentials will not be the ones calling the package and reading the data from the database

The OLE DB connection using SQL Server authentication and the option to save the password, passwords are encrypted everywhere they are used.


b – Add a flat file connection to the target text file and use an expression so that the destination file name can be generated dynamically; in this case the expression is ConnectionString = @[User::flat_file]

The flat file connection using an expression for the connection string so the destination file name can be generated dynamically


4 – Add a Data Flow Task
a – Add an OLE DB source pointing to the Customer table and use the SQL Command access mode with the following query

SELECT
[customer_number] +
[name] +
[address1] +
[address2] +
[city] +
[state] +
[zip] +
[company_id] AS customer_record
FROM customers WHERE company_id = ? AND updated_on = ?

b – Set the input parameters to the variables company_id and export_date

[insert picture here]

c – Add a Flat File Destination and connect the output of the OLE DB data source to it

[insert picture here]

5 – Create a stored procedure to call (execute) or “run” the package
a – Make sure you encrypt the stored procedure so that the package password is not visible
b – Keep the package password in a safe place; you will need it to open the package in BIDS

CREATE PROCEDURE [dbo].[usp_client_file]
WITH ENCRYPTION
AS
DECLARE @year_ CHAR(4)
DECLARE @month_ CHAR(2)
DECLARE @day_ CHAR(2)
DECLARE @export_date VARCHAR(128)
DECLARE @insert_command VARCHAR(2048)
DECLARE @company VARCHAR(128)
DECLARE @company_id VARCHAR(16)
DECLARE @flat_file VARCHAR(128)
DECLARE @dtexec_ VARCHAR(2048)
-- Build the yyyymmdd date stamp used in the export file name
SET @year_ = CAST(DATEPART(YEAR, GETDATE()) AS CHAR(4))
IF DATEPART(MONTH, GETDATE()) < 10 SET @month_ = '0' + CAST(DATEPART(MONTH, GETDATE()) AS CHAR(2)) ELSE SET @month_ = CAST(DATEPART(MONTH, GETDATE()) AS CHAR(2))
IF DATEPART(DAY, GETDATE()) < 10 SET @day_ = '0' + CAST(DATEPART(DAY, GETDATE()) AS CHAR(2)) ELSE SET @day_ = CAST(DATEPART(DAY, GETDATE()) AS CHAR(2))
SET @export_date = @year_ + @month_ + @day_
-- Loop over the companies flagged for export
DECLARE db_cursor CURSOR FOR
SELECT name FROM [dbo].[customer_export] WHERE export_ = 1
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @company
WHILE @@FETCH_STATUS = 0
BEGIN
SET @company_id = (SELECT company_id FROM customer_export WHERE name = @company)
-- Build the dtexec command line: package password plus the three package variables
SET @dtexec_ = 'dtexec /FILE c:\ssis\packages\clients_file.dtsx /DE "y0ur_p@ssw0rd" /SET \Package.Variables[User::flat_file].Value;"\"c:\exports\' + @company + '_' + @export_date + '.txt\"" /SET \Package.Variables[User::export_date].Value;' + @export_date + ' /SET \Package.Variables[User::company_id].Value;' + @company_id + ''
EXEC xp_cmdshell @dtexec_
FETCH NEXT FROM db_cursor INTO @company
END
CLOSE db_cursor
DEALLOCATE db_cursor

And that takes care of the assignment: if you schedule this stored procedure to run once a day, the package will extract the list of new customers and save it to local storage using a unique, timestamp-driven file name.
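For the nightly schedule (requirement 4 above), here is a minimal sketch of the SQL Server Agent job created through msdb's stored procedures; the job, schedule, and database names are assumptions, and the same thing can be done through the Management Studio UI:

USE msdb
GO
EXEC sp_add_job @job_name = N'Nightly client file export'
EXEC sp_add_jobstep @job_name = N'Nightly client file export',
    @step_name = N'Run usp_client_file',
    @subsystem = N'TSQL',
    @database_name = N'your_database',        -- assumption: the database holding the procedure
    @command = N'EXEC dbo.usp_client_file'
EXEC sp_add_schedule @schedule_name = N'Nightly at 1 AM',
    @freq_type = 4,                           -- daily
    @freq_interval = 1,
    @active_start_time = 10000                -- 01:00:00 (HHMMSS)
EXEC sp_attach_schedule @job_name = N'Nightly client file export',
    @schedule_name = N'Nightly at 1 AM'
EXEC sp_add_jobserver @job_name = N'Nightly client file export'   -- target the local server
GO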

Using the SSIS 2008 Foreach Loop Container

Written by Luis Miguel – Originally published on November 27, 2011

This post is a demonstration of how to use the SSIS 2008 Foreach Loop Container to execute a routine against a dataset one record at a time. I have chosen the DBA task of fixing orphaned users to illustrate this technique.

The challenge: We need to restore several databases that are ready for testing from our Development environment into our UAT instance. As these databases are restored, the login/user "links" are lost and we end up with orphaned users; if we restore a dozen databases and our testing group consists of another dozen users, the task of manually fixing them becomes a little daunting.

The solution: Use an SSIS package to automate the process of executing the system stored procedure sp_change_users_login for each and every user in each and every database
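For reference, run by hand in the context of a single restored database, the procedure can both list and fix orphaned users; the package below simply automates the Auto_Fix call across all databases and logins (ABROWN is the example user from the screenshots):

-- List the orphaned users in the current database
EXEC sp_change_users_login 'Report'
-- Relink one orphaned user to the existing login of the same name
EXEC sp_change_users_login 'Auto_Fix', 'ABROWN'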

Fig. 1 – A list of “fake” databases and users in my Lab’s UAT system, the target of the multiple database restores

List of databases and logins

Fig. 2 – The user ABROWN exists “Without Login” after the restore

Database user without login

Fig. 3 – A high level view of the package using the Foreach Loop Container

High level view of the package with the Foreach Loop Container

1 – Add an OLE DB connection to the package

The OLE DB connection

2 – Add a package-level variable and set its type to Object; for this example I called this variable "db_login_ado"

3 – Add an Execute SQL Task (code below) and set the ResultSet property to “Full result set”
-- database_id > 4 skips the system databases; the cross join pairs every user database with every SQL login
SELECT D.name AS database_, L.name AS login_
FROM sys.databases D, sys.server_principals L
WHERE D.database_id > 4 AND (L.type_desc = 'SQL_LOGIN')
ORDER BY D.name

Execute SQL Task

3a – Add a result set, call it 0 (zero) and assign it to the db_login_ado variable

Result set 0 of the Execute SQL Task

4 – Add a Foreach Loop Container and set its Collection to "Foreach ADO Enumerator"

4a – In the Collection area of this container, set the "ADO object source variable" property to the variable coming from the Execute SQL Task

The Foreach Loop Container

5 – Add two String variables, one for the databases and the other one for the logins

6 – In the Variable Mappings area of the container, add two entries, one for each of the string variables you just added in the previous step. Make sure your "database" variable, which is the first column of the dataset, is set to Index 0, and the "logins" variable, which is the second column of the dataset, is set to Index 1

Variable Mappings of the Foreach Loop Container

7 – Add an Execute SQL Task to the Foreach Loop Container with the following code
DECLARE @dbname VARCHAR(128)
DECLARE @dbuser VARCHAR(128)
DECLARE @commandchangeuser VARCHAR(1024)
SET @dbname = ?
SET @dbuser = ?
-- Build and run the fix against the target database, but only if that user actually exists in it
SET @commandchangeuser = 'USE ' + @dbname + '; IF EXISTS (SELECT name FROM sysusers WHERE name = ''' + @dbuser + ''') EXEC sp_change_users_login ''auto_fix'', ' + @dbuser + ''
EXECUTE(@commandchangeuser)

Execute SQL Task inside the Foreach Loop Container

7a – Map the input parameters to the string variables set in the Foreach Loop Container; the parameter names must be 0 and 1, and the parameter size is the length of the VARCHAR data type

Parameter Mappings of the Execute SQL Task

A successful run of the package

The execution of the package completed successfully

The user ABROWN is now linked to the login ABROWN after running the package

User status after the package ran

Importing a server-side trace into a table and recycling the trace file

Written by Luis Miguel – Originally published on November 6, 2011

Bringing the data captured by a server-side trace into a table is a very convenient way to review the trace results when monitoring and analyzing database activity. One thing I immediately thought of when implementing this kind of solution was to do it periodically and incrementally.

SQL Server 2008 does not come with built-in functionality to append data to an existing trace file or to overwrite it and start fresh using the same file name.

One possible arrangement to achieve this has been explained in other blogs and requires the use of several trace files, which is SQL Server's default behavior. Since this technique is not without flaws, and I can afford to stop the data capture for a minute while I import the data and refresh the trace file, I came up with the following script, which lets me simulate the "refreshing" feature not available in SQL Server 2008.

Run this command first to build the table that is going to store the trace data, before you put the script in a stored procedure or an SQL Server Agent job


SELECT *
INTO   db_name.schema_name.table_name
FROM   fn_trace_gettable('d:\trace\my_trace_file.trc', DEFAULT) trace

Once you have the table in place you can use the script below to keep adding data to it as frequently as necessary or possible


DECLARE @trace_id INT
-- Use the path to your trace file to get the Id of the trace you are working with
SET @trace_id = (SELECT id
                 FROM   sys.traces
                 WHERE  [path] = 'd:\trace\my_trace_file.trc')
-- If you get something back the trace is running
IF @trace_id IS NOT NULL
  BEGIN
      -- Stop the trace so it dumps the data into the file
      EXEC sp_trace_setstatus
        @traceid = @trace_id,
        @status = 0
      -- Import the data from the trace file into a table
      INSERT INTO db_name.schema_name.table_name
      SELECT *
      FROM   fn_trace_gettable('d:\trace\my_trace_file.trc', DEFAULT) trace
      -- Close the trace and delete its definition so the file can be removed
      EXEC sp_trace_setstatus
        @traceid = @trace_id,
        @status = 2
      -- Delete the trace file
      EXEC xp_cmdshell 'erase d:\trace\my_trace_file.trc'
  END

Now you are ready to kick-start the trace again using the same (fixed) file name you used before, probably as another step in an SQL Server Agent job.
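Here is a minimal sketch of that restart step. The event and column are placeholders only (your real trace would set whatever events and columns you monitor), the file must not already exist, and note that sp_trace_create appends the .trc extension itself:

DECLARE @trace_id INT
DECLARE @maxfilesize BIGINT
DECLARE @on BIT
SET @maxfilesize = 100
SET @on = 1
-- Create a new trace writing to the same fixed file name (.trc is appended automatically); option 0 = single file, no rollover
EXEC sp_trace_create @trace_id OUTPUT, 0, N'd:\trace\my_trace_file', @maxfilesize, NULL
-- Placeholder: capture TextData (column 1) for SQL:BatchCompleted (event 12)
EXEC sp_trace_setevent @trace_id, 12, 1, @on
-- Start the trace
EXEC sp_trace_setstatus @traceid = @trace_id, @status = 1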

Lifelong Learning – Knowledge is your friend

Written by Luis Miguel – Originally published on October 15, 2011

A continuous learning effort is just one of a few constants across the different areas of the IT field. Thru the years I have, for the most part, satisfied the need for training thru self-study, with the occasional exception of an employer-sponsored week of Oracle 10 or SQL Server 2005 administration at an authorized training facility, and my personal experience is that not even official training at a local college matches the return on investment of doing the study yourself.

I recently took an offer from Brian Knight (famous BI author) and Pragmatic Works for a three-day SSIS workshop around the corner in Ft. Lauderdale; the price seemed reasonable and the content interesting. So I ended up "investing" 3 days of vacation time and $300.00 out of my own pocket to attend this event, and I don't regret it. This is my take on Brian Knight and the Pragmatic Works workshop.

The thing is worth it, money well spent. Brian knows this stuff; his classes are 100% fat free, and even his "stupid jokes" are good and few, so you end up with a day of tasty substance. You even get some of your money back: Brian was very kind and threw in a couple of t-shirts for everyone, and so many copies of his latest title (Microsoft Business Intelligence 24-Hour Trainer) that I think 80% of the class went home with one. They had doughnuts and coffee for breakfast, and lunch was decent.

This particular workshop will not make you an SSIS developer; for that you need longer training and some experience, but it will definitely get you started in the right direction. The workshop covered the most important Data Flow Task transformations, with real-world examples and tips acquired by Brian thru the years as a consultant. This is probably the most valuable part of the workshop: Brian is your buddy who has been in the trenches for a long time and gives you that perspective and insight.

So yeah, good experience overall; in the future I'll definitely consider other, more in-depth offers from Pragmatic Works. In the meantime I'll keep "hitting the books" on my own. Stay thirsty [for knowledge], my friends.
