Chapter 1: Data Warehouse Logical Design
Lesson 1: Introducing Star and Snowflake Schemas
Lesson Summary
■■ The Star schema is the most common design for a DW.
■■ The Snowflake schema is more appropriate for POC projects.
■■ You should also determine the granularity of fact tables, as well as auditing and lineage needs.
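To make the Star schema concrete, here is a minimal T-SQL sketch. All object names are illustrative assumptions, not objects from this lesson; the point is the shape: a central fact table at a declared grain, referencing the surrounding dimensions through integer surrogate keys.

-- Two small dimensions with integer surrogate keys (hypothetical names).
CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT          NOT NULL PRIMARY KEY,  -- surrogate key
    CustomerName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.DimDate
(
    DateKey  INT  NOT NULL PRIMARY KEY,              -- e.g., 20120301
    FullDate DATE NOT NULL
);

-- The central fact table; its grain is one row per customer per date.
CREATE TABLE dbo.FactSales
(
    CustomerKey INT   NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    DateKey     INT   NOT NULL REFERENCES dbo.DimDate (DateKey),
    SalesAmount MONEY NOT NULL,                      -- additive measure
    PRIMARY KEY (CustomerKey, DateKey)
);

A report then needs only one join per dimension, which is why Star schema queries stay simple.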
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. Reporting from a Star schema is simpler than reporting
from a normalized online
transactional processing (OLTP) schema. What are the reasons
for wanting simpler
reporting? (Choose all that apply.)
A. A Star schema typically has fewer tables than a normalized schema. Therefore, queries are simpler because they require fewer joins.
B. A Star schema has better support for numeric data types
than a normalized relational
schema; therefore, it is easier to create aggregates.
C. There are specific Transact-SQL expressions that deal
with Star schemas.
D. A Star schema is standardized and narrative; you can find the information you need for a report quickly.
2. You are creating a quick POC project. Which schema is the
most suitable for this kind
of a project?
A. Star schema
B. Normalized schema
C. Snowflake schema
D. XML schema
3. A Star schema has two types of tables. What are those two
types? (Choose all that
apply.)
A. Lookup tables
B. Dimensions
C. Measures
D. Fact tables
Lesson 2: Designing Dimensions
Lesson Summary
■■ In a dimension, you have the following column types: keys, names, attributes, member properties, translations, and lineage.
■■ Some attributes form natural hierarchies.
■■ There are standard solutions for the Slowly Changing Dimensions (SCD) problem.
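As a quick illustration of the Type 2 SCD solution, here is a hedged T-SQL sketch using a Type 2 variant of the hypothetical DimCustomer from Lesson 1 (the customer and city values are invented). The surrogate key lets the same business key appear in several rows, and a flag denotes the current one.

-- A customer dimension prepared for Type 2 changes.
CREATE TABLE dbo.DimCustomer
(
    CustomerKey INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    CustomerId  INT          NOT NULL,                   -- business key; may repeat
    City        NVARCHAR(50) NOT NULL,                   -- attribute tracked as Type 2
    IsCurrent   BIT          NOT NULL DEFAULT (1)        -- denotes the current row
);

-- When the source reports a new city for customer 42, close the old row
-- and insert a new current row instead of updating in place.
UPDATE dbo.DimCustomer
SET IsCurrent = 0
WHERE CustomerId = 42 AND IsCurrent = 1;

INSERT INTO dbo.DimCustomer (CustomerId, City, IsCurrent)
VALUES (42, N'Vienna', 1);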
Lesson Review
1. You implement a Type 2 solution for an SCD problem for a
specific column. What do
you actually do when you get a changed value for the column
from the source system?
A. Add a column for the previous value to the table. Move the current value of the updated column to the new column. Update the current value with the new value from the source system.
B. Insert a new row for the same dimension member with the
new value for the
updated column. Use a surrogate key, because the business
key is now duplicated.
Add a flag that denotes which row is current for a member.
C. Do nothing, because in a DW, you maintain history, you do
not update dimension
data.
D. Update the value of the column just as it was updated in
the source system.
2. Which kind of a column is not a part of a dimension?
A. Attribute
B. Measure
C. Key
D. Member property
E. Name
3. How can you spot natural hierarchies in a Snowflake
schema?
A. You need to analyze the content of the attributes of each
dimension.
B. Lookup tables for each dimension provide natural
hierarchies.
C. A Snowflake schema does not support hierarchies.
D. You should convert the Snowflake schema to the Star
schema, and then you would
spot the natural hierarchies immediately.
Lesson 3: Designing Fact Tables
Lesson Summary
■■ Fact tables include measures, foreign keys, and possibly an additional primary key and lineage columns.
■■ Measures can be additive, non-additive, or semi-additive.
■■ For many-to-many relationships, you can introduce an additional intermediate dimension.
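Semi-additivity is easiest to see with a balance measure. In the hypothetical sketch below, UnitBalance can be summed over customers within a single date, but summing it over the Date dimension would double-count; there you would take, for example, the last known balance instead.

-- A hypothetical inventory fact table with a semi-additive measure.
CREATE TABLE dbo.FactInventory
(
    CustomerKey INT NOT NULL,
    DateKey     INT NOT NULL,
    UnitBalance INT NOT NULL,   -- semi-additive: not additive over Date
    PRIMARY KEY (CustomerKey, DateKey)
);

-- Valid: SUM over the Customer dimension within each date.
SELECT DateKey, SUM(UnitBalance) AS TotalBalance
FROM dbo.FactInventory
GROUP BY DateKey;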
Lesson Review
1. Over which dimension can you not use the SUM aggregate
function for semi-additive
measures?
A. Customer
B. Product
C. Date
D. Employee
2. Which measures would you expect to be non-additive?
(Choose all that apply.)
A. Price
B. Debit
C. SalesAmount
D. DiscountPct
E. UnitBalance
3. Which kind of a column is not part of a fact table?
A. Lineage
B. Measure
C. Key
D. Member property
Case Scenarios
In the following case scenarios, you apply what you’ve
learned about Star and Snowflake
schemas, dimensions, and the additivity of measures. You can
find the answers to these questions
in the “Answers” section at the end of this chapter.
Case Scenario 1: A Quick POC Project
You are hired to implement a quick POC data warehousing
project. You have to prepare the
schema for sales data. Your customer’s SME would like to
analyze sales data over customers,
products, and time. Before creating a DW and tables, you
need to make a couple of decisions
and answer a couple of questions:
1. What kind of schema would you use?
2. What would the dimensions of your schema be?
3. Do you expect additive measures only?
Case Scenario 2: Extending the POC Project
After you implemented the POC sales data warehouse in Case
Scenario 1, your customer
was very satisfied. In fact, the business would like to
extend the project to a real, long-term
data warehouse. However, when interviewing analysts, you
also discovered some points of
dissatisfaction.
Interviews
Here’s a list of company personnel who expressed some
dissatisfaction during their interviews,
along with their statements:
■■ Sales SME “I don’t see correct aggregates over regions for historical data.”
■■ DBA Who Creates Reports “My queries are still complicated, with many joins.”
You need to solve these issues.
Questions
1. How would you address the Sales SME issue?
2. What kind of schema would you implement for a long-term
DW?
3. How would you address the DBA’s issue?
Suggested Practices
To help you successfully master the exam objectives
presented in this chapter, complete the
following tasks.
Analyze the AdventureWorksDW2012 Database Thoroughly
To understand all kinds of dimensions and fact tables, you should analyze the AdventureWorksDW2012 sample database thoroughly. It contains cases for many data warehousing problems you might encounter.
■■ Practice 1 Check all fact tables. Find all semi-additive measures.
■■ Practice 2 Find all hierarchies possible for the DimCustomer dimension. Include attributes in the dimension and attributes in the lookup DimGeography table.
Check the SCD and Lineage in the AdventureWorksDW2012 Database
Although the AdventureWorksDW2012 database exemplifies many
cases for data warehousing,
not all possible problems are covered. You should check for
what is missing.
■■ Practice 1 Is there room for lineage information in all dimensions and fact tables? How would you accommodate this information?
■■ Practice 2 Are there some important dimensions, such as those representing customers and products, that are not prepared for a Type 2 SCD solution? How would you prepare those dimensions for a Type 2 SCD solution?
Answers
This section contains answers to the lesson review questions
and solutions to the case scenarios
in this chapter.
Lesson 1
1. Correct Answers: A and D
A. Correct: A Star schema typically has fewer tables than a normalized
schema.
B. Incorrect: The support for data types depends on the
database management
system, not on the schema.
C. Incorrect: There are no specific Transact-SQL expressions
or commands for Star
schemas. However, there are some specific optimizations for
Star schema queries.
D. Correct: The Star schema is a de facto standard for data
warehouses. It is narrative;
the central table—the fact table—holds the measures, and the
surrounding tables,
the dimensions, give context to those measures.
2. Correct Answer: C
A. Incorrect: The Star schema is more suitable for long-term
DW projects.
B. Incorrect: A normalized schema is appropriate for OLTP
LOB applications.
C. Correct: A Snowflake schema is appropriate for POC projects, because dimensions are normalized and thus closer to the source normalized schema.
D. Incorrect: An XML schema is used for validating XML
documents, not for a DW.
3. Correct Answers: B and D
A. Incorrect: Lookup tables are involved in both Snowflake
and normalized schemas.
B. Correct: Dimensions are part of a Star schema.
C. Incorrect: Measures are columns in a fact table, not
tables by themselves.
D. Correct: A fact table is the central table of a Star
schema.
Lesson 2
1. Correct Answer: B
A. Incorrect: This is Type 3 SCD management.
B. Correct: This is how you handle changes when you
implement a Type 2 SCD
solution.
C. Incorrect: Maintaining history does not mean that the
content of a DW is static.
D. Incorrect: This is Type 1 SCD management.
2. Correct Answer: B
A. Incorrect: Attributes are part of dimensions.
B. Correct: Measures are part of fact tables.
C. Incorrect: Keys are part of dimensions.
D. Incorrect: Member properties are part of dimensions.
E. Incorrect: Name columns are part of dimensions.
3. Correct Answer: B
A. Incorrect: You need to analyze the attribute names and
content in order to spot
the hierarchies in a Star schema.
B. Correct: Lookup tables for dimensions denote natural hierarchies in a Snowflake schema.
C. Incorrect: A Snowflake schema supports hierarchies.
D. Incorrect: You do not need to convert a Snowflake to a
Star schema to spot the
hierarchies.
Lesson 3
1. Correct Answer: C
A. Incorrect: You can use SUM aggregate functions for
semi-additive measures over
the Customer dimension.
B. Incorrect: You can use SUM aggregate functions for
semi-additive measures over
the Product dimension.
C. Correct: You cannot use SUM aggregate functions for
semi-additive measures
over the Date dimension.
D. Incorrect: You can use SUM aggregate functions for
semi-additive measures over
the Employee dimension.
2. Correct Answers: A and D
A. Correct: Prices are not additive measures.
B. Incorrect: Debit is an additive measure.
C. Incorrect: Amounts are additive measures.
D. Correct: Discount percentages are not additive measures.
E. Incorrect: Balances are semi-additive measures.
3. Correct Answer: D
A. Incorrect: Lineage columns can be part of a fact table.
B. Incorrect: Measures are included in a fact table.
C. Incorrect: A fact table includes key columns.
D. Correct: Member property is a type of column in a dimension.
Case Scenario 1
1. For a quick POC project, you should use the Snowflake
schema.
2. You would have customer, product, and date dimensions.
3. No, you should expect some non-additive measures as well.
For example, prices and
various percentages, such as discount percentage, are
non-additive.
Case Scenario 2
1. You should implement a Type 2 solution for the slowly
changing customer dimension.
2. For a long-term DW, you should choose a Star schema.
3. With Star schema design, you would address the DBA’s
issue automatically.
Chapter 2: Implementing a Data Warehouse
Lesson 1: Implementing Dimensions and Fact Tables
Lesson Summary
■■ In this lesson, you learned about implementing a data warehouse.
■■ For a data warehouse database, you should use the Simple recovery model.
■■ When creating a database, allocate enough space for data files and log files to prevent autogrowth of the files.
■■ Use surrogate keys in dimensions in which you expect SCD Type 2 changes.
■■ Use computed columns.
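A minimal sketch of these recommendations, assuming a DW database named MyDW and an invented product dimension:

-- Use the Simple recovery model for the data warehouse database.
ALTER DATABASE MyDW SET RECOVERY SIMPLE;

-- A dimension with an IDENTITY surrogate key and a computed column.
CREATE TABLE dbo.DimProduct
(
    ProductKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    ProductId    INT   NOT NULL,                          -- business key
    ListPrice    MONEY NOT NULL,
    StandardCost MONEY NOT NULL,
    Margin AS (ListPrice - StandardCost)                  -- computed column
);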
Lesson Review
1. Which database objects and object properties can you use
for autonumbering?
(Choose all that apply.)
A. IDENTITY property
B. SEQUENCE object
C. PRIMARY KEY constraint
D. CHECK constraint
2. What columns do you add to a table to support Type 2 SCD
changes? (Choose all that
apply.)
A. Member properties
B. Current row flag
C. Lineage columns
D. Surrogate key
3. What is an inferred member?
A. A row in a fact table added during dimension load
B. A row with aggregated values
C. A row in a dimension added during fact table load
D. A computed column in a fact table
Lesson 2: Managing the Performance of a Data Warehouse
Lesson Summary
■■ In this lesson, you learned how to optimize data warehouse query performance.
■■ In a DW, you should not use many nonclustered indexes.
■■ Use small, integer surrogate columns for clustered primary keys.
■■ Use indexed views.
■■ Use columnstore indexes and exploit batch processing.
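The last two bullets can be sketched as follows, reusing the hypothetical dbo.FactSales table from Chapter 1 (all names are assumptions). Note that in SQL Server 2012, a nonclustered columnstore index makes its base table read-only until the index is dropped or disabled.

-- An indexed view that pre-aggregates sales by date.
CREATE VIEW dbo.SalesByDate
WITH SCHEMABINDING
AS
SELECT DateKey,
       SUM(SalesAmount) AS TotalSales,
       COUNT_BIG(*)     AS NumRows   -- required in an aggregated indexed view
FROM dbo.FactSales
GROUP BY DateKey;
GO
CREATE UNIQUE CLUSTERED INDEX UCI_SalesByDate ON dbo.SalesByDate (DateKey);
GO
-- A nonclustered columnstore index on the fact table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (CustomerKey, DateKey, SalesAmount);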
Lesson Review
1. Which types of data compression are supported by SQL
Server? (Choose all that apply.)
A. Bitmap
B. Unicode
C. Row
D. Page
2. Which operators can benefit from batch processing?
(Choose all that apply.)
A. Hash Join
B. Merge Join
C. Scan
D. Nested Loops Join
E. Filter
3. Why would you use indexed views? (Choose all that apply.)
A. To speed up queries that aggregate data
B. To speed up data load
C. To speed up selective queries
D. To speed up queries that involve multiple joins
Lesson 3: Loading and Auditing Loads
Lesson Summary
■■ Table partitioning is extremely useful for large fact tables with columnstore indexes.
■■ Partition switch is a metadata operation only if an index is aligned with its base table.
■■ You can add lineage information to your dimensions and fact tables to audit changes to your DW on a row level.
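A hedged sketch of the partitioning objects (boundary values and names are invented): the partition function defines the logical boundaries, the partition scheme maps the resulting partitions to filegroups, and an aligned staging table can then be switched in as a metadata-only operation.

-- Logical partitioning: yearly boundaries on an integer date key.
CREATE PARTITION FUNCTION PF_DateKey (INT)
AS RANGE LEFT FOR VALUES (20111231, 20121231);

-- Physical partitioning: map all partitions to one filegroup for simplicity.
CREATE PARTITION SCHEME PS_DateKey
AS PARTITION PF_DateKey ALL TO ([PRIMARY]);

-- If indexes are aligned and the staging table has a matching check
-- constraint, loading a partition is a metadata-only switch:
-- ALTER TABLE dbo.FactSalesStage SWITCH TO dbo.FactSales PARTITION 2;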
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. The database object that maps partitions of a table to
filegroups is called a(n)
A. Aligned index
B. Partition function
C. Partition column
D. Partition scheme
2. If you want to switch content from a nonpartitioned table
to a partition of a partitioned
table, what conditions must the nonpartitioned table meet?
(Choose all that apply.)
A. It must have the same constraints as the partitioned
table.
B. It must have the same compression as the partitioned
table.
C. It must be in a special PartitionedTables schema.
D. It must have a check constraint on the partitioning
column that guarantees that all
of the data goes to exactly one partition of the partitioned
table.
E. It must have the same indexes as the partitioned table.
3. Which of the following T-SQL functions is not very useful
for capturing lineage
information?
A. APP_NAME()
B. USER_NAME()
C. DEVICE_STATUS()
D. SUSER_SNAME()
Case Scenarios
In the following case scenarios, you apply what you’ve
learned about optimized querying and
securing a data warehouse. You can find the answers to these
questions in the “Answers” section
at the end of this chapter.
Case Scenario 1: Slow DW Reports
You have created a data warehouse and populated it. End
users have started using it for
reports. However, they have also begun to complain about the
performance of the reports.
Some of the very slow reports calculate running totals. You
need to answer the following
questions.
1. What changes can you implement in your DW to speed up the
reports?
2. Does it make sense to check the source queries of the
reports with running totals?
Case Scenario 2: DW Administration Problems
Your end users are happy with the DW reporting performance.
However, when talking with
a DBA, you were notified of potential problems. The DW
transaction log grows by more than
10 GB per night. In addition, end users have started to
create reports from staging tables, and
these reports show messy data. End users complain to the DBA
that they cannot trust your
DW if they get such messy data in a report.
1. How can you address the runaway log problem?
2. What can you do to prevent end users from using the
staging tables?
Suggested Practices
To help you successfully master the exam objectives
presented in this chapter, complete the
following tasks.
Test Different Indexing Methods
For some queries, indexed views could be the best performance
booster. For other queries,
columnstore indexes could be more appropriate. Still other
queries would benefit from nonclustered
indexes on foreign keys.
■■ Practice 1 Write an aggregate query for Internet sales in the AdventureWorksDW2012 sample database. Create an appropriate indexed view and run the aggregate query. Check the statistics IO and execution plan.
■■ Practice 2 Drop the indexed view and create a columnstore index. Run the query and check the statistics IO and execution plan again.
■■ Practice 3 Drop the columnstore index and create nonclustered indexes on all foreign keys of the fact table included in joins. Run the query and check the statistics IO and execution plan again.
■■ Practice 4 In the DimCustomer dimension of the AdventureWorksDW2012 sample database, there is a Suffix column. It is NULL for all rows but three. Create a filtered nonclustered index on this column and test queries that read data from the DimCustomer dimension using different filters. Check how the query performs when Suffix is NULL and when Suffix is known (is not NULL).
Test Table Partitioning
In order to understand table partitioning thoroughly, you
should test it with aligned and
nonaligned indexes.
■■ Practice 1 Partition the FactInternetSales table in the AdventureWorksDW2012 sample database. Create aligned nonclustered indexes on all foreign keys of the fact table included in joins of the query from the previous practice. Run the query and check the execution plan.
■■ Practice 2 Create nonaligned nonclustered indexes on all foreign keys of the fact table included in joins of the query from the previous practice. Run the query and check the execution plan again.
Answers
This section contains answers to the lesson review questions
and solutions to the case scenarios
in this chapter.
Lesson 1
1. Correct Answers: A and B
A. Correct: The IDENTITY property autonumbers rows.
B. Correct: You can use the new SQL Server 2012 SEQUENCE
object for
autonumbering.
C. Incorrect: Primary keys are used to uniquely identify
rows, not for
autonumbering.
D. Incorrect: Check constraints are used to enforce data
integrity, not for
autonumbering.
2. Correct Answers: B and D
A. Incorrect: Member properties are dimension columns used
for additional information
on reports only.
B. Correct: You need a current flag for denoting the current
row when you implement
Type 2 SCD changes.
C. Incorrect: Lineage columns are used, as their name
states, to track the lineage
information.
D. Correct: You need a new, surrogate key when you implement
Type 2 SCD changes.
3. Correct Answer: C
A. Incorrect: You do not add rows to a fact table during
dimension load.
B. Incorrect: You do not create rows with aggregated values.
C. Correct: A row in a dimension added during fact table
load is called an inferred
member.
D. Incorrect: A computed column is just a computed column,
not an inferred member.
Lesson 2
1. Correct Answers: B, C, and D
A. Incorrect: SQL Server does not support bitmap
compression.
B. Correct: SQL Server supports Unicode compression. It is
applied automatically
when you use either row or page compression.
C. Correct: SQL Server supports row compression.
D. Correct: SQL Server supports page compression.
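To illustrate the supported options on a hypothetical fact table: you can estimate the savings first and then rebuild with the chosen setting; Unicode compression is applied automatically with either one.

-- Estimate page compression savings.
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo', @object_name = 'FactSales',
     @index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';

-- Rebuild the table with page compression (ROW is the lighter alternative).
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);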
2. Correct Answers: A, C, and E
A. Correct: Hash joins can use batch processing.
B. Incorrect: Merge joins do not use batch processing.
C. Correct: Scan operators can benefit from batch
processing.
D. Incorrect: Nested loops joins do not use batch
processing.
E. Correct: Filter operators use batch processing as well.
3. Correct Answers: A and D
A. Correct: Indexed views are especially useful for speeding
up queries that aggregate
data.
B. Incorrect: As with any indexes, indexed views only slow
down data load.
C. Incorrect: For selective queries, nonclustered indexes
are more appropriate.
D. Correct: Indexed views can also speed up queries that
perform multiple joins.
Lesson 3
1. Correct Answer: D
A. Incorrect: Aligned indexes are indexes with the same
partitioning as their base
table.
B. Incorrect: The partition function does logical
partitioning.
C. Incorrect: The partition column is the column used for
partitioning.
D. Correct: The partition scheme does physical partitioning.
2. Correct Answers: A, B, D, and E
A. Correct: It must have the same constraints as the
partitioned table.
B. Correct: It must have the same compression as the
partitioned table.
C. Incorrect: There is no special schema for partitioned
tables.
D. Correct: It must have a check constraint to guarantee
that all data goes to a single
partition.
E. Correct: It must have the same indexes as the partitioned
table.
3. Correct Answer: C
A. Incorrect: The APP_NAME() function can be useful for
capturing lineage
information.
B. Incorrect: The USER_NAME() function can be useful for
capturing lineage
information.
C. Correct: There is no DEVICE_STATUS() function in T-SQL.
D. Incorrect: The SUSER_SNAME() function can be useful for
capturing lineage
information.
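To see the three usable functions side by side, a single SELECT suffices; during a load, such values are typically written into lineage columns (the added SYSDATETIME() column is a common companion, not part of the question).

SELECT APP_NAME()    AS ApplicationName,  -- e.g., the loading application
       USER_NAME()   AS DatabaseUser,
       SUSER_SNAME() AS LoginName,
       SYSDATETIME() AS LoadDateTime;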
Case Scenario 1
1. You should consider using columnstore indexes, indexed
views, data compression, and
table partitioning.
2. Yes, it is definitely worth checking the queries of the
running totals reports. The queries
probably use joins or subqueries to calculate the running
totals. Consider using
window functions for these calculations.
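As a hedged sketch of that rewrite (table and column names assumed), a SQL Server 2012 window aggregate with a frame computes a running total in a single pass, with no self-joins or correlated subqueries:

SELECT CustomerKey, DateKey, SalesAmount,
       SUM(SalesAmount) OVER (PARTITION BY CustomerKey
                              ORDER BY DateKey
                              ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.FactSales;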
Case Scenario 2
1. You should check the DW database recovery model and
change it to Simple. In addition,
you could use the DBCC SHRINKFILE command to shrink the transaction
log to a
reasonable size.
2. End users apparently have permissions, at least the SELECT permission, on the staging tables. Advise the DBA to revoke permissions from end users on staging tables. In addition, to speed up the security administration, you should put all staging tables in a separate schema, thus allowing the DBA to administer them as a group.
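A minimal sketch of both fixes, assuming the database is named MyDW, its log file MyDW_log, the staging schema stg, and a ReportUsers role:

-- Stop the log growth and reclaim space.
ALTER DATABASE MyDW SET RECOVERY SIMPLE;
DBCC SHRINKFILE (N'MyDW_log', 1024);   -- target size in MB

-- With staging tables grouped in one schema, one statement removes access.
REVOKE SELECT ON SCHEMA::stg FROM ReportUsers;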
Chapter 3: Creating SSIS Packages
Lesson 1: Using the SQL Server Import and Export Wizard
Lesson Summary
■■ The SQL Server Import and Export Wizard can be used for simple data movement operations.
■■ The wizard allows you to create the destination database.
■■ Multiple objects can be transferred in the same operation.
■■ If the destination objects do not already exist, they can be created by the process.
■■ The SSIS package created by the wizard can be saved and reused.
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. You need to move data from a production database into a
testing database. You need
to extract the data from several objects in the source
database, but your manager
has asked you to only copy about 10 percent of the rows from
the largest production
tables. The testing database already exists, but without any
tables. How would you approach
this task?
A. Use the Import and Export Wizard, copy all tables from
the source database to the
empty destination database, and delete the excess rows from
the largest tables.
B. Use the Import and Export Wizard multiple times—once for
all the smaller tables,
and once for each large table, using the Write A Query To
Specify The Data To
Transfer option to restrict the rows.
C. Use the Import and Export Wizard, copy all tables from
the source database to the
empty destination database, and restrict the number of rows
for each large table
by using the Edit SQL option in the Column Mappings window.
D. Use the Import and Export Wizard, configure it to copy
all tables from the source
database to the empty destination database, save the SSIS
package, and then,
before executing it, edit it by using SSDT to restrict the
number of rows extracted
from the large tables.
2. You need to move data from an operational database into a
data warehouse for the
very first time. The data warehouse has already been set up,
and it already contains
some reference data. You have just finished preparing views
in the operational database
that correspond to the dimension and fact tables of the data
warehouse. How
would you approach this task?
A. Use the Import and Export Wizard and copy data from the
dimension and fact
views in the operational database into the tables in the
data warehouse, by using
the Drop And Re-create The Destination Table option in the
Column Mappings
window for every non-empty destination table.
B. Use the Import and Export Wizard, configure it to copy
data from the dimension
and fact views in the operational database into the tables
in the data warehouse,
save the SSIS package, and then edit it by using SSDT to add
appropriate data
merging functionalities for all destination tables.
C. Use the Import and Export Wizard and copy data from the
dimension and fact
views in the operational database into the tables in the
data warehouse, by using
the Merge Data Into The Destination Table option in the
Column Mappings window
for every non-empty destination table.
D. Use SSDT instead of the Import and Export Wizard, because
the wizard lacks appropriate
data transformation and merging capabilities.
3. When SSIS packages are saved to DTSX files, what format
is used to store the SSIS
package definitions?
A. They are stored as binary files.
B. They are stored as plain text files.
C. They are stored as XML files.
D. They are stored as special Microsoft Word documents.
Lesson 2: Developing SSIS Packages in SSDT
Lesson Summary
■■ SSIS projects are developed by using SSDT, a specialized version of Visual Studio.
■■ SSDT provides the complete integrated development environment (IDE) required for efficient development of SSIS packages.
■■ The SSIS toolbox is context-aware and will allow you access to either control flow tasks or data flow components, depending on whether you are designing the control flow or the data flow.
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. Which statements best describe SQL Server Development
Tools (SSDT)? (Choose all
that apply.)
A. SSDT is an extension of the SQL Server Management Studio
that can be used to
create SSIS packages by means of a special wizard.
B. SSDT is a special edition of the SQL Server Management
Studio, designed to provide
an improved user experience to developers who are not
particularly familiar
with database administration.
C. SSDT is a special edition of Visual Studio, distributed
with SQL Server 2012, providing
a rich database development tool set.
D. SSDT is a new service in SQL Server 2012 that can be used
to perform SQL Server
maintenance tasks, such as data movements and similar data
management processes.
2. Which of the following statements about simple and
complex data movements are
true? (Choose all that apply.)
A. Simple data movements only have a single data source and
a single data
destination.
B. Complex data movements require data to be transformed
before it can be stored
at the destination.
C. In simple data movements, data transformations are
limited to data type
conversion.
D. In complex data movements, additional programmatic logic
is required to merge
source data with destination data.
3. Which of the following statements are true? (Choose all
that apply.)
A. An SSIS package can contain one or more SSDT solutions,
each performing a
specific data management operation.
B. An SSIS project can contain one or more SSIS packages.
C. An SSIS project can contain exactly one SSIS package.
D. SSIS packages contain programmatic logic used in data
movements and data
transformation operations.
Lesson 3: Introducing Control Flow, Data Flow, and Connection Managers
Lesson Summary
■■ Existing SSIS packages can be added to SSIS projects in SQL Server Data Tools (SSDT).
■■ Control flows contain the definitions of data management operations.
■■ Control flows determine the order and the conditions of execution.
■■ SSIS package settings can be parameterized, which allows them to be changed without direct access to SSIS package definitions.
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. The Execute SQL Task allows you to execute SQL statements
and commands against
the data store. What tools do you have at your disposal when
developing SSIS packages
to develop and test a SQL command? Choose all that apply.
A. SQL Server Management Studio (SSMS)
B. SQL Server Data Tools (SSDT)
C. The Execute SQL Task Editor
D. SQL Server Enterprise Manager (SSEM)
2. You need to execute two data flow operations in parallel
after an Execute SQL Task has
been completed. How can you achieve that? (Choose all that
apply.)
A. There is no way for two data flow operations to be
executed in parallel in the same
SSIS package.
B. You can place both data flows inside the same data flow
task and create a precedence
constraint leading from the preceding Execute SQL Task to
the data flow task.
C. You can create two separate data flow tasks and create
two precedence constraints
leading from the preceding Execute SQL Task to each of the
two data flow tasks.
D. You can create two separate data flow tasks, place them
inside a third data flow
task, and create a precedence constraint leading from the
preceding Execute SQL
Task to the third data flow task.
3. Which precedence constraint can you use to allow Task B
to execute after Task A even
if Task A has failed?
A. The failure precedence constraint, leading from Task A to
Task B.
B. The success precedence constraint, leading from Task A to
Task B.
C. The completion precedence constraint, leading from Task A
to Task B.
D. Use two precedence constraints—a success precedence
constraint, and a failure
precedence constraint, both leading from Task A to Task B.
Case Scenarios
In the following case scenarios, you apply what you’ve
learned about creating SSIS packages.
You can find the answers to these questions in the “Answers”
section at the end of this
chapter.
Case Scenario 1: Copying Production Data to Development
Your IT solution has been deployed to production, version
one is complete, and it is now time
to start the work on the next version. To keep the data in
the development and testing environment
as up to date as possible, your manager has asked you to
design a data movement
solution to be used on a regular basis to copy a subset of
production data from the production
data store into the development data store.
1. What method would you use to transfer the data on demand?
2. How would you maximize the reusability of the solution?
Case Scenario 2: Connection Manager Parameterization
Data warehousing maintenance solutions have outgrown your
company’s existing infrastructure,
so new servers had to be installed, and this time all data
warehousing applications will
use a dedicated network. In phase 1, all of your SSIS
solutions must be redeployed to new
servers, and the system administrator has decided that SSIS
projects deserve more network
bandwidth, so in phase 2 you will be allowed to dedicate a
greater share of the network
bandwidth to your data movement processes.
1. How much additional development work will you have to do
to complete phase 1?
2. What will you have to do to reconfigure all of the
connection managers to use larger
network packets for phase 2?
Suggested Practices
To help you successfully master the exam objectives
presented in this chapter, complete the
following tasks.
Use the Right Tool
If you can combine the benefits of early deployment with the
benefits of providing ultimate
stability for the entire data warehouse solution, you might
be able to achieve “the best of
both worlds.” For instance, early deployment might allow
other members of your team to
begin their work sooner (for example, if you deploy an
initial version of the data warehouse
before the model has been finalized, this might allow report
development tasks to start early).
And stabilizing the entire data warehouse solution has
obvious benefits (for example, if you
implement changes to the data warehouse model in stages,
this might allow the iterations in
report development to be as reasonable as possible).
■■ Practice 1 Develop an initial data movement by using the Import and Export Wizard, using views in the source data store to emulate data transformations.
■■ Practice 2 Modify the initial data movement—add proper data transformation logic as well as appropriate logic to merge new or modified data with existing data.
Account for the Differences Between Development and Production Environments
After a final version of a data warehousing solution has
been deployed to production, any
additional work on the current version, even if these
development activities could in fact be
reduced to “tweaking,” will eventually cause delays in the
development of the next version.
With good parameterization, the burden of “tweaking” an
existing solution is lifted from the
shoulders of the developer and is placed on the shoulders of
the administrator.
■■ Practice 1 Review your existing data movement solutions, and create a list of settings that could be beneficial to their maintenance in production.
■■ Practice 2 Create a list of elements in your data movement solution that could be parameterized, but due to their possible effect on the operational characteristics of the solution probably should not be parameterized.
Answers
This section contains answers to the lesson review questions
and solutions to the case scenarios
in this chapter.
Lesson 1
1. Correct Answers: B and D
A. Incorrect: Even though this might seem like the quickest
solution, it might only be
quick to develop. Copying a large amount of data from the
production environment
to a testing environment should be avoided, especially if
most of the data is
just going to be discarded from the destination database
afterward.
B. Correct: It might appear cumbersome to design several
SSIS packages for a single
data movement operation, but this approach will solve the
principal problem while
also following good data management practices, such as
avoiding unnecessary
data movements.
C. Incorrect: The Edit SQL option in the Column Mappings
window of the Import and
Export Wizard cannot be used to modify the data retrieval
query, only the destination
table definition.
D. Correct: An SSIS package created by the Import and Export
Wizard can be edited
by using SSDT.
2. Correct Answers: B and D
A. Incorrect: Dropping and re-creating tables cannot be used
to merge data.
B. Correct: You can use SSDT to add data merging
capabilities to an SSIS package
created by the Import and Export Wizard.
C. Incorrect: No such option exists in the Import and Export
Wizard.
D. Correct: You can use SSDT to design pretty much any kind
of data movement
processes, especially when you want complete control over
the operations needed
by the process, but keep in mind that designing SSIS
packages “from scratch” may
not be as time efficient as possible.
3. Correct Answer: C
A. Incorrect: SSIS package files are not stored in binary
format.
B. Incorrect: SSIS package files might appear as if they are
saved as plain text files,
but they are actually well-formed XML files.
C. Correct: SSIS package files are stored in XML format; the
DTSX file extension is
used for distinction.
D. Incorrect: SSIS packages are not Microsoft Word
documents.
Lesson 2
1. Correct Answer: C
A. Incorrect: SSDT is not an extension of SSMS. It is a
stand-alone application.
B. Incorrect: SSDT is not a special edition of SSMS. It is a
special edition of Visual
Studio.
C. Correct: SSDT is a special edition of Visual Studio, with
a complete database development
tool set.
D. Incorrect: SSDT is not a service.
2. Correct Answers: B and D
A. Incorrect: Simple data movements can have as many data
sources and as many
data destinations as needed.
B. Correct: Data transformations are present in complex data
movements.
C. Incorrect: Typically, in simple data movements, no
transformations are needed,
because the data is transferred unchanged. However, it is
possible to transform the
data at the source—such as by making retrieval queries or by
using views or other
similar techniques.
D. Correct: Additional programmatic logic to merge source
data with destination
data is present in complex data movements.
3. Correct Answers: B and D
A. Incorrect: SSIS packages cannot contain SSDT solutions.
B. Correct: An SSIS project can contain as many SSIS
packages as needed.
C. Incorrect: An SSIS project can contain more than a single
SSIS package.
D. Correct: SSIS packages contain the programmatic logic
used in data management
operations, such as data movements and data transformations.
Lesson 3
1. Correct Answers: A and B
A. Correct: SSMS provides all the necessary functionalities
to develop and test
SQL code.
B. Correct: SSDT does provide a query designer; it is
available from the Data menu,
under Transact-SQL Editor/New Query Connection.
Alternatively, the query editor
can also be started from the SQL Server Object Explorer by
right-clicking a database
node, and selecting New Query... from the shortcut menu.
C. Incorrect: The Execute SQL Task Editor is just a text box
into which you can type or
paste a SQL statement.
D. Incorrect: The Enterprise Manager was replaced with SSMS
in SQL Server 2005.
2. Correct Answers: B and C
A. Incorrect: Parallel data flow execution is supported.
B. Correct: You can place multiple data flow operations
inside the same data flow task.
C. Correct: You can, of course, place data flows in separate
data flow tasks, and you
can create multiple precedence constraints leading from or
to the same task, as
long as any two tasks are connected to each other only once.
D. Incorrect: You cannot place a data flow task inside a
data flow task, because it
cannot contain tasks, only data flow components.
3. Correct Answer: C
A. Incorrect: The failure precedence constraint will allow
Task B to execute only if
Task A has failed.
B. Incorrect: The success precedence constraint will prevent
Task B from executing
if Task A fails.
C. Correct: The completion precedence constraint will allow
Task B to execute
regardless of whether Task A has succeeded or has failed.
D. Incorrect: Only a single precedence constraint can be
used to connect two
distinct tasks.
Case Scenario 1
1. An SSIS package stored in the file system, in the
database, or in an unscheduled SQL
Server Agent Job would be appropriate.
2. At the very least, the SSIS package would have to be
parameterized so that it can be
configured appropriately for the specific environment in
which it is going to be used.
Additionally, the programmatic logic should account for
merging new or modified
data with existing data.
Case Scenario 2
1. A properly parameterized SSIS package can be redeployed
and reconfigured as many
times as needed, without the need for any additional
development activities.
2. The network packet size property of OLE DB connections is not exposed to parameterization; therefore, the entire connection string would have to be parameterized.
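The parameter for phase 2 would therefore carry a complete OLE DB connection string; the Packet Size keyword is where the larger network packets are requested (server, catalog, and value below are illustrative):

Provider=SQLNCLI11.1;Data Source=NewDWServer;Initial Catalog=MyDW;Integrated Security=SSPI;Packet Size=32767;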
Chapter 4: Designing and Implementing Control Flow
Lesson 1: Connection Managers
Lesson Summary
■■ Connection managers are used to establish connections to data sources.
■■ Different data sources require different types of connection managers.
■■ The usability of a connection manager within an SSIS project or an SSIS package is determined by its scope.
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. You need to extract data from delimited text files. What
connection manager type
would you choose?
A. A Flat File connection manager
B. An OLE DB connection manager
C. An ADO.NET connection manager
D. A File connection manager
2. Some of the data your company processes is sent in from
partners via email.
How would you configure an SMTP connection manager to
extract files from email
messages?
A. In the SMTP connection manager, configure the
OperationMode setting to Send
And Receive.
B. It is not possible to use the SMTP connection manager in
this way, because it can
only be used by SSIS to send email messages.
C. The SMTP connection manager supports sending and
receiving email messages by
default, so no additional configuration is necessary.
D. It is not possible to use the SMTP connection manager for
this; use the IMAP (Internet
Message Access Protocol) connection manager instead.
3. You need to extract data from a table in a SQL Server
2012 database. What connection
manager types can you use? (Choose all that apply.)
A. An ODBC connection manager
B. An OLE DB connection manager
C. A File connection manager
D. An ADO.NET connection manager
Lesson 2: Control Flow Tasks and Containers
Lesson Summary
■■ A rich collection of tasks supporting the most common data management operations is provided by the SSIS design model.
■■ Control flow is defined by precedence constraints that determine the order and conditions of execution.
■■ Tasks representing logical units of work can be grouped in containers.
■■ Loop containers allow a unit of work to be executed repeatedly.
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. In your SSIS solution, you need to load a large set of
rows into the database as quickly
as possible. The rows are stored in a delimited text file,
and only one source column
needs its data type converted from String (used by the
source column) to Decimal
(used by the destination column). What control flow task
would be most suitable for
this operation?
A. The File System task would be perfect in this case,
because it can read data from
files and can be configured to handle data type conversions.
B. The Bulk Insert task would be the most appropriate,
because it is the quickest and
can handle data type conversions.
C. The data flow task would have to be used, because the
data needs to be transformed
before it can be loaded into the table.
D. No single control flow task can be used for this
operation, because the data needs
to be extracted from the source file, transformed, and then
loaded into the destination
table. At least three different tasks would have to be
used—the Bulk Insert
task to load the data into a staging database, a Data
Conversion task to convert
the data appropriately, and finally, an Execute SQL task to
merge the transformed
data with existing destination data.
2. A part of your data consolidation process involves extracting
data from Excel workbooks.
Occasionally, the data contains errors that cannot be
corrected automatically.
How can you handle this problem by using SSIS?
A. Redirect the failed data flow task to an External Process
task, open the problematic
Excel file in Excel, and prompt the user to correct the file
before continuing the
data consolidation process.
B. Redirect the failed data flow task to a File System task
that moves the erroneous
file to a dedicated location where an information worker can
correct it later.
C. If the error cannot be corrected automatically, there is
no way for SSIS to continue
with the automated data consolidation process.
D. None of the answers above are correct. Due to Excel’s
strict data validation rules,
an Excel file cannot ever contain erroneous data.
3. In your ETL process, a few values need to be retrieved
from a database at run time,
based on another value available at run time, and they
cannot be retrieved as part of
any data flow task. Which task can you use in this case?
A. The Execute T-SQL Statement task
B. The Execute SQL task
C. The Expression task
D. The Execute Process task
Lesson 3: Precedence Constraints
Lesson Summary
■■ Precedence constraints determine the order of execution and the conditions that must be met for the process to either continue or stop.
■■ Precedence constraints can even be used to allow the process to recover from failures.
Lesson Review
Answer the following questions to test your knowledge of the
information in this lesson. You
can find the answers to these questions and explanations of
why each answer choice is correct
or incorrect in the “Answers” section at the end of this
chapter.
1. How is the order of execution, or the sequence of
operations, defined in an SSIS
package?
A. The SSIS run time engine determines the order of
execution automatically, based
on the type of operations, the available software and
hardware resources, and the
size of data.
B. The sequence is defined by using precedence constraints.
C. The sequence is defined by using the Sequence container.
D. The sequence is defined at design time by using
precedence constraints and
Sequence containers, but at run time the SSIS engine
executes the operations in
the order set by the most appropriate execution plan for
maximum performance.
2. How does the Failure constraint affect the execution
order?
A. The following task will only execute after the preceding
task has failed.
B. The following task will only execute if the preceding
task has not failed.
C. The following task will never execute, because this
constraint is only used at
design time.
D. The following task will execute regardless of whether the
preceding task has failed,
but an error will be written to the SSIS log.
3. In your ETL process, there are three external processes
that need to be executed in
sequence, but you do not want to stop execution if any of
them fails. Can this be
achieved by using precedence constraints? If so, which
precedence constraints can
be used?
A. No, this cannot be achieved just by using precedence
constraints.
B. Yes, this can be achieved by using completion precedence
constraints between the
first and the second and between the second and the third
Execute Process tasks,
and by using a success precedence constraint between the
third Execute Process
task and the following task.
C. Yes, this can be achieved by using completion precedence
constraints between
the first and the second, between the second and the third,
and between the third
Execute Process task and the following task.
D. Yes, this can be achieved by using failure precedence
constraints between the first
and the second, and between the second and the third Execute
Process tasks, and
by using a completion precedence constraint between the
third Execute Process
task and the following task.
Case Scenarios
In the following case scenarios, you apply what you’ve
learned about designing and implementing
control flow. You can find the answers to these questions in
the “Answers” section at
the end of this chapter.
Case Scenario 1: Creating a Cleanup Process
In your data management system, there are two data
warehouses (that is, in addition to
the operational data store); the principal data warehouse
contains all the data, including all
historical data, and the secondary data warehouse is used by
web applications, exposing your
data to customers, and should therefore only contain current
data. Your data warehousing
solution is already in place, moving data into both data
warehouses.
You have been tasked with creating an additional process to
determine which data is no
longer current and must therefore be removed from the
secondary data warehouse.
1. How will you determine which rows need to be removed?
2. What technique would you use to perform the removal as
efficiently as possible?
Case Scenario 2: Integrating External Processes
In its data management scenarios, your company uses a mix of
proprietary solutions that your
team has developed and third-party, off-the-shelf solutions
that to you, a seasoned developer,
appear just like black boxes—you can trust these solutions
to work as expected without
even a faint clue about how this is actually done.
In your data warehousing solution, you need to consolidate
data from a solution of your
own and from two diverse black boxes, one of which has a
special data extraction tool (a
stand-alone application) that retrieves data from the
internal store and saves it to files in the
file system, while the other one exposes its API layer,
providing access to its internal data
retrieval functionalities.
1. What functionalities does SSIS provide that can be used
to integrate such diverse solutions
into a single SSIS process?
2. How would you use the SQL Server platform to solve this
problem?
Suggested Practices
To help you successfully master the exam objectives
presented in this chapter, complete the
following tasks.
A Complete Data Movement Solution
Think about all the processes involved in preparing data to
be consumed by the analytical
tools used by your managers when determining company
strategy. The data must be consolidated,
cleansed, loaded into the data warehouse (which includes
transforming it appropriately
and merging it with existing data), and finally, it must be loaded
into the multidimensional SQL Server Analysis Services database.
All of these processes need to be performed automatically,
at predefined times, and provide
current and coherent results in the end. Otherwise, the
operations and—eventually—
even the future of the company might be at stake.