Thursday, November 13, 2014

Microsoft Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (SSIS)

Chapter 1 Data Warehouse Logical Design
Lesson 1: Introducing Star and Snowflake Schemas
Lesson Summary
■■ The Star schema is the most common design for a DW.
■■ The Snowflake schema is more appropriate for POC projects.
■■ You should also determine the granularity of fact tables, as well as auditing and lineage
needs.
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. Reporting from a Star schema is simpler than reporting from a normalized online
transactional processing (OLTP) schema. What are the reasons for wanting simpler
reporting? (Choose all that apply.)
A. A Star schema typically has fewer tables than a normalized schema. Therefore,
queries are simpler because they require fewer joins.
B. A Star schema has better support for numeric data types than a normalized relational
schema; therefore, it is easier to create aggregates.
C. There are specific Transact-SQL expressions that deal with Star schemas.
D. A Star schema is standardized and narrative; you can find the information you
need for a report quickly.
2. You are creating a quick POC project. Which schema is most suitable for this kind
of project?
A. Star schema
B. Normalized schema
C. Snowflake schema
D. XML schema
3. A Star schema has two types of tables. What are those two types? (Choose all that
apply.)
A. Lookup tables
B. Dimensions
C. Measures
D. Fact tables
Lesson 2: Designing Dimensions
Lesson Summary
■■ In a dimension, you have the following column types: keys, names, attributes, member
properties, translations, and lineage.
■■ Some attributes form natural hierarchies.
■■ There are standard solutions for the Slowly Changing Dimensions (SCD) problem.
Lesson Review
1. You implement a Type 2 solution for an SCD problem for a specific column. What do
you actually do when you get a changed value for the column from the source system?
A. Add a column for the previous value to the table. Move the current value of the
updated column to the new column. Update the current value with the new value
from the source system.
B. Insert a new row for the same dimension member with the new value for the
updated column. Use a surrogate key, because the business key is now duplicated.
Add a flag that denotes which row is current for a member.
C. Do nothing, because in a DW you maintain history; you do not update dimension
data.
D. Update the value of the column just as it was updated in the source system.
2. Which kind of column is not part of a dimension?
A. Attribute
B. Measure
C. Key
D. Member property
E. Name
3. How can you spot natural hierarchies in a Snowflake schema?
A. You need to analyze the content of the attributes of each dimension.
B. Lookup tables for each dimension provide natural hierarchies.
C. A Snowflake schema does not support hierarchies.
D. You should convert the Snowflake schema to the Star schema, and then you would
spot the natural hierarchies immediately.
Lesson 3: Designing Fact Tables
Lesson Summary
■■ Fact tables include measures, foreign keys, and possibly an additional primary key and
lineage columns.
■■ Measures can be additive, non-additive, or semi-additive.
■■ For many-to-many relationships, you can introduce an additional intermediate
dimension.
Lesson Review
1. Over which dimension can you not use the SUM aggregate function for semi-additive
measures?
A. Customer
B. Product
C. Date
D. Employee
2. Which measures would you expect to be non-additive? (Choose all that apply.)
A. Price
B. Debit
C. SalesAmount
D. DiscountPct
E. UnitBalance
3. Which kind of column is not part of a fact table?
A. Lineage
B. Measure
C. Key
D. Member property
Case Scenarios
In the following case scenarios, you apply what you’ve learned about Star and Snowflake
schemas, dimensions, and the additivity of measures. You can find the answers to these questions
in the “Answers” section at the end of this chapter.
Case Scenario 1: A Quick POC Project
You are hired to implement a quick POC data warehousing project. You have to prepare the
schema for sales data. Your customer's subject matter expert (SME) would like to analyze sales data over customers,
products, and time. Before creating a DW and tables, you need to make a couple of decisions
and answer a couple of questions:
1. What kind of schema would you use?
2. What would the dimensions of your schema be?
3. Do you expect additive measures only?
Case Scenario 2: Extending the POC Project
After you implemented the POC sales data warehouse in Case Scenario 1, your customer
was very satisfied. In fact, the business would like to extend the project to a real, long-term
data warehouse. However, when interviewing analysts, you also discovered some points of
dissatisfaction.
Interviews
Here’s a list of company personnel who expressed some dissatisfaction during their interviews,
along with their statements:
■■ Sales SME “I don’t see correct aggregates over regions for historical data.”
■■ DBA Who Creates Reports “My queries are still complicated, with many joins.”
You need to solve these issues.
Questions
1. How would you address the Sales SME issue?
2. What kind of schema would you implement for a long-term DW?
3. How would you address the DBA’s issue?
Suggested Practices
To help you successfully master the exam objectives presented in this chapter, complete the
following tasks.
Analyze the AdventureWorksDW2012 Database
Thoroughly
To understand all kinds of dimensions and fact tables, you should analyze the
AdventureWorksDW2012 sample database thoroughly. It contains examples of many of the
data warehousing problems you might encounter.
■■ Practice 1 Check all fact tables. Find all semi-additive measures.
■■ Practice 2 Find all hierarchies possible for the DimCustomer dimension. Include
attributes in the dimension and attributes in the lookup DimGeography table.

Check the SCD and Lineage in the AdventureWorksDW2012 Database
Although the AdventureWorksDW2012 database exemplifies many data warehousing
scenarios, not all possible problems are covered. You should check what is missing.
■■ Practice 1 Is there room for lineage information in all dimensions and fact tables?
How would you accommodate this information?
■■ Practice 2 Are there some important dimensions, such as those representing customers
and products, that are not prepared for a Type 2 SCD solution? How would you
prepare those dimensions for a Type 2 SCD solution?

Answers
This section contains answers to the lesson review questions and solutions to the case scenarios
in this chapter.
Lesson 1
1. Correct Answers: A and D
A. Correct: A Star schema typically has fewer tables than a normalized schema.
B. Incorrect: The support for data types depends on the database management
system, not on the schema.
C. Incorrect: There are no specific Transact-SQL expressions or commands for Star
schemas. However, there are some specific optimizations for Star schema queries.
D. Correct: The Star schema is a de facto standard for data warehouses. It is narrative;
the central table—the fact table—holds the measures, and the surrounding tables,
the dimensions, give context to those measures.
2. Correct Answer: C
A. Incorrect: The Star schema is more suitable for long-term DW projects.
B. Incorrect: A normalized schema is appropriate for OLTP LOB applications.
C. Correct: A Snowflake schema is appropriate for POC projects, because its dimensions
are normalized and thus closer to the normalized source schema.
D. Incorrect: An XML schema is used for validating XML documents, not for a DW.
3. Correct Answers: B and D
A. Incorrect: Lookup tables are involved in both Snowflake and normalized schemas.
B. Correct: Dimensions are part of a Star schema.
C. Incorrect: Measures are columns in a fact table, not tables by themselves.
D. Correct: A fact table is the central table of a Star schema.
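To make the "fewer joins" point from answer 1 concrete, here is a minimal sketch of a Star schema report query. All table names, column names, and data values are invented for illustration, and SQLite stands in for SQL Server; the shape of the query is the same in T-SQL.

```python
import sqlite3

# Toy Star schema: a central fact table with one join per dimension.
# All table, column, and data values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE DimDate     (DateKey INTEGER PRIMARY KEY, CalendarYear INTEGER);
CREATE TABLE FactSales   (CustomerKey INTEGER, DateKey INTEGER, SalesAmount REAL);

INSERT INTO DimCustomer VALUES (1, 'Contoso'), (2, 'Fabrikam');
INSERT INTO DimDate     VALUES (20240101, 2024), (20250101, 2025);
INSERT INTO FactSales   VALUES (1, 20240101, 100.0), (1, 20250101, 50.0),
                               (2, 20240101, 70.0);
""")

# A typical report query: each dimension is one join away from the fact table,
# so the query stays short even as more dimensions are added.
rows = conn.execute("""
    SELECT c.CustomerName, d.CalendarYear, SUM(f.SalesAmount) AS Total
    FROM FactSales AS f
    JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
    JOIN DimDate     AS d ON d.DateKey     = f.DateKey
    GROUP BY c.CustomerName, d.CalendarYear
    ORDER BY c.CustomerName, d.CalendarYear
""").fetchall()
print(rows)
```

In a normalized OLTP schema, the same report would typically need additional joins through lookup tables before reaching the descriptive columns.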
Lesson 2
1. Correct Answer: B
A. Incorrect: This is Type 3 SCD management.
B. Correct: This is how you handle changes when you implement a Type 2 SCD
solution.
C. Incorrect: Maintaining history does not mean that the content of a DW is static.
D. Incorrect: This is Type 1 SCD management.

2. Correct Answer: B
A. Incorrect: Attributes are part of dimensions.
B. Correct: Measures are part of fact tables.
C. Incorrect: Keys are part of dimensions.
D. Incorrect: Member properties are part of dimensions.
E. Incorrect: Name columns are part of dimensions.
3. Correct Answer: B
A. Incorrect: You need to analyze the attribute names and content in order to spot
the hierarchies in a Star schema.
B. Correct: Lookup tables for dimensions denote natural hierarchies in a Snowflake
schema.
C. Incorrect: A Snowflake schema supports hierarchies.
D. Incorrect: You do not need to convert a Snowflake to a Star schema to spot the
hierarchies.
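The Type 2 SCD behavior chosen in answer 1 of this lesson (expire the current row, insert a new row under a fresh surrogate key, and flag which row is current) can be sketched in a few lines of Python. All table, column, and key names are invented for illustration.

```python
# Minimal sketch of Type 2 SCD handling for a customer dimension.
# CustomerKey is the surrogate key; CustomerID is the business key.
dim_customer = [
    {"CustomerKey": 1, "CustomerID": "C001", "City": "Vienna", "IsCurrent": True},
]
next_surrogate_key = 2

def apply_type2_change(business_key, column, new_value):
    """On a changed source value: expire the current row and insert a new one."""
    global next_surrogate_key
    for row in dim_customer:
        if row["CustomerID"] == business_key and row["IsCurrent"]:
            if row[column] == new_value:
                return                        # nothing changed
            row["IsCurrent"] = False          # the old version is no longer current
            new_row = dict(row)
            new_row["CustomerKey"] = next_surrogate_key  # business key is now duplicated
            new_row[column] = new_value
            new_row["IsCurrent"] = True
            next_surrogate_key += 1
            dim_customer.append(new_row)
            return

apply_type2_change("C001", "City", "Graz")
print(dim_customer)
```

After the change, the dimension holds two rows for the same business key, and the IsCurrent flag denotes which one reports should use for current state.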
Lesson 3
1. Correct Answer: C
A. Incorrect: You can use SUM aggregate functions for semi-additive measures over
the Customer dimension.
B. Incorrect: You can use SUM aggregate functions for semi-additive measures over
the Product dimension.
C. Correct: You cannot use SUM aggregate functions for semi-additive measures
over the Date dimension.
D. Incorrect: You can use SUM aggregate functions for semi-additive measures over
the Employee dimension.
2. Correct Answers: A and D
A. Correct: Prices are not additive measures.
B. Incorrect: Debit is an additive measure.
C. Incorrect: Amounts are additive measures.
D. Correct: Discount percentages are not additive measures.
E. Incorrect: Balances are semi-additive measures.

3. Correct Answer: D
A. Incorrect: Lineage columns can be part of a fact table.
B. Incorrect: Measures are included in a fact table.
C. Incorrect: A fact table includes key columns.
D. Correct: Member property is a type of column in a dimension.
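Answer E above notes that balances are semi-additive: they can be summed across customers, but not across dates. A small sketch with invented data shows the distinction.

```python
# Semi-additive measure sketch: an account balance sums across customers,
# but across dates you take the closing balance, not the sum.
# All data values are invented for illustration.
balances = [
    # (date, customer, balance)
    ("2025-01-31", "A", 100), ("2025-01-31", "B", 40),
    ("2025-02-28", "A", 120), ("2025-02-28", "B", 30),
]

# Additive over the customer dimension: SUM per date is meaningful.
total_per_date = {}
for date, _cust, bal in balances:
    total_per_date[date] = total_per_date.get(date, 0) + bal

# Non-additive over the date dimension: summing 140 + 150 would be wrong.
# The balance for the whole period is the total on the last date.
closing_balance = total_per_date[max(total_per_date)]
print(total_per_date, closing_balance)
```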
Case Scenario 1
1. For a quick POC project, you should use the Snowflake schema.
2. You would have customer, product, and date dimensions.
3. No, you should expect some non-additive measures as well. For example, prices and
various percentages, such as discount percentage, are non-additive.
Case Scenario 2
1. You should implement a Type 2 solution for the slowly changing customer dimension.
2. For a long-term DW, you should choose a Star schema.
3. With Star schema design, you would address the DBA’s issue automatically.
Chapter 2 Implementing a Data Warehouse
Lesson 1: Implementing Dimensions and Fact Tables
Lesson Summary
■■ In this lesson, you learned about implementing a data warehouse.
■■ For a data warehouse database, you should use the Simple recovery model.
■■ When creating a database, allocate enough space for data files and log files to prevent
autogrowth of the files.
■■ Use surrogate keys in dimensions in which you expect SCD Type 2 changes.
■■ Use computed columns.
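The surrogate-key bullet above implies an autonumbering mechanism. In SQL Server that would be the IDENTITY property or a SEQUENCE object; as a rough, portable sketch, SQLite's INTEGER PRIMARY KEY behaves the same way when the key is omitted on INSERT. The table and column names below are invented.

```python
import sqlite3

# Autonumbering sketch: omit the surrogate key on INSERT and let the engine
# assign the next value, the way IDENTITY(1,1) would in SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimProduct (
        ProductKey INTEGER PRIMARY KEY,   -- plays the role of IDENTITY(1,1)
        ProductName TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO DimProduct (ProductName) VALUES ('Bike')")
conn.execute("INSERT INTO DimProduct (ProductName) VALUES ('Helmet')")
keys = [k for (k,) in conn.execute(
    "SELECT ProductKey FROM DimProduct ORDER BY ProductKey")]
print(keys)
```

A SEQUENCE object differs in that it is a standalone database object whose values can be shared across tables, which is convenient when one surrogate-key range must span several dimensions.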
Lesson Review
1. Which database objects and object properties can you use for autonumbering?
(Choose all that apply.)
A. IDENTITY property
B. SEQUENCE object
C. PRIMARY KEY constraint
D. CHECK constraint
2. What columns do you add to a table to support Type 2 SCD changes? (Choose all that
apply.)
A. Member properties
B. Current row flag
C. Lineage columns
D. Surrogate key
3. What is an inferred member?
A. A row in a fact table added during dimension load
B. A row with aggregated values
C. A row in a dimension added during fact table load
D. A computed column in a fact table
Lesson 2: Managing the Performance of a Data Warehouse
Lesson Summary
■■ In this lesson, you learned how to optimize data warehouse query performance.
■■ In a DW, you should not use many nonclustered indexes.
■■ Use small, integer surrogate columns for clustered primary keys.
■■ Use indexed views.
■■ Use columnstore indexes and exploit batch processing.
Lesson Review
1. Which types of data compression are supported by SQL Server? (Choose all that apply.)
A. Bitmap
B. Unicode
C. Row
D. Page
2. Which operators can benefit from batch processing? (Choose all that apply.)
A. Hash Join
B. Merge Join
C. Scan
D. Nested Loops Join
E. Filter
3. Why would you use indexed views? (Choose all that apply.)
A. To speed up queries that aggregate data
B. To speed up data load
C. To speed up selective queries
D. To speed up queries that involve multiple joins
Lesson 3: Loading and Auditing Loads
Lesson Summary

■■ Table partitioning is extremely useful for large fact tables with columnstore indexes.
■■ Partition switch is a metadata operation only if an index is aligned with its base table.
■■ You can add lineage information to your dimensions and fact tables to audit changes
to your DW on a row level.
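One detail behind the partition-switch bullet: the table you switch in must carry a check constraint on the partitioning column that proves every row belongs to the target partition. A sketch of that constraint, with SQLite standing in for SQL Server and invented names:

```python
import sqlite3

# A staging table meant to be switched into the partition holding January 2025.
# The CHECK constraint is what lets the engine accept the switch as a
# metadata-only operation: no row can fall outside the target partition.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StageSales (
        DateKey INTEGER NOT NULL
            CHECK (DateKey BETWEEN 20250101 AND 20250131),
        SalesAmount REAL
    )
""")
conn.execute("INSERT INTO StageSales VALUES (20250115, 99.0)")  # fits the partition

try:
    conn.execute("INSERT INTO StageSales VALUES (20250201, 10.0)")  # outside range
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

In SQL Server the actual switch would then be `ALTER TABLE ... SWITCH TO ... PARTITION n`; the sketch only illustrates the constraint that makes it possible.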
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. The database object that maps partitions of a table to filegroups is called a(n)
A. Aligned index
B. Partition function
C. Partition column
D. Partition scheme
2. If you want to switch content from a nonpartitioned table to a partition of a partitioned
table, what conditions must the nonpartitioned table meet? (Choose all that apply.)
A. It must have the same constraints as the partitioned table.
B. It must have the same compression as the partitioned table.
C. It must be in a special PartitionedTables schema.
D. It must have a check constraint on the partitioning column that guarantees that all
of the data goes to exactly one partition of the partitioned table.
E. It must have the same indexes as the partitioned table.
3. Which of the following T-SQL functions is not very useful for capturing lineage
information?
A. APP_NAME()
B. USER_NAME()
C. DEVICE_STATUS()
D. SUSER_SNAME()
Case Scenarios
In the following case scenarios, you apply what you’ve learned about optimized querying and
securing a data warehouse. You can find the answers to these questions in the “Answers” section
at the end of this chapter.

Case Scenario 1: Slow DW Reports
You have created a data warehouse and populated it. End users have started using it for
reports. However, they have also begun to complain about the performance of the reports.
Some of the very slow reports calculate running totals. You need to answer the following
questions.
1. What changes can you implement in your DW to speed up the reports?
2. Does it make sense to check the source queries of the reports with running totals?
Case Scenario 2: DW Administration Problems
Your end users are happy with the DW reporting performance. However, when talking with
a DBA, you were notified of potential problems. The DW transaction log grows by more than
10 GB per night. In addition, end users have started to create reports from staging tables, and
these reports show messy data. End users complain to the DBA that they cannot trust your
DW if they get such messy data in a report.
1. How can you address the runaway log problem?
2. What can you do to prevent end users from using the staging tables?
Suggested Practices
To help you successfully master the exam objectives presented in this chapter, complete the
following tasks.
Test Different Indexing Methods
For some queries, indexed views could be the best performance booster. For other queries,
columnstore indexes could be more appropriate. Still other queries would benefit from nonclustered
indexes on foreign keys.
■■ Practice 1 Write an aggregate query for Internet sales in the AdventureWorksDW2012
sample database. Create an appropriate indexed view and run the aggregate query.
Check the statistics IO and execution plan.
■■ Practice 2 Drop the indexed view and create a columnstore index. Run the query
and check the statistics IO and execution plan again.
■■ Practice 3 Drop the columnstore index and create nonclustered indexes on all foreign
keys of the fact table included in joins. Run the query and check the statistics IO
and execution plan again.
■■ Practice 4 In the DimCustomer dimension of the AdventureWorksDW2012 sample
database, there is a Suffix column. It is NULL for all rows but three. Create a filtered nonclustered
index on this column and test queries that read data from the DimCustomer
dimension using different filters. Check how the query performs when Suffix is NULL
and when Suffix is known (is not NULL).

Test Table Partitioning
In order to understand table partitioning thoroughly, you should test it with aligned and
nonaligned indexes.
■■ Practice 1 Partition the FactInternetSales table in the AdventureWorksDW2012 sample
database. Create aligned nonclustered indexes on all foreign keys of the fact table
included in joins of the query from the previous practice. Run the query and check the
execution plan.
■■ Practice 2 Create nonaligned nonclustered indexes on all foreign keys of the fact
table included in joins of the query from the previous practice. Run the query and
check the execution plan again.

Answers
This section contains answers to the lesson review questions and solutions to the case scenarios
in this chapter.
Lesson 1
1. Correct Answers: A and B
A. Correct: The IDENTITY property autonumbers rows.
B. Correct: You can use the new SQL Server 2012 SEQUENCE object for
autonumbering.
C. Incorrect: Primary keys are used to uniquely identify rows, not for
autonumbering.
D. Incorrect: Check constraints are used to enforce data integrity, not for
autonumbering.
2. Correct Answers: B and D
A. Incorrect: Member properties are dimension columns used for additional information
on reports only.
B. Correct: You need a current flag for denoting the current row when you implement
Type 2 SCD changes.
C. Incorrect: Lineage columns are used, as their name states, to track the lineage
information.
D. Correct: You need a new, surrogate key when you implement Type 2 SCD changes.
3. Correct Answer: C
A. Incorrect: You do not add rows to a fact table during dimension load.
B. Incorrect: You do not create rows with aggregated values.
C. Correct: A row in a dimension added during fact table load is called an inferred
member.
D. Incorrect: A computed column is just a computed column, not an inferred member.
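Answer 3's inferred member can be sketched as follows: during the fact load, an unknown business key triggers insertion of a placeholder dimension row whose attributes are filled in later, when the dimension load catches up. All names and data values are invented.

```python
# Inferred member sketch: the fact load meets a business key the dimension
# does not know yet, so a placeholder dimension row is inserted first.
dim_customer = {"C001": {"CustomerKey": 1, "Name": "Contoso", "Inferred": False}}
fact_sales = []
next_key = 2

def load_fact_row(customer_id, amount):
    global next_key
    if customer_id not in dim_customer:
        # Inferred member: only the key is known; attributes stay empty for now.
        dim_customer[customer_id] = {
            "CustomerKey": next_key, "Name": None, "Inferred": True}
        next_key += 1
    fact_sales.append((dim_customer[customer_id]["CustomerKey"], amount))

load_fact_row("C001", 100.0)
load_fact_row("C999", 25.0)   # unknown customer -> inferred member is created
print(dim_customer["C999"], fact_sales)
```

The Inferred flag lets the next dimension load recognize such rows and overwrite their attributes with real values rather than treating the change as a Type 2 update.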
Lesson 2
1. Correct Answers: B, C, and D
A. Incorrect: SQL Server does not support bitmap compression.
B. Correct: SQL Server supports Unicode compression. It is applied automatically
when you use either row or page compression.
C. Correct: SQL Server supports row compression.
D. Correct: SQL Server supports page compression.

2. Correct Answers: A, C, and E
A. Correct: Hash joins can use batch processing.
B. Incorrect: Merge joins do not use batch processing.
C. Correct: Scan operators can benefit from batch processing.
D. Incorrect: Nested loops joins do not use batch processing.
E. Correct: Filter operators use batch processing as well.
3. Correct Answers: A and D
A. Correct: Indexed views are especially useful for speeding up queries that aggregate
data.
B. Incorrect: As with any indexes, indexed views only slow down data load.
C. Incorrect: For selective queries, nonclustered indexes are more appropriate.
D. Correct: Indexed views can also speed up queries that perform multiple joins.
Lesson 3
1. Correct Answer: D
A. Incorrect: Aligned indexes are indexes with the same partitioning as their base
table.
B. Incorrect: The partition function does logical partitioning.
C. Incorrect: The partition column is the column used for partitioning.
D. Correct: The partition scheme does physical partitioning.
2. Correct Answers: A, B, D, and E
A. Correct: It must have the same constraints as the partitioned table.
B. Correct: It must have the same compression as the partitioned table.
C. Incorrect: There is no special schema for partitioned tables.
D. Correct: It must have a check constraint to guarantee that all data goes to a single
partition.
E. Correct: It must have the same indexes as the partitioned table.
3. Correct Answer: C
A. Incorrect: The APP_NAME() function can be useful for capturing lineage
information.
B. Incorrect: The USER_NAME() function can be useful for capturing lineage
information.
C. Correct: There is no DEVICE_STATUS() function in T-SQL.
D. Incorrect: The SUSER_SNAME() function can be useful for capturing lineage
information.
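The lineage functions above have rough Python analogues, which makes the idea easy to sketch. The mapping below (an environment lookup standing in for SUSER_SNAME(), the script name for APP_NAME()) is only an illustration, and the column names are invented.

```python
import datetime
import os
import sys

# Sketch of capturing lineage columns at load time. In T-SQL you would call
# APP_NAME(), USER_NAME(), or SUSER_SNAME(); these lookups are stand-ins.
def lineage_columns(batch_id):
    return {
        "BatchID": batch_id,
        "LoadedBy": os.environ.get("USER") or os.environ.get("USERNAME") or "unknown",
        "LoadedFrom": sys.argv[0] or "(console)",                  # ~ APP_NAME()
        "LoadedAt": datetime.datetime.now().isoformat(timespec="seconds"),
    }

# Attach the lineage columns to a fact row as it is loaded.
row = {"CustomerKey": 1, "SalesAmount": 100.0, **lineage_columns(batch_id=42)}
print(row)
```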

Case Scenario 1
1. You should consider using columnstore indexes, indexed views, data compression, and
table partitioning.
2. Yes, it is definitely worth checking the queries of the running totals reports. The queries
probably use joins or subqueries to calculate the running totals. Consider using
window functions for these calculations.
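The window-function suggestion in answer 2 can be sketched with a running total over invented sales data. SQLite stands in for SQL Server here; T-SQL supports the same SUM ... OVER syntax, and it typically outperforms the self-join or correlated-subquery formulations such reports often start with.

```python
import sqlite3

# Running total with a window function instead of a join or subquery.
# All table, column, and data values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (SaleDate TEXT, Amount REAL);
    INSERT INTO Sales VALUES ('2025-01-01', 10), ('2025-01-02', 20), ('2025-01-03', 5);
""")
running = conn.execute("""
    SELECT SaleDate,
           SUM(Amount) OVER (ORDER BY SaleDate
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS RunningTotal
    FROM Sales
    ORDER BY SaleDate
""").fetchall()
print(running)
```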
Case Scenario 2
1. You should check the DW database recovery model and change it to Simple. In addition,
you could use the DBCC SHRINKFILE command to shrink the transaction log to a
reasonable size.
2. End users apparently have permissions, at least the SELECT permission, on the staging
tables. Advise the DBA to revoke permissions from end users on staging tables. In
addition, to speed up the security administration, you should put all staging tables in a
separate schema, thus allowing the DBA to administer them as a group.
Chapter 3 Creating SSIS Packages
Lesson 1: Using the SQL Server Import and Export Wizard
Lesson Summary
■■ The SQL Server Import and Export Wizard can be used for simple data movement
operations.
■■ The wizard allows you to create the destination database.
■■ Multiple objects can be transferred in the same operation.
■■ If the destination objects do not already exist, they can be created by the process.
■■ The SSIS package created by the wizard can be saved and reused.
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. You need to move data from a production database into a testing database. You need
to extract the data from several objects in the source database, but your manager
has asked you to only copy about 10 percent of the rows from the largest production
tables. The testing database already exists, but without any tables. How would you approach
this task?
A. Use the Import and Export Wizard, copy all tables from the source database to the
empty destination database, and delete the excess rows from the largest tables.
B. Use the Import and Export Wizard multiple times—once for all the smaller tables,
and once for each large table, using the Write A Query To Specify The Data To
Transfer option to restrict the rows.

C. Use the Import and Export Wizard, copy all tables from the source database to the
empty destination database, and restrict the number of rows for each large table
by using the Edit SQL option in the Column Mappings window.
D. Use the Import and Export Wizard, configure it to copy all tables from the source
database to the empty destination database, save the SSIS package, and then,
before executing it, edit it by using SSDT to restrict the number of rows extracted
from the large tables.
2. You need to move data from an operational database into a data warehouse for the
very first time. The data warehouse has already been set up, and it already contains
some reference data. You have just finished preparing views in the operational database
that correspond to the dimension and fact tables of the data warehouse. How
would you approach this task?
A. Use the Import and Export Wizard and copy data from the dimension and fact
views in the operational database into the tables in the data warehouse, by using
the Drop And Re-create The Destination Table option in the Column Mappings
window for every non-empty destination table.
B. Use the Import and Export Wizard, configure it to copy data from the dimension
and fact views in the operational database into the tables in the data warehouse,
save the SSIS package, and then edit it by using SSDT to add appropriate data
merging functionalities for all destination tables.
C. Use the Import and Export Wizard and copy data from the dimension and fact
views in the operational database into the tables in the data warehouse, by using
the Merge Data Into The Destination Table option in the Column Mappings window
for every non-empty destination table.
D. Use SSDT instead of the Import and Export Wizard, because the wizard lacks appropriate
data transformation and merging capabilities.
3. When SSIS packages are saved to DTSX files, what format is used to store the SSIS
package definitions?
A. They are stored as binary files.
B. They are stored as plain text files.
C. They are stored as XML files.
D. They are stored as special Microsoft Word documents.
Lesson 2: Developing SSIS Packages in SSDT
Lesson Summary
■■ SSIS projects are developed by using SSDT, a specialized version of Visual Studio.
■■ SSDT provides the complete integrated development environment (IDE) required for
efficient development of SSIS packages.
■■ The SSIS toolbox is context-aware and will either allow you access to control flow tasks
or data flow components, depending on whether you are designing the control flow or
the data flow.

Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. Which statements best describe SQL Server Development Tools (SSDT)? (Choose all
that apply.)
A. SSDT is an extension of the SQL Server Management Studio that can be used to
create SSIS packages by means of a special wizard.
B. SSDT is a special edition of the SQL Server Management Studio, designed to provide
an improved user experience to developers who are not particularly familiar
with database administration.
C. SSDT is a special edition of Visual Studio, distributed with SQL Server 2012, providing
a rich database development tool set.
D. SSDT is a new service in SQL Server 2012 that can be used to perform SQL Server
maintenance tasks, such as data movements and similar data management processes.
2. Which of the following statements about simple and complex data movements are
true? (Choose all that apply.)
A. Simple data movements only have a single data source and a single data
destination.
B. Complex data movements require data to be transformed before it can be stored
at the destination.
C. In simple data movements, data transformations are limited to data type
conversion.
D. In complex data movements, additional programmatic logic is required to merge
source data with destination data.
3. Which of the following statements are true? (Choose all that apply.)
A. An SSIS package can contain one or more SSDT solutions, each performing a
specific data management operation.
B. An SSIS project can contain one or more SSIS packages.
C. An SSIS project can contain exactly one SSIS package.
D. SSIS packages contain programmatic logic used in data movements and data
transformation operations.
Lesson 3: Introducing Control Flow, Data Flow, and Connection Managers
Lesson Summary
■■ Existing SSIS packages can be added to SSIS projects in SQL Server Data Tools (SSDT).
■■ Control flows contain the definitions of data management operations.
■■ Control flows determine the order and the conditions of execution.
■■ SSIS package settings can be parameterized, which allows them to be changed without
direct access to SSIS package definitions.
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. The Execute SQL Task allows you to execute SQL statements and commands against
the data store. What tools do you have at your disposal when developing SSIS packages
to develop and test a SQL command? (Choose all that apply.)
A. SQL Server Management Studio (SSMS)
B. SQL Server Data Tools (SSDT)
C. The Execute SQL Task Editor
D. SQL Server Enterprise Manager (SSEM)
2. You need to execute two data flow operations in parallel after an Execute SQL Task has
been completed. How can you achieve that? (Choose all that apply.)
A. There is no way for two data flow operations to be executed in parallel in the same
SSIS package.
B. You can place both data flows inside the same data flow task and create a precedence
constraint leading from the preceding Execute SQL Task to the data flow task.
C. You can create two separate data flow tasks and create two precedence constraints
leading from the preceding Execute SQL Task to each of the two data flow tasks.
D. You can create two separate data flow tasks, place them inside a third data flow
task, and create a precedence constraint leading from the preceding Execute SQL
Task to the third data flow task.
3. Which precedence constraint can you use to allow Task B to execute after Task A even
if Task A has failed?
A. The failure precedence constraint, leading from Task A to Task B.
B. The success precedence constraint, leading from Task A to Task B.
C. The completion precedence constraint, leading from Task A to Task B.
D. Use two precedence constraints—a success precedence constraint, and a failure
precedence constraint, both leading from Task A to Task B.

Case Scenarios
In the following case scenarios, you apply what you’ve learned about creating SSIS packages.
You can find the answers to these questions in the “Answers” section at the end of this
chapter.
Case Scenario 1: Copying Production Data to Development
Your IT solution has been deployed to production, version one is complete, and it is now time
to start the work on the next version. To keep the data in the development and testing environment
as up to date as possible, your manager has asked you to design a data movement
solution to be used on a regular basis to copy a subset of production data from the production
data store into the development data store.
1. What method would you use to transfer the data on demand?
2. How would you maximize the reusability of the solution?
Case Scenario 2: Connection Manager Parameterization
Data warehousing maintenance solutions have outgrown your company’s existing infrastructure,
so new servers had to be installed, and this time all data warehousing applications will
use a dedicated network. In phase 1, all of your SSIS solutions must be redeployed to new
servers, and the system administrator has decided that SSIS projects deserve more network
bandwidth, so in phase 2 you will be allowed to dedicate a greater share of the network
bandwidth to your data movement processes.
1. How much additional development work will you have to do to complete phase 1?
2. What will you have to do to reconfigure all of the connection managers to use larger
network packets for phase 2?
Suggested Practices
To help you successfully master the exam objectives presented in this chapter, complete the
following tasks.
Use the Right Tool
If you can combine the benefits of early deployment with the benefits of providing ultimate
stability for the entire data warehouse solution, you might be able to achieve “the best of
both worlds.” For instance, early deployment might allow other members of your team to
begin their work sooner (for example, if you deploy an initial version of the data warehouse
before the model has been finalized, this might allow report development tasks to start early).

And stabilizing the entire data warehouse solution has obvious benefits (for example, if you
implement changes to the data warehouse model in stages, this can keep the number of
report development iterations to a reasonable minimum).
■■ Practice 1 Develop an initial data movement by using the Import and Export Wizard,
using views in the source data store to emulate data transformations.
■■ Practice 2 Modify the initial data movement—add proper data transformation logic
as well as appropriate logic to merge new or modified data with existing data.
Account for the Differences Between Development and
Production Environments
After a final version of a data warehousing solution has been deployed to production, any
additional work on the current version, even if these development activities could in fact be
reduced to “tweaking,” will eventually cause delays in the development of the next version.
With good parameterization, the burden of “tweaking” an existing solution is lifted from the
shoulders of the developer and is placed on the shoulders of the administrator.
■■ Practice 1 Review your existing data movement solutions, and create a list of settings
that could be beneficial to their maintenance in production.
■■ Practice 2 Create a list of elements in your data movement solution that could be
parameterized, but due to their possible effect on the operational characteristics of the
solution probably should not be parameterized.

Answers
This section contains answers to the lesson review questions and solutions to the case scenarios
in this chapter.
Lesson 1
1. Correct Answers: B and D
A. Incorrect: Even though this might seem like the quickest solution, it might only be
quick to develop. Copying a large amount of data from the production environment
to a testing environment should be avoided, especially if most of the data is
just going to be discarded from the destination database afterward.
B. Correct: It might appear cumbersome to design several SSIS packages for a single
data movement operation, but this approach will solve the principal problem while
also following good data management practices, such as avoiding unnecessary
data movements.
C. Incorrect: The Edit SQL option in the Column Mappings window of the Import and
Export Wizard cannot be used to modify the data retrieval query, only the destination
table definition.
D. Correct: An SSIS package created by the Import and Export Wizard can be edited
by using SSDT.
2. Correct Answers: B and D
A. Incorrect: Dropping and re-creating tables cannot be used to merge data.
B. Correct: You can use SSDT to add data merging capabilities to an SSIS package
created by the Import and Export Wizard.
C. Incorrect: No such option exists in the Import and Export Wizard.
D. Correct: You can use SSDT to design practically any kind of data movement
process, especially when you want complete control over the operations the process
requires; keep in mind, however, that designing SSIS packages "from scratch" may
not be the most time-efficient approach.
3. Correct Answer: C
A. Incorrect: SSIS package files are not stored in binary format.
B. Incorrect: SSIS package files might appear as if they are saved as plain text files,
but they are actually well-formed XML files.
C. Correct: SSIS package files are stored in XML format; the DTSX file extension is
used for distinction.
D. Incorrect: SSIS packages are not Microsoft Word documents.

Lesson 2
1. Correct Answer: C
A. Incorrect: SSDT is not an extension of SSMS. It is a stand-alone application.
B. Incorrect: SSDT is not a special edition of SSMS. It is a special edition of Visual
Studio.
C. Correct: SSDT is a special edition of Visual Studio, with a complete database development
tool set.
D. Incorrect: SSDT is not a service.
2. Correct Answers: B and D
A. Incorrect: Simple data movements can have as many data sources and as many
data destinations as needed.
B. Correct: Data transformations are present in complex data movements.
C. Incorrect: Typically, in simple data movements, no transformations are needed,
because the data is transferred unchanged. However, it is possible to transform the
data at the source—such as by making retrieval queries or by using views or other
similar techniques.
D. Correct: Additional programmatic logic to merge source data with destination
data is present in complex data movements.
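The idea behind answer C above (transforming the data at the source, inside the retrieval query itself, so that the movement downstream remains a simple copy) can be sketched outside of SSIS. The following minimal Python sketch uses sqlite3 as a stand-in for the source data store; the table and column names are made up for illustration:

```python
import sqlite3

# Hypothetical source table; in SSIS, the equivalent would be a view or a
# retrieval query defined in the source data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (code TEXT, amount TEXT)")
conn.execute("INSERT INTO src VALUES (' ab ', '10.5'), ('cd', '2')")

# The transformation happens at the source, in the retrieval query itself:
# trim, uppercase, and cast, so the data arrives already in its target shape.
rows = conn.execute(
    "SELECT UPPER(TRIM(code)), CAST(amount AS REAL) FROM src"
).fetchall()
print(rows)  # [('AB', 10.5), ('CD', 2.0)]
```

Because the transformation is expressed in the query, the package that moves the data needs no transformation components of its own.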
3. Correct Answers: B and D
A. Incorrect: SSIS packages cannot contain SSDT solutions.
B. Correct: An SSIS project can contain as many SSIS packages as needed.
C. Incorrect: An SSIS project can contain more than a single SSIS package.
D. Correct: SSIS packages contain the programmatic logic used in data management
operations, such as data movements and data transformations.
Lesson 3
1. Correct Answers: A and B
A. Correct: SSMS provides all the necessary functionalities to develop and test
SQL code.
B. Correct: SSDT does provide a query designer; it is available from the Data menu,
under Transact-SQL Editor/New Query Connection. Alternatively, the query editor
can also be started from the SQL Server Object Explorer by right-clicking a database
node, and selecting New Query... from the shortcut menu.
C. Incorrect: The Execute SQL Task Editor is just a text box into which you can type or
paste a SQL statement.
D. Incorrect: The Enterprise Manager was replaced with SSMS in SQL Server 2005.

2. Correct Answers: B and C
A. Incorrect: Parallel data flow execution is supported.
B. Correct: You can place multiple data flow operations inside the same data flow task.
C. Correct: You can, of course, place data flows in separate data flow tasks, and you
can create multiple precedence constraints leading from or to the same task, as
long as any two tasks are connected to each other only once.
D. Incorrect: You cannot place a data flow task inside a data flow task, because it
cannot contain tasks, only data flow components.
3. Correct Answer: C
A. Incorrect: The failure precedence constraint will allow Task B to execute only if
Task A has failed.
B. Incorrect: The success precedence constraint will prevent Task B from executing
if Task A fails.
C. Correct: The completion precedence constraint will allow Task B to execute
regardless of whether Task A has succeeded or has failed.
D. Incorrect: Only a single precedence constraint can be used to connect two
distinct tasks.
Case Scenario 1
1. An SSIS package stored in the file system, in the database, or in an unscheduled SQL
Server Agent Job would be appropriate.
2. At the very least, the SSIS package would have to be parameterized so that it can be
configured appropriately for the specific environment in which it is going to be used.
Additionally, the programmatic logic should account for merging new or modified
data with existing data.
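The merging logic mentioned in the answer above (insert new rows, update modified ones) is commonly implemented as an upsert. This is a minimal sketch, assuming a hypothetical `dim_customer` table and using sqlite3 in place of the development data store; a real solution would read the source rows from production through an SSIS data flow:

```python
import sqlite3

# Hypothetical destination table in the development data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob')")

source_rows = [(2, "Robert"), (3, "Cid")]  # one modified row, one new row

# Merge logic: insert rows that are new, update rows that already exist.
conn.executemany(
    "INSERT INTO dim_customer (id, name) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
    source_rows,
)

merged = sorted(conn.execute("SELECT id, name FROM dim_customer"))
print(merged)  # [(1, 'Ann'), (2, 'Robert'), (3, 'Cid')]
```

On SQL Server, the same logic would typically be expressed with a T-SQL `MERGE` statement executed by the package.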
Case Scenario 2
1. A properly parameterized SSIS package can be redeployed and reconfigured as many
times as needed, without the need for any additional development activities.
2. The network packet size property of OLE DB connections is not exposed to parameterization;
therefore, the entire connection string would have to be parameterized.
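The reason parameterizing the whole connection string solves the problem is that the OLE DB `Packet Size` keyword travels inside the string. A minimal sketch (the server name, database name, and packet size value below are hypothetical):

```python
# Building an OLE DB-style connection string from individual settings; because
# "Packet Size" is just another keyword in the string, a package parameter
# holding the whole string lets an administrator change it without redeploying.
def build_connection_string(server: str, database: str, packet_size: int) -> str:
    return (
        f"Provider=SQLNCLI11;Data Source={server};"
        f"Initial Catalog={database};Integrated Security=SSPI;"
        f"Packet Size={packet_size};"
    )

conn_str = build_connection_string("DWSRV01", "DW", 32767)
print(conn_str)
```

In phase 2, only the parameter value changes; the package itself stays untouched.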
Designing and Implementing Control Flow
Lesson 1: Connection Managers
Lesson Summary
■■ Connection managers are used to establish connections to data sources.
■■ Different data sources require different types of connection managers.
■■ The usability of a connection manager within an SSIS project or an SSIS package is
determined by its scope.
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. You need to extract data from delimited text files. What connection manager type
would you choose?
A. A Flat File connection manager
B. An OLE DB connection manager
C. An ADO.NET connection manager
D. A File connection manager
2. Some of the data your company processes is sent in from partners via email.
How would you configure an SMTP connection manager to extract files from email
messages?
A. In the SMTP connection manager, configure the OperationMode setting to Send
And Receive.
B. It is not possible to use the SMTP connection manager in this way, because it can
only be used by SSIS to send email messages.
C. The SMTP connection manager supports sending and receiving email messages by
default, so no additional configuration is necessary.
D. It is not possible to use the SMTP connection manager for this; use the IMAP (Internet
Message Access Protocol) connection manager instead.
3. You need to extract data from a table in a SQL Server 2012 database. What connection
manager types can you use? (Choose all that apply.)
A. An ODBC connection manager
B. An OLE DB connection manager
C. A File connection manager
D. An ADO.NET connection manager
Lesson 2: Control Flow Tasks and Containers
Lesson Summary
■■ The SSIS design model provides a rich collection of tasks supporting the most common
data management operations.
■■ Control flow is defined by precedence constraints that determine the order and conditions
of execution.
■■ Tasks representing logical units of work can be grouped in containers.
■■ Loop containers allow a unit of work to be executed repeatedly.
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. In your SSIS solution, you need to load a large set of rows into the database as quickly
as possible. The rows are stored in a delimited text file, and only one source column
needs its data type converted from String (used by the source column) to Decimal
(used by the destination column). What control flow task would be most suitable for
this operation?
A. The File System task would be perfect in this case, because it can read data from
files and can be configured to handle data type conversions.

B. The Bulk Insert task would be the most appropriate, because it is the quickest and
can handle data type conversions.
C. The data flow task would have to be used, because the data needs to be transformed
before it can be loaded into the table.
D. No single control flow task can be used for this operation, because the data needs
to be extracted from the source file, transformed, and then loaded into the destination
table. At least three different tasks would have to be used—the Bulk Insert
task to load the data into a staging database, a Data Conversion task to convert
the data appropriately, and finally, an Execute SQL task to merge the transformed
data with existing destination data.
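The single String-to-Decimal conversion described in the question is the kind of transformation a data flow performs row by row while loading. A minimal Python sketch of the same idea, using `io.StringIO` as a stand-in for the delimited source file and made-up column names:

```python
import csv
import io
from decimal import Decimal

# Stand-in for the delimited text file (the Flat File source in a data flow).
flat_file = io.StringIO("OrderID;Amount\n1;19.99\n2;5.00\n")

reader = csv.DictReader(flat_file, delimiter=";")
# The one required transformation: convert Amount from String to Decimal
# as each row passes through, before it reaches the destination.
rows = [(int(r["OrderID"]), Decimal(r["Amount"])) for r in reader]
print(rows)
```

`Decimal` is used rather than `float` to mirror the exact-numeric destination column; a float would silently introduce binary rounding.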
2. A part of your data consolidation process involves extracting data from Excel workbooks.
Occasionally, the data contains errors that cannot be corrected automatically.
How can you handle this problem by using SSIS?
A. Redirect the failed data flow task to an External Process task, open the problematic
Excel file in Excel, and prompt the user to correct the file before continuing the
data consolidation process.
B. Redirect the failed data flow task to a File System task that moves the erroneous
file to a dedicated location where an information worker can correct it later.
C. If the error cannot be corrected automatically, there is no way for SSIS to continue
with the automated data consolidation process.
D. None of the answers above are correct. Due to Excel’s strict data validation rules,
an Excel file cannot ever contain erroneous data.
3. In your ETL process, a few values need to be retrieved from a database at run time,
based on another value available at run time, and they cannot be retrieved as part of
any data flow task. Which task can you use in this case?
A. The Execute T-SQL Statement task
B. The Execute SQL task
C. The Expression task
D. The Execute Process task
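The pattern in question 3 — retrieving a few values at run time, keyed on another run-time value — is what the Execute SQL task's parameter mapping and single-row result set binding provide. A minimal sketch with sqlite3 standing in for the database; the `config` table and its columns are hypothetical:

```python
import sqlite3

# Hypothetical configuration table; in SSIS, an Execute SQL task would map an
# input parameter and bind the single-row result set to package variables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE config (env TEXT, batch_size INTEGER, cutoff_days INTEGER)"
)
conn.executemany(
    "INSERT INTO config VALUES (?, ?, ?)",
    [("prod", 5000, 30), ("dev", 100, 7)],
)

environment = "dev"  # the driving value, known only at run time
batch_size, cutoff_days = conn.execute(
    "SELECT batch_size, cutoff_days FROM config WHERE env = ?",
    (environment,),
).fetchone()
print(batch_size, cutoff_days)  # 100 7
```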
Lesson 3: Precedence Constraints
Lesson Summary
■■ Precedence constraints determine the order of execution and the conditions that must
be met for the process to either continue or stop.
■■ Precedence constraints can even be used to allow the process to recover from failures.
Lesson Review
Answer the following questions to test your knowledge of the information in this lesson. You
can find the answers to these questions and explanations of why each answer choice is correct
or incorrect in the “Answers” section at the end of this chapter.
1. How is the order of execution, or the sequence of operations, defined in an SSIS
package?
A. The SSIS run time engine determines the order of execution automatically, based
on the type of operations, the available software and hardware resources, and the
size of data.
B. The sequence is defined by using precedence constraints.
C. The sequence is defined by using the Sequence container.
D. The sequence is defined at design time by using precedence constraints and
Sequence containers, but at run time the SSIS engine executes the operations in
the order set by the most appropriate execution plan for maximum performance.

2. How does the Failure constraint affect the execution order?
A. The following task will only execute after the preceding task has failed.
B. The following task will only execute if the preceding task has not failed.
C. The following task will never execute, because this constraint is only used at
design time.
D. The following task will execute regardless of whether the preceding task has failed,
but an error will be written to the SSIS log.
3. In your ETL process, there are three external processes that need to be executed in
sequence, but you do not want to stop execution if any of them fails. Can this be
achieved by using precedence constraints? If so, which precedence constraints can
be used?
A. No, this cannot be achieved just by using precedence constraints.
B. Yes, this can be achieved by using completion precedence constraints between the
first and the second and between the second and the third Execute Process tasks,
and by using a success precedence constraint between the third Execute Process
task and the following task.
C. Yes, this can be achieved by using completion precedence constraints between
the first and the second, between the second and the third, and between the third
Execute Process task and the following task.
D. Yes, this can be achieved by using failure precedence constraints between the first
and the second, and between the second and the third Execute Process tasks, and
by using a completion precedence constraint between the third Execute Process
task and the following task.
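The completion semantics described in question 3 — keep going through a sequence of external processes whether or not each one succeeds — can be sketched outside of SSIS. The three commands below are stand-ins launched with `subprocess`; the middle one fails on purpose:

```python
import subprocess
import sys

# Three stand-in "external processes"; the second exits with a failure code.
commands = [
    [sys.executable, "-c", "print('step 1')"],
    [sys.executable, "-c", "raise SystemExit(1)"],
    [sys.executable, "-c", "print('step 3')"],
]

return_codes = []
for cmd in commands:
    # Completion semantics: record the outcome and move on, success or not,
    # just as a completion precedence constraint lets the next task run.
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return_codes.append(completed.returncode)

print(return_codes)  # [0, 1, 0]
```

A success constraint, by contrast, would correspond to breaking out of the loop on the first nonzero return code.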
Case Scenarios
In the following case scenarios, you apply what you’ve learned about designing and implementing
control flow. You can find the answers to these questions in the “Answers” section at
the end of this chapter.
Case Scenario 1: Creating a Cleanup Process
In your data management system, there are two data warehouses (that is, in addition to
the operational data store); the principal data warehouse contains all the data, including all
historical data, and the secondary data warehouse is used by web applications, exposing your
data to customers, and should therefore only contain current data. Your data warehousing
solution is already in place, moving data into both data warehouses.

You have been tasked with creating an additional process to determine which data is no
longer current and must therefore be removed from the secondary data warehouse.
1. How will you determine which rows need to be removed?
2. What technique would you use to perform the removal as efficiently as possible?
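One way to think about the removal step is a set-based delete driven by whatever rule defines "current" data. The sketch below assumes a hypothetical `fact_sales` table and a made-up date cutoff, with sqlite3 standing in for the secondary data warehouse:

```python
import sqlite3

# Hypothetical table in the secondary data warehouse; the cutoff rule that
# defines "current" is also invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (id INTEGER, sale_date TEXT)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, "2014-01-05"), (2, "2014-10-20"), (3, "2014-11-01")],
)

cutoff = "2014-10-01"  # rows dated before this are no longer current
deleted = conn.execute(
    "DELETE FROM fact_sales WHERE sale_date < ?", (cutoff,)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(deleted, remaining)  # 1 2
```

A single set-based `DELETE` like this (possibly batched for very large tables) is generally far more efficient than removing rows one at a time.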
Case Scenario 2: Integrating External Processes
In its data management scenarios, your company uses a mix of proprietary solutions that your
team has developed and third-party, off-the-shelf solutions that, to you, a seasoned developer,
appear just like black boxes: you can trust these solutions to work as expected without
even a faint clue about how this is actually done.
In your data warehousing solution, you need to consolidate data from a solution of your
own and from two diverse black boxes, one of which has a special data extraction tool (a
stand-alone application) that retrieves data from the internal store and saves it to files in the
file system, while the other one exposes its API layer, providing access to its internal data
retrieval functionalities.
1. What functionalities does SSIS provide that can be used to integrate such diverse solutions
into a single SSIS process?
2. How would you use the SQL Server platform to solve this problem?
Suggested Practices
To help you successfully master the exam objectives presented in this chapter, complete the
following tasks.
A Complete Data Movement Solution
Think about all the processes involved in preparing data to be consumed by the analytical
tools used by your managers when determining company strategy. The data must be consolidated,
cleansed, loaded into the data warehouse (which includes transforming it appropriately
and merging it with existing data), and finally, it must be loaded into the multidimensional
SQL Server Analysis Services database.
All of these processes need to be performed automatically, at predefined times, and provide
current and coherent results in the end. Otherwise, the operations of the company, and
eventually even its future, might be at stake.
