Friday, November 21, 2014

Prepared interview question on SSIS 2012

What do you need to set in order to use checkpoint properties at the package level, after you have turned the checkpoint properties on? 
Ans:You need to set the FailPackageOnFailure property to True for tasks to write to the checkpoint file. However, if you want to rerun any successful tasks that occur before the failure task, you need to use a sequence container around the group of related tasks that require transactions.
If you set the CheckpointUsage to Always, the checkpoint file must be present or the package will not start.
BEST PRACTICES Managing multiple tasks with transactions and restartability

To do this, you need to enable checkpoints in the package and then use a Sequence Container that holds all the associated tasks. The Sequence Container needs to have the transactions turned on by setting the TransactionOption to Required at the container level.

When a package has the restartability settings enabled and checkpoints set, it will manage the checkpoint file during its execution. The following processes happen behind the scenes for a package that has checkpoints set:
1.        When the package loads for execution, the checkpoint file is created.
2.        For every task or container that is configured for checkpoints, the checkpoint file is updated to include the latest runtime information such as variable values and the successful tasks.
3.        If the package fails, then the file remains in place so that the next time the package is executed, it will read the checkpoint file for execution. Alternatively, if you want the package to begin from the first task or container, the file can be manually deleted.
4.        If the package execution is successful, then the checkpoint file will be deleted automatically.
Note: not all systems support DTC, and a transaction cannot be forced on a non-compliant system.
How does SSIS Validation Works
SSIS Validation is the process of ensuring the package will execute successfully during runtime with its current property settings. There are two types of package validation available in SSIS.
• Design Time Validation (Early Validation) – Validates all of components and metadata of the package prior to executing the package
• Run Time Validation (Late validation) – Validates the individual components and metadata of the package prior to running the component
Designtime validation
Workoffline(Project level)(ssis2012)
when you open a package, takes place
When you execute a package, the SSIS design time and run time validation takes place. In essence, during run time the package is actually validated twice.

This will speed up package opening and eliminate and early validation while working in the BIDS designer.
On the down side, the Work Offline property prevents the following.
·         You cannot run or debug any packages while in Work Offline property = “TRUE”
·         Once you close and save the package, the Work Offline setting is set to default setting of “FALSE
·         This is a project level setting, not a package level setting
You can override the default Work Offline setting by hacking the dtsproj.user file in the project.

SSIS Validations Properties
let take a look at the properties and options to manage validation at a project, package and individual component level.
Two step validation process utilized by SSIS.
Package level validation
Individual component level
Delay Validation: The delay validation property is available at the package level and at the individual component level. The Delay Validation property setting is not available for individual data flow components.
Package level Delay validation
Individual component level delay validation
is only applicable at runtime, not during design time(true)
Ex:execute sql task, required data connection
True for executesql task then the data connection error is removed in SSDT designer,so it is applied at design time,if we set the delayvalidation=true.
I would not recommend setting early Delay Validation = “TRUE” at the package level under any circumstances.

I would only change Delay Validation setting at the component level under the following conditions
·         I have a large package which uses a data source multiple times and the connection is slow or is a remote connection.
·         I have a database object that is created during run time and is referenced in another downstream component. (i.e. creation of a temporary table)  I would turn off early validation at the downstream component level to allow the package to run and delay early validation
If I am dynamically setting properties of a data connection which is not available during the early validation property.
Validate External Metadata
The Validate External Metadata property is similar to the Delay validation property but is available only in individual Data Flow Components only. The default setting for the property = “False”. Changing the Validate External Metadata property to true eliminates early package validation but has no effect on run time validation in SSIS.
I would recommend change the Validate External Metadata from its default setting under the following conditions.
·         Getting data from a data flow source task using a flat file or excel  that does not exist at runtime
·         A source or destination tasks where the database components is not available at runtime
·         Work Offline (Connection Level)
·         SQL 2012 introduces the ability to Work Offline at the package level and connection level.
Package parameters allow you to modify package execution without having to edit and redeploy the package.
Package deployment model:
Step1: Deploy package (contains parameters using the package deployment model)
Step2: execute package
Package parameter
Project parameters
The parameters are not called during execution
If the package contains package parameters and expressions within the package use the parameters, the resulting values are applied at runtime.
If the package contains project parameters, the package execution may fail.

2. Typically, the principal difference between development and production environments is
in the configuration of data stores. In development, all data can reside on the same server
(even in the same database). In fact, because for development a subset of data is usually all
that is needed (or available) to the developer, all stored development data could easily be
placed on the developer’s personal computer. Therefore, you should account for the following
differences between the development and the production environments when developing
SSIS solutions:
■■ Connections:source &destinations
Data platforms:sql server versions diff on prod and dev
3. What is the principal difference between a success constraint and a completion constraint?
A success constraint will only allow the process to continue to the following task if the preceding task completed successfully, whereas a completion constraint will allow the process to continue as soon as the preceding task has completed, regardless of the outcome.
4. What is the difference between the Union All and the Merge transformation?
The Merge transformation is similar to Union All, but with Merge, the sources have to be sorted and the sort position is preserved.
5. What is the difference between the Lookup and Merge Join transformations?
The Lookup transformation does not need a sorted input; it is a non-blocking transformation, and in cases when more matching rows exist from the referenced dataset, only the first one will be retrieved. This means that the transformation will never retrieve more rows than exist in the input rows. With the Merge Join transformation more rows can be retrieved, because all matching data is retrieved.
6. There is an important difference between lineage and other columns:
lineage columns are never exposed to end users and are never shown on end users’ reports.
To summarize, a dimension may contain the following types of columns:
■■ Keys Used to identify entities
■■ Name columns Used for human names of entities
■■ Attributes Used for pivoting in analyses
■■ Member properties Used for labels in a report
■■ Lineage columns Used for auditing, and never exposed to end users

7. You should understand the differences between individual SSISDB logging levels in order to be able to choose the most appropriate one for a particular purpose. For instance, by turning logging off, you could slightly improve execution performance at the cost of losing all ability to detect execution errors. On the other hand, using verbose logging will provide a lot of information about the execution but will lead to less-than-optimal performance.

·         Quick Check

1. You add a sequence container to a package that contains several tasks, one of which calls a command on a legacy system and another of which a Data Flow Task imports data into SQL Server. Even with DTC started and transactions turned on, your sequence container fails before the tasks even run. What is the problem?

2. What do you need to set in order to use checkpoint properties at the package level, after you have turned the checkpoint properties on?

Quick Check Answers

1. The transactions featured in SSIS use the DTC service. However, not all systems support DTC, and a transaction cannot be forced on a non-compliant system, so the container will fail. You should remove the legacy task from the sequence container that has the transaction.

2. You need to set the FailPackageOnFailure property to True for tasks to write to the checkpoint file. However, if you want to rerun any successful tasks that occur before the failure task, you need to use a sequence container around the group of related tasks that require transactions.
·         Quick Check

1. A Data Conversion transformation is failing in the middle of the data flow execution, and you need to determine what is causing the error. How should you proceed?

Ans: . To determine what is causing the error, configure the Data Conversion transformation error path to Flat File so that any rows that are failing conversion are sent out to a file. Then, create a data viewer on the error path, and run the package in BIDS. This technique will capture the errors in a file and display the rows in the designer for troubleshooting.

2. Your package contains a string variable that you are updating, using a Script Task, to be a file path and file name. Your package is failing at a File System Task that is configured to use the variable to move the file to a different folder on the server. How do you troubleshoot the package? Ans: . Because the Script Task can contain embedded breakpoints in the code, set a breakpoint in the script so that you will be able to execute the package and step through the lines of code, observing the value of the variable to check the code and accuracy.
3. You would like to log all the package errors to a custom database table you have created for auditing purposes. How can you accomplish this task?
Ans: By using the OnError event handler assigned to the package level, you can also use an Execute SQL Task that calls a stored procedure, passing in the SourceName and ErrorDescription variable values. The procedure can then track these details into a metadata storage table for auditing.

No comments:

Post a Comment