Thursday, October 16, 2014


SQL Server trigger execution precedence
Q: I created two update triggers for two columns in a table. When I run an UPDATE statement on this table, in which order will the triggers fire?
Answer 1)
--------------------------------------------------------------------------------
There is no guarantee as to what order they will fire in. However, you can use sp_settriggerorder to define the first and last trigger to execute; for the rest, the order is undefined. If precedence matters to you, put all the logic in one trigger.
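The first and last trigger for a given statement type can be pinned like this (the trigger name is illustrative):

```sql
-- Make trg_UpdateAudit fire first among the UPDATE triggers on its table.
-- Valid @order values are 'First', 'Last', and 'None'.
EXEC sp_settriggerorder
     @triggername = N'trg_UpdateAudit',
     @order       = N'First',
     @stmttype    = N'UPDATE';
```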

REPLACE function: In SQL Server 2005, REPLACE trims trailing spaces, whereas in SQL Server 2008 it preserves the space characters, so the same statement can return different results on the two versions.
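A quick way to see the difference is to compare a search pattern with and without a trailing space; per the behavior described above, SQL Server 2008 keeps the space and has to match it, while 2005 trimmed it away:

```sql
-- The first search pattern ends with a space. On SQL Server 2008 and later
-- the trailing space is preserved and must actually match; on 2005 it was trimmed.
DECLARE @s varchar(20);
SET @s = 'abc def';
SELECT REPLACE(@s, 'abc ', 'x') AS WithTrailingSpace,
       REPLACE(@s, 'abc', 'x')  AS WithoutTrailingSpace;
```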

Lookup cache modes:
  • Full cache mode: the comparison happens inside the SSIS engine itself, so it is case sensitive.
  • Partial cache and no cache modes: the comparison takes place in the T-SQL engine, so by default it is case insensitive.
If you need case-insensitive matching in full cache mode, normalize the case with a Derived Column transformation before the Lookup.

Checkpoint data is not saved for For Loop and Foreach Loop containers. Whenever the package restarts, the For Loop and Foreach Loop containers and their child containers run again. In other words, even if a child container inside a loop runs successfully, it is not recorded in the checkpoint file; instead it is rerun.
There is a workaround for this, as listed on the Microsoft website: put the loop inside a Sequence container. One big thing to remember is that checkpoints apply to control flow tasks only; if there is a failure inside a Data Flow task, the whole Data Flow task reruns. For example, suppose a Data Flow task has transferred 20,000 of 50,000 records to the destination when a general error occurs and the task fails. Will the data flow continue with the remaining records when the package restarts, or start from scratch? The answer is simple: the entire Data Flow task restarts from the beginning.

Q: The webcast mentions SSIS restartability, but not checkpointing. Why?

There are several reasons for this:
  • Checkpointing is not that straightforward to implement.
  • Checkpointing works well for only a limited number of scenarios. Looping or parallelism are difficult to handle.
  • Checkpoints are not that flexible.
  • Event handlers are problematic.
  • Checkpoints do not store variables of type Object.
SQL Server MVP Jamie Thomson explains his objections in the blog post Why I don't use SSIS checkpoint files (it's an old post, some links are broken). The discussion is also interesting to read.
Personally I prefer to create small modular packages. Each package only does a basic task that shouldn't take too long. For example, load a dimension. I won't create one single huge package that loads all dimensions of a data warehouse. Most of the time is usually spent in the data flow, which isn't covered by a checkpoint anyway. If I had an Execute SQL Task at the end of the package that takes an hour to run, I would most likely put it in a separate package so it can easily be restarted.
If you have a valid use case for using checkpoint files, you can check out the tip Integration Services Checkpoints to restart package from failure.
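For reference, enabling checkpoints involves three package-level properties plus one task-level property; the names below are as they appear in the SSIS property grid:

```
Package.SaveCheckpoints    = True
Package.CheckpointFileName = (path to the checkpoint file)
Package.CheckpointUsage    = IfExists
Task.FailPackageOnFailure  = True   (on each task that should be a restart point)
```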

Q. Would you recommend using “Check Points” in SSIS?
Based on my experience I would say "no": there are compatibility issues with various options, so using checkpoints may give unpredictable results. Checkpoints don't work properly when an SSIS package contains
  • Complex logic
  • Iterations/Loops
  • Transactions Enabled
  • “Object” type variables
  • Parallel execution
Checkpoints work fine when the package has a straightforward control flow with a single thread.

Q: What are the most efficient ways to run a large aggregate SQL Server Integration Services Job? I split my data flows to do aggregation, then handle the results separately.

Not exactly a question, but rather a statement on my best practice that aggregations (and sorting) should be done in the SQL Server database, not in the SSIS data flow. The reason for this is that SSIS uses blocking components to do these types of transformations which means they need to read all of the rows into memory before they can output even one single row.
Separating the flow into separate streams has certain merit, since you're applying a "divide and conquer" strategy. At a high level, this corresponds to the map-reduce approach of big data systems. However, this approach has two options: you either execute the different data flows in parallel, or you execute them one after another. The first option still has the issue that everything has to be loaded into memory at once. The second option has a performance impact, since each data flow has to wait for the previous one to finish. Both options still need sorted data. This might be resolved by reading from a table with an index supporting the sort order, since then the database engine can simply read the data in sorted order. You still have to mark the input as sorted in the advanced editor of the source component in the data flow. The tip Merge multiple data sources with SQL Server Integration Services shows how you can do this.

My opinion is that in most cases, the database engine is best suited to handle large set-based operations such as aggregating and sorting data. The data flow can handle small data sets that fit in memory without issues, but an SSIS package becomes less scalable for larger data loads, while SQL Server scales reasonably well. If the aggregation is too much even for SQL Server to handle, alternative options such as the map-reduce approach can be considered.
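Pushing the aggregation and the sort into the source query keeps the blocking Aggregate and Sort transformations out of the data flow entirely; the table and column names below are illustrative:

```sql
-- The database engine aggregates and sorts; SSIS just streams the result.
SELECT CustomerID,
       SUM(SalesAmount) AS TotalSales,
       COUNT(*)         AS OrderCount
FROM dbo.FactSales
GROUP BY CustomerID
ORDER BY CustomerID;   -- then mark IsSorted on the source's output
```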

Multicast vs. Conditional Split: both transformations accept one input and provide multiple outputs.
  • Multicast directs each row of the source to every output.
  • Conditional Split directs each row to a single output.

Derived Column vs. Copy Column: both transformations can add new columns.
  • Derived Column can add new columns without any help from existing columns; Copy Column can add new columns only from existing columns.
  • Derived Column can apply different transformations and data types to the new columns; Copy Column has no such options.
  • Derived Column supports error output, whereas Copy Column does not.

Project parameters vs. package parameters:
  • A project parameter can be shared among all of the packages in an SSIS project.
  • A package parameter is exactly the same as a project parameter, except that its scope is the individual package it resides in.
  • Use a project parameter when, at run time, the value is the same for all packages.
  • Use a package parameter when, at run time, the value is different for each package.
  • Project parameters do *not* have an expressions property to define their value.

Ex: For instance, if you want a single parameter containing a server name to be used by multiple packages, use a project parameter.

Note: Parameters can also pass values to all kinds of objects in SSIS – basically any property that allows an expression.
2. You can think of parameters as the replacement for the package configurations that were used in earlier versions of SSIS.
3. You can use a parameter anywhere in an expression by referring to it with the @[$Project::ParameterName] or @[$Package::ParameterName] syntax.
The evaluation order is similar to using a variable in an expression and then executing a package: the value is evaluated and assigned to the property during the validation phase.

Within a package, SSIS variables have an expression property. 

A variable can be scoped to the package or an object within the package (there’s no such thing as a project variable though). 

Variables often consume values from parameters within their expressions. Variables can also communicate values to objects like Execute SQL Tasks, Foreach Loop containers, Send Mail tasks, and so on.

Control flow defines the tasks used in performing data management operations; it determines the order in which these tasks are executed and the conditions of their execution.

Data flow: a special control flow task used specifically in data movement operations and data transformations.

Lineage columns: used for auditing and never exposed to end users.

My words: implement staging tables and other objects in a separate database.

Usage of staging tables: temporarily store source data before cleansing it or merging it with data from other sources.

Staging tables serve as an intermediate layer (between the DW and the source tables), from which the regular ETL process then starts.

DSA: the part of a DW that contains staging tables is called the data staging area (DSA).

Implementing fact tables: it is not necessary for all the foreign keys together to uniquely identify each row of a fact table.

Inferred member: a row added to a dimension during the fact table load is called an inferred member.

You can add lineage information to your dimensions and fact tables to audit changes to your DW at the row level (summary).

Tasks representing logical units of work can be grouped in containers.

SSIS does not support receiving email messages.

Design-time validation can be disabled for any SSIS object by setting its DelayValidation property to True.

 A failure constraint will allow the following task to begin if the preceding
task has encountered errors and failed.

The External Process tasks will be executed in sequence, and the process  will continue, even if one or more of them fail.

Use a Raw File destination if you have to temporarily store data to be used by SSIS

Use the Resolve References dialog box to solve mapping errors.

Which data flow transformation would you use if you had to combine data from two different database tables that exist on two different servers?

Variables and expressions can also be used to extend the elementary functionality of
precedence constraints based on information available at run time that usually is not available at design time.

Lookup: if more than one matching row exists in the reference dataset, only the first one is retrieved.
Merge Join: more rows are retrieved, because all matching rows are returned.
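The difference can be illustrated in T-SQL (the table names are made up): a join returns every match, while a Lookup behaves like picking at most one reference row per input row:

```sql
-- Merge Join semantics: every matching reference row is returned.
SELECT o.OrderID, d.Discount
FROM dbo.Orders AS o
JOIN dbo.Discounts AS d ON d.CustomerID = o.CustomerID;

-- Lookup semantics: at most one matching reference row per input row.
SELECT o.OrderID, x.Discount
FROM dbo.Orders AS o
OUTER APPLY (SELECT TOP (1) d.Discount
             FROM dbo.Discounts AS d
             WHERE d.CustomerID = o.CustomerID) AS x;
```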

The Merge transformation is similar to Union All, but with Merge, the sources have to be sorted and the sort position is preserved.

Is it possible to change the settings of an SSIS object at run time?

When are property expressions evaluated as a package is running?

Answer Unlike parameters that are read at the start of package execution, property expressions are updated when the property is accessed by the package during package execution.
A property expression can change the value of a property in the middle of package execution, so that the new value is read when the property is needed by the package.

The Recordset destination is a handy component to use when you do not need (nor want) to persist your data. One drawback of using the in-memory ADO recordset is that Integration Services does not provide any out-of-the-box method to get the row count in the dataset. This is when the Row Count transformation becomes invaluable to me.

Set-based updates: avoid row-by-row DELETE or UPDATE operations. This recommendation mostly concerns large fact tables, even though we may need a few updates or deletes on dimension tables.

Instead of the OLE DB Command transformation (which works row by row), use a staging table plus an Execute SQL Task.
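A sketch of the staging-table pattern, with illustrative table and column names: the data flow bulk-loads the changed rows into a staging table, and an Execute SQL Task then applies one set-based statement:

```sql
-- One set-based UPDATE instead of thousands of OLE DB Command calls.
UPDATE f
SET    f.SalesAmount = s.SalesAmount
FROM   dbo.FactSales AS f
JOIN   stg.FactSales AS s
  ON   s.OrderID = f.OrderID;
```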

Entry-point package: right-click a package in Solution Explorer and choose Entry-point Package. A common SSIS package design pattern is to use a parent (entry-point) package to control the execution of several child packages.

SSIS 2012 improvements:

Flat File source: the flat file parser can now support ragged-right delimited files and embedded qualifiers.

Grouping: in the data flow, a group is like a Sequence container in the control flow, except that it exists only in the designer and does not affect the way the data flow actually runs. Groups can also be collapsed to simplify a complicated data flow.

Parameter configurations are stored within the Visual Studio project file (.dtproj). To save any changes made in the Manage Parameter Values dialog box, you must save the project. You can change variable values during a package execution, but you cannot change parameter values.
2. From a usage perspective, you usually use parameters to define the contract between a package and its caller, whereas you use variables primarily for exchanging information inside a package.

A parent package and a child package each require some information to perform their job, such as the name of the dimension table that should be updated.

Package parameters are an explicit way to pass this information; you can easily see what information is required when you open the package.

It is also easy for SSIS to validate whether the parameters were assigned values at run time.

What to configure:

2. Control flow: a task is executed based on a conditional expression on its precedence constraint; this expression might depend on a variable value that needs to be provided before package execution.
3. The SQL statement (which may need to change) of an Execute SQL Task.

SSIS variables support the Char, DBNull, and Object data types; SSIS parameters do not.

Indirect configuration using environment variables: the problem is that an XML configuration file path or a connection string breaks when packages are moved from system to system; the solution is to use an environment variable to configure those values.

Variable configuration: use the /SET option to override the property value of a variable.

My words: these are the properties the engine exposes while configuring, validating, and executing a package.

Integrating external processes: use an Execute Process task or a Script task with the appropriate business logic to control the execution of external applications. To improve reusability and simplify deployment, a custom task could also be developed to replace the Script task.

Project deployment model: with the introduction of project-scoped parameters and connection managers, it becomes possible for individual SSIS packages to share resources.

Variables: variables are not exposed outside of SSIS packages and cannot be accessed from other packages. This shortcoming is reduced with the introduction of parameters, which can be used to pass scalar values to child packages. On the other hand, variables were never the only method available for exchanging data between the elements of an SSIS solution. (Hint: consider files and staging tables.)


The default buffer size (the DefaultBufferSize property) can be changed, but it cannot be larger than 100 MB.

Breakpoints are a powerful design-time troubleshooting option. You can set a breakpoint on any control flow task.

 Data viewers enable you to monitor data passing through the pipeline at design time.

Data taps are only available when you are executing a package deployed to the SSISDB catalog.


