Wednesday, September 10, 2014

Data flow reads data from source(s)

Data is pushed in a row-based pipeline

It optionally passes through one or more built-in or ad-hoc transformations

·         Streaming transformations improve scalability

Destination(s) write the data to disk, a database, …
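To make the pipeline idea concrete, here is a minimal sketch in Python. Python is only a stand-in here, since SSIS packages are built in a designer rather than written like this, and the names source, derived_column and destination, as well as the buffer size of 3 rows, are hypothetical. Rows travel in small buffers through generators, so each buffer is transformed and written before the next one is read. Memory use then depends on the buffer size rather than on the size of the source, which is the scalability argument for streaming transformations.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]


def source(rows: Iterable[Row], buffer_size: int = 3) -> Iterator[List[Row]]:
    """Hypothetical source: reads rows and hands them on in fixed-size buffers."""
    buffer: List[Row] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == buffer_size:
            yield buffer
            buffer = []
    if buffer:  # flush the last, partially filled buffer
        yield buffer


def derived_column(buffers: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Streaming transformation: adds a computed column to each buffer as it passes."""
    for buffer in buffers:
        for row in buffer:
            row["total"] = row["qty"] * row["price"]
        yield buffer


def destination(buffers: Iterable[List[Row]], target: List[Row]) -> int:
    """Hypothetical destination: appends each buffer to a list standing in for a table."""
    written = 0
    for buffer in buffers:
        target.extend(buffer)
        written += len(buffer)
    return written


orders = [{"qty": float(q), "price": 2.0} for q in range(1, 8)]
out: List[Row] = []
count = destination(derived_column(source(orders)), out)  # 7 rows, in buffers of 3, 3 and 1
```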

Control flow dictates the order in which tasks execute; the data flow is one of these tasks
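A rough way to picture the control flow, assuming tasks and their precedence constraints can be modeled as a dependency graph: each task lists the tasks that must finish before it, and the engine runs them in a valid order. The task names below are invented for illustration, and Python's graphlib (3.9+) stands in for the SSIS runtime; the data flow task is just one node in the graph.

```python
from graphlib import TopologicalSorter

# Hypothetical package: each task maps to the set of tasks that must finish first.
precedence = {
    "Truncate staging": set(),
    "Load staging (Data Flow Task)": {"Truncate staging"},
    "Update statistics": {"Load staging (Data Flow Task)"},
    "Send mail": {"Update statistics"},
}


def run_package(constraints):
    """Returns the tasks in an order that respects every precedence constraint."""
    executed = []
    for task in TopologicalSorter(constraints).static_order():
        executed.append(task)  # a real engine would execute the task here
    return executed


order = run_package(precedence)
```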

Non-Blocking Transformations (synchronous):
·         Process data on a row-by-row basis
·         Do not block the data flow in the pipeline
·         Data is not copied around; only pointers are passed on
·         Examples: Data Conversion, Derived Column, Copy Column, Multicast, Row Count, Lookup, Percentage Sampling

Partially Blocking Transformations (asynchronous):
·         Introduce new buffers in the memory layout
·         Transformed data is copied into the new buffers
·         Examples: Merge, Merge Join, Union All, etc.

Blocking Transformations (asynchronous):
·         Must see all data before passing on any rows
·         Block the data flow and can be heavy on memory
·         May also use "private buffers" to assist with transforming the data
·         Examples: Sort, Aggregate, Row Sampling
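The three categories can be illustrated with a small Python sketch. It is a simplified model, not how SSIS implements these components: copy_column, merge_sorted, sort_rows and tracked are hypothetical names, and heapq.merge and sorted merely stand in for Merge and Sort. The tracked wrapper records how many input rows each transformation has pulled by the time it emits its first output row. The synchronous transformation hands on the very same buffer object, the partially blocking merge needs only one row per input, and the blocking sort has to read everything first.

```python
import heapq
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]


def tracked(rows: Iterable[Row], log: List[Row]) -> Iterator[Row]:
    """Wraps an input so we can see how many rows a transformation has pulled."""
    for row in rows:
        log.append(row)
        yield row


def copy_column(buffers: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Non-blocking (synchronous): rows are changed in place and the same
    buffer object is passed on, so nothing is copied."""
    for buffer in buffers:
        for row in buffer:
            row["id_copy"] = row["id"]
        yield buffer


def merge_sorted(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Row]:
    """Partially blocking (asynchronous): emits rows into a new output stream as
    soon as the sort order allows, holding back at most one row per input."""
    return heapq.merge(left, right, key=lambda r: r["id"])


def sort_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Blocking (asynchronous): must read every input row before emitting one."""
    everything = sorted(rows, key=lambda r: r["id"])
    yield from everything


buffer = [{"id": 1}, {"id": 2}]
same_buffer = next(copy_column([buffer])) is buffer  # True: only a reference moved

left_log: List[Row] = []
right_log: List[Row] = []
merged = merge_sorted(tracked([{"id": 1}, {"id": 4}], left_log),
                      tracked([{"id": 2}, {"id": 3}], right_log))
first_merged = next(merged)
merge_pulled = len(left_log) + len(right_log)  # 2: one row from each input

sort_log: List[Row] = []
first_sorted = next(sort_rows(tracked([{"id": 3}, {"id": 1}, {"id": 2}], sort_log)))
sort_pulled = len(sort_log)  # 3: the whole input
```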
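So why is Percentage Sampling not blocking? It simply takes the same percentage of rows from each buffer, so it never needs to look beyond the current buffer and can therefore be synchronous. Row Sampling, by contrast, returns an exact number of rows chosen from the whole input, so it has to see all the data first.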
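The difference can be sketched in Python under two assumptions: the per-buffer selection rule (a fixed share of each buffer, picked at random) is a simplification, and reservoir sampling stands in for whatever algorithm Row Sampling really uses. The function names and the seed parameter are hypothetical. The percentage version emits a sample for each buffer as soon as it arrives, while the row version can only hand back its result after the last buffer has been read.

```python
import random
from typing import Iterable, Iterator, List


def percentage_sampling(buffers: Iterable[List[int]], percent: float,
                        seed: int = 0) -> Iterator[List[int]]:
    """Works buffer by buffer: each buffer's sample is emitted before the next is read."""
    rng = random.Random(seed)
    for buffer in buffers:
        k = round(len(buffer) * percent / 100)
        yield rng.sample(buffer, k)


def row_sampling(buffers: Iterable[List[int]], n: int, seed: int = 0) -> List[int]:
    """Reservoir sampling: the final n rows are only known after the whole input."""
    rng = random.Random(seed)
    reservoir: List[int] = []
    seen = 0
    for buffer in buffers:
        for row in buffer:
            seen += 1
            if len(reservoir) < n:
                reservoir.append(row)
            else:
                j = rng.randrange(seen)  # uniform in 0 .. seen - 1
                if j < n:
                    reservoir[j] = row
    return reservoir


buffers = [list(range(i, i + 10)) for i in range(0, 50, 10)]  # 5 buffers of 10 rows
sampled = list(percentage_sampling(buffers, 30))
sizes = [len(b) for b in sampled]  # [3, 3, 3, 3, 3]
sample = row_sampling(buffers, 5)
```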


