A transformation changes data into the required format before it is loaded
into the destination or passed further down the data flow path.
Transformations fall into two categories: synchronous and asynchronous.
A transformation is called synchronous when it processes each incoming row
in place (modifying the data into the required format without changing the
layout of the result set) and passes it down the path. The output rows are
in sync with the input rows (a 1:1 relationship between input and output
rows), so the transformation reuses the buffers already allocated for its
input and needs no additional memory. Because they work row by row, these
transformations have lower memory requirements, generally run faster, and
do not block the data flow in the pipeline. Examples include the Lookup,
Derived Column, Data Conversion, Copy Column, Multicast, and Row Count
transformations.
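
The buffer-reuse idea can be illustrated with a small Python sketch. This
is only an analogy, not SSIS code: the `derived_column` function, the list
of dictionaries standing in for a pipeline buffer, and the `name` column
are all made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def derived_column(buffer: List[Row]) -> List[Row]:
    """Hypothetical synchronous-style step: rewrite each row in place."""
    for row in buffer:
        # One output row per input row; the row layout does not change.
        row["name"] = row["name"].strip().upper()
    # The same list object is handed downstream; no new buffer is created.
    return buffer


# A plain list stands in for the pipeline buffer (an assumption of this sketch).
buffer = [{"name": " ada "}, {"name": "alan"}]
result = derived_column(buffer)

print(result is buffer)  # True: the input buffer was reused
print(result)
```

Each row can be passed on as soon as it has been rewritten, which is why a
step like this never holds up the rows behind it.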
A transformation is called asynchronous when it requires all incoming rows
to be stored locally in memory before it can start producing output rows.
For example, an Aggregate transformation must read all of its input rows
into memory before it can aggregate them and produce output rows. The
input rows are therefore not in sync with the output rows, and more memory
is required because the output gets its own buffers instead of reusing the
input buffers. These transformations have higher memory requirements (with
a high chance of buffers spooling to disk if insufficient memory is
available) and generally run slower. Asynchronous transformations of this
kind are also called "blocking transformations" because they hold back
output rows until all input rows have been read into memory.
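
The blocking behaviour can be sketched in Python as well. Again, this is a
loose analogy rather than SSIS code: `aggregate_sum`, `source`, and the
sample rows are hypothetical, and the sketch keeps only per-group totals in
memory rather than modelling real pipeline buffers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]


def aggregate_sum(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical blocking step: sum amounts per key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in rows:
        # Every input row must be read before any total is final.
        totals[key] += amount
    # Output rows live in a new structure, separate from the input rows.
    for key, total in totals.items():
        yield key, total


consumed: List[Row] = []  # records which input rows have been read so far


def source() -> Iterator[Row]:
    for row in [("east", 10), ("west", 5), ("east", 7)]:
        consumed.append(row)
        yield row


out = aggregate_sum(source())
first = next(out)

print(len(consumed))  # 3: all input rows were read before the first output
print(first)          # ('east', 17)
```

Asking for just the first output row forces the whole input to be read,
and the three input rows collapse into two output rows, so there is no 1:1
relationship between input and output.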