Asynchronous
transformations, as discussed in last question, can be further divided in
two categories depending on their blocking behavior:
Partially Blocking Transformations do not
block the output until a full read of the inputs occur. However, they
require new buffers/memory to be allocated to store the newly created
result-set because the output from these kind of transformations differs
from the input set.
For example, Merge Join transformation joins two
sorted inputs and produces a merged output. In this case if you notice,
the data flow pipeline engine creates two input sets of memory, but the
merged output from the transformation requires another set of output
buffers as structure of the output rows which are different from the
input rows. It means the memory requirement for this type of
transformations is higher than synchronous transformations where the
transformation is completed in place.
Full Blocking Transformations, apart from
requiring an additional set of output buffers, also blocks the output
completely unless the whole input set is read.
For example, the Sort
Transformation requires all input rows to be available before it can
start sorting and pass down the rows to the output path. These kind of
transformations are most expensive and should be used only as needed. For
example, if you can get sorted data from the source system, use that
logic instead of using a Sort transformation to sort the data in
transit/memory
No comments :
Post a Comment