Sunday, 1 November 2015

Sequential File stage


It is one of the file stages. It can be used to read data from a file or to write data to a file. It supports a single input link or a single output link, as well as a reject link.
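The idea of an output link plus a reject link can be sketched outside DataStage. The Python below is only an analogy, not DataStage code: the function name `read_sequential`, the column names, and the rule "a row is rejected when its column count does not match" are assumptions made for illustration.

```python
import csv
import io


def read_sequential(source, columns, delimiter=","):
    """Read delimited rows; return (output_rows, rejected_rows).

    Rows whose field count matches `columns` go to the output "link";
    all other rows go to the reject "link".
    """
    output, rejects = [], []
    for row in csv.reader(source, delimiter=delimiter):
        if len(row) == len(columns):
            output.append(dict(zip(columns, row)))
        else:
            rejects.append(row)
    return output, rejects


# Hypothetical sample data: the second line is missing a field.
sample = io.StringIO("1,Asha\n2\n3,Ravi\n")
good_rows, bad_rows = read_sequential(sample, ["id", "name"])
```

Here `good_rows` holds the two well-formed records and `bad_rows` holds the malformed one, which is roughly what the reject link collects when reading a file.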

                                         



When I open the properties of the Sequential File stage:


Partitioning Techniques


With data partitioning, the data is split into several partitions that are distributed across the processors.
The data partitioning techniques are:
a) Auto
b) Hash
c) Modulus
d) Random
e) Range
f) Round Robin
g) Same

The default partition technique is Auto.

Round Robin:- The first record goes to the first processing node, the second record goes to the second processing node, and so on, returning to the first node after the last one. This method is useful for creating partitions of roughly equal size.
Hash:- Records with the same value in the hash-key field are sent to the same processing node.
Modulus:- The partition is chosen by taking the modulus of a numeric key column by the number of partitions. It is similar to hash partitioning.
Random:- Records are distributed randomly across all processing nodes.
Range:- Records whose key values fall within the same range are sent to the same node. The ranges are specified on a key column.

Auto:- This is the most common method. DataStage determines the best partitioning method to use depending on the type of stage.
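The keyed and keyless methods described above can be sketched in a few lines of Python. This is an illustration of the ideas only, not how DataStage implements them: the function names, the use of `zlib.crc32` as the hash, and the boundary convention in `range_partition` are all assumptions.

```python
import random
import zlib
from bisect import bisect_right


def round_robin(records, n):
    # Record i goes to partition i mod n, so sizes differ by at most one.
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def hash_partition(records, n, key):
    # Equal key values always hash to the same partition.
    # crc32 is a stand-in hash, chosen because it is stable across runs.
    parts = [[] for _ in range(n)]
    for rec in records:
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        parts[h % n].append(rec)
    return parts


def modulus_partition(records, n, key):
    # The key must be an integer column; partition = key mod n.
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[rec[key] % n].append(rec)
    return parts


def random_partition(records, n, seed=None):
    # Each record is sent to a partition picked at random.
    rng = random.Random(seed)
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[rng.randrange(n)].append(rec)
    return parts


def range_partition(records, boundaries, key):
    # `boundaries` is a sorted list of split points. Keys below the first
    # boundary go to partition 0; a key equal to a boundary goes to the
    # partition above it.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        parts[bisect_right(boundaries, rec[key])].append(rec)
    return parts


# Hypothetical sample records.
cities = ["Pune", "Delhi", "Pune", "Goa", "Delhi", "Pune"]
records = [{"id": i, "city": c} for i, c in enumerate(cities, start=1)]
by_city = hash_partition(records, 3, "city")
```

In `by_city`, all "Pune" records end up in one partition, which is the property that makes hash partitioning suitable before key-based operations such as joins and aggregations.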

Configuration File

It is a plain text file. It holds information about the processing and storage resources that are available during parallel job execution.

The default configuration file contains entries such as:

a) Node: a logical processing unit that performs the ETL operations.
b) Pools: a collection of nodes.
c) Fast Name: the server name. The ETL jobs are executed on the server with this name.
d) Resource disk: a permanent storage area where persistent data, such as data sets, is kept.
e) Resource Scratch disk: a temporary storage area where staging operations are performed.
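To show how these entries fit together, here is a small Python sketch that renders a two-node configuration as text. The layout is modelled on the usual shape of a parallel configuration file, but treat it as an approximation and compare it with the default file of your own installation. The `Node` class, the `render` function, the host name `etl-host`, and the paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                      # logical node name
    fastname: str                  # server (host) name the node runs on
    pools: list = field(default_factory=lambda: [""])  # "" is the default pool
    resource_disk: str = "/data/datasets"   # hypothetical permanent storage path
    scratch_disk: str = "/data/scratch"     # hypothetical temporary storage path


def render(nodes):
    """Build the configuration text for a list of Node objects."""
    lines = ["{"]
    for n in nodes:
        pools = " ".join(f'"{p}"' for p in n.pools)
        lines += [
            f'  node "{n.name}"',
            "  {",
            f'    fastname "{n.fastname}"',
            f"    pools {pools}",
            f'    resource disk "{n.resource_disk}" {{pools ""}}',
            f'    resource scratchdisk "{n.scratch_disk}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


config_text = render([Node("node1", "etl-host"), Node("node2", "etl-host")])
print(config_text)
```

Both nodes share the same fast name here, which corresponds to two logical nodes defined on a single server.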

Difference between server jobs and parallel jobs

Server jobs:-
a) They handle smaller volumes of data and perform well at that scale.
b) They have fewer components.
c) Data processing is slow compared with parallel jobs.
d) They work purely on SMP (Symmetric Multiprocessing).
e) They rely heavily on the Transformer stage.

Parallel jobs:-
a) They handle high volumes of data.
b) They work on parallel processing concepts.
c) They apply parallelism techniques.
d) They follow MPP (Massively Parallel Processing).
e) They have more components than server jobs.
f) They work on the Orchestrate framework.