Sunday, 1 November 2015

Sequential File stage



     It is one of the file stages; it can be used to read data from a file or write data to a file. It supports a single input link or a single output link, as well as a reject link.
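As a rough analogy, the stage's output-link/reject-link behavior can be sketched in plain Python (illustrative only, not DataStage code; the column-count check standing in for the stage's record schema is an assumption):

```python
import csv
import io

# Illustrative sketch: read a delimited "sequential file"; well-formed
# records go to the output link, malformed ones go to the reject link.
def read_sequential(text, n_cols, delimiter=","):
    output, reject = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # A record that does not match the expected column count is rejected.
        (output if len(row) == n_cols else reject).append(row)
    return output, reject

data = "1,apple\n2,banana\n3\n"
out, rej = read_sequential(data, n_cols=2)
print(len(out), len(rej))  # 2 1
```

The third record has only one column, so it lands on the "reject link" instead of the output.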

When you open the properties of the Sequential File stage, you can configure options such as the file path, the record format, and how rejected records are handled.


Partitioning Techniques



In this method, the data is split into multiple partitions that are distributed across the processing nodes.
     The data partitioning techniques are:
a)    Auto
b)    Hash
c)     Modulus
d)    Random
e)    Range
f)     Round Robin
g)    Same

The default partitioning technique is Auto.

Round Robin:- The first record goes to the first processing node, the second record to the second processing node, and so on. This method is useful for creating partitions of roughly equal size.
Hash:- Records with the same values in the hash-key field(s) are sent to the same processing node.
Modulus:- Partitioning is based on the modulus of a numeric key column (key value mod number of partitions). It is similar to hash partitioning but works only on a single integer key.
Random:- Records are distributed randomly across all processing nodes.
Range:- Related records are placed on the same node; partitions are defined by ranges of values of the key column.

Auto:- This is the most common method and the default. DataStage determines the best partitioning method to use depending on the type of stage.
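These methods are built into the DataStage engine, but the core idea behind round robin, hash, and modulus partitioning can be sketched in plain Python (an illustrative sketch only; the function names and record layout are assumptions, not DataStage code):

```python
# Illustrative sketches of three partitioning methods (not DataStage code).

def round_robin(records, n_parts):
    """Record i goes to partition i mod n, giving near-equal partition sizes."""
    parts = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts

def hash_partition(records, n_parts, key):
    """Records with the same key value always land in the same partition."""
    parts = [[] for _ in range(n_parts)]
    for rec in records:
        parts[hash(rec[key]) % n_parts].append(rec)
    return parts

def modulus_partition(records, n_parts, key):
    """Like hash, but uses the integer key value directly: key mod n."""
    parts = [[] for _ in range(n_parts)]
    for rec in records:
        parts[rec[key] % n_parts].append(rec)
    return parts

rows = [{"id": i, "dept": i % 3} for i in range(6)]
print([len(p) for p in round_robin(rows, 2)])  # [3, 3]
```

Note how round robin balances sizes regardless of the data, while hash and modulus keep related records (same key value) together on one node.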

Configuration File

It is a plain text file containing information about the processing and storage resources that are available for use during parallel job execution.
    
The default configuration file contains entries such as:

a) Node: a logical processing unit that performs all ETL operations.
b) Pools: a named collection of nodes.
c) Fastname: the server (host) name; ETL jobs are executed on the host identified by this name.
d) Resource disk: a permanent storage area where persistent data sets are stored.
e) Resource scratchdisk: a temporary storage area where staging operations (such as sorting and buffering) are performed.
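A minimal example of what such a configuration file looks like is shown below (the hostname and paths are placeholders; the actual values depend on your installation):

```
{
    node "node1"
    {
        fastname "etl_server"
        pools ""
        resource disk "/opt/IBM/datasets" {pools ""}
        resource scratchdisk "/opt/IBM/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etl_server"
        pools ""
        resource disk "/opt/IBM/datasets" {pools ""}
        resource scratchdisk "/opt/IBM/scratch" {pools ""}
    }
}
```

Each node entry defines one logical processing unit; adding more node entries increases the degree of parallelism without changing the job design.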

Difference between Server Jobs and Parallel Jobs

Server jobs:- 
a) They handle smaller volumes of data. 
b) They have fewer components. 
c) Data processing is slower. 
d) They work purely on SMP (Symmetric Multiprocessing). 
e) They rely heavily on the Transformer stage. 

Parallel jobs:- 
a) They handle high volumes of data. 
b) They work on parallel-processing concepts. 
c) They apply parallelism techniques (pipeline and partition parallelism). 
d) They follow MPP (Massively Parallel Processing). 
e) They have more components than server jobs. 
f) They work on the Orchestrate framework.

Tuesday, 22 September 2015

DataStage Components


1. DataStage Designer
2. DataStage Administrator
3. DataStage Director

1. DataStage Designer:
=> It is used to design jobs.
=> Most DataStage development activities are done here.
=> It is used to import and export projects and to view and edit the contents of the Repository.
=> A DataStage designer should know this part very well.


2. DataStage Administrator:
=> It is used to create projects.
=> It is used to delete projects.
=> It is used to set environment variables and to add user-defined environment variables.
=> These tasks are handled by the DataStage administrator.

3. DataStage Director:
=> It is used to run jobs.
=> It is used to schedule jobs.
=> These tasks are handled by the DataStage developer/operator.

What Can DataStage Do?



=> Design jobs for Extraction, Transformation, and Loading (ETL).

=> An ideal tool for data integration projects such as data warehouses, data marts, and system migrations.

=> Import, export, create, and manage metadata for use within jobs.

=> Schedule, run, and monitor jobs, all within DataStage.


About DataStage


=> It is a GUI-based tool.

=> It was introduced in 1997 under the name Data Integrator by a company called VMark, in the UK.

=> VMark introduced the product as a general-purpose ETL tool. The product was later renamed DataStage, and the company changed its name to Ascential; the product became known as Ascential DataStage.

=> Later, the product was developed further by integrating it with the Orchestrate framework and the MKS Toolkit. In 2005, IBM acquired DataStage, renamed it IBM DataStage, and fixed many bugs.

=> IBM is a brand well known around the world, so the product's market reach is huge. Informatica's ETL tool is a competitor of DataStage.

=> It is now known as IBM InfoSphere DataStage.

=> Versioning initially started from 5.0; several new versions have since been released, and IBM's latest release is version 11.5.