Surrogate Key Generation In Datastage Parallel Job

Apr 04, 2012 A server job runs on one node, whereas a parallel job runs on more than one node. Server jobs can also run on UNIX, and most major installations are on UNIX platforms. The major difference is in job architecture: server jobs process stages in sequence, one after another, while parallel jobs process them in parallel.
Nov 20, 2013 The Row Generator stage is a Development/Debug stage. It has no input links, and a single output link. The Row Generator stage produces a set of mock data fitting the specified meta data.
Jan 16, 2012 The Surrogate Key Generator stage is a processing stage that generates surrogate key columns and maintains the key source. A surrogate key is a unique primary key that is not derived from the data that it represents; therefore, changes to the data will not change the primary key. In a star schema database, surrogate keys are used to join a fact table to a dimension table.

Deleting the key source: To delete the surrogate key source, design a job that uses a Surrogate Key Generator stage by itself, with no input or output links.

Updating the state file: To update the state file, add a Surrogate Key Generator stage to a job with a single input link from another stage.

May 07, 2013 A parallel job has a surrogate key stage that creates unique IDs; however, it is limited in that it does not support conditional code, and it may be more efficient to add a counter to an existing transformer rather than add a new stage.
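
The "counter in an existing transformer" idea can be sketched outside DataStage. The formula below is a commonly used pattern built on the parallel Transformer system variables @INROWNUM, @PARTITIONNUM and @NUMPARTITIONS; it is not taken from this post, and the Python function names are hypothetical. It is a minimal simulation of the arithmetic only, not DataStage code.

```python
def transformer_counter_key(in_row_num, partition_num, num_partitions):
    """Mimic the derivation (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1.

    in_row_num is 1-based within a partition; partition_num is 0-based.
    """
    return (in_row_num - 1) * num_partitions + partition_num + 1


def generate_keys(rows_per_partition):
    """Return the keys each partition would produce for the given row counts."""
    num_partitions = len(rows_per_partition)
    return [
        [transformer_counter_key(row, part, num_partitions)
         for row in range(1, count + 1)]
        for part, count in enumerate(rows_per_partition)
    ]


# Three partitions holding 3, 3 and 2 rows (assumed example data).
for part, keys in enumerate(generate_keys([3, 3, 2])):
    print("partition", part, keys)
```

Each partition steps through its own residue class, so keys never collide across partitions. If the partitions hold uneven row counts, the keys stay unique but gaps appear, and the counter restarts on every run unless a starting offset is added.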

In this post, we will see how to generate surrogate keys for data using the Surrogate Key Generator stage.
A) Design :
The design below is a demo job. Here the data source is a Row Generator stage, which generates the rows. In a real-world scenario, the source can be a flat file, a database stage, or any other passive or active stage.
In the Row Generator stage, we generate a single column, 'Name'.

B) Surrogate Key Stage Properties :
In the Surrogate Key Generator stage, fill in the properties as shown below:
Generated Output Column Name = Skey
Source Name = <the path of Surrogate State File (which we generated) >
Source Type = Flat File
File Block Size = User specified
User Specified Block Size = 1 (the block size controls the gaps in the surrogate keys; a value of 1 keeps them to a minimum)
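
To see why the block size affects gaps, here is a deliberately simplified model, not DataStage's actual state-file logic. It assumes that each partition reserves keys from a shared counter in blocks of `block_size`, and that unused keys in a reserved block are never handed out again. The function names and row counts are hypothetical.

```python
def simulate_run(start_key, block_size, rows_per_partition):
    """Assign keys per partition; return (keys_per_partition, next_unreserved_key)."""
    next_free = start_key
    keys_per_partition = []
    for rows in rows_per_partition:
        partition_keys = []
        remaining_in_block = 0
        current = 0
        for _ in range(rows):
            if remaining_in_block == 0:
                # Reserve a fresh block of keys for this partition.
                current = next_free
                next_free += block_size
                remaining_in_block = block_size
            partition_keys.append(current)
            current += 1
            remaining_in_block -= 1
        keys_per_partition.append(partition_keys)
    return keys_per_partition, next_free


def find_gaps(keys_per_partition):
    """List key values skipped between the smallest and largest assigned key."""
    used = sorted(k for part in keys_per_partition for k in part)
    used_set = set(used)
    return [k for k in range(used[0], used[-1] + 1) if k not in used_set]


# Two partitions with 3 and 2 rows (assumed example data).
for size in (1, 100):
    keys, _ = simulate_run(start_key=1, block_size=size, rows_per_partition=[3, 2])
    print("block size", size, "keys", keys, "gap count", len(find_gaps(keys)))
```

In this model a block size of 1 yields consecutive keys, while a large block leaves most of each reserved block unused. The trade-off is that smaller blocks mean more frequent trips to the key source.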

C) Output Column Mapping :

D) Output File :
Below is the output file, which contains the generated surrogate keys alongside the data.

I hope you now understand how to generate surrogate keys for your data without the headache of managing the keys yourself. But one scenario remains: what if the state file gets corrupted or deleted? We will discuss this in the next post.

When using the Surrogate Key Generator stage with a database sequence, complete the following steps before using the sequence values:

1. The 'Sequence' needs to be created in the database in order to use it. Sequence creation script:

CREATE SEQUENCE Sequence_Name INCREMENT BY 1 START WITH 1 NOMAXVALUE CACHE 10;

2. For a database sequence, choose the appropriate behaviour for the following option:

CYCLE:

Specify CYCLE to indicate that the sequence continues to generate values after reaching either its maximum or minimum value. After an ascending sequence reaches its maximum value, it generates its minimum value. After a descending sequence reaches its minimum, it generates its maximum value.

Specify NOCYCLE to indicate that the sequence cannot generate more values after reaching its maximum or minimum value. This is the default.

3. Test the sequence on the database server side with a script, for example by selecting <sequence name>.nextval.

4. Create an environment variable for the sequence on the DataStage server side, to make the stage/job reusable.

5. Use the following in the Surrogate Key Generator Stage:

Source Name = #db_server#.#db_name#.#schema#.<sequence_name>

The source name here should not be the <table_name> but the name of the Oracle sequence created in the steps above.

6. Alternatively, when simply inserting into your target table from the target DB stage, you can use <sequence name>.nextval in your insert statement.
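
The CYCLE and NOCYCLE behaviour described in step 2 can be illustrated with a small in-memory simulation. This is only a sketch of the semantics quoted above, not database code; the class `SimulatedSequence`, its parameters and the tiny maximum value of 3 are hypothetical and chosen purely for illustration.

```python
class SequenceExhausted(Exception):
    """Raised when a NOCYCLE sequence has passed its limit."""


class SimulatedSequence:
    def __init__(self, start=1, increment=1, min_value=1, max_value=3, cycle=False):
        self.increment = increment
        self.min_value = min_value
        self.max_value = max_value
        self.cycle = cycle
        self._next = start

    def nextval(self):
        out_of_range = self._next > self.max_value or self._next < self.min_value
        if out_of_range:
            if not self.cycle:
                raise SequenceExhausted("sequence cannot generate more values")
            # CYCLE: an ascending sequence wraps to its minimum,
            # a descending sequence wraps to its maximum.
            self._next = self.min_value if self.increment > 0 else self.max_value
        value = self._next
        self._next += self.increment
        return value


cycling = SimulatedSequence(max_value=3, cycle=True)
print([cycling.nextval() for _ in range(5)])  # values repeat after the maximum

no_cycle = SimulatedSequence(max_value=3, cycle=False)
try:
    for _ in range(5):
        no_cycle.nextval()
except SequenceExhausted as exc:
    print("NOCYCLE:", exc)
```

Note that in this sketch a cycling sequence hands out the same values again, which would break the uniqueness a surrogate key depends on.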

Thanks!