6. Talend Data Integration: Data Processing Operations, Part 2

This section covers the tJoin and tMap components.

The complete Talend documentation can be found here.

Wait a moment! If you are new to Talend, I strongly recommend that you first go through the other tutorial posts in this series, in the order listed here.

Joining Data Sources

The tJoin component joins data coming from two sources: a main flow and a lookup flow. It can perform either a left outer join or an inner join.

Scenario: The employee table will be joined with the department table using an inner join to display each employee's department name. Both tables contain the records shown below.

When we perform the inner join in SQL, we get the result shown below.

Now we will do the same join in Talend.

STEP1: Drag and drop a tJoin component and two tLogRow components, one to capture the joined output and the other to capture the rejected rows. Also drag and drop two tDBInput components, one for the employee table and one for the department lookup table, and connect these components as shown below.

STEP2: Open the tJoin component properties, click Edit schema, and select the columns as shown below. Department Name and Department Id are selected from tDBInput_2. After you apply the schema, a popup asks whether to propagate the changes; click Yes so that the schema changes are applied to tLogRow as well.

Ensure that the key definitions, the output columns, and the join type are set as shown below. If the Inner join check box is not selected, a left outer join is performed instead.
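Since Talend jobs compile to Java, the effect of the Inner join check box can be illustrated with a small Java sketch (Java 16 or later, for records). This is not the code Talend generates: the `Employee` and `Department` records, their fields, and the sample rows are hypothetical. The sketch indexes the lookup rows by department ID, then either sends unmatched employees to a reject list (inner join) or keeps them in the output with a null department name (left join).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TJoinSketch {

    // Hypothetical row types standing in for the employee and department tables.
    record Employee(int employeeId, String name, int departmentId) {}
    record Department(int departmentId, String departmentName) {}
    record Joined(int employeeId, String name, int departmentId, String departmentName) {}

    // The two flows produced here: the main (joined) output and the reject output.
    record JoinResult(List<Joined> output, List<Employee> rejects) {}

    static JoinResult join(List<Employee> main, List<Department> lookup, boolean innerJoin) {
        // Index the lookup rows by the join key so each main row needs one map probe.
        Map<Integer, Department> byId = new HashMap<>();
        for (Department d : lookup) {
            byId.put(d.departmentId(), d);
        }
        List<Joined> output = new ArrayList<>();
        List<Employee> rejects = new ArrayList<>();
        for (Employee e : main) {
            Department d = byId.get(e.departmentId());
            if (d != null) {
                output.add(new Joined(e.employeeId(), e.name(), e.departmentId(), d.departmentName()));
            } else if (innerJoin) {
                // Inner join: unmatched main rows leave the output and go to the reject flow.
                rejects.add(e);
            } else {
                // Left join: unmatched main rows stay in the output with a null lookup column.
                output.add(new Joined(e.employeeId(), e.name(), e.departmentId(), null));
            }
        }
        return new JoinResult(output, rejects);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: departments 40 and 50 are missing from the lookup.
        List<Employee> employees = List.of(
                new Employee(1, "Asha", 10),
                new Employee(2, "Ravi", 20),
                new Employee(3, "Meena", 40),
                new Employee(4, "John", 50));
        List<Department> departments = List.of(
                new Department(10, "Sales"),
                new Department(20, "Finance"));

        JoinResult inner = join(employees, departments, true);
        System.out.println("Inner join output: " + inner.output());
        System.out.println("Rejects: " + inner.rejects());

        JoinResult left = join(employees, departments, false);
        System.out.println("Left join output: " + left.output());
    }
}
```

With the inner join, the employees in departments 40 and 50 end up in the reject list, which is what the second tLogRow receives in this job.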

STEP3: In the component properties of both tLogRow_1 and tLogRow_2, set the mode to Table.

STEP4: Run the job and check the output, shown below. Compare this output with the SQL output shown above.

Transforming Data from Data Sources and Loading into Multiple Targets

In the scenario above, some rows are rejected because departments 40 and 50 are not present in the lookup table, and all of these rejections go to a single output. If we want to capture the rejections for each department in a separate file, we need multiple outputs, which tJoin does not support. For this we use tMap, a more powerful component than tJoin: it can have multiple inputs and multiple outputs, and it supports data transformations such as filtering and intermediate expressions.
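The requirement of one output per department can be pictured as a grouping step. The Java sketch below (Java 16 or later, for records) splits a single stream of rejected rows into one list per department ID, with each list standing in for one target file. The `Employee` record, its fields, and the sample rows are hypothetical, and this is only an analogy: it groups dynamically, whereas in tMap each output has to be defined explicitly.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RejectsByDepartment {

    // Hypothetical rejected row: an employee whose department is missing from the lookup.
    record Employee(int employeeId, String name, int departmentId) {}

    // Groups rejected rows by department ID; each list represents one target file.
    static Map<Integer, List<Employee>> splitByDepartment(List<Employee> rejects) {
        return rejects.stream()
                .collect(Collectors.groupingBy(Employee::departmentId, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        // Hypothetical rejects for departments 40 and 50.
        List<Employee> rejects = List.of(
                new Employee(3, "Meena", 40),
                new Employee(4, "John", 50),
                new Employee(5, "Lee", 40));
        splitByDepartment(rejects).forEach((dept, rows) ->
                System.out.println("Dept" + dept + " -> " + rows));
    }
}
```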

Scenario: As described above, we will look up the employee table against the department table and capture the rejections for each department in a separate file.

STEP1: Drag and drop a tMap component and three tFileOutputDelimited components, one to capture the join output and the other two to capture the rejections for the rejected departments. Also drag and drop two tDBInput components, one for the employee table and one for the department lookup table, and connect these components as shown below.

STEP2: Open the tMap component properties; initially the editor looks like the screenshot below. Click each of the highlighted areas once to open the properties of row2, Dept1, Dept2 and Dept3.

Make the changes shown below. Drag the DEPARTMENT_ID column from row1 onto the Expr. key cell of the DEPARTMENT_ID row in row2. Set the Join Model to Inner Join. Set Catch output reject to true in Dept1, since we need to catch the join output there. Set Catch lookup inner join reject to true in Dept40, since we need to catch the reject output there. In the expression filter of Dept40, enter row1.DEPARTMENT_ID!=50 so that only the rejected rows with department ID 40 are kept. Configure Dept50 in the same way, as shown below.
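How an inner-join lookup and the per-output filters route rows can be approximated in plain Java (Java 16 or later, for records). This is a simplified sketch, not Talend's generated code: it models only the inner join on DEPARTMENT_ID, the Catch lookup inner join reject flag, and each output's expression filter, and it leaves out Catch output reject, so Dept1 is treated as an ordinary output. The records, their fields, the sample rows, and the filter used for Dept50 (`departmentId() != 40`) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class TMapRoutingSketch {

    // Hypothetical main row (row1) and lookup row (row2).
    record Employee(int employeeId, String name, int departmentId) {}
    record Department(int departmentId, String departmentName) {}

    // One output: its name, whether it catches lookup inner join rejects, and its filter.
    record Output(String name, boolean catchLookupInnerJoinReject, Predicate<Employee> filter) {}

    static Map<String, List<Employee>> route(List<Employee> row1, List<Department> row2, List<Output> outputs) {
        // Inner join on DEPARTMENT_ID: index the lookup rows by their key.
        Map<Integer, Department> lookup = new HashMap<>();
        for (Department d : row2) {
            lookup.put(d.departmentId(), d);
        }
        Map<String, List<Employee>> result = new LinkedHashMap<>();
        for (Output o : outputs) {
            result.put(o.name(), new ArrayList<>());
        }
        for (Employee e : row1) {
            boolean matched = lookup.containsKey(e.departmentId());
            for (Output o : outputs) {
                // Matched rows feed regular outputs; unmatched rows feed only reject-catching outputs.
                boolean eligible = matched ? !o.catchLookupInnerJoinReject() : o.catchLookupInnerJoinReject();
                if (eligible && o.filter().test(e)) {
                    result.get(o.name()).add(e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: departments 40 and 50 are missing from the lookup.
        List<Employee> employees = List.of(
                new Employee(1, "Asha", 10),
                new Employee(3, "Meena", 40),
                new Employee(4, "John", 50));
        List<Department> departments = List.of(new Department(10, "Sales"));
        List<Output> outputs = List.of(
                new Output("Dept1", false, e -> true),
                // The tutorial's filter: among the rejects, drop department 50 to keep department 40.
                new Output("Dept40", true, e -> e.departmentId() != 50),
                // Assumed counterpart filter for the second reject output.
                new Output("Dept50", true, e -> e.departmentId() != 40));
        route(employees, departments, outputs)
                .forEach((name, rows) -> System.out.println(name + ": " + rows));
    }
}
```

A filter such as `departmentId() == 40` would express the same intent more directly; the `!= 50` form only works because 40 and 50 are the only rejected departments.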

STEP3: Set the properties of all the tFileOutputDelimited components as shown below.
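Each tFileOutputDelimited component essentially writes its incoming rows to a file, joining the fields with a field separator and optionally writing a header line first. The Java sketch below illustrates that idea; the column names, the semicolon separator, and the temporary file path are assumptions for illustration, not the settings from the screenshots, and the sketch does not quote or escape values that contain the separator.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class DelimitedWriterSketch {

    // Builds the file's lines: an optional header, then one line per row joined by the separator.
    static List<String> toLines(List<String> header, List<List<String>> rows, String separator) {
        List<String> lines = new ArrayList<>();
        if (header != null) {
            // Comparable to including a header: the column names become the first line.
            lines.add(String.join(separator, header));
        }
        for (List<String> row : rows) {
            lines.add(String.join(separator, row));
        }
        return lines;
    }

    static void writeDelimited(Path file, List<String> header, List<List<String>> rows, String separator)
            throws IOException {
        // Files.write creates or truncates the file and ends each line with the platform line separator.
        Files.write(file, toLines(header, rows, separator));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target file for the Dept40 rejects.
        Path out = Files.createTempFile("dept40_rejects", ".csv");
        writeDelimited(out,
                List.of("EMPLOYEE_ID", "NAME", "DEPARTMENT_ID"),
                List.of(List.of("3", "Meena", "40")),
                ";");
        System.out.println(Files.readString(out));
    }
}
```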

STEP4: Run the job and check the output, shown below.

This completes the tutorial for Data Processing components.
