Talend Data Integration Components and Connectors:

In this section, we will learn about the data integration components and connectors that are used while creating a job.

Components and connectors perform all the operations in Talend, which provides 800+ connectors and components for a wide range of tasks.

The components are available in the palette panel, where they are grouped into 21 main categories.

We choose a component by dragging it from the palette and dropping it in the designer panel, and Talend automatically generates the corresponding Java code.

After that, we save the job and execute it.

The palette panel lists the available components, grouped by category.

The components in this list are widely used for Talend data integration.

Let us look at some commonly used data integration components in Talend Studio:

Components for Data Integration | Description
tMysqlConnection | It opens a connection to the MySQL database defined in the component.
tMysqlInput | It runs a database query to read a database and extract fields (from tables, views, etc.) depending on the query.
tMysqlOutput | It is used to write, update, and modify data in the MySQL database.
tFileInputDelimited | It reads a delimited file row by row, splits each row into separate fields, and passes them to the next component.
tFileOutputDelimited | It writes the incoming data to a delimited file based on the defined schema.
tFileInputExcel | It reads an Excel file row by row, splits each row into separate fields, and passes them to the next component.
tFileOutputExcel | It writes an MS Excel file with separate data values based on a defined schema.
tFileList | It retrieves all the files and directories that match a given file mask pattern.
tFileArchive | It compresses a set of files or folders into a zip, gzip, or tar.gz archive file.
tRowGenerator | It provides an editor where we can write functions or choose expressions to generate sample data.
tMsgBox | It displays a dialog box with the specified message and an OK button.
tLogRow | It is used to monitor the data being processed, and it displays the data/output in the run console.
tPreJob | It defines the subjobs that run before the actual job starts.
tMap | It is used to transform and route data from one or more sources to one or more destinations.
tJoin | It joins two tables by performing inner or outer joins between the main flow and the lookup flow.
tJava | It enables us to use custom Java code in the Talend job.
tRunJob | It is used to manage complex job systems by running one Talend job after another.
tCloudStart | It is used to start instances on Amazon EC2 (Amazon Elastic Compute Cloud).
tCloudStop | It is used to change the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
tDotNETInstantiate | It invokes the constructor of a .NET object, which is intended for later reuse.
tDotNETRow | It helps us to transform the data by using custom or built-in .NET classes.
tDB2Connection | It opens a connection to a specified database, which can be reused in the subsequent subjob or subjobs.
tFileFetch | It retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB).
tFTPClose | It closes an active FTP connection to release the resources it holds.
tFTPConnection | It opens an FTP connection to transfer files in a single transaction.
tFTPDelete | It deletes files or folders in a specified directory on the FTP server.
tFileInputJSON | It extracts JSON data from a file and passes the data on to a file, a database table, etc.
tFileOutputJSON | It receives data and rewrites it as a JSON structured data block in an output file.
tFileInputXML | It reads an XML structured file row by row, splits each row into the fields defined in the schema, and sends those fields to the next component.
tFileOutputXML | It writes an XML file with separate data values based on a defined schema.
tReplicate | It duplicates the incoming schema into two identical output flows.
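
To make the "row by row, split into fields" behaviour of tFileInputDelimited more concrete, here is a minimal hand-written sketch. It is not the code that Talend Studio generates; the class name DelimitedRowSketch and its methods are hypothetical, and the sketch assumes that fields never contain quoted or escaped delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is NOT the code Talend Studio generates.
final class DelimitedRowSketch {

    // Splits one delimited line into its fields.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    // Assumption: fields contain no quoted or escaped delimiters.
    static List<String> splitRow(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    // Processes the input row by row and hands each row's fields to the caller.
    static List<List<String>> readRows(List<String> lines, String delimiter) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(splitRow(line, delimiter));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1;Alice;Paris", "2;Bob;");
        for (List<String> row : readRows(lines, ";")) {
            // Printing each row plays the role that tLogRow has in a job.
            System.out.println(String.join("|", row));
        }
    }
}
```

In a real job, the delimiter and the schema are configured in the component properties rather than in code.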

Connectors:

  • Row
  • Iterate
  • Triggers
  • Link

Row:

The Row connector carries the actual data flow. The available row connections are listed below:

  • Main
  • Lookup
  • Filter
  • Rejects
  • ErrorRejects
  • Output
  • Unique/duplicates
  • Multiple input/output

Main:

Main is the most commonly used row connection. It passes the data flow from one component to the next, iterating on each row and reading the input data according to the component's property settings.

Note:
We cannot connect two input components with the Main row connection.
Only one incoming Row connection is possible per component, so we cannot link the same target component twice using the Main row connection.

The second row connection is called Lookup.

To connect two components with a Main row connection, right-click the input component and select Row → Main from the connection list.

Or,

We can click the component to highlight it, then either right-click it or click the O icon that appears beside it, and drag the cursor to the destination component. This automatically creates a Row → Main connection.

Lookup:

The Lookup row connection is used only when there are multiple input flows.

It connects a sub-flow component to a main-flow component, which must be allowed to receive more than one incoming flow.

A Lookup row can be changed into a Main row at any time: right-click the row that needs to be changed and, in the popup menu that opens, click Set this connection as Main.
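
The relationship between a Main flow and a Lookup flow can be sketched in plain Java. This is only an illustration under stated assumptions, not what Talend generates: the class LookupJoinSketch, the row layout (order id and customer id in the main rows, customer id and name in the lookup rows) and the inner-join behaviour are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a main flow enriched by a lookup flow.
final class LookupJoinSketch {

    // mainRows:   {orderId, customerId}
    // lookupRows: {customerId, customerName}
    static List<String> innerJoin(List<String[]> mainRows, List<String[]> lookupRows) {
        // The lookup flow is loaded first so that each main row can be matched by key.
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : lookupRows) {
            lookup.put(row[0], row[1]);
        }
        List<String> joined = new ArrayList<>();
        for (String[] row : mainRows) {
            String name = lookup.get(row[1]);
            if (name != null) { // inner join: main rows without a match are dropped
                joined.add(row[0] + "," + name);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> mainRows = Arrays.asList(
                new String[] {"o1", "c1"}, new String[] {"o2", "c9"});
        List<String[]> lookupRows = Arrays.asList(
                new String[] {"c1", "Alice"}, new String[] {"c2", "Bob"});
        innerJoin(mainRows, lookupRows).forEach(System.out::println);
    }
}
```

The main flow drives the iteration, while the lookup flow only supplies matching values, which is why a component accepts a single Main input but can accept additional Lookup inputs.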

Filter:

The Filter row connection specifically connects a tFilterRow component to an output component. It collects the data that matches the filtering criteria.

Rejects:

The Rejects row connection is used to connect a processing component to an output component.

It collects the data that does not match the filter or is not valid for the expected output.

It also allows us to track the data that cannot be processed for reasons such as a wrong type or an undefined null value.

On some components, the Rejects connection is enabled when the Die on error option is deactivated.
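
The split between the Filter and Rejects connections can be pictured as partitioning the incoming rows by a condition. The sketch below is a simplified illustration, not Talend's generated code; the class FilterRejectSketch and its route method are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration of routing rows to a Filter flow and a Rejects flow.
final class FilterRejectSketch {

    // Returns a map where key true holds the rows matching the condition (Filter)
    // and key false holds the rows that do not match (Rejects).
    static <T> Map<Boolean, List<T>> route(List<T> rows, Predicate<T> condition) {
        return rows.stream().collect(Collectors.partitioningBy(condition));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> flows = route(Arrays.asList(5, 12, 7, 30), n -> n >= 10);
        System.out.println("Filter:  " + flows.get(true));
        System.out.println("Rejects: " + flows.get(false));
    }
}
```

Every incoming row ends up in exactly one of the two flows, so no data is silently lost when a Rejects output is connected.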

ErrorRejects:

The ErrorRejects connection is used to connect a tMap component to an output component.

It is enabled when we clear the Die on error checkbox in the tMap editor, and it collects the data that could not be processed.
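
The effect of the Die on error option can be illustrated with a small sketch: with the option on, the first bad row stops processing; with it off, bad rows are set aside in a separate flow. This is a simplified model under our own assumptions (integer parsing as the only failure, a hypothetical class DieOnErrorSketch), not the behaviour of any specific Talend component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "Die on error" versus collecting error rejects.
final class DieOnErrorSketch {
    final List<Integer> output = new ArrayList<>();
    final List<String> errorRejects = new ArrayList<>();

    void process(List<String> rows, boolean dieOnError) {
        for (String row : rows) {
            try {
                output.add(Integer.parseInt(row.trim()));
            } catch (NumberFormatException e) {
                if (dieOnError) {
                    // Option selected: the whole run stops at the first bad row.
                    throw new IllegalStateException("Stopped on bad row: " + row, e);
                }
                // Option cleared: the bad row is routed to the rejects flow instead.
                errorRejects.add(row);
            }
        }
    }

    public static void main(String[] args) {
        DieOnErrorSketch sketch = new DieOnErrorSketch();
        sketch.process(Arrays.asList("1", "x", "3"), false);
        System.out.println("Output:       " + sketch.output);
        System.out.println("ErrorRejects: " + sketch.errorRejects);
    }
}
```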

Output:

The output row connection is used to connect a tMap component to one or more output components.

Unique/Duplicate:

The unique/duplicate row connections are used to connect a tUniqRow component to the output components.

The Unique row connection collects the rows that are found first in the incoming flow. This flow of unique data is directed to the related output component or to another processing subjob.

The Duplicate row connection collects the possible duplicates of those first encountered rows.
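
The "first occurrence goes to Unique, later occurrences go to Duplicate" rule can be sketched with a set of already seen rows. The class UniqueDuplicateSketch is a hypothetical name, and the sketch compares whole rows as strings, whereas tUniqRow compares the key columns chosen in its settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of splitting a flow into unique and duplicate rows.
final class UniqueDuplicateSketch {
    final List<String> uniques = new ArrayList<>();
    final List<String> duplicates = new ArrayList<>();

    UniqueDuplicateSketch(List<String> rows) {
        Set<String> seen = new HashSet<>();
        for (String row : rows) {
            // Set.add returns true only the first time a value is encountered.
            if (seen.add(row)) {
                uniques.add(row);
            } else {
                duplicates.add(row);
            }
        }
    }

    public static void main(String[] args) {
        UniqueDuplicateSketch result =
                new UniqueDuplicateSketch(Arrays.asList("a", "b", "a", "c", "b", "a"));
        System.out.println("Unique:    " + result.uniques);
        System.out.println("Duplicate: " + result.duplicates);
    }
}
```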

Multiple input/output:

This type of row connection is used by components that handle data through multiple inputs and/or multiple outputs.

Combine:

A combine row connection is used to connect one CombinedSQL component to another.

Iterate:

The Iterate connector is used to loop over the files contained in a directory, the rows contained in a file, or the entries in a database.

It is mainly connected to the start component of a flow (in a subjob).
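
As a rough analogy for iterating over the files in a directory, the sketch below lists the files that match a mask and processes them one at a time. It uses only the standard java.nio.file API; the class IterateFilesSketch and the "*.csv" mask are assumptions for the example, and Talend's own implementation may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of looping over the files that match a file mask.
final class IterateFilesSketch {

    // Returns the names of regular files in the directory that match the glob mask.
    static List<String> matchingFileNames(Path directory, String fileMask) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, fileMask)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    names.add(file.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names); // directory order is unspecified, so sort for stability
        return names;
    }

    public static void main(String[] args) {
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : matchingFileNames(directory, "*.csv")) {
            // Each pass of this loop corresponds to one iteration of the subjob.
            System.out.println("Processing " + name);
        }
    }
}
```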

Triggers:

The trigger connectors are used to create a dependency between jobs or subjobs, which are triggered one after the other according to the nature of the trigger.


There are two types of triggers available in Talend:

  • Subjob triggers
  • Component triggers

Subjob triggers | Description
OnSubjobOK | It triggers the next subjob when the current subjob completes without any error.
OnSubjobError | It triggers the next subjob when the first (main) subjob does not complete correctly.
Run if | It triggers a subjob or a component when the defined condition is met.

Component triggers | Description
OnComponentOk | It triggers the target component once the source component has finished executing without any error.
OnComponentError | It triggers the subjob or component as soon as an error is encountered in the primary job.
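
The trigger semantics resemble ordinary success and error branching. The sketch below models this under simplifying assumptions: the class TriggerSketch is hypothetical, a subjob is represented as a Runnable that throws on failure, and the Run if condition is only checked after a successful run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of OnSubjobOK, OnSubjobError and Run if.
final class TriggerSketch {

    // Runs the subjob and returns the names of the triggers that would fire.
    static List<String> run(Runnable subjob, BooleanSupplier runIfCondition) {
        List<String> fired = new ArrayList<>();
        boolean succeeded;
        try {
            subjob.run();
            succeeded = true;
        } catch (RuntimeException e) {
            succeeded = false;
        }
        if (succeeded) {
            fired.add("OnSubjobOK");
            // Assumption for this sketch: the condition is evaluated after success only.
            if (runIfCondition.getAsBoolean()) {
                fired.add("RunIf");
            }
        } else {
            fired.add("OnSubjobError");
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(run(() -> { }, () -> true));
        System.out.println(run(() -> { throw new IllegalStateException("boom"); }, () -> true));
    }
}
```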

Link:

The Link connector is used only with ELT components. This type of connection does not carry the actual data, only the metadata of the table being operated on.