Javatpoint Logo

Top 30+ Most Asked Ab Initio Interview Questions and Answers

1) What do you understand by Ab Initio? / Give a brief introduction of Abinitio.

Ab Initio, also known as Abinitio, is a tool used to extract, transform and load data. 'Ab initio' is a Latin phrase that means 'from the beginning'. It was named Ab Initio because Sheryl Handler and her team started it after the bankruptcy of their previous company. Sheryl Handler was the former CEO of Thinking Machines Corporation, and she decided to start this company as a new beginning when Thinking Machines Corporation went bankrupt.

It is mainly used for data analysis, data manipulation, batch processing, and graphical user interface (GUI) based parallel processing for businesses.


2) What is Ab Initio Software?

Ab Initio Software is an American multinational private enterprise software corporation headquartered in Lexington, Massachusetts. Ab Initio Software specializes in high-volume data processing applications and enterprise application integration. The Ab Initio software provides several products on a platform for parallel data processing applications.


3) Which industries mainly use Abinitio?

Abinitio Software applications are most widely used on business intelligence and data processing platforms to build business applications ranging from operational systems, distributed application integration, and complex event processing to data warehousing and data quality management systems.


4) What is the use of Ab Initio Software applications?

The Ab Initio Software applications are mainly used for fourth-generation data analysis, batch processing, complex event processing, quantitative and qualitative data processing, and data manipulation. The software is a graphical user interface (GUI)-based parallel processing product that is commonly used to extract, transform, and load (ETL) data.


5) What do you know about the history of Ab Initio Software?

Ab Initio Software was founded in 1995 by Sheryl Handler and several other employees of Thinking Machines Corporation after that company's bankruptcy. Sheryl Handler was the former CEO of Thinking Machines Corporation, and she decided to start this company when Thinking Machines Corporation went bankrupt.


6) What are the most important components of the architecture of Abinitio?

The most important components that the architecture of Abinitio includes are as follows:

  • GDE (Graphical Development Environment)
  • Co>Operating System
  • Enterprise Meta>Environment (EME)
  • Conduct>It

7) What is the most important role of the Co>Operating System in Abinitio?

The most important role of the Co>Operating System in Abinitio is to provide the following features:

  • It manages and runs Abinitio graphs and controls the ETL processes.
  • It provides monitoring and debugging of ETL processes.
  • It provides Ab Initio extensions to the operating system.
  • It is also responsible for meta-data management and interaction with the EME.

8) Is it possible to run a graph infinitely in Ab Initio? If yes, how?

Yes, it is possible to run a graph infinitely in Ab Initio. To do so, the end script of the graph should call the .ksh file of the same graph. For example, if the graph name is xyz.mp, its end script should call xyz.ksh. This way, the graph starts again each time it finishes and therefore runs indefinitely.


9) Into how many segments can the Abinitio EME be segregated?

The Abinitio EME can be logically segregated into two segments:

  • Data Integration Portion
  • User Interface (it is used to access the meta-data information)

10) What do you understand by roll-up component?

The roll-up component lets users collect or group records on certain field values. It is a multi-stage transform consisting of initialize, rollup, and finalize functions; the rollup function is called for each of the records in the group.


11) How can you connect EME to Abinitio Server?

Following are some ways to connect EME to Abinitio Server:

  • Log in to the EME web interface at http://serverhost:[serverport]/abinitio.
  • Set the AB_AIR_ROOT variable so that it points to the EME data store.
  • Connect to the EME data store through GDE.
  • Use the air command from the command line.

12) What do you understand by SANDBOX in Abinitio?

In Abinitio, a SANDBOX is a collection of graphs and related files that are stored in a single directory tree and treated as a group for version control, navigation, migration, and relocation. It is a safe and controlled environment to run graphs.


13) What do you understand by dependency analysis in Abinitio?

In Abinitio, dependency analysis is a process that EME uses to examine a project and trace how data is transferred and transformed- from component-to-component, field-by-field, within, and between graphs.


14) What is data encoding in Abinitio?

In Abinitio, data encoding is an approach used to keep data confidential. In this approach, we ensure that the information remains in a form that cannot be understood by anyone other than the sender and the receiver.


15) What are the different types of file extensions used in Abinitio?

Following is a list of different types of file extensions used in Abinitio:

  • .mp: This file extension is used to store Abinitio graph or graph components.
  • .mpc: This file extension is used to specify a custom component or program.
  • .mdc: This file extension is used to specify data-set or custom data-set components.
  • .dml: This file extension is used to specify data manipulation language file or record type definition.
  • .xfr: This file extension is used to specify transform function files.
  • .dat: This file extension is used to specify data files (multifile or serial file).

16) What information does a .dbc file extension provide to connect to the database?

The .dbc file extension provides the following information to connect to the database:

  • It provides the name and version number of the database you want to connect to.
  • It also specifies the name of the computer on which the database instance or server you want to connect to runs, or on which the database remote access software is installed.
  • It specifies the server's name, database instance, or provider you want to link.

17) What do you understand by the "lookup" file in Abinitio?

In Abinitio, the lookup file is used to define one or more serial files (also known as flat files). It is a physical file that stores the data for the Lookup. It is a two-dimensional table of data that has been stored in a disk file. It stores the name and display format for each column of data depending on the file format.


18) What are the different types of parallelism used in Abinitio?

There are mainly three types of parallelism used in Abinitio. They are:

  • Component parallelism: Component parallelism is used by a graph with multiple processes executing simultaneously on separate data.
  • Data parallelism: Data parallelism is used by a graph that works with data divided into segments and operates on each segment simultaneously.
  • Pipeline parallelism: Pipeline parallelism is used by a graph in which multiple components execute simultaneously on the same data. Each component in the pipeline reads continuously from the upstream components, processes data, and writes to downstream components, so the components operate in parallel.
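
Data parallelism can be illustrated outside Ab Initio with a short Python sketch. This is not Ab Initio code: the helper names (split_into_partitions, process_partition, run_data_parallel) and the doubling logic are hypothetical, and Python threads are used here only to show the structure of "same logic, separate segments".

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, partitions):
    # Hypothetical helper: deal records out so each partition gets a segment.
    return [records[i::partitions] for i in range(partitions)]


def process_partition(partition):
    # The same logic runs on every segment of the data.
    return [value * 2 for value in partition]


def run_data_parallel(records, partitions=4):
    # Each segment is handed to its own worker; results come back per segment.
    segments = split_into_partitions(records, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        return list(pool.map(process_partition, segments))
```

Calling run_data_parallel([1, 2, 3, 4, 5], 2) processes the segments [1, 3, 5] and [2, 4] independently and returns one result list per segment.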

19) What is the usage of dedup component and replicate component in Abinitio?

In Abinitio, the dedup component is used to eliminate duplicate records. On the other hand, the replicate component combines the data records from the inputs into one run and writes a copy of that run to each of its output ports.
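
The behaviour of the two components can be approximated in plain Python. This is only a sketch under assumptions: keeping the first record seen for each key is one possible policy, and the function names and record layout are hypothetical rather than part of Ab Initio.

```python
def dedup_first(records, key_field):
    # Keep only the first record seen for each key value (assumed policy).
    seen = set()
    kept = []
    for record in records:
        key = record[key_field]
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def replicate(records, outputs):
    # Give every output port its own copy of the same run of records.
    return [list(records) for _ in range(outputs)]
```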


20) What do you understand by Partition? What are the different types of partition components in Abinitio?

Partition is a process used in Abinitio for dividing data sets into multiple small sets for further processing. Following is a list of different types of partition components in Abinitio:

  • Partition by Round-Robin: The Round-Robin Partition is used for distributing data evenly, in block size chunks, across the output partitions.
  • Partition by Range: The Partition by Range facilitates users to divide data evenly among nodes, according to the set of partitioning ranges and keys.
  • Partition by Percentage: The Partition by Percentage is used to distribute data in a way that the output is proportional to fractions of 100.
  • Partition by Load balance: The Partition by Load balance is used for dynamic load balancing.
  • Partition by Expression: The Partition by Expression is used to divide data according to a DML expression.
  • Partition by Key: The Partition by Key is used to group data by a key.
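
Two of these strategies can be sketched in Python to show the idea. This is an illustration, not Ab Initio's implementation: the function names, the block_size argument, and the use of a CRC32 hash to pick a partition are all assumptions made for the example.

```python
import zlib


def partition_round_robin(records, partitions, block_size=1):
    # Deal records out in blocks of block_size, cycling through the partitions.
    out = [[] for _ in range(partitions)]
    for index, record in enumerate(records):
        out[(index // block_size) % partitions].append(record)
    return out


def partition_by_key(records, key_field, partitions):
    # Records with the same key value always land in the same partition.
    # The CRC32 hash is an assumption chosen only because it is deterministic.
    out = [[] for _ in range(partitions)]
    for record in records:
        key_bytes = str(record[key_field]).encode("utf-8")
        out[zlib.crc32(key_bytes) % partitions].append(record)
    return out
```

Round-robin keeps partition sizes even regardless of content, while key-based partitioning keeps related records together at the cost of possible skew.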

21) What do you understand by de-partition in Abinitio?

De-partition is used to read data from multiple flows or partitions and re-join the data records from the different flows into a single flow. Several de-partition components are available in Abinitio, such as Gather, Merge, Interleave, Concatenate, etc.
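
The difference between some of these components can be sketched in Python. The functions below are hypothetical stand-ins, not Ab Initio code, and they assume one record is taken at a time when interleaving. Gather is left out because it combines records in arrival order, which a deterministic example cannot show.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # sentinel for partitions that have run out of records


def concatenate(partitions):
    # All of partition 0, then all of partition 1, and so on.
    return list(chain.from_iterable(partitions))


def interleave(partitions):
    # Take one record from each partition in turn (assumed block size of 1).
    result = []
    for row in zip_longest(*partitions, fillvalue=_MISSING):
        result.extend(item for item in row if item is not _MISSING)
    return result


def merge(partitions, key=None):
    # Combine partitions that are already sorted, preserving the sort order.
    return list(heapq.merge(*partitions, key=key))
```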


22) What do you understand by the overflow errors?

Overflow errors occur when the computer cannot process bulk data within the space available. While processing data, an overflow error occurs if the result of a bulky calculation exceeds the range of the memory provided for it.
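
As a minimal illustration, assume a result must fit in a signed 32-bit field. Python integers do not overflow on their own, so the sketch below checks the range explicitly; the field width and the function name checked_add_int32 are assumptions made for the example.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_add_int32(a, b):
    # Add two values and fail if the result no longer fits in 32 signed bits.
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a signed 32-bit field")
    return result
```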


23) What are some of the air commands used in Abinitio?

Following is a list of some air commands used in Abinitio:

  • air object ls <EME path for the object-/Projects/edf/..>: This air command is used to list the objects in a directory inside the project.
  • air object rm <EME path for the object-/Projects/edf/..>: This air command is used to remove an object from the repository.
  • air object versions -verbose <EME path for the object-/Projects/edf/..>: This air command is used to show the object's version history.

Note: Apart from these, there are some other air commands for Abinitio, such as air object cat, air object modify, air lock show -user, etc.


24) What is the use of m_dump in Abinitio?

In Abinitio, the m_dump command is used to view the data in a multifile from the UNIX prompt. Following are some examples of m_dump:

  • m_dump a.dml a.dat: This command prints the data as it appears in GDE when we view data as formatted text.
  • m_dump a.dml a.dat > b.dat: This command redirects the output to b.dat, which acts as a serial file and can be referred to when required.

25) What do you understand by Sort Component in Abinitio?

In Abinitio, the Sort Component is used to re-order the data. It consists of two parameters, "Key" and "Max-core".

  • Key: The key parameter is one of the parameters for the sort component. It is used to determine the collation order.
  • Max-core: The max-core parameter controls how often the sort component dumps data from memory to disk.
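
The effect of a memory limit on sorting can be sketched with a small external sort in Python. This is an illustration of the general technique, not Ab Initio's implementation: the limit here is a record count (max_records_in_memory) rather than a byte size, and all names are hypothetical. A smaller limit means more frequent spills to disk.

```python
import heapq
import json
import tempfile


def _spill(sorted_run):
    # Write one sorted run to a temporary file and rewind it for reading.
    handle = tempfile.TemporaryFile(mode="w+")
    for record in sorted_run:
        handle.write(json.dumps(record) + "\n")
    handle.seek(0)
    return handle


def _read(handle):
    # Stream records back from a spilled run, one line at a time.
    for line in handle:
        yield json.loads(line)


def external_sort(records, key, max_records_in_memory):
    runs, buffer = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= max_records_in_memory:
            # Memory limit reached: sort what we hold and dump it to disk.
            runs.append(_spill(sorted(buffer, key=key)))
            buffer = []
    streams = [_read(handle) for handle in runs]
    streams.append(iter(sorted(buffer, key=key)))
    try:
        # Merge the sorted runs into one fully ordered result.
        return list(heapq.merge(*streams, key=key))
    finally:
        for handle in runs:
            handle.close()
```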

26) What is the difference between a DB config (.dbc file) and a CFG (.cfg) file?

The DB config file (.dbc file) consists of the information required for Ab Initio to connect to the database to extract or load tables or views. On the other hand, the .cfg file is the table configuration file created by db_config while using components like Load DB Table.


27) Is Ab Initio an ETL tool? What is an ETL tool?

ETL is an acronym that stands for Extract, Transform and Load. The ETL tool is software that works with the client-server model.

Ab Initio works as an ETL tool. It is a fourth-generation data analysis, data manipulation, and batch-processing graphical user interface (GUI)-based parallel processing tool used to Extract, Transform and Load (ETL) data.


28) What do you understand by a local lookup?

A local lookup file contains data records that can be held in main memory. Because the records are in memory, they can be retrieved much faster than they could be from disk. Transform functions use the local lookup to retrieve these records.
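
The idea of an in-memory lookup can be sketched in Python with a dictionary. The names build_lookup and lookup, the record layout, and the "first match wins" rule are assumptions for illustration only.

```python
def build_lookup(records, key_field):
    # Load the lookup records into memory once, indexed by the key field.
    table = {}
    for record in records:
        table.setdefault(record[key_field], record)  # first match wins (assumed)
    return table


def lookup(table, key):
    # Constant-time retrieval from memory; returns None when no record matches.
    return table.get(key)
```

A transform would call lookup(table, some_key) for each incoming record instead of re-reading the file from disk.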


29) What is the difference between the sandbox and EME? Can we perform checkin and checkout through sandbox?

Sandbox is a work area used to develop, test, or run code associated with a given project. A specific sandbox is associated with only one project whereas a project can be checked out to several sandboxes. We can hold only one version of the code within the sandbox at any time. On the other hand, the EME is a data store that contains all versions of the code checked into it.


30) What do you understand by local and formal parameters?

Local and formal parameters are both graph-level parameters, but there is a key difference between them. For a local parameter, we need to initialize the value at the time of declaration. For a formal parameter, there is no need to initialize the value; the graph prompts for it at run time.


31) What is the difference between check point and phase in Ab Initio?

A list of differences between check point and phase in Ab Initio:

Check point | Phase
A check point is a recovery point that is created when a graph fails in the middle of the process. | A graph consists of phases. If a graph is created with phases, each phase is assigned to some part of the memory.
The rest of the process will be continued after the check point. | All the phases run one by one.
Data from the check point is fetched and continues to execute after correction. | In phase, the intermediate file will be deleted.
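
The recovery behaviour can be sketched in Python. This is a simplified model under assumptions, not how Ab Initio stores its recovery information: phases are plain functions run one by one, and the checkpoint is a dictionary (hypothetical keys next_phase and data) saved after each completed phase.

```python
def run_graph(phases, data, checkpoint=None):
    # Start from the saved checkpoint if one exists, otherwise from the beginning.
    state = checkpoint or {"next_phase": 0, "data": data}
    for index in range(state["next_phase"], len(phases)):
        try:
            result = phases[index](state["data"])
        except Exception:
            # The graph failed: report the last good checkpoint for a later restart.
            return False, state
        # Phase finished: record a new checkpoint with its output.
        state = {"next_phase": index + 1, "data": result}
    return True, state
```

After a failure is corrected, calling run_graph again with the returned checkpoint skips the phases that already completed and continues from the saved data.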

32) What do you understand by the rollup component? How can you do it?

Rollup is a way to group the records on a particular field. If a user wants to group the records on particular field values, rollup is the best way. It is a multi-stage transform function that contains the following mandatory functions.

  • Initialise
  • Rollup
  • Finalise
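
The three stages can be sketched in Python to show how they fit together. This is not Ab Initio DML: the function names mirror the stages listed above, while the record layout (an amount field), the output field names, and the driver rollup_by_key are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def initialize(record):
    # Build the starting temporary value for a new group.
    return {"total": 0, "count": 0}


def rollup(temp, record):
    # Called once per record in the group; folds the record into the temporary value.
    return {"total": temp["total"] + record["amount"], "count": temp["count"] + 1}


def finalize(temp, key):
    # Produce the single output record for the group.
    return {"key": key, "total": temp["total"], "count": temp["count"]}


def rollup_by_key(records, key_field):
    # Sort so that records with the same key are adjacent, then aggregate each group.
    ordered = sorted(records, key=itemgetter(key_field))
    output = []
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        group = list(group)
        temp = initialize(group[0])
        for record in group:
            temp = rollup(temp, record)
        output.append(finalize(temp, key))
    return output
```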

33) What is the difference between scientific data processing and commercial data processing?

In scientific data processing, data is processed with a great amount of computation, i.e., arithmetic operations. A limited amount of data is provided as input in this processing, and bulk data is there at the outcome. On the other hand, commercial data processing is completely different. In commercial data processing, the outcome is limited compared to the input data. The computational operations are also limited in commercial data processing.




