Wednesday, January 22, 2014

Informatica Interview Questions


1.       What is a mapping transformation?
Ans: A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

2.       What is the difference between an expression transformation and an aggregator transformation?
Ans: An Expression transformation lets you perform calculations on a row-by-row basis only. An Aggregator transformation lets you perform calculations on groups of rows.

3.       What is a source qualifier? What is meant by Query Override?
Ans: Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or a flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation when it runs the session. The default query is a SELECT statement containing all the connected source columns. The Source Qualifier can override this default query through the SQL Query setting in the transformation properties. The list of selected ports and the order in which they appear in the default query must not be changed in the overridden query.
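For illustration, a hedged sketch with a hypothetical ORDERS source: the generated default query selects every connected port, and an override typically adds a join or filter while keeping the selected ports and their order unchanged.

-- Default query generated by the Source Qualifier (illustrative)
SELECT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_DATE, ORDERS.AMOUNT
FROM ORDERS

-- A possible SQL override: same ports, same order, with an extra filter (Oracle syntax)
SELECT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_DATE, ORDERS.AMOUNT
FROM ORDERS
WHERE ORDERS.ORDER_DATE >= TRUNC(SYSDATE) - 7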

4.       What is aggregator transformation?
Ans: The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums, on groups of rows, whereas the Expression transformation permits calculations on a row-by-row basis only. The Aggregator transformation contains group by ports that indicate how to group the data. While grouping the data, the Aggregator outputs the last row of each group unless otherwise specified in the transformation properties. The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
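As a rough SQL analogy only (hypothetical EMPLOYEES table), the group by ports play the role of a GROUP BY clause:

-- AVG and SUM per department, comparable to an Aggregator with DEPT_ID as a group by port
SELECT DEPT_ID, AVG(SALARY) AS AVG_SALARY, SUM(SALARY) AS TOTAL_SALARY
FROM EMPLOYEES
GROUP BY DEPT_ID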

5.       How Union Transformation is used?
Ans: The Union transformation is a multiple input group transformation that can be used to merge data from several sources (or pipelines). It works just like the UNION ALL statement in SQL, which combines the result sets of two or more SELECT statements.
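The SQL equivalent, for reference (hypothetical CUSTOMER_US and CUSTOMER_EU tables; like the Union transformation, UNION ALL does not remove duplicates):

SELECT CUST_ID, CUST_NAME FROM CUSTOMER_US
UNION ALL
SELECT CUST_ID, CUST_NAME FROM CUSTOMER_EU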

6.       What is a look up transformation?
Ans: This transformation is used to look up data in a flat file or in a relational table, view or synonym. It compares the Lookup transformation ports (input ports) to the lookup source column values based on the lookup condition. The returned values can then be passed on to other transformations.

7.       Can a lookup be done on Flat Files?
Ans:  Yes.

8.       What are the contents in a session log in general?
Ans:
-    Allocation of system shared memory
-    Execution of pre-session and post-session commands
-    Session initialization
-    Creation of SQL commands for reader/writer threads
-    Start/end timings for target loading
-    Errors encountered during the session
-    Load summary of reader/writer/DTM statistics

9.       What are the different tracing levels available?
Ans:
Normal - Logs initialization and status information, errors encountered, transformation row errors and skipped rows, and summarizes session results (not at the level of individual rows)
Terse - Logs initialization information as well as error messages and notification of rejected data
Verbose Initialization - In addition to normal tracing, logs the names of index and data files used and detailed transformation statistics
Verbose Data - In addition to verbose initialization tracing, logs each row that passes into the mapping, along with detailed transformation statistics

10.   How to implement normalization without normalizer transformation in informatica?
Ans: For example, suppose we have a Customer Name column and three email columns (Email1, Email2, Email3) that are to be normalized, as shown below:

Customer Name | Email1 | Email2 | Email3
A             | ...    | ...    | ...
B             | ...    | ...    | ...
C             | ...    | ...    | ...
                                I.            Place three Expression transformations EXP1, EXP2, EXP3
                              II.            Take Email1 and Customer Name to EXP1, Email2 and Customer Name to EXP2, and Email3 and Customer Name to EXP3
                            III.            Place a Union transformation and create three input groups
                            IV.            Link the outputs of the three Expression transformations to the Union transformation

11.   Explain the error handling methodology in your project?
Ans: Error handling will be implemented for a column in the following cases
                                I.            The column values need to be in a range
                              II.            The column should only hold specific characters like ‘Y’ and ‘N’ (Boolean)
                            III.            The column value should satisfy a specific condition
        For these kinds of columns we create separate ports with the suffix “_err”. In each port we check the above conditions: if the column value satisfies the condition, the port is set to NULL; otherwise it is set to an error code with a description. The condition looks like
IIF(ColumnValue <> condition, 'errorcode_errordescription', NULL)

               A port Record_status will be maintained to check whether the row qualifies or not.
IIF(ISNULL(port1_err) AND ISNULL(port2_err) …, ‘VALID’, ‘NOT VALID’)

A Router transformation following the Expression checks for the ‘VALID’ status of the record, and ‘NOT VALID’ records are routed to the error table. We maintain an error table for every source table.

12.   Explain the dynamic lookup implementation
Ans: Enable the “Dynamic Lookup cache” option in the properties tab of the lookup. As soon as you create a dynamic lookup, Informatica adds one extra port called “NewLookupRow” in the ports tab. Using the value of this port, the rows can be routed for insert, update or to do nothing. You just need to use a Router or Filter transformation followed by an Update Strategy.
                                I.            0 = Integration Service does not update or insert the row in the cache.
                              II.            1 = Integration Service inserts the row into the cache.
                            III.            2 = Integration Service updates the row in the cache
When this port is used in the Router, rows can be routed for insert or update accordingly.

13.   How a mapping should be designed to extract the source data from the last extraction time?
Ans:
                                      I.            Create a workflow as follows
start-->session1-->assignmenttask
                                    II.            Session1 will load the data into your target, meaning this session will perform your actual data load.
                                  III.            Assignment task: Create a persistent workflow variable and assign it as follows
$$SessionStartTime = $session1.StartTime
The start time is persisted and used by the mapping in the next run
                                  IV.            Session1’s mapping will have a mapping variable $$PrevSessStart, which is used in the SQL override to filter out records that were already extracted. The query would be something like
SELECT COL1, COL2
FROM Source_Table
WHERE CREATED_TIME > $$PrevSessStart
                                    V.            In the Pre-session variable assignment tab for session 1, set as follows
$$PrevSessStart =$$SessionStartTime

14.   What are the steps in SDLC?
                                   I.            Requirements gathering
                                 II.            Analysis
                               III.            Design
                               IV.            Coding
                                 V.            Testing
                               VI.            Delivery and Maintenance

15.   What is the difference in using a sub-query and a Join?
Ans: Using a JOIN risks duplicating the parent table’s rows in the result set when a parent row has more than one related child record, because a JOIN returns every matching combination. So if you want unique values from the parent table while using JOINs, you need to use DISTINCT or a GROUP BY clause. Neither is a concern when a subquery is used. On the other hand, a subquery often performs worse than an equivalent join.
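A hedged illustration with hypothetical DEPT (parent) and EMP (child) tables: the join can repeat each department once per matching employee, while the subquery returns each department at most once.

-- JOIN: DEPT rows repeat once per matching EMP row, so DISTINCT is needed
SELECT DISTINCT d.DEPT_ID, d.DEPT_NAME
FROM DEPT d
JOIN EMP e ON e.DEPT_ID = d.DEPT_ID

-- Subquery: no duplication of DEPT rows, no DISTINCT required
SELECT d.DEPT_ID, d.DEPT_NAME
FROM DEPT d
WHERE EXISTS (SELECT 1 FROM EMP e WHERE e.DEPT_ID = d.DEPT_ID)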

16.   Explain about the types of partitioning in informatica
Ans:
                                I.            Round-robin partitioning. The Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
                              II.            Hash partitioning. The Informatica Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the Informatica Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition.


                            III.            Key range partitioning. You specify one or more ports to form a compound partition key. The Informatica Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
                            IV.            Pass-through partitioning. The Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions

17.   How would you configure a normalizer transformation?
Ans: For example, suppose the source has a CustomerName column and three Email columns (Email1, Email2, Email3), as in question 10, that need to be normalized
                                I.            Create column names CustomerName and Email in the Normalizer tab in the Normalizer transformation
                              II.            Set the “Occurs” property to 3 for the Email column
                            III.            Now, you can see the following ports created in the Ports tab,
a)      4 input ports CustomerName_in, Email_in1, Email_in2, Email_in3
b)      2 output ports CustomerName, Email
c)       2 generated fields, namely GK_Email and GCID_Email.
The GK_ field acts as a sequence generator and GCID holds the occurrence number, that is 1, 2 or 3 in our case.

18.   When the source flat file is in remote server, how will you read the file?
Ans: Use an FTP connection object in the Workflow Manager

19.   What is DTM?
Ans: Data Transformation Manager (DTM) is responsible for the data movement from source to target specified in the mapping.
                                I.            DTM process allocates DTM process memory
                              II.            DTM initializes the session and fetches mapping
                            III.            DTM executes pre-session commands and procedures
                            IV.            DTM creates reader, writer, transformation threads for each pipeline
                              V.            DTM executes post-session commands and procedures
                            VI.            DTM writes historical incremental aggregation/lookup to repository

20.   How will you approach a mapping in case of performance bottlenecks
Ans: Look for performance bottlenecks in the following order:
                                I.            Target: Replace the target table with a flat file in the mapping copy and check. If the run time is very much reduced then the target table is the cause.
Solution:
a)      Have the database administrator optimize database performance by optimizing the query.
b)      Increase the database network packet size.
c)       Configure index and key constraints.
                              II.            Source: Add a Filter transformation after each Source Qualifier. Set the filter condition to FALSE so that no data is processed past the Filter transformation. If the time it takes to run the new session differs drastically, you have a source bottleneck. You can also run a sample query directly in the database and check.
Solution:
a)        Have the database administrator optimize database performance by optimizing  the query.
b)        Increase the database network packet size.
c)         Configure index and key constraints.
d)        If there is a long delay between the two time measurements in a database query, you can use an optimizer hint.
                            III.            Transformation: If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
Solution: Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping.

21.   How will you optimize an aggregator transformation
Ans:
                                I.            Grouping by simple columns - use integer values
                              II.            Filtering Data before you aggregate
                            III.            Limiting Port Connections - Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
                            IV.            Using sorted Input

22.   How will you optimize a Joiner transformation
Ans:
                                I.            Designate the master source as the source with fewer duplicate key values
                              II.            Designate the master source as the source with fewer rows
                            III.            Use the Source Qualifier transformation to perform the join
                            IV.            Join sorted data when possible - minimize disk input and output

23.   How will you optimize a Lookup transformation
Ans:
                                I.            Be sure to delete the unwanted columns from the lookup as they affect the lookup cache very much.
                              II.            If the Lookup transformation comes right after the Source Qualifier and there is no active transformation in between, you can instead perform the lookup join in the Source Qualifier SQL override
                            III.            The cache you assigned for the lookup may not be sufficient to hold the lookup data or index. Whatever does not fit into the cache spills over into the cache files created in $PMCacheDir. When PowerCenter does not find the data you are looking up in the cache, it swaps data from the file into the cache and keeps doing this until it finds the data. This is quite expensive for obvious reasons, being an I/O operation. Increase the cache so that the whole data set resides in memory
                            IV.            Sequential and Concurrent caches: The 8.x version of PowerCenter gives us this wonderful option to build the caches of the lookups either concurrently or in a sequential manner depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.

24.   What are the different tasks in Workflow Manager?
Ans: Session, Event-Raise, Event-Wait, Timer, Decision, Assignment, Control, Email tasks

25.   What is the difference between a Static and Dynamic cache?
Ans:
Static Cache: You cannot insert into or update the cache. The Informatica server caches the lookup table data at the start of the session. It returns a value from the cache when the lookup condition is true. When the condition is not true, the server returns the default value for connected lookups and NULL for unconnected lookups.
Dynamic Cache: You can insert rows into or update rows in the cache as rows pass to the target. The Informatica server inserts a row into the cache when the lookup condition is false, which indicates that the row is not in the cache or target table. A dynamic cache can be used only in connected Lookup transformations.

26.   What is the difference between a persistent and non-persistent cache?
Ans: Persistent Cache: The Lookup transformation can be configured to save the cache files so that they can be reused in the next run. The cache can be named or unnamed; a named cache can be shared across multiple mappings.
Non-Persistent Cache: The cache files are deleted at the end of the session, so the cache has to be rebuilt from the lookup source every time the session runs.










27.   What is the difference between a connected look up and unconnected look up?
Connected Lookup:
-    Takes input values directly from other transformations in the pipeline.
-    Dynamic caching can be used.
-    Returns the default value when there is no matching record.
-    Can return as many columns as there are in the lookup table.
Unconnected Lookup:
-    Does not take inputs directly from other transformations; it is invoked as a function (using the :LKP expression) from any transformation, such as an Expression, so it can be called multiple times in a mapping.
-    Dynamic caching is not applicable.
-    Returns NULL when there is no matching record.
-    Has only one return port.

28.   What is a worklet?
Ans: A worklet is an object that represents a set of workflow tasks, allowing a piece of workflow logic to be reused in several workflows.
Use of a worklet: You can group many tasks in one place so that they are easily identified and serve a specific purpose.

29.   What is a mapplet?
Ans: A mapplet is a reusable object that is created using the Mapplet Designer. A mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.

30.   What are the unsupported repository objects in a mapplet?
Ans:
COBOL source definitions
Joiner Transformations
Normalizer Transformations
Non-Reusable Sequence generator transformations
Pre-session or Post-session stored procedures
Target definitions
XML Source definitions
IBM MQ Source definitions




31.   what is the difference between a mapplet and a reusable transformation?
Ans: A mapplet is a set of transformations, whereas a reusable transformation is a single transformation. A mapplet is created in the Mapplet Designer, whereas a reusable transformation is created in the Transformation Developer or the Mapping Designer.

32.   What is the use of update override query in the target instance?
                                I.            The update can be performed based on a target column value.
                              II.            The update can be performed without defining a key column in Informatica or in the database (see the sketch below)
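A minimal sketch of a target update override, assuming a hypothetical target T_EMP; :TU refers to the values Informatica passes to the target, and the WHERE clause can use a non-key column such as EMP_NAME instead of a defined key.

-- Entered in the target instance's Update Override property (illustrative)
UPDATE T_EMP
SET SALARY = :TU.SALARY, DEPT_ID = :TU.DEPT_ID
WHERE EMP_NAME = :TU.EMP_NAME   -- update driven by a non-key target column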

33.   Consider a source flat file with comma delimited values. In the target table, only one column gets loaded. What would be the issue?
Ans: The delimiter character might not be set correctly in the source instance properties

34.   When the ports are linked correctly from Source Qualifier to the target, the target table columns are getting the values interchanged from one another. What would be the issue?
Ans: The Source qualifier might have a SQL override and that may have the column order different from the port order in the ports tab.

35.   What is a data warehouse?
Ans: It is a process for building decision support systems and a knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.
Bill Inmon’s definition:  "Subject-oriented, integrated, time variant, non-volatile collection of data in support of management's decision making process."
Ralph Kimball’s definition:  "A copy of transaction data specifically structured for query and analysis."

36.   List some of the transformations?
Aggregator - Active & Connected
Source Qualifier - Active & Connected
Filter - Active & Connected
Expression - Passive & Connected
Joiner - Active & Connected
Lookup - Passive & Connected/Unconnected
HTTP - Passive & Connected
Normalizer - Active & Connected
Rank - Active & Connected
Router - Active & Connected
Sequence Generator - Passive & Connected
Sorter - Active & Connected
Stored Procedure - Passive & Connected/Unconnected
Union - Active & Connected
Java - Active/Passive & Connected
SQL - Active/Passive & Connected

37.   Explain a mapping plan for a scenario where source is a flat file and the target should be loaded with the last 20 rows  of the flat file
                                I.            From the source qualifier, link the ports to an Expression transformation and create an additional I/O port “Seq_no”
                              II.            Use a Sequence generator for numbering the rows. Link the nextval port to the Seq_no
                            III.            From the source qualifier, link the ports to an aggregator transformation and get the count of records with no ‘group by’ in a port “Count”
                            IV.            Use a Joiner transformation to join the two outputs, with “Full Outer” as the join type
                              V.            In the router transformation, create a group with the condition
IIF(Count-Seq_no < 20, True, False)

38.   Give a mapping plan for loading one target table with the distinct records and the other with the duplicate records. Assume source to be a flat file
Ans: Assuming that we have a key column “EMP_NAME”
                                I.            Sorter: Sort the records with EMP_NAME
                              II.            Expression:  Create a variable port “v_COUNT” with the expression
IIF(EMP_NAME = v_EMP_NAME_old, v_COUNT+1, 1)
                            III.            The variable port v_EMP_NAME_old should be created below the port v_COUNT  with the expression EMP_NAME
                            IV.            Create an output port o_COUNT with the expression value as v_COUNT
                              V.            Router: Create two different groups
Distinct: o_COUNT = 1
Duplicate: o_COUNT >1

39.   What is the use of index, what is the impact in the performance when DML operation is performed in the indexed table?
Ans: Oracle uses indexes to avoid the need for large-table, full-table scans and disk sorts, which are required when the SQL optimizer cannot find an efficient way to service the SELECT query
Performance of DML operations such as INSERT, UPDATE and DELETE decreases on an indexed table, because the indexes must be maintained along with the data.
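For example (Oracle-style syntax, hypothetical table and column names): the index speeds up SELECTs that filter on the indexed column, but every INSERT, UPDATE or DELETE on the table must also maintain the index, which adds overhead.

CREATE INDEX idx_customers_email ON CUSTOMERS (EMAIL);

-- This query can now use the index instead of a full-table scan
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM CUSTOMERS
WHERE EMAIL = 'abc@example.com';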

40.   Write a query to delete the duplicates from a table
Ans: Assuming the database to be Oracle
DELETE FROM table_name
WHERE ROWID NOT IN (SELECT MAX(ROWID)
                    FROM table_name
                    GROUP BY key_column)

41.   What is the main use of update strategy?
Ans:  Update strategy can be used for performing an INSERT/UPDATE based on some logic. For example, IIF(CONDITION, DD_INSERT, DD_UPDATE)
UPDATE and INSERT can be achieved with a single target instance

42.   Assume there are two target tables. First 5 records should be loaded in the first target table, next 5 to the other table, and the next 5 again to the first target. Explain a mapping plan
Ans:  Sequence Generator: Enable the Cycle option and set the end value to 10
Expression: Link the ports from the Source Qualifier and create a port COUNT that receives the Sequence Generator NEXTVAL
Router: Create two groups
Target1 – COUNT <= 5
Target2 – COUNT > 5

43.   How will you load an audit table to maintain the session run properties?
1. Create a workflow as follows
Start-->session1-->assignmenttask-->session2
2. Session1 will load the data into your target, meaning, this session will perform your actual data load.
3. Assignmenttask :
a. Create workflow variables for the following
$$workflowname , $$sessionStartTime , $$SessionEndTime , $$Successrows , $$FailedRows etc...
b. Use the Expression tab in the Assignment Task and assign as follows:
$$workflowname = $PMWorkflowName
$$sessionStartTime = $session1.StartTime
$$SessionEndTime = $session1.Endtime
4. Session 2
a. This should call a mapping say m_sessionLog
b. This mapping m_sessionLog should have mapping variables for the above defined workflow variables such as $$wfname , $$stime , $$etime  etc..
c. This mapping m_sessionLog should use a dummy source and inside the expression you must assign the output ports to use the mapping variables. Meaning, output port such as
workflowname=$$wfname
starttime=$$stime
d. Create an audit table in your database as Target and connect all the required output ports to the target which is your audit table.
e. In the Pre-session variable assignment tab for session 2 set as follows
mapping variable = workflow variable, in your case
$$wfname=$$workflowname
$$stime=$$sessionStartTime
5. Run the workflow. Session1 will achieve your business requirement.
Once the session is completed the assignment task will assign the session1 statistics to the appropriate workflow variables
Session2 will assign the workflow variables to the mapping variables and load the session stats to the audit table

44.   Which transformations cannot be used upstream from a Union transformation?
Ans: Sequence generator and Update strategy

45.   What is a degenerated dimension?
Ans: A degenerate dimension acts as a dimension key in the fact table, however does not join to a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions. This key will be combined with other foreign keys of the dimension tables to form the fact table’s primary key.
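A hedged illustration with hypothetical names: the order number is kept directly in the fact table as a degenerate dimension because it has no remaining attributes that would justify a separate dimension table.

CREATE TABLE SALES_FACT (
    DATE_KEY     NUMBER REFERENCES DATE_DIM (DATE_KEY),
    PRODUCT_KEY  NUMBER REFERENCES PRODUCT_DIM (PRODUCT_KEY),
    ORDER_NUMBER VARCHAR2(20),              -- degenerate dimension: no ORDER_DIM table exists
    SALES_AMOUNT NUMBER(12,2)
);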

46.   Configuring Mapping variables
Ans: Go to the tab Mapping-->parameter and variables-->$$variable_name
There are 3 aggregation types, namely, COUNT, MIN, MAX
Once the variable is created, it can be used in the expression transformation with the functions like SETCOUNTVARIABLE($$variable_name), SETMAXVARIABLE( $$Variable_name, value ), SETVARIABLE ($$Variable_name, value)
SETCOUNTVARIABLE($$variable_name)
Counts the rows evaluated by the function and increments the current value of a mapping variable based on the count. Increases the current value by one for each row marked for insertion. Decreases the current value by one for each row marked for deletion. Keeps the current value the same for each row marked for update or reject.
SETMAXVARIABLE( $$Variable_name, value )
Sets the current value of a mapping variable to the higher of two values: the current value of the variable and the value specified in the function. The function executes only if a row is marked for insert.
SETVARIABLE ($$Time, SYSDATE)
Use with mapping variables with Max/Min aggregation type. The SETVARIABLE function executes only if a row is marked as insert or update.

47.   OPB and REP tables in the metadata
Ans: OPB tables are the basic repository tables that are created when a repository is created. These tables are continuously updated whenever we make a change in the repository, such as creating new objects, modifying or deleting existing objects, or running a workflow.
REP_ objects are views, not tables. These views are created on top of the OPB tables for querying purposes.




48.   Useful OPB tables and its contents
Ans:
OPB_SUBJECT - This table stores the name of each PowerCenter repository folder.
OPB_MAPPING - This table stores the name and ID of each mapping and its corresponding folder. Usage: Join any of the repository tables that have MAPPING_ID as column with that of MAPPING_ID in this table to know the mapping name.
OPB_TASK - This table stores the name and ID of each task like session, workflow and its corresponding folder. Usage: Join any of the repository tables that have TASK_ID as column with that of TASK_ID/SESSION_ID in this table to know the task name. Observe that the session and also workflow are stored as tasks in the repository. TASK_TYPE for session is 68 and that of the workflow is 71.
OPB_SESSION - This table stores the linkage between the session and the corresponding mapping. As informed in the earlier paragraph, you can use the SESSION_ID in this table to join with TASK_ID of OPB_TASK table.
OPB_TASK_ATTR - This is the table that stores the attribute values (like Session log name etc) for tasks. Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table to find what each attribute in this table means. You can know more about OPB_ATTR table in the next paragraphs.
OPB_WIDGET - This table stores the names and IDs of all the transformations with their folder details. Usage: Use WIDGET_ID from this table to that of the WIDGET_ID of any of the tables to know the transformation name and the folder details. Use this table in conjunction with OPB_WIDGET_ATTR or OPB_WIDGET_EXPR to know more about each transformation etc.
OPB_WIDGET_FIELD - This table stores the names and IDs of all the transformation fields for each of the transformations. Usage: Take the FIELD_ID from this table and match it against the FIELD_ID of any of the tables like OPB_WIDGET_DEP and you can get the corresponding information.
OPB_WIDGET_ATTR - This table stores all the properties details about each of the transformations. Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table to find what each attribute in this transformation means.
OPB_EXPRESSION - This table stores the details of the expressions used anywhere in PowerCenter. Usage: Use this table in conjunction with OPB_WIDGET/OPB_WIDGET_INST and OPB_WIDGET_EXPR to get the expressions in the Expression transformation for a particular, mapping or a set.
OPB_ATTR-This table has a list of attributes and their default values if any. You can get the ATTR_ID from this table and look it up against any of the tables where you can get the attribute value. You should also make a note of the ATTR_TYPE, OBJECT_TYPE_ID before you pick up the ATTR_ID. You can find the same ATTR_ID in the table, but with different ATTR_TYPE or OBJECT_TYPE_ID.
OPB_COMPONENT- This table stores the component details like Post-Session-Success-Email, commands in Post-Session/pre-Session etc. Usage: Match the TASK_ID with that of the SESSION_ID in OPB_SESSION table to get the SESSION_NAME and to get the shell command or batch command that is there for the session, join this table with OPB_TASK_VAL_LIST table on TASK_ID.
OPB_CFG_ATTR- This table stores the attribute values for Session Object configuration like "Save Session log by", Session log path etc.
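A sample repository query along the lines of the Usage notes above (the join columns are assumptions based on those notes; verify them against your repository version before relying on this):

-- List mappings per folder by joining OPB_MAPPING to OPB_SUBJECT
SELECT s.SUBJ_NAME AS FOLDER_NAME,
       m.MAPPING_NAME
FROM OPB_MAPPING m
JOIN OPB_SUBJECT s ON s.SUBJ_ID = m.SUBJECT_ID
ORDER BY s.SUBJ_NAME, m.MAPPING_NAME;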

49.   How does Push-Down Optimization work?
Ans: One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the source or the target database which executes the SQL queries to process the transformations. The amount of transformation logic one can push to the database depends on the database, transformation logic, mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database and executes the SQL statement generated against the source or target tables, and it processes any transformation logic that it cannot push to the database.

50.   Types of Push-down optimization
Ans: Using source-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the source database. The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database and executes the corresponding SELECT statement.

Using target-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the target database. The Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the database and executes the DML.

Using full pushdown optimization:
The Integration Service pushes as much transformation logic as possible to both source and target databases. If you configure a session for full pushdown optimization, and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead. Also the source and target must be on the same database. The Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target. When it can push all transformation logic to the database, it generates an INSERT SELECT statement to run on the database. The statement incorporates transformation logic from all the transformations in the mapping. If the Integration Service can push only part of the transformation logic to the database, it does not fail the session, it pushes as much transformation logic to the source and target database as possible and then processes the remaining transformation logic.
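Illustrative only, with hypothetical source and target tables: with full pushdown, a simple Filter-plus-Expression mapping could be collapsed into a single statement executed by the database, roughly of this shape.

INSERT INTO TGT_ORDERS (ORDER_ID, CUSTOMER_ID, AMOUNT_USD)
SELECT ORDER_ID,
       CUSTOMER_ID,
       AMOUNT * 1.1               -- Expression transformation logic pushed down
FROM SRC_ORDERS
WHERE ORDER_STATUS = 'OPEN';      -- Filter transformation logic pushed down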


Configuring Parameters for Pushdown Optimization
Depending on the database workload, we might want to use source-side, target-side, or full pushdown optimization at different times, and for that we can use the $$PushdownConfig mapping parameter. The settings in the $$PushdownConfig parameter override the pushdown optimization settings in the session properties. Create the $$PushdownConfig parameter in the Mapping Designer, select $$PushdownConfig for the Pushdown Optimization attribute in the session properties, and define the parameter value in the parameter file.

The possible values may be,
1. None, i.e. the Integration Service itself processes all the transformations,
2. Source [Seq View],
3. Target [Seq View],
4. Full [Seq View]

Handling Error when Pushdown Optimization is enabled
When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database.

When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.

If we configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations. Instead, the database rolls back the transactions. If the database server fails, it rolls back transactions when it restarts. If the Integration Service fails, the database server rolls back the transaction.

51.   What are the prerequisites to be completed before using the Unstructured data transformation?
Ans: The Unstructured Data transformation is a transformation that processes unstructured and semi-structured file formats, such as messaging formats, HTML pages and PDF documents.
Data Transformation has the following components:
                                I.            Data Transformation Studio: A visual editor to design and configure transformation projects.
                              II.            Data Transformation Service: A Data Transformation project that is deployed to the Data Transformation Repository and is ready to run.
                            III.            Data Transformation repository: A directory that stores executable services that you create in Data Transformation Studio. You can deploy projects to different repositories, such as repositories for test and production services.
                            IV.            Data Transformation Engine: A processor that runs the services that you deploy to the repository.
Data Transformation should be installed after the PowerCenter installation.

52.   What is the syntax for running a workflow through PMCMD command
Ans:  pmcmd startworkflow -s server -d domain -u username -p password -f folder workflow_name
where
-s  Informatica server
-d  domain name
-u  username
-p  password
-f  folder name

53.   Some error codes shown in the session log
Ans: BR - errors related to the reader process, including ERP, flat file and relational sources
CMN - errors related to databases, memory allocation, lookup, joiner and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in the Aggregator, Lookup, Joiner and Rank transformations
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer

54.   What is a code page?
Ans: A code page contains the encoding used to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then a code page that supports Japanese text should be selected.
When a code page is chosen, the program or application for which the code page is set, refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.


55.   How to execute PL/SQL script from Informatica mapping?
Ans: A Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. The PL/SQL procedure name is specified in the SP transformation. Whenever the session is executed, it calls the PL/SQL procedure.
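A minimal Oracle PL/SQL sketch (hypothetical procedure and audit table) of the kind of procedure a Stored Procedure transformation could call:

CREATE OR REPLACE PROCEDURE update_load_audit (p_session_name IN VARCHAR2) AS
BEGIN
    -- Hypothetical audit table: record when the session last called this procedure
    UPDATE load_audit
    SET last_run_date = SYSDATE
    WHERE session_name = p_session_name;
    COMMIT;
END;
/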

56.   What are the two types of errors that occur in a session run?
Ans: Fatal and Non-Fatal errors
Non-Fatal Errors: It is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session property tab with the “stop on errors” option. When you enable this option, the server counts Non-Fatal errors that occur in the reader, writer and transformations.
                                I.            Reader errors can include alignment errors while running a session in Unicode mode.
                              II.            Writer errors can include key constraint violations, loading NULL into the NOT-NULL field and database errors.
                            III.            Transformation errors can include conversion errors and any condition set up as an ERROR, Such as NULL Input.
Fatal Errors: This occurs when the server cannot access the source, target or repository. This can include loss of connection or target database errors, such as lack of database space to load data.
If the session uses normalizer (or) sequence generator transformations, the server cannot update the sequence values in the repository, and a fatal error occurs.

57.   What is a commit interval option in the session property?
Ans: A commit interval is the interval at which the server commits data to relational targets during a session.

58.   What are the different types of commit interval?
Ans: Target based commit and Source based commit
(a)    Target based commit
-    The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
-    During a session, the server continues to fill the writer buffer, after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval.
-    The server commits data to each target based on primary/foreign key constraints.
(b)    Source based commit
-    Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
-    During a session, the server commits data to the target based on the number of rows from an active source in a single pipeline. The rows are referred to as source rows.
-    A pipeline consists of a source qualifier and all the transformations and targets that receive data from source qualifier.
-    Although the Filter, Router and Update Strategy transformations are active transformations, the server does not use them as active sources in a source based commit session.
-    When a server runs a session, it identifies the active source for each pipeline in the mapping. The server generates a commit row from the active source at every commit interval.
-    When each target in the pipeline receives the commit rows the server performs the commit.

59.   Explain about Reject Loading?
Ans:
                                I.            During a session, the server creates a reject file for each target instance in the mapping. If the writer of the target rejects data, the server writes the rejected row into the reject file.
                              II.            You can correct the rejected data and reload it into relational targets using the reject loading utility. (You cannot load rejected data into a flat file target)
                            III.            Reading Rejected data:
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
There are two main things, Row indicator and Column indicator
Row indicator: Row indicator tells the writer, what to do with the row of wrong data.

Row indicator | Meaning | Rejected by
0             | Insert  | Writer or target
1             | Update  | Writer or target
2             | Delete  | Writer or target
3             | Reject  | Writer


Column indicator: A column indicator is followed by the first column of data, and then another column indicator. Column indicators appear with every column of data and define the type of the data preceding them.

Column indicator | Meaning    | Writer treats as
D                | Valid data | Good data. The target accepts it unless a database error occurs, such as finding a duplicate key
?                | Overflow   | Bad data
N                | Null       | Bad data
T                | Truncated  | Bad data

60.   Tasks carried out during a session run
Ans: 1.    LM locks the session and reads the session properties
2.    LM reads parameter file
3.    LM expands server/session variables and parameters
4.    LM verifies permission and privileges
5.    LM validates source and target code page
6.    LM creates session log file
7.    LM creates DTM process
8.    DTM process allocates DTM process memory
9.    DTM initializes the session and fetches mapping
10.    DTM executes pre-session commands and procedures
11.    DTM creates reader, writer, transformation threads for each pipeline
12.    DTM executes post-session commands and procedures
13.    DTM writes historical incremental aggregation/lookup to repository
14.    LM sends post-session emails

61.   Explain about the various threads created during a session run
Ans:
Thread                | Function
Master thread         | Handles stop and abort requests from the Load Manager
Mapping thread        | One thread for each session; fetches session and mapping information, compiles the mapping, and cleans up after execution
Reader thread         | One thread for each partition; relational sources use relational threads and flat files use file threads
Writer thread         | One thread for each partition; writes to the target
Transformation thread | One or more threads for each partition

62.   Informatica Pipeline Partitioning Explained
Ans: Each mapping contains one or more pipelines. A pipeline consists of a source qualifier, all the transformations and the target. When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage. If we have the Informatica Partitioning option, we can configure multiple partitions for a single pipeline stage.

Partition points: In the session properties we can add or edit partition points. When we change partition points we can define the partition type and add or delete partitions (that is, change the number of partitions).

Number of partitions:  A partition is a pipeline stage that executes in a single thread. If you use PowerCenter, you can set the number of partitions at any partition point.
When you add partitions, you increase the number of processing threads, which can improve session performance.

Partition types: The Informatica Server specifies a default partition type at each partition point. If you use PowerCenter, you can change the partition type. The partition type controls how the Informatica Server redistributes data among partitions at partition points.

The Workflow Manager allows you to specify the following partition types
• Round-robin partitioning. The Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
• Hash partitioning. The Informatica Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the Informatica Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition.
• Key range partitioning. You specify one or more ports to form a compound partition key. The Informatica Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
• Pass-through partitioning. The Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.

63.   How to setup Event Raise and Event Wait tasks?
Ans:
                                I.            Create an event -  right click the workflow and edit-> events tab -> create an event
                              II.            Place Eventraise task in the workflow space and give the created event name in the "user defined event" (properties tab)
                            III.            Place eventwait task in the workflow space-> there are 2 options in events tab,
                             Pre-defined: This is a file watch
                             User-defined: This is used for an event created in the workflow properties
When the Event-Raise task executes, it raises the event, which triggers the Event-Wait task to continue

64.   Timer task:
Ans:  You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task.
Two options:
Absolute time - give the exact time when to start the next task or refer a date-time variable
Relative time - give the hours, minute, seconds - from the start time of this task
                                                - From the start time of the parent workflow/worklet
                                                - From the start time of the top-level workflow

65.   Decision task:
Ans: Use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions.

66.   Assignment task:
Ans: You can assign a value to a user-defined workflow variable with the Assignment task.

67.   CONTROL TASK
Ans: We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.  
Control Option           | Description
Fail Me                  | Fails the Control task.
Fail Parent              | Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent              | Stops the workflow or worklet that contains the Control task.
Abort Parent             | Aborts the workflow or worklet that contains the Control task.
Fail Top-Level Workflow  | Fails the workflow that is running.
Stop Top-Level Workflow  | Stops the workflow that is running.
Abort Top-Level Workflow | Aborts the workflow that is running.

68.   Explain about the FIRST(LAST) aggregate function
Ans: Returns the first value found within a port or group. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within FIRST.
SYNTAX: FIRST( value [, filter_condition ] )
EXAMPLE: FIRST( ITEM_NAME, ITEM_PRICE > 10 )

69.   How the GET_DATE_PART() expression function works?
Ans: GET_DATE_PART returns the specified part of a date (such as the hour, day or month) as an integer. For example:
GET_DATE_PART( DATE_SHIPPED, 'HH12' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'MON' )

70.   Tell some of the expression function you have worked on?
Ans:
                                I.            GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )
                              II.            INDEXOF( ITEM_NAME, ‘diving hood’, ‘flashlight’, ‘safety knife’)
1 if the input value matches string1, 2 if the input value matches string2, and so on.
0 if the input value is not found.
NULL if the input is a null value.
                            III.            INITCAP( string )
                            IV.            INSTR( COMPANY, 'a', 1, 2 )
                              V.            IS_DATE() returns 0 or 1
                            VI.            Is_Number()
                          VII.            Is_spaces() Returns whether a string value consists entirely of spaces.
                        VIII.            LAST_DAY() Returns the date of the last day of the month for each date in a port.
                            IX.            SUBSTR()
                              X.            REPLACESTR ( CaseFlag, InputPort, OldString1, [OldString2, ... OldStringN,] NewString )
                            XI.            REVERSE( string )
                          XII.            RPAD( ITEM_NAME, 16, '.')

71.   Can you update a table based on a condition, give an example
Ans:
UPDATE employee
SET employeename = CASE WHEN empid = 1 THEN 'RK'
                        WHEN empid = 2 THEN 'Valluri'
                        ELSE NULL
                   END



72.   Explain Java transformation
Ans:
                                I.            The Java transformation provides a simple native programming interface to define transformation functionality with the Java programming language.
                              II.            The Java transformation can be created as either active or passive
                            III.            Tabs in the Java transformation: General, Ports, Properties, Java Code
                            IV.            In the Java Code tab we can write code for various stages, such as
On Input Row
On End of Data
On Receiving Transaction
Java Expressions
I have used code in the “On Input Row” section. An example is given below:
if (isNull("Port5") && Port4.equals("STHA")) {
    Port3 = -1;
    Port2 = 0;
    generateRow();
}
In the above example, the ports are input/output ports that are linked into and out of the Java transformation. The IF condition checks whether the condition is satisfied and, if so, passes the input row on to the next transformation; the row is emitted when the generateRow() function is called. In the output row, the Port2 and Port3 values are changed and the other values are passed through unchanged.

73.   What is a factless fact table?
Ans:  Factless fact tables are used for tracking a process or collecting statistics. They are called factless because the fact table does not have aggregatable numeric values or measures; it contains only key values referencing the dimensions from which the statistics can be collected.
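A hedged example with hypothetical names: a fact table that records student attendance holds only foreign keys, and the statistics come from counting rows rather than summing a measure.

CREATE TABLE ATTENDANCE_FACT (
    DATE_KEY    NUMBER REFERENCES DATE_DIM (DATE_KEY),
    STUDENT_KEY NUMBER REFERENCES STUDENT_DIM (STUDENT_KEY),
    CLASS_KEY   NUMBER REFERENCES CLASS_DIM (CLASS_KEY)   -- no numeric measure columns
);

-- Attendance counts come from counting rows, not summing a fact column
SELECT d.MONTH_NAME, COUNT(*) AS ATTENDANCE_COUNT
FROM ATTENDANCE_FACT f
JOIN DATE_DIM d ON d.DATE_KEY = f.DATE_KEY
GROUP BY d.MONTH_NAME;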

74.   Difference between the DW and a Transactional system
Ans:
Data Warehouse                  | Operational/Transactional
Subject oriented                | Application oriented
Summarized, refined             | Detailed
Represents values over time     | Accurate as of the moment
Supports managerial needs       | Supports day-to-day needs
Read-only data                  | Can be updated
Batch processing                | Real-time transactions
Completely different life cycle | Software Development Life Cycle
Analysis driven                 | Transaction driven
Dimensional model               | Entity-relationship model
Large amount of data            | Small amount of data
Relaxed availability            | High availability
Flexible structure              | Static structure

75.   What is Business Intelligence(BI)?
Ans: Business Intelligence is a set of business processes for collecting and analyzing business information.  BI functions include trend analysis, aggregation of data, drilling down to complex levels of detail, slice-dice, data rotation for comparative viewing.

76.   What are the different types of Analytical Processing Methodologies
Ans: 
OLAP(On-Line Analytical Processing):  Querying and presenting data from data warehouse exemplifying as multiple dimensions.
ROLAP(Relational OLAP):  Applications and set of user interfaces that retrieve data from RDBMS and present as dimensional model.
MOLAP(Multidimensional OLAP):  Applications, set of user interfaces and database technologies that have dimensional model.
DOLAP(Desktop OLAP):  Designed for low-end, single user.  Data is stored/downloaded on the desktop.
HOLAP(Hybrid OLAP) is a combination of all the above OLAP methodologies.

