1. What is a mapping transformation?
Ans: A transformation is a
repository object that generates, modifies, or passes data. The Designer
provides a set of transformations that perform specific functions. For example,
an Aggregator transformation performs calculations on groups of data.
2. What is the difference between an expression transformation and an
aggregator transformation?
Ans: The Expression transformation lets you perform calculations on a row-by-row basis only, whereas the Aggregator transformation lets you perform calculations on groups of rows.
3. What is a source qualifier? What is meant by Query Override?
Ans: The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query (the Query Override) through the transformation properties. The list of selected ports, and the order in which they appear in the default query, must not be changed in the overridden query.
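A minimal sketch of a query override, assuming a hypothetical CUSTOMERS source with the ports CUST_ID, CUST_NAME and CREATED_DATE (names are illustrative only):
Default query:
SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CREATED_DATE FROM CUSTOMERS
Override (same ports, same order, with an extra filter):
SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CREATED_DATE
FROM CUSTOMERS
WHERE CUSTOMERS.CREATED_DATE >= TO_DATE('2012-01-01', 'YYYY-MM-DD')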
4. What is aggregator transformation?
Ans: The Aggregator transformation lets you perform aggregate calculations, such as averages and sums. Unlike the Expression transformation, which permits calculations on a row-by-row basis only, the Aggregator transformation performs calculations on groups. The Aggregator transformation contains group-by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties. The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM and VARIANCE.
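For example (hypothetical ports DEPT_ID, SALARY and EMP_ID), with DEPT_ID set as the group-by port, output ports in the Aggregator could be defined as:
TOTAL_SALARY = SUM(SALARY)
EMP_COUNT = COUNT(EMP_ID)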
5. How Union Transformation is used?
Ans: The Union transformation is a multiple-input-group transformation that can be used to merge data from various sources (or pipelines). It works just like the UNION ALL statement in SQL, which is used to combine the result sets of two or more SELECT statements.
6. What is a look up transformation?
Ans: This transformation is used to look up data in a flat file or a relational table, view or synonym. It compares the lookup transformation ports (input ports) to the lookup source column values based on the lookup condition. The returned values can then be passed to other transformations.
7. Can a lookup be done on Flat Files?
Ans: Yes.
8. What are the contents in a session log in general?
Ans:
- Allocation of system shared memory
- Execution of pre-session and post-session commands
- Session initialization
- Creation of SQL commands for reader/writer threads
- Start/end timings for target loading
- Errors encountered during the session
- Load summary of reader/writer/DTM statistics
9. What are the different tracing levels available?
Ans:
Normal - Initialization and status information, errors encountered, transformation errors, rows skipped, and summarized session details (not at the level of individual rows).
Terse - Initialization information as well as error messages and notification of rejected data.
Verbose Initialization - In addition to normal tracing, the names of index and data files used and detailed transformation statistics.
Verbose Data - In addition to verbose initialization tracing, each row that passes into the mapping, along with detailed transformation statistics.
10. How to implement normalization without normalizer transformation in
informatica?
Ans: For example, suppose three e-mail columns are to be normalized into a single column:

Customer Name | Email1 | Email2 | Email3
A             |        |        |
B             |        |        |
C             |        |        |

I. Place three different Expression transformations EXP1, EXP2, EXP3.
II. Take Email1 and Customer Name to EXP1, Email2 and Customer Name to EXP2, and so on.
III. Place a Union transformation and create three different input groups.
IV. Link the above Expression outputs to the Union transformation (a SQL equivalent is sketched below).
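For comparison only, the same reshaping expressed in SQL over a hypothetical CUSTOMER_EMAILS table:
SELECT CustomerName, Email1 AS Email FROM CUSTOMER_EMAILS
UNION ALL
SELECT CustomerName, Email2 AS Email FROM CUSTOMER_EMAILS
UNION ALL
SELECT CustomerName, Email3 AS Email FROM CUSTOMER_EMAILS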
11. Explain the error handling methodology in your project?
Ans: Error handling is implemented for a column in the following cases:
I. The column values need to be within a range.
II. The column should only hold specific characters, such as 'Y' and 'N' (Boolean).
III. The column value should satisfy a specific condition.
For these kinds of columns we create separate ports with the suffix "_err". Each port checks the corresponding condition: if the column value satisfies the condition, the port holds null; otherwise it holds an error code with a description. The condition looks like:
IIF(ColumnValue <> condition, errorcode_errordescription, NULL)
A port Record_status is maintained to check whether the row qualifies:
IIF(ISNULL(port1_err) AND ISNULL(port2_err) ..., 'VALID', 'NOT VALID')
A Router following the Expression transformation checks for the 'VALID' status of the record, and 'NOT VALID' records are routed to the error table. We maintain an error table for every source table. A small example is sketched below.
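A hedged example of such error ports, assuming hypothetical columns GENDER and AGE and made-up error codes:
GENDER_err = IIF(GENDER = 'M' OR GENDER = 'F', NULL, 'ERR01 - invalid gender code')
AGE_err = IIF(AGE >= 0 AND AGE <= 120, NULL, 'ERR02 - age out of range')
Record_status = IIF(ISNULL(GENDER_err) AND ISNULL(AGE_err), 'VALID', 'NOT VALID')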
12. Explain the dynamic lookup implementation
Ans: Enable the “Dynamic Lookup Cache” option in the Properties tab of the Lookup transformation. As soon as you create a dynamic lookup, Informatica adds one extra port called “NewLookupRow” in the Ports tab. Using the value of this port, the rows can be routed for insert, update, or no action. You just need to use a Router or Filter transformation followed by an Update Strategy. The NewLookupRow values are:
I. 0 = the Integration Service does not update or insert the row in the cache.
II. 1 = the Integration Service inserts the row into the cache.
III. 2 = the Integration Service updates the row in the cache.
When the same port is used in a Router, the row can be routed for insert or update, as sketched below.
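A minimal sketch of the Router group conditions using the default NewLookupRow port (group names are illustrative):
INSERT group: NewLookupRow = 1, followed by an Update Strategy flagging DD_INSERT
UPDATE group: NewLookupRow = 2, followed by an Update Strategy flagging DD_UPDATE
Rows with NewLookupRow = 0 fall to the default group and are dropped or ignored.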
13. How a mapping should be designed to extract the source data from the
last extraction time?
Ans:
I. Create a workflow as follows: start --> session1 --> assignment task.
II. Session1 will load the data into your target, meaning this session performs your actual data load.
III. Assignment task: create a workflow variable and assign it as follows:
$$SessionStartTime = $session1.StartTime
The start time is stored persistently and is used in the mapping for the next run.
IV. Session1's mapping will have a mapping variable $$PrevSessStart, and the same is used in the SQL override for filtering out the previously extracted records. The query would be something like:
SELECT COL1, COL2
FROM Source_Table
WHERE CREATED_TIME > $$PrevSessStart
V. In the Pre-session variable assignment tab for session1, set:
$$PrevSessStart = $$SessionStartTime
14. What are the steps in SDLC?
Ans:
I. Requirements gathering
II. Analysis
III. Design
IV. Coding
V. Testing
VI. Delivery and maintenance
15. What is the difference in using a sub-query and a Join?
Ans: Using a JOIN risks duplicating the information in the result set for the parent table if there is more than one related child record, because a JOIN returns every matching row. This means that if you want unique values from the parent table while using JOINs, you need to use either DISTINCT or a GROUP BY clause. Neither is a concern when a subquery is used. On the other hand, subquery performance is generally lower than that of a join.
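A short illustration, assuming hypothetical DEPT and EMP tables:
-- JOIN: departments repeat once per matching employee, so DISTINCT is needed
SELECT DISTINCT d.DEPT_ID, d.DEPT_NAME
FROM DEPT d
JOIN EMP e ON e.DEPT_ID = d.DEPT_ID;
-- Subquery: each department appears at most once, no DISTINCT required
SELECT d.DEPT_ID, d.DEPT_NAME
FROM DEPT d
WHERE d.DEPT_ID IN (SELECT e.DEPT_ID FROM EMP e);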
16. Explain about the types of partitioning in informatica
Ans:
I. Round-robin partitioning: the Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
II. Hash partitioning: the Informatica Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the Informatica Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition.
III. Key range partitioning: you specify one or more ports to form a compound partition key. The Informatica Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
IV. Pass-through partitioning: the Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
17. How would you configure a normalizer transformation?
Ans: For example, we have the following source data to be normalized:

CustomerName | Email1 | Email2 | Email3
A            |        |        |
B            |        |        |
C            |        |        |

I. Create the column names CustomerName and Email in the Normalizer tab of the Normalizer transformation.
II. Set the “Occurs” property to three for the Email column.
III. Now you can see the following ports created in the Ports tab:
a) 4 input ports: CustomerName_in, Email_in1, Email_in2, Email_in3
b) 2 output ports: CustomerName, Email
c) 2 generated fields, namely GK_Email and GCID_Email.
The GK_ field acts as a sequence generator and GCID_ holds the occurrence number, that is 1, 2 or 3 in our case.
18. When the source flat file is in remote server, how will you read the
file?
Ans: Use an FTP connection object in the Workflow Manager.
19. What is DTM?
Ans: The Data Transformation Manager (DTM) is responsible for moving data from source to target as specified in the mapping.
I. The DTM process allocates DTM process memory.
II. The DTM initializes the session and fetches the mapping.
III. The DTM executes pre-session commands and procedures.
IV. The DTM creates reader, writer and transformation threads for each pipeline.
V. The DTM executes post-session commands and procedures.
VI. The DTM writes historical incremental aggregation/lookup data to the repository.
20. How will you approach a mapping in case of performance bottlenecks
Ans: Look for performance bottlenecks in the following order:
I. Target: replace the target table with a flat file in a copy of the mapping and check. If the run time drops sharply, the target table is the cause.
Solution:
a) Have the database administrator optimize database performance by optimizing the query.
b) Increase the database network packet size.
c) Configure index and key constraints.
II. Source: add a Filter transformation after each Source Qualifier. Set the filter condition to FALSE so that no data is processed past the Filter transformation. If the time it takes to run the new session differs drastically, you have a source bottleneck. You can also run a sample query directly in the database and check.
Solution:
a) Have the database administrator optimize database performance by optimizing the query.
b) Increase the database network packet size.
c) Configure index and key constraints.
d) If there is a long delay between the two time measurements in a database query, you can use an optimizer hint.
III. Transformation: if you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
Solution: generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping.
21. How will you optimize an aggregator transformation
Ans:
I. Group by simple columns - use integer values where possible.
II. Filter data before you aggregate.
III. Limit port connections - limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
IV. Use sorted input.
22. How will you optimize a Joiner transformation
Ans:
I. Designate the master source as the source with fewer duplicate key values.
II. Designate the master source as the source with fewer rows.
III. Use the Source Qualifier transformation to perform the join where possible.
IV. Join sorted data when possible - this minimizes disk input and output.
23. How will you optimize a Lookup transformation
Ans:
I. Delete unwanted columns from the lookup, as they significantly increase the size of the lookup cache.
II. If the Lookup transformation comes right after the Source Qualifier with no active transformation in between, you can instead perform the lookup as a join in the Source Qualifier's SQL override.
III. If the cache assigned to the lookup is not sufficient to hold the lookup data or index, whatever data does not fit into the cache is spilled into the cache files in $PMCacheDir. When PowerCenter does not find the data you are looking up in the cache, it swaps data from the file into the cache and keeps doing so until it finds the data. This is quite expensive, being an I/O operation. Increase the cache so that the whole data set resides in memory.
IV. Sequential and concurrent caches: PowerCenter 8.x gives us the option to build the lookup caches either concurrently or sequentially, depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.
24. What are the different tasks in Workflow Manager?
Ans: Session, Event-Raise,
Event-Wait, Timer, Decision, Assignment, Control, Email tasks
25. What is the difference between a Static and Dynamic cache?
Ans:
Static Cache: You cannot insert into or update the cache. The Informatica server caches the lookup table data at the start of the session. It returns a value from the cache when the lookup condition is true. When the condition is not true, the Informatica server returns the default value for connected transformations and NULL for unconnected transformations.
Dynamic Cache: You can insert rows into the cache as you pass them to the target. The Informatica server inserts a row into the cache when the lookup condition is false, which indicates that the row is not in the cache or the target table. A dynamic cache can be used only in connected Lookup transformations.
26. What is the difference between a persistent and non-persistent
cache?
Ans: Persistent Cache: a Lookup transformation can be configured to save the cached data to a file so that it can be reused in the next run. The cache can be named or unnamed; a named cache can be shared across multiple mappings.
Non-Persistent Cache: the cached lookup data is not saved to disk, so the cache has to be rebuilt from the lookup source every time the session runs.
27. What is the difference between a connected look up and unconnected
look up?
Connected Lookup | Unconnected Lookup
Takes input values directly from other transformations in the pipeline. | Does not take inputs directly from other transformations; it is invoked from a transformation (such as an Expression) as a function using the :LKP expression, so it can be called multiple times in a mapping.
Dynamic caching can be used. | Dynamic caching is not applicable.
Returns the default value when there is no matching record. | Returns NULL when there is no matching record.
Can return as many columns as needed from the lookup table. | Has only one return port.
28. What is a worklet?
Ans: Worklets are objects that represent a set of workflow tasks and allow you to reuse a set of workflow logic in several workflows.
Use of a worklet: you can group many tasks in one place so that they can easily be identified and serve a specific purpose.
29. What is a mapplet?
Ans: A mapplet is a reusable object that is created using the Mapplet Designer. A mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.
30. What are the unsupported repository objects in a mapplet?
Ans:
- COBOL source definitions
- Joiner transformations
- Normalizer transformations
- Non-reusable Sequence Generator transformations
- Pre-session or post-session stored procedures
- Target definitions
- XML source definitions
- IBM MQ source definitions
31. what is the difference between a mapplet and a reusable
transformation?
Ans: A mapplet is a set of transformations, whereas a reusable transformation is a single transformation. A mapplet is created in the Mapplet Designer, whereas a reusable transformation is created in the Transformation Developer or the Mapping Designer.
32. What is the use of update override query in the target instance?
Ans:
I. The update can be performed based on a target column value.
II. The update can be performed without defining a key column in Informatica and in the database. A sketch of such an override is given below.
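A hedged sketch of a target update override, assuming a hypothetical target T_EMPLOYEE; :TU references the ports of the target instance:
UPDATE T_EMPLOYEE
SET EMP_NAME = :TU.EMP_NAME, SALARY = :TU.SALARY
WHERE EMP_ID = :TU.EMP_ID AND STATUS = 'ACTIVE'
Here the extra STATUS condition illustrates point I (updating based on a target column value), and the WHERE clause itself supplies the key logic mentioned in point II.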
33. Consider a source flat file with comma delimited values. In the
target table, only one column gets loaded. What would be the issue?
Ans: The delimiter
character might not be set correctly in the source instance properties
34. When the ports are linked correctly from Source Qualifier to the
target, the target table columns are getting the values interchanged from one
another. What would be the issue?
Ans: The Source qualifier
might have a SQL override and that may have the column order different from the
port order in the ports tab.
35. What is a data warehouse?
Ans: It is a process for building decision-support systems and a knowledge-management environment that supports both day-to-day tactical decision making and long-term business strategies.
Bill Inmon’s
definition: "Subject-oriented,
integrated, time variant, non-volatile collection of data in support of
management's decision making process."
Ralph Kimball’s
definition: "A copy of transaction
data specifically structured for query and analysis."
36. List some of the transformations?
Ans:
Aggregator - active & connected
Source Qualifier - active & connected
Filter - active & connected
Expression - passive & connected
Joiner - active & connected
Lookup - passive & connected/unconnected
HTTP - passive & connected
Normalizer - active & connected
Rank - active & connected
Router - active & connected
Sequence Generator - passive & connected
Sorter - active & connected
Stored Procedure - passive & connected/unconnected
Union - active & connected
Java - active & connected
SQL - active/passive & connected
37. Explain a mapping plan for a scenario where source is a flat file and
the target should be loaded with the last 20 rows of the flat file
I. From the Source Qualifier, link the ports to an Expression transformation and create an additional I/O port “Seq_no”.
II. Use a Sequence Generator to number the rows: link its NEXTVAL port to Seq_no.
III. From the Source Qualifier, also link the ports to an Aggregator transformation and get the count of records, with no group-by port, in a port “Count”.
IV. Use a Joiner to join the two transformation outputs, with “Full Outer” as the join type.
V. In a Router transformation, create a group with the condition:
IIF(Count - Seq_no < 20, TRUE, FALSE)
38. Give a mapping plan for loading one target table with the distinct
records and the other with the duplicate records. Assume source to be a flat
file
Ans: Assuming that we have a key column “EMP_NAME”:
I. Sorter: sort the records by EMP_NAME.
II. Expression: create a variable port “v_COUNT” with the expression
IIF(EMP_NAME = v_EMP_NAME_old, v_COUNT + 1, 1)
III. The variable port v_EMP_NAME_old should be created below the port v_COUNT, with the expression EMP_NAME.
IV. Create an output port o_COUNT with the expression value v_COUNT.
V. Router: create two different groups:
Distinct: o_COUNT = 1
Duplicate: o_COUNT > 1
39. What is the use of index, what is the impact in the performance when
DML operation is performed in the indexed table?
Ans: Oracle uses indexes to avoid the need for large-table full-table scans and disk sorts, which are required when the SQL optimizer cannot find an efficient way to service the SELECT query.
Performance of DML operations such as INSERT, UPDATE and DELETE decreases on an indexed table, because the indexes must be maintained along with the data.
40. Write a query to delete the duplicates from a table
Ans: Assuming the database to be Oracle:
DELETE FROM table_name
WHERE rowid NOT IN (SELECT MAX(rowid)
                    FROM table_name
                    GROUP BY key_column)
41. What is the main use of update strategy?
Ans: Update strategy can be used for performing an
INSERT/UPDATE based on some logic. For example, IIF(CONDITION, DD_INSERT,
DD_UPDATE)
UPDATE and INSERT can be
achieved with a single target instance
42. Assume there are two target tables. First 5 records should be loaded
in the first target table, next 5 to the other table, and the next 5 again to
the first target. Explain a mapping plan
Ans: Sequence Generator: enable the Cycle option and set the maximum (end) value to 10.
Expression: link the records from the Source Qualifier and create a port COUNT for the Sequence Generator value.
Router: create two groups:
Target1 - COUNT <= 5
Target2 - COUNT > 5
43. How will you load an audit table
to maintain the session run properties?
1. Create a workflow as follows:
Start --> session1 --> assignment task --> session2
2. Session1 will load the data into your target, meaning this session performs your actual data load.
3. Assignment task:
a. Create workflow variables such as $$workflowname, $$sessionStartTime, $$SessionEndTime, $$Successrows, $$FailedRows, etc.
b. Use the Expression tab in the Assignment task and assign as follows:
$$workflowname = $PMWorkflowName
$$sessionStartTime = $session1.StartTime
$$SessionEndTime = $session1.EndTime
4. Session2:
a. This should call a mapping, say m_sessionLog.
b. The mapping m_sessionLog should have mapping variables for the above-defined workflow variables, such as $$wfname, $$stime, $$etime, etc.
c. The mapping m_sessionLog should use a dummy source, and inside an Expression transformation you assign the output ports from the mapping variables, for example:
workflowname = $$wfname
starttime = $$stime
d. Create an audit table in your database as the target and connect all the required output ports to that target (a sketch of such a table is given below).
e. In the Pre-session variable assignment tab for session2, set mapping variable = workflow variable, in this case:
$$wfname = $$workflowname
$$stime = $$sessionStartTime
5. Run the workflow. Session1 will achieve your business requirement. Once it completes, the assignment task assigns the session1 statistics to the appropriate workflow variables, then session2 assigns the workflow variables to the mapping variables and loads the session stats into the audit table.
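A hedged sketch of such an audit table, with hypothetical names and Oracle datatypes:
CREATE TABLE ETL_AUDIT_LOG (
    WORKFLOW_NAME  VARCHAR2(100),
    SESSION_START  DATE,
    SESSION_END    DATE,
    SUCCESS_ROWS   NUMBER,
    FAILED_ROWS    NUMBER,
    LOAD_DATE      DATE DEFAULT SYSDATE
);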
44. What are the transformations cannot be used upstream from a union
transformation?
Ans: Sequence generator
and Update strategy
45. What is a degenerated dimension?
Ans: A degenerate dimension acts as a dimension key in the fact table but does not join to a corresponding dimension table, because all its interesting attributes have already been placed in other analytic dimensions. This key can be combined with the other foreign keys of the dimension tables to form the fact table's primary key.
46. Configuring Mapping variables
Ans: Go to Mappings --> Parameters and Variables and create $$variable_name.
There are three aggregation types, namely COUNT, MIN and MAX.
Once the variable is created, it can be used in an Expression transformation with functions like SETCOUNTVARIABLE($$variable_name), SETMAXVARIABLE($$Variable_name, value) and SETVARIABLE($$Variable_name, value).
SETCOUNTVARIABLE($$variable_name): counts the rows evaluated by the function and increments the current value of the mapping variable based on the count. It increases the current value by one for each row marked for insertion, decreases it by one for each row marked for deletion, and keeps the current value the same for each row marked for update or reject.
SETMAXVARIABLE($$Variable_name, value): sets the current value of the mapping variable to the higher of the variable's current value and the specified value. The function executes only if a row is marked as insert.
SETVARIABLE($$Time, SYSDATE): use with mapping variables that have a Max/Min aggregation type. The SETVARIABLE function executes only if a row is marked as insert or update.
47. OPB and REP tables in the metadata
Ans: OPB tables are the basic repository tables that are created when we create a repository. These tables are continuously updated whenever we make a change in the repository, such as creating new objects, modifying existing objects, deleting objects, or running a workflow.
REP objects are views, not tables. They are views created on the OPB tables for querying purposes.
48. Useful OPB tables and its contents
Ans:
OPB_SUBJECT - This table
stores the name of each PowerCenter repository folder.
OPB_MAPPING - This table
stores the name and ID of each mapping and its corresponding folder. Usage:
Join any of the repository tables that have MAPPING_ID as column with that of
MAPPING_ID in this table to know the mapping name.
OPB_TASK - This table
stores the name and ID of each task like session, workflow and its
corresponding folder. Usage: Join any of the repository tables that have
TASK_ID as column with that of TASK_ID/SESSION_ID in this table to know the
task name. Observe that the session and also workflow are stored as tasks in
the repository. TASK_TYPE for session is 68 and that of the workflow is 71.
OPB_SESSION - This table
stores the linkage between the session and the corresponding mapping. As
informed in the earlier paragraph, you can use the SESSION_ID in this table to
join with TASK_ID of OPB_TASK table.
OPB_TASK_ATTR - This is
the table that stores the attribute values (like Session log name etc) for
tasks. Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR
table to find what each attribute in this table means. You can know more about
OPB_ATTR table in the next paragraphs.
OPB_WIDGET - This table
stores the names and IDs of all the transformations with their folder details.
Usage: Use WIDGET_ID from this table to that of the WIDGET_ID of any of the
tables to know the transformation name and the folder details. Use this table in
conjunction with OPB_WIDGET_ATTR or OPB_WIDGET_EXPR to know more about each
transformation etc.
OPB_WIDGET_FIELD - This
table stores the names and IDs of all the transformation fields for each of the
transformations. Usage: Take the FIELD_ID from this table and match it against
the FIELD_ID of any of the tables like OPB_WIDGET_DEP and you can get the
corresponding information.
OPB_WIDGET_ATTR - This
table stores all the properties details about each of the transformations.
Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table
to find what each attribute in this transformation means.
OPB_EXPRESSION - This table stores the details of the expressions used anywhere in PowerCenter. Usage: Use this table in conjunction with OPB_WIDGET/OPB_WIDGET_INST and OPB_WIDGET_EXPR to get the expressions in the Expression transformations for a particular mapping or a set of mappings.
OPB_ATTR - This table has a list of attributes and their default values, if any. You can get the ATTR_ID from this table and look it up against any of the tables where you can get the attribute value. You should also make a note of the ATTR_TYPE and OBJECT_TYPE_ID before you pick up the ATTR_ID, because the same ATTR_ID can appear in a table with a different ATTR_TYPE or OBJECT_TYPE_ID.
OPB_COMPONENT - This table stores component details like Post-Session-Success-Email and the commands in pre-session/post-session tasks. Usage: Match the TASK_ID with the SESSION_ID in the OPB_SESSION table to get the SESSION_NAME; to get the shell command or batch command defined for the session, join this table with the OPB_TASK_VAL_LIST table on TASK_ID.
OPB_CFG_ATTR - This table stores the attribute values for the session object configuration, such as "Save Session log by" and the session log path.
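A hedged example of a metadata query along these lines, built only from the joins described above; the name columns (TASK_NAME, MAPPING_NAME) are assumptions, so verify them against your repository version:
SELECT t.TASK_NAME, m.MAPPING_NAME   -- column names assumed, not guaranteed
FROM OPB_TASK t, OPB_SESSION s, OPB_MAPPING m
WHERE s.SESSION_ID = t.TASK_ID
  AND m.MAPPING_ID = s.MAPPING_ID
  AND t.TASK_TYPE = 68;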
49. How does Push-Down Optimization work?
Ans: Pushdown optimization pushes transformation logic to the source or target database. The Integration Service translates the transformation logic into SQL queries and sends them to the source or target database, which executes the SQL to process the transformations. The amount of transformation logic that can be pushed to the database depends on the database, the transformation logic, and the mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database, executes the generated SQL statement against the source or target tables, and itself processes any transformation logic that it cannot push to the database.
50. Types of Push-down optimization
Ans: Using source-side
pushdown optimization:
The Integration Service
pushes as much transformation logic as possible to the source database. The
Integration Service analyzes the mapping from the source to the target or until
it reaches a downstream transformation it cannot push to the source database
and executes the corresponding SELECT statement.
Using target-side pushdown
optimization:
The Integration Service
pushes as much transformation logic as possible to the target database. The
Integration Service analyzes the mapping from the target to the source or until
it reaches an upstream transformation it cannot push to the target database. It
generates an INSERT, DELETE, or UPDATE statement based on the transformation
logic for each transformation it can push to the database and executes the DML.
Using full pushdown
optimization:
The Integration Service
pushes as much transformation logic as possible to both source and target
databases. If you configure a session for full pushdown optimization, and the
Integration Service cannot push all the transformation logic to the database,
it performs source-side or target-side pushdown optimization instead. Also the
source and target must be on the same database. The Integration Service
analyzes the mapping starting with the source and analyzes each transformation
in the pipeline until it analyzes the target. When it can push all
transformation logic to the database, it generates an INSERT SELECT statement
to run on the database. The statement incorporates transformation logic from
all the transformations in the mapping. If the Integration Service can push
only part of the transformation logic to the database, it does not fail the
session, it pushes as much transformation logic to the source and target
database as possible and then processes the remaining transformation logic.
Configuring parameters for pushdown optimization:
Depending on the database workload, we might want to use source-side, target-side, or full pushdown optimization at different times, and for that we can use the $$PushdownConfig mapping parameter. The settings in the $$PushdownConfig parameter override the pushdown optimization settings in the session properties. Create the $$PushdownConfig parameter in the Mapping Designer, select $$PushdownConfig for the Pushdown Optimization attribute in the session properties, and define the parameter in the parameter file.
The possible values are:
1. None - the Integration Service itself processes all the transformations
2. Source
3. Target
4. Full
Handling Error when
Pushdown Optimization is enabled
When the Integration
Service pushes transformation logic to the database, it cannot track errors
that occur in the database.
When the Integration
Service runs a session configured for full pushdown optimization and an error
occurs, the database handles the errors. When the database handles errors, the
Integration Service does not write reject rows to the reject file.
If we configure a session
for full pushdown optimization and the session fails, the Integration Service
cannot perform incremental recovery because the database processes the
transformations. Instead, the database rolls back the transactions. If the
database server fails, it rolls back transactions when it restarts. If the
Integration Service fails, the database server rolls back the transaction.
51. What are the prerequisites to be completed before using the
Unstructured data transformation?
Ans: The Unstructured Data transformation processes unstructured and semi-structured file formats, such as messaging formats, HTML pages and PDF documents.
Data Transformation has the following components:
I. Data Transformation Studio: a visual editor to design and configure transformation projects.
II. Data Transformation Service: a Data Transformation project that is deployed to the Data Transformation repository and is ready to run.
III. Data Transformation repository: a directory that stores the executable services that you create in Data Transformation Studio. You can deploy projects to different repositories, such as repositories for test and production services.
IV. Data Transformation Engine: a processor that runs the services that you deploy to the repository.
Data Transformation should be installed after the PowerCenter installation.
52. What is the syntax for running a workflow through PMCMD command
Ans:
pmcmd startworkflow -s <server> -d <domain> -u <username> -p <password> -f <folder> workflow_name
where -s is the server, -d the domain, -u the username, -p the password and -f the folder name.
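A hypothetical invocation with the placeholders filled in (all values are examples only, following the flag list given above):
pmcmd startworkflow -s node01:6001 -d Domain_DEV -u etl_user -p etl_pwd -f DWH_SALES wf_load_daily_sales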
53. Some error codes shown in the session log
Ans:
BR - errors related to the reader process, including ERP, flat file and relational sources
CMN - errors related to databases, memory allocation, lookups, joiners and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in the Aggregator, Lookup, Joiner and Rank transformations
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer
54. What is a code page?
Ans: A code page contains the encoding used to specify characters in a set of one or more languages. The code page is selected based on the source of the data; for example, if the source contains Japanese text, the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way that the application stores, receives, and sends character data.
55. How to execute PL/SQL script from Informatica mapping?
Ans: Stored Procedure (SP)
transformation can be used to execute PL/SQL Scripts. In SP Transformation
PL/SQL procedure name can be specified. Whenever the session is executed, the
session will call the pl/sql procedure.
56. What are the two types of errors that occur in a session run?
Ans: Fatal and Non-Fatal
errors
Non-Fatal Errors: It is an
error that does not force the session to stop on its first occurrence.
Establish the error threshold in the session property tab with the “stop on
errors” option. When you enable this option, the server counts Non-Fatal errors
that occur in the reader, writer and transformations.
I. Reader errors can include alignment errors while running a session in Unicode mode.
II. Writer errors can include key constraint violations, loading NULL into a NOT NULL field, and database errors.
III. Transformation errors can include conversion errors and any condition set up as an error, such as NULL input.
Fatal Errors: This occurs when
the server cannot access the source, target or repository. This can include
loss of connection or target database errors, such as lack of database space to
load data.
If the session uses
normalizer (or) sequence generator transformations, the server cannot update
the sequence values in the repository, and a fatal error occurs.
57. What is a commit interval option in the session property?
Ans: A commit interval is
the interval at which the server commits data to relational targets during a
session.
58. What are the different types of commit interval?
Ans: Target based commit
and Source based commit
(a) Target-based commit
- The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
- During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval.
- The server commits data to each target based on primary/foreign key constraints.
(b) Source based commit
- Server commits data based on the number of
source rows. The commit point is the commit interval you configure in the
session properties.
- During a session, the server commits data
to the target based on the number of rows from an active source in a single
pipeline. The rows are referred to as source rows.
- A pipeline consists of a source qualifier
and all the transformations and targets that receive data from source
qualifier.
- Although the Filter, Router and Update
Strategy transformations are active transformations, the server does not use
them as active sources in a source based commit session.
- When a server runs a session, it identifies
the active source for each pipeline in the mapping. The server generates a
commit row from the active source at every commit interval.
- When each target in the pipeline receives
the commit rows the server performs the commit.
59. Explain about Reject Loading?
Ans:
I. During a session, the server creates a reject file for each target instance in the mapping. If the writer rejects data for a target, the server writes the rejected row into that reject file.
II. You can correct the rejected data and re-load it into relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.)
III. Reading rejected data:
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
There are two main things to read: the row indicator and the column indicators.
Row indicator: the row indicator tells the writer what to do with the row of rejected data.
Row indicator | Meaning | Rejected by
0             | Insert  | Writer or target
1             | Update  | Writer or target
2             | Delete  | Writer or target
3             | Reject  | Writer
Column indicator: column indicators appear after every column of data and define the type of the data preceding them.

Column indicator | Meaning    | Writer treats as
D                | Valid data | Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
?                | Overflow   | Bad data
N                | Null       | Bad data
T                | Truncated  | Bad data
60. Tasks carried out during a session run
Ans:
1. The Load Manager (LM) locks the session and reads the session properties
2. LM reads the parameter file
3. LM expands server/session variables and parameters
4. LM verifies permissions and privileges
5. LM validates the source and target code pages
6. LM creates the session log file
7. LM creates the DTM process
8. The DTM process allocates DTM process memory
9. DTM initializes the session and fetches the mapping
10. DTM executes pre-session commands and procedures
11. DTM creates reader, writer and transformation threads for each pipeline
12. DTM executes post-session commands and procedures
13. DTM writes historical incremental aggregation/lookup data to the repository
14. LM sends post-session emails
61. Explain about the various threads created during a session run
Ans:
Thread                | Function
Master thread         | Handles stop and abort requests from the Load Manager
Mapping thread        | One thread for each session; fetches session and mapping information, compiles the mapping, cleans up after execution
Reader thread         | One thread for each partition; relational sources use relational reader threads and flat files use file reader threads
Writer thread         | One thread for each partition; writes to the target
Transformation thread | One or more transformation threads for each partition
62. Informatica Pipeline Partitioning Explained
Ans: Each mapping contains
one or more pipelines. A pipeline consists of a source qualifier, all the
transformations and the target. When the Integration Service runs the session,
it can achieve higher performance by partitioning the pipeline and performing
the extract, transformation, and load for each partition in parallel.
A partition is a pipeline
stage that executes in a single reader, transformation, or writer thread. The
number of partitions in any pipeline stage equals the number of threads in the stage.
By default, the Integration Service creates one partition in every pipeline
stage. If we have the Informatica Partitioning option, we can configure
multiple partitions for a single pipeline stage.
Partition points: in the session properties we can add or edit partition points. When we change partition points we can define the partition type and add or delete partitions (the number of partitions).
Number of partitions: a partition is a pipeline stage that executes in a single thread. If you use PowerCenter, you can set the number of partitions at any partition point. When you add partitions, you increase the number of processing threads, which can improve session performance.
Partition types: The Informatica Server specifies a default partition type at each
partition point. If you use PowerCenter, you can change the partition type. The
partition type controls how the Informatica Server redistributes data among
partitions at partition points.
The Workflow Manager
allows you to specify the following partition types
• Round-robin
partitioning. The Informatica Server distributes data evenly among all
partitions. Use round-robin partitioning where you want each partition to
process approximately the same number of rows.
• Hash partitioning. The
Informatica Server applies a hash function to a partition key to group data
among partitions. If you select hash auto-keys, the Informatica Server uses all
grouped or sorted ports as the partition key. If you select hash user keys, you
specify a number of ports to form the partition key. Use hash partitioning
where you want to ensure that the Informatica Server processes groups of rows
with the same partition key in the same partition.
• Key range partitioning.
You specify one or more ports to form a compound partition key. The Informatica
Server passes data to each partition depending on the ranges you specify for
each port. Use key range partitioning where the sources or targets in the
pipeline are partitioned by key range.
• Pass-through
partitioning. The Informatica Server passes all rows at one partition point to
the next partition point without redistributing them. Choose pass-through
partitioning where you want to create an additional pipeline stage to improve
performance, but do not want to change the distribution of data across
partitions.
63. How to setup Event Raise and Event Wait tasks?
Ans:
I. Create an event: right-click the workflow and choose Edit --> Events tab --> create an event.
II. Place an Event-Raise task in the workflow space and give the created event name in the "User Defined Event" property (Properties tab).
III. Place an Event-Wait task in the workflow space. There are two options in its Events tab:
Pre-defined: this is a file watch.
User-defined: this is used for the event created in the workflow properties.
When the Event-Raise task executes, it raises the event, which triggers the Event-Wait task to continue.
64. Timer task:
Ans: You can specify the period of time to wait
before the Integration Service runs the next task in the workflow with the
Timer task.
Two options:
Absolute time - give the exact time when to start the next task, or refer to a date-time variable.
Relative time - give the hours, minutes and seconds:
- from the start time of this task
- from the start time of the parent workflow/worklet
- from the start time of the top-level workflow
65. Decision task:
Ans: Use the Decision task
instead of multiple link conditions in a workflow. Instead of specifying multiple
link conditions, use the predefined condition variable in a Decision task to
simplify link conditions.
66. Assignment task:
Ans: You can assign a
value to a user-defined workflow variable with the Assignment task.
67. CONTROL TASK
Ans: We can use the Control
task to stop, abort, or fail the top-level workflow or the parent workflow
based on an input link condition.
Control Option     | Description
Fail Me            | Fails the Control task.
Fail Parent        | Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent        | Stops the workflow or worklet that contains the Control task.
Abort Parent       | Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF  | Fails the workflow that is running.
Stop Top-Level WF  | Stops the workflow that is running.
Abort Top-Level WF | Aborts the workflow that is running.
68. Explain about the FIRST(LAST) aggregate function
Ans: Returns the first
value found within a port or group. Optionally, you can apply a filter to limit
the rows the Integration Service reads. You can nest only one other aggregate
function within FIRST.
SYNTAX: FIRST( value [,
filter_condition ] )
EXAMPLE: FIRST( ITEM_NAME,
ITEM_PRICE > 10 )
69. How the GET_DATE_PART() expression function works?
Ans: GET_DATE_PART returns the specified part of a date as an integer. For example:
GET_DATE_PART( DATE_SHIPPED, 'HH12' ) returns the hour portion of DATE_SHIPPED
GET_DATE_PART( DATE_SHIPPED, 'DD' ) returns the day of the month
GET_DATE_PART( DATE_SHIPPED, 'MON' ) returns the month
70. Tell some of the expression function you have worked on?
Ans:
I. GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )
II. INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' ) - returns 1 if the input value matches string1, 2 if it matches string2, and so on; 0 if the input value is not found; NULL if the input is a null value.
III. INITCAP( string )
IV. INSTR( COMPANY, 'a', 1, 2 )
V. IS_DATE() - returns 0 or 1
VI. IS_NUMBER()
VII. IS_SPACES() - returns whether a string value consists entirely of spaces.
VIII. LAST_DAY() - returns the date of the last day of the month for each date in a port.
IX. SUBSTR()
X. REPLACESTR( CaseFlag, InputPort, OldString1, [OldString2, ... OldStringN,] NewString )
XI. REVERSE( string )
XII. RPAD( ITEM_NAME, 16, '.' )
71. Can you update a table based on a condition, give an example
Ans:
UPDATE employee
SET employeename = CASE WHEN empid = 1 THEN 'RK'
                        WHEN empid = 2 THEN 'Valluri'
                        ELSE NULL
                   END
72. Explain Java transformation
Ans:
I. The Java transformation provides a simple native programming interface to define transformation functionality with the Java programming language.
II. The Java transformation can be created as either active or passive.
III. Tabs in the Java transformation: General, Ports, Properties, Java Code.
IV. In the Java Code tab we can write code for various stages:
On Input Row
On End of Data
On Receiving Transaction
Java Expressions
I have used code in the "On Input Row" section; an example is given below:
if (isNull("Port5") && Port4.equals("STHA")) {
    Port3 = -1;
    Port2 = 0;
    generateRow();
}
In the above example, the ports are input/output ports that are linked in and out of the Java transformation. The IF condition checks whether the condition is satisfied and, when it is, passes the input row to the next transformation; the row is passed out when the generateRow() function is called. The output row will have the Port2 and Port3 values changed, and the other values will be passed through.
73. What is a factless fact table?
Ans: Factless fact tables are used for tracking a process or collecting statistics. They are called factless because the fact table does not have aggregatable numeric measures; it contains only key values referencing the dimensions from which the statistics can be collected.
74. Difference between the DW and a Transactional system
Ans:
Data Warehouse                  | Operational/Transactional
Subject oriented                | Application oriented
Summarized, refined             | Detailed
Represents value over time      | Accurate as of the moment
Supports managerial needs       | Supports day-to-day needs
Read-only data                  | Can be updated
Batch processing                | Real-time transactions
Completely different life cycle | Software Development Life Cycle
Analysis driven                 | Transaction driven
Dimensional model               | Entity-relationship diagram
Large amount of data            | Small amount of data
Relaxed availability            | High availability
Flexible structure              | Static structure
75. What is Business Intelligence(BI)?
Ans: Business Intelligence is a set of business processes for collecting and analyzing business information. BI functions include trend analysis, aggregation of data, drilling down to finer levels of detail, slice-and-dice, and data rotation for comparative viewing.
76. What are the different types of Analytical Processing Methodologies
Ans:
OLAP (On-Line Analytical Processing): querying and presenting data from the data warehouse, represented as multiple dimensions.
ROLAP (Relational OLAP): applications and user interfaces that retrieve data from an RDBMS and present it as a dimensional model.
MOLAP (Multidimensional OLAP): applications, user interfaces and database technologies that store data in a dimensional model.
DOLAP (Desktop OLAP): designed for low-end, single-user analysis; data is stored or downloaded on the desktop.
HOLAP (Hybrid OLAP): a combination of the above OLAP methodologies.