Lookup

Description

Lookup helps you to match values of specified fields in two data sources.

Why to use

To compare values in data sources.

When to use

To determine the presence of a particular field from one data source in another data source.

When not to use

When the data sources do not contain any common fields.

Prerequisites

No prerequisites; it can be applied to any data source.

Input

Two data sources containing one or more common fields.

Output

Based on the option selected for the If non matching records found? property, matching/non-matching records are returned, or task fails.

Statistical Methods used

Limitations

No limitations; it can be applied to any data.

 Lookup is located under Model Studio () in Data Preparation, in the left task pane. Use the drag-and-drop method to use the feature in the canvas. Click the feature to view and select different properties for analysis.

Refer to Properties of Lookup.



















The Lookup functionality in Rubiscape allows you to connect two data sources and compare them based on certain conditions.

Lookup checks all the values in the given data sources and matches the records from one data source to another for the selected fields.

Lookup is not a stand-alone activity; it is to be connected to a predecessor. The predecessor can be a dataset or output obtained from any data pre-processing node. In the Lookup properties, you can specify the dataset to which you want to compare the values of the predecessor of Lookup.

Properties of Lookup

The available properties of the Lookup are as shown in the figure given below.


The table given below describes the different fields present on the properties of the Lookup.

Field

Description

Remark

Task Name

It is the name of the task selected on the workbook canvas.

You can click the text field to edit or modify the name of the task as required.

Lookup Connection

It allows you to select the dataset to compare with the predecessor of Lookup.

  • All the readers available in the current workspace are listed here.
  • You can select only one dataset.

Cache Size

It is set to 0 by default.

Lookup Condition

It allows you to add the condition to compare the values.

Refer to Adding Lookup Condition.

Lookup Features to Return

It allows you to select the features that are the columns of the data set selected in Lookup Connection.

These features are added to the result of Lookup.

  • You can select more than one feature.
  • If Fail Task is not selected for If non matching records found? and no lookup features are selected here, then Lookup returns the input data frame as it is.
  • If you select any features to return, those are added to the Lookup’s return data frame.

If non matching records found?

It allows you to select the action to be taken if non-matching records are found.

The available options are –

  • Fail Task – If the two data sources contain non-matching records with respect to the given lookup condition, the task fails.
  • Do Not Pass – If the two data sources contain non-matching records, the task is executed, but the non-matching records are not included in the Result.
  • Capture Non Matching Records – If the two data sources contain non-matching records, the task is executed, and non-matching records are included in the Result.
  • If Capture Non Matching Records option is selected, a lookup flag is added with the data to identify matching and non-matching records. For matching records, value 1 is added. For non-matching records, the value 0 is added.

Advanced

Node Configuration

It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow.

For more details, refer to Worker Node Configuration

Adding Lookup Condition

To add a lookup condition, follow the steps given below.

  1. In the Properties pane, click Lookup Condition.
    Lookup Condition page is displayed.
  2. Click Add Condition.
  3. From the Lookup Feature drop-down, select a feature of the lookup dataset.
  4. From the Condition drop-down, select the comparison operator.

    (info)Note:

    The available operators are <, <=, <>, =, >, and >=.

  5. From the Input Feature drop-down, select a feature of the input data source (the predecessor node of Lookup).
  6. Additionally, from the Sort On drop-down, select the feature to sort the data. This step is optional.
  7. From the Order By drop-down, select the ordering option – Ascending or Descending. This step is optional. The default value is Ascending.
    The complete dataset selected for the lookup feature is ordered accordingly.

    (info)Note:

    To add more conditions, repeat steps 2 to 7. If more than one query is added, they are joined using the logical AND operator.

  8. Click Done.

The specified conditions are added. If you have more than one condition, they are joined using the logical AND operator.

Example of Lookup

Consider a dataset Customer Order with the Customer ID, Name, Address, and OrderDate columns. The input data is shown in the figure below.

 
Figure: Input Data Snippet

The Lookup dataset - Customer Billing with CustID and Billing Date columns is shown in the figure below.


To lookup for customer id in the Billing Dataset, the Lookup properties are selected as below.

Property

Value

Lookup Connection

Customer Billing (A dataset containing CustID and BillingDate as shown in Figure 5)

Lookup Condition

CustID = IP Customer ID

Lookup Features to Return

 CustID

If non matching records found?

Capture Non Matching Records

 The result of the Lookup for Capture Non Matching Records is displayed in the figure below.


Figure: Output of Lookup – Capture Non Matching Records

As seen in the above figure, the non-matching records are captured, and their Lookup_Flag value is 0.

 If the property If non matching records found? is set to Do Not Pass, the result is displayed as shown in the figure below.


Figure: Output of Lookup – Do Not Pass Non Matching Records

As seen in the above figure, the non-matching records (2, 5, and 7) are not passed to the output.

 If the property If non matching records found? is set to Fail Task, the Lookup task fails as the datasets contain non-matching records. The Lookup task fails, as shown in the below figure.


Figure: Failed Lookup Task

The Trace Log displays an error, as shown in the figure below.


Figure: Trace Log for Failed Lookup Task

Table of Contents