Expression

Description

Expression involves creating additional features in a dataset by combining existing features in different ways using various expressions and functions. 

Why to use

For Data Preparation

When to use

  • To create additional features
  • To extract some hidden features from the current feature
  • To compute insights from the features 

When not to use

Prerequisites

No prerequisites; it can be applied to any data source.

Input

Any dataset

Output

Additional features created using existing features.

Statistical Methods used

Refer to Operators in Expression Builder.

Limitations

  • It can create only additional features.
  • It cannot create additional data points.

 

Expression is located under Model Studio ( ) in Data Preparation, in the Task Pane on the left. Use the drag-and-drop method to use the feature in the canvas. Click the feature to view and select different properties for analysis.

Refer to Properties of Expression.

















With the Expression function, you can generate new features by modifying a feature or combining two or more existing features from the dataset using arithmetic, logic, text, date, and other such operators.

These various types of operators can be used as standalone operators or combined to create new features. The Expression function provides the facility to validate the expression before using it on the data. New features are created only if the combination of operators produces a valid result.

Properties of Expression

The available properties of the Expression are as shown in the figure given below.

The table given below describes the different fields present on the Properties pane of the Expression.

Field

Description

Remark

Task Name

It is the name of the task selected on the workbook canvas.

You can click the text field to edit or modify the name of the task as required.

Features

It lists the newly created features.

Add Expression Feature

It allows you to create a new feature.

Advanced

Node Configuration

It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow.

For more details, refer to Worker Node Configuration

 

Creating a New Feature

The Expression function helps you to create new features in your dataset. To create a new feature, follow the steps given below.

  1. Create your algorithm flow with an Expression.
  2. Select the Expression.
    The Properties pane is displayed.
    On the Properties pane, click Add Expression Feature.

         

The Feature Definition page is displayed as shown in the figure below.

The table given below describes the different fields present on the Feature Definition page.

Field

Description

Remark

Feature name

It allows you to enter the name of the newly created feature.

Variable Type

It allows you to select the variable type of the feature.

The available options are,

  • Categorical
  • Numerical
  • Textual
  • Interval
  • Geographical

Data type

It displays the data type of the feature based on the selected variable type.


Hyperlink to access the online help for Expression.

Switch to Code Editor

It allows you to switch between the Code Editor and Expression Builder.

Expression Builder

It allows you to build the expression by adding the Operators, Features, and Constants on the canvas.

For building an expression, refer to Building an Expression.

Navigation Icon – Helps to change the relative position of the blocks on Expression Builder canvas.

Zoom in – It helps you to zoom in on the Expression Builder canvas.

Zoom out – It helps you to zoom out the Expression Builder canvas.

Delete – It helps you to delete a block or an expression from the Expression Builder canvas.

It deletes the built expressions and closes the Feature Definition page.

It validates the expression based on the selected features, operators, and the variable type of the newly created feature.

It adds a valid newly created expression to the Features list in the Properties pane of the Expression node.

  1. On the Feature Definition page, Enter the Feature name.
  2. Select the Variable Type from the drop-down.
  3. Build your expression. Refer to Building an Expression.

    (info)Note:

    Alternatively, you can use the Code Editor to build your expression. To build your expression using Code Editor, click Switch to Code Editor in the Expression Builder canvas top-right corner and type your code in Python language.

  4. Once the expression is ready, click Validate. This step is optional; however, it is highly recommended to validate the expression before running it. This helps to eliminate errors at an early stage.
    If the expression is valid, Expression is valid message is displayed. If it is invalid, build the expression again. When the valid expression is ready, click Add.

         

          The newly created Feature is added in the Features list in the Properties pane, as shown in the figure below.

         

          5. Run the Expression.
             After successful execution, the feature is added to the output of the Expression node. You can explore the Expression node to view the newly added feature.

Building an Expression

  1. From the left pane of Expression Builder, select the required Operator Category.
    The operators available under the selected category are displayed. Select the required Operator



    The Operator is added to the canvas.
    For details about operators in Expression Builder, refer to Operators in Expression Builder.

  2. Select Features from the left pane of the Expression Builder.
    The feature block is added to the canvas.
    Click the feature block and select the required feature from the drop-down.













          3. Drag and drop the feature block as the required operand in the Operator added in step 2.

              Repeat steps 3 to 5 for all the features required for the selected Operator.

(info)

Note:

To add a constant to the canvas, click Constant in step 3 and add the required constant block.


              A sample expression is shown in the figure below.

             

Operators in Expression Builder

The Expression Builder on the Feature Definition page has the elements listed below -

  • Operator categories
  • Constant types
  • Features available in the predecessor node

The different types of operators are explained in the sections below.

Comparison Operators

The different types of Comparison operators are explained in the table below.

Operator

Code Editor

Syntax/Description

Example

[Element1] == [Element2] 

Comparison -

Element1 == Element2
It returns True if the values of Element1 and Element2 are equal.

[Quantity] == 100
It returns True when the value of the Quantity feature is equal to 100; otherwise, it returns False.

[Element1] != [Element2] 

Not Equal To -

[Element1] != Element2
It returns True if the values of Element1 and Element2 are not equal.

[Quantity] != 0

It returns True when the value of the Quantity feature is not 0; otherwise, it returns False.

[Element1] < [Element2] 

Less Than -

[Element1] < Element2
It returns True if the value of Element1 is less than Element2.

[Quantity] < 50

It returns True when the value of the Quantity feature is less than 50; otherwise, it returns False.

[Element1] <= [Element2] 

Less Than or Equal To -

[Element1] <= Element2

It returns True if the value of Element1 is less than or equal to Element2.

[Quantity] <= 50

It returns True when the value of the Quantity feature is less than or equal to 50; otherwise, it returns False.

[Element1] > [Element2] 

Greater Than -

[Element1] > Element2

It returns true if the value of Feature is greater than Element2.

[Quantity] > 35

It returns true when the value of the Quantity feature is greater than 35; otherwise, it returns False.

[Element1] >= [Element2] 

Greater Than or Equal To -

[Element1] >= Element2

It returns true if the value of Element1 is greater than or equal to Element2.

[Quantity] >= 16.35

It returns true when the value of the Quantity feature is greater than or equal to 16.35; otherwise, it returns False.

 

(info) Note:

The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression that evaluates to a Numerical value.

Constant Types

The different types of Constants are explained in the table below.

Operator

Code Editor

Syntax/Description

Example/Remark

True 

False

Boolean – It is used to add a Boolean value in the expression.

Available options are True and False.

'Element1'

String – It is used to add a string to the expression.

Example : 'Age' or '12"

[[Element1], [Element2, [Element3]]

List – It is used to create a list of items.

You can connect other features/expressions to create a list.


or

[[Age], [Gender, [Education]]


In this example, a list is created using three features – Age, Gender, and Education.


123.01

Float – It is used to add a float value to the expression.

123.01

123

Integer – It is used to add an integer value to the expression.

123

pd.datetime(yyyy,mm,dd)

Date – It is used to add a date to the expression.

The date format is yyyy-mm-dd.

Example: 

pd.datetime(2023,12,31)


'Element1' : 'Element2',

Key Value Pair -

Example: 

'Education' : 'Engineer'

Conversion Operators

The different types of conversion operators are explained in the table below.

Operator

Code Editor

Syntax/Description

Example/Remark

bool(Element1)

To BOOLEAN(Element)
It converts the Element to a Boolean value.

Example:

bool([Owner])

float(Element1)

To NUMERIC(Element)
It converts the Element to a numeric value.

Example:

float(123)

int(Element1)

To INT (Element)
It converts the Element to an integer value.

int(12.5)

str(Element1)

To STR(Element)

It converts the Element to a string value.

str(Rubiscape)

datetime.strptime(Element1, "%d-%m-%Y")

To Date(Element)
It converts the Element to a date value.

Example:

datetime.strptime([Owner], "2-3-1998")


You can provide a Python date format for conversion.


(info)

Note:

The Element1 mentioned in the above table can be either a Feature, a Constant, or an Expression.

Logical Operators

The different types of Logical operators are explained in the table below.

Operator

Code Editor

Syntax/Description

Example/Remark

not([Element1])

NOT(Feature)
It returns Logical NOT of the Feature.

Any value which is not 0 or False is considered as True.

Example:

not([Color])

[Element1] and [Element2]

Element1 AND Element2

It returns True if Element1 and Element2 both evaluate to True.

Example:

[Color] and [Neck Style]

[Element1] or [Element2]

Element1 OR Element2

It returns True if either Element1 or Element2 evaluates to True.

Example:

[Color] or [Neck Style]

xor(Element1,Element2)

Element1 XOR Element2

It returns True if the values of Element1 and Element2 are not the same.

[Element1] == [Element2]

Element1 Equals Element2

It returns True if Element1 is equal to Element2.

Example:

[Price] == '1000'


(info)

Note:

The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression.

 Mathematical Operators

The different types of Mathematical operators are explained in the table below.

Operator

Code Editor

Syntax/Description

Example

[Element1] + [Element2]

Addition -
[Element1] + [Element2]

It is used to add,

  • Values from corresponding rows of two different features
  • A constant value to a feature
  • [Sales] + [Quantity]
    It adds values from corresponding rows of Sales and Quantity features.
  • [Sales] + 100
    It adds 100 to each row of the Sales feature.

[Element1] - [Element2]

Subtraction -

[Element1] – [Element2]

It is used to subtract,

  • Values in one feature from the corresponding values in another feature
  • A constant value from a feature
  • [Sales] – [Quantity]
    It subtracts the corresponding values in the Quantity feature from those in the Sales feature.
  • [Sales] – 100
    It subtracts 100 from every value in the Sales feature.

[Element1] * [Element2]

Multiplication

[Element1] * [Element2]

It is used to multiply

  • Values in one feature to the corresponding values in another feature
  •  A constant value to a feature
  • [Sales] * [Quantity]
    It multiplies the corresponding values in the Quantity feature to those in the Sales feature.
  • [Sales] * 100
    It multiplies 100 to every value in the Sales feature.

[Element1] / [Element2]

Division

[Element1] / [Element2]
It is used to divide

  • Values in one feature by the corresponding values in another feature
  • Values in a feature by a constant value
  • [Sales] / [Quantity]
    It divides the corresponding values in the Sales feature by those in the Quantity feature.
  • [Sales] / 100
    It divides each value in the Sales feature by 100.

math.sin(Element1)

Sin(Element)

It returns the sine value of the Element.

Example:

math.sin(20)

math.cos(Element1)

Cos(Element)

It returns the cosine value of the Element.

Example:

math.cos(20)

math.tan(Element1)

Tan(Element)

It returns the tangent of the Element.

Example:

math.tan(20)

math.floor(Element1)

Floor(Element)

It returns the floor value of the Element.

The floor value of a number is the closest integer less than or equal to the given number.

Example:

math.floor(20)

math.ceil(Element1)

CEIL(Element)

It returns the ceiling value of the Element.

The ceiling value of a number is the closest integer greater than or equal to the given number.

Example:

math.ceil(100)

abs(Element1)

ABS(Element)

It returns the absolute value of the Element.

Example:

abs([Qty])

[Element1]%[Element2]

Mod(Element1)( Element2)

It returns the remainder of the division Element1/Element2

Example:

[Cost]%[Qty]

math.pow([Element1],[Element2])

Power(Element1)( Element2)

It returns Element1 to the power of Element2

Example:

math.pow([Price],[Qty])

math.pi

It returns the value of the mathematical constant pi.

The value is 3.141593.

Example:

[Price] * math.pi

log([Element1])

NATURAL LOG([Element])

It is used to calculate the natural logarithm (logarithm to the base 10) of an Element.

NATURAL LOG ([Quantity])

If the value of Quantity is 5, this expression evaluates as log10(5).


Example:

log([Price])

exp(Element1)

EXPONENTIAL([Element])

It is used to raise the value of e (2.7183) to the power of Element.

EXPONENTIAL ([Quantity]) 

If the value of Quantity is 5, this expression evaluates as exp(5)

and e5  =  (2.7183)5 is calculated.

1/(Element1)

RECIPROCAL([Element])

It is used to calculate the reciprocal (power of –1) of Element.

RECIPROCAL (Quantity)

if the value of Quantity is 5, this expression evaluates to 1/5.

1/(Element1**2)

RECIPROCAL SQUARE([Element])

It is used to calculate the reciprocal of the square of Feature.

Example:

1/5**2

if the value of Quantity is 5, this expression evaluates to 1/(52).

1/(Element1**0.5)

RECIPROCAL SQUARE ROOT

It is used to calculate the reciprocal of the square root of Feature.

1/(5**0.5)


If the value of Quantity is 5, this expression evaluates to

(1/√5).

(Element1**2)

([Element])**2

It is used to calculate the square of the Element.

If the value of Quantity is 5, this expression evaluates to

(5)2.

sqrt(Element1)

sqrt([Element])

It is used to calculate the square root of the Element.

Example:

sqrt(5)

If the value of Quantity is 5, this expression evaluates to √5.


(info)

Note:

The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression that evaluates to a Numerical value.

Aggregation Operators

OperatorCode EditorSyntax/DescriptionExample/Remark

Agg([Element1], aggregationMethod = "count")

It used to get the aggregated value of the selected feature.

Available aggregation methos are 

  • count
  • max
  • mean
  • min
  • nunique
  • sum

Example:

Agg([Price], aggregationMethod = "max")

groupByExclude([Element1], measureName = [Element2], aggregationMethod = "count")

It exclude the selected dimension(s) in the view.

Available aggregation methos are 

  • count
  • max
  • mean
  • min
  • nunique
  • sum

Example:

groupByExclude([Size], measureName = [Qty], aggregationMethod = "min")

groupByInclude([Element1], measureName = [Element2], aggregationMethod = "count")

It includes the selected dimension(s) in the view.

Available aggregation methos are 

  • count
  • max
  • mean
  • min
  • nunique
  • sum

Example:

groupByInclude([Size], measureName = [Price], aggregationMethod = "max")

groupByFixed([Element1], measureName = [Element2], aggregationMethod = "count")

It fixes the selected dimension in the view.

Available aggregation methos are 

  • count
  • max
  • mean
  • min
  • nunique
  • sum

Example:

groupByFixed([Size], measureName = [Cost], aggregationMethod = "nunique")

Date Operators

The different types of Date operators are explained in the table below.

Operator

Code Editor

Syntax/Description

Example/Remark

pd.datetime(Element1,Element2,Element3)

Date(Element1, Element2, Element3)

It creates a date object with the given values.

Element1– Year

Element2 – Month

Element3 – Date

Example:

pd.datetime(2023,12,31)

Element1+ dateutil.relativedelta.relativedelta(years=Element2)

DateAdd(Element1, Time Component, Element2)

It creates a date feature by adding Element2 to the Time Component of Element1.

The newly created feature is of date type.

  • The available Time Components are
    • Year
    • Month
    • Week
    • Day
    • Hour
    • Minute
    • Second
  • DateAdd(Order_Date, Month, 3)
    This adds 3 to the month of Order_Date.

Example:

[Order Date]+ dateutil.relativedelta.relativedelta(weeks=25)

relativedelta([Element1], [Element2]).Value1

DateDiff(Element1, Time Component, Element2)

It creates a date feature by subtracting the Element2 to the Time Component of the Element1.

The newly created feature is of date type.

  • The available Time Components are
    • Year
    • Month
    • Week
    • Day
    • Hour
    • Minute
    • Second
  • DateDiff(Order_Date, Day, 5)
    This subtracts 5 from the Days of Order_Date.


Example:

relativedelta([Order Date], [Ship Date]).years


[Element1].replace(month=1, day=1, hour=0, minute=0, second=0)

DateTrunc(Element, Time Component)

It creates a date feature by extracting the Time Component from the Element. The rest of the components are given default values.

The newly created feature is of date type.

  • The available Time Components are
    • Year
    • Month
    • Week
    • Day
    • Hour
    • Minute
    • Second
  • DateTrunc (2029-06-03 00:00:00, Year)
    It extracts the Year component (2029) from the Order_Date, and the newly created feature is 2029-01-01 00:00:00.

Example:

[Order Date].replace(day=1, hour=0, minute=0, second=0)

Element1.day

Day(Element)

It returns the Day component of the Element.

The newly created feature is of integer type.

Example:

Day(2021-08-29 12:30:45)

It returns value 29.


hour(Element1)

Hour(Element)

It returns the Hour component of the Element.

The newly created feature is of integer type.

Example:

Hour(2021-08-29 12:30:45)

It returns value 12.


minute(Element1)

Minute(Element)

It returns the Day component of the Element.

The newly created feature is of integer type.

Example:

Minute(2021-08-29 12:30:45)

It returns value 30.


date.today()

Now

It returns the current date and time in the UTC timezone.

The newly created feature is of date type.

month(Element1)

Month(Element)

It returns the Month component of the Element.

The newly created feature is of integer type.

Example:

Month(2021-08-29 12:30:45)

It returns value 08.


quarter([Element1])

Quarter(Element)

It returns the Quarter of the year of the date in the Element.

The newly created feature is of integer type.

Example:

Quarter(2021-08-29 12:30:45)

It returns the value 3 since the date is from the third quarter of the year.


second([Element1])

Second(Element)

It returns the Second component of the Element.

The newly created feature is of integer type.

Example:

Second(2021-08-29 12:30:45)

It returns value 45.


week([Element1])

Week(Element)

It returns the week of the year of the date in the Element.

The newly created feature is of integer type.

Example:

Week(2021-08-29 12:30:45)

It returns the value 34 since the date is from the 34th week of the year.



weekday([Element1])

Weekday(Element)

It returns the numerical form of the day of the week of the Element date.

It returns the name of the day (Monday, Tuesday, and so on) from the Element. It takes a date column as input.

Example:

weekday([Order Date])

[Element1].year

Year(Element)

It returns the year of the Element date.

The newly created feature is of integer type.

Example:

Year(2021-08-29 12:30:45)

It returns value 2021.


 

(info)Note:

The Element1, Element2, Element3, and Element mentioned in the above table can be either Feature, Constant, or another Expression that evaluates to an Interval value.

 Text Operators

The different types of Text operators are explained in the table below.

Operator

Code Editor

Syntax/Description

Example/Remark

[Element1].startswith('Value1')

Beginswith(Element, Value)

It returns True if the string value of the Element begins with a character in Value.

  • This operation is case-sensitive.
  • Beginswith(Europe, “E”)
    Returns true.

[Element1].endswith('Value1')

Endswith(Element, Value)

It returns true if the string value of Element ends with character in Value.

  • This operation is case-sensitive.
  • Endswith(Europe, “A”)
    Returns false.

[Element1]==[Value1]

Compare(Element, Value)

It returns true if the string in Element is the same as the string in Value.

This operation is case-sensitive.

[Element1]+[Element2]+[Element3]

Concatenate(Element, Value1, Value2)

It concatenates Element with Value1 and Value2

Concat(“Monthly”, “ “, “Income)

It returns the string as Monthly Income.

contains([ELement1],'Value1')

Contains(Element, Value)

It returns True if the Element string contains a string in Value.

  • Contains(“Monthly”, “on”)
    It returns True.
  • Contains(“Monthly”, “ok”)
    It returns False.

Example:

contains([Country Name],'India')

'Element1'.find('Value1',Value2,Value3)

Find(Element, substring to find, start index, end index)

It finds the position of a substring in a text. If the substring is found at the location specified by the start index and end index, it returns the position of the text; otherwise, it returns -1.

  • Find(“India”, ”In”, 0, 2)
    It returns 0.
  • Find(“India”, “In”, 2, 4)
    It returns -1.

Example:

'India'.find('in',0,2)

len([Element1])

Length(Element)

It returns the length of the string in Feature.

Length(“Monthly Income”)

It returns the length of the string “Monthly Income” - 14.

Example:

len([City])


[Element1].replace('Value1','Value2',Value3)

Replace(Element, string to replace, string to replace with, count)

It replaces a string with a user-specified string.

Replace(City, Pune, PUN, 1)

It replaces the rows in the City feature containing value Pune with value PUN.

Example:

[City].replace('Pune','PUN',1)

[Element1].lower()

Lower(Element)

It converts the Element to lowercase.

Lower(“Monthly”)

It returns the value “monthly”.

Example:

[Region].lower()

[Element1].upper()

Upper(Element)

It converts the Element to uppercase.

Upper(“Monthly”)

It returns the value “MONTHLY”.

Example:

[Region].upper()

left([Element1],Value1)

Left(Element, Number)

It returns the first number of characters denoted by Number from Element

Left(“Algorithm”, 4)

It returns “Algo”.

Example:

left([City],4)

right([Element1],Value1)

Right(Element, Number)

It returns the last number of characters denoted by Number from Element.

Right(“Algorithm”, 4)

It returns “ithm”.

Example:

right([Element1],Value1)

contains([Element1],'Value1')

Like(Element, Value)

It checks if the Value is present in the Element using a wildcard match.

It returns True if the value is present; otherwise, it returns False.

Like(India, “In”)

It returns True.

Example:

contains([City],'Wi')

notLike([Element1],'Value1')

Not Like(Element, Value)

It checks if the Value is present in the Element using a wildcard match.

It returns True if the value is not present; otherwise, it returns False.

Not Like(India, “In”)

It returns False.

Example:

notLike([City],'Wi')

[Element1].strip()

Trim(Element)

It trims the white spaces at the beginning and end of the Element.

Trim(“  Monthly  “)

It returns the value as “Monthly”.

Example:

[City].strip()

[Element1].lstrip()

LTrim(Element)

It trims the white spaces at the beginning of the Element.

LTrim(“  Monthly  “)

It returns the value as “Monthly  ”.


Example:

[City].lstrip()

[Element1].rstrip()

RTrim(Element)

It trims the white spaces at the end of the Element.

RTrim(“  Monthly  “)

It returns the value as

“  Monthly”.


Example:

[City].rstrip()

substring([Element1],start_point = Value1,numberOfChars = Value2)

Substring(Element, Position, Number of Letters)

It returns a substring of Element starting from Position containing Number of Letters.

Substring(“Monthly”, 1, 2)

It returns the value as “on”.

Example:

substring([City],start_point = 1,numberOfChars = 2)

 

(info)

Note:

The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression that evaluates to a Textual value.

Dynamic Calculations Types

For Dynamic Calculations Types, refer to Dynamic Calculations.

Features

This block contains the Features available in the dataset. You can drag-drop it on the canvas to include the features. Then, use the drop-down arrow to select from the list of available features.

The image below displays an example feature block with a list of features available in a sample dataset.













Example of Expression

Consider an HR Dataset with the EmployeeNumber, MonthlyRate, JobInvolvement, BusinessTravel, and a few other features. The input data is shown in the figure below.

Two features are added using the Expression builder –

  • A feature named MonthlyIRateinINR that converts the monthly rate to INR by multiplying each row of the MonthlyRate feature by a constant value of 78. The expression is shown in the figure below.

  • A feature named Biztravel that extracts a substring of the BusinessTravel feature by removing the text Travel from the beginning of the feature. The expression is shown in the figure below.



The result of the Expression node containing the newly added features is displayed in the figure below.

As seen in the above figure, two new features are added – MonthlyIRateinINR and Biztravel.


Table of Contents