Expression | |||
Description | Expression involves creating additional features in a dataset by combining existing features in different ways using various expressions and functions. | ||
Why to use | For Data Preparation | ||
When to use |
| When not to use | — |
Prerequisites | No prerequisites; it can be applied to any data source. | ||
Input | Any dataset | Output | Additional features created using existing features. |
Statistical Methods used | Refer to Operators in Expression Builder. | Limitations |
|
Expression is located under Model Studio ( ) in Data Preparation, in the Task Pane on the left. Use the drag-and-drop method to use the feature in the canvas. Click the feature to view and select different properties for analysis.
Refer to Properties of Expression.
With the Expression function, you can generate new features by modifying a feature or combining two or more existing features from the dataset using arithmetic, logic, text, date, and other such operators.
These various types of operators can be used as standalone operators or combined to create new features. The Expression function provides the facility to validate the expression before using it on the data. New features are created only if the combination of operators produces a valid result.
Properties of Expression
The available properties of the Expression are as shown in the figure given below.
The table given below describes the different fields present on the Properties pane of the Expression.
Field | Description | Remark | |
Task Name | It is the name of the task selected on the workbook canvas. | You can click the text field to edit or modify the name of the task as required. | |
Features | It lists the newly created features. | — | |
Add Expression Feature | It allows you to create a new feature. |
| |
Advanced | Node Configuration | It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow. | For more details, refer to Worker Node Configuration. |
Creating a New Feature
The Expression function helps you to create new features in your dataset. To create a new feature, follow the steps given below.
- Create your algorithm flow with an Expression.
- Select the Expression.
The Properties pane is displayed.
On the Properties pane, click Add Expression Feature.
The Feature Definition page is displayed as shown in the figure below.
The table given below describes the different fields present on the Feature Definition page.
Field | Description | Remark |
Feature name | It allows you to enter the name of the newly created feature. | — |
Variable Type | It allows you to select the variable type of the feature. | The available options are,
|
Data type | It displays the data type of the feature based on the selected variable type. | — |
Hyperlink to access the online help for Expression. | — | |
Switch to Code Editor | It allows you to switch between the Code Editor and Expression Builder. | — |
Expression Builder | It allows you to build the expression by adding the Operators, Features, and Constants on the canvas. | For building an expression, refer to Building an Expression. |
Navigation Icon – Helps to change the relative position of the blocks on Expression Builder canvas. | — | |
Zoom in – It helps you to zoom in on the Expression Builder canvas. | — | |
Zoom out – It helps you to zoom out the Expression Builder canvas. | — | |
Delete – It helps you to delete a block or an expression from the Expression Builder canvas. | — | |
It deletes the built expressions and closes the Feature Definition page. | — | |
It validates the expression based on the selected features, operators, and the variable type of the newly created feature. | — | |
It adds a valid newly created expression to the Features list in the Properties pane of the Expression node. | — |
- On the Feature Definition page, Enter the Feature name.
- Select the Variable Type from the drop-down.
Build your expression. Refer to Building an Expression.
Note:
Alternatively, you can use the Code Editor to build your expression. To build your expression using Code Editor, click Switch to Code Editor in the Expression Builder canvas top-right corner and type your code in Python language.
- Once the expression is ready, click Validate. This step is optional; however, it is highly recommended to validate the expression before running it. This helps to eliminate errors at an early stage.
If the expression is valid, Expression is valid message is displayed. If it is invalid, build the expression again. When the valid expression is ready, click Add.
The newly created Feature is added in the Features list in the Properties pane, as shown in the figure below.
5. Run the Expression.
After successful execution, the feature is added to the output of the Expression node. You can explore the Expression node to view the newly added feature.
Building an Expression
- From the left pane of Expression Builder, select the required Operator Category.
The operators available under the selected category are displayed. Select the required Operator
The Operator is added to the canvas.
For details about operators in Expression Builder, refer to Operators in Expression Builder. - Select Features from the left pane of the Expression Builder.
The feature block is added to the canvas.
Click the feature block and select the required feature from the drop-down.
3. Drag and drop the feature block as the required operand in the Operator added in step 2.
Repeat steps 3 to 5 for all the features required for the selected Operator.
Note: | To add a constant to the canvas, click Constant in step 3 and add the required constant block. |
A sample expression is shown in the figure below.
Operators in Expression Builder
The Expression Builder on the Feature Definition page has the elements listed below -
- Operator categories
- Constant types
- Features available in the predecessor node
The different types of operators are explained in the sections below.
Comparison Operators
The different types of Comparison operators are explained in the table below.Operator | Code Editor | Syntax/Description | Example |
| Comparison - Element1 == Element2 | [Quantity] == 100 | |
| Not Equal To - [Element1] != Element2 | [Quantity] != 0 It returns True when the value of the Quantity feature is not 0; otherwise, it returns False. | |
| Less Than - [Element1] < Element2 | [Quantity] < 50 It returns True when the value of the Quantity feature is less than 50; otherwise, it returns False. | |
| Less Than or Equal To - [Element1] <= Element2 It returns True if the value of Element1 is less than or equal to Element2. | [Quantity] <= 50 It returns True when the value of the Quantity feature is less than or equal to 50; otherwise, it returns False. | |
| Greater Than - [Element1] > Element2 It returns true if the value of Feature is greater than Element2. | [Quantity] > 35 It returns true when the value of the Quantity feature is greater than 35; otherwise, it returns False. | |
| Greater Than or Equal To - [Element1] >= Element2 It returns true if the value of Element1 is greater than or equal to Element2. | [Quantity] >= 16.35 It returns true when the value of the Quantity feature is greater than or equal to 16.35; otherwise, it returns False. |
Note: | The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression that evaluates to a Numerical value. |
Constant Types
The different types of Constants are explained in the table below.Operator | Code Editor | Syntax/Description | Example/Remark |
| Boolean – It is used to add a Boolean value in the expression. | Available options are True and False. | |
| String – It is used to add a string to the expression. | Example : 'Age' or '12" | |
| List – It is used to create a list of items. You can connect other features/expressions to create a list. | or [[Age], [Gender, [Education]] In this example, a list is created using three features – Age, Gender, and Education. | |
| Float – It is used to add a float value to the expression. | 123.01 | |
| Integer – It is used to add an integer value to the expression. | 123 | |
| Date – It is used to add a date to the expression. | The date format is yyyy-mm-dd. Example: pd.datetime(2023,12,31) | |
| Key Value Pair - | Example: 'Education' : 'Engineer' |
Conversion Operators
The different types of conversion operators are explained in the table below.Operator | Code Editor | Syntax/Description | Example/Remark |
| To BOOLEAN(Element) | Example: bool([Owner]) | |
| To NUMERIC(Element) | Example: float(123) | |
| To INT (Element) | int(12.5) | |
| To STR(Element) It converts the Element to a string value. | str(Rubiscape) | |
| To Date(Element) | Example: datetime.strptime([Owner], "2-3-1998") You can provide a Python date format for conversion. |
Note: | The Element1 mentioned in the above table can be either a Feature, a Constant, or an Expression. |
Logical Operators
The different types of Logical operators are explained in the table below.
Operator | Code Editor | Syntax/Description | Example/Remark |
| NOT(Feature) | Any value which is not 0 or False is considered as True. Example: not([Color]) | |
| Element1 AND Element2 It returns True if Element1 and Element2 both evaluate to True. | Example: [Color] and [Neck Style] | |
| Element1 OR Element2 It returns True if either Element1 or Element2 evaluates to True. | Example: [Color] or [Neck Style] | |
| Element1 XOR Element2 It returns True if the values of Element1 and Element2 are not the same. | — | |
| Element1 Equals Element2 It returns True if Element1 is equal to Element2. | Example: [Price] == '1000' |
Note: | The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression. |
Mathematical Operators
The different types of Mathematical operators are explained in the table below.
Operator | Code Editor | Syntax/Description | Example |
| Addition - It is used to add,
|
| |
| Subtraction - [Element1] – [Element2] It is used to subtract,
|
| |
| Multiplication [Element1] * [Element2] It is used to multiply
|
| |
| Division [Element1] / [Element2]
|
| |
| Sin(Element) It returns the sine value of the Element. | Example: math.sin(20) | |
| Cos(Element) It returns the cosine value of the Element. | Example: math.cos(20) | |
| Tan(Element) It returns the tangent of the Element. | Example: math.tan(20) | |
| Floor(Element) It returns the floor value of the Element. | The floor value of a number is the closest integer less than or equal to the given number. Example: math.floor(20) | |
| CEIL(Element) It returns the ceiling value of the Element. | The ceiling value of a number is the closest integer greater than or equal to the given number. Example: math.ceil(100) | |
| ABS(Element) It returns the absolute value of the Element. | Example: abs([Qty]) | |
| Mod(Element1)( Element2) It returns the remainder of the division Element1/Element2 | Example: [Cost]%[Qty] | |
| Power(Element1)( Element2) It returns Element1 to the power of Element2 | Example: math.pow([Price],[Qty]) | |
| It returns the value of the mathematical constant pi. | The value is 3.141593. Example: [Price] * math.pi | |
| NATURAL LOG([Element]) It is used to calculate the natural logarithm (logarithm to the base 10) of an Element. | NATURAL LOG ([Quantity]) If the value of Quantity is 5, this expression evaluates as log10(5). Example: log([Price]) | |
| EXPONENTIAL([Element]) It is used to raise the value of e (2.7183) to the power of Element. | EXPONENTIAL ([Quantity]) If the value of Quantity is 5, this expression evaluates as exp(5) and e5 = (2.7183)5 is calculated. | |
| RECIPROCAL([Element]) It is used to calculate the reciprocal (power of –1) of Element. | RECIPROCAL (Quantity) if the value of Quantity is 5, this expression evaluates to 1/5. | |
| RECIPROCAL SQUARE([Element]) It is used to calculate the reciprocal of the square of Feature. | Example: 1/5**2 if the value of Quantity is 5, this expression evaluates to 1/(52). | |
| RECIPROCAL SQUARE ROOT It is used to calculate the reciprocal of the square root of Feature. | 1/(5**0.5) If the value of Quantity is 5, this expression evaluates to (1/√5). | |
| ([Element])**2 It is used to calculate the square of the Element. | If the value of Quantity is 5, this expression evaluates to (5)2. | |
| sqrt([Element]) It is used to calculate the square root of the Element. | Example: sqrt(5) If the value of Quantity is 5, this expression evaluates to √5. |
Note: | The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression that evaluates to a Numerical value. |
Aggregation Operators
Operator | Code Editor | Syntax/Description | Example/Remark |
| It used to get the aggregated value of the selected feature. Available aggregation methos are
| Example: Agg([Price], aggregationMethod = "max") | |
| It exclude the selected dimension(s) in the view. Available aggregation methos are
| Example: groupByExclude([Size], measureName = [Qty], aggregationMethod = "min") | |
| It includes the selected dimension(s) in the view. Available aggregation methos are
| Example: groupByInclude([Size], measureName = [Price], aggregationMethod = "max") | |
| It fixes the selected dimension in the view. Available aggregation methos are
| Example: groupByFixed([Size], measureName = [Cost], aggregationMethod = "nunique") |
Date Operators
The different types of Date operators are explained in the table below.
Operator | Code Editor | Syntax/Description | Example/Remark |
| Date(Element1, Element2, Element3) It creates a date object with the given values. Element1– Year Element2 – Month Element3 – Date | Example: pd.datetime(2023,12,31) | |
| DateAdd(Element1, Time Component, Element2) It creates a date feature by adding Element2 to the Time Component of Element1. The newly created feature is of date type. |
Example: [Order Date]+ dateutil.relativedelta.relativedelta(weeks=25) | |
| DateDiff(Element1, Time Component, Element2) It creates a date feature by subtracting the Element2 to the Time Component of the Element1. The newly created feature is of date type. |
Example: relativedelta([Order Date], [Ship Date]).years | |
| DateTrunc(Element, Time Component) It creates a date feature by extracting the Time Component from the Element. The rest of the components are given default values. The newly created feature is of date type. |
Example: [Order Date].replace(day=1, hour=0, minute=0, second=0) | |
| Day(Element) It returns the Day component of the Element. The newly created feature is of integer type. | Example: Day(2021-08-29 12:30:45) It returns value 29. | |
| Hour(Element) It returns the Hour component of the Element. The newly created feature is of integer type. | Example: Hour(2021-08-29 12:30:45) It returns value 12. | |
| Minute(Element) It returns the Day component of the Element. The newly created feature is of integer type. | Example: Minute(2021-08-29 12:30:45) It returns value 30. | |
| Now It returns the current date and time in the UTC timezone. The newly created feature is of date type. | — | |
| Month(Element) It returns the Month component of the Element. The newly created feature is of integer type. | Example: Month(2021-08-29 12:30:45) It returns value 08. | |
| Quarter(Element) It returns the Quarter of the year of the date in the Element. The newly created feature is of integer type. | Example: Quarter(2021-08-29 12:30:45) It returns the value 3 since the date is from the third quarter of the year. | |
| Second(Element) It returns the Second component of the Element. The newly created feature is of integer type. | Example: Second(2021-08-29 12:30:45) It returns value 45. | |
| Week(Element) It returns the week of the year of the date in the Element. The newly created feature is of integer type. | Example: Week(2021-08-29 12:30:45) It returns the value 34 since the date is from the 34th week of the year. | |
| Weekday(Element) It returns the numerical form of the day of the week of the Element date. It returns the name of the day (Monday, Tuesday, and so on) from the Element. It takes a date column as input. | Example: weekday([Order Date]) | |
| Year(Element) It returns the year of the Element date. The newly created feature is of integer type. | Example: Year(2021-08-29 12:30:45) It returns value 2021. |
Note: | The Element1, Element2, Element3, and Element mentioned in the above table can be either Feature, Constant, or another Expression that evaluates to an Interval value. |
Text Operators
The different types of Text operators are explained in the table below.
Operator | Code Editor | Syntax/Description | Example/Remark |
| Beginswith(Element, Value) It returns True if the string value of the Element begins with a character in Value. |
| |
| Endswith(Element, Value) It returns true if the string value of Element ends with character in Value. |
| |
| Compare(Element, Value) It returns true if the string in Element is the same as the string in Value. | This operation is case-sensitive. | |
| Concatenate(Element, Value1, Value2) It concatenates Element with Value1 and Value2 | Concat(“Monthly”, “ “, “Income) It returns the string as Monthly Income. | |
| Contains(Element, Value) It returns True if the Element string contains a string in Value. |
Example: contains([Country Name],'India') | |
| Find(Element, substring to find, start index, end index) It finds the position of a substring in a text. If the substring is found at the location specified by the start index and end index, it returns the position of the text; otherwise, it returns -1. |
Example: 'India'.find('in',0,2) | |
| Length(Element) It returns the length of the string in Feature. | Length(“Monthly Income”) It returns the length of the string “Monthly Income” - 14. Example: len([City]) | |
| Replace(Element, string to replace, string to replace with, count) It replaces a string with a user-specified string. | Replace(City, Pune, PUN, 1) It replaces the rows in the City feature containing value Pune with value PUN. Example: [City].replace('Pune','PUN',1) | |
| Lower(Element) It converts the Element to lowercase. | Lower(“Monthly”) It returns the value “monthly”. Example: [Region].lower() | |
| Upper(Element) It converts the Element to uppercase. | Upper(“Monthly”) It returns the value “MONTHLY”. Example: [Region].upper() | |
| Left(Element, Number) It returns the first number of characters denoted by Number from Element. | Left(“Algorithm”, 4) It returns “Algo”. Example: left([City],4) | |
| Right(Element, Number) It returns the last number of characters denoted by Number from Element. | Right(“Algorithm”, 4) It returns “ithm”. Example: right([Element1],Value1) | |
| Like(Element, Value) It checks if the Value is present in the Element using a wildcard match. It returns True if the value is present; otherwise, it returns False. | Like(India, “In”) It returns True. Example: contains([City],'Wi') | |
| Not Like(Element, Value) It checks if the Value is present in the Element using a wildcard match. It returns True if the value is not present; otherwise, it returns False. | Not Like(India, “In”) It returns False. Example: notLike([City],'Wi') | |
| Trim(Element) It trims the white spaces at the beginning and end of the Element. | Trim(“ Monthly “) It returns the value as “Monthly”. Example: [City].strip() | |
| LTrim(Element) It trims the white spaces at the beginning of the Element. | LTrim(“ Monthly “) It returns the value as “Monthly ”. Example: [City].lstrip() | |
| RTrim(Element) It trims the white spaces at the end of the Element. | RTrim(“ Monthly “) It returns the value as “ Monthly”. Example: [City].rstrip() | |
| Substring(Element, Position, Number of Letters) It returns a substring of Element starting from Position containing Number of Letters. | Substring(“Monthly”, 1, 2) It returns the value as “on”. Example: substring([City],start_point = 1,numberOfChars = 2) |
Note: | The Element1 and Element2 mentioned in the above table can be Feature, Constant, or another Expression that evaluates to a Textual value. |
Dynamic Calculations Types
For Dynamic Calculations Types, refer to Dynamic Calculations.
Features
This block contains the Features available in the dataset. You can drag-drop it on the canvas to include the features. Then, use the drop-down arrow to select from the list of available features.
The image below displays an example feature block with a list of features available in a sample dataset.
Example of Expression
Consider an HR Dataset with the EmployeeNumber, MonthlyRate, JobInvolvement, BusinessTravel, and a few other features. The input data is shown in the figure below.
Two features are added using the Expression builder –
- A feature named MonthlyIRateinINR that converts the monthly rate to INR by multiplying each row of the MonthlyRate feature by a constant value of 78. The expression is shown in the figure below.
- A feature named Biztravel that extracts a substring of the BusinessTravel feature by removing the text Travel from the beginning of the feature. The expression is shown in the figure below.
The result of the Expression node containing the newly added features is displayed in the figure below.
As seen in the above figure, two new features are added – MonthlyIRateinINR and Biztravel.
Table of Contents