Dynamic Calculations is a part of the Expression function. Using Dynamic Calculations, you can define formulas to create new features from the existing features of the dataset.

Dynamic Calculations is one of the features available in the Expression Builder on the Feature Definition page.

Types of Dynamic Calculation

The different types of Dynamic Calculations are explained in the table below.

Name and Calculation Type

Code Editor

Description

Cumulative

cumulative([Element1], function="Sum", decimals=4) 

It returns the cumulative value of the selected Feature based on the selected Function.

Available values are:

  • Sum
  • Mean
  • Max
  • Min


Rolling Window

rollingWindow([Element1], function="Sum", direction="Backward", windowSize=1, decimals=4)

It returns the rolling value of the selected Feature based on the defined Window Size.

Example:

rollingWindow([Row ID], function="Mean", direction="Forward", windowSize=1, decimals=4)

Difference

diffData([Element1], relativeTo="Value1", windowSize=1, decimals=4)

It returns the difference between two rows of the selected Feature based on the defined Window Size.

Available Values are:

  • Previous
  • Next
  • First
  • Last



Percent Difference

percentDiff([Element1], relativeTo="Value1", windowSize=1, decimals=4)

It returns the percentage difference between two rows of the selected Feature based on the defined Window Size.

Available Values are:

  • Previous
  • Next
  • First
  • Last

Rank

rankData([Element1], [Element2], method="Dense", order="ASC")

It returns the numerical data ranks of a Feature in the selected Order based on the selected Rank Method.

Methods:

  • Dense
  • Row_Number
  • Rank


Order:

  • ASC
  • DESC

Lag

lagData([Element1], windowSize=1)

It returns the value from a previous row of the selected Feature based on the defined Window Size.

Lead

leadData([Element1], windowSize=1)

It returns the value from the next row of the selected Feature based on the defined Window Size.

The sections below explain each of these calculations in detail.

Consider the data snippet displayed in the below figure.

This input data is used in the examples of different types of Dynamic Calculations in the below sections.

Cumulative

The Cumulative expression provides the functionality to obtain a running value of a feature based on the selected function.

It can be used only on numerical features.

The elements in Cumulative expression are displayed in the figure below.




The table given below describes the elements in the Cumulative expression.

Sr. No.

Field Name

Description

1

Feature

This drop-down allows you to select a numerical feature from the selected dataset.

2

Function

It allows you to select the function to be performed on the selected feature. The available options are:

  • Sum: It returns the addition of the selected values.
  • Mean: It returns the average value of the selected values.
  • Maximum: It returns the maximum value from the selected values.
  • Minimum: It returns the minimum value from the selected values.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Cumulative Expression.

Field

Value

Input/Selected Feature

Units Sold

New Feature Name

Cumulative_Unit Sold

Variable Type of New Feature

Numerical

Data Type of New Feature

Integer

The output Data page of the Cumulative node displays an additional feature Cumulative_Unit Sold, along with the existing features, as shown in the figure below.

The new feature displays the result of the Cumulative calculations. Here, each cell in the new feature displays the sum of the previous cumulative value and the current row value.

The Cumulative value of the nth row is calculated as per the formula given below.

Cumulative Value = Selected_function(Current Row Value, Previous Cumulative Value)

  • If the selected function is Sum, then Output = Sum(Current Row Value, Previous Cumulative Value).
  • If the selected function is Mean, then Output = Mean(Current Row Value, Previous Cumulative Value).
  • If the selected function is Min, then Output = Min(Current Row Value, Previous Cumulative Value).
  • If the selected function is Max, then Output = Max(Current Row Value, Previous Cumulative Value).

In this example, the selected function is Sum, so the cumulative values are calculated as below –

First row - Same as the value of the original feature, 4484.

Second row - [1075]  + [4484] = [5559]

Third row is - [6515] + [5559] = [12074], and so on.

Rolling Window

Rolling Window is used to return the rolling value of the feature based on the defined window size.

You can use Rolling Window only on features with numerical values, and the output type of rolling window is also a numerical value. The two directions available in the Rolling Window expression are Forward and Backward.

The elements in the Rolling Window expression are displayed in the figure below.





The table given below describes the elements in the Rolling Window expression.

Sr No.

Field Name

Description

1

Feature

It allows you to select a numerical feature available in the selected dataset.

2

Function

It allows you to select the function to be performed on the selected feature. The available options are:

  • Sum: It returns the addition of the selected values.
  • Mean: It returns the average value of the selected values.
  • Maximum: It returns the maximum value from the selected values.
  • Minimum: It returns the minimum value from the selected values.

3

Direction

It allows you to select the direction to consider while performing the rolling window calculation of the selected feature.

The available directions are:

Forward

Backward

4

Window Size

It allows you to enter an integer value as the window size.

The minimum value for Window Size is 1, and the maximum value is (n – 1), where n is the total number of data points.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Rolling Window Expression.

Field

Value

Input/Selected Feature

Total Revenue

Function

Sum

Direction

Forward

Window size

1

New Feature Name

RW_Revenue_Frwd_1

Variable Type of New Feature

Numerical

Data Type of New Feature

Float

The output Data page of the Rolling Window node displays an additional feature (in this example, RW_Revenue_Frwd_1) along with the existing features, as shown in the figure below.

The Rolling Window value of the nth row is calculated as per the formula given below.

Rolling Window (n) = Selected_function(R1, R2)

Where,

R1 = Selected Feature Value (n)

R2 = Selected Feature Value[n + Window Size]; if the direction is Forward Or

R2 = Selected Feature Value[n - Window Size]; if the direction is Backward

For example, if the selected function is Mean, the direction is Forward, and the Window Size is 1.

Output = Mean(Current Row Value, Next Row Value)

In this example, the Selected Function is Sum, Direction is Forward, and Window Size is 1. So, the Rolling Window values are calculated as below -
First row: [2920025.64] + [51008.75] = [2971034.39].

Second row: [51008.75] + [1003700.9] = [1054709.65]

Third row: [1003700.9] + [5134318.41] = [6138019.31], and so on.

Difference

Difference expression returns the difference between values in two rows that are separated by Window Size. The output is a numerical value.

The elements in the Difference expression are displayed in the figure below.

The table given below describes the different elements in the Difference Expression.

Sr No.

Field Name

Description

1

Feature

It allows you to select the features available in the selected dataset.

2

Relative to

It allows you to select the relative value to calculate the difference. The available options are –

  • Previous - It considers the value in the previous row.
  • Next - It considers the value in the next row.
  • First – it considers the value in the first row.
  • Last – it considers the value in the last row.

3

Window Size

It allows you to enter an integer value as the window size.

The minimum value for Window Size is 1, and the maximum value is (n – 1), where n is the total number of data points.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Difference Expression.

Field

Value

Input/Selected Feature

Total Profit

Relative To

Previous

Window Size

1

New Feature Name

Diff_TotalProfit_Prev_1

Variable Type of New Feature

Numerical

Data Type of New Feature

Float

The output Data page of the Difference node displays an additional feature Diff_TotalProfit_Prev_1 along with the existing features, as shown in the figure below.

The Difference value of the nth row is calculated as per the formula given below.

Difference (n) = R1 - R2

Where,

R1 = Selected Feature Value (n)

R2 = Selected Feature Value[n - Window Size]; if the Relative To is Previous Or

R2 = Selected Feature Value[n + Window Size]; if the direction is Next Or

R2 = Selected Feature Value[First Row]; if the direction is First Or

R2 = Selected Feature Value[Last Row]; if the direction is Last Or

In this example, Relative To is Previous, and Window Size is 1. So, the Difference values are as below -
First row: na.

Second row: [16834.5] - [566105] = [-549270.5]

Third row: [411291.95] - [16834.5] = [394457.45], and so on.

Percent Difference

The Percent Difference expression returns the percentage change between values in two rows apart from each other by Window Size. The output is a numerical value.

The elements in the Percent Difference expression are displayed in the figure below.

The table given below describes the different elements in the Percent Difference Expression.

Sr No.

Field Name

Description

1

Features

It allows you to select the feature available in the selected dataset.

2

Relative to

It allows you to select the relative value for calculating the Percent Difference. The available options are –

  • Previous - It considers the value in the previous row.
  • Next - It considers the value in the previous row.
  • First – it considers the value in the first row.
  • Last – it considers the value in the last row.

3

Window Size

It allows you to enter an integer value as the window size.

The minimum value for Window Size is 1, and the maximum value is (n – 1), where n is the total number of data points.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Percent Difference Expression.

Field

Value

Input/Selected Feature

Units Sold

New Feature Name

USold_PD_Previous_1

Variable Type of New Feature

Numerical

Data Type of New Feature

Float

Relative To

Previous

Window Size

1

The output Data page of the Percent Difference node displays an additional feature USold_PD_Previous_1 along with the existing features, as shown in the figure below.

The Percent Difference value of the nth row is calculated as per the formula given below.

Percent Difference (n) = ((R1 – R2) / R2)

Where,

R1 = Selected Feature Value (n)

R2 = Selected Feature Value[n - Window Size]; if the Relative To is Previous Or

R2 = Selected Feature Value[n + Window Size]; if the direction is Next Or

R2 = Selected Feature Value[First Row]; if the direction is First Or

R2 = Selected Feature Value[Last Row]; if the direction is Last.

In this example, Relative To is Previous, and Window Size is 1. So, the Percent Difference values are as below -
First row: na.

Second row: (1075 – 4484) / 4484 = -0.7603

Third row: (6515 – 1075) / 1075 = 5.0605, and so on.

Percent From

Percent From expression returns the percentage difference between values of two rows apart from the window size.

The elements in the Percent From expression are displayed in the figure below.

The table given below describes the different elements in the Percent From Expression.

Sr. No.

Field Name

Description

1

Features

It allows you to select the feature available in the selected dataset.

2

Relative to

It allows you to select the relative row for calculating the Percentage From. The available options are –

  • Previous - It considers the value in the previous row.
  • Next - It considers the value in the previous row.
  • First – it considers the value in the first row.
  • Last – it considers the value in the last row.

3

Window Size

It allows you to enter an integer value as the window size.

The minimum value for Window Size is 1, and the maximum value is (n – 1), where n is the total number of data points.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Percent From Expression.

Field

Value

Input/Selected Feature

Total Profit

New Feature Name

PercentFrom_TProfit_Prev_1

Variable Type of New Feature

Numerical

Data Type of New Feature

Float

Relative To

Previous

Window Size

1

The output Data page of the Percent From node displays an additional feature, PercentFrom_TProfit_Prev_1, along with the existing features, as shown in the figure below.

The Percent From value of the nth row is calculated as per the formula given below.

Percent From (n) = ((R1 * 100)/R2))

Where,

R1 = Selected Feature Value (n)

R2 = Selected Feature Value[n - Window Size]; if the Relative To is Previous Or

R2 = Selected Feature Value[n + Window Size]; if the direction is Next Or

R2 = Selected Feature Value[First Row]; if the direction is First Or

R2 = Selected Feature Value[Last Row]; if the direction is Last Or

In this example, Relative To is Previous, and Window Size is 1. So, the, Percent From values are as below -
First row: na.

Second row: (16834.5 * 100) / 566105 = 2.9737

Third row: = (411291.95 * 100) / 16834.5 = 2443.1492, and so on.

Rank

The Rank expression returns the numerical rank of an input feature based on another feature. The output of Rank is an integer value.

The rank order is ascending or descending, and you can use this only on features with numerical values.

The elements in Rank expression are displayed in the figure below.

The table given below describes the different elements in Rank Expression.

Sr No.

Field Name

Description

1

Group By

It allows you to select a feature to perform the group by operation.

2

Feature

It allows you to select a feature available in the selected dataset.

3

Rank Method

It allows you to select the rank method to be used on the input feature. Available rank methods are:

  • Dense: It assigns the rank number to each row in a partition. It does not skip the number for similar values.
  • Row Number: It assigns the sequential rank number to each unique record.
  • Rank: It assigns the rank number to each row in a partition. It skips the number for similar values.

4

Rank Order

It allows you to select how your data should be sorted before allocating a rank value. The available options are –

  • ASC – It sorts the values in ascending order.
  • DESC - It sorts the values in descending order.

Consider the input dataset shown in the figure below.

The output Data page of the Rank node displays an additional feature Year_Sales_Rank_Desc along with the existing features, as shown in the figure below.

 

The newly created feature displays the rank of the Sales grouped by Year in descending order. In this example, since the method selected is Rank, Rubiscape assigns the rank number to each row in the partition. It skips the number for similar values. For example, in the above data, there are four values in the Year 2018 – 1000, 1000, 720, and 1500. The ranks of these values in descending order are 2, 2, 4, and 1.

The figure below shows the output of different rank methods. The names of the columns denote the rank method and sorting order.








The newly added feature columns are explained below.

  • Year_Sales_Rank_Desc displays ranks of Sales, grouped by Year using the Rank method, sorted by Descending
  • Year_Sales_Dense_Desc displays ranks of Sales, grouped by Year using Dense method, sorted by Descending
  • Year_Sales_RowNum_Desc displays ranks of Sales, grouped by Year using Row Number method, sorted by Descending

Lag

Lag expression is used to return the value from a row that is distanced from the current row by Window Size. Thus, you can enter any integer as window size. The elements in the Lag expression are displayed in the figure below.

The table given below describes the different elements in the Lag Expression.

Sr No.

Field Name

Description

1

Feature

It allows you to select the feature available in the selected dataset.

2

Window Size

It allows you to enter an integer value as the window size.

The minimum value for Window Size is 1, and the maximum value is (n – 1), where n is the total number of data points.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Lag Expression.

Field

Value

Input/Selected Feature

Total Profit

New Feature Name

Total Profit_Lag_3

Variable Type of New Feature

Numerical

Data Type of New Feature

Float

Window Size

3

The output Data page of the Lag node displays an additional feature, Total Profit_Lag_3, along with the existing features, as shown in the figure below.

The Lag value of the nth row is calculated as per the formula given below.

Lag (n) = R[n - Window Size]

Where,

R = Selected Feature Value

In this example, the Window Size is 3. So, the, Lead values are as below -
First Row: Lead(1) = R[1 - 3] = na.

Second row: Lead(2) = R[2 - 3] = na

Third row: Lead(3) = R[3 - 3] = na

Fourth row: Lead(4) = R[4 - 3] = R1 = 566105

Fifth row: Lead(5) = R[5 - 3] = R2 = 16834.5, and so on.

Lead

Lead expression is used to return the value from the next row, separated by Window Size. The elements in Lead expression are displayed in the figure below.

The table given below describes the different elements in the Lead Expression.

Sr No.

Field Name

Description

1

Feature

It allows you to select the feature available in the selected dataset.

2

Window Size

It allows you to enter an integer value as the window size.

The minimum value for Window Size is 1, and the maximum value is (n – 1), where n is the total number of data points.

Consider the input dataset in Figure: Input Data Snippet. The table below describes the input fields selected on the Feature Definition page for Lead Expression.

Field

Value

Input/Selected Features

Total Profit

Window Size

3

New Feature Name

Lead_Total Profit_3

Variable Type of New Feature

Categorical

Data Type of New Feature

Numerical

The output Data page of the Lead node displays an additional feature, Lead_Total Profit_3, along with the existing features, as shown in the figure below.

The Lead value of the nth row is calculated as per the formula given below.

Lead (n) = R[n + Window Size]

Where,

R = Selected Feature Value

In this example, the Window Size is 3. So, the, Lead values are as below -
First Row: Lead(1) = R[1+3] = R4 = 1273303.59.

Second row: Lead(2) = R[2+3] = R5 = 54669.06

Third row: Lead(3) = R[3+3] = R6 = 154720.8

(info)

Notes:

  • The variable type and data type of the newly added feature are automatically updated when you validate the expression.
  • You can validate the expression any time the properties are updated.

Table of Contents