RubiPython is a feature within Code Fusion on the Rubiscape platform to write your code in the Python programming language for building models.

You can use the RubiPython node as a stand-alone node, or you can connect it to your Reader node (dataset) or other algorithm nodes. You can connect multiple predecessors or preceding tasks to a single RubiPython node.

Each predecessor task and all the columns present in the task appear on the RubiPython configuration page. You can differentiate each predecessor task and its columns by their names. This feature is useful when you have large data to combine or split and perform different actions on it.

Writing Custom Code using RubiPython

To write your custom code using RubiPython, follow the steps given below.

  1. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas.
  2. Drag and drop RubiPython on your workbook canvas.


    (info)

    Notes:

    • You can connect multiple predecessor nodes to RubiPython. The output of all the predecessors is available for your use in the RubiPython node.
    • If you connect the dataset to RubiPython, the dataset fields are available as input to RubiPython.
    • If you connect the algorithm to RubiPython, the columns in the resultant data are available as input to RubiPython.

      Connect required nodes to RubiPython in your algorithm flow.

  3. Select RubiPython and in the Properties pane, click Configure.


    The RubiPython configuration page is displayed.

    The fields/icons on the RubiPython configuration page are described in the table below.

    Icon/Field

    Description

    Available Input Variables

    • It displays the features (columns) present in the preceding task’s output. These features are stored in a variable named inputData, which is of dictionary data type.  
    • Dictionary is a data type that contains name-value pairs.

    • You can access the features in the inputData variable and process them to create your custom variables.
    • You can access all the preceding task names, the features (columns) present in each task, and the index of each row in the task in the inputData variable.
    • You can access all the preceding task names in the variable named inputData.keys(), which is of dictionary data type.
    • You can access all the features (columns) present in a particular preceding task and the index of each row in the inputData[‘dataset_ name’] variable.
    • You can access a single feature (column) from the features present in a particular preceding task and the index of each row in the inputData[‘<dataset_ name>’][‘<column_name>’] variable.

    Custom Output Variables

    It displays a list of output variables created by you. These variables are stored in the form of a Dictionary.

    Add Custom Output Variable

    It helps you create your custom output variable from the existing variables (Features of the dataset). Refer to Creating Custom Output Variable.

    Add Multiple Custom Output Variable

    It helps you create multiple custom output variables from the existing variables (Features of the dataset). Refer to Creating Multiple Custom Output Variables.

    DEFINITE OPTION OUTPUT VARIABLE

    • If selected, only the Custom Output Variables are passed on to the successor node.
    • If multiple predecessors are connected to the RubiPython component, this flag is false (not selected) by default. You can select it as per your requirements.

    INPUT CARRY FORWARD FLAG

    • If selected, the variables received as input from RubiPython’s predecessor tasks are passed on to the successor task, not otherwise.
      Note: This is possible only when a single predecessor is connected to RubiPython, not when multiple predecessors are connected to RubiPython.
    • This flag is disabled when multiple predecessors are connected to RubiPython. So, the input variables do not appear on the Data page when you explore RubiPython.
    • Upon exploring RubiPython, you can view the custom output variables for RubiPython in its Data page if you have created the variables.

    INPUT SAME AS OUTPUT

    • If selected, the input to the RubiPython node is passed on to the successor task; not otherwise.
    • If multiple predecessors are connected to the RubiPython component, this flag is false (not selected) by default. You can select it as per your requirements.

    Python Code Editor

    The Python Code Editor helps you to add your customized Python code. Refer to Using Python Code Editor.

    TRAINING REQUIRED

    Functionality is coming soon.

    Minimap
    • It is a small, scaled version of code editor window.
    • If selected, shows the overview of entire code area in top right corner of code editor window.
    Theme
    • It helps you to customize the code editor theme.
    • Following theme options are provided:
      • VS-Dark
      • VS-Light
      • High Contrast-Dark
      • High Contrast-Light
    By default VS-Dark theme is selected. 

       

    It helps you to maximize the Code Editor page.

    It saves the changes, closes the configuration page, and returns you to the workbook canvas.

  4. Write your Python code in the Python Code Editor. Refer to Using Python Code Editor.
  5. After completing, click Save.
  6. To execute the code, Run the RubiPython node.
    After successful execution, a confirmation message is displayed. You can view the output of the custom component under View Log > Custom Component Log.

Using Python Code Editor

In the Python Code Editor, you can add your Python code.

  • The input variables are stored in the form of a Dictionary data type inputData.
  • Similarly, newly created variables are stored in another Dictionary type variable output.
  • To use new variables in your code, you first need to create them. Refer to Creating Custom Output Variable.
  • To print to the console, use print2log().
  • You can use variables declared at the workbook/workflow level in your Python code. Refer to Using Variables in RubiPython.

A sample Python code is shown in the image below.

The table below explains the above code snippet.

Line of Code

Result

data = inputData

It creates a copy of inputData with the name data.

print2log(data)

It prints the contents of inputData to the console log.

output = {}

It creates a new variable named ‘output’ of type dictionary.

output['newSepalLength'] = data[‘Iris CSV’]['Sepal.Length'] * 2.54

It assigns the value of Sepal.Length of Iris CSV dataset multiplied by 2.54 to column newSepalLength.

 

(info)

Notes:

  • Make sure the data types of your output and its corresponding input variable are the same. If the data types are different, RubiPython gives an error.
  • print2log()is a customized function created by Rubiscape, which internally uses the Python print() function. 

A sample Python code to access predecessor tasks is shown in the image below.

The table below explains the above code snippet.

Line of Code

Result

print2log(inputData.keys())

It prints all the predecessor task names referenced by inputData to the console log or custom component log as a dictionary.

print2log(inputData)

It prints the contents (predecessor task names, column names, and index) of inputData to the console log or custom component log.

print2log(inputData[‘taskname’])

It prints the contents (column names and index) of the predecessor task to the console log or custom component log.

print2log(inputData[‘taskname’]

[‘columnname’])

It prints the contents (column name and index) of a particular column of the predecessor task to the console log or custom component log.

Creating Custom Output Variable

To create custom variable in your RubiPython code, follow the steps given below.

  1. On the RubiPython page, click Add Custom Output Variable.
    Create Custom Variable page is displayed.
  2. Enter a Name for your output variable.
  3. Select the type of your output variable from the Variable Type drop-down. The options are CategoricalNumericalInterval, and Textual.

    (info)

    Note:

    Ensure the data type of the newly created output variable matches the data type of the corresponding input variable. If the variable types do not match, the application gives an error when running the algorithm flow.


  4. Select data type of the output variable from the Data Type drop-down. The options are Integer, Textual, Float, and Boolean.
  5. Click Create.


    The output variable is created and is added to the Custom Output Variables list.

Creating Multiple Custom Output Variables

You can create multiple custom output variables by providing a JSON string.

To create multiple custom output variables in your RubiPython code, follow the steps given below.

  1. On the RubiPython configuration page, click Add Multiple Custom Output Variable.
    Create Custom Variable page is displayed.
  2. Enter JSON strings in the following format –
    <Variable1 Name>, <Variable Type>, <Data Type>; <Variable2 Name>, <Variable Type>, <Data Type>; …<VariableN Name>, <Variable Type>, <Data Type>.

    (info)

    Note:

    Individual elements of the string should be separated by comma (,) and the strings should be separated by a semicolon (;).

    For example, to add the variables Name and Age of type Text and Integer respectively, the string would be – Name, Textual, Textual; Age, Numerical, Integer.

  3. Click Validate to validate the string.


    If the string is valid, a confirmation message is displayed. If the string is invalid, the application gives an error message. You can rectify the errors and try again.

  4. After you make sure the string is valid, Click Create.


The specified output variables are created and are added to the Custom Output Variables list.

(info)

Notes:

  • Please make sure not to use the 'Geographical' as the variable type.
  • Each variable should have three parameters in the JSON file - Variable Name, Variable Type, and Data type in string format.
  • If the JSON file does not follow the above string format, the application gives an error while validating.
  • The variable name should not be repeated.

Writing Custom Functions to Access Data using RubiPython

You can use custom functions in RubiPython to

  • read a dataset without connecting to the Reader node
  • write data to a file or an RDBMS table without using the Writer functionality
  • upload file to cloud storage
  • download file from cloud storage

The following sections explain these custom functions.

Reading Data from File

A sample Python code to read data from file is shown in the image below.

The table below explains the above code snippet.

Line of Code

Result

getReaderData(“datasetName”,“subdatasetName”)

This custom function checks the type of the dataset (Excel, CSV, and Text) and accordingly reads the dataset and returns it as an output of the function of dictionary type.

print2log(data)

Prints the Reader output data as a dictionary to the console log or custom component log.

 

(info)

Note:

In the getReaderData custom function, the dataset and the sub-dataset names are the same for all datasets except for RDBMS datasets. In the case of RDBMS datasets, the sub-dataset name is the name of the table added in the RDBMS dataset.

Writing Data to File

A sample Python code to write data to file is shown in the image below.

In the above code, the writeDataToFile custom function is used to append a row to the selected Carbo Fitness file (dataset).

  • dataToWrite stores the output data of the type dictionary to write to a file.
  • writeDataToFile(dataToWrite,“action”,“delimiter”,“datasetName”)custom function appends or overwrites the output data to the selected file (Excel, CSV, or Text datasets).

The table below describes the writeDataToFile function and parameters.

Function

Parameter

Remarks

writeDataToFile(dataToWrite,“action”,

“delimiter”,“datasetName”)

dataToWrite – Output data of type “dictionary” to be written to the file.

 —

action – The action to perform on the file. You can append or overwrite more than one row in the selected dataset.

It is of type String, and values can be overwrite and append.

delimiter – The character that separates the columns in a dataset.

It is of type String, and values can be – “,” / “|” / “            ” / “ ” (comma, pipe, tab, and space).

datasetName – Name of the dataset file to which you want to write.

It is of type String.

Another example of reading and writing to a file is shown below.

Reading Data from Table

You can use getReaderData to read the data from RDBMS Table.

In the above code,

  • getReaderData(“datasetName”,“subdatasetName”) custom function checks the type of the dataset and accordingly reads the dataset from the Reader and returns it as an output of the function of type dictionary.
  • print2log(dataToWrite) prints the output data as dictionary to the console log or custom component log.

    (info)

    Note:

    In the getReaderData custom function, the dataset and the sub-dataset names are the same for all datasets except for RDBMS datasets. In case of RDBMS datasets, sub-dataset name is the name of the table added in the RDBMS dataset.

Writing Data to Table

Similarly, you can use writeDataToTable to write the data into RDBMS Table.

The table below describes the writeDataToTable function and parameters.

Function

Parameters

Remarks

writeDataToTable(dataToWrite, “dropAndCreate”, “datasetName”, “tableName”, “strategy”, keyColumns)

dataToWrite – Output data of type “dictionary” to be written to RDBMS table.

 —

dropAndCreate – Flag that decides whether to delete an existing table and create a new one or overwrite the existing table.

  • It is of type Boolean.
  • The value can be True or False.
  • The default value is True that deletes the existing rows in the table.

datasetName – Name of the RDBMS dataset that contains the table you want to edit.

It is of type String.

tableName – Name of existing/new table in the RDBMS dataset that you want to edit.

It is of type String.

strategy – The database operation you want to perform on the specified table.

 

  • It is of type String, and possible values are insert, update, and delete.
  • You can insert, update, or delete more than one row or column in the selected table in the RDBMS dataset.

keyColumns – A list of column names that need to be defined as a primary key or are already defined as a primary key.

  • In the case of insert, the keyColumns parameter should be kept empty.
  • You cannot perform update action on columns that are defined as a primary key.

 

(info)

Notes:

  • The dataset to which you are writing should be already existing.
  • You can add an existing or a new table in the dataset while creating an RDBMS dataset. You can add more than one table.
  • You can apply the above custom functions on all types of datasets present in Rubiscape.
  • Ensure the variable types, data types, the number of columns, and the column names of your output and its corresponding input dataset are the same. If there is a mismatch, RubiPython gives an error.

Writing custom function for image dataset using RubiPython

To read the image data using custom function from Rubipython follow the steps given below.

  1. Select an image dataset from the reader section in the task pane.
  2. Drag and drop RubiPython on your workbook canvas.
  3. Connect required nodes to RubiPython.
  4. Select RubiPython and, in the Properties pane, click Configure


Line of Code

Result

df = inputdata[‘ReaderName’]

It creates a copy of inputData with the name df.

Ind

It declares a variable containing the list of index values.

gm = getImage(‘DatasetName’, ’InputDf’, ’List of Image objects’)

It gives image objects in n-dimensional arrays.

Output = {}

It creates a new variable named ‘output’ of the type dictionary.

Count

We initialize the count from 1.

for i in gm

It defines a ‘for’ loop for the values in gm.

Value

It combines the input and count in strings.

Output[value] = [i]

It assigns the n-dimensional arrays of the image to the value variable.

Count+=1

It increases the count by 1.

Return output

It returns the output.

 Reading the image data using metaImage function

Another way to read the image dataset is by using the ‘metaImage’ function. In rubipython ‘metaImage’ function reads the image dataset without connecting to the rubipython node. Its syntax is (metaImage('datasetname')).

Create a custom output variable to store the images. For creating custom output variables, refer to Creating Multiple Custom Output Variables.

After successfully running the Rubipython node, we select the face verification algorithm.

 In it, we select the source image and target image.

And then run the algorithm.

It identifies whether the face in the source image and target image are the same or not.

Uploading File to Cloud Storage

RubiPython provides a custom function to upload files to S3 server. The code syntax to upload a file to cloud storage is shown below.

filetoUpload = <Path of the local file to be uploaded to S3 server>

s3FileName = <Path on s3 server to which file is to be uploaded>

putObject = CloudFactory().uploadFile(filetoUpload, s3FileName)

Downloading File from Cloud Storage

RubiPython provides a custom function to download files stored on S3 server. The code syntax to download a file from cloud storage is shown below.

import pandas

localFileName = <Local file path>

s3FileName = <S3 server path of the file to be downloaded>

downloadObject = CloudFactory().downloadFile(s3FileName, localFileName)

If the downloaded file is a CSV, you can use the below syntax to read the contents of the downloaded file.

data = pd.read_csv(<Local file path>)

print2log(data)

Using Variables in RubiPython

The variables defined at the workbook/workflow level can be used in the RubiPython custom component.

To use a user-defined variable in RubiPython, follow the steps given below.

  1. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas.
  2. Drag and drop RubiPython on your workbook canvas.
  3. If required, connect other nodes to the RubiPython node in your algorithm flow.
  4. Select RubiPython and in the Properties pane, click Configure.
    The configuration page is displayed.
  5. Write the code in the RubiPython Code Editor. Refer to Writing Custom Code using RubiPython.
  6. To use a workbook/workflow level variable in the code statements, use @@ symbols before and after the variable name.
    For example, if the variable name is var1, you can use it in RubiPython as @@var1@@.
    A sample RubiPython code containing a user-defined variable is shown in the figure below.

Publishing RubiPython Code

You can publish the RubiPython code from a workbook and reuse it in another workbook or workflow. This feature is similar to publishing models in RubiStudio.

To publish RubiPython code, follow the steps given below.

  1. Write the RubiPython code as required. Refer to Writing Custom Code using RubiPython.
  2. Run the RubiPython node.
  3. After the node is successfully executed, select the node, click the vertical ellipsis ( ), and click Publish code.

After the code is successfully published, a confirmation message is displayed. This code is listed under Reusable Codes on the Rubiscape Home page.

(info)

Notes:

  • The published code is also available under Reusable Codes in Code Fusion under rubistudio in the Task Pane of a workbook.
  • The name of the published RubiPython node should be unique. If a RubiPython code with the same name already exists, the platform gives an error.

Reusing RubiPython Code

The published RubiPython code is available for reuse in workbooks and workflows in the same workspace.

To reuse a published code, follow the steps given below.

  1. Open the workbook or create a workbook. Refer to Opening a Workbook and Creating a Workbook.
  2. Click Reusable Codes under Code Fusion in rubistudio in the Task Pane.

    The available reusable codes are displayed as shown in the figure below.

  3. Double-click or drag-and-drop the node on the workbook canvas.
  4. To run the code, select the node, click the vertical ellipsis (), and click Run.

After the node executes successfully, a confirmation message is displayed.



Table of contents