Overview

What

Understanding the diverse types of datasets in Rubiscape and their creation process.

When

When you want to use data from various sources in your algorithm flows.

Why

To extract data from various sources and create datasets as per your requirements.

Where

Inside a workspace that is assigned to you.

Who

A user with dataset creation rights.

How

The dataset creation process is described in the following sections.


A dataset is a collection of data, usually in tabular form. Non-tabular datasets are also possible, for example an XML file, where data appears as marked-up strings of characters.
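To make the tabular versus non-tabular distinction concrete, here is a minimal sketch in Python using only the standard library. It is not Rubiscape code; the sample values, the `item` and `value` element names, and the helper functions `read_tabular` and `read_xml` are all hypothetical and chosen only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tabular form: rows and columns (hypothetical sample values).
CSV_TEXT = "name,value\nalpha,10\nbeta,20\n"

# Non-tabular form: the same facts as marked-up strings of characters.
XML_TEXT = """
<items>
  <item name="alpha"><value>10</value></item>
  <item name="beta"><value>20</value></item>
</items>
"""


def read_tabular(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def read_xml(text):
    """Flatten each <item> element into the same row-dictionary shape."""
    root = ET.fromstring(text.strip())
    return [
        {"name": item.get("name"), "value": item.findtext("value")}
        for item in root.iter("item")
    ]


rows_from_csv = read_tabular(CSV_TEXT)
rows_from_xml = read_xml(XML_TEXT)

# Both sources carry the same records once the XML is flattened.
print(rows_from_csv == rows_from_xml)
```

Both readers return values as strings; converting them to numbers is a separate step that depends on how the dataset will be used.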
In machine learning, data is mostly categorized into the following five types.

  • Numerical data
  • Categorical data
  • Time-series data
  • Textual data
  • Geographical data
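As a rough illustration of these categories, the Python sketch below builds one record with a field of each kind and labels each field. The record, its values, the category set, and the `kind_of` helper are hypothetical; real classification rules depend on the data and are not taken from Rubiscape.

```python
from datetime import date

# One hypothetical record with a field of each kind (all values are made up).
record = {
    "price": 249.5,                    # numerical
    "segment": "retail",               # categorical
    "recorded_on": date(2024, 1, 15),  # time-series (a dated observation)
    "review": "Arrived on time.",      # textual
    "location": (18.52, 73.86),        # geographical (latitude, longitude)
}

# Assumed closed set of labels that makes a string categorical, not free text.
KNOWN_SEGMENTS = frozenset({"retail", "wholesale"})


def kind_of(value):
    """Return a coarse data-type label for a single field value."""
    if isinstance(value, tuple) and len(value) == 2:
        return "geographical"
    if isinstance(value, date):
        return "time-series"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numerical"
    if isinstance(value, str):
        return "categorical" if value in KNOWN_SEGMENTS else "textual"
    return "unknown"


kinds = {field: kind_of(value) for field, value in record.items()}
for field, kind in kinds.items():
    print(f"{field}: {kind}")
```

The same string can be categorical or textual depending on context, which is why the sketch relies on an explicit set of known labels.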

The data types and corresponding datasets supported in Rubiscape are given below.

As shown in the above figure, Rubiscape supports various data sources under each of the dataset types.
The dataset creation process for these types is explained in the sections that follow.

Data Types and Datasets

Social Media
  • Twitter
  • RSS
  • Facebook

RDBMS
  • PostgreSQL
  • SQL
  • MySQL
  • Oracle
  • ODBC
  • SSAS
  • Snowflake
  • Vertica

File
  • Excel
  • CSV
  • Text
  • JSON
  • Image

Hadoop
  • HDFS
  • Hive
  • HBase
  • Impala

API
  • Google News
  • Video Stream
  • Google Spreadsheet
  • Google Big Query

Email
  • Email
