Word Embedding

Description

  • Word Embedding is a form of word representation that bridges the gap between human understanding of language and machine understanding.
  • These representations capture useful properties of words and lead to better performance on a variety of downstream tasks.

Why to use

Machine learning models cannot process raw text, so textual data must be converted into numerical data. Word Embedding performs this conversion.
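As a minimal illustration of this conversion (a toy sketch, not the tool's internal implementation), text can be turned into numbers by building a vocabulary and mapping each word to an index:

```python
# Toy illustration: mapping words to numeric indices so a model can consume them.
corpus = ["the cat sat", "the dog sat"]

# Build a vocabulary of unique words.
vocab = sorted({word for sentence in corpus for word in sentence.split()})
word_to_index = {word: i for i, word in enumerate(vocab)}

# Encode each sentence as a list of integers.
encoded = [[word_to_index[w] for w in sentence.split()] for sentence in corpus]
```

Word Embedding goes further than simple indexing: each word is mapped to a dense vector rather than a single integer.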

When to use

When words or phrases need to be represented in a vector space with several dimensions.
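One simple way such vector representations can arise (a rough sketch assuming numpy, not the method used by this feature) is to treat each row of a word-word co-occurrence matrix as that word's vector:

```python
import numpy as np

# Toy sketch: derive multi-dimensional word vectors from co-occurrence counts.
corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a window of 1 word on either side.
cooc = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 1), min(len(words), i + 2)):
            if j != i:
                cooc[idx[w], idx[words[j]]] += 1

# Each row is a crude vector representation of one word; words used in
# similar contexts end up with similar rows.
vector_for_like = cooc[idx["like"]]
```

Learned embeddings such as Word2Vec replace these raw counts with dense, trained vectors, but the geometric idea is the same.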

When not to use

In applications that must reliably distinguish antonyms or synonyms, since words used in similar contexts tend to receive similar vectors.

Prerequisites

None

Input

One textual column

Output

Vector norm – The higher the frequency of a word, the larger the value of its norm in the word embedding.
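The vector norm reported in the output is the Euclidean (L2) length of a word's embedding vector. A small sketch with purely illustrative values:

```python
import math

# Hypothetical embedding vector for a single word (values illustrative only).
embedding = [0.2, -0.5, 0.1, 0.4]

# The vector norm is the Euclidean (L2) length of the embedding.
norm = math.sqrt(sum(v * v for v in embedding))
```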

Statistical Methods Used

None

Limitations

Words with multiple meanings are often combined into a single representation.

Word Embedding is located under Textual Analysis in Preprocessing on the left task pane. Alternatively, use the search bar to find the Word Embedding feature. Use the drag-and-drop method or double-click to add the algorithm to the canvas. Click the algorithm to view and select different properties for analysis.

Properties of Word Embedding

The available properties of Word Embedding are shown below.

The table below describes the different properties of Word Embedding.

Field

Description

Remark

Task Name


It is the name of the task selected on the workbook canvas.

  • You can click the text field to edit or modify the task's name.
  • Spaces between words are not allowed in the Task Name.

Text


It allows you to select one textual column.

  • You need to select textual data.
  • All data columns are displayed in this dropdown.
  • However, if you select a non-textual data column, the algorithm does not work.

Advanced

Stop words

A list of words to exclude from the embedding.

  • The default value is none.
  • You can enter multiple words, separated by commas.
  • Spaces between words are allowed.
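A minimal sketch of what stop-word filtering does before the embedding is built (assumed behavior for illustration, not the tool's exact code):

```python
# Illustrative stop-word filtering: the listed words are dropped before
# the remaining words are embedded.
stop_words = {"the", "is", "a"}   # entered comma-separated in the property field
text = "the sky is a deep blue"

filtered = [w for w in text.split() if w not in stop_words]
```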


Dimension

It represents the total number of features that are encoded in the word embedding.

  • Only integer type values are allowed.
  • You can choose any natural number starting from 1.
  • The selection of features is mostly done automatically during training by the hidden layer.
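To illustrate the Dimension property (a toy sketch assuming numpy), each word in the vocabulary is mapped to a dense vector whose length equals the chosen dimension:

```python
import numpy as np

# The Dimension property fixes the length of every word vector.
dimension = 5
vocab = ["cat", "dog", "bird"]

# In a real model these weights are learned by the hidden layer during
# training; here they are randomly initialized purely for illustration.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), dimension))

cat_vector = embedding_matrix[0]  # a 5-dimensional vector for "cat"
```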

Node Configuration

It allows you to select the instance of the AWS server to provide control over the execution of a task in a workbook or workflow.

For more details, refer to Worker Node Configuration.

Example of Word embedding

An employee is given the task of converting a specific set of words into machine-readable vector form while ensuring that words with similar meanings are placed in close spatial proximity. He uses Word Embedding to achieve this.
Below is a snippet of the output data.

Further, the Result page is as follows.

The result page consists of the following sections:

  1. Index:
    This section displays the index.
  2. Word:
    This section displays the word categories available in the dataset.
  3. Vector Norm:
  • These metrics are computed from the words in the selected data column, after the stop words are removed. They give the magnitude of each word's vector representation in the dataset.
  • The higher the word frequency, the larger is the norm of this word embedding.
  • If the words are used in different contexts, the norm of the vector decreases.
