Lemmatizer 

Description

Lemmatization uses a vocabulary and morphological analysis of words to remove inflectional endings and return the base or dictionary form of a word, known as the lemma.
Lemmatizer reduces inflected words while ensuring that the resulting "root" word belongs to the language.

Why to use

Textual Analysis – Pre Processing 

When to use

When you want to get the meaningful base or dictionary form of words, or when you want to link words with similar meanings to a single word.

When not to use

On numerical data.

Prerequisites

It is used on textual data. 

Input

Gone

Going

Went

Output

Go

Related algorithms

  • Case Convertor
  • Custom Words Remover
  • Frequent Words Remover
  • Punctuation Remover
  • Spelling Corrector
  • Stemmer
  • Advanced Entity Extraction
  • Word Correlation
  • Word Frequency

Alternative algorithm

Stemmer

Statistical Methods used

-


Limitations

In-depth linguistic knowledge is required to create dictionaries and look for the proper form of the word.
It can change the meaning of some textual strings.

Lemmatizer is located under Textual Analysis in Pre Processing, in the task pane on the left. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select different properties for analysis.

Lemmatizer is an algorithm in morphological analysis and computational linguistics that identifies the lemma (the dictionary form) of a word. In lemmatization, all the inflected forms of a word are grouped together so that they can be identified as a single item.
Lemmatization algorithms identify the intended part of speech and the meaning of a word in a sentence, as well as in the larger context of the surrounding sentences or even the entire document.
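
The grouping described above can be sketched in Python as a lookup keyed on a word and its part of speech. This is only an illustration and not the product's implementation: the `LEMMA_TABLE` entries, the `lemmatize` function, and the part-of-speech labels are hypothetical, and a real lemmatizer relies on a full vocabulary and morphological analysis.

```python
# Tiny hand-written lemma table (hypothetical; a real lemmatizer uses a full vocabulary).
# Keys are (lowercased word, part of speech); values are lemmas.
LEMMA_TABLE = {
    ("gone", "verb"): "go",
    ("going", "verb"): "go",
    ("went", "verb"): "go",
    ("cases", "noun"): "case",
    ("saw", "verb"): "see",
    ("saw", "noun"): "saw",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma for a word given its part of speech.

    Words missing from the table are returned lowercased and otherwise unchanged.
    """
    key = (word.lower(), pos)
    return LEMMA_TABLE.get(key, word.lower())


# All inflected forms of "go" collapse to a single item.
forms = ["Gone", "Going", "Went"]
print({lemmatize(w, "verb") for w in forms})  # {'go'}

# The same surface form can have different lemmas depending on part of speech.
print(lemmatize("saw", "verb"), lemmatize("saw", "noun"))  # see saw
```

The part-of-speech key is what lets the same spelling ("saw") resolve to different lemmas, which is the kind of context a simple suffix-stripping approach cannot use.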

Properties of Lemmatizer

The available properties of Lemmatizer are as shown in the figure given below.



The table given below describes different fields present on the properties of Lemmatizer.

Field

Description

Remark

Task Name

It displays the name of the selected task.

You can click the text field to edit or modify the name of the task as required.

Text

It allows you to select the text for which you want to perform lemmatization.

  • Only one data field can be selected.
  • Textual data fields selected for the reader are visible.
  • Only a textual data field can be selected.

Node Configuration (Advanced)

It allows you to select the instance of the AWS server to provide control over the execution of a task in a workbook or workflow.

For more details, refer to Worker Node Configuration.

Interpretation of Lemmatizer

The figure given below shows the result of applying Lemmatizer to Google News snippets.
In the figure, the column heading CLEText represents the text after Lemmatizer is applied.
In the highlighted example, the word "cases" has been reduced to its lemma "case".
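
As a rough illustration of how an output column such as CLEText relates to the input text, the sketch below lemmatizes a snippet word by word. The `LEMMAS` lookup, the `lemmatize_text` function, and the sample snippet are all hypothetical; they are not taken from the product or from the Google News data.

```python
import re

# Hypothetical lemma lookup; not the product's dictionary.
LEMMAS = {"cases": "case", "went": "go", "reported": "report"}


def lemmatize_text(text: str) -> str:
    """Replace each alphabetic word with its lemma, leaving other characters untouched."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Words without an entry in the lookup are kept as they are.
        return LEMMAS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


snippet = "New cases reported today"
print(lemmatize_text(snippet))  # New case report today
```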


