Lemmatizer | | | |
---|---|---|---|
Description | Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word. | | |
Why to use | Textual Analysis – Pre Processing | | |
When to use | When you want to get the base or dictionary form of words that have meaning. When you want to link words with similar meanings to one word. | When not to use | On numerical data. |
Prerequisites | It is used on textual data. | | |
Input | Gone, Going, Went | Output | Go |
Related algorithms | | Alternative algorithm | Stemmer |
Statistical Methods used | - | Limitations | In-depth linguistic knowledge is required to create dictionaries and look for the proper form of the word. |
Lemmatizer is located under Textual Analysis in Pre Processing, in the task pane on the left. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select the properties for analysis.
Lemmatizer is an algorithm in morphological analysis and computational linguistics which identifies the lemma (or the dictionary form) of a word. In lemmatization, all the inflected forms of a word are grouped together so that they can be identified as a single item.
Lemmatization algorithms identify the intended part of speech and the meaning of a word in a sentence, as well as within the larger context of the surrounding sentences or even the entire document.
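The idea can be illustrated with a minimal dictionary-based sketch in Python. This is not the implementation used by the Lemmatizer task; the lookup table `LEMMA_TABLE`, the part-of-speech labels, and the helper names `lemmatize` and `group_by_lemma` are assumptions made for illustration only. It shows how inflected forms are grouped under one lemma and why the part of speech matters.

```python
from collections import defaultdict

# Hypothetical, hand-made lemma dictionary: (word form, part of speech) -> lemma.
# A real lemmatizer relies on a full vocabulary and morphological analysis.
LEMMA_TABLE = {
    ("gone", "VERB"): "go",
    ("going", "VERB"): "go",
    ("went", "VERB"): "go",
    ("saw", "VERB"): "see",   # "I saw the report"
    ("saw", "NOUN"): "saw",   # "a saw for cutting wood"
    ("cases", "NOUN"): "case",
}


def lemmatize(word, pos):
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    lowered = word.lower()
    return LEMMA_TABLE.get((lowered, pos), lowered)


def group_by_lemma(tagged_words):
    """Group the original word forms under their shared lemma."""
    groups = defaultdict(list)
    for word, pos in tagged_words:
        groups[lemmatize(word, pos)].append(word)
    return dict(groups)


if __name__ == "__main__":
    words = [("Gone", "VERB"), ("Going", "VERB"), ("Went", "VERB")]
    print(group_by_lemma(words))
    print(lemmatize("saw", "VERB"), lemmatize("saw", "NOUN"))
```

In this sketch, the same surface form "saw" maps to different lemmas depending on its part of speech, which is the kind of distinction a stemmer that only strips endings cannot make.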
Properties of Lemmatizer
The figure below shows the available properties of Lemmatizer.
The table below describes the fields in the Lemmatizer properties.
Field | Description | Remark |
---|---|---|
Task Name | It displays the name of the selected task. | You can click the text field to edit or modify the name of the task as required. |
Text | It allows you to select the text for which you want to perform lemmatization. | |
Advanced > Node Configuration | It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow. | For more details, refer to Worker Node Configuration. |
Interpretation of Lemmatizer
The figure below shows the result of applying Lemmatizer to Google News snippets.
In the figure, the column heading CLEText represents the text after the Lemmatizer is applied.
In the highlighted example, the word "cases" has been reduced to its lemma "case".
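As a rough illustration of this kind of output, the Python sketch below adds a CLEText value to each row of a small table. It is not the task's actual implementation: the lookup table `LEMMAS`, the helper names, and the sample snippet are all made up for this example, and words missing from the table are left unchanged.

```python
import re

# Hypothetical lemma lookup; a real lemmatizer uses a full dictionary.
LEMMAS = {"cases": "case", "reported": "report"}


def lemmatize_text(text):
    """Replace each alphabetic word with its lemma when one is known."""
    def replace(match):
        word = match.group(0)
        return LEMMAS.get(word.lower(), word)

    # Only alphabetic runs are rewritten, so digits, punctuation,
    # and spacing in the original text are preserved.
    return re.sub(r"[A-Za-z]+", replace, text)


def add_cletext_column(rows, text_field="Text"):
    """Return copies of the rows with an added CLEText entry."""
    return [dict(row, CLEText=lemmatize_text(row[text_field])) for row in rows]


if __name__ == "__main__":
    # Made-up snippet, not taken from the Google News data in the figure.
    rows = [{"Text": "New cases reported in the city"}]
    for row in add_cletext_column(rows):
        print(row["Text"], "->", row["CLEText"])
```

The original text is kept alongside the lemmatized text so that the two columns can be compared row by row, as in the figure.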