In its general sense, data preprocessing is a data mining technique that transforms raw data into a useful, analyzable form. It involves data cleaning, data transformation, and data reduction.
With respect to textual analysis, pre-processing involves multiple algorithms that convert raw, imprecise data into cleaned, ready-to-analyze data. Each algorithm has a specific objective, such as case conversion, lemmatization, word frequency counting, punctuation removal, or advanced entity extraction. These algorithms are used either alone or in combination with other algorithms.
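To illustrate how such steps can be combined, here is a minimal Python sketch that chains case conversion, punctuation removal, and word frequency counting. It uses only the Python standard library and is not rubiscape's implementation; the function names (convert_case, remove_punctuation, word_frequency, preprocess) and the sample text are hypothetical and chosen only for this example.

```python
import string
from collections import Counter


def convert_case(text: str) -> str:
    """Case conversion: lower-case the whole text."""
    return text.lower()


def remove_punctuation(text: str) -> str:
    """Punctuation removal: drop every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def word_frequency(text: str) -> Counter:
    """Word frequency: count whitespace-separated tokens."""
    return Counter(text.split())


def preprocess(text: str) -> Counter:
    """Apply the three steps in combination (hypothetical pipeline)."""
    return word_frequency(remove_punctuation(convert_case(text)))


# Illustrative input only; not taken from any rubiscape dataset.
raw = "The cat sat. The CAT ran!"
counts = preprocess(raw)
print(counts.most_common())
```

Each function can also be called on its own, which mirrors the idea of using an algorithm alone or in combination. Note that the order matters: counting words before case conversion would treat "cat" and "CAT" as different words.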
In rubiscape, the Pre Processing algorithms are:

  • Case Convertor
  • Custom Words Remover
  • Frequent Words Remover
  • Lemmatizer
  • Punctuation Remover
  • Spelling Corrector
  • Stemmer
  • Advanced Entity Extraction
  • Word Correlation
  • Word Frequency

In the task pane, click Textual analysis, and then click Pre Processing.

For more information, refer to Pre-processing Algorithms.