Poisson Regression

Description

Poisson Regression is a type of generalized linear model used to model count data, that is, non-negative whole numbers such as the number of events in a fixed interval.
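As a rough illustration of the idea, the sketch below assumes the usual log link, in which the expected count is the exponential of a linear combination of the predictors. The function names and coefficient values are hypothetical and are not taken from the product.

```python
import math


def predicted_count(intercept, coefs, features):
    """Expected count under a Poisson model with a log link."""
    linear = intercept + sum(c * x for c, x in zip(coefs, features))
    return math.exp(linear)


def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the expected count is mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical coefficients, chosen only for illustration.
mu = predicted_count(intercept=1.0, coefs=[0.5], features=[2.0])
print(round(mu, 3))                  # expected count for this observation
print(round(poisson_pmf(7, mu), 3))  # probability of observing exactly 7 events
```

Because the prediction is an exponential, it is always positive, which is one reason this model suits count data better than an ordinary linear fit.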

Why to use

For regression analysis of count data

When to use

For numerical variables

When not to use

For textual variables

Prerequisites

  • The data should contain variables having countable data points.
  • The data should not contain any missing/NaN values.

Input

Numerical variable which is countable.

Output

  • Regression Key Performance Indicators (KPIs)
  • Regression Statistics
  • Actual Vs. Predicted scatter plot

Statistical Methods used

  • Deviance
  • Mean Absolute Error
  • Mean Squared Error
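The three measures above can be sketched in plain Python as follows. The deviance shown is the mean Poisson deviance, which is an assumption; the product may report a differently scaled variant. The sample values are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_poisson_deviance(actual, predicted):
    """Mean Poisson deviance; predictions must be strictly positive."""
    total = 0.0
    for a, p in zip(actual, predicted):
        # The term a * log(a / p) is taken as 0 when the actual count is 0.
        log_term = a * math.log(a / p) if a > 0 else 0.0
        total += 2.0 * (log_term - a + p)
    return total / len(actual)


actual = [0, 2, 5, 3]             # made-up observed counts
predicted = [0.5, 2.0, 4.0, 3.5]  # made-up model predictions

print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))
print(mean_poisson_deviance(actual, predicted))
```

All three values are 0 for a perfect fit and grow as the predictions move away from the actual counts.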

Limitations

It can be used only on numerical data.

Poisson Regression is located under Machine Learning, in the Regression group, in the left task pane. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select the properties for analysis.

Refer to Properties of Poisson Regression.

Properties of Poisson Regression

The available properties of the Poisson Regression are as shown in the figure below.

The table below describes the different fields present on the Properties pane of the Poisson Regression.

Field

Description

Remark

Task Name

It is the name of the task selected on the workbook canvas.

You can click the text field to edit or modify the name of the task as required.

Dependent Variable

It allows you to select the numerical variable on which the regression is to be applied.

You can select any numerical type of variable which contains count values.

Advanced

Alpha

It allows you to set the regularization strength, that is, the constant that multiplies the penalty term.

The default value is 1.0.

Maximum Iterations

It allows you to enter the maximum number of iterations.

The default value is 100.

Tolerance

It allows you to enter the precision of the solution.


  • It is the stopping criterion for the iterations. How it is evaluated depends on the solver used and the objective function.
  • The default value is 0.0001.

Fit Intercept

It allows you to select whether you want to calculate the constant (c), that is, the intercept, for your model.

  • You can select either True or False.
  • Selecting True will calculate the value of the constant.
  • The default value is True.

Warm Start

It allows you to select whether you want to use the existing fitted model attributes to initialize the new model in the next call to fit.


  • You can select either True or False.
  • Selecting True will use the existing fitted model attributes to initialize the new model.
  • The default value is False.

Verbose

It allows you to select whether you want to enable logging.

  • Verbose is an option for producing detailed logging information.
  • If Verbose is greater than 0, logging is enabled and the algorithm may run noticeably slower.
  • The default value is 0.

Dimensionality Reduction

It allows you to select the method for dimensionality reduction.

  • The available options are None and PCA.
  • The default value is None.

Add result as a variable

It allows you to select the KPIs to be displayed in the output.

The available options are

  • Actual number of iterations
  • Deviance
  • Intercept
  • MAE (Mean Absolute Error)
  • MSE (Mean Squared Error)

Node Configuration

It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow.

For more details, refer to Worker Node Configuration.

Hyper Parameter Optimization

It allows you to select the parameters for optimization.

For more details, refer to Hyperparameter Optimization.
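The Advanced fields described above closely resemble the parameters of scikit-learn's PoissonRegressor. Whether the product uses that class internally is an assumption; the sketch below only shows how the same settings would look in scikit-learn, using synthetic data. PCA is shown as a preprocessing step to mirror the Dimensionality Reduction option, and the number of components is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.poisson(np.exp(1.0 + 0.3 * X[:, 0]))

model = PoissonRegressor(
    alpha=1.0,           # Alpha
    max_iter=100,        # Maximum Iterations
    tol=1e-4,            # Tolerance
    fit_intercept=True,  # Fit Intercept
    warm_start=False,    # Warm Start
    verbose=0,           # Verbose
)
model.fit(X, y)

print("Actual number of iterations:", model.n_iter_)
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

# Dimensionality Reduction set to PCA, sketched as a pipeline step.
pca_model = make_pipeline(PCA(n_components=2), PoissonRegressor())
pca_model.fit(X, y)
```

In scikit-learn, setting alpha to 0 turns the penalty off, and larger values shrink the coefficients more strongly.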

Example of Poisson Regression

Consider a dataset containing the number of people crossing the Brooklyn Bridge on various dates. The dataset also contains the high and low temperatures and the precipitation on those days. A snippet of input data is shown in the figure below.

We select BB_Count as the Dependent Variable. The Result page of the Poisson Regression is displayed in the figure below.

As seen in the above figure, the KPIs for Poisson Regression, the Regression Statistics containing coefficients for the independent variables, and a scatter plot of Actual Vs. Predicted count are displayed on the Result page.

When you hover over any point of the scatter plot, you see the Predicted Count and Actual Count values for the data point.
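A comparable analysis can be sketched outside the product with scikit-learn. The data below is synthetic and only imitates the shape of the bridge dataset. Apart from BB_Count, the variable names are hypothetical, and the resulting numbers will not match the figures above.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import (
    mean_absolute_error,
    mean_poisson_deviance,
    mean_squared_error,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bridge data (all values are made up).
rng = np.random.default_rng(42)
n_days = 200
high_t = rng.uniform(40, 90, n_days)          # hypothetical high temperature
low_t = high_t - rng.uniform(5, 20, n_days)   # hypothetical low temperature
precip = rng.exponential(0.1, n_days)         # hypothetical precipitation
bb_count = rng.poisson(np.exp(6.0 + 0.02 * high_t - 1.0 * precip))

# Standardize the predictors so the solver converges easily.
X = StandardScaler().fit_transform(np.column_stack([high_t, low_t, precip]))

model = PoissonRegressor().fit(X, bb_count)
predicted = model.predict(X)

print("Deviance:", mean_poisson_deviance(bb_count, predicted))
print("MAE:", mean_absolute_error(bb_count, predicted))
print("MSE:", mean_squared_error(bb_count, predicted))

# Actual vs. predicted pairs, as they would appear on a scatter plot.
for actual, pred in list(zip(bb_count, predicted))[:5]:
    print(actual, round(pred, 1))
```

Plotting the actual counts against the predicted counts gives the same kind of scatter plot as the Result page: points close to the diagonal indicate a good fit.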
