### Exercise 3. Linear Regression

Operations of the Regression Analysis tools are similar regardless of the algorithm used. The participating variables are imported the same way. In this exercise, **Linear Regression** is run as an example.

1 Go to **ALS Forest > Regression Analysis > Linear Regression** to bring up the Linear Regression dialog window.

2 Import training data

2.1 Click on the button to locate **Sample data** for the import.

2.2 **Dependent Variable**: choose **Biomass** from the dropdown menu.

2.3 **Plot Type**: Set this to match the actual data. For **Square** plots, set the **Length**; for **Circular** plots, set the **Radius**. For this exercise, the **Type** is set to **Circle** with a **Radius** of 10 m.

2.4 **Optimize by considering location uncertainty**: check **Optimize**, and accept the default **Location Uncertainty** value of 5. The plot center (XY) will be moved around within a circle with a radius of 5 meters, and the points within 10 m (the radius of the survey plot) of each new XY location will be used for Regression Analysis modeling. The model with the smallest Standard Deviation is the one that is ultimately used. This process aims to reduce the influence of field plot location uncertainty on the regression analysis results (Su et al., 2016).

2.5 **X/Y**: Specify the columns containing plot center point coordinates.

Note: The imported sample data must be a subset of the independent variable data.
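The geometry behind the location-uncertainty option in step 2.4 can be sketched as follows. This is only an illustration under stated assumptions, not the tool's actual implementation: the function names `candidate_centers` and `points_in_plot`, the uniform random sampling of candidate centers, and the number of candidates are all hypothetical. The tool would then fit a model for each candidate center and keep the one with the smallest Standard Deviation; that fitting step is not shown here.

```python
import math
import random


def candidate_centers(cx, cy, uncertainty, n, seed=0):
    """Sample n candidate plot centers inside a disc around (cx, cy).

    Assumption: candidates are drawn uniformly at random; the real tool
    may use a different search pattern.
    """
    rng = random.Random(seed)
    centers = [(cx, cy)]  # always keep the surveyed center as a candidate
    for _ in range(n):
        # sqrt() makes the samples uniform over the disc area
        r = uncertainty * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        centers.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return centers


def points_in_plot(points, cx, cy, radius):
    """Keep the (x, y, z) points whose XY distance to the center is within radius."""
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]


# Example: a Location Uncertainty of 5 and a plot Radius of 10 m.
centers = candidate_centers(100.0, 200.0, uncertainty=5.0, n=50)
cloud = [(100.0, 200.0, 1.0), (109.0, 200.0, 2.0), (111.0, 200.0, 3.0)]
subset = points_in_plot(cloud, *centers[0], radius=10.0)
```

Each candidate center yields a slightly different point subset, which is why the fitted models differ from one candidate to the next.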

3 Independent Variables

3.1 Select **CSV** as the input format. Click the button, and then select ALSData_Normalized_Elevation Metrics.csv. All of the variables in the file will be listed in the **Independent Variables** field.

3.2 In this exercise, only these 15 Elevation Percentiles will be used as **Independent Variables**: 1st, 5th, 10th, 20th, 25th, 30th, 40th, 50th, 60th, 70th, 75th, 80th, 90th, 95th, 99th. To remove the other variables, left-click a single variable, or click and hold to select multiple variables, and then click the remove button.

Note: The number of training samples must be no less than the number of independent variables in order to build a Regression model.

To import TIFF files as Independent Variables instead of CSV, select TIFF, and import the TIFF files.

All imported TIFF files must share identical size and resolution.

Multiple Independent Variables can be imported at one time as TIFF files, but only one CSV file can be imported at a time.
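For readers who prepare the variable list outside the dialog, the selection in step 3.2 and the sample-count note can be sketched as below. The column prefix `elev_percentile_` and the other column names are hypothetical placeholders; the actual names in the CSV file may differ, so adjust them to match the file.

```python
# The 15 elevation percentiles used in this exercise.
PERCENTILES = ["1st", "5th", "10th", "20th", "25th", "30th", "40th", "50th",
               "60th", "70th", "75th", "80th", "90th", "95th", "99th"]


def select_variables(all_names, prefix="elev_percentile_"):
    """Keep only the percentile columns, preserving their original order.

    Assumption: columns are named prefix + percentile label (hypothetical).
    """
    wanted = {prefix + label for label in PERCENTILES}
    return [name for name in all_names if name in wanted]


def enough_samples(n_samples, n_variables):
    """True when the sample count is no less than the variable count."""
    return n_samples >= n_variables


columns = ["elev_percentile_1st", "elev_mean", "elev_percentile_99th"]
kept = select_variables(columns)
ok = enough_samples(n_samples=20, n_variables=len(PERCENTILES))
```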

4 **Linear Regression Methods: Enter and Stepwise**

**Enter**: All selected independent variables will be included in the model fitting process. This tutorial uses this method.

**Stepwise**: In each model fitting iteration, one independent variable is considered as an addition to the linear regression model, and each variable is retained in or removed from the model based on a statistical t-test. This is known as stepwise regression.
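To make the **Enter** idea concrete, the sketch below fits an ordinary least squares model that uses every supplied variable, by solving the normal equations. It is a minimal illustration, not the tool's solver: the function names `fit_enter`, `predict`, and `solve_linear_system` are hypothetical, and the data values are made up. Stepwise selection, which needs a t-test per variable, is not shown.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few samples or collinear variables")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_enter(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk using all variables in rows."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Apply fitted coefficients (intercept first) to one row of variables."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Made-up data generated from y = 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 7]]
y = [1 + 2 * a + 3 * b for a, b in rows]
coefs = fit_enter(rows, y)
```

The singular-system error is also what happens when there are fewer samples than coefficients to estimate, since the normal equations then have no unique solution.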

5 **Save Regression Model**: If checked, the resulting regression model will be saved at the output path as a .model file. This saved model can be used in the **Run Existing Regression Model** tool.

6 **Save Regression Dataset**: If checked, the resulting dataset will be saved at the output path as a CSV file. The first column of this file is the **Dependent Variable** and the following columns are **Independent Variables**.
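A saved regression dataset can be split back into its dependent and independent parts with a few lines of code. The sketch below assumes the file has a header row and contains only numeric values; neither is confirmed by this tutorial, so check the file first. The function name `split_regression_dataset`, the column names, and the sample values are made up for illustration.

```python
import csv
import io


def split_regression_dataset(text, has_header=True):
    """Split CSV text into (header, y, x).

    The first column is the dependent variable; the remaining columns
    are the independent variables. A header row is an assumption.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    header = rows[0] if has_header else None
    body = rows[1:] if has_header else rows
    y = [float(r[0]) for r in body]
    x = [[float(v) for v in r[1:]] for r in body]
    return header, y, x


# Made-up content with hypothetical column names.
sample = "Biomass,p50,p95\n120.5,8.2,15.1\n98.0,6.9,13.4\n"
header, y, x = split_regression_dataset(sample)
```

To read a file from disk instead, pass the result of `open(path, newline="").read()` as `text`.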

7 **Accuracy Assessment**: Assess the fitted models using a **K-fold cross validation** method. The sample dataset is partitioned into K subsets. In each model fitting iteration, all but one subset of the data are used to train the model while the remaining one is used to validate the model.
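The K-fold partitioning described in step 7 can be sketched as follows. This is a generic illustration; the function name `k_fold_indices` is hypothetical, and whether the tool shuffles the samples before partitioning is an assumption.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: samples are shuffled first
    folds = [indices[i::k] for i in range(k)]  # k subsets of near-equal size
    for i in range(k):
        validation = folds[i]
        # All the other subsets together form the training set.
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=10, k=5))
```

Every sample is used for validation exactly once and for training in the other K-1 iterations, so the accuracy figures are computed on data the corresponding model has not seen.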

8 **Output Path**: **Regression Analysis** outputs will be saved here. These include model predictions (.tif file), regression analysis reports (.html file), and, if specified by the user, the resulting regression models (.model file) and regression datasets (.csv file).

9 **Model predictions (.tif file)**

The model prediction is a raster of predicted biomass at a spatial resolution of 20 m. The value of each pixel in the .tif raster is the predicted biomass for the area that pixel covers.

10 **Regression Analysis Reports**

The report consists of two parts. The first part is a **Summary** of the input and accuracy parameters of the analysis, including **Regress Type**, **Linear Regression Coefficients**, **K-Fold value**, **R value**, **R-Square**, **RMSE**, **Probability Value**, and the **Significance of the K-fold Test Result**. The second part lists the **Dependent** and **Independent Variables** of the **Regression Analysis**.
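For reference, three of the accuracy figures named in the summary can be computed from observed and predicted values as sketched below. These are the common textbook definitions (Pearson correlation for R, one minus the ratio of residual to total sum of squares for R-Square); the report may compute them differently, so treat the exact formulas as an assumption. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Made-up values: every prediction is off by exactly one unit.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
```

With these made-up values the correlation is perfect while R-Square is low, which shows why the report lists the figures side by side: a constant bias does not affect R but does increase RMSE and reduce R-Square.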