
Process multiple days of water data and remove outliers using KNN
Source: R/knn_outliers.R
knn_clean_water.Rd
Usage
knn_clean_water(
water_data,
k = 50,
threshold_percentile = 99.9,
custom_scaling = list(rate = 20, intake = 1, duration = 0.01),
intake_col = intake_col2(),
duration_col = duration_col2(),
remove_outliers = FALSE,
date_col = "date"
)
Arguments
- water_data
A list of daily water data frames or a single data frame.
- k
Integer. Number of nearest neighbors to consider (default: 50). Will be automatically adjusted if it exceeds the number of rows in the data.
- threshold_percentile
Numeric. Percentile threshold for outlier detection; must be between 0 and 100 (default: 99.9). Points whose average distance is above this percentile are considered outliers.
- custom_scaling
A named list with scaling factors for the input variables (default: list(rate = 20, intake = 1, duration = 0.01)). If NULL, no scaling is applied (all factors = 1).
- intake_col
Character. Name of the column containing intake data (default: intake_col2(), from global_var.R).
- duration_col
Character. Name of the column containing duration data (default: duration_col2(), from global_var.R).
- remove_outliers
Logical. Whether to remove outliers from the data frame (default: FALSE).
- date_col
Character. Name of the date column if water_data is a list that needs to be unmerged (default: "date").
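Examples
The arguments above suggest a distance-based rule: scale the variables, measure each point's average distance to its k nearest neighbors, and flag points above the chosen percentile. The following is a minimal base-R sketch of that idea, not the package's actual implementation. The helper name knn_flag_outliers, the is_outlier column, the demo data, and the assumption that rate means intake divided by duration are all hypothetical.

```r
# Hypothetical helper illustrating KNN average-distance outlier flagging.
# This is NOT knn_clean_water(); it only mirrors its documented arguments.
knn_flag_outliers <- function(df, k = 50, threshold_percentile = 99.9,
                              scaling = list(rate = 20, intake = 1, duration = 0.01),
                              intake_col = "intake", duration_col = "duration") {
  n <- nrow(df)
  stopifnot(n >= 2, threshold_percentile >= 0, threshold_percentile <= 100)
  k <- min(k, n - 1)  # shrink k when it exceeds the number of other rows

  # Assumption: "rate" is intake divided by duration.
  rate <- df[[intake_col]] / df[[duration_col]]
  feats <- cbind(
    rate     = rate * scaling$rate,
    intake   = df[[intake_col]] * scaling$intake,
    duration = df[[duration_col]] * scaling$duration
  )

  # Pairwise Euclidean distances between the scaled rows.
  d <- as.matrix(dist(feats))

  # Mean distance to the k nearest neighbors; the first sorted entry is the
  # point's zero distance to itself, so it is skipped.
  avg_dist <- as.numeric(apply(d, 1, function(row) mean(sort(row)[2:(k + 1)])))

  cutoff <- quantile(avg_dist, threshold_percentile / 100, names = FALSE)
  df$is_outlier <- avg_dist > cutoff
  df
}

# Made-up demo data: 99 ordinary visits plus one deliberately extreme row.
set.seed(1)
demo <- data.frame(
  intake   = c(rnorm(99, mean = 5, sd = 0.5), 60),
  duration = c(rnorm(99, mean = 120, sd = 10), 30)
)
flagged <- knn_flag_outliers(demo, k = 10, threshold_percentile = 99)
which(flagged$is_outlier)
```

Because the cutoff is a percentile of the data itself, a high value such as 99.9 flags only a small share of rows; lowering it flags more. Setting a scaling factor well below 1, as the default does for duration, reduces that variable's influence on the distance.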