Cluster feeding visits into meals using DBSCAN — cluster

This function clusters individual feeding visits into meals using DBSCAN (Density-Based Spatial Clustering of Applications with Noise). For each animal on each day, visits that occur close together in time are grouped into meals. If eps is not specified, the function determines it automatically from the data.

Usage

cluster_meals(
  data,
  eps = NULL,
  min_pts = 2,
  method = "gmm",
  percentile = 0.93,
  eps_scope = "all_animals",
  lower_bound = 5,
  upper_bound = 60,
  use_log_transform = TRUE,
  log_multiplier = 20,
  log_offset = 1,
  id_col = id_col2(),
  start_col = start_col2(),
  end_col = end_col2(),
  bin_col = bin_col2(),
  intake_col = intake_col2(),
  dur_col = duration_col2(),
  tz = tz2()
)

Arguments

data

A single dataframe or list of dataframes containing feeding visit data

eps

DBSCAN epsilon parameter (maximum time gap, in minutes, between visits in the same meal). If NULL (default), the parameter is determined automatically using the approach selected by method.

min_pts

DBSCAN minimum points parameter (minimum visits to form a dense cluster). Default is 2. This follows the DBSCAN recommendation of setting min_pts to D + 1 where D is the number of dimensions (we have only 1 dimension: time, so min_pts = 1 + 1 = 2).

method

Character string specifying the automatic eps determination method when eps=NULL. Options are "gmm" (default), "percentile", or "both".

percentile

Numeric value between 0 and 1 specifying which percentile to use for automatic eps determination when method="percentile" or "both". Default is 0.93.

eps_scope

Character string specifying the scope for automatic eps determination when eps=NULL. Options are:

"all_animals" (default): calculate a single optimal interval (eps) shared by all animals across all days
"one_animal_all_days": calculate a separate optimal interval (eps) for each animal, and use that animal's eps across all of its days
"one_animal_single_day": calculate a separate optimal interval (eps) for each animal on each day

lower_bound

Numeric value for the lower bound of the optimal interval, in minutes. If NULL, no lower bound is applied. Default is 5.

upper_bound

Numeric value for the upper bound of the optimal interval, in minutes. If NULL, no upper bound is applied. Default is 60.

use_log_transform

Logical indicating whether to use log transformation for GMM fitting. Default is TRUE. Log transformation often provides better separation of within-meal and between-meal gaps.

log_multiplier

Numeric value for multiplier of log transformation. Default is 20.

log_offset

Numeric value for offset of log transformation. Default is 1.

id_col

Animal ID column name (default current global value from id_col2())

start_col

Start time column name (default current global value from start_col2())

end_col

End time column name (default current global value from end_col2())

bin_col

Bin ID column name (default current global value from bin_col2())

intake_col

Intake column name (default current global value from intake_col2())

dur_col

Duration column name (default current global value from duration_col2())

tz

Timezone (default current global value from tz2())

Value

A dataframe with meal-level summaries containing:

id_col2(): Animal ID
date: Date
meal_id: Sequential meal number within animal-day
meal_start: Start time of first visit in meal
meal_end: End time of last visit in meal
meal_duration: Total time from meal start to end (seconds)
visit_count: Number of visits in meal
total_intake: Sum of intake across all visits in meal
feeding_percentage: Percentage of meal time spent actively feeding
unique_bins_count: Number of unique bins visited in meal
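
The sketch below shows one way the meal-level columns could be derived from visits that already carry a meal label. It is only an illustration in base R, not the package's implementation: the `visits` data, the column names, and the helper `summarise_meal` are hypothetical, and it assumes feeding_percentage is the summed visit duration divided by meal_duration.

```r
# Hypothetical visits for one animal-day, already labelled with a meal_id.
visits <- data.frame(
  meal_id  = c(1, 1, 2, 2, 2),
  start    = as.POSIXct(c("2020-01-01 08:00:00", "2020-01-01 08:05:00",
                          "2020-01-01 12:00:00", "2020-01-01 12:04:00",
                          "2020-01-01 12:10:00"), tz = "UTC"),
  duration = c(120, 180, 60, 120, 300),  # seconds spent feeding per visit
  intake   = c(1.2, 0.8, 0.5, 1.0, 2.0),
  bin      = c(3, 3, 1, 2, 1)
)
visits$end <- visits$start + visits$duration

# Hypothetical helper: collapse the visits of one meal into a single row.
summarise_meal <- function(v) {
  meal_duration <- as.numeric(difftime(max(v$end), min(v$start), units = "secs"))
  data.frame(
    meal_id            = v$meal_id[1],
    meal_start         = min(v$start),
    meal_end           = max(v$end),
    meal_duration      = meal_duration,
    visit_count        = nrow(v),
    total_intake       = sum(v$intake),
    # Assumption: share of the meal window covered by visit durations.
    feeding_percentage = 100 * sum(v$duration) / meal_duration,
    unique_bins_count  = length(unique(v$bin))
  )
}

meals <- do.call(rbind, lapply(split(visits, visits$meal_id), summarise_meal))
meals
```

In this toy data the first meal spans 480 seconds with 300 seconds of feeding, so its feeding_percentage is 62.5.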

Details

The function uses DBSCAN clustering on visit start times (converted to minutes from midnight). Visits are clustered based on temporal proximity, with the eps parameter determining the maximum time gap between visits in the same meal. Isolated visits, which DBSCAN classifies as "noise", are excluded from the meal summaries.
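
To make the clustering rule concrete: in one dimension with min_pts = 2, DBSCAN is equivalent to splitting the sorted start times wherever the gap exceeds eps and marking single-visit runs as noise. The base-R sketch below illustrates that equivalence. The helper `group_visits` is hypothetical and is not how the package itself performs the clustering.

```r
# Hypothetical helper: 1-D DBSCAN with min_pts = 2 expressed as gap-based grouping.
# Returns one label per visit; 0 marks noise (a visit with no neighbour within eps).
group_visits <- function(start_min, eps) {
  if (length(start_min) == 0) return(integer(0))
  ord <- order(start_min)
  t_sorted <- start_min[ord]
  # A new run starts whenever the gap to the previous visit exceeds eps.
  run_id <- cumsum(c(TRUE, diff(t_sorted) > eps))
  # Runs containing a single visit are noise.
  run_size <- tabulate(run_id)
  keep <- run_size[run_id] >= 2
  labels <- integer(length(t_sorted))
  # Renumber the remaining runs 1, 2, ... in time order.
  labels[keep] <- match(run_id[keep], unique(run_id[keep]))
  # Return labels in the original input order.
  out <- integer(length(start_min))
  out[ord] <- labels
  out
}

starts <- c(480, 483, 490, 600, 720, 725)  # minutes from midnight
group_visits(starts, eps = 10)
# labels: 1 1 1 0 2 2 (0 marks the lone visit at minute 600)
```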

When eps=NULL, the function can estimate the parameter in two ways:

A percentile of the inter-visit gaps (the 93rd by default, set by percentile)
Gaussian mixture modeling (GMM) of the inter-visit gaps
When both methods are used, the smaller of the two estimates is chosen, to be conservative. The result is bounded by lower_bound and upper_bound (5 and 60 minutes by default)
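
The percentile estimate and the bounding step can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's code: `percentile_eps` is a hypothetical helper, gaps are taken between consecutive start times of a single animal-day, and the GMM estimate is left out.

```r
# Hypothetical helper: percentile-based eps, bounded by lower_bound and upper_bound.
percentile_eps <- function(start_min, percentile = 0.93,
                           lower_bound = 5, upper_bound = 60) {
  # Assumption: gaps are differences between consecutive visit start times.
  gaps <- diff(sort(start_min))
  eps <- unname(quantile(gaps, probs = percentile))
  # A NULL bound means that side is left unconstrained.
  if (!is.null(lower_bound)) eps <- max(eps, lower_bound)
  if (!is.null(upper_bound)) eps <- min(eps, upper_bound)
  eps
}

starts <- c(0, 2, 4, 7, 100, 103, 300)  # minutes from midnight
percentile_eps(starts)                   # capped at the upper bound of 60
percentile_eps(starts, upper_bound = NULL)
```

With a second estimate in hand (for example from a mixture model), taking `min()` of the two before applying the bounds would mirror the conservative choice described above.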

Examples


# Cluster meals with a fixed eps of 90 minutes (all_fed is a list of
# dataframes included in the package)
meals <- cluster_meals(all_fed[[1]], eps = 90, min_pts = 2, id_col = "cow",
                       start_col = "start", end_col = "end", bin_col = "bin",
                       intake_col = "intake", dur_col = "duration")