This function clusters individual feeding visits into meals using DBSCAN (Density-Based Spatial Clustering of Applications with Noise). For each animal on each day, visits that occur close together in time are grouped into meals. If eps is not specified, the function determines it automatically.
Usage
cluster_meals(
data,
eps = NULL,
min_pts = 2,
method = "gmm",
percentile = 0.93,
eps_scope = "all_animals",
lower_bound = 5,
upper_bound = 60,
use_log_transform = TRUE,
log_multiplier = 20,
log_offset = 1,
id_col = id_col2(),
start_col = start_col2(),
end_col = end_col2(),
bin_col = bin_col2(),
intake_col = intake_col2(),
dur_col = duration_col2(),
tz = tz2()
)
Arguments
- data
A single dataframe or list of dataframes containing feeding visit data
- eps
DBSCAN epsilon parameter (maximum time gap in minutes between visits in the same meal). If NULL (default), the parameter is determined automatically using the method given by method.
- min_pts
DBSCAN minimum points parameter (minimum number of visits needed to form a dense cluster). Default is 2. This follows the common DBSCAN heuristic of setting min_pts to at least D + 1, where D is the number of dimensions. The only dimension here is time, so min_pts = 1 + 1 = 2.
- method
Character string specifying the automatic eps determination method when eps=NULL. Options are "gmm" (default), "percentile", or "both".
- percentile
Numeric value between 0 and 1 specifying which percentile to use for automatic eps determination when method="percentile" or "both". Default is 0.93.
- eps_scope
Character string specifying the scope for automatic eps determination when eps=NULL. Options are:
"all_animals" (default): calculate a single eps shared by all animals across all days
"one_animal_all_days": calculate a separate eps for each animal and use it for all of that animal's days
"one_animal_single_day": calculate a separate eps for each animal on each day
- lower_bound
Numeric value for the lower bound of the automatically determined eps, in minutes. If NULL, no lower bound is applied. Default is 5.
- upper_bound
Numeric value for the upper bound of the automatically determined eps, in minutes. If NULL, no upper bound is applied. Default is 60.
- use_log_transform
Logical indicating whether to use log transformation for GMM fitting. Default is TRUE. Log transformation often provides better separation of within-meal and between-meal gaps.
- log_multiplier
Numeric value for multiplier of log transformation. Default is 20.
- log_offset
Numeric value for offset of log transformation. Default is 1.
- id_col
Animal ID column name (default: current global value from id_col2())
- start_col
Start time column name (default: current global value from start_col2())
- end_col
End time column name (default: current global value from end_col2())
- bin_col
Bin ID column name (default: current global value from bin_col2())
- intake_col
Intake column name (default: current global value from intake_col2())
- dur_col
Duration column name (default: current global value from duration_col2())
- tz
Timezone (default: current global value from tz2())
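The three eps_scope options differ only in how inter-visit gaps are grouped before an eps is estimated. The base R sketch below illustrates that grouping with the percentile method. The data frame `gaps`, its column names, and the helper `eps_by_scope()` are hypothetical and are not part of the package; the package's internal implementation may differ.

```r
# Hypothetical gaps (minutes) between consecutive visits of the same animal on the same day
gaps <- data.frame(
  animal  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  day     = c(1, 1, 2, 2, 1, 1, 2, 2),
  gap_min = c(2, 40, 4, 90, 1, 20, 3, 30)
)

# Illustrative helper (an assumption, not a package function):
# one percentile-based eps per group, where the grouping depends on the scope
eps_by_scope <- function(gaps, scope, percentile = 0.93) {
  key <- switch(scope,
    all_animals           = rep("all", nrow(gaps)),
    one_animal_all_days   = gaps$animal,
    one_animal_single_day = paste(gaps$animal, gaps$day, sep = "_"),
    stop("unknown scope: ", scope)
  )
  q <- function(x) unname(quantile(x, probs = percentile))
  vapply(split(gaps$gap_min, key), q, numeric(1))
}

eps_all     <- eps_by_scope(gaps, "all_animals")            # one value
eps_animal  <- eps_by_scope(gaps, "one_animal_all_days")    # one value per animal
eps_animday <- eps_by_scope(gaps, "one_animal_single_day")  # one value per animal-day
```

Narrower scopes adapt to individual feeding patterns but estimate eps from fewer gaps, so the estimates are likely to be noisier.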
Value
A dataframe with meal-level summaries containing:
- id_col2()
Animal ID
- date
Date
- meal_id
Sequential meal number within animal-day
- meal_start
Start time of first visit in meal
- meal_end
End time of last visit in meal
- meal_duration
Total time from meal start to end (seconds)
- visit_count
Number of visits in meal
- total_intake
Sum of intake across all visits in meal
- feeding_percentage
Percentage of meal time spent actively feeding
- unique_bins_count
Number of unique bins visited in meal
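As a rough illustration of how these columns relate to the underlying visits, the base R sketch below summarises hypothetical visits that already carry a meal_id. The input column names (`start`, `duration`, `intake`, `bin`) and the helper `summarise_meal()` are assumptions made for this example. In particular, computing feeding_percentage as summed visit duration divided by meal duration is an assumed reading of the description above, not a confirmed formula.

```r
# Hypothetical visits for one animal-day, already labelled with a meal_id
visits <- data.frame(
  meal_id  = c(1, 1, 1, 2, 2),
  start    = as.POSIXct(c("2024-01-01 08:00:00", "2024-01-01 08:03:00",
                          "2024-01-01 08:10:00", "2024-01-01 12:00:00",
                          "2024-01-01 12:06:00"), tz = "UTC"),
  duration = c(120, 180, 60, 240, 120),  # seconds spent feeding in each visit
  intake   = c(0.5, 0.8, 0.2, 1.1, 0.4),
  bin      = c(3, 3, 5, 2, 2)
)
visits$end <- visits$start + visits$duration

# Illustrative helper (an assumption, not a package function)
summarise_meal <- function(v) {
  meal_start <- min(v$start)
  meal_end <- max(v$end)
  meal_duration <- as.numeric(difftime(meal_end, meal_start, units = "secs"))
  data.frame(
    meal_id            = v$meal_id[1],
    meal_start         = meal_start,
    meal_end           = meal_end,
    meal_duration      = meal_duration,
    visit_count        = nrow(v),
    total_intake       = sum(v$intake),
    # Assumed definition: share of the meal window covered by visit durations
    feeding_percentage = 100 * sum(v$duration) / meal_duration,
    unique_bins_count  = length(unique(v$bin))
  )
}

meals <- do.call(rbind, lapply(split(visits, visits$meal_id), summarise_meal))
```

Here the first meal spans 08:00:00 to 08:11:00 (660 seconds) with 360 seconds of feeding, so its feeding_percentage under this assumed definition is about 54.5.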
Details
The function uses DBSCAN clustering on visit start times (converted to minutes from midnight). Visits are clustered based on temporal proximity, with the eps parameter determining the maximum time gap between visits in the same meal. Visits that DBSCAN classifies as noise are excluded from meal summaries. With the default min_pts = 2, these are isolated visits that have no other visit within eps.
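For one-dimensional data with min_pts = 2, DBSCAN reduces to a simple rule: sort the start times, start a new group wherever the gap to the previous visit exceeds eps, and treat groups of size one as noise. The base R sketch below implements that rule directly. It does not call the package or any DBSCAN library, and the helper `label_meals_1d()` is hypothetical. Labels are numbered in time order, which may differ from the numbering a DBSCAN implementation assigns.

```r
# Label visits for one animal-day; start_min holds start times in minutes from midnight.
# Returns 0 for noise (isolated visits) and 1, 2, ... for meals in time order.
label_meals_1d <- function(start_min, eps, min_pts = 2) {
  stopifnot(min_pts == 2)  # the gap-splitting shortcut only holds for min_pts = 2
  n <- length(start_min)
  if (n == 0) return(integer(0))
  ord <- order(start_min)
  sorted <- start_min[ord]
  # A new run starts wherever the gap to the previous visit exceeds eps
  run_id <- cumsum(c(TRUE, diff(sorted) > eps))
  run_size <- tabulate(run_id)
  # Runs with a single visit are noise; the remaining runs are numbered consecutively
  is_meal <- run_size >= 2
  meal_number <- integer(length(run_size))
  meal_number[is_meal] <- seq_len(sum(is_meal))
  # Map labels back to the original (unsorted) order
  labels <- integer(n)
  labels[ord] <- meal_number[run_id]
  labels
}

starts <- c(480, 483, 490, 600, 720, 726)  # hypothetical start times
meal_labels <- label_meals_1d(starts, eps = 10)
```

In this example the visit at minute 600 is more than 10 minutes from both neighbours, so it is labelled 0 and would be dropped from the meal summaries.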
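When eps=NULL, the function determines the parameter automatically using one or both of the following methods, as selected by method:
Percentile of inter-visit gaps (93rd percentile by default)
Gaussian mixture modeling
When both methods are used, the smaller of the two eps values is chosen, to be conservative. By default the result is constrained to a minimum of 5 minutes and a maximum of 60 minutes (see lower_bound and upper_bound).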
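The selection rule can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the gap values are made up, `eps_gmm` is a placeholder standing in for the result of a mixture model fit (which is not reproduced here), and `clamp_eps()` is a hypothetical helper rather than a package function.

```r
# Illustrative helper (an assumption): constrain eps to [lower_bound, upper_bound];
# a NULL bound is skipped
clamp_eps <- function(eps, lower_bound = 5, upper_bound = 60) {
  if (!is.null(lower_bound)) eps <- max(eps, lower_bound)
  if (!is.null(upper_bound)) eps <- min(eps, upper_bound)
  eps
}

gap_min <- c(1, 2, 2, 3, 4, 5, 6, 8, 45, 120, 180, 240)  # hypothetical gaps in minutes

eps_percentile <- unname(quantile(gap_min, probs = 0.93))
eps_gmm <- 18  # placeholder for a value obtained from a Gaussian mixture fit

# Take the smaller candidate, then apply the bounds
eps <- clamp_eps(min(eps_percentile, eps_gmm))
```

With these made-up numbers the percentile candidate is far above 60 minutes, so the smaller placeholder value of 18 is kept and already lies within the default bounds.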