A fully cleaned dataset containing both feeding and drinking behavior data for cattle over a two-day period (2020-10-31 to 2020-11-01). This dataset is the result of applying quality control procedures and KNN-based outlier removal to both feed and water data, then combining them into a single dataset for integrated analysis.
Format
A list of 2 data frames, one for each date (2020-10-31, 2020-11-01), with each data frame containing the following 11 variables:
- transponder
integer, unique electronic transponder ID worn by each cow
- cow
integer, animal ID number
- bin
numeric, bin location number (feed or water bin)
- start
POSIXct, timestamp when the visit started
- end
POSIXct, timestamp when the visit ended
- duration
numeric, duration of the visit in seconds
- start_weight
numeric, weight of feed/water at start of visit
- end_weight
numeric, weight of feed/water at end of visit
- intake
numeric, amount consumed (kg or L) during the visit
- date
Date, calendar date of the visit
- rate
numeric, how fast the cow was eating/drinking (kg/s or L/s)
Source
clean_feed and clean_water after KNN outlier removal
Details
This dataset represents the final, fully cleaned dataset after applying multiple layers of quality control. First, the qc() function was used to fix basic issues such as double detections and negative values. Then, advanced outlier detection using K-Nearest Neighbors (KNN) was applied through the knn_clean_feed() and knn_clean_water() functions to remove improbable data points. Finally, the cleaned feed and water datasets were combined using combine_feed_water(). This dataset provides a comprehensive view of both feeding and drinking behaviors in a single, analysis-ready format.
The KNN outlier detection focuses on removing data points that combine a high rate with a high intake, and penalizes long durations only lightly. In our experience, a cow may well spend a very long time at a bin in a single visit, but is very unlikely to consume a large amount in a short time, so outliers were flagged to catch visits with large intake and high rate.
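The idea in the paragraph above can be illustrated with a small base R sketch. This is not the code behind knn_clean_feed() or knn_clean_water(); the toy values, the feature weights, the choice of k, the cutoff rule, and the assumption that rate equals intake divided by duration are all made up for illustration.

```r
# Toy visits (made-up values): mostly ordinary visits, one visit with a
# large intake in a short time (row 8), and one very long visit (row 9).
visits <- data.frame(
  duration = c(120, 150, 200, 180, 90, 160, 140, 30, 1800),
  intake   = c(0.8, 1.0, 1.4, 1.2, 0.6, 1.1, 0.9, 6.0, 1.5)
)
# Assumption: rate is intake per second of visit
visits$rate <- visits$intake / visits$duration

# Standardize the features, then shrink the weight of duration so that
# long visits contribute little to the distances (weights are illustrative).
w <- c(duration = 0.2, intake = 1, rate = 1)
X <- sweep(scale(visits[, names(w)]), 2, w, `*`)

# Outlier score: mean distance to the k nearest neighbours
k <- 3
D <- as.matrix(dist(X))
diag(D) <- Inf  # a point is not its own neighbour
knn_score <- unname(apply(D, 1, function(d) mean(sort(d)[1:k])))

# Flag only isolated visits that also have above-median intake and rate,
# so an isolated but slow, long visit is kept (cutoff is illustrative).
visits$outlier <- knn_score > 3 * median(knn_score) &
  visits$intake > median(visits$intake) &
  visits$rate > median(visits$rate)

visits[visits$outlier, ]
```

In this toy data the short visit with a 6.0 intake is flagged, while the 1800-second visit is kept even though it sits far from the other points, because its rate is low.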
Examples
# Access combined data for the first day
first_day <- clean_comb[["2020-10-31"]]
head(first_day)
#> # A tibble: 6 × 11
#> transponder cow bin start end duration
#> <int> <int> <dbl> <dttm> <dttm> <dbl>
#> 1 12448407 6020 1 2020-10-31 00:26:12 2020-10-31 00:27:36 84
#> 2 11954014 4044 1 2020-10-31 01:17:43 2020-10-31 01:22:13 270
#> 3 11954042 4072 1 2020-10-31 01:37:30 2020-10-31 01:37:52 22
#> 4 12200070 5124 1 2020-10-31 06:05:49 2020-10-31 06:07:52 123
#> 5 12448407 6020 1 2020-10-31 06:08:02 2020-10-31 06:09:44 102
#> 6 21292850 6069 1 2020-10-31 06:09:55 2020-10-31 06:12:05 130
#> # ℹ 5 more variables: start_weight <dbl>, end_weight <dbl>, intake <dbl>,
#> # date <date>, rate <dbl>
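Because the data are stored as a list with one data frame per date, cross-day summaries need the days stacked first. A minimal sketch follows, using a small hypothetical stand-in (toy_comb, with made-up values and only a few of the 11 columns) so that it runs without the package; with the real data, clean_comb would take its place.

```r
# Hypothetical stand-in with the same layout as clean_comb:
# a named list of data frames, one per date (values are made up).
toy_comb <- list(
  "2020-10-31" = data.frame(
    cow      = c(6020L, 4044L, 6020L),
    bin      = c(1, 1, 2),
    duration = c(84, 270, 102),
    date     = as.Date("2020-10-31")
  ),
  "2020-11-01" = data.frame(
    cow      = c(6020L, 4044L),
    bin      = c(1, 3),
    duration = c(120, 60),
    date     = as.Date("2020-11-01")
  )
)

# Stack both days into a single data frame
all_days <- do.call(rbind, toy_comb)
rownames(all_days) <- NULL

# Total time spent at bins (seconds) per cow per day
daily <- aggregate(duration ~ cow + date, data = all_days, FUN = sum)
daily
```

Summing intake in the same way would mix feed (kg) and water (L) visits, so an intake summary should first be split by bin type.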