Reads, cleans, and time‑adjusts a batch of files, returning a named list of data frames keyed by file date (YYYY‑MM‑DD).
Usage
process_all_feed(
  files,
  col_names = NULL,
  id_col = id_col2(),
  drop_ids = NULL,
  trans_col = trans_col2(),
  start_col = start_col2(),
  end_col = end_col2(),
  drop_trans = NULL,
  bin_col = bin_col2(),
  bins = bins_feed2(),
  select_cols = NULL,
  sep = ",",
  header = FALSE,
  daylight_change_duration = 60,
  tz = tz2(),
  adjust_dst = TRUE
)
Arguments
- files
What are the files you wish to process? This should be a character vector of all the file paths.
- col_names
A character vector of column names to assign when header = FALSE. This vector must match the number of columns in the raw data. If header = TRUE, the file’s existing column names are used and col_names is ignored.
- id_col
What's the name of the column recording animal ID? This should be a single string (default: "cow").
- drop_ids
Which animals do you wish to drop? This should be a vector indicating values in id_col that you wish to remove (default: NULL, so remove nothing).
- trans_col
What's the name of the column recording transponder ID for each visit? This should be a single string (default: "transponder").
- start_col
Name of the column recording the start time of an event (quoted), e.g.: start_col = "start"
- end_col
Name of the column recording the end time of an event (quoted), e.g.: end_col = "end"
- drop_trans
Which transponders do you wish to delete because they are not part of your study? This should be a vector indicating values in trans_col that you wish to remove (default: NULL, so remove nothing).
- bin_col
What's the name of the column recording the ID of the bin for each visit? This should be a single string (default: "bin").
- bins
Which feed bins are included in your study for analysis? This should be a numeric vector of bin IDs to keep. You can supply individual values (e.g. c(1, 3, 5)) or a sequence (e.g. 2:4). The default is 1:30.
- select_cols
Which columns do you wish to keep in the data frame after cleaning? This should be a character vector indicating columns to retain in the final output. Default is NULL, so all columns are kept.
- sep
Field separator; passed to read.table(). Defaults to "," for comma-delimited files like .csv and .DAT.
- header
Logical; does your data file have a header row (i.e., column names)? Defaults to FALSE. If your file contains column names at the top, please set this to TRUE.
- daylight_change_duration
How many minutes does the clock jump or fall back on the day of daylight saving change? This should be an integer for the duration in minutes (default = 60).
- tz
A valid time zone name (default is "America/Vancouver"), used to determine DST rules. Use OlsonNames() to see all valid options.
- adjust_dst
Do you want to apply daylight_saving_adjust() to adjust timestamps for dates affected by Daylight Saving Time changes? This should be logical; the default is TRUE. The adjustment is only applied if adjust_dst is TRUE and tz is set to a time zone in North America.
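The tz and daylight_change_duration arguments can be sanity-checked with base R before a run. The following is a minimal sketch, not part of this package: the helpers is_valid_tz() and offset_min() are hypothetical names introduced here for illustration, and the 2022 switch date is used only as an example.

```r
# Hypothetical helper (not part of this package): TRUE if `tz` is a single
# string found in the system's time zone database.
is_valid_tz <- function(tz) {
  is.character(tz) && length(tz) == 1 && tz %in% OlsonNames()
}

tz <- "America/Vancouver"
is_valid_tz(tz)

# One second before the 2022 spring-forward switch in this zone,
# and one real second later.
before <- as.POSIXct("2022-03-13 01:59:59", tz = tz)
after <- before + 1

# Wall-clock times on either side of the switch.
format(before, "%H:%M:%S %Z")
format(after, "%H:%M:%S %Z")

# Hypothetical helper: UTC offset of a timestamp in minutes.
offset_min <- function(x) {
  z <- as.integer(format(x, "%z"))  # e.g. -800 for "-0800"
  sign(z) * ((abs(z) %/% 100) * 60 + abs(z) %% 100)
}

# Size of the clock jump in minutes; this is the quantity that
# daylight_change_duration describes.
offset_min(after) - offset_min(before)
```

If the computed jump for your time zone is not 60, that is a hint that daylight_change_duration may need a different value.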
Value
A named list of data frames, one per input file, named by date (YYYY‑MM-DD).
Within each data frame, there is a processed date column.
Details
Steps:
1. Validate all inputs (types, file_type ∈ "feed", "water").
2. Extract the date from each filename and parse it via lubridate::ymd().
3. Fetch the Daylight Saving Time (DST) switch table for the relevant years (dst_switch_day()).
4. Loop over each file:
   - Call either process_feeder() or process_water() to:
     - Safely read the CSV / DAT file
     - Rename columns
     - Drop unwanted cows & transponders
     - Keep only specified bins
     - Subset to desired columns
   - Drop any rows with NA.
   - Call daylight_saving_adjust() to adjust timestamps for daylight saving change days.
   - Standardize the columns recording the start and end time of each event to the format "yyyy-mm-dd hh:mm:ss".
   - Store the processed data frame in the output list and name it by the date.
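The date-extraction and naming steps can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the file names are hypothetical and follow the "VR" + YYYYMMDD pattern used in the Examples, and base R's as.Date() stands in for lubridate::ymd() to keep the sketch dependency-free.

```r
# Hypothetical file names (assumed pattern: prefix + YYYYMMDD + extension).
files <- c("data/VR20220101.csv", "data/VR20220102.csv")

# Pull the first run of 8 digits out of each base name.
base <- basename(files)
stamp <- regmatches(base, regexpr("[0-9]{8}", base))

# Parse the digit strings as dates (base-R stand-in for lubridate::ymd()).
dates <- as.Date(stamp, format = "%Y%m%d")

# Pre-allocate the output list and name each slot by its date.
out <- setNames(vector("list", length(files)), format(dates, "%Y-%m-%d"))
names(out)
```

Note that regmatches() silently drops names without an 8-digit run, so a real implementation would need to check that every file yielded a date.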
Examples
tmp <- tempdir()
# create three CSVs in a temporary directory, one per day
files <- file.path(tmp, paste0("VR2022010", 1:3, ".csv"))
for (i in seq_along(files)) {
  write.csv(
    data.frame(
      cow = c("A", "B", "C"),
      transponder = c("X1", "X2", "X3"),
      bin = i + 0:2,
      start = c("01:00:00", "02:00:00", "03:00:00"),
      end = c("01:05:00", "02:06:01", "03:03:00")
    ),
    file = files[i],
    row.names = FALSE
  )
}
# process all three files; the files have a header row, so header = TRUE
res <- process_all_feed(
  files = files,
  bins = 1:10,
  select_cols = c("cow", "bin", "start", "end"),
  sep = ",",
  header = TRUE,
  tz = "America/Vancouver"
)
res
#> $`2022-01-01`
#>   cow bin               start                 end       date
#> 1   A   1 2022-01-01 01:00:00 2022-01-01 01:05:00 2022-01-01
#> 2   B   2 2022-01-01 02:00:00 2022-01-01 02:06:01 2022-01-01
#> 3   C   3 2022-01-01 03:00:00 2022-01-01 03:03:00 2022-01-01
#>
#> $`2022-01-02`
#>   cow bin               start                 end       date
#> 1   A   2 2022-01-02 01:00:00 2022-01-02 01:05:00 2022-01-02
#> 2   B   3 2022-01-02 02:00:00 2022-01-02 02:06:01 2022-01-02
#> 3   C   4 2022-01-02 03:00:00 2022-01-02 03:03:00 2022-01-02
#>
#> $`2022-01-03`
#>   cow bin               start                 end       date
#> 1   A   3 2022-01-03 01:00:00 2022-01-03 01:05:00 2022-01-03
#> 2   B   4 2022-01-03 02:00:00 2022-01-03 02:06:01 2022-01-03
#> 3   C   5 2022-01-03 03:00:00 2022-01-03 03:03:00 2022-01-03
#>
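Because the return value is a plain named list, individual days can be pulled out by date and all days can be stacked with base R. The sketch below uses a small stand-in list, res_demo, whose values are illustrative only and are not output of process_all_feed().

```r
# Stand-in for a date-keyed list of data frames (illustrative values only).
res_demo <- list(
  `2022-01-01` = data.frame(cow = c("A", "B"), bin = c(1, 2)),
  `2022-01-02` = data.frame(cow = c("A", "B"), bin = c(2, 3))
)

# Pull out a single day by its date key.
one_day <- res_demo[["2022-01-01"]]

# Stack every day into one data frame; row names are prefixed by the date key.
all_days <- do.call(rbind, res_demo)
nrow(all_days)
```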