Reads, cleans, and time‑adjusts a batch of files, returning a named list of data frames keyed by file date (YYYY‑MM‑DD).
Usage
process_all_water(
files,
col_names = NULL,
id_col = id_col2(),
drop_ids = NULL,
trans_col = trans_col2(),
start_col = start_col2(),
end_col = end_col2(),
drop_trans = NULL,
bin_col = bin_col2(),
bins = bins_wat2(),
select_cols = NULL,
bin_offset = bin_offset2(),
sep = ",",
header = FALSE,
daylight_change_duration = 60,
tz = tz2(),
adjust_dst = TRUE
)
Arguments
- files
What are the files you wish to process? This should be a character vector of all the file paths.
- col_names
A character vector of column names to assign when header = FALSE. Its length must match the number of columns in the raw data. If header = TRUE, the file's existing column names are used and col_names is ignored.
- id_col
What's the name of the column recording animal ID? This should be a single string (default: "cow").
- drop_ids
Which animals do you wish to drop? This should be a vector of values in id_col that you wish to remove (default: NULL, so nothing is removed).
- trans_col
What's the name of the column recording the transponder ID for each visit? This should be a single string (default: "transponder").
- start_col
Name of the column recording the start time of an event (quoted), e.g. start_col = "start".
- end_col
Name of the column recording the end time of an event (quoted), e.g. end_col = "end".
- drop_trans
Which transponders do you wish to delete because they are not part of your study? This should be a vector of values in trans_col that you wish to remove (default: NULL, so nothing is removed).
- bin_col
What's the name of the column recording the ID of the bin for each visit? This should be a single string (default: "bin").
- bins
Which water bins are included in your study for analysis? This should be a numeric vector of bin IDs to keep. You can supply individual values (e.g. c(1, 3, 5)) or a sequence (e.g. 2:4). Default is 1:5.
- select_cols
Which columns do you wish to keep after cleaning? This should be a character vector naming the columns to retain in the final output. Default is NULL, so all columns are kept.
- bin_offset
A single numeric value to add to each matching bin ID. Default is 100.
- sep
Field separator; passed to read.table(). Defaults to "," for comma-delimited files such as .csv and .DAT.
- header
Logical; does your data file have a header row (i.e., column names)? Defaults to FALSE. If your file contains column names at the top, set this to TRUE.
- daylight_change_duration
How many minutes does the clock jump or fall back on the day of daylight saving change? This should be an integer for the duration in minutes (default = 60).
- tz
A valid time zone name (default is "America/Vancouver"), used to determine DST rules. Use OlsonNames() to see all valid options.
- adjust_dst
Logical; should timestamps on dates affected by a Daylight Saving Time change be adjusted with daylight_saving_adjust()? Default is TRUE. The adjustment is applied only if adjust_dst is TRUE and tz is a North American time zone.
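The cleaning arguments act together on each file. The sketch below is a hypothetical base-R helper, clean_water_sketch(), written only to illustrate how drop_ids, drop_trans, bins, bin_offset and select_cols might combine on one data frame that has already been read in. It is not the package's internal process_water(), and the order of operations is assumed from the steps listed under Details.

```r
# Hypothetical helper (NOT part of the package): a sketch of how the cleaning
# arguments could be applied to one already-read data frame.
clean_water_sketch <- function(df,
                               id_col = "cow", drop_ids = NULL,
                               trans_col = "transponder", drop_trans = NULL,
                               bin_col = "bin", bins = 1:5, bin_offset = 100,
                               select_cols = NULL) {
  # Drop unwanted animals and transponders (NULL means drop nothing)
  if (!is.null(drop_ids))   df <- df[!df[[id_col]] %in% drop_ids, , drop = FALSE]
  if (!is.null(drop_trans)) df <- df[!df[[trans_col]] %in% drop_trans, , drop = FALSE]
  # Keep only the requested bins, then shift the kept bin IDs by bin_offset
  df <- df[df[[bin_col]] %in% bins, , drop = FALSE]
  df[[bin_col]] <- df[[bin_col]] + bin_offset
  # Keep the selected columns (NULL keeps all), then drop rows containing NA
  if (!is.null(select_cols)) df <- df[, select_cols, drop = FALSE]
  df <- df[stats::complete.cases(df), , drop = FALSE]
  rownames(df) <- NULL
  df
}

toy <- data.frame(
  cow = c("A", "B", "C", "D"),
  transponder = c("W1", "W2", "W3", "W4"),
  bin = c(2, 3, 4, 6),
  start = c("06:00:00", "07:00:00", NA, "09:00:00"),
  stringsAsFactors = FALSE
)

# Cow B is dropped by ID, cow D by bin (6 is outside 2:4), cow C by its NA start
out <- clean_water_sketch(toy, drop_ids = "B", bins = 2:4,
                          select_cols = c("cow", "bin", "start"))
out
```

Only cow A survives, with its bin ID shifted from 2 to 102.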
Value
A named list of data frames, one per input file, named by date (YYYY‑MM‑DD).
Each data frame includes a date column holding the processed date.
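Because the result is a plain named list, base R is enough to work with it. The toy list below is hand-built to mimic the shape of the returned value (it is not produced by process_all_water()). As in the example output further down, the empty day lacks a date column, so empty days are filtered out before the remaining days are stacked with rbind(); otherwise rbind() would fail on the mismatched columns.

```r
# Toy stand-in for the returned list: names are dates, each element a data frame.
# The empty day has no date column, mirroring the example output.
res <- list(
  `2023-04-20` = data.frame(cow = c("A", "B"), bin = c(102, 103),
                            date = as.Date("2023-04-20")),
  `2023-04-21` = data.frame(cow = "A", bin = 104,
                            date = as.Date("2023-04-21")),
  `2023-04-22` = data.frame(cow = character(0), bin = numeric(0))
)

names(res)            # the dates, as character strings
res[["2023-04-21"]]   # one day's data frame

# Keep only non-empty days, then stack them into a single data frame
non_empty <- Filter(function(d) nrow(d) > 0, res)
all_days <- do.call(rbind, unname(non_empty))
all_days
```

unname() is used so that rbind() produces plain row names instead of ones prefixed with the list names.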
Details
Steps:
1. Validate all inputs (types, file_type ∈ {"feed", "water"}).
2. Extract the date from each file name and parse it with lubridate::ymd().
3. Fetch the Daylight Saving Time (DST) switch table for the relevant years (dst_switch_day()).
4. Loop over each file:
   - Call either process_feeder() or process_water() to:
     - safely read the CSV / DAT file
     - rename columns
     - drop unwanted cows and transponders
     - keep only the specified bins
     - subset to the desired columns
   - Drop any rows with NA.
   - Call daylight_saving_adjust() to adjust timestamps on daylight saving change days.
   - Standardize the columns recording the start and end time of each event to the format "yyyy-mm-dd hh:mm:ss".
   - Store the processed data frame in the output list, named by its date.
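Steps 2 and 3 together decide which files fall on a DST switch day. The sketch below is an illustration under stated assumptions, not the package's dst_switch_day(). The helpers date_from_file() and dst_switch_days_sketch() are hypothetical, file names are assumed to contain a single 8-digit YYYYMMDD run, base as.Date() stands in for lubridate::ymd(), and switch days are found by checking where the local-noon isdst flag changes from one day to the next.

```r
# Hypothetical helper: pull the first 8-digit run out of each file name and
# parse it as YYYYMMDD (base as.Date() used instead of lubridate::ymd()).
date_from_file <- function(paths) {
  b <- basename(paths)
  m <- regexpr("[0-9]{8}", b)
  digits <- rep(NA_character_, length(b))
  digits[m > 0] <- regmatches(b, m)
  as.Date(digits, format = "%Y%m%d")
}

# Hypothetical helper: a day is a switch day when the DST flag at local noon
# differs from the flag at local noon on the previous day.
dst_switch_days_sketch <- function(years, tz = "America/Vancouver") {
  days <- seq(as.Date(paste0(min(years), "-01-01")),
              as.Date(paste0(max(years), "-12-31")), by = "day")
  noon <- as.POSIXct(paste(days, "12:00:00"), tz = tz)
  isdst <- as.POSIXlt(noon, tz = tz)$isdst
  days[c(FALSE, diff(isdst) != 0)]
}

files <- c("VW20230311.csv", "VW20230312.csv", "VW20231105.csv", "VW20230420.csv")
file_dates <- date_from_file(files)
years <- unique(as.integer(format(file_dates[!is.na(file_dates)], "%Y")))
switch_days <- dst_switch_days_sketch(years)
on_switch_day <- file_dates %in% switch_days
on_switch_day
```

For 2023 in America/Vancouver this flags the files for 2023-03-12 and 2023-11-05; only files flagged this way would need the timestamp adjustment.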
Examples
# 1) create four small water-data CSVs in a temporary directory
tmp <- tempdir()
files <- file.path(tmp, paste0("VW2023042", 0:3, ".csv"))
for (i in seq_along(files)) {
toy <- data.frame(
cow = c("A", "B", "C"),
transponder = c("W1", "W2", "W3"),
bin = i + c(1, 2, 3),
start = c("06:00:00", "07:00:00", "08:00:00"),
end = c("06:05:00", "07:05:00", "08:05:00"),
stringsAsFactors = FALSE
)
write.csv(toy, files[i], row.names = FALSE)
}
# 2) process them in batch, shifting water‐bin IDs by +100
res <- process_all_water(
files = files,
bins = 2:4,
select_cols = c("cow", "bin", "start", "end"),
bin_offset = 100,
sep = ",",
header = TRUE,
tz = "America/Vancouver"
)
res
#> $`2023-04-20`
#> cow bin start end date
#> 1 A 102 2023-04-20 06:00:00 2023-04-20 06:05:00 2023-04-20
#> 2 B 103 2023-04-20 07:00:00 2023-04-20 07:05:00 2023-04-20
#> 3 C 104 2023-04-20 08:00:00 2023-04-20 08:05:00 2023-04-20
#>
#> $`2023-04-21`
#> cow bin start end date
#> 1 A 103 2023-04-21 06:00:00 2023-04-21 06:05:00 2023-04-21
#> 2 B 104 2023-04-21 07:00:00 2023-04-21 07:05:00 2023-04-21
#>
#> $`2023-04-22`
#> cow bin start end date
#> 1 A 104 2023-04-22 06:00:00 2023-04-22 06:05:00 2023-04-22
#>
#> $`2023-04-23`
#> [1] cow bin start end
#> <0 rows> (or 0-length row.names)
#>