The goal of this vignette is to introduce you to clock’s high-level
API, which works directly on R’s built-in date-time types, Date and
POSIXct. For an overview of all of the functionality in the high-level
API, check out the pkgdown reference section, High
Level API. One thing you should immediately notice is that every
function specific to R’s date and date-time types is prefixed with
date_*(). There are also additional functions for arithmetic (add_*())
and getting (get_*()) or setting (set_*()) components that are also
used by other types in clock.
As you’ll quickly see in this vignette, one of the main goals of clock is to guard you, the user, from unexpected issues caused by frustrating date manipulation concepts like invalid dates and daylight saving time. It does this by letting you know as soon as one of these issues happens, giving you the power to handle it explicitly with one of a number of different resolution strategies.
Building
To create a vector of dates, you can use date_build(). This allows you
to specify the components individually.
date_build(2019, 2, 1:5)
#> [1] "2019-02-01" "2019-02-02" "2019-02-03" "2019-02-04" "2019-02-05"
If you happen to specify an invalid date, you’ll get an error message:
date_build(2019, 1:12, 31)
#> Error in `invalid_resolve()`:
#> ! Invalid date found at location 2.
#> ℹ Resolve invalid date issues by specifying the `invalid` argument.
One way to resolve this is by specifying an invalid date resolution
strategy using the invalid argument. There are multiple options, but in
this case we’ll ask for the invalid dates to be set to the previous
valid moment in time.
date_build(2019, 1:12, 31, invalid = "previous")
#> [1] "2019-01-31" "2019-02-28" "2019-03-31" "2019-04-30" "2019-05-31"
#> [6] "2019-06-30" "2019-07-31" "2019-08-31" "2019-09-30" "2019-10-31"
#> [11] "2019-11-30" "2019-12-31"
To learn more about invalid dates, check out the documentation for
invalid_resolve().
If we were actually after the “last day of the month”, an easier way to specify this would have been:
date_build(2019, 1:12, "last")
#> [1] "2019-01-31" "2019-02-28" "2019-03-31" "2019-04-30" "2019-05-31"
#> [6] "2019-06-30" "2019-07-31" "2019-08-31" "2019-09-30" "2019-10-31"
#> [11] "2019-11-30" "2019-12-31"
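The other resolution strategies can be compared side by side on a single invalid date. This is a minimal sketch using February 31st, 2019; the strategy names are the ones accepted by the invalid argument, and the variable names are only illustrative.

```r
library(clock)

# February 31st, 2019 does not exist, so each call needs a strategy.
prev_day <- date_build(2019, 2, 31, invalid = "previous")
next_day <- date_build(2019, 2, 31, invalid = "next")
missing_day <- date_build(2019, 2, 31, invalid = "NA")

prev_day
#> [1] "2019-02-28"
next_day
#> [1] "2019-03-01"
missing_day
#> [1] NA
```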
You can also create date-times using date_time_build(), which generates
a POSIXct. Note that you must supply a time zone!
date_time_build(2019, 1:5, 1, 2, 30, zone = "America/New_York")
#> [1] "2019-01-01 02:30:00 EST" "2019-02-01 02:30:00 EST"
#> [3] "2019-03-01 02:30:00 EST" "2019-04-01 02:30:00 EDT"
#> [5] "2019-05-01 02:30:00 EDT"
If you “build” a time that doesn’t exist, you’ll get an error. For
example, on March 8th, 2020, there was a daylight saving time gap of 1
hour in the America/New_York time zone that took us from 01:59:59
directly to 03:00:00, skipping the 2 o’clock hour entirely. Let’s
“accidentally” create a time in that gap:
date_time_build(2019:2021, 3, 8, 2, 30, zone = "America/New_York")
#> Error in `as_zoned_time()`:
#> ! Nonexistent time due to daylight saving time at location 2.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.
To resolve this issue, we can specify a nonexistent time resolution
strategy through the nonexistent argument. There are a number of
options, including rolling forward or backward to the next or previous
valid moments in time:
zone <- "America/New_York"
date_time_build(2019:2021, 3, 8, 2, 30, zone = zone, nonexistent = "roll-forward")
#> [1] "2019-03-08 02:30:00 EST" "2020-03-08 03:00:00 EDT"
#> [3] "2021-03-08 02:30:00 EST"
date_time_build(2019:2021, 3, 8, 2, 30, zone = zone, nonexistent = "roll-backward")
#> [1] "2019-03-08 02:30:00 EST" "2020-03-08 01:59:59 EST"
#> [3] "2021-03-08 02:30:00 EST"
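Rolling is not the only choice. The sketch below tries two of the other nonexistent strategies on the same gap, shifting forward by the size of the gap and returning a missing value; the variable names are illustrative.

```r
library(clock)

zone <- "America/New_York"

# 02:30 on 2020-03-08 falls inside the daylight saving time gap.
shifted <- date_time_build(2020, 3, 8, 2, 30, zone = zone, nonexistent = "shift-forward")
shifted
#> [1] "2020-03-08 03:30:00 EDT"

# Treat the nonexistent time as missing instead
na_time <- date_time_build(2020, 3, 8, 2, 30, zone = zone, nonexistent = "NA")
is.na(na_time)
#> [1] TRUE
```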
Parsing
Parsing dates
To parse dates, use date_parse(). Parsing dates requires a format
string, a combination of commands that specify where date components
are in your string. By default, it assumes that you’re working with
dates in the form "%Y-%m-%d" (year-month-day).
date_parse("2019-01-05")
#> [1] "2019-01-05"
You can change the format string using format:
date_parse("January 5, 2020", format = "%B %d, %Y")
#> [1] "2020-01-05"
Various different locales are supported for parsing month and weekday names in different languages. To parse a French month:
date_parse(
  "juillet 10, 2021",
  format = "%B %d, %Y",
  locale = clock_locale("fr")
)
#> [1] "2021-07-10"
You can learn about more locale options in the documentation for
clock_locale().
If you have heterogeneous dates, you can supply multiple format strings:
x <- c("2020/1/5", "10-03-05", "2020/2/2")
formats <- c("%Y/%m/%d", "%y-%m-%d")
date_parse(x, format = formats)
#> [1] "2020-01-05" "2010-03-05" "2020-02-02"
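When none of the supplied formats match a string, the result for that element is a missing value. The sketch below assumes that clock signals this with a warning rather than an error, which is why the call is wrapped in suppressWarnings(); the input strings are made up for illustration.

```r
library(clock)

x <- c("2019-01-05", "not a date")

# The second element cannot be parsed with the default format,
# so it comes back as NA (clock is expected to warn about this).
parsed <- suppressWarnings(date_parse(x))
parsed
#> [1] "2019-01-05" NA

# Locations that failed to parse
which(is.na(parsed))
#> [1] 2
```

Checking for NA after parsing is a cheap way to catch strings that need a different format.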
Parsing date-times
You have four options when parsing date-times:
- date_time_parse(): For strings like "2020-01-01 01:02:03" where there is neither a time zone offset nor a full (not abbreviated!) time zone name.
- date_time_parse_complete(): For strings like "2020-01-01T01:02:03-05:00[America/New_York]" where there is both a time zone offset and time zone name present in the string.
- date_time_parse_abbrev(): For strings like "2020-01-01 01:02:03 EST" where there is a time zone abbreviation in the string.
- date_time_parse_RFC_3339(): For strings like "2020-01-01T01:02:03Z" or "2020-01-01T01:02:03-05:00", which are in RFC 3339 format and are intended to be interpreted as UTC.
date_time_parse()
date_time_parse() requires a zone argument, and will ignore any other
zone information in the string (i.e. if you tried to specify %z and
%Z). The default format string is "%Y-%m-%d %H:%M:%S".
date_time_parse("2020-01-01 01:02:03", "America/New_York")
#> [1] "2020-01-01 01:02:03 EST"
If you happen to parse an invalid or ambiguous date-time, you’ll get an error. For example, on November 1st, 2020, there were two 1 o’clock hours in the America/New_York time zone due to a daylight saving time fallback. You can see that if we parse a time right before the fallback, and then shift it forward by 1 second, and then 1 hour and 1 second, respectively:
before <- date_time_parse("2020-11-01 00:59:59", "America/New_York")
# First 1 o'clock
before + 1
#> [1] "2020-11-01 01:00:00 EDT"
# Second 1 o'clock
before + 1 + 3600
#> [1] "2020-11-01 01:00:00 EST"
The following string doesn’t include any information about which of these two 1 o’clocks it belongs to, so it is considered ambiguous. Ambiguous times will error when parsing:
date_time_parse("2020-11-01 01:30:00", "America/New_York")
#> Error in `as_zoned_time()`:
#> ! Ambiguous time due to daylight saving time at location 1.
#> ℹ Resolve ambiguous time issues by specifying the `ambiguous` argument.
To fix that, you can specify an ambiguous time resolution strategy
with the ambiguous argument.
zone <- "America/New_York"
date_time_parse("2020-11-01 01:30:00", zone, ambiguous = "earliest")
#> [1] "2020-11-01 01:30:00 EDT"
date_time_parse("2020-11-01 01:30:00", zone, ambiguous = "latest")
#> [1] "2020-11-01 01:30:00 EST"
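The two resolutions print the same wall-clock time but are different instants. A minimal sketch that makes the one-hour difference visible, and also shows the "NA" strategy for marking ambiguous times as missing (variable names are illustrative):

```r
library(clock)

zone <- "America/New_York"
x <- "2020-11-01 01:30:00"

earliest <- date_time_parse(x, zone, ambiguous = "earliest")
latest <- date_time_parse(x, zone, ambiguous = "latest")

# Same wall-clock time, one hour apart
difftime(latest, earliest, units = "hours")
#> Time difference of 1 hours

# Mark the ambiguous time as missing instead
ambiguous_na <- date_time_parse(x, zone, ambiguous = "NA")
ambiguous_na
#> [1] NA
```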
date_time_parse_complete()
date_time_parse_complete() doesn’t have a zone argument, and doesn’t
require ambiguous or nonexistent arguments, since it assumes that the
string you are providing is completely unambiguous. The only way this
is possible is by having both a time zone offset, specified by %z, and
a full time zone name, specified by %Z, in the string.
The following is an example of an “extended” RFC 3339 format used by
Java 8’s time library to specify complete date-time strings. This is
something that date_time_parse_complete() can parse. The default format
string follows this extended format, and is "%Y-%m-%dT%H:%M:%S%z[%Z]".
x <- "2020-01-01T01:02:03-05:00[America/New_York]"
date_time_parse_complete(x)
#> [1] "2020-01-01 01:02:03 EST"
date_time_parse_abbrev()
date_time_parse_abbrev() is useful when your date-time strings contain
a time zone abbreviation rather than a time zone offset or full time
zone name.
x <- "2020-01-01 01:02:03 EST"
date_time_parse_abbrev(x, "America/New_York")
#> [1] "2020-01-01 01:02:03 EST"
The string is first parsed as a naive time without considering the
abbreviation, and is then converted to a zoned-time using the supplied
zone. If an ambiguous time is parsed, the abbreviation is used to
resolve the ambiguity.
x <- c(
  "1970-10-25 01:30:00 EDT",
  "1970-10-25 01:30:00 EST"
)
date_time_parse_abbrev(x, "America/New_York")
#> [1] "1970-10-25 01:30:00 EDT" "1970-10-25 01:30:00 EST"
You might be wondering why you need to supply zone at all. Isn’t the
abbreviation enough? Unfortunately, multiple countries use the same
time zone abbreviations, even though they have different time zones.
This means that, in many cases, the abbreviation alone is ambiguous.
For example, both India and Israel use IST for their standard times.
x <- "1970-01-01 02:30:30 IST"
# IST = India Standard Time
date_time_parse_abbrev(x, "Asia/Kolkata")
#> [1] "1970-01-01 02:30:30 IST"
# IST = Israel Standard Time
date_time_parse_abbrev(x, "Asia/Jerusalem")
#> [1] "1970-01-01 02:30:30 IST"
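Although the two results print identically, they represent different points in time because the zones have different UTC offsets. A short sketch that compares them directly (the variable names india and israel are illustrative):

```r
library(clock)

x <- "1970-01-01 02:30:30 IST"

india <- date_time_parse_abbrev(x, "Asia/Kolkata")
israel <- date_time_parse_abbrev(x, "Asia/Jerusalem")

# Identical printed output, but different underlying instants
india == israel
#> [1] FALSE
```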
date_time_parse_RFC_3339()
date_time_parse_RFC_3339() is useful when your date-time strings come
from an API, which means they are likely in an ISO 8601 or RFC 3339
format, and should be interpreted as UTC.
The default format string parses the typical RFC 3339 format of
"%Y-%m-%dT%H:%M:%SZ".
x <- "2020-01-01T01:02:03Z"
date_time_parse_RFC_3339(x)
#> [1] "2020-01-01 01:02:03 UTC"
If your date-time strings contain a numeric offset from UTC rather
than a "Z", then you’ll need to set the offset argument to one of the
following:
- "%z" if the offset is of the form "-0500".
- "%Ez" if the offset is of the form "-05:00".
x <- "2020-01-01T01:02:03-0500"
date_time_parse_RFC_3339(x, offset = "%z")
#> [1] "2020-01-01 06:02:03 UTC"
x <- "2020-01-01T01:02:03-05:00"
date_time_parse_RFC_3339(x, offset = "%Ez")
#> [1] "2020-01-01 06:02:03 UTC"
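Because the result is always in UTC, the original offset is no longer visible in the printed value. One way to view the same instant in a local zone is base R's format() with a tz argument, sketched below; this is a base R call rather than part of clock.

```r
library(clock)

x <- date_time_parse_RFC_3339("2020-01-01T01:02:03-05:00", offset = "%Ez")

# Base R: print the same instant as seen in another time zone
local_text <- format(x, tz = "America/New_York", usetz = TRUE)
local_text
#> [1] "2020-01-01 01:02:03 EST"
```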
Grouping, rounding and shifting
When performing time-series related data analysis, you often need to summarize your series at a coarser precision. There are many different ways to do this, and the differences between them are subtle, but meaningful. clock offers three different sets of functions for summarization: grouping, rounding, and shifting.
Grouping
Grouping allows you to summarize a component of a date or date-time within other components. An example of this is grouping by day of the month, which summarizes the day component within the current year-month.
x <- seq(date_build(2019, 1, 20), date_build(2019, 2, 5), by = 1)
x
#> [1] "2019-01-20" "2019-01-21" "2019-01-22" "2019-01-23" "2019-01-24"
#> [6] "2019-01-25" "2019-01-26" "2019-01-27" "2019-01-28" "2019-01-29"
#> [11] "2019-01-30" "2019-01-31" "2019-02-01" "2019-02-02" "2019-02-03"
#> [16] "2019-02-04" "2019-02-05"
# Grouping by 5 days of the current month
date_group(x, "day", n = 5)
#> [1] "2019-01-16" "2019-01-21" "2019-01-21" "2019-01-21" "2019-01-21"
#> [6] "2019-01-21" "2019-01-26" "2019-01-26" "2019-01-26" "2019-01-26"
#> [11] "2019-01-26" "2019-01-31" "2019-02-01" "2019-02-01" "2019-02-01"
#> [16] "2019-02-01" "2019-02-01"
The thing to note about grouping by day of the month is that at the
end of each month, the groups restart. So this created groups for
January of [1, 5], [6, 10], [11, 15], [16, 20], [21, 25], [26, 30], [31].
You can also group by month or year:
date_group(x, "month")
#> [1] "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01"
#> [6] "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01"
#> [11] "2019-01-01" "2019-01-01" "2019-02-01" "2019-02-01" "2019-02-01"
#> [16] "2019-02-01" "2019-02-01"
This also works with date-times, adding the ability to group by hour of the day, minute of the hour, and second of the minute.
x <- seq(
  date_time_build(2019, 1, 1, 1, 55, zone = "UTC"),
  date_time_build(2019, 1, 1, 2, 15, zone = "UTC"),
  by = 120
)
x
#> [1] "2019-01-01 01:55:00 UTC" "2019-01-01 01:57:00 UTC"
#> [3] "2019-01-01 01:59:00 UTC" "2019-01-01 02:01:00 UTC"
#> [5] "2019-01-01 02:03:00 UTC" "2019-01-01 02:05:00 UTC"
#> [7] "2019-01-01 02:07:00 UTC" "2019-01-01 02:09:00 UTC"
#> [9] "2019-01-01 02:11:00 UTC" "2019-01-01 02:13:00 UTC"
#> [11] "2019-01-01 02:15:00 UTC"
date_group(x, "minute", n = 5)
#> [1] "2019-01-01 01:55:00 UTC" "2019-01-01 01:55:00 UTC"
#> [3] "2019-01-01 01:55:00 UTC" "2019-01-01 02:00:00 UTC"
#> [5] "2019-01-01 02:00:00 UTC" "2019-01-01 02:05:00 UTC"
#> [7] "2019-01-01 02:05:00 UTC" "2019-01-01 02:05:00 UTC"
#> [9] "2019-01-01 02:10:00 UTC" "2019-01-01 02:10:00 UTC"
#> [11] "2019-01-01 02:15:00 UTC"
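A grouped vector is typically used as a key for a summary. The sketch below counts how many of the daily dates from earlier fall in each month, using base R's table(); any other aggregation keyed on the grouped dates would follow the same pattern.

```r
library(clock)

x <- seq(date_build(2019, 1, 20), date_build(2019, 2, 5), by = 1)

# Use the month group as a key: 12 dates in January, 5 in February
counts <- table(date_group(x, "month"))
names(counts)
#> [1] "2019-01-01" "2019-02-01"
as.vector(counts)
#> [1] 12  5
```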
Rounding
While grouping is useful for summarizing within a component, rounding is useful for summarizing across components. It is great for summarizing by, say, a rolling set of 60 days.
Rounding operates on the underlying count that makes up your date or date-time. To see what I mean by this, try unclassing a date:
unclass(date_build(2020, 1, 1))
#> [1] 18262
This is a count of days since the origin that R uses, 1970-01-01,
which is considered day 0. If you were to floor by 60 days, this would
bundle [1970-01-01, 1970-03-02), [1970-03-02, 1970-05-01), and so on.
Equivalently, it bundles counts of [0, 60), [60, 120), etc.
x <- seq(date_build(1970, 01, 01), date_build(1970, 05, 10), by = 20)
date_floor(x, "day", n = 60)
#> [1] "1970-01-01" "1970-01-01" "1970-01-01" "1970-03-02" "1970-03-02"
#> [6] "1970-03-02" "1970-05-01"
date_ceiling(x, "day", n = 60)
#> [1] "1970-01-01" "1970-03-02" "1970-03-02" "1970-03-02" "1970-05-01"
#> [6] "1970-05-01" "1970-05-01"
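The "underlying count" view can be checked by hand. This sketch floors the day counts with integer division in base R and compares the result with date_floor(), then shows date_round() on the same input; the variable names days and manual are illustrative.

```r
library(clock)

x <- seq(date_build(1970, 1, 1), date_build(1970, 5, 10), by = 20)

# Floor by hand: integer-divide the day count by 60, then scale back up
days <- as.numeric(x)
manual <- as.Date(60 * (days %/% 60), origin = "1970-01-01")

all(manual == date_floor(x, "day", n = 60))
#> [1] TRUE

# Round to the nearest multiple of 60 days instead
rounded <- date_round(x, "day", n = 60)
rounded
#> [1] "1970-01-01" "1970-01-01" "1970-03-02" "1970-03-02" "1970-03-02"
#> [6] "1970-05-01" "1970-05-01"
```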
If you prefer a different origin, you can supply a Date origin to
date_floor(), which determines what “day 0” is considered to be. This
can be useful for grouping by multiple weeks if you want to control
what is considered the start of the week. Since 1970-01-01 is a
Thursday, flooring by 14 weeks would normally generate all Thursdays:
as_weekday(date_floor(x, "week", n = 14))
#> <weekday[7]>
#> [1] Thu Thu Thu Thu Thu Thu Thu
To change this you can supply an origin on the weekday that you’d like
to be considered the first day of the week.
sunday <- date_build(1970, 01, 04)
date_floor(x, "week", n = 14, origin = sunday)
#> [1] "1969-09-28" "1970-01-04" "1970-01-04" "1970-01-04" "1970-01-04"
#> [6] "1970-01-04" "1970-04-12"
as_weekday(date_floor(x, "week", n = 14, origin = sunday))
#> <weekday[7]>
#> [1] Sun Sun Sun Sun Sun Sun Sun
If you only need to floor by 1 week, it is often easier to use
date_shift(), as seen in the next section.
Shifting
date_shift() allows you to target a weekday, and then shift a vector of
dates forward or backward to the next instance of that target. It
requires using one of the new types in clock, weekday, which is
supplied as the target.
For example, to shift to the next Tuesday:
x <- date_build(2020, 1, 1:2)
# Wednesday / Thursday
as_weekday(x)
#> <weekday[2]>
#> [1] Wed Thu
# `clock_weekdays` is a helper that returns the code corresponding to
# the requested day of the week
clock_weekdays$tuesday
#> [1] 3
tuesday <- weekday(clock_weekdays$tuesday)
tuesday
#> <weekday[1]>
#> [1] Tue
date_shift(x, target = tuesday)
#> [1] "2020-01-07" "2020-01-07"
Shifting to the previous day of the week is a nice way to floor by 1
week. It allows you to control the start of the week in a way that is
slightly easier than using date_floor(origin = ).
x <- seq(date_build(1970, 01, 01), date_build(1970, 01, "last"), by = 3)
date_shift(x, tuesday, which = "previous")
#> [1] "1969-12-30" "1969-12-30" "1970-01-06" "1970-01-06" "1970-01-13"
#> [6] "1970-01-13" "1970-01-13" "1970-01-20" "1970-01-20" "1970-01-27"
#> [11] "1970-01-27"
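Dates that already fall on the target weekday are left alone by default. The sketch below uses the boundary argument of date_shift() to advance such dates to the following instance instead; the example date is chosen because it is already a Tuesday.

```r
library(clock)

tuesday <- weekday(clock_weekdays$tuesday)

# 2020-01-07 is already a Tuesday
x <- date_build(2020, 1, 7)

# Default: dates already on the target are kept
kept <- date_shift(x, tuesday)
kept
#> [1] "2020-01-07"

# Advance to the following Tuesday instead
advanced <- date_shift(x, tuesday, boundary = "advance")
advanced
#> [1] "2020-01-14"
```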
Arithmetic
You can do arithmetic with dates and date-times using the family of
add_*() functions. With dates, you can add years, months, and days.
With date-times, you can additionally add hours, minutes, and seconds.
x <- date_build(2020, 1, 1)
add_years(x, 1:5)
#> [1] "2021-01-01" "2022-01-01" "2023-01-01" "2024-01-01" "2025-01-01"
One of the neat parts about clock is that it requires you to be explicit about how you want to handle invalid dates when doing arithmetic. What is 1 month after January 31st? If you try and create this date, you’ll get an error.
x <- date_build(2020, 1, 31)
add_months(x, 1)
#> Error in `invalid_resolve()`:
#> ! Invalid date found at location 1.
#> ℹ Resolve invalid date issues by specifying the `invalid` argument.
clock gives you the power to handle this through the invalid option:
# The previous valid moment in time
add_months(x, 1, invalid = "previous")
#> [1] "2020-02-29"
# The next valid moment in time
add_months(x, 1, invalid = "next")
#> [1] "2020-03-01"
# Overflow the days. There were 29 days in February, 2020, but we
# specified 31. So this overflows 2 days past day 29.
add_months(x, 1, invalid = "overflow")
#> [1] "2020-03-02"
# If you don't consider it to be a valid date
add_months(x, 1, invalid = "NA")
#> [1] NA
As a teaser, the low level library has a calendar type named
year-month-day that powers this operation. It actually gives you more
flexibility, allowing "2020-02-31" to exist in the wild:
ymd <- as_year_month_day(x) + duration_months(1)
ymd
#> <year_month_day<day>[1]>
#> [1] "2020-02-31"
You can use invalid_resolve(invalid =) to resolve this like you did in
add_months(), or you can let it hang around if you expect other
operations to make it “valid” again.
# Adding 1 more month makes it valid again
ymd + duration_months(1)
#> <year_month_day<day>[1]>
#> [1] "2020-03-31"
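If the invalid date is left hanging around, it can be detected and resolved later. A minimal sketch using invalid_detect() and invalid_resolve() on the year-month-day value, followed by a conversion back to a Date with as_date():

```r
library(clock)

x <- date_build(2020, 1, 31)
ymd <- as_year_month_day(x) + duration_months(1)

# Check whether the calendar value is an invalid date
invalid_detect(ymd)
#> [1] TRUE

# Resolve it, then convert back to a Date
resolved <- as_date(invalid_resolve(ymd, invalid = "previous"))
resolved
#> [1] "2020-02-29"
```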
When working with date-times, you can additionally add hours, minutes, and seconds.
x <- date_time_build(2020, 1, 1, 2, 30, zone = "America/New_York")
x %>%
  add_days(1) %>%
  add_hours(2:5)
#> [1] "2020-01-02 04:30:00 EST" "2020-01-02 05:30:00 EST"
#> [3] "2020-01-02 06:30:00 EST" "2020-01-02 07:30:00 EST"
When adding units of time to a POSIXct, you have to be very careful with daylight saving time issues. clock tries to help you out by letting you know when you run into an issue:
x <- date_time_build(1970, 04, 25, 02, 30, 00, zone = "America/New_York")
x
#> [1] "1970-04-25 02:30:00 EST"
# Daylight saving time gap on the 26th between 01:59:59 -> 03:00:00
x %>% add_days(1)
#> Error in `as_zoned_time()`:
#> ! Nonexistent time due to daylight saving time at location 1.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.
You can solve this using the nonexistent argument to control how these
times should be handled.
# Roll forward to the next valid moment in time
x %>% add_days(1, nonexistent = "roll-forward")
#> [1] "1970-04-26 03:00:00 EDT"
# Roll backward to the previous valid moment in time
x %>% add_days(1, nonexistent = "roll-backward")
#> [1] "1970-04-26 01:59:59 EST"
# Shift forward by adding the size of the DST gap
# (this often keeps the time of day,
# but doesn't guarantee that relative ordering in `x` is maintained
# so I don't recommend it)
x %>% add_days(1, nonexistent = "shift-forward")
#> [1] "1970-04-26 03:30:00 EDT"
# Replace nonexistent times with an NA
x %>% add_days(1, nonexistent = "NA")
#> [1] NA
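Adding 24 hours is not the same operation as adding 1 day. The sketch below assumes that add_hours() on a POSIXct counts elapsed time, so it cannot land inside the gap and needs no nonexistent argument, while add_days() moves the calendar day and keeps the time of day.

```r
library(clock)

x <- date_time_build(1970, 4, 25, 2, 30, 0, zone = "America/New_York")

# 24 elapsed hours later: the wall clock reads 03:30 because of the gap
plus_24h <- add_hours(x, 24)
plus_24h
#> [1] "1970-04-26 03:30:00 EDT"

# 1 calendar day later lands in the gap, so it is rolled forward
plus_1d <- add_days(x, 1, nonexistent = "roll-forward")
plus_1d
#> [1] "1970-04-26 03:00:00 EDT"
```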
Getting and setting
clock provides a family of getters and setters for working with dates and date-times. You can get and set the year, month, or day of a date.
x <- date_build(2019, 5, 6)
get_year(x)
#> [1] 2019
get_month(x)
#> [1] 5
get_day(x)
#> [1] 6
x %>%
  set_day(22) %>%
  set_month(10)
#> [1] "2019-10-22"
As you might expect by now, setting the date to an invalid date requires you to explicitly handle this:
x %>%
  set_day(31) %>%
  set_month(4)
#> Error in `invalid_resolve()`:
#> ! Invalid date found at location 1.
#> ℹ Resolve invalid date issues by specifying the `invalid` argument.
x %>%
  set_day(31) %>%
  set_month(4, invalid = "previous")
#> [1] "2019-04-30"
You can additionally set the hour, minute, and second of a POSIXct.
x <- date_time_build(2020, 1, 2, 3, zone = "America/New_York")
x
#> [1] "2020-01-02 03:00:00 EST"
x %>%
  set_minute(5) %>%
  set_second(10)
#> [1] "2020-01-02 03:05:10 EST"
As with other manipulations of POSIXct, you’ll have to be aware of
daylight saving time when setting components. You may need to supply
the nonexistent or ambiguous arguments of the set_*() functions to
handle these issues.
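As a sketch of what that looks like, the example below sets the hour of two New York date-times so that one lands in the March 2020 gap and the other in the November 2020 fallback. The dates and variable names are illustrative, and the strategy values are the same ones used with date_time_build() and date_time_parse() earlier.

```r
library(clock)

zone <- "America/New_York"

# Setting the hour to 2 on 2020-03-08 lands in the DST gap
spring <- date_time_build(2020, 3, 8, 1, 30, zone = zone)
spring_set <- set_hour(spring, 2, nonexistent = "roll-forward")
spring_set
#> [1] "2020-03-08 03:00:00 EDT"

# Setting the hour to 1 on 2020-11-01 lands in the repeated hour
fall <- date_time_build(2020, 11, 1, 0, 30, zone = zone)
fall_set <- set_hour(fall, 1, ambiguous = "earliest")
fall_set
#> [1] "2020-11-01 01:30:00 EDT"
```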