-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 17350 - Formatting fractions of a second for POSIXt #83
Comments
This is a good and gnarly issue. (And my younger self appears to have been in the bugreport, albeit on a side-topic not fully groking the original issue -- it happens.) The subsequent July 2020 analysis by @bastistician is very good as is @mmaechler's followup. Implementing this in its simplest form, namely trusting > options(digits.secs = 6)
> x <- as.POSIXct("2009-08-03 12:01:59") + 0:3/10
> format(x) # we now get full precision as opposed to the truncation
[1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.099999"
[3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.299999"
> format(x, format = "%Y-%m-%d %H:%M:%OS")
[1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.099999"
[3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.299999"
> Selecting, say, three digits either as the > format(x, digits = 3)
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
> format(x, format = "%Y-%m-%d %H:%M:%OS3")
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299" (Part of me is now wondering of the For completeness, code for that would be as follows commenting out format.POSIXlt <- function(x, format = "", usetz = FALSE,
digits = getOption("digits.secs"), ...)
{
if(!inherits(x, "POSIXlt")) stop("wrong class")
if(any(f0 <- format == "")) {
## need list [ method here.
times <- unlist(unclass(x)[1L:3L])[f0]
#secs <- x$sec[f0]; secs <- secs[is.finite(secs)]
np <- if(is.null(digits)) 0L else min(6L, digits)
#if(np >= 1L) # no unnecessary trailing '0' :
# for (i in seq_len(np)- 1L)
# if(all( abs(secs - round(secs, i)) < 1e-6 )) {
# np <- i
# break
# }
format[f0] <-
if(all(times[is.finite(times)] == 0)) "%Y-%m-%d"
else if(np == 0L) "%Y-%m-%d %H:%M:%S"
else paste0("%Y-%m-%d %H:%M:%OS", np)
}
.Internal(format.POSIXlt(x, format, usetz))
} |
If we added rounding, for example via if (!is.null(digits)) x$sec <- round(x$sec, digits) before the other processing that the default output (i.e. no format specified) is a truncation-free single digit supressing the trailing digits as was desired: [1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.1"
[3] "2009-08-03 12:01:59.2" "2009-08-03 12:01:59.3" and the full-precision format is also rounded: [1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.100000"
[3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.300000" For completeness, code for that would be as follows showing the added line format.POSIXlt <- function(x, format = "", usetz = FALSE,
digits = getOption("digits.secs"), ...)
{
if(!inherits(x, "POSIXlt")) stop("wrong class")
if (!is.null(digits)) x$sec <- round(x$sec, digits)
if(any(f0 <- format == "")) {
## need list [ method here.
times <- unlist(unclass(x)[1L:3L])[f0]
secs <- x$sec[f0]; secs <- secs[is.finite(secs)]
np <- if(is.null(digits)) 0L else min(6L, digits)
if(np >= 1L) # no unnecessary trailing '0' :
for (i in seq_len(np)- 1L)
if(all( abs(secs - round(secs, i)) < 1e-6 )) {
np <- i
break
}
format[f0] <-
if(all(times[is.finite(times)] == 0)) "%Y-%m-%d"
else if(np == 0L) "%Y-%m-%d %H:%M:%S"
else paste0("%Y-%m-%d %H:%M:%OS", np)
}
.Internal(format.POSIXlt(x, format, usetz))
} |
I'm getting different results from you @eddelbuettel. Here with R.Version()$version.string
#> [1] "R version 4.4.2 (2024-10-31)"
R.Version()$platform
#> [1] "aarch64-apple-darwin20"
# chunk 1 from https://github.com/r-devel/r-dev-day/issues/83#issuecomment-2484226381
options(digits.secs = 6)
x <- as.POSIXct("2009-08-03 12:01:59") + 0:3/10
format(x)
#> [1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
#> [4] "2009-08-03 12:01:59.2"
format(x, format = "%Y-%m-%d %H:%M:%OS")
#> [1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.099999"
#> [3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.299999"
# chunk 2 from https://github.com/r-devel/r-dev-day/issues/83#issuecomment-2484226381
format(x, digits = 3)
#> [1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
#> [4] "2009-08-03 12:01:59.2"
format(x, format = "%Y-%m-%d %H:%M:%OS3")
#> [1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
#> [3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
# chunk 2 resetting `digits.secs` first
options(digits.secs = 0)
format(x, digits = 3)
#> [1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
#> [4] "2009-08-03 12:01:59.2"
format(x, format = "%Y-%m-%d %H:%M:%OS3")
#> [1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
#> [3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299" Created on 2024-11-19 with reprex v2.1.1 Can you check and if still getting differences, try to work out why? |
Sure but I applied my draft patch against (by convention) R-devel, not R-release. R-releaseedd@rob:~$ Rscript --vanilla /tmp/r/heather_reprex.R
[1] "R version 4.4.2 (2024-10-31)"
[1] "x86_64-pc-linux-gnu"
[1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
[4] "2009-08-03 12:01:59.2"
[1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.099999"
[3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.299999"
[1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
[4] "2009-08-03 12:01:59.2"
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
[1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
[4] "2009-08-03 12:01:59.2"
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
edd@rob:~$ R-devel with draft patchedd@rob:~$ RDscript --vanilla /tmp/r/heather_reprex.R
[1] "R Under development (unstable) (2024-11-18 r87347)"
[1] "x86_64-pc-linux-gnu"
[1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.099999"
[3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.299999"
[1] "2009-08-03 12:01:59.000000" "2009-08-03 12:01:59.099999"
[3] "2009-08-03 12:01:59.200000" "2009-08-03 12:01:59.299999"
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
[1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.099"
[3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.299"
edd@rob:~$ |
Ah okay, I missed that those results were with your patch. |
No worries. That is actually a nice summary of the change because a) R-release clearly shows how bad the truncation is and b) that the patch helps some and how it have me the idea to also round - but that may be a step too far. I ran the patch through CI at r-svn as well getting a cherished ✔️ there. |
I think your first patch is closer to what we want here. We want to avoid rounding, because as noted by @mmaechler in comment 7 of the bug report, the C code always truncates rather than rounds to avoid bug 14579, i.e. to avoid invalid times like "19:59:60.000". So I think the bit left to fix is the issue of trailing digits. The comment in the code says the intention is to avoid trailing zeros, similar to how the trailing zeros are not shown in the first case here: > print(1.00000, digits = 6)
[1] 1
> print(1.00001, digits = 6)
[1] 1.00001 However, in our example, the code is not working correctly, because it is dropping non-zero digits. I haven't worked it through but I guess this is due to the R code assuming rounding and the C code actually truncating. |
Yes of course. That is the patch I tested in r-svn.
I know you do not have easy access to the patched build but can you help me follow along? Which input produces which wrong output and which expected output where we looking for? The C code in datetime is a little beyond what I have appetite for. So all I may have accomplished here is to illustrate that the suggestion by @mmaechler to maybe just stick with the chosen digits is an improvement. Returning one digit when digits.sevs is six is indeed bad. It can always be overrules by explicit giving at argument, |
I don't think we need to touch the C code. I'll try to come up with some examples in time for the Dev Day session. |
Sounds good. I started hard and long at the |
Okay, the trailing zeros behaviour isn't quite what I expected in R-release. Example 1 y <- as.POSIXlt(c("2009-08-03 12:01:59.0001", "2009-08-03 12:01:59.1001",
"2009-08-03 12:01:59.2001", "2009-08-03 12:01:59.3001"))
print(as.POSIXlt(y)$sec, digits = 6)
#> [1] 59.0001 59.1001 59.2001 59.3001
format(y, format = "%Y-%m-%d %H:%M:%OS3")
#> [1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.100"
#> [3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.300"
options(digits.secs = 3)
format(y)
#> [1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.100"
#> [3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.300" I was expecting at least the second case there to drop the trailing zeros, because print(1.1001, digits = 4)
#> [1] 1.1 and with Example 2 z <- as.POSIXlt(c("2009-08-03 12:01:59.000", "2009-08-03 12:01:59.100",
"2009-08-03 12:01:59.200", "2009-08-03 12:01:59.300"))
print(as.POSIXlt(z)$sec, digits = 6)
#> [1] 59.0 59.1 59.2 59.3
format(z, format = "%Y-%m-%d %H:%M:%OS3")
#> [1] "2009-08-03 12:01:59.000" "2009-08-03 12:01:59.100"
#> [3] "2009-08-03 12:01:59.200" "2009-08-03 12:01:59.300"
options(digits.secs = 3)
format(z)
#> [1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.1" "2009-08-03 12:01:59.2"
#> [4] "2009-08-03 12:01:59.3" Here we have an inconsistency in R-release. I guess that your patch will make these both like the first case. If that's true, we have two options:
|
Example 0 This is what I meant by my earlier comment
options(digits.secs = 7)
x <- as.POSIXct("2009-08-03 12:01:59") + 0:3/10
print(as.POSIXlt(x)$sec, digits = 9)
#> [1] 59.0000000 59.0999999 59.2000000 59.3000000
format(x)
#> [1] "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.0" "2009-08-03 12:01:59.2"
#> [4] "2009-08-03 12:01:59.2" I think the format command should have the same digits as the print in this case. This is for R-release, so I mean the current code to adjust np is not working correctly if the aim is only to drop trailing zeros. |
No this is as coded. You provided a format string which makes |
I do not provide a format string in the second call to format in Example 1. |
I am getting lost here and I fear we are talking past each other.
Yes. That is what I have been saying a few times by now. |
Let's talk it through in the Dev Day session and see if we can get on the same page! |
While CI is running the diff (with the whitespace changes as the indentation seems wrong to me) is: Index: src/library/base/R/datetime.R
===================================================================
--- src/library/base/R/datetime.R (revision 87347)
+++ src/library/base/R/datetime.R (working copy)
@@ -382,21 +382,14 @@
digits = getOption("digits.secs"), ...)
{
if(!inherits(x, "POSIXlt")) stop("wrong class")
- if(any(f0 <- format == "")) {
+ if(any(f0 <- format == "" | grepl("%OS$", format))) {
## need list [ method here.
- times <- unlist(unclass(x)[1L:3L])[f0]
- secs <- x$sec[f0]; secs <- secs[is.finite(secs)]
+ times <- unlist(unclass(x)[1L:3L])[f0]
np <- if(is.null(digits)) 0L else min(6L, digits)
- if(np >= 1L) # no unnecessary trailing '0' :
- for (i in seq_len(np)- 1L)
- if(all( abs(secs - round(secs, i)) < 1e-6 )) {
- np <- i
- break
- }
- format[f0] <-
- if(all(times[is.finite(times)] == 0)) "%Y-%m-%d"
- else if(np == 0L) "%Y-%m-%d %H:%M:%S"
- else paste0("%Y-%m-%d %H:%M:%OS", np)
+ format[f0] <-
+ if(all(times[is.finite(times)] == 0)) "%Y-%m-%d"
+ else if(np == 0L) "%Y-%m-%d %H:%M:%S"
+ else paste0("%Y-%m-%d %H:%M:%OS", np)
}
.Internal(format.POSIXlt(x, format, usetz))
}
Index: src/library/base/man/strptime.Rd
===================================================================
--- src/library/base/man/strptime.Rd (revision 87347)
+++ src/library/base/man/strptime.Rd (working copy)
@@ -39,7 +39,7 @@
methods is
\code{"\%Y-\%m-\%d \%H:\%M:\%S"} if any element has a time
component which is not midnight, and \code{"\%Y-\%m-\%d"}
- otherwise. If \code{\link{options}("digits.secs")} is set, up to
+ otherwise. If \code{\link{options}("digits.secs")} is set,
the specified number of digits will be printed for seconds.}
\item{\dots}{further arguments to be passed from or to other methods.}
\item{usetz}{logical. Should the time zone abbreviation be appended |
GH pull request is r-devel/r-svn#187, second commit at r-devel/r-svn@fa7b99b contains today's refinement, above is the diff we will likely submitt to bugzilla. CI is complete And patch submitted to bugzilla too. |
Patch updated following Martin Maechler's feedback, tested on the PR and now submitted to Bugzilla. |
The original report for Bug 17350 queried whether fractions of a second for a POSIXct were displayed correctly when printed. The discussion started to pin down some issues which I explore further here.
We'll use Sebastian Meyer's (@bastistician's) example code:
As Sebastian notes,
print.POSIXct(x)
callsformat.POSIXct(x)
which is a wrapper aroundformat.POSIXlt()
. If we check the seconds after convertingx
to POSIXct, we see there are lots of digits due to the floating point arithmetic (i.e. adding 0, 0.1, 0.2, 0.3 does not give exactly 59.0, 59.1, 59.2, 59.3).Main Bug: inconsistent formatting of digits for fractional seconds
As documented in
?format.POSIXlt
the default format is "%Y-%m-%d %H:%M:%S" when the global optiondigits.secs
is NULL (or 0). In other words, no fractional seconds are displayed.If we use the "%OS" format, 0:6 digits are used for the fractional seconds, with the number taken from the global option
digits.secs
, e.g.If we have set the global option
digit.sec > 0
and use the default format, however, we only get 1 decimal place:The same happens if the global option is at the default and we set
digits = 3
This is due to a block of code in
format.POSIXlt()
(L385-400) that is only run when the default format is used (format == ""
) and adjusts the number of digits to avoid trailing zeroes.The task here is to discuss how
format.POSIXlt(x)
could be modified to be more consistent and to prepare a corresponding patch.Side issue: digits ignored when "%OS" format used
A side issue here is that when specifying "%OS" format directly, only the global
digit.secs
option has an effect:This is inconsistent with how options in R usually behave, e.g.
and is due to the C code looking up the option to get the digits datetime.c#L986, rather than taking an argument.
So, an extension to this task could be adjusting the R or C code to respect the
digits
argument.Other
Potentially the help file for
format.POSIXlt()
needs to be updated to document the behaviour of the function, e.g. the help for theformat
anddigits
arguments could be reviewed.The text was updated successfully, but these errors were encountered: