R Dates
参考文献:
1. Introduction
Some are date-only classes, some are datetime classes. All classes can handle calendar dates (e.g., March 15, 2010), but not all can represent a datetime (e.g., 11:45 AM on March 1, 2010).
The following classes are included in the base distribution of R:
Date: date-only.- General-purpose class for working with dates
 
POSIXct: datetime.- Internally, the datetime is stored as the number of seconds since January 1, 1970, and so is a very compact representation.
 - This class is recommended for storing datetime information (e.g., in data frames).
 - “ct” stands for “calender time”
 - 我觉得 stands for “compact time” 也说得通
 
POSIXlt: datetime.- Stored in a 9-element list.
 - Easy to extract date parts, such as the month or hour.
 - Obviously, this representation is much less compact than the 
POSIXctclass; hence it is normally used for intermediate processing and not for storing data. - “lt” stands for “local time”
 - “lt” also helps one remember that 
POXIXltobjects are lists 
POSIXlt 举例:
> tm1.lt <- as.POSIXlt(Sys.time())
> tm1.lt
[1] "2015-02-07 23:19:29 CST"
> unlist(tm1.lt) ## unlist: simplifies a list structure to produce a vector which contains all the atomic components
               sec                min               hour               mday 
"29.5144650936127"               "19"               "23"                "7" 
               mon               year               wday               yday 
               "1"              "115"                "6"               "37" 
             isdst               zone             gmtoff 
               "0"              "CST"            "28800" 
> unclass(tm1.lt) ## 效果类似 unlist。这里我们是把 POSIXlt 看做一个 object,然后 unclass 的作用是 returns (a copy of) its fields with its class attribute removed
> ... ## 输出省略
POSIXlt 实际是一个 11-element list,只是前 9 个相对常用一点:
- sec: 0–61; seconds.
 - min: 0–59; minutes.
 - hour: 0–23; hours.
 - mday: 1–31; day of the month
 - mon: 0–11; months after the first of the year.
 - year: years since 1900.
 - wday: 0–6; day of the week, starting on Sunday.
 - yday: 0–365; day of the year.
 - isdst: Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
 - zone: (Optional) The abbreviation for the time zone in force at that time; “” if unknown (but “” might also be used for UTC).
    
- UTC: Coordinated Universal Time
 - CST: China Standard Time, i.e. UTC+08:00
 
 - gmtoff: (Optional) The offset in minutes from GMT; positive values are East of the meridian. Usually NA if unknown, but 0 could mean unknown.
 
The base distribution also provides functions for easily converting between representations: as.Date, as.POSIXct, and as.POSIXlt.
The following packages are available for downloading from CRAN:
chron:- No time zones and daylight savings time.
 - It’s therefore easier to use than 
Datebut less powerful thanPOSIXctandPOSIXlt. - It would be useful for work in econometrics or time series analysis.
 
lubridate:- a wrapper for 
POSIXctwith more intuitive syntax 
- a wrapper for 
 mondate:- This is a specialized package for handling dates in units of months in addition to days and years.
 - Such needs arise in accounting and actuarial work, for example, where month-by-month calculations are needed.
 
timeDate:- This is a high-powered package with well-thought-out facilities for handling dates and times, including date arithmetic, business days, holidays, conversions, and generalized handling of time zones.
 - It was originally part of the Rmetrics software for financial modeling, where precision in dates and times is critical.
 
Which class should you select? The article Date and Time Classes in R by Grothendieck and Petzoldt offers this general advice:
When considering which class to use, always choose the least complex class that will support the application. That is, use
Dateif possible, otherwise usechronand otherwise use the POSIX classes. Such a strategy will greatly reduce the potential for error and increase the reliability of your application.
2. Basic Date Operation
2.1 Getting the Current Date
> Sys.Date()
[1] "2015-02-08"
> class(Sys.Date()) ## Sys.Date() returns a Date object
[1] "Date"
Similarly Sys.time() returns the current date-time:
> Sys.time() ## 注意 Date() 是大写,time() 是小写
[1] "2015-02-08 11:28:02 CST"
> class(Sys.time())
[1] "POSIXct" "POSIXt" 
> unclass(Sys.time()) 
[1] 1423366132 ## 可见本身是个 POSIXct
这里 class(Sys.time()) 返回了两个值要特别说明一下:class(object) returns a character vector giving the names of the classes from which the object inherits.
2.2 Converting a String into a Date
The default format assumed by as.Date is the ISO 8601 standard format of yyyy-mm-dd:
> as.Date("1985-11-30")
[1] "1985-11-30"
Use format="%m/%d/%Y" if the date string is in American style mm/dd/yyyy:
> as.Date("11/30/1985")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
> as.Date("11/30/1985", format="%m/%d/%Y")
[1] "1985-11-30"
?strftime or ?strptime for details about allowed formats.
如果是 datetime 还要折腾 y, m, d 的话,用 lubridate 会更方便。
2.3 Converting a Date into a String
类似 as.Date("11/30/1985", format="%m/%d/%Y"),我们也可以用 format="%m/%d/%Y" 来指定输出的格式,比如:
> format(Sys.Date())
[1] "2015-02-08"
> format(Sys.Date(), format="%m/%d/%Y")
[1] "02/08/2015"
> as.character(Sys.Date())
[1] "2015-02-08"
> as.character(Sys.Date(), format="%m/%d/%Y")
[1] "02/08/2015"
2.4 Converting Year, Month, and Day into a Date
Use the ISOdate function:
> ISOdate(year, month, day)
The result is a POSIXct object that you can convert into a Date object:
> as.Date(ISOdate(year, month, day))
ISOdate can process entire vectors of years, months, and days, which is quite handy for mass conversion of input data.
> years
[1] 2010 2011 2012 2013 2014
> months
[1] 1 1 1 1 1
> days
[1] 15 21 20 18 17
> ISOdate(years, months, days)
[1] "2010-01-15 12:00:00 GMT" "2011-01-21 12:00:00 GMT"
[3] "2012-01-20 12:00:00 GMT" "2013-01-18 12:00:00 GMT"
[5] "2014-01-17 12:00:00 GMT"
Here vector of months is actually redundant and that the last expression can therefore be further simplified by invoking the Recycling Rule:
> as.Date(ISOdate(years, 1, days))
[1] "2010-01-15" "2011-01-21" "2012-01-20" "2013-01-18" "2014-01-17"
This recipe can also be extended to handle year, month, day, hour, minute, and second data by using the ISOdatetime function:
> ISOdatetime(year, month, day, hour, minute, second)
2.5 Getting the Julian Date
Julian date is the number of days since January 1, 1970.
Either convert the Date object to an integer or use the julian function:
> d = as.Date("1985-11-30")
> as.integer(d)
[1] 5812
> julian(d)
[1] 5812
attr(,"origin")
[1] "1970-01-01"
> as.integer(as.Date("1970-01-01"))
[1] 0
> as.integer(as.Date("1970-01-02"))
[1] 1
2.6 Extracting the Parts of a Date
> d <- as.Date("2010-03-15")
> p <- as.POSIXlt(d)
> p$mday 		# Day of the month
[1] 15
> p$mon 		# Month (0 = January)
[1] 2
> p$year + 1900 # Year
[1] 2010
简写形式:
> as.POSIXlt(d)$mday
[1] 15
2.7 Creating a Sequence of Dates
The seq function is a generic function that has a version for Date objects. It can create a Date sequence similarly to the way it creates a sequence of numbers.
> s <- as.Date("2012-01-01")
> e <- as.Date("2012-02-01")
> seq(from=s, to=e, by=1) # One month of dates
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06"
[7] "2012-01-07" "2012-01-08" "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12"
[13] "2012-01-13" "2012-01-14" "2012-01-15" "2012-01-16" "2012-01-17" "2012-01-18"
[19] "2012-01-19" "2012-01-20" "2012-01-21" "2012-01-22" "2012-01-23" "2012-01-24"
[25] "2012-01-25" "2012-01-26" "2012-01-27" "2012-01-28" "2012-01-29" "2012-01-30"
[31] "2012-01-31" "2012-02-01"
Another typical use specifies a starting date (from), increment (by), and number of dates (length.out):
> seq(from=s, by=1, length.out=7) # Dates, one week apart 
[1] "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" 
[7] "2012-01-07"
The increment (by) is flexible and can be specified in days, weeks, months, or years:
> seq(from=s, by="month", length.out=12) # First of the month for one year
[1] "2012-01-01" "2012-02-01" "2012-03-01" "2012-04-01" "2012-05-01" "2012-06-01"
[7] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01" "2012-12-01"
> seq(from=s, by="3 months", length.out=4) # Quarterly dates for one year
[1] "2012-01-01" "2012-04-01" "2012-07-01" "2012-10-01"
> seq(from=s, by="year", length.out=10) # Year-start dates for one decade
[1] "2012-01-01" "2013-01-01" "2014-01-01" "2015-01-01" "2016-01-01" "2017-01-01"
[7] "2018-01-01" "2019-01-01" "2020-01-01" "2021-01-01"
Be careful with by="month" near month-end. In this example, the end of February overflows into March, which is probably not what you wanted:
> seq(as.Date("2010-01-29"), by="month", len=3)
[1] "2010-01-29" "2010-03-01" "2010-03-29"
最后这个 len=3 我要说下。这个 len=3 是 length.out=3 的缩写,缩写成 l=3 也可以;而且 seq 的参数基本都是可以缩写的,比如:
> seq(from=1, to=10)
[1]  1  2  3  4  5  6  7  8  9 10
> seq(f=1, t=10)
[1]  1  2  3  4  5  6  7  8  9 10
我最开始以为 seq(f=1, t=10) 是不是被识别成 seq(1, 10) 然后再根据参数的位置被识别成了 seq(from=1, to=10)。试验证明不是的:
> seq(t=10, f=1)
[1]  1  2  3  4  5  6  7  8  9 10
> seq(10, 1)
[1] 10  9  8  7  6  5  4  3  2  1
不知道是不是所有的 R function 都这样;而且万一两个参数缩写相同时该如何判断?有时间再研究。
2.8 Date Addition and Subtraction
> d1 = as.Date("1985-11-30")
> d1 + 10
[1] "1985-12-10"
> d1 - 10
[1] "1985-11-20"
> d2 = as.Date("1995-02-16") ## 松冈茉优
> d2 - d1
Time difference of 3365 days
> d2 + d1
Error in `+.Date`(d2, d1) : binary + is not defined for "Date" objects
> difftime(d2, d1, units="weeks")
Time difference of 480.7143 weeks
> d3 = as.Date("1997-03-15") ## 黑岛结菜
> diff(c(d1,d2,d3))
Time differences in days
[1] 3365  758
注意,这样得到的 difference 并不能直接拿来计算,具体的数值需要 as.numeric 来转一下,比如:
> d2 - d1
Time difference of 3365 days
> as.numeric(d2 - d1, units = "days")
[1] 3365
> as.numeric(d2 - d1, units = "secs")
[1] 290736000
        
      
留下评论