Smoothing
Summarized from "2. Smoothing."
In the context of nonparametric regression, a smoothing algorithm (a.k.a. a smoother) is a summary of trend in the response $Y$ as a function of predictor variables $X_1, \ldots, X_p$.

We focus on scatterplot smooths, for which $p = 1$.

Essentially, a smooth just finds an estimate of $f$ in the nonparametric regression model $Y = f(X) + \epsilon$.
As a running example for the next several sections, assume we have data generated by adding noise to a known smooth function $f$.
1. Bin Smoothing
Partition the predictor axis into fixed-width, disjoint bins, and estimate $f$ within each bin by the average of the $y$ values falling in it. The resulting smooth is a step function, constant over each bin.
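To make this concrete, here is a minimal numpy sketch of a bin smoother; the sine target, the noise level, and the choice of 10 bins are illustrative assumptions standing in for the original running example.

```python
import numpy as np

def bin_smooth(x, y, n_bins=10):
    """Bin smoother: average y within fixed-width bins of x."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Assign each observation to a bin (rightmost edge inclusive).
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    fitted = np.empty_like(y, dtype=float)
    for j in range(n_bins):
        in_bin = idx == j
        if in_bin.any():
            fitted[in_bin] = y[in_bin].mean()  # step function: constant per bin
    return fitted

# Illustrative data: noise added to a smooth function.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 100)
smooth = bin_smooth(x, y, n_bins=10)
```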
2. Moving Averages
Moving averages use variable bins containing a fixed number of observations, rather than fixed-width bins with a variable number of observations. They tend to wiggle near the center of the data, but flatten out near the boundary of the data.
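A sketch, assuming symmetric windows of $k$ nearest points (by position in the sorted $x$'s), truncated at the edges; the truncation is what produces the flattening near the boundary.

```python
import numpy as np

def moving_average(x, y, k=15):
    """Moving average: mean of y over a window of k nearest neighbours
    (by position in the sorted x's); windows shrink at the boundaries."""
    order = np.argsort(x)
    y_sorted = y[order]
    n, half = len(y), k // 2
    fitted_sorted = np.array([
        y_sorted[max(0, i - half):min(n, i + half + 1)].mean()
        for i in range(n)
    ])
    # Return the fits in the original order of the data.
    fitted = np.empty(n)
    fitted[order] = fitted_sorted
    return fitted
```

With the same illustrative data as above, `moving_average(x, y)` returns a fitted value at each observed $x$.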
3. Running Line
This improves on the moving average by fitting a line rather than an average to the data within a variable-width bin. But it still tends to be rough.
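A minimal sketch of a running-line smoother, under the same nearest-neighbour windowing convention as the moving average above:

```python
import numpy as np

def running_line(x, y, k=15):
    """Running-line smooth: fit OLS within a window of k nearest
    observations and evaluate the fitted line at the window's centre."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n, half = len(x), k // 2
    fitted_sorted = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # Least-squares line on the local window (slope, intercept).
        b1, b0 = np.polyfit(xs[lo:hi], ys[lo:hi], 1)
        fitted_sorted[i] = b0 + b1 * xs[i]
    fitted = np.empty(n)
    fitted[order] = fitted_sorted
    return fitted
```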
4. Loess
Loess extends the running line smooth by using weighted linear regression inside the variable-width bins. Loess is more computationally intensive, but is often satisfactorily smooth and flexible.
LOESS fits the model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

by weighted least squares within each variable-width bin, where the estimates $\hat{\beta}_0, \hat{\beta}_1$ minimize

$$\sum_i w_i(x_0) \, \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2$$

and the weights $w_i(x_0)$ come from the tricube function

$$w_i(x_0) = \left(1 - \left(\frac{|x_i - x_0|}{\max_j |x_j - x_0|}\right)^3\right)^3$$

for the observations in the neighborhood of the target point $x_0$ (the maximum is over that neighborhood, and the weights are zero outside it).
LOESS is a consistent estimator, but may be inefficient at finding relatively simple structures in the data. Although not originally intended for high-dimensional regression, LOESS is often used in that setting anyway.
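A minimal first-degree LOESS sketch follows. It omits the robustness iterations that full LOESS adds; the `span` parameter (fraction of the data per neighbourhood) is an illustrative choice.

```python
import numpy as np

def loess(x, y, span=0.5):
    """First-degree LOESS sketch: weighted linear fit on a
    variable-width neighbourhood, with tricube weights."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))  # points per neighbourhood
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]                        # k nearest neighbours
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3    # tricube weights
        # Weighted least squares for y = b0 + b1 * x on the neighbourhood.
        X = np.column_stack([np.ones(k), x[idx]])
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted
```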
5. Kernel Smoothers
These are much like moving averages except that the average is weighted and the bin-width is fixed. Kernel smoothers work well and are mathematically tractable. The weights in the average depend upon the kernel $K(\cdot)$, typically a symmetric probability density. The bin-width is set by the bandwidth $h$. Let the estimate at a point $x$ be the kernel-weighted average

$$\hat{f}(x) = \frac{\sum_{i=1}^n K\!\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^n K\!\left(\frac{x - x_i}{h}\right)}.$$
For kernel estimation, the Epanechnikov function has good properties. The function is

$$K(u) = \frac{3}{4}\bigl(1 - u^2\bigr) \ \text{ for } |u| \le 1, \qquad K(u) = 0 \ \text{ otherwise.}$$
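A sketch of the kernel-weighted average with the Epanechnikov kernel; the bandwidth value is an illustrative assumption.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 3/4 (1 - u^2) on |u| <= 1."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def kernel_smooth(x, y, h=0.1):
    """Kernel-weighted average with fixed bandwidth h."""
    # Weight matrix: W[i, j] = K((x_i - x_j) / h); each point weights itself.
    W = epanechnikov((x[:, None] - x[None, :]) / h)
    return W @ y / W.sum(axis=1)
```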
6. Splines
If one estimates $f$ by minimizing the penalized residual sum of squares

$$\sum_{i=1}^n \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int \bigl(f''(t)\bigr)^2 \, dt$$

over an appropriate set of functions (e.g., the usual Hilbert space of square-integrable functions), then the solution one obtains is a smoothing spline.
Smoothing splines are piecewise polynomials, and the pieces are divided at the sample values $x_i$. The $x$ values at which the pieces meet are called knots; the fit is constrained to be smooth across them (continuous, usually with continuous first and second derivatives).
Regression splines have fixed knots that need not depend upon the data. But knot selection techniques enable one to find good knots automatically.
Splines are computationally fast, enjoy strong theory, work well, and are widely used.
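As one hedged illustration, SciPy's `make_smoothing_spline` (available in SciPy 1.10 and later) minimizes exactly the penalized criterion above, with `lam` playing the role of $\lambda$; the data here are the same illustrative sine-plus-noise sample.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))  # x must be strictly increasing
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 100)

# lam is the penalty weight lambda in the criterion above:
# larger lam -> smoother fit (shrinks toward a straight line).
spline = make_smoothing_spline(x, y, lam=1e-4)
fitted = spline(x)
```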
7. Comparing Smoothers
Most smoothing methods are approximately kernel smoothers, with parameters that correspond to the kernel $K$ and the bandwidth $h$.
In practice, one can:

- fix $h$ by judgment,
- find the optimal fixed $h$ (a cross-validation sketch follows below),
- fit $h$ adaptively from the data,
- fit the kernel $K$ adaptively from the data.
There is a point of diminishing returns, and it is usually reached when one fits $h$ adaptively.
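One common way to find a good fixed $h$ is cross-validation, named here as a swapped-in technique rather than anything prescribed by the source. Below is a minimal leave-one-out sketch for the Epanechnikov kernel smoother above; the data, the bandwidth grid, and the guard against isolated points are illustrative assumptions, and the shortcut uses the standard leave-one-out identity for linear smoothers.

```python
import numpy as np

def loocv_mse(x, y, h):
    """Leave-one-out CV score for the Epanechnikov kernel smoother,
    via the linear-smoother identity resid_i = (y_i - yhat_i) / (1 - S_ii)."""
    u = (x[:, None] - x[None, :]) / h
    K = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    S = K / K.sum(axis=1, keepdims=True)       # smoother matrix
    denom = np.maximum(1 - np.diag(S), 1e-12)  # guard: isolated points
    resid = (y - S @ y) / denom
    return np.mean(resid ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 100)
grid = np.linspace(0.05, 0.5, 10)
best_h = grid[np.argmin([loocv_mse(x, y, h) for h in grid])]
```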
Silverman (1986; Density Estimation for Statistics and Data Analysis, Chapman & Hall) provides a nice discussion of smoothing issues in the context of density estimation.
Breiman and Peters (1992; International Statistical Review, 60, 271-290) give results from a simulation experiment comparing smoothers. Broadly, they found that:
- adaptive kernel smoothing is good but slow,
- smoothing splines are accurate but too rough in large samples,
- everything else is not really competitive in hard problems.
Theoretical understanding of the properties of smoothing depends upon the eigenstructure of the smoothing matrix. Hastie, Tibshirani, and Friedman (2001; The Elements of Statistical Learning, Springer) provide an introduction and summary of this area in Chapter 5.
A key issue is the tension between how "wiggly" (i.e., flexible, the opposite of smooth) the smooth can be and how many observations one has. You need a lot of data to fit local wiggles. This tradeoff is reflected in the degrees of freedom associated with the smooth.
In linear regression one starts with a quantity of information equal to $n$ degrees of freedom, one per independent observation; each estimated parameter costs one degree of freedom.

Smoothing is a nonlinear constraint and costs more information. But most smoothers can be expressed as a linear operator (matrix) $S$ acting on the response vector $y$, so that the vector of fitted values is $\hat{y} = S y$.
In linear regression, the "smoother" is the hat matrix, the linear operator that acts on the data to produce the estimate:

$$\hat{y} = X\bigl(X^\top X\bigr)^{-1} X^\top y = H y.$$

The matrix $H = X(X^\top X)^{-1} X^\top$ projects $y$ onto the column space of $X$. Note that:

$$\operatorname{tr}(H) = \operatorname{tr}\!\left(X(X^\top X)^{-1} X^\top\right) = \operatorname{tr}\!\left((X^\top X)^{-1} X^\top X\right) = \operatorname{tr}(I_p) = p,$$

since the trace is invariant under cyclic permutation of a matrix product. This means that the degrees of freedom associated with fitting a linear regression (without an intercept) is $p$, the number of estimated coefficients. If one uses an intercept, then this is equivalent to adding a column of ones to $X$, giving $p + 1$ degrees of freedom.
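As a quick numerical check of $\operatorname{tr}(H) = p$, assuming an arbitrary simulated design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))

# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.trace(H))  # ~= 3.0 = p, the degrees of freedom of the fit
```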
Similarly, we can find the smoothing operator for bin smoothing in $\mathbb{R}^1$. In this case the matrix $S$ is block diagonal: the block for a bin containing $m_j$ observations is the $m_j \times m_j$ matrix with every entry equal to $1/m_j$, since each fitted value is the average of the responses in its bin. Clearly, the trace of this matrix is the number of (non-empty) bins, which is therefore the degrees of freedom of the bin smooth.
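And the analogous check for the bin smoother, building $S$ explicitly; the data and the 5-bin choice are illustrative.

```python
import numpy as np

def bin_smoother_matrix(x, n_bins=5):
    """Linear operator S such that S @ y gives the bin-smooth fit."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    n = len(x)
    S = np.zeros((n, n))
    for j in range(n_bins):
        members = np.flatnonzero(idx == j)
        if members.size:
            # Each fitted value is the mean of its bin: entries 1/m_j.
            S[np.ix_(members, members)] = 1.0 / members.size
    return S

x = np.sort(np.random.default_rng(0).uniform(0, 1, 30))
S = bin_smoother_matrix(x, n_bins=5)
print(np.trace(S))  # = 5, the number of non-empty bins
```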
Most smoothers are shrinkage estimators. Mathematically, they pull the weights on the coefficients in the basis expansion for $\hat{f}$ toward zero. Shrinkage is why smoothing has an effective degrees of freedom, $\operatorname{tr}(S)$, lying between that of a simple parametric fit and $n$; it is generally not an integer.
For bin smoothing we can oversmooth or undersmooth. If the bins are too wide, the fit is oversmoothed: low variance but high bias. If the bins are too narrow, the fit is undersmoothed: low bias but high variance.

Smoothing entails a tradeoff between the bias and variance in $\hat{f}$.

Mean squared error is a criterion that captures both aspects. At a point $x$,

$$\operatorname{MSE}\bigl(\hat{f}(x)\bigr) = \operatorname{Var}\bigl(\hat{f}(x)\bigr) + \operatorname{Bias}^2\bigl(\hat{f}(x)\bigr).$$

One wants a smooth that minimizes the MSE across all $x$, e.g., by minimizing its average over the design points (the integrated mean squared error).
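A small simulation makes the tradeoff visible, assuming the illustrative sine target from earlier: wide bins give low variance but high squared bias at a point, narrow bins the reverse.

```python
import numpy as np

# Monte Carlo estimate of bias^2 and variance of a bin smooth at a point
# near x = 0.5, for a wide-bin and a narrow-bin smoother.
rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0, 1, 100)
x0_idx = 50  # evaluation point

for n_bins in (4, 50):
    edges = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    fits = []
    for _ in range(500):
        y = f(x) + rng.normal(0, 0.3, x.size)
        fits.append(y[idx == idx[x0_idx]].mean())  # bin-smooth fit at x0
    fits = np.array(fits)
    bias2 = (fits.mean() - f(x[x0_idx])) ** 2
    print(n_bins, "bins: bias^2 =", round(bias2, 4),
          "var =", round(fits.var(), 4))
```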
To be added (further reading):

- http://www.biostat.jhsph.edu/~ririzarr/Teaching/754/
- https://web.stanford.edu/~hastie/Papers/lsam_annals.pdf
- http://www.stat.umn.edu/geyer/5601/notes/smoo.pdf
- http://www.stat.uchicago.edu/~lafferty/pdf/nonparam.pdf
- http://data.princeton.edu/eco572/smoothing.pdf