GEOS 597e: Spatiotemporal Data Analysis Workshop
Homework 8: SSA: Error, stability, truncation and filtering
Last updated 10/24/06. To be completed prior to class session Weds., Nov. 1st.
Introduction: This week we'll test the sensitivity of our SSA of the
century-long NINO3 SST anomaly time series from the GOSTA dataset to
parameter choices. We'll compare it against a simple model of a time
series with memory (AR(1) red noise), and we'll filter the data to
isolate the robust signal.
- Use the Monte Carlo SSA algorithm described by Ghil et al. (2002)
to compare the eigenvalues of the NINO3 SST time series to a null
hypothesis of AR(1) noise.
- For this exercise we will work only with the interval January 1947
through December 1990, which has no missing values. Select only this
interval for further analysis, and remove the mean from the data. Use
the code you developed for HW7 to perform SSA on this time series with
an embedding dimension of M = 60 months.
- Load a set of 100 realizations of red noise time series having the
same 528-month length as the NINO3 time series and similar AR(1)
characteristics, from here. These synthetic data were produced using
the ARfit package published by Tapio Schneider and Arnold Neumaier.
(Because of the high degree of autocorrelation in the monthly NINO3
data series, this rather sophisticated algorithm, rather than the
simple one we saw in HW0, is necessary to construct reasonably
stationary synthetic AR(1) series.) I also had to adjust the synthetic
time series to account for the skew in the real time series. If you're
interested, code for working with ARfit to do this is here.
- For each of the 100 red noise realizations, construct the trajectory
matrix and the covariance matrix corresponding to that trajectory
matrix. Don't forget to remove the mean of each time series before
calculating the covariance. Use these 'noise' covariance matrices to
calculate the MC-SSA eigenvalues following Ghil et al. (2002), Eq. 17
and the accompanying text. In other words, for each of the 100 AR(1)
noise series, project the NINO3 SSA EOFs onto that series' covariance
matrix C_R. Take the diagonal elements of E^T C_R E as that
realization's 'noise' eigenvalues, where the columns of E are the
NINO3 EOFs. Save your eigenvalues into a 100 x 60 matrix, with one row
per realization and one column per EOF.
- Sort your 100 x 60 'noise'-projected eigenvalues twice: first along
each row, then along each column. Then flip the matrix vertically and
horizontally so that your noise eigenvalues run from largest to
smallest, left to right and top to bottom. Helpful MATLAB functions:
sort, flipud, fliplr, help.
- Plot the 5th and 95th percentile 'noise' eigenvalues vs. eigenvalue
number (use subplot(221)) as a pair of lines. On the same graph, plot
the eigenvalues of the NINO3 SSA as a series of markers. Using the full
set of Monte Carlo results, at approximately what significance level
does each of the first 6 EOFs differ from red noise? Helpful MATLAB
commands/operators: find, <=, >=. Label your plot with a title and axis
labels.
- Plot the EOFs corresponding to the significant eigenvalues (use
subplot(222)).
How would you describe these EOFs? Label your plot with a title
and axis labels.
- Plot the sum of just the RCs corresponding to your significant EOFs
vs. time (use subplot(212)). On the same time axis, plot the original
NINO3 time series. How much of the variance in the NINO3 time series
does this summed RC explain? How similar is this "significant" summed
RC to the sum of the first 6 RCs?
- Print your figure, and be sure to hand in your answers to questions
1f-h.
- Test the sensitivity of your SSA of the NINO3 time series to the
choice of embedding dimension M.
- Thought experiment: What would you expect the eigenvalue trace to
look like for an embedding dimension that is small relative to the
length of the time series? How about one that is large relative to the
length of the time series? What trade-off do you make in opting for a
large vs. a small embedding dimension?
- Choose a wide range of embedding dimensions over which to test
the sensitivity of the results of the NINO3 SSA to choice of M.
Divide your range up into P reasonable choices for M, where P is
about 10. Be sure to include the case M=60.
- For each value of M, recompute the SSA, form the sum of the
first six RCs, and compute the correlation of this summed RC with the
sum of the 6 RCs you formed for the case M=60.
- Make a two-panel plot (using subplot(211) and subplot(212)). In the
top panel, plot the correlation with the sum of the first 6 RCs from
the M=60 case vs. embedding dimension. In the bottom panel, plot the
variance explained by the first six EOFs vs. embedding dimension. How
sensitive is this SSA to the choice of M? Why? How sensitive is the
total variance explained by the first six EOFs to the choice of M? Why?
Do your results agree with your thought experiment (question 2a)?
Label your plots with titles and axis labels, and print the figure.
Be sure to turn in your answers to questions 2a and 2d.
- Please be sure you've handed in a copy of your answers to Prework
8. Please write "Prework 8", the date, and your name on it.
Hand in a copy of the code you wrote to solve HW8 and your
written answers to Problems 1f-h and 2a,d.
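Problem 1 calls for a trajectory matrix and its covariance, once for the NINO3 record and once for each noise series. The sketch below shows one way to build them. It is a minimal pure-Python illustration, not the MATLAB the course uses. It assumes the trajectory-matrix (Broomhead and King) covariance estimate C = X^T X / N', where N' = N - M + 1 is the number of lagged windows. The function names are hypothetical.

```python
def demean(x):
    """Return a copy of x with its sample mean removed."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def trajectory_matrix(x, M):
    """Rows are the N' = N - M + 1 lagged windows x[i:i+M] of length M."""
    n_prime = len(x) - M + 1
    return [list(x[i:i + M]) for i in range(n_prime)]


def lag_covariance(x, M):
    """Assumed Broomhead-King estimate C = X^T X / N' from the demeaned trajectory matrix."""
    X = trajectory_matrix(demean(x), M)
    n_prime = len(X)
    # C[j][k] averages the product of lags j and k over all windows
    return [[sum(row[j] * row[k] for row in X) / n_prime for k in range(M)]
            for j in range(M)]
```

With N = 528 months and M = 60, the trajectory matrix has N' = 469 rows of 60 lagged values each.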
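The MC-SSA step in Problem 1 needs two things: the NINO3 EOFs E, and, for each noise covariance C_R, the diagonal of E^T C_R E. The sketch below provides both. It uses a cyclic Jacobi eigensolver for symmetric matrices as a dependency-free stand-in for MATLAB's eig, plus the projection itself. It assumes EOFs are stored as the columns of E, and the names jacobi_eig and projected_eigenvalues are hypothetical.

```python
import math


def jacobi_eig(A, tol=1e-22, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigvals, E): eigenvalues in decreasing order, and a matrix E whose
    column k is the unit-norm eigenvector (EOF) belonging to eigvals[k].
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]            # working copy, rotated in place
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # accumulated rotations
    norm2 = sum(x * x for row in a for x in row)          # Frobenius norm^2 (rotation-invariant)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * norm2:                            # off-diagonal part negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Standard Jacobi rotation that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                        # columns: a <- a P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                        # rows: a <- P^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                        # eigenvectors: v <- v P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    eigvals = [a[i][i] for i in order]
    E = [[v[r][i] for i in order] for r in range(n)]
    return eigvals, E


def projected_eigenvalues(E, C):
    """Diagonal of E^T C E: the variance of covariance C along each column (EOF) of E."""
    n = len(E)
    out = []
    for k in range(len(E[0])):
        e = [E[r][k] for r in range(n)]
        Ce = [sum(C[i][j] * e[j] for j in range(n)) for i in range(n)]
        out.append(sum(e[i] * Ce[i] for i in range(n)))
    return out
```

In use, E would come from jacobi_eig applied to the NINO3 covariance matrix. Each of the 100 noise covariance matrices then passes through projected_eigenvalues(E, C_R) to supply one row of the 100 x 60 matrix. Because E is fixed by the data, each noise value refers to the same EOF as the data eigenvalue in that column.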
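The double sort and the percentile and significance questions in Problem 1 can be sketched in pure Python as follows. sort_noise_eigenvalues follows the row-then-column sort with both flips. The percentile uses the nearest-rank definition, which is an assumption; interpolating definitions give slightly different values. exceedance_fraction is one simple way to read off an approximate significance level. All three names are hypothetical.

```python
import math


def sort_noise_eigenvalues(noise):
    """Sort a (realizations x EOFs) matrix along each row, then along each column,
    largest first. Same result as fliplr(flipud(sort(sort(A, 2)))) in MATLAB."""
    rows = [sorted(r, reverse=True) for r in noise]
    ncol = len(rows[0])
    cols = [sorted((r[j] for r in rows), reverse=True) for j in range(ncol)]
    return [[cols[j][i] for j in range(ncol)] for i in range(len(rows))]


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the ceil(pct * n / 100)-th smallest value."""
    s = sorted(values)
    rank = max(1, math.ceil(pct * len(s) / 100.0))
    return s[rank - 1]


def exceedance_fraction(data_eig, noise_column):
    """Fraction of noise realizations whose eigenvalue is >= the data eigenvalue."""
    return sum(1 for v in noise_column if v >= data_eig) / len(noise_column)
```

Suppose 3 of the 100 noise values in a column are at least as large as the corresponding NINO3 eigenvalue. Then exceedance_fraction returns 0.03, and that EOF stands out from red noise at roughly the 97% level.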
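To sum RCs, each component must first be reconstructed from its EOF. Your HW7 code presumably already does this. For reference, the sketch below implements the usual diagonal-averaging reconstruction in pure Python. Each time t is averaged over the M_t lagged windows that contain it, and there are fewer such windows near the ends of the record. The function names are hypothetical.

```python
def reconstructed_component(x, e):
    """RC of series x for one EOF e (length M), by diagonal averaging."""
    N, M = len(x), len(e)
    n_prime = N - M + 1
    # Principal component: projection of each lagged window onto the EOF
    pc = [sum(x[i + j] * e[j] for j in range(M)) for i in range(n_prime)]
    rc = []
    for t in range(N):
        total, count = 0.0, 0
        for j in range(M):
            i = t - j                      # window i contains time t at lag j
            if 0 <= i < n_prime:
                total += pc[i] * e[j]
                count += 1
        rc.append(total / count)           # count is M_t, smaller near the ends
    return rc


def summed_rcs(x, E, ks):
    """Sum of the RCs for the EOF indices in ks (EOFs stored as columns of E)."""
    M = len(E)
    total = [0.0] * len(x)
    for k in ks:
        e = [E[r][k] for r in range(M)]
        total = [a + b for a, b in zip(total, reconstructed_component(x, e))]
    return total
```

Summing the RCs for all M EOFs recovers the input series exactly, which is a handy check on an implementation. Summing only the significant ones gives the filtered series asked for in Problem 1.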
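The embedding-dimension test in Problem 2 plots two quantities. One is the correlation between summed RCs; the other is the fraction of variance carried by the first six eigenvalues. A pure-Python sketch of both follows. The helper embedding_dimensions is hypothetical: it spaces about P values evenly over a range you choose and forces M = 60 into the set.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den


def variance_fraction(eigvals, k=6):
    """Fraction of the total variance (eigenvalue sum) carried by the k largest eigenvalues."""
    ranked = sorted(eigvals, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def embedding_dimensions(m_min, m_max, p=10, must_include=60):
    """About p evenly spaced integer M values in [m_min, m_max], plus must_include."""
    step = (m_max - m_min) / (p - 1)
    ms = {round(m_min + i * step) for i in range(p)}
    ms.add(must_include)
    return sorted(ms)
```

For example, embedding_dimensions(12, 264) returns eleven values from 12 to 264 months, including 60. For each M, pearson_r compares that M's summed first-six RC with the M = 60 one, and variance_fraction is applied to that M's eigenvalues.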