GEOS 597e: Spatiotemporal Data Analysis Workshop

Homework 4: Empirical orthogonal function calculation

Last updated 9/21/06.
To be completed prior to class session Weds., Oct. 6th.  Note you have two weeks to work on this rather central homework assignment.


Introduction: In this homework we'll learn and practice the calculation of EOFs,
and we'll calculate the empirical orthogonal functions and principal components of the covariance matrix of GOSTA sea surface temperatures for 1961-1990.
  1. Paper and pencil.  Let F be a 2 x 3 matrix whose rows are station locations and whose columns give observations over time, such that the rows of F are 3 time series measurements  of some variable measured at two locations:
 F =  [ 1 -2 1
           2  -1  -1 ]

Calculate the eigenvalues, eigenvectors, and principal components of the covariance R of the following dataset, F.  Report your eigenvectors normalized to unit length.  Calculate also normalized eigenvalues.  Check that your eofs and principal components are correct by using them to reconstruct F exactly.  Check that your eofs are orthogonal.
  1. Code it.  Open a script (call it hw4_lastname.m).  Calculate the eigenvalues, normalized eigenvectors, and principal components of F using the matlab function svd and confirm that you get the same results as in  Problem 1.   Note that in the special case that the input to the svd calculation is a square symmetric matrix like R, the SVD gives us R=ELET, where E is the matrix whose columns are the eigenvectors of R, and L is the matrix whose diagonal elements are the eigenvalues of R.  Calculate and check the normalized eigenvalues as well. Helpful Matlab functions: length, svd, diag, sum, zeros.
  2. Test it.  Use your script to perform an EOF analysis of the 4 timeseries sea surface temperature matrix ts from HW3. (To retrieve those calculations, you can load your hw3 matlab workfile or rerun your script, using save filename.mat ts ts_covariance, replacing filename and ts_covariance with the appropriate names.  Now you can use  load filename.mat in your HW4 script to access those variables without recalculating them. 
    1. Calculate the eigenvectors, eigenvalues, normalized eigenvalues, and principal components of the covariance of ts.  Important note:  In general, be sure to remove the mean from each time series in ts, as you did to construct the covariance, to create SST anomaly timeseries, prior to calculating the principal components.  Hint: to calculate the principal components, make a copy of the SST dataset, replacing all NaNs in it with zeros.  In this case, when you multiply the EOF matrix by this matrix to get the principal components, you will be adding zeros where you have no data for the inner products; this give an unbiased result, as you removed the mean of each data series (leaving out missing values) over exactly this period as well.  (Make the calculation by hand for a 1x3 eof matrix and a 3x3 data series matrix, with NaNs in it, to see what I mean.)
    2. Verify that you can reconstruct the anomaly series ts from the eigenvalues and eigenvectors of the covariance matrix.
    3. Make a plot to report your results.  Your plot should have 3 panels: eigenvalue amplitude vs. eigenvalue number; eigenvector amplitudes vs. station, and principal component amplitude vs. time, for the first principal component only.  Have Matlab make a color postscript copy of your plot.  (You can use the commands subplot(221), subplot(222), and subplot(212) before each of your plotting commands to put  these three plots all on a single page.)  Helpful Matlab commands: subplot, plot, title, legend, xlabel, ylabel, grid, print.  Print your result to a postscript file by issuing the command: print -dpsc hw4q3c.ps once you have the plot you like.
    4. Notice that the first EOF [ e.g. subplot(222);plot(eof(:,1)) ] loads most strongly (really, loads only) on the first station location.  Explain why the four EOFs you calculated load most strongly on stations 1, 4, 2, and 3, respectively.
    5. Were we able to find strong patterns of common variance among the time series, as did Weare et al. (1976)?  Why isn't this a very useful EOF analysis?
  1. Calculate the global SST covariance matrix.  Use your script to calculate the covariance, eigenvectors, eigenvalues, and principal components of the full field dataset for January 1961- December 1990 from the MOHSST5 dataset.  Find the dataset here.  This is a binary file with specific structure, so use the following code pasted into your script for how to read the data in properly and plot it (there're some quirks).   To run this code, you'll also need the land area "mask" you can download here.  You can also see also the following workspace which results from running ld.m.  Here's a suggested roadmap to help you with your script:
    1. If you work on your own computer and not the CLUE PCs, you'll need to install the m_map plotting routines (including modified m_pcolor.m function) built and distributed by Rich Pawlowicz from the University of British Columbia. Find a zipped copy of this package here and install it to your system in a place matlab knows where to look for it.  (See here for the original source of these routines.)
    2. Before you set your script loose on this calculation (it could take a while), be very sure it is doing what you wanted it to do, for instance by testing it on a small subset of the full matrix.  We may still bring the PCs to their knees.
    3. Find the time series which have at least 25 years of monthly values (this way, covariances are guaranteed to have at least 10 years of overlap; why?).  Be sure to save the indices of these time series, which will allow you later to plot them on the full original spatial grid. Remove the mean from each time series in the ocean-only SST anomaly dataset.
    4. Calculate the covariance.  Don't forget you can roughly halve the number of calculations by exploiting the symmetry of the covariance matrix!  Verify that there are no NaNs in the covariance matrix.
  1. Analyze this.  Perform the eigenvector decomposition of the covariance matrix you just calculated.
    1. Calculate the eigenvalues, EOFs, and principal components of the covariance matrix you just calculated.  Hint: make a copy of the SST dataset with only the time series with at least 25 years of monthly values, replacing all NaNs in it with zeros.  When you multiply the EOF matrix by this matrix to get the principal components, you will be adding zeros where you have no data for the inner products; this give an unbiased result, as you removed the mean of each data series over exactly this period as well.  (Make the calculation by hand for a 1x3 eof matrix and a 3x3 data series matrix, with NaNs in it, to see what I mean.)
    2. As before, verify that you can reconstruct the SST anomaly series from the eigenvalues and eigenvectors of the covariance matrix. 
    3. Make three plots to report your results.  Your first plot is of normalized eigenvalue amplitude vs. eigenvalue number.  Use the command axis to focus on the range [0 10  0 0.2].  The second plot should show principal component amplitudes vs. time, for the first  four principal components only.   Use the Matlab commands subplot(411), subplot(412), ... subplot(414) to make these time series plots. Your third plot should show the first four EOF patterns in space (use Matlab plotting commands orient landscape  and subplot(221), subplot(222), ... subplot (224) to put these plots on the same page.  Use the code for plotting maps iusing the m_map routine m_pcolor.m.  Have Matlab make color postscript copies of your plots using print -dpsc  and call the three figures hw4q5c1.ps, hw4q5c2.ps, hw4q5c3.ps.  Helpful Matlab commands: reshape (see ld.m for examples), squeeze (see ld.m for example), zeros (for creating a blank matrix for which you'll then fill in specific elements with useful numbers).
    4. How much of the variance in the covariance matrix do the first four EOF patterns explain?  Is this a relatively large or small number?  Why is this the case?
    5. Do the first four EOF patterns and their principal components  have large-scale structures?  Why/Why not?  If yes, do the spatial patterns of the EOFs resemble SST anomaly patterns you have seen before, in this week's reading, or in other papers you have seen in the literature?
  1. Save your work.  As in HW2, save your calculations by issuing the following matlab command, replacing lastname with your last name, but saving only key matrices:
    >> save hw4soln_lastname.mat R evals eofs pcs

replacing  R eofs evals pcs with the names of the variables you assigned to the calculation of the SST covariance matrix, eigenvalues, eigenvectors and principal components you calculated in Problem 5.    If necessary, clean up or compress older files not used this week.  But we will be returning to results from this week's exercise, so don't delete anything precious or time-consuming to recreate.
  1. Products.  Please be sure you've handed in a copy of your answers to Prework 4.  Please write "Prework 4" and your name on it. Please hand in your answers to Homework 4 questions 1, 3de, and 5de above.   Please write "Homework 4" and your name on it.  Print a copy of your HW4 script;  call it hw4_lastname.m.  Print your carefully labeled plots for questions 3c and 5c and hand them in your HW4 script.
 

Back to Schedule/Syllabus.