GEOS 597e: Spatiotemporal Data Analysis Workshop

Homework 3: Covariance calculation

Last updated 9/13/06.
To be completed prior to class session Weds., Sept. 20th.  


Introduction: Today we'll review and code the calculation of the covariance of two time series with missing values.  Again, we start with a little matrix algebra practice.  You'll need paper and pencil for Problem 1.  Depending on your comfort programming in Matlab, allow some time to struggle with the programming in this homework.  Don't forget about your handy Matlab Quick Reference from the Navy website, and don't forget the Matlab commands help functionname, which gives you information on the syntax and use of the command functionname, and lookfor keyword, which finds functions with the keyword in their comments and help lines.  I've also included some hints about functions that may come in handy for the various programming problems.
  1. Paper and pencil.  Suppose I have two variables, X and Y, which both vary in time (i.e. X and Y are time series).  I have a realization (loosely speaking, a finite set of observations) of each time series variable, x and y respectively.  x and y are column vectors of length nt.  Given the two column vectors x and y,
 x^T = [-4  -1   1   3   1   6]
 y^T = [-4   2  -1   2   2   5]

calculate the covariance of x and y (don't forget to remove the mean of each vector).  Is the covariance positive, negative, or near zero?  Plot y vs. x, and draw a best-fit regression line by eye through the data points.  Is the slope of the line positive, negative, zero, or undefined?
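As a reminder for the hand calculation, one common way to write the sample covariance is sketched below. The 1/(n_t - 1) normalization is an assumption here; some texts divide by n_t instead, so use whichever convention was given in class.

```latex
\bar{x} = \frac{1}{n_t}\sum_{i=1}^{n_t} x_i , \qquad
\bar{y} = \frac{1}{n_t}\sum_{i=1}^{n_t} y_i , \qquad
c = \frac{1}{n_t - 1}\sum_{i=1}^{n_t} (x_i - \bar{x})(y_i - \bar{y})
```

Written with vectors, if x' and y' denote x and y with their means removed, the sum is just the inner product x'^T y'.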
  2. Code it.  Next, open a script (hw3_lastname.m) and write a line of code to make the same calculation in Matlab: the covariance c of x and y.  Check that your script produces the same result for c as your calculation for Problem 1.  Helpful Matlab functions: the transpose operator ' , length, mean.  (For instance, type help mean to find out how to use the command mean.)  Note that typing x(n) in Matlab, where n is an integer from 1 to the length of x, returns the value in the nth row of x.  For instance, x(4) is 3; x(1) is -4; x(2:4) is the column vector [-1; 1; 3]; and if i = x(2:4), then i = [-1; 1; 3].  These are just examples of how Matlab allows you to keep track of the vector x as an array of numbers.
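The indexing and helper functions mentioned in Problem 2 can be tried out with a short sketch like the one below. The variable names nt, x4, seg, xbar, and xa are illustrative choices, not names required by the assignment, and the sketch stops short of the covariance line itself.

```matlab
% Column vector from Problem 1 (the transpose operator ' turns the row into a column)
x = [-4 -1 1 3 1 6]';

nt   = length(x);   % number of observations in x
x4   = x(4);        % the value in the 4th row of x
seg  = x(2:4);      % rows 2 through 4, returned as a column vector
xbar = mean(x);     % sample mean of x
xa   = x - xbar;    % x with its mean removed (the anomaly)
```

Typing a variable name without a trailing semicolon at the prompt prints its value, which is a quick way to check each step.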
  3. Simple example.  Now load the Matlab workfile hw3data into Matlab (first use "Save as..." from your browser to save it to local disk).
    1. Use your script to calculate the covariance of vector a with vector b.
    2. Use your script to calculate the covariance of  vector c with vector d.
  4. The nature of covariance estimates.  It turns out that vector a is just a small segment of vector c.  Similarly, vector b is just a small segment of vector d.
    1. Why isn't the covariance of a and b equal to the covariance of c and d, and why are both calculations only estimates of the true covariance between the underlying variables?
    2. Why do we only ever have an estimate c of the true covariance C when we are using real observations?
  5. Real data throw curves.  Real data sets, such as the one we will study, have missing values, as do vectors cr, dr, er, and fr.  These missing values are denoted as "NaN" ("NaN" stands for "Not a Number").  NaN*NaN = NaN, and NaN multiplied by a number is also NaN.
    1. Now modify your script to calculate the covariance of two vectors, taking into account that each vector may have missing values.  Helpful Matlab functions: find, isnan, the "not" operator ~ , intersect.  Use your modified script to calculate the covariance of vector cr and vector dr.
    2. Use your modified script to calculate the covariance of vector er and vector fr.
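To see how NaN propagates, and how the hinted functions fit together, here is a minimal sketch on two made-up vectors. The vectors u and v and all other variable names are hypothetical and are not part of hw3data; the sketch only locates the rows where both series have data and does not compute the covariance.

```matlab
% Hypothetical vectors with missing values (NOT from hw3data)
u = [1 NaN 3 4 5]';
v = [2 4 NaN 8 10]';

p        = u .* v;     % element-wise product: NaN wherever either factor is NaN
mean_all = mean(u);    % NaN, because a single missing value contaminates the mean

good_u = find(~isnan(u));           % row indices where u has data
good_v = find(~isnan(v));           % row indices where v has data
both   = intersect(good_u, good_v); % row indices where both have data

mean_u = mean(u(both));  % mean of u over the rows shared with v
```

Whether each mean should be taken over the shared rows only, or over all of that vector's own valid rows, is a design choice worth thinking about; the sketch uses the shared rows.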
  6. Interpret some results.  Let's look at some real time series from the MOHSST5 SST dataset.  These are in the matrix variable ts, which has 360 rows and 4 columns.  You can think of the matrix ts as a set of column vectors, each of which is an SST anomaly time series.  The 360 rows correspond to sequential months, from January 1961 through December 1990.  The 4 columns hold SST anomaly time series (relative to the 1961-1990 climatology) from four different locations in the global ocean.  The locations are: eastern equatorial Pacific (122.5W, 2.5S); Northern Atlantic (37.5W, 57.5N); Western Indian Ocean (37.5E, 7.5N); and the South Atlantic (37.5W, 37.5S).
    1. Modify your script once again to compute the covariance matrix of the four SST anomaly time series.  That is to say, compute the covariances of all possible pairs of the four time series, taking into account that there are missing values in each of the time series:
e.g. Csst = [ c11 c12 c13 c14;
              c21 c22 c23 c24;
              c31 c32 c33 c34;
              c41 c42 c43 c44];

where cnm is the covariance between the time series in column n and in column m of the matrix ts.  As shown above, your matrix Csst will be 4 x 4 in dimension and have no missing values.  It will also be symmetric by construction.  Helpful Matlab function: for ... end.   Once you've calculated the covariance matrix, answer the following questions.
    1. What do the values on the diagonal of Csst represent?  (Recall that the diagonal of the matrix consists of the elements for which m=n.)  How do they compare with each other?  What does this mean for SST variability in the different locations?
    2. Which locations have the largest positive covariance?  What does this mean in terms of simultaneous SST anomaly at these two locations?   
    3. Which locations have the largest negative covariance?  What does this mean in terms of simultaneous SST anomaly at these two locations?
    4. Which locations have covariance closest to zero?  What does this mean in terms of simultaneous SST anomaly at these two locations?
    5. Why wouldn't we get the same results if we calculated the covariances over a different period, say, Jan 1931 - Dec 1960?
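For the covariance-matrix step, the nested for ... end pattern can be sketched as follows. This is one possible approach under stated assumptions, not the required solution: the matrix T is made-up data (not the SST matrix ts), the 1/(N - 1) normalization is an assumption, and a logical mask is used in place of find and intersect to pick out the rows where both columns have data.

```matlab
% Hypothetical 5 x 3 matrix with one time series per column (NOT the SST data)
T = [1   2   NaN;
     2   NaN 1;
     3   6   0;
     4   8   NaN;
     NaN 10  -2];

nc = size(T, 2);     % number of time series (columns)
C  = zeros(nc, nc);  % preallocate the covariance matrix

for n = 1:nc
    for m = n:nc     % upper triangle only; the lower triangle is filled by symmetry
        ok = ~isnan(T(:, n)) & ~isnan(T(:, m));  % rows where both series have data
        a  = T(ok, n) - mean(T(ok, n));          % remove the mean of column n
        b  = T(ok, m) - mean(T(ok, m));          % remove the mean of column m
        C(n, m) = (a' * b) / (sum(ok) - 1);      % assumed 1/(N-1) normalization
        C(m, n) = C(n, m);                       % symmetric by construction
    end
end
```

Note that each pair of columns may share a different number of valid rows, so each entry of C can be based on a different sample size.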
  7. Products.  Please hand in a copy of your answers to Prework 3, with "Prework 3" and your name written on it.  Please also hand in your answers to Homework 3, with "Homework 3" and your name written on it; be sure you've answered all the questions.  Print a copy of your hw3 script and hand it in together with your answers to the homework questions.
