GEOS 597e: Spatiotemporal Data
Analysis Workshop
Homework 3: Covariance calculation
Last updated 9/13/06. To be completed prior to class
session Weds., Sept. 20th.
Introduction: Today
we'll review and code calculation of the covariance of two time series
with missing values. Again, a little matrix algebra practice to
get started. You'll need paper and pencil for Problem 1.
Depending on your comfort programming in Matlab, allow some time
to struggle with the programming in this Homework. Don't forget
about your handy Matlab Quick Reference from the Navy website, and don't
forget the Matlab commands help functionname, which gives you
information on the syntax and use of the command functionname,
and lookfor keyword, which finds functions with the keyword in their
comments and help lines.
I've also included some hints about functions which may come in
handy for various programming problems.
- Paper and pencil.
Suppose I have two variables, X
and Y, which both vary in time
(i.e. X and Y are time series). I have a realization (loosely speaking, a
finite set of observations) of each time series variable, x and y respectively. x and y are column vectors of length nt. Given the two column
vectors x and y,
xT = [-4 -1 1 3 1 6]
yT = [-4 2 -1 2 2 5]
calculate the covariance of x and y (don't forget to remove the mean of
each vector). Is the covariance positive, negative, or near zero?
Plot y vs. x, and draw a best-fit regression line by eye through
the data points. Is the slope of the line positive, negative,
zero, or undefined?
- Code it. Next, open a script (hw3_lastname.m)
and write a line of code to make the same calculation in Matlab: the
covariance c of x and y. Check that your script produces the same result
for c as your calculation for Problem 1. Helpful Matlab functions:
the transpose operator ' , length, mean. (For instance, type help mean
to find out how to use the command mean.)
Note that typing x(n) in Matlab, where n is an integer from 1 to the
length of x, returns the nth row value of x. For instance, x(4) is 3;
x(1) is -4; x(2:4) is the column vector [-1; 1; 3]; if i = x(2:4), then
i = [-1; 1; 3]. These are just examples of how Matlab allows you to
keep track of the vector x as an array of numbers.
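As a minimal sketch of the indexing described above, together with one way to combine the transpose operator, length, and mean into a covariance calculation: the vectors u and v below are hypothetical values chosen for illustration (they are not the Problem 1 data), and dividing by nt - 1 is an assumption. Use whichever normalization was adopted in class.

```matlab
% Indexing into the column vector x from Problem 1.
x = [-4 -1 1 3 1 6]';   % the ' transposes the row into a column
x4 = x(4);              % fourth element: 3
x1 = x(1);              % first element: -4
seg = x(2:4);           % elements 2 through 4, as a column: [-1; 1; 3]

% Covariance of two hypothetical column vectors (not the homework data).
u = [1 2 3 4]';
v = [2 4 6 8]';
nt = length(u);                 % number of time steps
ua = u - mean(u);               % remove the mean of each vector
va = v - mean(v);
c = (ua' * va) / (nt - 1);      % assumed normalization: nt - 1
disp(c)
```

Because ua' is a row vector and va is a column vector, their product is a single number: the sum of the products of the paired anomalies.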
- Simple example. Now load the Matlab workfile hw3data
(use save as... from your browser and save it to local disk) into
Matlab.
- Use your script to calculate the covariance of vector a with vector b.
- Use your script to calculate the covariance of vector c with vector d.
- The nature of covariance
estimates. It turns out that vector a
is just a small segment of vector c.
Similarly, vector b is
just a small segment of vector d.
- Why isn't the covariance of a and b equal to the covariance of c and d, and why are both covariance
calculations estimates of the true covariance C between variables A and B, C and D?
- Why do we always have an estimate c of the true covariance C when we are using real
observations?
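One way to explore these questions numerically is to build two synthetic series whose true covariance is known, then compare the sample covariance of a short segment with that of the full record. This is only a sketch: the series g and h, the lengths nLong and nShort, and the n - 1 normalization are all assumptions made for illustration and have nothing to do with the contents of hw3data.

```matlab
% Two synthetic series whose true covariance is 0.5 by construction
% (hypothetical data, not the hw3data vectors).
nLong = 1000;
nShort = 20;
g = randn(nLong, 1);                 % unit-variance noise
h = 0.5 * g + randn(nLong, 1);       % cov(g, h) = 0.5 * var(g) = 0.5

% Sample covariance over the full record (assumed normalization: n - 1).
covLong = ((g - mean(g))' * (h - mean(h))) / (nLong - 1);

% Sample covariance over a short segment of the same record.
gs = g(1:nShort);
hs = h(1:nShort);
covShort = ((gs - mean(gs))' * (hs - mean(hs))) / (nShort - 1);

fprintf('full record: %6.3f   short segment: %6.3f\n', covLong, covShort);
```

Rerunning the script draws new random numbers, so both printed values change from run to run.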
- Real data throw curves.
Real data sets, such as the one we will study, have missing
values, as do vectors cr, dr, er,
and fr. These missing
values are denoted as "NaN" ("NaN" stands for "Not a Number").
NaN*NaN = NaN, and NaN multiplied by a number is also NaN.
- Now modify your script to calculate the covariance of two
vectors, taking into account that each vector may have missing values.
Helpful Matlab functions: find, isnan, the "is not" operator ~ ,
intersect. Use your modified script to calculate the covariance of
vector cr and vector dr.
- Use your modified script to calculate the covariance of vector er and
vector fr.
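The NaN behavior described above, and one possible way of restricting a covariance calculation to the time steps where both vectors have data, can be sketched as follows. The vectors p and q are hypothetical (they are not cr, dr, er, or fr), and the nValid - 1 normalization is an assumption; this is one approach among several, not a prescribed solution.

```matlab
% NaN propagates through arithmetic and through mean.
t1 = NaN * NaN;          % NaN
t2 = NaN * 3;            % NaN
t3 = mean([1 NaN 3]);    % NaN, because the sum contains a NaN

% Hypothetical column vectors with missing values (not the homework data).
p = [1 NaN 3 4 5]';
q = [2 4 NaN 8 10]';

% Keep only the time steps where both vectors have data.
valid = ~isnan(p) & ~isnan(q);   % logical mask
nValid = sum(valid);
pv = p(valid);
qv = q(valid);

% Covariance over the shared time steps (assumed normalization: nValid - 1).
c = ((pv - mean(pv))' * (qv - mean(qv))) / (nValid - 1);
disp(c)
```

The logical mask selects the same time steps as intersect(find(~isnan(p)), find(~isnan(q))), so either route works with the functions listed in the hints.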
- Interpret some results. Let's
look at some
real time series from the MOHSST5 SST dataset. These are in
the matrix variable ts, which has 360 rows
and 4 columns. You can think of the matrix ts as a set of column
vectors, each of which is an SST anomaly time series. The 360
rows
correspond to sequential months, from January 1961 through December
1990. The 4 columns correspond to time series SST anomaly data
(relative to the period 1961-1990 climatology) from four different
locations in the global ocean. The locations are: eastern
equatorial Pacific (122.5W, 2.5S); Northern Atlantic (37.5W, 57.5N);
Western Indian Ocean (37.5E, 7.5N); and the South Atlantic (37.5W,
37.5S).
- Modify your script once again to compute the covariance
matrix of the four SST anomaly time series. That is to say, compute the
covariances of all possible pairs of the four time series, taking into
account that there are missing values in each of the time series:
e.g. Csst = [ c11 c12 c13 c14;
              c21 c22 c23 c24;
              c31 c32 c33 c34;
              c41 c42 c43 c44];
where cnm is the covariance between the time series in column n and
in column m of the matrix ts.
As shown above, your matrix Csst will be 4 x 4 in dimension and have
no missing values. It will also be symmetric by construction.
Helpful Matlab function: for ... end.
Once you've calculated the covariance matrix, answer the
following questions.
- What do the values on the diagonal of Csst represent?
(Recall that the diagonal elements of the matrix are those for which
m=n.) How do they compare with each other? What does this
mean for SST variability in the different locations?
- Which locations have the largest positive covariance?
What does this mean in terms of simultaneous SST anomaly at these
two locations?
- Which locations have the largest negative
covariance? What does this mean in terms of simultaneous SST
anomaly at these two locations?
- Which locations have covariance closest to
zero? What does this mean in terms of simultaneous SST anomaly at
these two locations?
- Why wouldn't you get the same results if you calculated the
covariances over, say, Jan 1931 - Dec 1960?
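The covariance-matrix calculation described in this problem can be sketched with a pair of nested for ... end loops. The matrix T below is a small hypothetical stand-in for ts (5 rows, 3 columns, with a few NaNs), and the normalization by the number of shared valid time steps minus one is an assumption; treat this as an illustration of the loop structure only.

```matlab
% Hypothetical data matrix: each column is a time series with gaps.
T = [1  2 NaN;
     2  4   1;
     3 NaN  0;
     4  8  -1;
     5 10  -2];

nCols = size(T, 2);
C = zeros(nCols);                 % preallocate the covariance matrix

for n = 1:nCols
    for m = n:nCols               % upper triangle only; mirror below
        % Time steps where both columns have data.
        valid = ~isnan(T(:, n)) & ~isnan(T(:, m));
        a = T(valid, n) - mean(T(valid, n));
        b = T(valid, m) - mean(T(valid, m));
        C(n, m) = (a' * b) / (sum(valid) - 1);   % assumed normalization
        C(m, n) = C(n, m);        % symmetric by construction
    end
end

disp(C)
```

Looping m from n upward and copying each value across the diagonal halves the work and guarantees an exactly symmetric result.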
- Products. Please
hand in a copy of your answers to Prework 3. Please
write "Prework 3" and your name on it. Please hand in your
answers
to Homework 3; be sure you've answered all the questions. Please
write
"Homework 3" and your name on it. Print a copy of your hw3
script and hand it in
together with your answers to the homework questions.