ATMO/GEOS 595c Patterns and Mechanisms of Decadal Climate Variability
Data Exercise 1: Constructing and applying a decadal timescale filter


Exercise Questions
Introduction
Help
Part I.  Filter construction
Part II.  Interpretation

Questions to discuss Monday, Feb 4th, as a result of this data analysis exercise:
Mann et al. (2000) regional SAT reconstructions
 
Introduction

In this exercise we'll construct a simple decadal-timescale filter, and apply it to these data series (here and here, extracted and de-headered from here and here, respectively).  Given your results and reading of the papers assigned for this week and last week, bring your answers to these questions for discussion, Monday, 4 February 2008. 

Help 

Feel free to employ any computational environment you prefer.  I will give some direction on how to do this in matlab, which is currently sitelicensed at the University of Arizona (freeware Octave workalike is here).  A short introduction to operation, nuts and bolts coding, and displaying graphics in matlab from another course I offer is here; a rough set of computing principles is here.  See also the links in this assignment to further explanation of terminology and concepts. 

A good general reference for time series analysis is Chatfield, Analysis of Time Series: An Introduction (6th ed., 2003).

Pascal's triangle, first five rows, from WikipediaPart I.  Construction of a decadal timescale filter. 

There are many ways to filter data for a signal of interest and none of them are perfect.  But some are better than others. 

Consider the filter that we can construct using Pascal's triangle (illustrated at right courtesy of Wikipedia).  Starting with one in the zeroth (topmost; mathematicians like to start numbering with zero) row, each row of Pascal's triangle can be constructed from the sum of the two elements above and to the immediate right and left.  For example, imagine zeroes in the zeroth row, to the left and right of the two ones illustrated.  The leftmost element of the first row is the sum 0+1=1; the central element is the sum 1+1=2, and the rightmost element is the sum 1+0=1.   Notice that consequently each row has  1 more element than the row above from which it is derived.  Notice also that each row is symmetric, so you really only need to derive the first half of each; the second half is the mirror image of the first half.   Row elements are also labeled starting with the zeroth element.  So the row number is given by the first element of each row.  In the illustration above, the last row is labeled row 4.

Each row of the triangle can be interpreted as the unnormalized shape of a time series filter with increasing filter length.  For instance, the second row is a simple 3-point triangle.

Let's construct a decadal scale filter from Pascal's triangle.  The filter length (number of elements in a row of Pascal's triangle) should be about as long as the timescale you wish to emphasize, and longer than the timescales you wish to filter out.  Longer timescales will remain in the filtered version of the data series you develop; band pass filters are for you to investigate (and you should!). 

Pick a half filter length h you think appropriate.  We'll use Pascal's rule to derive the coefficients.  The coefficients are given by

Pascal's rule

where n = length(-h:h) is the number of elements in the filter, and k is the kth element of the filter, for nonnegative n and k=0...n.  The exclamation mark is the factorial abbreviation, with n! = n*(n-1)*(n-2)*(n-3)*...*1.

For example, consider the half filter length h=5.  The filter itself will have elements corresponding to [-5 -4 -3 -2 -1 0 1 2 3 4 5], for a 1+2*h (in this case, 1+2*5=11) element series of coefficients.   Note that by beginning with the filter half-length, we get an odd filter length, which will be centered on and symmetric about the middle element.

The filter elements are then:
You can see how a computer is going to be useful here.  First, let's load up the reconstruction data.  In matlab this is

% load the Mann N. American-average warm season reconstructions
% data source: http://www.ncdc.noaa.gov/paleo/ei/ei_data/namerwarm-recon.dat
% publication url: http://www.ncdc.noaa.gov/paleo/ei/ei_cover.html
load namerwarm_rec.dat
nyr=namerwarm_rec(:,1);
nt=namerwarm_rec(:,2);

% load the Mann N. American-average warm season reconstructions
% data source: http://www.ncdc.noaa.gov/paleo/ei/ei_data/namerwarm-recon.dat
% publication url: http://www.ncdc.noaa.gov/paleo/ei/ei_cover.html
load tropicc_rec.dat
tyr=tropicc_rec(:,1);
tt=tropicc_rec(:,2);

To plot the data as above, here is the matlab code:

subplot(211);plot(nyr,nt,tyr,tt);
axis([1750 2000 -0.4 0.4]);grid;
title('Mann et al. (2000) regional SAT reconstructions')
legend('NAm. warm','tropics cold','Location','Northwest')

In matlab, for calculation of the elements of the filter, the relevant function is nchoosek.  As with any function in matlab, type help nchoosek to figure out how to use this function.

% design a binomial filter with 2*hl+1 elements
h=5;
filt=zeros(1,length(-h:1:h)); % row vector
for i=1:length(-h:h)
  filt(i)=nchoosek(h*2,i-1);
end
[filt']

When you have your raw filter elements, divide each by the sum of the raw filter elements to get normalized filter elements.  You can also think of these coefficients as the weights of a weighted average of any 1+2*h sequential elements of the filtered data series.  (Why?)

% normalize it
filtn=filt./sum(filt);

Now, convolve your filter with the dataseries to be filtered.  Convolution is a mathematical function denoted by an asterisk (not to be confused with multiplication).  Suppose we are convolving a filter with elements [e1 e2 e3 ] with a data series with elements [d1 d2 d3 d4 ].  The convolution can be written as the sum of inverse diagonals of a matrix, which itself is formed as the products of the elements of e and d as follows:


d1 d2 d3 d4
e1 e1d1 e1d2 e1d3 e1d4
e2 e2d1 e2d2 e2d3 e2d4
e3 e3d1 e3d2 e3d3 e3d4

The convolution series q = d*e = e*d is the following series of 6 elements:

q1 = sum of first set of inverse diagonal elements (red products) = e1d1
q1 = sum of second set of inverse diagonal elements (blue products)= e2d1 + e2d2
q1 = sum of third set of inverse diagonal elements (green products)= e3d1 + e2d2 + e1d3
q1 = ...
q1 = ...
q1 = ...

Again you can see how good a computer would be at this - general algorithm is given  in the convolution link above.  In matlab we can use the conv operator.

% convolve with dataseries
ntf=conv(filtn,nt);

The convolution produces a smoothed data series with number of elements length(q) equal to length(e)+length(d)-1.  This is longer than the original data series by the one element less than the length of the filter: at either end of the convolved series we have half this number of elements. These elements are the result of an incomplete convolution, e.g. the convolution of only a portion of the filter with the data series.  This produces a biased filtering result on either end of the data series which should be thrown away.  So we retain just the elements of the convolution from the (1+h)th element from the beginning of the series, through the (1+h)th element from the end of the convolution.
 
% chop convolution ends

ntfc=ntf((1+2*h:(length(ntf)-2*h)));
nyrc=[nyr(1)+h:nyr(length(nyr)-h)];

To see the results of your filtering effort, plot the unfiltered and filtered data series vs. time.   (You can also subtract the filtered version from your original data series -- what does this give you?)  Is there decadal variability in either series?  Does it appear regular, periodic and predictable? Or otherwise?

Part II.  Interpretation.

Compare what you get with the results of a running mean or "boxcar" filter.  This is retrieved by convolving the filter consisting of  1+2*h elements all equal to 1.  Normalize by the length of the filter; this is simply an unweighted average, where our binomial filter was a weighted average. 

How is this result different from the one you got with the binomial filter?  Which looks like a better filtering job?  Why?  Why is this the case?  (Hint: read about Gibbs oscillations here; compute the power spectra of the raw and filtered series, and look at how they differ.)
Calculate the correlation of the two filtered data series: normalize each data series by removing the mean and then dividing through by the standard deviation. Then take the sum of the product of the two series, and divide by number of elements in the sum minus 1:

% calculate filtered version correlation:
d1=(ntfc-mean(ntfc))./std(ntfc);
d2=(ttfc-mean(ttfc))./std(ttfc);
c=d1'*d2./(length(d1)-1);


Are the two series correlated?  If so, why, and what are possible sources of the correlation, discussed by Mann et al. 2000, and Schubert et al. (2004)?  How would you go about separating these influences on the correlation? 

Now go back to the top of this page and prepare your thoughts on the answers to our discussion questions.



back to Syllabus/Schedule.  Last updated 30 Jan 2008.