Last time we downloaded data from quandl.com. This was privately-owned homes completed in a
month in thousands of units(not seasonally adjusted). Now, let's take a look at some basic R
functions to examine time series along with my first exploration of what I feel
is an intriguing package called "changepoint". Please note that this is not a forecasting
effort with much bloodshed determining the proper ARIMA or Holt-Winters model
etc. That I shall leave for another day.

After loading and attaching the data let's begin.

> head(data)

Date Value

1 1/1/1968 1257

2 2/1/1968 1174

3 3/1/1968 1323

4 4/1/1968 1328

5 5/1/1968 1367

6 6/1/1968
118

This goes all the way back to 1968 and the last observation is June of
2013. I want to trim this down. Let's look at the golden era of home
construction, from January, 2000 through 2007.
Now, we need to turn Value into a time series so we can subset by
date. Otherwise, we would need to
identify which observation equals our date and go from there.

> value.ts = ts(Value, frequency=12, start=c(2000,1), end=c(2006,12))
#create subset by date and make it a time series

> plot(value.ts)

Much to my surprise, new home construction had already topped out in
late 2005. I had no idea. Side note:
I still regret not taking advantage of this bubble. The sure sign of the top was when a good
friend of mind sent me an email wanting to know if I was interested in becoming
an investor in oceanfront property in North Carolina, circa 2007. If only R could create a time machine 'eh.

The changepoint package seems to be a simple way to execute a rather
complicated process; the identification of shifts in mean and/or variance in a
time series. This is a lengthy subject
to cover in-depth, so consider this a mere introduction. First of all, why would we want to determine
change in mean and variance for a time series?
Well, my first exposure to changepoint analysis was during Six Sigma
training on control charts. The CUSUM
control chart allows one to identify when a process has undergone a
transformation, leading to a significant shift in the mean. Now, someone with a well-calibrated eyeball
and knowledge of the underlying process can easily point out a shift. Take the time series plot above for instance,
where it is rather obvious when something has changed dramatically. However, what I'm interested in via this
package is the visual capability and the examination of changes in
variance. One of the quaint sayings we
had in Lean Six Sigma was, "customers don't feel the mean, they feel the
variation". I believe if you ask a
statistician that is drowning in a river that is, on average, 3 feet deep they
would agree.

They are a number of changepoint statistical methods and I will show
only results for Pruned Exact Linear Time (PELT) and Binary Segmentation. I will not get into the algorithms behind
them nor shall I list the pros and cons of each as that is your
assignment...should you choose to accept it.
At any rate, I still need to get comfortable with the various methods,
so who am I too lecture anyone on this topic?

Here is the synopsis (in R) of looking for mean and variance
changepoints:

> library(changepoint)

> mvalue = cpt.mean(value.ts, method="PELT") #mean
changepoints using PELT

> cpts(mvalue)

[1] 1
2 3 4 5 6
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 24
25 26

[26] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 42 43 44 45 46 47 48 49
50 51 52

[51] 53 54 55 56 57 58 59 60 61 62 64 65 66 67 68 69 70 71 72 73 74 75
76 77 78

[76] 79 80 81 82 83

The command "cpts" lists the changepoints. All observations are changes, so this is a
complete waste I believe. Time to try
BinSeg.

> mvalue = cpt.mean(value.ts, method="BinSeg")

> cpts(mvalue)

[1] 13 39 48 74 78

> plot(mvalue)

Again, a calibrated eyeball could make perform this task, but can we
learn anything from variance?

> vvalue = cpt.var(diff(value.ts), method="PELT")

> cpts(vvalue)

[1] 16 25 32 38 66 69 79

> plot(vvalue)

> par(mfrow=c(2,1))

> plot(value.ts)

> plot(vvalue)

What have we here? This is not
seasonally adjusted and it looks like we are picking up some of that
seasonality.

After several iterations using the BinSeg method, to no useful outcome,
I went with the method "SegNeigh" (segment neighborhoods)

vnvalue = cpt.var(diff(value.ts), method="SegNeigh", Q=6)
#the Q=6 caps the maximum number of changepoints at 6

> cpts(vnvalue)

[1] 16 25 66 69 79

We have only 5 changepoints according to this method.

> plot(vvalue)

> plot(vnvalue)

Nearly the same outcome. At the
end of the day and given this dataset, I'm not sure changepoint analysis has
added any value. However, I do like the
ability to produce graphics, which may allow an analyst to tell a compelling
story. I also believe it is worth
further examination on other time series.

One more thing...

The base R package allows you to decompose a time series into trend,
seasonality and a random component with "decompose".

> d = decompose(value.ts)

> plot(d)

This plot clearly shows the summer peaks and winter troughs in new
homes.

More to follow...

Is it possible to do changpoint analysis in a rolling fashion? ie, real time?

ReplyDeleteYes it is!

ReplyDeleteWhat do you think about this?

ReplyDeletehttp://stats.stackexchange.com/questions/144763/r-package-changepoint-does-this-make-any-sense