London Weather/Crime (Multivariate)
Introduction
In this example we will study the relationship between weather and crime in in Greater London. In particular we will be considering ten years of monthly data about weather and crime in Greater London from 2008 to 2018.
x = london()
plot(x)
The variables at play will be:
- MaxTemp: Monthly data for the mean daily maximum temperature in C° for each month.
- Violence: Violent crimes comprising; Assault with Injury, Common Assault, Grievous Bodily Harm, Harassment, Murder, Offensive Weapon, Other violence, Wounding/GBH
- Sexual: Rape, Other sexual crimes.
Descriptive Analysis
Seasonality
splot(x[:,["Date","MaxTemp"]], title="Seasonal Maximum Temperature")
splot(x[:,["Date","Violence"]], title="Seasonal Violent Crimes")
splot(x[:,["Date","Sexual"]], title="Seasonal Sexual Crimes")
Sesonality patterns for MaxTemp
are obvious and noticeable for Violence
, however there's barely any to be seen in Sexual
.
Normalization
To compare time series with different positive magnitudes and to facilitate an stationary forecasting we will normalize data by dividing each variable by its maximum value.
tx = copy(x)
tx[!,2:end] = Array(x[:,2:end]) ./ (1. * mapslices(maximum, Array(x[:,2:end]), dims=[1]))
tx = tx[:,["Date","MaxTemp","Violence","Sexual"]]
To achieve an stationary time series to start our analysis we need to differentiate MaxTemp
with a lag of 12, however this might not be enough for Violence
and Sexual
since they also show a marked trend from 2013 onwards.
dtx = d(tx,1,12)
plot(dtx)
An extra differentation might be required for Violence
and Sexual
but this seems to be one to many for MaxTemp
indicated by an increase in its variance when doing so. Also, the apparent need for a differentation in Violence
and Sexual
seems to be a temporary effect rather than a permanent trend or seasonality therefore we will continue with just one differentation of order 12.
Fitting an AR(1) model
Given the scarcity of data for a multivariate model we cannot fit too many parameters while keeping its significance. We will therefore fit an AR of order one.
xar = ar(dtx,1,false)
Multivariate Autoregressive Model ar(X, order=1, constant=false) Residuals Summary ┌──────────────────┬────────────┬────────────┬──────────────┬────────────┬───────────┬───────────┬───────────┐ │ Variable │ Min │ 1Q │ Median │ Mean │ 3Q │ Max │ H0 Normal │ ├──────────────────┼────────────┼────────────┼──────────────┼────────────┼───────────┼───────────┼───────────┤ │ d[1,12]_MaxTemp │ -0.26383 │ -0.0586484 │ -0.000102283 │ 0.00233686 │ 0.0612348 │ 0.205427 │ 0.359532 │ │ d[1,12]_Violence │ -0.0621247 │ -0.0201322 │ -0.000376752 │ 0.00265504 │ 0.0240683 │ 0.0918733 │ 0.282503 │ │ d[1,12]_Sexual │ -0.113873 │ -0.0224348 │ 0.0130658 │ 0.0117103 │ 0.0432341 │ 0.110499 │ 0.561249 │ └──────────────────┴────────────┴────────────┴──────────────┴────────────┴───────────┴───────────┴───────────┘ ┌──────────────────┬────────────┬─────────────┬─────────────┬─────────────┐ │ Variable │ Mean │ Variance │ Skewness │ Kurtosis │ ├──────────────────┼────────────┼─────────────┼─────────────┼─────────────┤ │ d[1,12]_MaxTemp │ 0.00233686 │ 0.00689961 │ 0.00689961 │ 0.00689961 │ │ d[1,12]_Violence │ 0.00265504 │ 0.000875094 │ 0.000875094 │ 0.000875094 │ │ d[1,12]_Sexual │ 0.0117103 │ 0.00224528 │ 0.00224528 │ 0.00224528 │ └──────────────────┴────────────┴─────────────┴─────────────┴─────────────┘ Coefficients Φ0 ┌ ┐ │ 0.0 + │ │ 0.0 + │ │ 0.0 + │ └ ┘ Φ1 ┌ ┐ │ 0.372 *** -0.179 0.149 │ │ -0.131 *** 0.911 *** 0.024 │ │ -0.074 ^ 0.295 ** 0.601 *** │ └ ┘ Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘^’ 0.1 ‘ ’ 1 and ‘+’ if fixed Σ2 Variance/Covariance Matrix ┌ ┐ │ 0.00734058 0.0016834 0.00124417 │ │ 0.0016834 0.000937838 0.000718899 │ │ 0.00124417 0.000718899 0.00253389 │ └ ┘ Information Criteria ┌──────────┬───────────┬─────────┬──────────┐ │ AIC │ AICC │ BIC │ H&Q │ ├──────────┼───────────┼─────────┼──────────┤ │ -3.42655 │ -0.344114 │ 548.682 │ -2.66244 │ └──────────┴───────────┴─────────┴──────────┘ Statistics ┌──────────────────┬─────────────────┬──────────┬──────────┐ │ Variable │ Fisher's p-test │ R2 │ R2adj │ ├──────────────────┼─────────────────┼──────────┼──────────┤ │ d[1,12]_MaxTemp │ 1.12247e-8 │ 0.142054 │ 0.122388 │ │ d[1,12]_Violence │ 0.0 │ 0.77584 │ 0.770702 │ │ d[1,12]_Sexual │ 2.93029e-8 │ 0.407072 │ 0.39348 │ └──────────────────┴─────────────────┴──────────┴──────────┘
The first thing we can do is to use Φ1 to infer directional causality, in this case we see how MaxTemp
is not influenced at all (with statistical significance) by Violence
and Sexual
, which could not possibly be otherwise in this case. On the other hand we have Violence
and Sexual
being affected significantly by MaxTemp
.
The relathionship between Violence
and Sexual
is both ways, however, the effect Sexual
has on Violence
is small and barely significant which might prompt us to consider to remove it altogether, the opposite though is not true, Violence
has a strong and significant influence in Sexual
.
Fixing Coefficients
Let's now fix to 0 the influcence of Violence
and Sexual
on MaxTemp
, remove the influence from MaxTemp
on Sexual
since it is accounted for via Violence
and the non-significant influence of Sexual
on Violence
. If now we fit again the model we have
dϕ1 = xar.Φ
dϕ2 = copy(dϕ1)
dϕ2[1,2:3,1] .= 0.0
dϕ2[3,1,1] = 0.0
dϕ2[2,3,1] = 0.0
dΦ = (dϕ1, dϕ2)
fxar = ar(dtx,1,false; dΦ)
Multivariate Autoregressive Model ar(X, order=1, constant=false) Residuals Summary ┌──────────────────┬────────────┬────────────┬─────────────┬────────────┬───────────┬───────────┬───────────┐ │ Variable │ Min │ 1Q │ Median │ Mean │ 3Q │ Max │ H0 Normal │ ├──────────────────┼────────────┼────────────┼─────────────┼────────────┼───────────┼───────────┼───────────┤ │ d[1,12]_MaxTemp │ -0.258699 │ -0.0586711 │ 0.00278472 │ 0.00460774 │ 0.0593254 │ 0.20366 │ 0.38802 │ │ d[1,12]_Violence │ -0.0620228 │ -0.0195767 │ 0.000967609 │ 0.00322703 │ 0.0232835 │ 0.0921191 │ 0.279685 │ │ d[1,12]_Sexual │ -0.112371 │ -0.0192755 │ 0.0145777 │ 0.012186 │ 0.0461189 │ 0.109148 │ 0.544738 │ └──────────────────┴────────────┴────────────┴─────────────┴────────────┴───────────┴───────────┴───────────┘ ┌──────────────────┬────────────┬─────────────┬─────────────┬─────────────┐ │ Variable │ Mean │ Variance │ Skewness │ Kurtosis │ ├──────────────────┼────────────┼─────────────┼─────────────┼─────────────┤ │ d[1,12]_MaxTemp │ 0.00460774 │ 0.00695722 │ 0.00695722 │ 0.00695722 │ │ d[1,12]_Violence │ 0.00322703 │ 0.000873262 │ 0.000873262 │ 0.000873262 │ │ d[1,12]_Sexual │ 0.012186 │ 0.00227608 │ 0.00227608 │ 0.00227608 │ └──────────────────┴────────────┴─────────────┴─────────────┴─────────────┘ Coefficients Φ0 ┌ ┐ │ 0.0 + │ │ 0.0 + │ │ 0.0 + │ └ ┘ Φ1 ┌ ┐ │ 0.368 *** 0.0 + 0.0 + │ │ -0.131 *** 0.932 *** 0.0 + │ │ 0.0 + 0.279 ** 0.594 *** │ └ ┘ Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘^’ 0.1 ‘ ’ 1 and ‘+’ if fixed Σ2 Variance/Covariance Matrix ┌ ┐ │ 0.00741873 0.0016939 0.00124207 │ │ 0.0016939 0.000939497 0.000719417 │ │ 0.00124207 0.000719417 0.00257882 │ └ ┘ Information Criteria ┌──────────┬──────────┬─────────┬──────────┐ │ AIC │ AICC │ BIC │ H&Q │ ├──────────┼──────────┼─────────┼──────────┤ │ -4.01645 │ -3.50617 │ 376.335 │ -3.59194 │ └──────────┴──────────┴─────────┴──────────┘ Statistics ┌──────────────────┬─────────────────┬──────────┬──────────┐ │ Variable │ Fisher's p-test │ R2 │ R2adj │ ├──────────────────┼─────────────────┼──────────┼──────────┤ │ d[1,12]_MaxTemp │ 5.65306e-9 │ 0.13292 │ 0.123095 │ │ d[1,12]_Violence │ 0.0 │ 0.775443 │ 0.772899 │ │ d[1,12]_Sexual │ 2.62374e-8 │ 0.396559 │ 0.389722 │ └──────────────────┴─────────────────┴──────────┴──────────┘
Transformed Forecasting
Next we can see the two years forecast for the transformed dataset.
dtfc = forecast(fxar,12*2)
plot(dtfc)
In order to obtain a forecast for the original data we need to invert the transformations carried out previously.
Scaled Forecasting
Next we can see the two years forecast for the scaled dataset.
x0 = Array(tx[1:12,2:end])
x0 = reshape(x0',1,3,12)
tfc = p(dtfc, x0)
setnames!(tfc,["MaxTemp / $(maximum(x.MaxTemp))",
"Violence / $(maximum(x.Violence))",
"Sexual / $(maximum(x.Sexual))"])
plot(tfc,title="Greater London Crime/Weather Scaled Forecast")
The two years forecast follows:
Final Forecast
fc = transform(tfc,(v->v*maximum(x.MaxTemp)),1);
fc = transform(fc,(v->round(v*maximum(x.Violence))),2)
fc = transform(fc,(v->round(v*maximum(x.Sexual))),3)
setnames!(fc,["MaxTemp","Violence","Sexual"])
fc.mean
24 rows × 4 columns
Date | MaxTemp | Violence | Sexual | |
---|---|---|---|---|
Date… | Float64 | Float64 | Float64 | |
1 | 2019-01-01 | 10.4366 | 21530.0 | 1699.0 |
2 | 2019-02-01 | 6.97131 | 19412.0 | 1578.0 |
3 | 2019-03-01 | 9.89993 | 22321.0 | 1752.0 |
4 | 2019-04-01 | 15.5368 | 21654.0 | 1651.0 |
5 | 2019-05-01 | 20.8136 | 24424.0 | 1877.0 |
6 | 2019-06-01 | 24.205 | 24076.0 | 1920.0 |
7 | 2019-07-01 | 28.3018 | 25919.0 | 2072.0 |
8 | 2019-08-01 | 24.5007 | 22048.0 | 1702.0 |
9 | 2019-09-01 | 20.9002 | 22048.0 | 1780.0 |
10 | 2019-10-01 | 16.5001 | 22658.0 | 1821.0 |
11 | 2019-11-01 | 12.2 | 22219.0 | 1769.0 |
12 | 2019-12-01 | 10.7 | 22143.0 | 1519.0 |
13 | 2020-01-01 | 10.4366 | 22098.0 | 1736.0 |
14 | 2020-02-01 | 6.97131 | 19942.0 | 1613.0 |
15 | 2020-03-01 | 9.89993 | 22815.0 | 1785.0 |
16 | 2020-04-01 | 15.5368 | 22114.0 | 1682.0 |
17 | 2020-05-01 | 20.8136 | 24852.0 | 1906.0 |
18 | 2020-06-01 | 24.205 | 24475.0 | 1946.0 |
19 | 2020-07-01 | 28.3018 | 26291.0 | 2097.0 |
20 | 2020-08-01 | 24.5007 | 22395.0 | 1725.0 |
21 | 2020-09-01 | 20.9002 | 22372.0 | 1802.0 |
22 | 2020-10-01 | 16.5001 | 22959.0 | 1841.0 |
23 | 2020-11-01 | 12.2 | 22500.0 | 1788.0 |
24 | 2020-12-01 | 10.7 | 22405.0 | 1537.0 |