TSAP : TimeSeries Analysis with Python

(4) Trend & Seasonality

import numpy as np         # version : 1.14.0
import pandas as pd        # version : 0.22.0
import statsmodels         # version : 0.8.0

%matplotlib inline
import matplotlib.pyplot as plt

print(statsmodels.__version__)
print(np.__version__)
print(pd.__version__)

Let’s start with some informal exploration

air_passengers = pd.read_csv("./data/AirPassengers.csv", header = 0, parse_dates = [0], names = ['Month', 'Passengers'], index_col = 0)

air_passengers.head()

	Passengers
Month
1949-01-01	112
1949-02-01	118
1949-03-01	132
1949-04-01	129
1949-05-01	121

parse_dates : boolean / list of [ints|names] / list of lists / dict, default False

boolean. If True -> try parsing the index.

list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

If a column or index contains an unparseable date, the entire column or index is returned unaltered as an object data type.

Note: A fast-path exists for iso8601-formatted dates.
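A small self-contained check of what `parse_dates=[0]` changes, assuming a hypothetical two-column CSV in the same layout as AirPassengers.csv (the header names in the text are made up; only the three values match the rows printed above):

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for ./data/AirPassengers.csv
csv_text = "Month,Count\n1949-01-01,112\n1949-02-01,118\n1949-03-01,132\n"

# With parse_dates=[0] the first column is parsed and becomes a DatetimeIndex
parsed = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=[0],
                     names=['Month', 'Passengers'], index_col=0)

# Without it the index keeps the raw date strings
unparsed = pd.read_csv(io.StringIO(csv_text), header=0,
                       names=['Month', 'Passengers'], index_col=0)

print(type(parsed.index).__name__)  # DatetimeIndex
```

Partial string selection such as `air_passengers.loc['1949']` only works on the parsed version, so the year-by-year selection further down relies on it.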

Visualization

air_passengers.plot(grid=True, figsize=(10, 5))


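# AirPassengers, one curve per year: the spread (variance) keeps growing over time
plt.figure(figsize=(10, 5))
plt.grid(True)
for year in range(1949, 1961):
    # .loc with a year string selects the 12 months of that year (needs a DatetimeIndex)
    plt.plot(range(1, 13), air_passengers.loc[str(year), 'Passengers'], label=str(year))
plt.legend()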

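The twelve per-year curves can also be laid out as a month-by-year table, which makes the growing spread easy to inspect numerically. A minimal sketch on synthetic monthly data (not the real AirPassengers values), assuming a DatetimeIndex like the one built by `read_csv` above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for air_passengers: 3 years of monthly values
idx = pd.date_range('1949-01-01', periods=36, freq='MS')
t = np.arange(36)
values = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12)
df = pd.DataFrame({'Passengers': values}, index=idx)

# One row per calendar month, one column per year
table = (df.assign(year=df.index.year, month=df.index.month)
           .pivot(index='month', columns='year', values='Passengers'))

print(table.shape)  # (12, 3)
```

Calling `table.plot(grid=True, figsize=(10, 5))` on such a table draws one line per year, much like the loop above.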

Getting a little more formal with time series analysis

mean
variance
autocovariance
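For a series x_1, ..., x_n with sample mean x̄, the usual (biased, 1/n) sample autocovariance at lag k is gamma(k) = (1/n) * sum_{t=1}^{n-k} (x_t - x̄)(x_{t+k} - x̄), and gamma(0) is the variance. A minimal numpy sketch of these three statistics (the function name is illustrative, not a library API):

```python
import numpy as np


def sample_autocovariance(x, k):
    """Sample autocovariance at lag k (0 <= k < len(x)), with 1/n normalisation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()                     # deviations from the mean
    return np.sum(d[:n - k] * d[k:]) / n


x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # toy data
print(x.mean())                          # 3.0
print(sample_autocovariance(x, 0))       # 2.0, the same as np.var(x)
```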

plot the moving average


air_passengers.rolling(window=12).mean().plot(grid=True, figsize=(10, 5), title='12-month moving average')

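`rolling(window=12).mean()` is a trailing average: each value is the mean of the current month and the 11 before it, so the first 11 positions are NaN. A toy check:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1.0, 25.0))   # toy data: 1, 2, ..., 24
ma12 = s.rolling(window=12).mean()

print(ma12.iloc[11])  # 6.5  (mean of 1..12)
print(ma12.iloc[23])  # 18.5 (mean of 13..24)
```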

Variance over time

air_passengers.resample('1Y').var().plot(figsize=(10,5), grid=True)

The plot shows the annual variance, i.e. the variance of Passengers within each calendar year.
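Grouping by calendar year gives the same per-year sample variance (pandas' `var()` uses ddof=1 by default) without relying on a resample frequency alias, whose spelling has changed across pandas versions. A sketch on toy data:

```python
import numpy as np
import pandas as pd

# Toy data: a flat year followed by a year that climbs 0, 1, ..., 11
idx = pd.date_range('1949-01-01', periods=24, freq='MS')
s = pd.Series(np.r_[np.full(12, 100.0), np.arange(12.0)], index=idx)

# Sample variance (ddof=1) of the 12 monthly values in each year
yearly_var = s.groupby(s.index.year).var()

print(yearly_var.loc[1950])  # 13.0
```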

Autocorrelation

# autocorrelation
from statsmodels.tsa.stattools import acf, pacf
# ACF for lags 0..40; nlags is explicit because newer statsmodels releases default to fewer lags
ac = acf(air_passengers.Passengers, nlags=40)
plt.figure(figsize=(10, 5))
plt.title('ACF')
plt.grid(True, axis='x')
plt.xticks(range(0, 41, 12))
plt.plot(ac)

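The ACF is the autocovariance normalised by its lag-0 value, rho(k) = gamma(k) / gamma(0). A numpy sketch (the helper name is illustrative); on a pure 12-month cycle the largest non-zero-lag value sits at lag 12, which is the kind of peak the ticks at 0, 12, 24, 36 are meant to highlight:

```python
import numpy as np


def acf_manual(x, nlags):
    """Sample ACF for lags 0..nlags, using the biased (1/n) autocovariance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    gamma0 = np.dot(d, d) / n
    return np.array([np.dot(d[:n - k], d[k:]) / n / gamma0
                     for k in range(nlags + 1)])


t = np.arange(144)                  # 12 years of monthly toy data
x = np.sin(2 * np.pi * t / 12)      # a pure 12-month cycle
r = acf_manual(x, nlags=40)
```

With its default biased (1/n) estimator, statsmodels' `acf` uses the same normalisation, so the two should agree up to floating-point error.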

And now let’s make it formal

#   Augmented Dickey-Fuller test
from statsmodels.tsa.stattools import adfuller
adfuller(air_passengers.Passengers, autolag = 'AIC', regression = 'ct')

(-2.100781813844671,
 0.545658934312454,
 13,
 130,
 {'1%': -4.030152423759672,
  '10%': -3.1471816659080565,
  '5%': -3.444817634956759},
 993.2814778200581)

What do these numbers mean?

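adf (float) - the test statistic
pvalue (float) - MacKinnon's approximate p-value, based on MacKinnon (1994, 2010)
usedlag (int) - the number of lags used
nobs (int) - the number of observations used for the ADF regression and the calculation of the critical values
critical values (dict) - critical values of the test statistic at the 1%, 5% and 10% levels, based on MacKinnon (2010)
icbest (float) - the best information criterion value found, if autolag is not None
resstore (ResultStore, optional) - a dummy class with the results attached as attributes (only returned when store=True)

Here the statistic (-2.10) is above even the 10% critical value (-3.15) and the p-value is about 0.55, so we cannot reject the unit-root null hypothesis: there is no evidence that the series is stationary, even around a linear trend (regression = 'ct').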
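Reading the tuple comes down to comparing the statistic with the critical values: the unit-root null is rejected at a given level only if the statistic is more negative than that level's critical value. A small helper (hypothetical, not part of statsmodels) applied to the output shown above:

```python
# The tuple printed above: (adf, pvalue, usedlag, nobs, critical values, icbest)
adf_result = (-2.100781813844671, 0.545658934312454, 13, 130,
              {'1%': -4.030152423759672,
               '10%': -3.1471816659080565,
               '5%': -3.444817634956759},
              993.2814778200581)


def unit_root_rejected(result):
    """Map each level to True if the unit-root null is rejected at that level."""
    stat, critical_values = result[0], result[4]
    return {level: stat < cv for level, cv in critical_values.items()}


decisions = unit_root_rejected(adf_result)
print(decisions)  # {'1%': False, '10%': False, '5%': False}
```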

data transformation

# log transformation
log_passengers = np.log(air_passengers.Passengers)
log_passengers.plot(figsize=(10, 5), grid=True, title='Log(Passengers)')


# power transformation (square root)
rt_passengers = np.sqrt(air_passengers.Passengers)
rt_passengers.plot(figsize=(10, 5), grid=True, title='sqrt(passengers)')

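Why the log helps: if the series behaves like trend × season, then log(trend × season) = log(trend) + log(season), so a seasonal swing that grows with the level becomes a constant-size swing. A toy check under that multiplicative assumption (synthetic data, not AirPassengers):

```python
import numpy as np

t = np.arange(48)                              # 4 years of monthly toy data
trend = 100 * 1.02 ** t                        # level grows 2% per month
season = 1 + 0.2 * np.sin(2 * np.pi * t / 12)  # +/-20% seasonal swing
x = trend * season                             # multiplicative model


def yearly_range(values):
    """Max minus min within each block of 12 observations."""
    return np.array([values[i:i + 12].max() - values[i:i + 12].min()
                     for i in range(0, len(values), 12)])


raw_range = yearly_range(x)           # grows every year
log_range = yearly_range(np.log(x))   # the same every year
```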

calculate a rolling mean (moving average)

# window size = 12
air_passengers.rolling(window = 12).mean().plot(figsize=(10,5),grid=True)


Detrended

original time series - rolling_mean

rolling_mean = air_passengers.rolling(window = 12).mean()
passengers_detrended = air_passengers - rolling_mean
passengers_detrended.plot(figsize=(10, 5), grid=True)

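One property of the trailing window worth knowing: on a pure linear trend a + b*t, the 12-month trailing mean equals a + b*(t - 5.5), so the "detrended" series is the constant 5.5*b rather than zero. The window lags the trend by (12 - 1)/2 steps. A toy check:

```python
import numpy as np
import pandas as pd

a, b = 50.0, 3.0
s = pd.Series(a + b * np.arange(60.0))        # pure linear trend, toy data
detrended = s - s.rolling(window=12).mean()   # constant offset of 5.5 * b
```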

detrended: log(Passenger) - rolling_mean(log(Passenger))

log_rolling_mean = log_passengers.rolling(window = 12).mean()
log_detrended = log_passengers - log_rolling_mean
log_detrended.plot(figsize=(10, 5), grid=True)


# cyclical pattern left after detrending, smoothed with a 5-month rolling mean
log_detrended.rolling(window=5).mean().plot(figsize=(10, 5), grid=True)


rolling median

log_detrended.rolling(12).median().plot(figsize=(10,5), grid=True)

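The rolling median is less sensitive to isolated spikes than the rolling mean: one outlier in a 5-value window pulls the mean but leaves the median unchanged. A toy comparison:

```python
import pandas as pd

s = pd.Series([10.0, 10.0, 10.0, 10.0, 100.0, 10.0, 10.0, 10.0])  # one spike

mean5 = s.rolling(5).mean()     # 28.0 wherever the spike is inside the window
median5 = s.rolling(5).median() # stays at 10.0
```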

log(original time series - rolling_mean)

rolling_mean = air_passengers.rolling(window = 12).mean()
passengers_detrended = air_passengers - rolling_mean
log_detrended2 = passengers_detrended.Passengers.apply(lambda x: np.log(x))
log_detrended2.plot(figsize=(10, 5), grid=True)


Why didn’t that work?

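The gaps in the plot are where the detrended values are <= 0: the logarithm is only defined for positive numbers, so np.log returns NaN for negative values (and -inf for zero). In addition, the first 11 values are already NaN because the 12-month rolling window is not yet full.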
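A toy illustration of what np.log does with the kinds of values found in passengers_detrended (the series below is made up):

```python
import numpy as np
import pandas as pd

# Toy stand-in: NaN from the rolling window, then a positive, a zero
# and a negative deviation from the rolling mean
s = pd.Series([np.nan, 4.0, 0.0, -3.0])

with np.errstate(divide='ignore', invalid='ignore'):
    logged = np.log(s)   # log(0) -> -inf, log(negative) -> NaN

plottable = np.isfinite(logged)
print(plottable.tolist())  # [False, True, False, False]
```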

detrended (original ts - regression ts)

# Now let's use a regression rather than a rolling mean to detrend:
# fit Passengers = b0 + b1 * t by OLS, where add_constant supplies the intercept b0
import statsmodels.api as sm
t = sm.add_constant(np.arange(len(air_passengers)))
model = sm.OLS(air_passengers.Passengers.values, t)
result = model.fit()
result.params
regression_fit = pd.Series(result.predict(t), index=air_passengers.index)

passengers_detrended = air_passengers.Passengers - regression_fit
passengers_detrended.plot(figsize=(10, 5), grid=True)

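Whether the regression includes an intercept matters: a line fitted without one is forced through the origin, and the level of the series then leaks into the "detrended" residuals. A numpy sketch on toy data (np.polyfit is used here as a stand-in for the OLS fit):

```python
import numpy as np

t = np.arange(48, dtype=float)
y = 120 + 2.5 * t + 10 * np.sin(2 * np.pi * t / 12)   # toy trend + season

# With an intercept: y ~ intercept + slope * t
slope, intercept = np.polyfit(t, y, deg=1)
resid = y - (intercept + slope * t)      # residuals average to zero

# Through the origin: y ~ slope0 * t (no intercept column)
slope0 = np.dot(t, y) / np.dot(t, t)
resid0 = y - slope0 * t                  # residuals keep a non-zero level
```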

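With the log transform followed by detrending (the log(Passenger) - rolling_mean(log(Passenger)) step above), the trend of the original series is largely removed and its periodic (seasonal) pattern stands out.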

Seasonality

Differencing

original timeseries differencing
log(original timeseries) differencing

# original timeseries differencing
(air_passengers.Passengers - air_passengers.Passengers.shift()).plot(figsize=(10,5),grid=True)


# log(original timeseries) differencing
log_passengers_diff = log_passengers - log_passengers.shift()
log_passengers_diff.plot(figsize=(10, 5), grid=True)

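`Series.diff()` computes the same first difference as subtracting `shift()`, and differencing the log gives log growth ratios, log(x_t) - log(x_{t-1}) = log(x_t / x_{t-1}), which for small changes are close to percentage changes. A toy check:

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 110.0, 121.0, 133.1])   # toy data growing 10% per step

manual = s - s.shift()     # the form used above
builtin = s.diff()         # pandas shortcut for the same first difference

log_diff = np.log(s).diff()   # each step is log(1.1), about 0.0953
```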

Seasonal Decompose

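from statsmodels.tsa.seasonal import seasonal_decompose
# log() already turns the multiplicative seasonality into an additive one
decomposition = seasonal_decompose(log_passengers, model='additive')
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid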

# log(Original timeseries), Trend
plt.figure(figsize=(10,5))
plt.plot(log_passengers, label='Original')
plt.plot(trend, label='Trend')
plt.ylabel('log(Passengers)')
plt.grid(True)
plt.legend(loc='best')


# Seasonality & Residuals
plt.figure(figsize=(10, 5))
plt.plot(seasonal,label='Seasonality')
plt.plot(residual, label='Residuals')
plt.grid(True)
plt.legend(loc='best')

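As far as I understand it, seasonal_decompose with default settings is a classical moving-average decomposition. A by-hand sketch of the additive version (the helper name is hypothetical, and this is not statsmodels' code): the trend is a centred 2x12 moving average, the seasonal component is the per-month mean of the detrended values (shifted to sum to zero), and the residual is what is left. On a toy series of line plus sine, it recovers both parts:

```python
import numpy as np


def classical_additive_decompose(x, period=12):
    """Return (trend, seasonal, resid) arrays; trend is NaN at both ends."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Centred moving average: a 2 x period MA when the period is even
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode='valid')

    # Average the detrended values at each position in the cycle
    detrended = x - trend
    period_means = np.array([np.nanmean(detrended[i::period])
                             for i in range(period)])
    period_means -= period_means.mean()      # seasonal effects sum to zero
    seasonal = np.tile(period_means, n // period + 1)[:n]

    resid = x - trend - seasonal
    return trend, seasonal, resid


t = np.arange(72)                                      # 6 years of toy data
x = 200 + 1.5 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = classical_additive_decompose(x, period=12)
```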
