【Paper】Transformers in Times Series: A Suervey

Abstract

In this paper, we sysytematically review Transformer schemes for time series modeling by hignlighting their strengths as well as limitations.
in particular, we examine the development of time series Transformers in two perspectives.

network structure. we sumarize the adaptation and modifacations that have been made to Transformers in order to accommodate the challenges. (我们总结了为了适应时间序列任务的各种挑战，而对 Transformers 所作的修改和调整)
applications: we categorize time series Transformer based on common tasks includeing foecasting, anomaly detection, and classfication.

Github: time-series-transformers-review

Introduction

Seasonality or periodicity is an effectively model long-range and short-range temporal dependency and capture seasonality simultaneously remains a chanllenge.

Preliminaries of the Transformer （Transforms 预备知识）

Vanilla Transformer

Vaila Transformer 解析

Input Encoding and Positional Encoding

Unlike LSTM or RNN, the vanilla Transformer has no recurrence. Instead, it utilizes the positional encoding added in the input embeddings, to model the sequence information. We summarize some positional encodings below:

Absuolute Positional Encoding

In vanilla Transformer, for each position index $t$ , encoding vector is given by
在这里插入图片描述
where $\omega _i$ is the had-crafted frequency for each dimension. Another way is to learn a set of positional embeddings for eachposition which is more flexibel.

Relative Positions Encoding

Multi-head Attention

Taxonomy of Transformers in Time Series

Network Modifications for Time Series

Positional Encoding

As the ordering of time series matters,(由于时间序列的顺序很重要)， It is of great importance to encode the positions of input time series into Transformers. A common design is to first encode positional information as vector and then inject them into the model as an additional input together with the input time series. How to obtain these vectors when modeling time series with Transformers can be divided into thress main categories.

Vanilla positional Encoding: Unable to fully exploit the important features of time series data
Learnable Positional Encoding: As the vanilla positional encoding is hand-crafted and less expressive and adaptive ,several studies found that learning appropriate positional embeddings from time series data can be much more effective. Compared to fixed vanilla positional encoding, learned embeddings are more flexible and can adapt to specific tasks.
TimeStamp Encoding