[Paper Review] Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting (2021)

2023. 3. 27. 23:11 · 🧑🏻‍🏫 Ideas/(Advanced) Time-Series

 

1. Introduction

์‹œ๊ณ„์—ด ๋ถ„์„, ํŠนํžˆ ์˜ˆ์ธก ๋ฌธ์ œ(Forecasting)๋Š” ์• ๋„ˆ์ง€ ์†Œ๋น„, ํŠธ๋ž˜ํ”ฝ, ๊ฒฝ์ œ์ง€ํ‘œ, ๋‚ ์”จ, ์งˆ๋ณ‘ ๋“ฑ ๋‹ค์–‘ํ•œ ๋„๋ฉ”์ธ์—์„œ ํ™œ์šฉ๋˜๊ณ  ์žˆ๋‹ค.

์‹ค์ƒํ™œ์˜ ์—ฌ๋Ÿฌ ์‘์šฉ๋ถ„์•ผ์— ์žˆ์–ด์„œ ๋•Œ๋กœ๋Š” ์‹œ๊ณ„์—ด ์˜ˆ์ธก์˜ ๋ฒ”์œ„๋ฅผ ๋” ํฌ๊ฒŒ, ๋ฉ€๋ฆฌ ํ™•๋Œ€ํ•  ํ•„์š”์„ฑ์ด ์žˆ๋Š”๋ฐ, ์ด๋Š” ๊ฒฐ๊ตญ ์žฅ๊ธฐ ์‹œ๊ณ„์—ด์„ ๋‹ค๋ฃจ๋Š” ๋ฌธ์ œ์™€ ์ง๊ฒฐ๋  ์ˆ˜ ๋ฐ–์— ์—†๋‹ค.

์ด๋Ÿฌํ•œ ์ƒํ™ฉ์—์„œ "ํŠธ๋žœ์Šคํฌ๋จธ"๋Š” long-range dependence"๋ฌธ์ œ, ์ฆ‰, ์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ๋ฅผ self-attention ๋งค์ปค๋‹ˆ์ฆ˜์„ ํ†ตํ•ด ํ•ด๊ฒฐํ•˜์—ฌ ์ด๋Ÿฌํ•œ ์š”๊ตฌ๋ฅผ ์ถฉ์กฑํ•˜์˜€๊ณ , ์‹ค์ œ๋กœ ๋งŽ์€ ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ๋“ค์ด ์—ฌ๋Ÿฌ ์—ฐ๊ตฌ์—์„œ ํฐ ์ง„์ „์„ ์ด๋ฃจ์–ด๋ƒˆ๋‹ค.

 

๊ทธ๋Ÿฐ๋ฐ, ๊ทธ๋Ÿฌํ•œ ์—ฐ๊ตฌ์„ฑ๊ณผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  long-term ๊ธฐ๋ฐ˜์˜ ์˜ˆ์ธก ๋ฌธ์ œ๋Š” ์—ฌ์ „ํžˆ ๋งค์šฐ ์–ด๋ ค์šด ์ผ๋กœ ๋‚จ์•„์žˆ๋‹ค.

 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ทธ ์ด์œ ๋กœ 2๊ฐ€์ง€๋ฅผ ์ œ์‹œํ•œ๋‹ค.

1. ์žฅ๊ธฐ ์‹œ๊ณ„์—ด์˜ dependencies๋Š” ๋งค์šฐ "๋ณต์žกํ•œ ๋ณ€๋™๋“ค"์— ์˜ํ•ด ๊ฐ€๋ ค์ ธ ์žˆ๊ธฐ์— ๊ทธ Temporal Dependency(์‹œ๊ฐ„ ์˜์กด์„ฑ)์„  ํšจ๊ณผ์ ์œผ๋กœ ํŒŒ์•…ํ•˜๊ธฐ ์–ด๋ ต๋‹ค.

2. ๊ธฐ๋ณธ ํŠธ๋žœ์Šคํฌ๋จธ์˜ self-attention ๊ธฐ๋ฐ˜ ๋ชจ๋ธ๋“ค์€ ๊ทธ ๊ณ„์‚ฐ ๋ณต์žก๋„(quadratic complexity)์— ์˜ํ•ด ์žฅ๊ธฐ ์‹œ๊ณ„์—ด์˜ ๊ณ„์‚ฐ์—์„œ ํฐ ํ•œ๊ณ„์ ์„ ๊ฐ€์ง„๋‹ค.

 

๊ทธ๋Ÿฐ๋ฐ 2๋ฒˆ ์›์ธ ๊ฐ™์€ ๊ฒฝ์šฐ, ๊ณ„์‚ฐ ๋ณต์žก๋„๋ฅผ ์™„ํ™”ํ•˜๋Š” ์—ฌ๋Ÿฌ ์—ฐ๊ตฌ๋“ค๊ณผ ๊ทธ ๋ณ€ํ˜•๋ชจ๋ธ๋“ค๋กœ ์ธํ•ด ์ƒ๋‹น๋ถ€๋ถ„ ์ง„์ „์ด ์žˆ์—ˆ๋‹ค.

ํ•˜์ง€๋งŒ, ๊ทธ๋Ÿฌํ•œ ๋ณ€ํ˜•๋ชจ๋ธ๋“ค์€ ๋Œ€๋ถ€๋ถ„ "Sparse"ํ•œ bias๋ฅผ ํ†ตํ•ด attention์˜ ํšจ์œจ์„ฑ๋งŒ์„ ๋†’์ด๋Š” ์ผ์— ์น˜์ค‘๋˜์–ด ์žˆ์—ˆ๋‹ค.

(๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ทธ๋Ÿฌํ•œ ๋ชจ๋ธ๋“ค์„ point-wise representation aggregation์ด๋ผ๊ณ  ํ•œ๋‹ค.)

๊ทธ๋“ค์˜ ํ•œ๊ณ„์ ์€ ๊ณ„์‚ฐ ํšจ์œจ์„ฑ๋งŒ ๋‹ฌ์„ฑํ•  ๋ฟ, spars-point-wise connection์œผ๋กœ ์ธํ•ด ์‹œ๊ณ„์—ด์˜ ์ •๋ณด๋ฅผ ์žƒ๊ฒŒ๋˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค.

 

๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์€ "Temporal Dependency"์„ ํšจ๊ณผ์ ์œผ๋กœ ํฌ์ฐฉํ•˜๋ฉด์„œ "๊ณ„์‚ฐ ํšจ์œจ์„ฑ"๊นŒ์ง€ ๋™์‹œ์— ์ด๋ฃจ์–ด ๋‚ด๋Š” ๋ชจ๋ธ์„ ์—ฐ๊ตฌํ•˜์˜€๋‹ค.

 

๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•˜๋Š” ๋ชจ๋ธ์ธ "Autoformer"๋Š” ๋จผ์ € ์‹œ๊ณ„์—ด ์ •๋ณด๋ฅผ ์ถฉ๋ถ„ํžˆ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด ์‹œ๊ณ„์—ด ๋ถ„์„์˜ ์ „ํ†ต์  ๋ฐฉ๋ฒ•์ธ "Decompose"(์š”์†Œ๋ถ„ํ•ด)์˜ ์•„์ด๋””์–ด๋ฅผ ํ™œ์šฉํ•œ๋‹ค.

์—ฌ๊ธฐ์„œ decomposition์€ ๋‹จ์ˆœํžˆ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์—๋งŒ ์“ฐ์ด๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ์ตœ์ข…์ ์ธ ์˜ˆ์ธก์— ์žˆ์–ด์„œ๋„ ๊ทธ ํšจ๊ณผ๊ฐ€ ์ถฉ๋ถ„ํžˆ ๋ฐœํ˜„๋  ์ˆ˜ ์žˆ๋„๋ก  Architecture์— ์ด๋ฅผ ๊นŠ์ด ๋ฐ˜์˜ํ•˜์˜€๋‹ค.

๋˜ํ•œ, ์ด ๋ชจ๋ธ์€ self-attention์— ์žˆ์–ด์„œ๋„ point-wiseํ•œ ๋ฐฉ๋ฒ•์ด ์•„๋‹Œ "์œ ์‚ฌํ•œ ์ฃผ๊ธฐ"๋ฅผ ๊ฐ€์ง€๋Š” "sub-series"๋ฅผ ํ™œ์šฉํ•˜๋Š” "series-wise"ํ•œ ๋ฐฉ๋ฒ•์„ ํฌํ•จํ•œ๋‹ค. ๊ทธ๊ฒƒ์ด ๋ฐ”๋กœ "Auto-Correlation" ๋งค์ปค๋‹ˆ์ฆ˜์œผ๋กœ, ์œ ์‚ฌํ•œ ์ฃผ๊ธฐ๋ฅผ ๊ฐ€์ง€๋Š” sub-series๋ฅผ ์ž๊ธฐ์ƒ๊ด€์„ฑ์„ ํ†ตํ•ด ํฌ์ฐฉํ•˜์—ฌ ํ†ตํ•ฉ(aggregate)ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. 

๊ฒฐ๊ณผ์ ์œผ๋กœ ์ด ๋…ผ๋ฌธ์˜ ์ €์ž๋Š” ์ด๋Ÿฌํ•œ Architecture์™€ series-wise mechanism์ด ๊ทธ ๋ณต์žก๋„์™€ ์ •๋ณดํ™œ์šฉ ์ธก๋ฉด์—์„œ ๋” ์ข‹์€ ๊ตฌ์กฐ๋ฅผ ์ง€๋…”๋‹ค๊ณ  ๋งํ•˜๋ฉฐ, "SOTA"์˜ ์ •ํ™•๋„ ๋“ฑ ์‹คํ—˜์„ ํ†ตํ•ด ์ด๋ฅผ ์ž…์ฆํ•˜์˜€๋‹ค.

 

*ํŠธ๋žœ์Šคํฌ๋จธ ์ฐธ์กฐ

https://seollane22.tistory.com/20?category=1012181 

 


 

 

 

 

2. Related Work

 

 

2-1) Models for Time Series Forecasting

 

์ด ๋‹จ๋ฝ์—์„œ๋Š” ๋จผ์ € TS Forecasting์— ์ ์šฉ๋˜์–ด ์˜จ ๊ธฐ์กด์˜ ๋ชจ๋ธ๋“ค์„ ๊ฐ„๋žตํ•˜๊ฒŒ ์„ค๋ช…ํ•˜๊ณ  ์žˆ๋‹ค.

 

1. ARIMA: a classical statistical model; it differences a non-stationary series into a stationary one before modeling.

2. RNNs: classical deep learning models that consume the input step by step, carrying information from each time step into the next.

3. DeepAR: combines autoregression with RNNs.

4. LSTNet: combines CNNs with recurrent-skip connections.

5. Attention-based RNNs: introduce temporal attention on an RNN base to detect long-range dependencies.

6. TCN: models temporal causality with causal convolutions.

 

7. Transformer-based models:

 

ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜์˜ self-attention ๋งค์ปค๋‹ˆ์ฆ˜์€ sequential task์— ์ข‹์€ ๋Šฅ๋ ฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค.

๊ทธ๋Ÿฌ๋‚˜, ์žฅ๊ธฐ ์‹œ๊ณ„์—ด์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฌธ์ œ์— ์žˆ์–ด์„œ๋Š” ๊ทธ ๋ณต์žก๋„๊ฐ€ ์ธํ’‹ ๊ธธ์ด(์‹œ๊ณ„์—ด ํฌ๊ธฐ)์˜ ์ œ๊ณฑ์ด๋ผ๋Š” quadratic complexity๋ฅผ ๋ณด์ธ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฉ”๋ชจ๋ฆฌ, ์‹œ๊ฐ„์ ์ธ ๋น„ํšจ์œจ์„ฑ์€ ์‹œ๊ณ„์—ด ํŠธ๋žœ์Šคํฌ๋จธ ์—ฐ๊ตฌ์ž๋“ค์˜ ์ฃผ๋œ ๊ด€์‹ฌ์‚ฌ์˜€๊ณ  ๋งŽ์€ ์—ฐ๊ตฌ์—์„œ ์ด๋ฅผ ๊ฐœ์„ ํ•œ ๋ชจ๋ธ์„ ์ œ์•ˆํ•˜๊ธฐ๋„ ํ•˜์˜€๋‹ค.

- Logformer, ์ง€์ˆ˜์ ์œผ๋กœ ์ฆ๊ฐ€ํ•˜๋Š” interval์„ ๋‘๊ณ  time step์„ ์„ค์ •ํ•˜์—ฌ attention์„ ์ˆ˜ํ–‰ํ•˜๋Š” LogSparse attention์„ ์ œ์•ˆํ•œ๋‹ค.

- Reformer, local-sensitive hashing (LSH) attention์„ ์ทจํ•˜์—ฌ ๊ณ„์‚ฐ ๋ณต์žก๋„๋ฅผ ์ค„์˜€๋‹ค.

- Informer, time step ๊ฐ„์˜ ์ค‘์š”๋„๋ฅผ ์‚ฐ์ถœํ•˜์—ฌ ๊ทธ ์ค‘์š”๋„๊ฐ€ ๋†’์€ ๊ฒƒ์— attention์„ ์ˆ˜ํ–‰ํ•˜๋Š” ProbSparse attention์„ ์ œ์•ˆํ•œ๋‹ค.

์—ฌ๊ธฐ์„œ ์ฃผ๋ชฉํ•ด์•ผ ํ•  ๊ฒƒ์€ ์ด๋“ค์€ ๋ชจ๋‘ ๊ธฐ๋ณธ ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์„ ์ด์šฉํ•˜๊ณ  ์žˆ์œผ๋ฉฐ(Informer๋Š” ์ œ์™ธ), point-wiseํ•œ ๊ธฐ๋ฒ•์ด๋ผ๋Š” ๊ฒƒ์ด๋‹ค. ์•ž์„œ ์„ค๋ช…ํ–ˆ๋“ฏ์ด, ์ด๋Ÿฌํ•œ ๊ธฐ๋ฒ•๋“ค์€ ์‹œ๊ณ„์—ด์˜ ๋ณต์žกํ•œ ๋ณ€๋™๋“ค์„ ์ถ”๋ ค๋‚ด์ง€ ๋ชปํ•˜๊ธฐ ๋•Œ๋ฌธ์— depedency๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ํŒŒ์•…ํ•˜๋Š” ๋ฐ ์–ด๋ ค์›€์ด ์žˆ๋‹ค.

์ด์— Autoformer์—์„œ๋Š” ์žฅ๊ธฐ ์‹œ๊ณ„์—ด์˜ dependency๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ํŒŒ์•…ํ•˜๊ณ ์ž point-wise๊ฐ€ ์•„๋‹Œ ๊ฐ™์€ periodicity๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ series-wise ๊ธฐ๋ฒ•์„ ์ ์šฉํ–ˆ๋‹ค. (Auto-Correlation ๋งค์ปค๋‹ˆ์ฆ˜)

 

2-2) Decomposition of Time Series

 

์‹œ๊ณ„์—ด ๋ถ„์„์˜ standardํ•œ ๋ถ„์„๋ฐฉ๋ฒ•์ธ Decomposition(์š”์†Œ๋ถ„ํ•ด)์€ ์‹œ๊ณ„์—ด์˜ ๋ณ€๋™์„ ์—ฌ๋Ÿฌ ์š”์†Œ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ๋ถ„์„๋ฐฉ๋ฒ•์„ ๋งํ•œ๋‹ค.

๊ทธ ์š”์†Œ๋“ค์€ ํฌ๊ฒŒ Trend(์ถ”์„ธ), Seasonality(๊ณ„์ ˆ), Cycle(์ˆœํ™˜), Random(๋ฌด์ž‘์œ„) ๋ณ€๋™์ด ์žˆ๋Š”๋ฐ, ์ด ๋ณ€๋™๋“ค์€ ์‹œ๊ณ„์—ด์ด ํ˜•์„ฑ๋˜์–ด ์˜จ ๊ณผ์ •์„ ๋” ์ž˜ ๋ณด์—ฌ์ฃผ๋ฏ€๋กœ ์˜ˆ์ธก์— ์žˆ์–ด์„œ๋„ ํฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ๋‹ค.

 

๊ทธ๋Ÿฐ๋ฐ ์ด๋Ÿฌํ•œ ์š”์†Œ๋ถ„ํ•ด๋Š” ๊ณผ๊ฑฐ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์—์„œ๋งŒ ์ด์šฉ๋˜์–ด์™”๋‹ค. ์ด๋Ÿฌํ•œ ์ œํ•œ์ ์ธ ์ด์šฉ์€ ๋จผ ์‹œ์ ์˜ ๋ฏธ๋ž˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์žˆ์–ด์„œ ๊ทธ ๋ณ€๋™ ๊ฐ„์˜ "๊ณ„์ธต์ ์ธ ์ƒํ˜ธ์ž‘์šฉ"์„ ๊ฐ„๊ณผํ•˜๊ฒŒ ๋œ๋‹ค. (์ด๋Š” ์žฅ๊ธฐ ์‹œ๊ณ„์—ด ์˜ˆ์ธก์„ ์–ด๋ ต๊ฒŒ ํ•œ๋‹ค.)

 

๋”ฐ๋ผ์„œ Autoformer์—์„œ๋Š” ์š”์†Œ๋ถ„ํ•ด์˜ ํšจ๊ณผ๋ฅผ ์ถฉ๋ถ„ํžˆ ์ด์šฉํ•˜๊ณ ์ž ์ด๊ฒƒ์ด ๋ชจ๋ธ์˜ ๋‚ด๋ถ€์—์„œ ๊ธฐ๋Šฅํ•˜๋„๋ก ์—ฌ๋Ÿฌ ๋ธ”๋ก์„ ๋ฐฐ์น˜ํ•˜์˜€๊ณ , ์ด๋ฅผ ํ†ตํ•ด ๋‚ด๋ถ€์—์„œ hidden series๋ฅผ ์ ์ง„์ ์œผ๋กœ ๋ถ„ํ•ดํ•˜๋„๋ก ์„ค๊ณ„ํ•˜์˜€๋‹ค.

์ด๋Š” ์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„์—์„œ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ฒฐ๊ณผ๋ฅผ ์˜ˆ์ธกํ•ด๋‚˜๊ฐ€๋Š” ์ „์ฒด ๊ณผ์ •์—์„œ ์š”์†Œ๋ถ„ํ•ด๊ฐ€ ํšจ๊ณผ์ ์œผ๋กœ ์ด์šฉ๋˜๋„๋ก ํ•œ ๊ฒƒ์ด๋‹ค.

 

 

 

3. Autoformer

 

 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด ๋‹จ๋ฝ๋ถ€ํ„ฐ ๋ณธ๊ฒฉ์ ์œผ๋กœ Autoformer์˜ ๋งค์ปค๋‹ˆ์ฆ˜๊ณผ ๊ทธ ๊ตฌ์กฐ๋ฅผ ํ•˜๋‚˜ํ•˜๋‚˜ ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๊ณ  ์žˆ๋‹ค.

 

Time-series forecasting can be defined as the problem of predicting the next O steps of a series given the previous I steps. (input(I)-predict(O))

Long-term forecasting refers to the case where O is large and, as mentioned earlier, it carries two main problems.

 

1. Complex temporal patterns are hard to handle.
2. Computation is inefficient, so the series information cannot be fully utilized.

 

๋”ฐ๋ผ์„œ ์ด๋ฅผ ๊ฐœ์„ ํ•˜๋Š” Autoformer์˜ ์ฐจ๋ณ„์ ์ธ ํŠน์ง•์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

Autoformer์˜ ์ฐจ๋ณ„์ ์ธ ์š”์†Œ

1. Decomposition์„ ์ „์ฒ˜๋ฆฌ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ Forecasting ๊ณผ์ • ๊ณณ๊ณณ์— ๋ฐฐ์น˜ํ•œ Architeture๋ฅผ ๋””์ž์ธํ•˜์˜€๋‹ค. 

2.Sparseํ•œ time-step์„ ํ†ตํ•ฉ(aggregate)ํ•˜๋Š” point-wiseํ•œ ๊ธฐ๋ฒ•์ด ์•„๋‹Œ, Auto-correlation์„ ํ†ตํ•ด ๊ฐ™์€ ์ฃผ๊ธฐ๋ฅผ ๊ฐ€์ง€๋Š” time-step์„ ํ†ตํ•ฉํ•˜๋Š” series-wise ๊ธฐ๋ฒ•์„ ์ ์šฉํ•œ๋‹ค.
(๋งค์ปค๋‹ˆ์ฆ˜์˜ ๋ณ€ํ™”, Attention -> Auto-correlation)

 

3-1) Decomposition Architecture

 

Decomposition์€ ์‹œ๊ณ„์—ด์„ ์ถ”์„ธ/์ฃผ๊ธฐ ๋ณ€๋™๊ณผ ๊ณ„์ ˆ ๋ณ€๋™์œผ๋กœ ๋ถ„๋ฆฌํ•˜๋Š”๋ฐ, ๊ฐ๊ฐ์€ ์‹œ๊ณ„์—ด์˜ ์žฅ๊ธฐ์ ์ธ ์ถ”์„ธ ๋ณ€๋™๊ณผ ์ƒ๋Œ€์ ์œผ๋กœ ๋‹จ๊ธฐ์— ๊ณ ์ •๋œ ์ฃผ๊ธฐ(๋ณดํ†ต 1๋…„)๋กœ ์›€์ง์ด๋Š” ๊ณ„์ ˆ์ ์ธ ๋ณ€๋™์„ ๋‚˜ํƒ€๋‚ธ๋‹ค.

Autoformer๋Š” ์ธํ’‹์„ ๊ตฌ์„ฑํ•˜๋Š” ์ „์ฒ˜๋ฆฌ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋‚ด๋ถ€์— series - decomposition ๋ธ”๋ก์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค.

๋ฐ‘์˜ ๊ทธ๋ฆผ์—์„œ ํ•˜๋Š˜์ƒ‰ ๋ธ”๋ก์ด ์ด๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ๋Š”๋ฐ, ์ด๋Š” ๋ชจ๋ธ ๋‚ด์—์„œ hidden series๋ฅผ ์ ์ง„์ ์œผ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ์—ญํ• ์„ ํ•œ๋‹ค.

์ด๋Š” ์ธ์ฝ”๋”์™€ ๋””์ฝ”๋” ๋‚ด๋ถ€์˜ ๊ฒฐ๊ณผ๊ฐ’๋“ค์„ ๋ถ„ํ•ดํ•จ์œผ๋กœ์จ ๊ฒฐ๊ณผ์ ์œผ๋กœ "the long-term stationary trend"(์žฅ๊ธฐ์ ์œผ๋กœ ๋ฐ˜๋ณต๋˜๋Š” ์ถ”์„ธ)๋ฅผ ๋” ์ž์„ธํžˆ ์ถ”์ถœํ•˜๋Š” ๊ธฐ๋Šฅ์„ ํ•œ๋‹ค. (์ž์„ธํ•œ ๊ณผ์ •์€ ์•„๋ž˜์—์„œ ์„ค๋ช…)

 

Autoformer Architecture

 

๊ตฌ์ฒด์ ์œผ๋กœ Decomposition ๊ณผ์ •์„ ์‚ดํŽด๋ณด๋ ค๋ฉด ์ด ๋ชจ๋ธ์— ๋“ค์–ด๊ฐ€๋Š” input์„ ๋จผ์ € ์‚ดํŽด๋ด์•ผ ํ•œ๋‹ค.

 

Model inputs

 

Autoformer์˜ ์ธ์ฝ”๋”, ๋””์ฝ”๋”์— ๋“ค์–ด๊ฐ€๋Š” ์ธํ’‹๋“ค์€ ์•„๋ž˜์˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ํ˜•์„ฑ๋œ๋‹ค.

 

* seq_len (input series): the full input length (I in the paper)
* label_len (start token series): the labeled portion of seq_len (half of seq_len)
* pred_len (padding): the length of the prediction horizon (O in the paper)

 

์—ฌ๊ธฐ์„œ ์ธ์ฝ”๋”์˜ ์ธํ’‹์œผ๋กœ๋Š” seq_len์ด ๋“ค์–ด๊ฐ€๋Š”๋ฐ, ๊ทธ ์ธํ’‹ ์‹œ๋ฆฌ์ฆˆ๋ฅผ ์ ˆ๋ฐ˜์œผ๋กœ ๋‚˜๋ˆˆ ๋’ค ๋” ์ตœ๊ทผ ์‹œ์ ์˜ ์ ˆ๋ฐ˜(1/2)์„ label๋กœ ํ•™์Šตํ•œ๋‹ค.

ํ•œํŽธ ๋””์ฝ”๋”์˜ ์ธํ’‹์€ ์•ž์—์„œ ์ž๋ฅธ label_len๋ฅผ ์š”์†Œ๋ถ„ํ•ด(Series Decompose)ํ•˜์—ฌ ํ˜•์„ฑ๋˜๋Š”๋ฐ, ์—ฌ๊ธฐ์„œ ์ฃผ๋ชฉํ•ด์•ผ ํ•  ๊ฒƒ์€ ๋ฐ”๋กœ Series Decompose์™€ padding์ด๋‹ค.

๋””์ฝ”๋”๋Š” lable_len์„ Series Decomposeํ•œ Trend์™€ Seasonality๋ฅผ ์ธํ’‹์œผ๋กœ ๋ฐ›๋Š”๋‹ค. ๊ทธ๋Ÿฐ๋ฐ, lable_len์€ seq_len์˜ ์ ˆ๋ฐ˜์ด๊ธฐ ๋•Œ๋ฌธ์—, ๊ทธ ๊ธธ์ด๋ฅผ ๋งž์ถ”๊ธฐ ์œ„ํ•˜์—ฌ padding์„ ์ ์šฉํ•œ๋‹ค.

์ด๋•Œ Trend๋Š” ์ธ์ฝ”๋” ์ธํ’‹์˜ Mean(ํ‰๊ท )๊ฐ’์„ ํŒจ๋”ฉํ•˜๊ณ , ์ธ์ฝ”๋”์˜ ์ •๋ณด์™€ ๊ฒฐํ•ฉํ•˜๋Š” Seasonality๋Š” ์˜ˆ์ธก์„ ์œ„ํ•ด 0์œผ๋กœ ํŒจ๋”ฉํ•œ๋‹ค.

(์œ„ ๊ทธ๋ฆผ ์ฐธ์กฐ)

 

๊ตฌ์ฒด์ ์œผ๋กœ Decomposition ๊ณผ์ •์„ ์‚ดํŽด๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

$$X_{t} = \mathrm{AvgPool}(\mathrm{Padding}(X))$$

$$X_{s} = X - X_{t}$$

$$X_{t}: \text{trend-cyclical component} \qquad X_{s}: \text{seasonal component}$$

์œ„ ์‹์€ ์ด ๋ชจ๋ธ์˜ ํ•ต์‹ฌ์š”์†Œ์ธ Decomposing์˜ ๊ณผ์ •์ด๋‹ˆ ์ž˜ ๊ธฐ์–ตํ•ด๋‘˜ ํ•„์š”๊ฐ€ ์žˆ๋‹ค.

๋จผ์ €, ์ด ๋ชจ๋ธ์—์„œ๋Š” ์ฃผ๊ธฐ์ ์ธ ๋ณ€๋™์„ ํ‰ํ™œํ™”ํ•˜๊ณ  ์žฅ๊ธฐ์ ์ธ ์ถ”์„ธ๋ฅผ ๊ฐ•์กฐํ•˜๊ธฐ ์œ„ํ•ด MA๊ธฐ๋ฒ•์ธ AvgPooling์„ ์‚ฌ์šฉํ•˜๋ฉฐ, ์‹œ๊ณ„์—ด์˜ ๊ธธ์ด๊ฐ€ ๋ฐ”๋€Œ๋Š” ๊ฒƒ์„ ๋ง‰๊ธฐ ์œ„ํ•ด padding์„ ์ด์šฉํ•œ๋‹ค.

(์ดํ›„์—๋Š” ์œ„ ๊ณผ์ •์„ ์š”์•ฝํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ•จ์ˆ˜๋กœ ํ‘œ๊ธฐํ•œ๋‹ค.) $$SeriesDecomposition(X)$$

์œ„ ๊ณผ์ •์„ ๊ฑฐ์ณ, Autoformer๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋””์ฝ”๋”์˜ ์ธํ’‹์„ ํ˜•์„ฑํ•œ๋‹ค. 

$$X_{en_s}, X_{en_t} = SeriesDecomposition(X_{en,\,I/2:I})$$

$$X_{de_s} = Concat(X_{en_s}, X_{0})$$

$$X_{de_t} = Concat(X_{en_t}, X_{Mean})$$
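The same steps can be sketched in NumPy under a few assumptions: the lengths are illustrative, and `series_decomposition` is a simple moving-average decomposition as the paper describes.

```python
import numpy as np

def series_decomposition(x, kernel_size=25):
    # Moving-average decomposition: trend = AvgPool(Padding(X)), seasonal = X - trend.
    pad_left = (kernel_size - 1) // 2
    padded = np.concatenate([np.repeat(x[:1], pad_left), x,
                             np.repeat(x[-1:], kernel_size - 1 - pad_left)])
    trend = np.convolve(padded, np.ones(kernel_size) / kernel_size, mode="valid")
    return x - trend, trend

seq_len, pred_len = 96, 24          # illustrative I and O
label_len = seq_len // 2            # the more recent half of the input
x_enc = np.random.randn(seq_len)    # stand-in for the encoder input series

# Decompose the recent half, then pad: zeros for the seasonal init,
# the input mean for the trend init, each pred_len steps long.
x_s, x_t = series_decomposition(x_enc[-label_len:])
x_de_seasonal = np.concatenate([x_s, np.zeros(pred_len)])
x_de_trend = np.concatenate([x_t, np.full(pred_len, x_enc.mean())])
assert len(x_de_seasonal) == len(x_de_trend) == label_len + pred_len
```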

 

With the explanation and equations above understood, the architecture diagram gives a good picture of the model's decomposition and its overall flow.

 

Autoformer Architecture

 

Encoder

์ธ์ฝ”๋”๋Š” input series ์ „์ฒด๋ฅผ ๋ฐ›์•„ Auto-Correlation ๋ ˆ์ด์–ด๋ฅผ ํ†ต๊ณผ์‹œํ‚ค๋Š”๋ฐ, ๊ทธ ๋‚ด๋ถ€์—์„œ ์œ ์‚ฌํ•œ ์ฃผ๊ธฐ๋ฅผ ๊ฐ€์ง€๋Š” sub-series๋ฅผ ํ†ตํ•ฉํ•œ๋‹ค. (Auto-Correlation์„ ๊ตฌํ•œ ๋’ค, sofmax ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ์Šค์ฝ”์–ด๋ฅผ ์‚ฐ์ถœํ•˜์—ฌ sub-series๋ฅผ ๊ฐ€์ค‘ํ•ฉ)

๊ทธ ๋’ค์— Series Decomposition์„ ์ง„ํ–‰ํ•˜๋Š”๋ฐ, ์ธ์ฝ”๋”์˜ Series Decomposition์€ Seasonal ๋ณ€๋™๋งŒ์„ ๋‚จ๊ธฐ๊ณ  ๋‚˜๋จธ์ง€ Trend(+Cyclical)๋ณ€๋™์„ ์ œ๊ฑฐํ•œ๋‹ค. 

Seasonal, ์ฆ‰, ๊ณ„์ ˆ๋ณ€๋™์€ ์ƒ๋Œ€์ ์œผ๋กœ ์งง์€ ์ฃผ๊ธฐ๋กœ(ํ†ต์ƒ 1๋…„) ๋ฐ˜๋ณต๋˜๋Š” ๋ณ€๋™์„ ๋งํ•˜๋Š”๋ฐ, ์ด๋Š” ์•ž์„œ ์–ธ๊ธ‰ํ•œ "the long-term stationary trend"(์žฅ๊ธฐ์ ์œผ๋กœ ๋ฐ˜๋ณต๋˜๋Š” ์ถ”์„ธ)์— ์ง‘์ค‘ํ•˜๊ธฐ ์œ„ํ•จ์ด๋‹ค.

๊ฒฐ๊ณผ์ ์œผ๋กœ ์š”์†Œ๋ถ„ํ•ด๋ฅผ ํ†ตํ•ด ๋‚จ์€ ๊ณ„์ ˆ๋ณ€๋™์„ FFlayer์— ํ†ต๊ณผ์‹œํ‚จ ๋’ค ๋‹ค์‹œ ํ•œ๋ฒˆ ์š”์†Œ๋ถ„ํ•ด๋ฅผ ์ง„ํ–‰ํ•˜์—ฌ ํ•˜๋‚˜์˜ ์ธ์ฝ”๋”์—์„œ ๋งŒ๋“ค์–ด๋‚ด๋Š” ์ตœ์ข… output์„ ์‚ฐ์ถœํ•œ๋‹ค. (Encoder N๊ฐœ์˜ ์ค‘์ฒฉ by ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ N)

 

Decoder

๋””์ฝ”๋”๋Š” ์•ž์„œ ์„ค๋ช…ํ•œ๋Œ€๋กœ input series์˜ ์ ˆ๋ฐ˜์ธ label_len์„ ๋ฐ›์•„ Series Decomposition์„ ๋จผ์ € ์ง„ํ–‰ํ•œ๋‹ค.

์ดํ›„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ Seasonal ๋ณ€๋™์„ Auto-Correlation ๋ ˆ์ด์–ด๋ฅผ ํ†ต๊ณผ์‹œํ‚ค๋Š”๋ฐ, ์ธ์ฝ”๋”์™€๋Š” ๋‹ค๋ฅด๊ฒŒ Trend(+Cyclical)๋ณ€๋™์„ ๋ฒ„๋ฆฌ์ง€ ์•Š๊ณ  Series Decomposition ๋ธ”๋ก์„ ํ†ต๊ณผํ•  ๋•Œ๋งˆ๋‹ค ๋ถ„๋ฆฌ๋˜๋Š” Trend(+Cyclical)๋ณ€๋™๋“ค์„ ๋”ํ•ด์ค€๋‹ค.

์ดํ›„ ํƒ€ ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์˜ Encoder-Decoder Attention์ฒ˜๋Ÿผ, ์ธ์ฝ”๋”์˜ ์ตœ์ข… ์•„์›ƒํ’‹(Seasonal ๋ณ€๋™)์„ ๋ฐ›์•„ ๋””์ฝ”๋”๊ฐ€ ์‚ฐ์ถœํ•œ ์ค‘๊ฐ„ ์•„์›ƒํ’‹(Seasonal ๋ณ€๋™)์„ ๋ฐ›์•„ ๋‘˜์„ ๋งตํ•‘ํ•˜๋Š” Auto-Correlation ๋ ˆ์ด์–ด๋ฅผ ํ†ต๊ณผ์‹œํ‚จ๋‹ค.

์ดํ›„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๊ทธ ๊ฒฐ๊ณผ๊ฐ’์„ Decompose ํ›„, FFlayer์— ํ†ต๊ณผ์‹œํ‚จ ๋’ค ๋‹ค์‹œ ํ•œ๋ฒˆ ์š”์†Œ๋ถ„ํ•ด๋ฅผ ์ง„ํ–‰ํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ตœ์ข…์ ์œผ๋กœ ๊ณ„์† ๋”ํ•ด์ฃผ๋˜ Trend(+Cyclical)๋ณ€๋™์„ ํ•ฉ์ณ ํ•˜๋‚˜์˜ ๋””์ฝ”๋”์—์„œ ๋งŒ๋“ค์–ด๋‚ด๋Š” ์ตœ์ข… output์„ ์‚ฐ์ถœํ•œ๋‹ค. (Decoder M๊ฐœ์˜ ์ค‘์ฒฉ by ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ M)

 

 

+ Positional Encoding

 

๋‹ค๋ฅธ ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ๋“ค๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ด ๋ชจ๋ธ ๋˜ํ•œ ์œ„์น˜์ •๋ณด๋ฅผ ๋„ฃ์–ด์ฃผ๋Š” Positional Encoding์ด ํ•„์š”ํ•˜๋‹ค.

๊ธฐ๋ณธ ํŠธ๋žœ์Šคํฌ๋จธ(Vanilla)๋Š” localํ•œ ์ •๋ณด๋งŒ์„ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ์— ๊ทธ์น˜๋Š”๋ฐ, ํšจ๊ณผ์ ์ธ ์žฅ๊ธฐ์‹œ๊ณ„์—ด ์˜ˆ์ธก์„ ์œ„ํ•ด์„œ๋Š” ๋” globalํ•œ time-stamp๋ฅผ ๋„ฃ์–ด์ค„ ํ•„์š”๊ฐ€ ์žˆ๋‹ค.

๊ทธ๋Ÿฌํ•œ time-stamp์—๋Š” hierarchical time stamps (week, month and year)์™€ agnostic time stamps (holidays, events)๊ฐ€ ํฌํ•จ๋œ๋‹ค. 

์ด๋Š” "Informer"์—์„œ ์ œ์•ˆ๋œ ๊ฒƒ๊ณผ ๊ฐ™์€ ๊ฒƒ์œผ๋กœ, ๋จผ์ € ์ธํ’‹ ์‹œ๋ฆฌ์ฆˆ๋ฅผ d_model์— ๋งž๊ฒŒ project ํ•˜์—ฌ u๋ฅผ ๋งŒ๋“ ๋‹ค.

๊ทธ ํ›„ Local Time Stamps๋Š” sin, cosํ•จ์ˆ˜์— ๋”ฐ๋ผ "fixed" position์„ embaddingํ•˜๋ฉฐ, Global Time Stamps๋Š” ๊ฐ ์œ„์น˜์ •๋ณด๋ฅผ "learnable embadding"์„ ํ†ตํ•ด ๋„ฃ์–ด์ค€๋‹ค.
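A sketch of the two kinds of position information, under the assumption of an Informer-style embedding: the local part is the usual fixed sinusoidal encoding, while the global stamps (month, weekday, holidays, ...) would be rows of learnable lookup tables. A random table stands in here for what training would learn, and `d_model` and the table size are illustrative.

```python
import numpy as np

def sinusoidal_encoding(length: int, d_model: int) -> np.ndarray:
    """Fixed 'local' positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(length)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

d_model = 16
local_pe = sinusoidal_encoding(96, d_model)

# Global time stamps: in the real model these tables are learned; a random
# table is only a placeholder here (13 rows: one per month, 1-indexed).
rng = np.random.default_rng(0)
month_table = rng.standard_normal((13, d_model))
month_of_each_step = np.full(96, 4)            # e.g. every step falls in April
global_pe = month_table[month_of_each_step]

embedded_bias = local_pe + global_pe           # added to the projected series u
assert embedded_bias.shape == (96, d_model)
```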

 

 

3-2) Auto-Correlation Mechanism

 

Autoformer's other distinguishing element is the "Auto-Correlation" mechanism.

Whereas existing Transformer-based models measure the similarity between past points through the attention mechanism, the proposed Autoformer replaces that attention with a mechanism based on autocorrelation.

In the paper's terms, this is a "series-wise" rather than "point-wise" method.

 

 

 

 

์œ„ ๊ทธ๋ฆผ์—์„œ์ฒ˜๋Ÿผ, (d)๋ฅผ ์ œ์™ธํ•œ ์ „ํ†ต์ ์ธ ์–ดํ…์…˜(a)์ด๋‚˜, sparseํ•œ ๋ฐฉ๋ฒ•์„ ๋„์ž…ํ•œ(b),(c)๋“ค๊ณผ ๋‹ฌ๋ฆฌ (d)๋Š” ์–ด๋– ํ•œ ์ง€์ ์„ ๊ธฐ์ค€์œผ๋กœ dependence๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์œ ์‚ฌํ•œ ํ๋ฆ„์„ ๋ณด์ด๋Š” series๋ฅผ ๊ธฐ์ค€์œผ๋กœ Period-based dependence๋ฅผ ๋ฝ‘์•„๋‚ธ๋‹ค.

์ด๋•Œ ๊ทธ ์œ ์‚ฌํ•œ series๋ฅผ ํŒ๋‹จํ•˜๋Š” ๋งค์ปค๋‹ˆ์ฆ˜์— ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฐ€์ง€๋Š” ์†์„ฑ์ธ ์ž๊ธฐ์ƒ๊ด€์„ฑ(Auto Correlation)์„ ์ด์šฉํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

 

์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ์ƒ, ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ณ€์ˆ˜์˜ ๋ณ€ํ™”์— ๋”ฐ๋ฅด๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์‹œ๊ฐ„์— ๋”ฐ๋ฅธ ์ž๊ธฐ ์ž์‹ ์˜ ํ๋ฆ„์— ์˜ํ•ด ์‹œ๊ณ„์—ด์ด ์ง„ํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์— ๊ณผ๊ฑฐ์˜ ์ž๊ธฐ ์ž์‹ ๊ณผ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ๊ฐ€์ง„๋‹ค. ๋˜ํ•œ ์‹œ๊ณ„์—ด์ด๋ž€, ๋ง ๊ทธ๋Œ€๋กœ ์‹œ๊ฐ„์— ๋”ฐ๋ฅธ ๋ฐ์ดํ„ฐ์˜ ํ๋ฆ„์ด๊ธฐ ๋•Œ๋ฌธ์— ์–ด๋– ํ•œ ํฌ์ธํŠธ๋ฅผ ๋”ฑ ์ž˜๋ผ์„œ ๋ณด๋Š” ๊ฒƒ์€ ๊ทธ period์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์žƒ๊ฒŒ๋œ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์–ด๋– ํ•œ ์‹œ์ ๊ณผ ๋†’์€ ์ž๊ธฐ์ƒ๊ด€์„ฑ์„ ๊ฐ€์ง€๋Š” "sub-series"๋ฅผ ํŒŒ์•…ํ•˜์—ฌ ์—ฐ๊ฒฐํ•จ์œผ๋กœ์จ ์‹œ๊ณ„์—ด์ด ๊ฐ€์ง€๋Š” ์ •๋ณด๋“ค์„ ๋” ํญ๋„“๊ฒŒ ์ด์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค๊ณ  ๋งํ•œ๋‹ค. (expand information utilization) 

The Auto-Correlation mechanism

์œ„ ๊ทธ๋ฆผ์€ Auto-Correlation ๋งค์ปค๋‹ˆ์ฆ˜์˜ ๊ณผ์ •์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ๋‹ค. ์™ผ์ชฝ์€ ๊ทธ ์ „์ฒด๊ณผ์ •์„, ์˜ค๋ฅธ์ชฝ์€ Time Delay Aggregation ๋ธ”๋ก์„ ๋‚˜ํƒ€๋‚ธ๋‹ค. ๊ทธ ๊ณผ์ •์„ ๊ฐ„๋‹จํžˆ ์š”์•ฝํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. 

1. ๋จผ์ € encoder-decoder auto-correlation ๊ธฐ์ค€, decoder์—์„œ Q๋ฅผ ์–ป๊ณ  encoder์—์„œ K,V๋ฅผ ์–ป๋Š”๋‹ค.

2. ์ดํ›„, Q(de),K(en)๋ฅผ FFT(Fast Furier Transformation)๋ฅผ ํ†ตํ•ด ๋นˆ๋„์ˆ˜๋กœ ๋ณ€ํ™˜ํ•œ ๋’ค ๊ทธ ๋‘˜(์ผค๋ ˆ ๋ณต์†Œ์ˆ˜)์„ ๋‚ด์ ํ•œ๋‹ค.

3. ๊ทธ๋Ÿฌํ•œ ๊ฒฐ๊ณผ๊ฐ’์„ ๋‹ค์‹œ inverse FFT๋ฅผ ํ†ตํ•ด ๋‹ค์‹œ ํƒ€์ž„ ๋„๋ฉ”์ธ์œผ๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. (2์™€ 3๊ณผ์ •์„ ๊ฑฐ์น˜๋ฉด Auto-Correlation์„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.)

4. ์•ž์„œ ๊ตฌํ•œ Auto-Correlation ๊ฒฐ๊ณผ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์œ ์‚ฌํ•œ series TOP k๊ฐœ๋ฅผ ์„ ์ •ํ•œ๋‹ค.

5. ๋งˆ์ง€๋ง‰์œผ๋กœ Time Delay Aggregation ๋ธ”๋ก์„ ํ†ตํ•ด, ์˜ˆ์ธก๊ธธ์ด์—์„œ ์ธํ’‹๊ธธ์ด๋กœ(S->L) Resizeํ•œ V(en)๋ฅผ "τ"๋งŒํผ delay๋ฅผ ๋กค๋งํ•˜์—ฌ(k๊ฐœ) ์ƒ์„ฑํ•œ sub-series์™€, Auto- Correlation ๊ฒฐ๊ณผ๋“ค์„ softmaxํ•จ์ˆ˜์— ํ†ต๊ณผ์‹œ์ผœ ์ƒ์„ฑํ•œ ๊ฐ€์ค‘์น˜๋ฅผ ๊ณฑํ•œ๋‹ค.
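Steps 2 and 3 can be sketched directly in NumPy. By the Wiener-Khinchin relation, multiplying the FFT of Q by the complex conjugate of the FFT of K and inverting gives the (circular) correlation at every lag τ at once, in O(L log L); the naive O(L^2) loop below is included only as a sanity check.

```python
import numpy as np

def correlation_via_fft(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """R[tau] = sum_t q[t] * k[t - tau] (circular), for every lag at once."""
    q_f = np.fft.rfft(q)
    k_f = np.fft.rfft(k)
    # Element-wise product with the complex conjugate, then back to time domain.
    return np.fft.irfft(q_f * np.conj(k_f), n=len(q))

L = 64
rng = np.random.default_rng(0)
q, k = rng.standard_normal(L), rng.standard_normal(L)

# Naive O(L^2) definition of the same quantity, lag by lag.
naive = np.array([np.sum(q * np.roll(k, tau)) for tau in range(L)])
assert np.allclose(correlation_via_fft(q, k), naive)
```

With q = k this is exactly the autocorrelation R(τ), up to a 1/L normalization.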

 

 

Period-based dependencies

 

๋‚ด๋ถ€์ ์œผ๋กœ ๋™์ผํ•œ phase position, ์ฆ‰, ๊ฐ™์€ ์ž๊ธฐ์ƒ๊ด€์„ ๋ณด์ด๋Š” sub series๋Š”, ๊ฐ™์€ ๊ณผ์ •, ํ๋ฆ„์„ ๊ฐ€์ง€๊ณ  ๊ฒฐ๊ณผ์ ์œผ๋กœ ๊ฐ™์€ period๋ฅผ ๋ณด์ธ๋‹ค.

๊ฒฐ๊ตญ ์œ„ ๋งค์ปค๋‹ˆ์ฆ˜์˜ ๋ชฉ์ ์€ ๊ทธ๋Ÿฌํ•œ ๊ฐ™์€ Period์— ๊ธฐ๋ฐ˜ํ•œ dependencies์„ ๋ฝ‘์•„๋‚ด๊ธฐ ์œ„ํ•œ ๊ฒƒ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค. 

์ด์— Period-based dependencies๋ฅผ ๋ฝ‘์•„๋‚ด๋Š” auto-correlation์˜ ์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

Time Delay Aggregation

 

auto-correlation์ด ๋†’๋‹ค๋Š” ๊ฒƒ์€ ์‹œ๊ณ„์—ด ์ „์ฒด๋ฅผ ๋ดค์„ ๋•Œ ๊ฐ™์€ ์ฃผ๊ธฐ๋ฅผ ๊ฐ€์ง„๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

(๊ฒฐ๊ณผ์ ์œผ๋กœ auto-correlation์€ ๊ฐ™์€ period๋ฅผ ์ฐพ๋Š” ๊ณผ์ •์ด๋‹ค.)

๊ฒฐ๊ตญ ๊ทธ dependence๋ฅผ ๋ฝ‘์•„๋‚ด๊ธฐ ์œ„ํ•ด์„œ ์œ„ ์‹์—์„œ ๋„์ถœ๋œ R(auto-correlation)์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ™์€ ์ฃผ๊ธฐ(period)๋ฅผ ๊ฐ€์ง€๋Š” sub-series(by τ)๋ฅผ ์—ฐ๊ฒฐํ•œ๋‹ค. 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋ ‡๊ฒŒ ์‹œ์ ์ด ๋‹ค๋ฅธ time delay๋ฅผ ๊ฐ€์ง„ sub-series๋ฅผ ์—ฐ๊ฒฐ/ํ†ตํ•ฉํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ˆ˜์‹์„ ์ œ์‹œํ•œ๋‹ค.

 

 

$$\text{Time series } X \text{ of length } L$$

$$Topk(\cdot): \text{select the top } k \text{ autocorrelations}, \; k = \lfloor c \cdot \log L \rfloor \;\; (c: \text{hyperparameter})$$

$$R_{Q,K}: \text{the autocorrelation between } Q \text{ and } K$$

$$Roll(X, \tau): X \text{ delayed by } \tau \text{ (the last } \tau \text{ points are shifted to the front; this forms a sub-series)}$$

 

์ตœ์ข…์ ์œผ๋กœ ์ •๋ฆฌํ•˜์ž๋ฉด,

1. Q์™€ K์— ๋Œ€ํ•ด Auto-Correlation์„ ๊ตฌํ•˜๊ณ , ๋†’์€ ์ž๊ธฐ์ƒ๊ด€์„ฑ์„ ๊ฐ€์ง€๋Š”(=๊ฐ™์€ ์ฃผ๊ธฐ๋ฅผ ๊ฐ€์ง€๋Š”) sub-series๋ฅผ k๊ฐœ ๋ฝ‘๋Š”๋‹ค.

2. ์ด๋“ค์„ softmax๋ฅผ ํ†ตํ•ด Q,K์˜ ์œ ์‚ฌ๋„๋ฅผ ๊ฐ€์ค‘์น˜๋กœ ์‚ฐ์ถœํ•œ๋‹ค.

3. ๊ทธ๋Ÿฌํ•œ ๊ฐ€์ค‘์น˜๋ฅผ V์˜ sub-series(by τ)๋“ค์— ๊ณฑํ•˜๊ณ  ํ†ตํ•ฉ(concat)ํ•œ๋‹ค.

์ตœ์ข…์ ์œผ๋กœ ์ด๋Ÿฌํ•œ ๊ณผ์ •์˜ ๊ฒฐ๊ณผ๋ฌผ์€ ํ•˜๋‚˜์˜ head๋กœ, ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ์ง€์ •ํ•œ head์˜ ๊ฐœ์ˆ˜๋งŒํผ ์ง„ํ–‰๋˜์–ด ๋งˆ์ง€๋ง‰์œผ๋กœ concatํ•œ ๋’ค ๊ฐ ๋ธ”๋ก์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๊ณฑํ•ด ๋‹ค์Œ์œผ๋กœ ๋„˜๊ฒจ์ง„๋‹ค. ์ด๋Š” ๊ธฐ๋ณธ transformer์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํฌํ•จํ•œ ๊ทธ ๊ณผ์ •๊ณผ ๋™์ผํ•˜๋‹ค.

 

Efficient computation

๊ธฐ๋ณธ ํŠธ๋žœ์Šคํฌ๋จธ๊ฐ€ ๊ฐ€์ง€๋Š” L(O^2)์˜ ๋ณต์žก๋„์™€ ๊ณ„์‚ฐ ๋น„ํšจ์œจ์„ฑ์€ ์—ฌ๋Ÿฌ ๋…ผ๋ฌธ์—์„œ ์ง€์ ๋ฐ›๊ณ  ์žˆ๋Š” ํ•œ๊ณ„์ ์ด๋‹ค.

๋”ฐ๋ผ์„œ ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์˜ ํšจ์œจํ™”๋Š” ์—ฌ๋Ÿฌ ๋ณ€ํ˜• ๋ชจ๋ธ๋“ค์„ ํฌํ•จํ•œ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ์—์„œ ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ๋Š” ์ค‘์š”ํ•œ ๋ฐฉํ–ฅ ์ค‘ ํ•˜๋‚˜์ด๋‹ค.

 

์ด์— ์ด ๋ชจ๋ธ์—์„œ๋Š” forecasting ์„ฑ๋Šฅ๊ณผ ๋”๋ถˆ์–ด Auto-Correlation์˜ ๋ณต์žก๋„๋ฅผ ์ค„์ด๋Š” ๋ฐฉ๋ฒ• ๋˜ํ•œ ์ œ์‹œํ•˜๊ณ  ์žˆ๋‹ค. 

Autoformer๋Š” ๋ชจ๋ธ์˜ ๋ณต์žก๋„, ๊ณ„์‚ฐ์˜ ํšจ์œจ์„ฑ์„ ์œ„ํ•ด "FFT(Fast Furier Transformation)"๋ฅผ ์ด์šฉํ•œ๋‹ค.

 

The first function S is the product of the Fourier transforms, precisely the element-wise product with the complex conjugate (Conjugate in the figure); the second function is the inverse Fourier transform, which converts the series from the frequency domain back into the time domain. The mathematics inside the equations is involved, but the overall process amounts to the autocorrelation. The point to note most is that these transforms reduce the complexity to O(L log L).

*The FT (Fourier Transform) and FFT (Fast Fourier Transform) will be covered again in a separate post.

 

 

 

4. Experiments

4-1) Main Results

๋งˆ์ง€๋ง‰ ๋‹จ๋ฝ์ธ Experiments์—์„œ๋Š” Autoformer์˜ ์„ฑ๋Šฅ๊ณผ ๊ธฐํƒ€ ์žฅ์ ์„ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ์—ฌ๋Ÿฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•œ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์˜ ์‹คํ—˜์—๋Š” 6 ๊ฐœ์˜ ๋ฒค์น˜๋งˆํฌ ๋ฐ์ดํ„ฐ๊ฐ€ ์“ฐ์˜€๋‹ค. ๋ฐ์ดํ„ฐ์˜ ๊ฐ ๋„๋ฉ”์ธ์€ energy, traffic, economics, weather and disease์ด๋‹ค.

(with L2 loss function, ADAM optimizer, an initial learning rate of 10^-4, Batch size is set to 32, The training process is early stopped within 10 epochs)

 

Multivariate results with different prediction lengths(96, 192, 336, 720)

 

In the multivariate tests, Autoformer achieved state-of-the-art ("SOTA") performance, surpassing the other models on every benchmark dataset: about a 38% MSE reduction, with improvement across all datasets and output lengths (long-term robustness). Even more notable is that it also performed best on the exchange-rate data, which has no obvious periodicity.

This suggests Autoformer can produce good results with long inputs and in real-world settings full of complex variations such as multiple overlapping periods.

 

Univariate results with different prediction lengths

์ด ์‹คํ—˜์€ ์ฃผ๊ธฐ์„ฑ์ด ๋‘๋“œ๋Ÿฌ์ง€๋Š” ETT์™€ ๊ทธ๋ ‡์ง€ ์•Š์€ Exchange ๋ฐ์ดํ„ฐ๋กœ ์ง„ํ–‰๋˜์—ˆ๋Š”๋ฐ, ๋‹ค๋ณ€์ˆ˜๋กœ ์ข…์†๋ณ€์ˆ˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” Univariate results์—๋„ Autoformer๊ฐ€ ๊ฐ€์žฅ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค.

๊ทธ๋Ÿฐ๋ฐ, Exchange data์—์„œ ์˜ˆ์ธก ์•„์›ƒํ’‹์˜ ๊ธธ์ด๊ฐ€ ๊ฐ€์žฅ ์งง์„ ๋•Œ ARIMA๊ฐ€ ๊ฐ€์žฅ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค.

์ด ๋…ผ๋ฌธ์˜ ์—ฐ๊ตฌ์ž๋“ค์€ ์ฐจ๋ถ„์„ ํ†ตํ•ด ๋น„์ •์ƒ์ ์ธ ๊ฒฝ์ œ ๋ฐ์ดํ„ฐ์˜ ๋‹จ๊ธฐ ๋ณ€๋™์„ ์ž˜ ์žก์•„๋‚ผ ์ˆ˜ ์žˆ๋Š” ARIMA์˜ ์žฅ์ ์ด ๋‹๋ณด์ด์ง€๋งŒ, ๊ธด ์‹œ์ ์„ ์˜ˆ์ธกํ• ์ˆ˜๋ก ์„ฑ๋Šฅ์ด ํฌ๊ฒŒ ๊ฐ์†Œํ•˜๋Š” ARIMA์˜ ํ•œ๊ณ„์  ๋˜ํ•œ ์ž˜ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ๋‹ค๊ณ  ์–ธ๊ธ‰ํ•œ๋‹ค.

 

4-2) Ablation studies

 

์ด ์—ฐ๊ตฌ์—์„œ๋Š” Autoformer์˜ ์ฐจ๋ณ„์ ์ธ ํŠน์ง•์ธ "Series Decomposition"๊ณผ "Auto-Correlation"์˜ ํšจ๊ณผ๋ฅผ ํ™•์ธํ•˜์˜€๋‹ค.

๋งจ ์œ„ ์‹คํ—˜์—์„œ๋Š” Origin(๊ธฐ๋ณธ), Sep(์‚ฌ์ „์— ๋ถ„๋ฆฌํ•˜์—ฌ ๋‘ ๋ณ€๋™์„ ๊ฐ๊ฐ ์˜ˆ์ธก), Ours(Autoformer์˜ ์•„ํ‚คํ…Œ์ฒ˜)๋กœ ์กฐ๊ฑด์„ ๋‚˜๋ˆ„์–ด ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•˜์˜€๋Š”๋ฐ, Ours(Autoformer์˜ ์•„ํ‚คํ…Œ์ฒ˜)์˜ ๊ฒฐ๊ณผ๊ฐ€ ๊ฐ€์žฅ ์šฐ์ˆ˜ํ•˜์˜€๋‹ค.

 

The second experiment applies different mechanisms to Autoformer and compares the results.

Again the Auto-Correlation mechanism performed best, handling very long inputs and outputs well without any "-" (out-of-memory) failures.

 

4-3) Model Analysis

 

๋งˆ์ง€๋ง‰์œผ๋กœ ์—ฐ๊ตฌ์ž๋“ค์€ Autoformer๋ชจ๋ธ์˜ ํŠน์„ฑ๋“ค์„ ์‹คํ—˜ํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์‹œ์‚ฌ์ ์„ ๋„์ถœํ•˜์˜€๋‹ค.

 

1. Time series decomposition

Looking at (a), without the decomposition blocks the model fails to properly capture the rising trend and the peaks of the seasonal variation.

As blocks are added, every variation is captured better and better.

This suggests why Autoformer should carry three decomposition blocks.

 

2. Dependencies learning

์ด ์‹คํ—˜์€ ๊ฐ ๋งค์ปค๋‹ˆ์ฆ˜๋“ค์ด ๊ฐ€์žฅ ๋งˆ์ง€๋ง‰ timestep(๊ฐ์†Œํ•˜๋Š” phase)๊ณผ์˜ Dependency๋“ค์„ ์ถ”์ถœํ•œ series(a)๋‚˜ point((b),(c),(d))๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.

(a)๋ฅผ ๋ณด๋ฉด, Auto-Correlation์ด ๋‹ค๋ฅธ ์–ดํ…์…˜ ๋งค์ปค๋‹ˆ์ฆ˜๋“ค๊ณผ ๋‹ฌ๋ฆฌ "์ „์ฒด ํ๋ฆ„ ์†์—์„œ" dependency๋ฅผ ๋” "ํญ๋„“๊ฒŒ", "์ •ํ™•ํ•˜๊ฒŒ" ์ฐพ์•˜๋‹ค.

์ด๋Š” Auto-Correlation, ์ฆ‰, series-wiseํ•œ ๋ฐฉ๋ฒ•์ด ์ „์ฒด ํ๋ฆ„์„ ๋”์šฑ ์ž˜ ํฌ์ฐฉํ•˜๋ฉฐ information utilization ์ธก๋ฉด์—์„œ ๋” ํšจ๊ณผ์ ์ธ ๋งค์ปค๋‹ˆ์ฆ˜์ด๋ผ๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. 

 

3. Complex seasonality modeling

 

Model Analysis์˜ ๋งˆ์ง€๋ง‰์œผ๋กœ, ์—ฐ๊ตฌ์ž๋“ค์€ ํ•™์Šตํ•œ lags์— ๋”ฐ๋ฅธ ๋ฐ€๋„๋ฅผ histogram์œผ๋กœ ์‹œ๊ฐํ™”ํ•˜์˜€๋‹ค.

lag์— ๋”ฐ๋ฅธ ๋ฐ€๋„๋ฅผ ์‹œ๊ฐํ™”ํ•œ ๊ฒฐ๊ณผ, ์ด ํžˆ์Šคํ† ๊ทธ๋žจ์€ ๊ฐ ๋ฐ์ดํ„ฐ์˜ ์‹œ์ ์— ๋”ฐ๋ผ ์‹ค์ƒํ™œ์˜ seasonality ๋ณ€๋™์„ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ์—ˆ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด (a)๋Š” ํ•˜๋ฃจ์˜ ์‹œ๊ฐ„์ธ 24 lag๊นŒ์ง€ ํ•˜๋ฃจ์˜ ์ฃผ๊ธฐ๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ , ์ผ์ฃผ์ผ์„ ๋‚˜ํƒ€๋‚ด๋Š” 168lag(24*7)๊นŒ์ง€๋Š” ์ผ์ฃผ์ผ์˜ ์ฃผ๊ธฐ๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ๋‹ค. 

์ด๋Š” ์ฆ‰, Autoformer๊ฐ€ ๋‹จ์ˆœํ•œ ์˜ˆ์ธก์˜ ๊ฒฐ๊ณผ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ทธ ๊ณผ์ • ์†์—์„œ ๋ณ€๋™์˜ ์ฃผ๊ธฐ๋ฅผ ์ž˜ ํฌ์ฐฉํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ์ด๋ฅผ ์‹œ๊ฐํ™”ํ•จ์œผ๋กœ์จ ์ธ๊ฐ„์ด ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋Š” ์˜ˆ์ธก์„ ์‹คํ˜„ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค.

 

4. Efficiency analysis

Autoformer produced better results than the other models in both memory efficiency and time efficiency.

This demonstrates that in long-term forecasting, where O (the output length) grows very large, Autoformer is the superior model not only in accuracy but also in efficiency.

 

 

Closing remarks

 

์ด ๋…ผ๋ฌธ์€ ๊ธฐ์กด ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์— ์‹œ๊ณ„์—ด์˜ ํŠน์ง•์„ ์ถ”๊ฐ€ํ•œ ๋ณ€ํ˜• ๋ชจ๋ธ์ธ Autoformer๋ฅผ ์ œ์•ˆํ•œ ๋…ผ๋ฌธ์ด๋‹ค.

ํŠธ๋žœ์Šคํฌ๋จธ์˜ ๋“ฑ์žฅ ์ดํ›„, self-attention์„ ์žฅ์ฐฉํ•œ ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ์€ ๊ธฐ์กด RNN, CNN ๊ธฐ๋ฐ˜์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ๋“ค์˜ ์„ฑ๋Šฅ๋ณด๋‹ค ๋” ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋‚ด๊ณ ์žˆ๋‹ค.

์—ฌ๋Ÿฌ ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜์˜ ๋ณ€ํ˜•๋ชจ๋ธ๋“ค์€ ์žฅ๊ธฐ(long-term, long-range) ์‹œ๊ณ„์—ด์˜ ์˜ˆ์ธก์„ฑ๋Šฅ์„ "ํšจ์œจ์ ์œผ๋กœ" ๋†’์ด๊ธฐ ์œ„ํ•ด attention module์ด๋‚˜ ๊ทธ architecture๋ฅผ ๊ฐœ์กฐํ•˜์—ฌ ์ข‹์€ ์„ฑ๋Šฅ์„ ์ž…์ฆํ•˜์˜€๋‹ค.

 

๊ทธ๋Ÿฌ๋‚˜, ์ด ๋…ผ๋ฌธ์€ ์ง€๋‚œ ๋ณ€ํ˜•๋ชจ๋ธ๋“ค๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ ์ „ํ†ต์ ์ธ "์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์˜ ํŠน์ง•์„ ์ž…ํžŒ", "์‹œ๊ณ„์—ด ๋ถ„์„์— ๋”์šฑ ํŠนํ™”๋œ" ๋ชจ๋ธ์ธ Autoformer๋ฅผ ์ œ์•ˆํ•œ๋‹ค.

Autoformer์˜ ์ฐจ๋ณ„์ ์ธ ์š”์†Œ๋Š” ๋ฐ”๋กœ "Decomposition"๊ณผ "Auto-Correlation"์˜ ๋งค์ปค๋‹ˆ์ฆ˜์„ ์ด์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

 

๊ฒฐ๊ณผ์ ์œผ๋กœ ์ด ๋ชจ๋ธ์€ ๋‹ค๋ฅธ ๋ชจ๋ธ ๋Œ€๋น„ ๊ฐ€์žฅ ํƒ„ํƒ„ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” ๋“ฑ, ์žฅ๊ธฐ ์‹œ๊ณ„์—ด Forecasting์—์„œ "SOTA"์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•˜์˜€๊ณ  ์•ž์œผ๋กœ ๋”ฅ๋Ÿฌ๋‹์—์„œ ์‹œ๊ณ„์—ด์ด ์–ด๋– ํ•œ ๋ฐฉํ–ฅ์œผ๋กœ ๋‚˜์•„๊ฐ€์•ผ ํ•  ์ง€, ๊ทธ ๊ฐ€๋Šฅ์„ฑ๊นŒ์ง€ ์‹œ์‚ฌํ•˜๊ณ  ์žˆ๋Š” ์ค‘์š”ํ•œ ๋…ผ๋ฌธ์ด๋‹ค.

 

PS. I consider this a very important paper that pointed out the direction time-series analysis on Transformers should take. In particular, the decomposition architecture first presented here now forms one axis of the current research trend. Because the paper holds such good ideas and important content, the review grew long as I dug into the details. Thank you to everyone who read it through to the end.

 

Paper ์›๋ฌธ

https://arxiv.org/abs/2106.13008

 
