Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models

Vincent Le Guen 1,2 (vincent.le-guen@edf.fr)   Nicolas Thome 2 (nicolas.thome@cnam.fr)
(1) EDF R&D

[...] timing errors can be cast as a detection problem by computing Precision and Recall scores after segmenting series by Change Point Detection [8,33], or by computing the Hausdorff distance between two sets of change points [22,51]. For assessing the detection of ramps in wind and solar energy forecasting, specific algorithms were designed: for shape, the ramp score [18,52], based on a piecewise linear approximation of the derivatives of time series; for temporal error estimation, the Temporal Distortion Index (TDI) [20,52]. However, these evaluation metrics are not differentiable, making them unusable as loss functions for training deep neural networks. The impossibility of directly optimizing the appropriate (often non-differentiable) evaluation metric for a given task has bolstered efforts to design good surrogate losses in various domains, for example in ranking [15,62] or computer vision [38,59]. Recently, some attempts have been made to train deep neural networks with alternatives to the MSE, in particular smooth approximations of Dynamic Time Warping (DTW) [13,37,1]. Training DNNs with a DTW loss makes it possible to focus on the shape error between two signals. However, since DTW is by design invariant to elastic distortions, it completely ignores the temporal localization of the change. In our context of sharp change detection, both shape and temporal distortions are crucial for an adequate forecast. A differentiable timing error loss function based on DTW on the event (binary) space was proposed in [41]; however, it is only applicable to predicting binary time series. This paper specifically focuses on designing a loss function able to disentangle shape and temporal delay terms for training deep neural networks on real-world time series.

3 Training Deep Neural Networks with DILATE

Our proposed framework for multi-step forecasting is depicted in Figure 2. During training, we consider a set of N input time series A = {x_i}_{i=1:N}. For each input example of length n, i.e. x_i = (x_i^1, ..., x_i^n) ∈ R^{p×n}, a forecasting model such as a neural network predicts the future k-step-ahead trajectory ŷ_i = (ŷ_i^1, ..., ŷ_i^k) ∈ R^{d×k}. Our DILATE objective function, which compares this prediction ŷ_i with the actual ground-truth future trajectory y_i = (y_i^1, ..., y_i^k) of length k, is composed of two terms balanced by the hyperparameter α ∈ [0, 1]:

    L_DILATE(ŷ_i, y_i) = α · L_shape(ŷ_i, y_i) + (1 − α) · L_temporal(ŷ_i, y_i)    (1)

Notations and definitions. Both our shape (L_shape) and temporal (L_temporal) distortion terms are based on the alignment between the predicted ŷ_i ∈ R^{d×k} and ground-truth y_i ∈ R^{d×k} time series. We define a warping path as a binary matrix A ∈ {0,1}^{k×k} with A_{h,j} = 1 if ŷ_i^h is associated with y_i^j, and 0 otherwise. The set of all valid warping paths connects the endpoints (1,1) to (k,k) with the authorized moves (→, ↓, ↘).

¹ A Seq2Seq architecture was the winner of a 2017 Kaggle competition on multi-step time series forecasting.
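To make the structure of Eq. (1) concrete, here is a minimal, self-contained Python sketch of a DILATE-style loss. It is not the authors' implementation: the shape term uses the soft-DTW recursion of [13], while the temporal term is a simplified TDI-style penalty measuring how far a hard DTW alignment path strays from the diagonal (DILATE itself uses a differentiable relaxation of the optimal path so that both terms admit gradients for backpropagation). The function names, the quadratic penalty (h − j)²/k², and the smoothing parameter `gamma` are illustrative assumptions.

```python
import math

def softmin(vals, gamma):
    # Smooth minimum: -gamma * log(sum_j exp(-vals_j / gamma)).
    # Subtract the minimum before exponentiating for numerical stability.
    m = min(vals)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in vals))

def soft_dtw(y_pred, y_true, gamma=0.1):
    # Soft-DTW [13] between two equal-length 1-D series: the usual DTW
    # dynamic program with min replaced by softmin (shape term sketch).
    k = len(y_pred)
    INF = float("inf")
    R = [[INF] * (k + 1) for _ in range(k + 1)]
    R[0][0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            cost = (y_pred[i - 1] - y_true[j - 1]) ** 2
            R[i][j] = cost + softmin([R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma)
    return R[k][k]

def tdi_penalty(y_pred, y_true):
    # TDI-style temporal term sketch: build the hard DTW table, backtrack
    # the optimal warping path, and penalize its squared deviation from
    # the diagonal, (i - j)^2 / k^2 per path cell. Non-smooth here; the
    # actual DILATE temporal term is a differentiable relaxation.
    k = len(y_pred)
    INF = float("inf")
    D = [[INF] * (k + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            cost = (y_pred[i - 1] - y_true[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (k, k) to (1, 1); tuple comparison prefers the
    # diagonal move on ties, so identical series give zero penalty.
    i, j = k, k
    penalty = (i - j) ** 2 / k ** 2
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j],     i - 1, j),
                      (D[i][j - 1],     i,     j - 1))
        penalty += (i - j) ** 2 / k ** 2
    return penalty

def dilate_loss(y_pred, y_true, alpha=0.5, gamma=0.1):
    # Eq. (1): alpha * L_shape + (1 - alpha) * L_temporal.
    return alpha * soft_dtw(y_pred, y_true, gamma) + (1 - alpha) * tdi_penalty(y_pred, y_true)
```

For a target with a sharp step and a prediction of the same shape delayed by one step, the shape term stays near zero while the temporal term is strictly positive: exactly the disentanglement Eq. (1) is designed to provide.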

Figure 2: Our proposed framework for training deep forecasting models.

[...] and Tensor-Train RNN (TT-RNN) [60], trained for multi-step prediction². Results in Table 4 for the Traffic dataset reveal the superiority of TT-RNN over LSTNet-rec, which shows that dedicated multi-step prediction approaches are better suited for this task. More importantly, we observe that our Seq2Seq DILATE outperforms TT-RNN on all shape and time metrics, although it is inferior on MSE. This highlights the relevance of our DILATE loss function, which makes it possible to reach better performance with simpler architectures.

Eval loss                LSTNet-rec [30]   TT-RNN [60,61]   Seq2Seq DILATE
Euclidean  MSE (x100)    1.74 ± 0.11       0.837 ± 0.106    1.00 ± 0.260
Shape      DTW (x100)    42.0 ± 2.2        25.9 ± 1.99      23.0 ± 1.62
           Ramp (x10)    9.00 ± 0.577      6.71 ± 0.546     5.93 ± 0.235
Time       TDI (x10)     25.7 ± 4.75       17.8 ± 1.73      14.4 ± 1.58
           Hausdorff     2.34 ± 1.41       2.19 ± 0.125     2.13 ± 0.514

Table 4: Comparison with state-of-the-art forecasting architectures trained with MSE on Traffic, averaged over 10 runs (mean ± standard deviation).

5 Conclusion and future work

In this paper we have introduced DILATE, a new differentiable loss function for training deep multi-step time series forecasting models. DILATE combines two terms for precise shape and temporal localization of non-stationary signals with sudden changes. We showed that DILATE is comparable to the standard MSE loss when evaluated on MSE, and far better when evaluated on several shape and timing metrics. DILATE also compares favourably on shape and timing to state-of-the-art forecasting algorithms trained with MSE. For future work, we intend to explore the extension of these ideas to probabilistic forecasting, for example by using Bayesian deep learning [21] to compute the predictive distribution

of trajectories, or by embedding the DILATE loss into a deep state space model architecture suited for probabilistic forecasting. Another interesting direction is to adapt our training scheme to relaxed supervision contexts, e.g. semi-supervised [42] or weakly supervised [16], in order to perform full trajectory forecasting using only categorical labels at training time (e.g. presence or absence of change points).

Acknowledgements. We would like to thank Stéphanie Dubost, Bruno Charbonnier, Christophe Chaussin, Loïc Vallance, Lorenzo Audibert, Nicolas Paul and our anonymous reviewers for their useful feedback and discussions.

² We use the available GitHub code for both methods.

References

[1] Abubakar Abid and James Zou. Learning a warping distance from unlabeled time series using sequence autoencoders. In Advances in Neural Information Processing Systems (NeurIPS), pages 10547–10555, 2018.
[2] Nguyen

Hoang An and Duong Tuan Anh. Comparison of strategies for multi-step-ahead prediction of time series using neural network. In International Conference on Advanced Computing and Applications (ACOMP), pages 142–149. IEEE, 2015.
[3] Yukun Bao, Tao Xiong, and Zhongyi Hu. Multi-step-ahead time series prediction using multiple-output support vector regression. Neurocomputing, 129:482–493, 2014.
[4] Richard Bellman. On the theory of dynamic programming. Proceedings of the National Academy of Sciences of the United States of America, 38(8):716, 1952.
[5] Anastasia Borovykh, Sander Bohte, and Cornelis W Oosterlee. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
[6] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.
[7] Rohitash Chandra, Yew-Soon Ong, and Chi-Keong Goh. Co-evolutionary multi-task learning with predictive recurrence for multi-step chaotic time series prediction. Neurocomputing, 243:21–34, 2017.
[8] Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, and Barnabás Póczos. Kernel change-point detection with auxiliary deep generative models. In International Conference on Learning Representations (ICLR), 2019.
[9] Sucheta Chauhan and Lovekesh Vig. Anomaly detection in ECG time signals via deep long short-term memory networks. In International Conference on Data Science and Advanced Analytics (DSAA), pages 1–7. IEEE, 2015.
[10] Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. The UCR time series classification archive. 2015.
[11] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[12] Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems (NIPS), pages 3504–3512, 2016.
[13] Marco Cuturi and Mathieu Blondel. Soft-DTW: a differentiable loss function for time-series. In International Conference on Machine Learning (ICML), pages 894–903, 2017.
[14] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. Deep learning for event-driven stock prediction. In International Joint Conference on Artificial Intelligence (IJCAI), 2015.
[15] Thibaut Durand, Nicolas Thome, and Matthieu Cord. MANTRA: Minimum maximum latent structural SVM for image classification and ranking. In IEEE International Conference on Computer Vision (ICCV), pages 2713–2721, 2015.
[16] Thibaut Durand, Nicolas Thome, and Matthieu Cord. Exploiting negative evidence for deep latent structured models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2):337–351, 2018.
[17] James Durbin and Siem Jan Koopman. Time series analysis by state space methods. Oxford University Press, 2012.
[18] Anthony Florita, Bri-Mathias Hodge, and Kirsten Orwig. Identifying wind and solar ramping events. In 2013 IEEE Green Technologies Conference (GreenTech), pages 147–152. IEEE, 2013.
[19] Ian Fox, Lynn Ang, Mamta Jaiswal, Rodica Pop-Busui, and Jenna Wiens. Deep multi-output forecasting: Learning to accurately predict blood glucose trajectories. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1387–1395. ACM, 2018.
[20] Laura Frías-Paredes, Fermín Mallor, Martín Gastón-Romeo, and Teresa León. Assessing energy forecasting inaccuracy by simultaneously considering temporal and absolute errors. Energy Conversion and Management, 142:533–546, 2017.
[21] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (ICML), pages 1050–1059, 2016.
[22] Damien Garreau, Sylvain Arlot, et al. Consistent change-point detection with kernels. Electronic Journal of Statistics, 12(2):4440–4486, 2018.
[23] Amir Ghaderi, Borhan M Sanandaji, and Faezeh Ghaderi. Deep forecast: Deep learning-based spatio-temporal forecasting. In ICML Time Series Workshop, 2017.
[24] Agathe Girard and Carl Edward Rasmussen. Multiple-step ahead prediction for non linear dynamic systems – a Gaussian process treatment with propagation of the uncertainty. In Advances in Neural Information Processing Systems (NIPS), volume 15, pages 529–536, 2002.
[25] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997.
[26] Shamina Hussein, Rohitash Chandra, and Anuraganand Sharma. Multi-step-ahead chaotic time series prediction using coevolutionary recurrent neural networks. In IEEE Congress on Evolutionary Computation (CEC), pages 3084–3091. IEEE, 2016.
[27] Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008.
[28] Young-Seon Jeong, Myong K Jeong, and Olufemi A Omitaomu. Weighted dynamic time warping for time series classification. Pattern Recognition, 44:2231–2240, 2011.
[29] Vitaly Kuznetsov and Zelda Mariet. Foundations of sequence-to-sequence modeling for time series. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
[30] Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In ACM SIGIR Conference on Research & Development in Information Retrieval, pages 95–104. ACM, 2018.
[31] Nikolay Laptev, Jason Yosinski, Li Erran Li, and Slawek Smyl. Time-series extreme event forecasting with neural networks at Uber. In International Conference on Machine Learning (ICML), number 34, pages 1–5, 2017.
[32] Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[33] Shuang Li, Yao Xie, Hanjun Dai, and Le Song. M-statistic for kernel change-point detection. In Advances in Neural Information Processing Systems (NIPS), pages 3366–3374, 2015.
[34] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In International Conference on Learning Representations (ICLR), 2018.
[35] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2):865–873, 2015.
[36] Shamsul Masum, Ying Liu, and John Chiverton. Multi-step time series forecasting of electric load using machine learning models. In International Conference on Artificial Intelligence and Soft Computing, pages 148–159. Springer, 2018.
[37] Arthur Mensch and Mathieu Blondel. Differentiable dynamic programming for structured prediction and attention. In International Conference on Machine Learning (ICML), 2018.
[38] Sebastian Nowozin. Optimal decisions from probabilistic models: the intersection-over-union case. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 548–555, 2014.
[39] Yao Qin, Dongjin Song, Haifeng Cheng, Wei Cheng, Guofei Jiang, and Garrison W Cottrell. A dual-stage attention-based recurrent neural network for time series prediction. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2627–2633. AAAI Press, 2017.
[40] Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. In Advances in Neural Information Processing Systems (NeurIPS), pages 7785–7794, 2018.
[41] François Rivest and Richard Kohar. A new timing error cost function for binary time series prediction. IEEE Transactions on Neural Networks and Learning Systems, 2019.
[42] Thomas Robert, Nicolas Thome, and Matthieu Cord. HybridNet: Classification and reconstruction cooperation for semi-supervised learning. In European Conference on Computer Vision (ECCV), pages 153–169, 2018.
[43] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. Readings in Speech Recognition, 159:224, 1990.
[44] David Salinas, Valentin Flunkert, and Jan Gasthaus. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. In International Conference on Machine Learning (ICML), 2017.
[45] Matthias W Seeger, David Salinas, and Valentin Flunkert. Bayesian intermittent demand forecasting for large inventories. In Advances in Neural Information Processing Systems (NIPS), pages 4646–4654, 2016.
[46] Rajat Sen, Hsiang-Fu Yu, and Inderjit Dhillon. Think globally, act locally: a deep neural network approach to high dimensional time series forecasting. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[47] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 3104–3112, 2014.
[48] Souhaib Ben Taieb and Amir F Atiya. A bias and variance analysis for multistep-ahead time series forecasting. IEEE Transactions on Neural Networks and Learning Systems, 27(1):62–76, 2016.
[49] Souhaib Ben Taieb, Gianluca Bontempi, Amir F Atiya, and Antti Sorjamaa. A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Systems with Applications, 39(8):7067–7083, 2012.
[50] Yunzhe Tao, Lin Ma, Weizhong Zhang, Jian Liu, Wei Liu, and Qiang Du. Hierarchical attention-based recurrent highway networks for time series prediction. arXiv preprint arXiv:1806.00685, 2018.
[51] Charles Truong, Laurent Oudre, and Nicolas Vayatis. Supervised kernel change point detection with partial annotations. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3147–3151. IEEE, 2019.
[52] Loïc Vallance, Bruno Charbonnier, Nicolas Paul, Stéphanie Dubost, and Philippe Blanc. Towards a standardized procedure to assess solar forecast accuracy: A new ramp and time alignment metric. Solar Energy, 150:408–422, 2017.
[53] Aäron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
[54] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), 2017.
