Modular Deep Learning for Multivariate Time-Series: Decoupling Imputation and Downstream Tasks

arXiv:2411.03941v3 Announce Type: replace-cross Abstract: Missing values are pervasive in large-scale time-series data, posing challenges for reliable analysis and decision-making. Many neural architectures have been designed to model and impute the complex and heterogeneous missingness patterns ...

arXiv:2411.03941v3 Announce Type: replace-cross Abstract: Missing values are pervasive in large-scale time-series data, posing challenges for reliable analysis and decision-making. Many neural architectures have been designed to model and impute the complex and heterogeneous missingness patterns of such data. Most existing methods are end-to-end, rendering imputation tightly coupled with downstream predictive tasks and leading to limited reusability of the trained model, reduced interpretability, and challenges in assessing model quality. In this paper, we call for a modular approach that decouples imputation and downstream tasks, enabling independent optimisation and greater adaptability. Using the largest open-source Python library for deep learning-based time-series analysis, PyPOTS, we evaluate a modular pipeline across six state-of-the-art models that perform imputation and prediction on seven datasets spanning multiple domains. Our results show that a modular approach maintains high performance while prioritising flexibility and reusability - qualities that are crucial for real-world applications. Through this work, we aim to demonstrate how modularity can benefit multivariate time-series analysis, achieving a balance between performance and adaptability.