Papers
arxiv:2512.11251

Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

Published on Dec 12
· Submitted by
Yunkai Zhang
on Dec 19
Authors:
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

Insight Miner, a large-scale multimodal model, generates high-quality time-series descriptions using a novel agentic workflow and outperforms existing models with the help of the TS-Insights dataset.

AI-generated summary

Time-series data is critical across many scientific and industrial domains, including environmental analysis, agriculture, transportation, and finance. However, mining insights from this data typically requires deep domain expertise, a process that is both time-consuming and labor-intensive. In this paper, we propose Insight Miner, a large-scale multimodal model (LMM) designed to generate high-quality, comprehensive time-series descriptions enriched with domain-specific knowledge. To facilitate this, we introduce TS-InsightsAvailable at \href{https://huggingface.co/datasets/zhykoties/time-series-language-alignment{https://huggingface.co/datasets/zhykoties/time-series-language-alignment}.}, the first general-domain dataset for time series and language alignment. TS-Insights contains 100k time-series windows sampled from 20 forecasting datasets. We construct this dataset using a novel agentic workflow, where we use statistical tools to extract features from raw time series before synthesizing them into coherent trend descriptions with GPT-4. Following instruction tuning on TS-Insights, Insight Miner outperforms state-of-the-art multimodal models, such as LLaVA liu2023llava and GPT-4, in generating time-series descriptions and insights. Our findings suggest a promising direction for leveraging LMMs in time series analysis, and serve as a foundational step toward enabling LLMs to interpret time series as a native input modality.

Community

Paper submitter
•
edited 1 day ago

Can LLMs natively understand time series? We constructed TS-Insights, the first large-scale alignment dataset containing 100k time series and text pairs across 20 domains. We use a novel agentic workflow to synthesize high-quality trend descriptions. Inspired by LLaVA, we showed instruction-tuning on TS-Insights can enable LLMs to understand time series as a native input modality and generate textual descriptions.

  • This work was originally done in Summer 2023.

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/insight-miner-a-time-series-analysis-dataset-for-cross-domain-alignment-with-natural-language-7698-7e56892b

  • Key Findings
  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2512.11251 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2512.11251 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.