I tried running NVIDIA's time series foundation model NV-Tesseract on DGX Spark after it was released as open source

Prompted by the open-source release of NV-Tesseract, I ran NVIDIA's time series foundation model, now distributed under Apache 2.0, on DGX Spark. This article covers the two-module architecture (forecasting and anomaly detection), three axes for evaluating it in practice (hyperparameter tuning, domain adaptation, and interpretability), and the troubleshooting needed to get it running.

森茂洋 / Hiroshi Morishige

2026.07.04


Introduction

Hello, I'm Morishige from Classmethod's Manufacturing Business Technology Department.
NVIDIA has released its time series foundation model NV-Tesseract as open source on GitHub under Apache 2.0. The Hugging Face model weights (nvidia/nv-tesseract-forecasting / nvidia/nv-tesseract-ad-diffusion) are also publicly distributed and can be run directly without an HF token.
https://github.com/NVIDIA/NV-Tesseract
I previously mentioned NV-Tesseract alongside Chronos-2 / TimesFM 2.5 in "Comparing Time Series Foundation Models on DGX Spark" (2026-05-25). At the time it hadn't been publicly released, so that article was limited to introducing official information and Cognite's Celanese case study. Now that it has been released as OSS and can be run locally, this article documents what happened when I ran it on DGX Spark (NVIDIA GB10, aarch64, Blackwell sm_121).
What is NV-Tesseract

NV-Tesseract is NVIDIA's time series foundation model library. The repository's short description presents it as an open-source time series analysis library covering forecasting and anomaly detection. Its stance is to provide both time series forecasting and anomaly detection in a single library.
The repository is broadly composed of two modules.


| Module | Role | Key Technologies |
| --- | --- | --- |
| `forecasting/` | Multivariate time series forecasting + context enhancement + interpretability | MOMENT-1-large encoder + forecasting head, DARR mode, Lag-Horizon Attribution |
| `ad_diffusion/` | Diffusion model-based anomaly detection | Fast sampling with DPM-Solver, adaptive thresholding with SCS / MACS, multi-GPU support |

The repository also contains an ad_transformer/ module intended for univariate anomaly detection and classification, but for now it is only a placeholder with no implementation. I look forward to its release.
Forecasting Module

According to the nvidia/nv-tesseract-forecasting model card on Hugging Face, the base is MOMENT-1-large (a Transformer encoder with approximately 340 million parameters). The encoder and embedder weights stay frozen and only the forecasting head is trained. Supported sequence lengths are 256 / 512 / 1024 / 2048, and the training data comprises 3 million data points from sources such as the Monash Time Series Forecasting Archive, ECL, Traffic, and ETTh.
A distinctive feature is the DARR (Domain-Aware Representation and Retrieval) mode, which uses kNN to retrieve similar past patterns and blends them into predictions. It's a design that suits the use case of "wanting to tune per line without retraining the model each time."
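The retrieve-and-blend idea can be illustrated with a small sketch. This is not NV-Tesseract's implementation: the function name `knn_blend_forecast`, the Euclidean distance, the simple averaging of neighbors, and the `blend_weight` parameter are all assumptions made for illustration. It only shows how stored past patterns could adjust a forecast without retraining the model.

```python
import math
from typing import Sequence


def knn_blend_forecast(
    context: Sequence[float],
    base_forecast: Sequence[float],
    memory: Sequence[tuple[Sequence[float], Sequence[float]]],
    k: int = 3,
    blend_weight: float = 0.3,
) -> list[float]:
    """Blend a model forecast with the mean continuation of the k nearest stored patterns.

    memory holds (past_window, continuation) pairs. Every past_window must have
    the same length as context, and every continuation the same length as
    base_forecast. All names here are hypothetical.
    """
    if not memory or k <= 0:
        return list(base_forecast)

    # Rank stored patterns by Euclidean distance between their window and the context.
    ranked = sorted(memory, key=lambda pair: math.dist(context, pair[0]))
    neighbors = ranked[:k]

    # Average the neighbors' continuations step by step over the horizon.
    retrieved = [
        sum(continuation[h] for _, continuation in neighbors) / len(neighbors)
        for h in range(len(base_forecast))
    ]

    # Convex combination of the model output and the retrieved pattern.
    return [
        (1.0 - blend_weight) * base + blend_weight * hist
        for base, hist in zip(base_forecast, retrieved)
    ]


# Hypothetical per-line memory: two similar patterns and one unrelated pattern.
line_memory = [
    ([1.0, 2.0, 3.0], [4.0, 5.0]),
    ([1.0, 2.0, 3.1], [4.2, 5.2]),
    ([9.0, 9.0, 9.0], [0.0, 0.0]),
]
blended = knn_blend_forecast([1.0, 2.0, 3.0], [4.0, 4.0], line_memory, k=2, blend_weight=0.5)
```

In this toy setup, "tuning per line" amounts to swapping `line_memory` for the patterns collected on a given line while the forecasting model itself stays untouched.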
Another feature is the Lag-Horizon Attribution Matrix interpretability function. When called with interpretability=True, it outputs which past lags contributed how much to each forecasted timestep as a JSON and visual PDF report.
AD-Diffusion Module

The ad_diffusion/ side is diffusion model-based anomaly detection, in the lineage of Memory-Augmented Forecasting / SCS (Segmented Confidence Sequences) from IEEE BigData 2025. You feed it a multivariate time series directly and it returns a per-row anomaly score (MAE-based).
The detection philosophy differs from the "measure by prediction residuals" approach (Chronos-2 / TimesFM family). Because it accepts multivariate input directly and returns an MAE per row, there is no need to hand-build residual design, sliding windows, or PCA aggregation. For a pipeline that must report point anomalies immediately, this reduces implementation cost.
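To make "MAE per row" concrete, here is a minimal sketch of that kind of score. It assumes the score is the mean absolute error across channels between each observed row and a model-produced reconstruction of it; the source only says the score is MAE-based, so that pairing is an assumption. The function names are hypothetical, and the fixed threshold is a stand-in that does not reproduce the library's adaptive thresholding.

```python
from typing import Sequence


def row_mae_scores(
    observed: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> list[float]:
    """Mean absolute error across channels for each row (timestep)."""
    if len(observed) != len(reconstructed):
        raise ValueError("observed and reconstructed must have the same number of rows")
    scores = []
    for obs_row, rec_row in zip(observed, reconstructed):
        if len(obs_row) != len(rec_row):
            raise ValueError("each row must have the same number of channels")
        abs_errors = [abs(o - r) for o, r in zip(obs_row, rec_row)]
        scores.append(sum(abs_errors) / len(abs_errors))
    return scores


def flag_above_threshold(scores: Sequence[float], threshold: float) -> list[int]:
    """Indices of rows whose score exceeds a fixed threshold (illustrative only)."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Two timesteps of three sensors; the second row deviates from its reconstruction.
observed = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
reconstructed = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
scores = row_mae_scores(observed, reconstructed)
anomalous_rows = flag_above_threshold(scores, threshold=1.0)
```

One score per row is what removes the need for a separate residual or aggregation stage: all channels are already folded into a single number per timestep.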
Sampling is compressed to about 20 steps with DPM-Solver (Lu et al., 2022, vendored under the MIT license as third_party/dpm-solver), which makes it relatively fast for a diffusion model. Thresholds are set adaptively from the data by SCS / MACS (Multi-Scale Adaptive Confidence Segments).
Intended Primary Use Cases

To understand NV-Tesseract's design philosophy, it helps to know the use cases it is aimed at. The officially promoted Cognite × NVIDIA integration case (reactor water level prediction for the chemical and specialty materials company Celanese) is emblematic: industrial plants handling massive multivariate data from hundreds to thousands of sensors are the primary target. The details of this case were covered in the previous article, so this article focuses on running the model locally.
https://dev.classmethod.jp/articles/dgx-spark-timeseries-fm-3-bench/
License and Distribution

The license is Apache 2.0, with LICENSE / NOTICE / THIRD_PARTY_LICENSES.md properly maintained. One caveat: although the Hugging Face model card states Apache 2.0, it also carries an operational note of "research and development only," so it is safer to confirm with NVIDIA before production adoption. The license file itself is Apache 2.0, which permits commercial use, so the note may be more a statement of recommended operational scope.
Running on DGX Spark aarch64

The rest of this article is a log of actually running it on DGX Spark. The verification environment is as follows.


| Item | Value |
| --- | --- |
| Hardware | DGX Spark (NVIDIA GB10, aarch64, 128GB UMA) |
| Kernel / Driver | Linux 6.17.0-1021-nvidia / NVIDIA driver 580.159.03 |
| CUDA | 13.0 |
| Python | 3.12.3 |
| Package Manager | uv 0.11.22 (`aarch64-unknown-linux-gnu`) |
| Verification Commit | `1231541` (repository 2.8 MB) |

Setup follows the official README: git clone the repository, then run uv sync in each module.
```bash
git clone https://github.com/NVIDIA/NV-Tesseract.git
cd NV-Tesseract

# venv for forecasting
cd forecasting
uv sync --python 3.12
cd ..

# venv for ad_diffusion
cd ad_diffusion
uv sync --python 3.12
```
forecasting/ and ad_diffusion/ each have their own independent venv, with torch dependency specifications managed separately. This difference in dependency specifications turned out to be the deciding factor for "whether the GPU could be used," as described below.
Running forecasting/sdk/quick_example.py

The forecasting side can be verified with the included forecasting/sdk/quick_example.py alone. The sample runs three modes in sequence on the included ETTh-type data: standard forecasting, DARR mode, and interpretability mode.
```bash
cd forecasting
uv run python sdk/quick_example.py
```
Running it auto-downloads three files from Hugging Face (standardizer.pkl / moment_head_512_6hr.pt / run8_best_model_cr.pt) and outputs a forecast CSV for each mode. When interpretability mode is enabled, explanation.json + explanation_report.pdf + lag_horizon_*.csv are generated under sdk/interpretability_output/run_<timestamp>/, visualizing the contribution of each lag to each forecasted timestep.
However, at the time of verification (commit 1231541), the run took 6 minutes and 5 seconds on my machine. The cause was in forecasting/pyproject.toml, which at that time specified torch>=2.0.0 plus a pin to torch==2.4.0 with the mac-mps extras. The general PyPI build of torch==2.4.0 fell back to CPU-only mode on aarch64 + Blackwell: torch.cuda.is_available() returned False, meaning the GB10's GPU was not being used.

```python
>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.__version__
'2.4.0+cpu'
```
The fix is to loosen the torch dependency range on the forecasting side to torch>=2.10.0 and point to the cu130 wheel index, which enables GPU operation (torch 2.12.1+cu130 recognizes sm_121). Re-running quick_example.py in this state completed in 19 seconds with weights already cached, approximately 19x faster than the 6 minutes 5 seconds (365 seconds) of the CPU fallback. A single dependency specification makes that much difference. The ad_diffusion side (described below) specifies torch>=1.13.0 with no pin, so it was already using the GPU; the contrast also helps isolate this as a dependency specification issue.
Note that community PR #24 (loosening to torch>=2.7.0 and removing the mac-mps pin) addressed this issue, and I contributed a comment with GB10 operation data. It was merged upstream while I was writing this article (2026-07-03). After a fresh clone of the post-merge main and a clean uv sync, torch 2.12.1+cu130 was resolved and the GPU was recognized without any changes; quick_example.py completed in a similar 17 seconds with weights already cached. Anyone trying this now should be able to avoid this pitfall.
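To catch this kind of CPU fallback before a long run, a small check along the following lines may help. It is only a sketch: the helper names and the file name `check_torch.py` are hypothetical, and it relies solely on `torch.__version__`, `torch.cuda.is_available()`, and `torch.cuda.get_device_capability()`, the same calls used in this article. A `+cpu` suffix is a useful hint, but a wheel without that suffix can still lack CUDA support, so `torch.cuda.is_available()` remains the deciding check.

```python
import importlib.util


def parse_torch_version(version: str) -> tuple[str, str]:
    """Split '2.4.0+cpu' into ('2.4.0', 'cpu'); the build tag is '' when absent."""
    release, _, build_tag = version.partition("+")
    return release, build_tag


def is_cpu_only_build(version: str) -> bool:
    """True when the version string carries an explicit '+cpu' build tag."""
    return parse_torch_version(version)[1] == "cpu"


def report_torch_gpu() -> str:
    """Summarize the installed torch build and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch  # imported lazily so the helpers above work without torch

    lines = [f"torch {torch.__version__}"]
    if is_cpu_only_build(torch.__version__):
        lines.append("CPU-only wheel detected: check the torch pin and wheel index")
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        lines.append(f"CUDA available, compute capability sm_{major}{minor}")
    else:
        lines.append("CUDA not available: inference will run on the CPU")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report_torch_gpu())
```

Saved as `check_torch.py` inside a module directory, it can be run with `uv run python check_torch.py` so that it inspects that module's own venv rather than the system Python.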
Running ad_diffusion/examples/quick_example.py

For the ad_diffusion side, run ad_diffusion/examples/quick_example.py. The sample auto-generates sample_timeseries.csv (500 samples × 3 sensors) and performs anomaly detection with 20-step DPM-Solver sampling and SCS adaptive thresholding.
```bash
cd ad_diffusion
uv run python examples/quick_example.py
```
This worked fully with CUDA 13 + sm_121 (Blackwell) without any modifications. With torch 2.11.0+cu130 + triton 3.6.0, torch.cuda.is_available() = True and torch.cuda.get_device_capability() = (12, 1) were returned, and inference ran on the GPU.
Execution time was 34 seconds, with roughly the following breakdown.


| Phase | Time |
| --- | --- |
| Weight download from Hugging Face | 5 seconds |
| Synthetic data generation | A few seconds |
| DPM-Solver 20-step inference | 26 seconds |
| Post-processing + output | A few seconds |

500 samples × 3 sensors in 26 seconds translates to approximately 52 ms/sample. Detection results found 16 anomalies out of 500 samples (3.20%). Since this is a demo where ground truth labels are also auto-generated alongside the synthetic data, the accuracy numbers need to be re-evaluated on real data, but it's sufficient to get a feel for the speed of point anomaly detection.
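The derived figures in this article are simple ratios of the measured values, and the short sketch below reproduces them. The function names are hypothetical. Note that the per-sample latency uses only the 26-second inference phase, not the full 34-second run.

```python
def ms_per_sample(inference_seconds: float, n_samples: int) -> float:
    """Average inference latency per sample in milliseconds."""
    return inference_seconds * 1000.0 / n_samples


def anomaly_rate_percent(n_anomalies: int, n_samples: int) -> float:
    """Share of samples flagged as anomalous, in percent."""
    return 100.0 * n_anomalies / n_samples


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the second run is than the first."""
    return before_seconds / after_seconds


print(ms_per_sample(26, 500))              # 52.0 ms/sample for ad_diffusion
print(anomaly_rate_percent(16, 500))       # 3.2 % of samples flagged
print(round(speedup(6 * 60 + 5, 19), 1))   # 19.2x, forecasting CPU fallback vs GPU
```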
Addendum (2026-07-06): Also Tried with Official Test Dataset

At the time of the verification commit in the main text, the example only auto-generated synthetic data, but the current main now includes an official test dataset for ad_diffusion at examples/datasets/test-dataset.csv (a univariate CSV with 500 samples). I took the opportunity to fresh clone the latest main (commit 9b31ca4) and re-run everything from a clean uv sync.
```bash
cd ad_diffusion
uv sync
uv run python examples/quick_example.py --dataset-path examples/datasets/test-dataset.csv
```
Even in an environment without an HF token configured, the weight final_model.pth was automatically downloaded from the public repository. The entire execution took 36 seconds, with the DPM-Solver 20-step inference completing in 26 seconds. Detection results found 15 anomalies out of 500 samples (3.00%), with an average MAE score of 0.948 and a maximum of 1.324. At approximately 53 ms/sample, the speed is nearly identical to the synthetic data measurement in the main text.
One thing to note is that since the custom CSV has no ground truth, Precision / Recall / F1 evaluation is skipped. This behavior is also explicitly noted in the repository's README. Also, looking at the execution logs, the sequential ts column was being counted as an analysis target channel, so when passing real data, it's best to drop numerical timestamp columns like epoch seconds or sequential numbers beforehand.
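Dropping such a column beforehand takes only a few lines with the standard library. The sketch below is one way to do it: the function name `drop_columns` and the output file name are hypothetical, while the `ts` column name comes from the execution log mentioned above. It assumes the CSV has a header row.

```python
import csv
from pathlib import Path


def drop_columns(src: Path, dst: Path, columns_to_drop: set[str]) -> list[str]:
    """Copy a CSV to dst without the named columns and return the kept header."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        if reader.fieldnames is None:
            raise ValueError(f"{src} has no header row")
        kept = [name for name in reader.fieldnames if name not in columns_to_drop]
        # extrasaction="ignore" silently skips the dropped keys in each row dict.
        writer = csv.DictWriter(fout, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return kept


if __name__ == "__main__":
    source = Path("examples/datasets/test-dataset.csv")
    if source.exists():
        # Hypothetical output name; pass it to quick_example.py via --dataset-path.
        print(drop_columns(source, Path("test-dataset-no-ts.csv"), {"ts"}))
```

The same helper works for epoch-second columns in real plant data: add their names to `columns_to_drop` so that only actual sensor channels reach the detector.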
What to Look at for Business Use

Having now had hands-on time with NV-Tesseract, I find it easiest to evaluate for business use along three axes: degree of tuning freedom, domain adaptation, and interpretability.
Fine-tunable with Hyperparameter Adjustments

You can tune inference parameters and post-processing to your specific use case. The 52 ms/sample figure measured earlier is with default settings, and for AD-Diffusion there is still room to optimize through the DPM sampling step count, FP8 / INT8 quantization, and batching. Being able to write your own threshold, window, and post-processing aggregation and put the results on the same evaluation axis as Chronos-2 / TimesFM is another freedom that comes with OSS.
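As one example of a hand-written post-processing step, the sketch below merges per-row anomaly flags into events and keeps only runs of at least `min_run` consecutive flagged rows. This is an illustrative aggregation rule of my own choosing, not something NV-Tesseract ships; the function name and the `min_run` parameter are hypothetical.

```python
from typing import Sequence


def flags_to_events(flags: Sequence[bool], min_run: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of at
    least min_run consecutive flagged rows."""
    events: list[tuple[int, int]] = []
    run_start = None
    for i, flagged in enumerate(flags):
        if flagged and run_start is None:
            run_start = i  # a new run begins
        elif not flagged and run_start is not None:
            if i - run_start >= min_run:
                events.append((run_start, i))
            run_start = None  # the run ended
    # Close a run that lasts until the final row.
    if run_start is not None and len(flags) - run_start >= min_run:
        events.append((run_start, len(flags)))
    return events


row_flags = [False, True, True, True, False, True, False, True, True]
events = flags_to_events(row_flags, min_run=2)
```

Applying the same rule to flags derived from Chronos-2 / TimesFM residuals would be one way to put the models on a common evaluation axis.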
Accuracy Can Be Improved Through Domain Adaptation (Fine-tuning)

An examples/finetune_example.py is included for both forecasting and ad_diffusion, and warm-starting the Cross-Channel head (run8_best_model_cr.pt) is supported as standard. Comparing foundation models on public benchmarks (mild, general-purpose time series such as ETTh1) gives a raw win/loss result, but even a model that loses there can be fine-tuned on your own domain data and adapted toward your intended use. As mentioned earlier, NV-Tesseract's primary target is massive multivariate sensor data, so I think it is better evaluated after domain adaptation than forced into a comparison on general benchmarks.
Interpretability Reports Can Address Accountability Requirements

As seen in the forecasting quick_example, simply adding interpretability=True outputs lag contributions as JSON and PDF reports. In manufacturing settings where you need to explain why a particular prediction was made, being able to produce interpretability materials alongside the model output is a strength. With standalone foundation models like Chronos-2 / TimesFM 2.5, you would need to set up SHAP or attention visualization separately, so this difference can be significant in practice.
Summary

NV-Tesseract has now been released as OSS under Apache 2.0 with the essentials in place: public distribution on Hugging Face, an included examples/finetune_example.py, and interpretability via Lag-Horizon Attribution. I confirmed that both forecasting and ad_diffusion run on actual DGX Spark hardware, so it has joined the lineup of time series foundation model options.
Looking at the NVIDIA ecosystem as a whole, you can now sketch a three-layer architecture: DAQIRI (an I/O platform for zero-copy streaming from sensors to GPU memory) at the data acquisition layer, external platforms like Cognite Data Fusion at the data infrastructure layer, and NV-Tesseract at the inference layer. The vision of building "sensors → zero-copy GPU transfer → time series inference → knowledge graph integration → agent automation" in a single stack for manufacturing environments has become more realistic with this OSS release.
Reference Links

NVIDIA Official
NVIDIA/NV-Tesseract GitHub Repository
Hugging Face: nvidia/nv-tesseract-forecasting
Hugging Face: nvidia/nv-tesseract-ad-diffusion
New NVIDIA NV-Tesseract Time Series Models (NVIDIA Developer Blog)
NVIDIA DAQIRI: Sensor → GPU zero-copy I/O platform

Cognite × NVIDIA Integration
Cognite and NVIDIA Operationalize NV-Tesseract to Transform Industrial Forecasting (announced 2026-03-16, includes Celanese case study)

Existing Series
Comparing Time Series Foundation Models on DGX Spark
Predicting PLC-style Time Series Data with Chronos-2 and Generating Maintenance Comments with Nemotron
Trying Industrial Sensor Anomaly Detection with SKAB and Time Series Foundation Models

