
I tried running NVIDIA's time series foundation model NV-Tesseract on DGX Spark after it was released as open source
Introduction
Hello, I'm Morishige from Classmethod's Manufacturing Business Technology Department.
NVIDIA has publicly released the time series foundation model NV-Tesseract as OSS on GitHub under Apache 2.0. The model weights on Hugging Face (nvidia/nv-tesseract-forecasting / nvidia/nv-tesseract-ad-diffusion) are also publicly distributed and can be run as-is without an HF token.
I previously touched on NV-Tesseract in Comparing Time Series Foundation Models on DGX Spark (2026-05-25), alongside Chronos-2 and TimesFM 2.5. At the time it was not yet publicly available, so the article was limited to introducing official information and Cognite's Celanese use case. Now that it has been released as OSS and can be run locally, I've organized the logs from running it on DGX Spark (NVIDIA GB10, aarch64, Blackwell sm_121).
What is NV-Tesseract
NV-Tesseract is NVIDIA's time series foundation model library. The repository describes itself briefly as an open-source time series analysis library covering forecasting and anomaly detection, and it provides both "time series forecasting" and "anomaly detection" in a single library.
The repository is broadly composed of two modules.
| Module | Role | Key Technologies |
|---|---|---|
| forecasting/ | Multivariate time series forecasting + context augmentation + interpretability | MOMENT-1-large encoder + forecasting head, DARR mode, Lag-Horizon Attribution |
| ad_diffusion/ | Diffusion model-based anomaly detection | Fast sampling via DPM-Solver, adaptive thresholding with SCS / MACS, multi-GPU support |
The repository also contains an ad_transformer/ module for univariate anomaly detection + classification, but at this point it is only a placeholder with no implementation. I'm looking forward to its future release.
Forecasting Module
Looking at the nvidia/nv-tesseract-forecasting model card on Hugging Face, the base is MOMENT-1-large (a Transformer encoder with approximately 340 million parameters), designed so that only the forecasting head is trained while the encoder and embedder weights are frozen. Supported sequence lengths are 256 / 512 / 1024 / 2048, and the training data includes the Monash Time Series Forecasting Archive / ECL / Traffic / ETTh and others, totaling 3 million data points.
A distinctive feature is the DARR (Domain-Aware Representation and Retrieval) mode, which retrieves similar past patterns via kNN and blends them into predictions. It's a design that fits use cases where you want to tune per line but don't want to retrain the model each time.
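To make the "retrieve similar past patterns via kNN and blend them into predictions" idea concrete, here is a minimal NumPy sketch. It is not NV-Tesseract's DARR implementation: the function name `knn_blend_forecast`, the Euclidean distance, and the `alpha` blend weight are all assumptions chosen for illustration.

```python
import numpy as np

def knn_blend_forecast(context, base_forecast, history, horizon, k=3, alpha=0.5):
    """Blend a base forecast with the mean continuation of the k most similar past windows.

    context:       1-D array, the most recent window given to the forecaster
    base_forecast: 1-D array of length `horizon` produced by any forecaster
    history:       1-D array of past observations to retrieve from
    """
    context = np.asarray(context, dtype=float)
    history = np.asarray(history, dtype=float)
    base_forecast = np.asarray(base_forecast, dtype=float)
    w = len(context)
    n_windows = len(history) - w - horizon + 1
    if n_windows < k:
        raise ValueError("history is too short for the requested k")
    # Candidate windows and the values that followed each of them.
    windows = np.stack([history[i:i + w] for i in range(n_windows)])
    futures = np.stack([history[i + w:i + w + horizon] for i in range(n_windows)])
    # Euclidean distance between the current context and every past window.
    dists = np.linalg.norm(windows - context, axis=1)
    nearest = np.argsort(dists)[:k]
    retrieved = futures[nearest].mean(axis=0)
    # alpha=0 keeps the base forecast, alpha=1 uses the retrieved continuation only.
    return (1.0 - alpha) * base_forecast + alpha * retrieved

# Toy usage: a periodic signal with a naive "repeat the last value" base forecast.
t = np.arange(400)
series = np.sin(2 * np.pi * t / 50)
history, context = series[:350], series[350:374]
base = np.full(6, context[-1])
blended = knn_blend_forecast(context, base, history, horizon=6, k=3, alpha=0.5)
print(blended.shape)
```

The point of the design is that the retrieval store (`history` here) can be swapped per production line without touching the model weights.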
Another feature is the Lag-Horizon Attribution Matrix interpretability function. When called with interpretability=True, it outputs how much each past lag contributed to each prediction timestep as a JSON and visual PDF report.
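As a rough picture of what a lag-by-horizon matrix contains, the sketch below computes one with a simple occlusion approach: replace one past value at a time and measure how much each forecast step moves. The article's sources do not describe how NV-Tesseract computes its matrix, so occlusion is a stand-in technique here, and `lag_horizon_attribution` and `toy_forecast` are hypothetical names.

```python
import numpy as np

def lag_horizon_attribution(forecast_fn, context, baseline=None):
    """Occlusion-style attribution. Rows are lags (row 0 = most recent), columns are horizon steps."""
    context = np.asarray(context, dtype=float)
    fill = context.mean() if baseline is None else baseline
    reference = np.asarray(forecast_fn(context), dtype=float)
    matrix = np.zeros((len(context), len(reference)))
    for lag in range(1, len(context) + 1):
        perturbed = context.copy()
        perturbed[-lag] = fill  # replace the value observed `lag` steps ago
        shifted = np.asarray(forecast_fn(perturbed), dtype=float)
        matrix[lag - 1] = np.abs(shifted - reference)
    return matrix

# Toy forecaster (hypothetical): horizon step 1 depends on lag 1, step 2 on lag 3.
def toy_forecast(x):
    return np.array([0.9 * x[-1], 0.5 * x[-3]])

ctx = np.array([1.0, 4.0, 2.0, 8.0, 3.0])
attr = lag_horizon_attribution(toy_forecast, ctx)
print(attr.round(2))
```

In this toy case only two cells are non-zero (lag 1 for the first step, lag 3 for the second), which is the kind of structure such a report is meant to surface.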
AD-Diffusion Module
The ad_diffusion/ side is diffusion model-based anomaly detection, tracing its academic lineage to IEEE BigData 2025's Memory-Augmented Forecasting / SCS (Segmented Confidence Sequences) line. Feeding in multivariate time series directly returns row-wise anomaly scores (MAE-based).
The detection philosophy differs from the "measure by prediction residuals" approach (Chronos-2 / TimesFM style). Because it accepts multivariate data directly and returns an MAE per row, there's no need to design your own residuals, sliding windows, or PCA aggregation, which lowers implementation costs when building a pipeline that has to report point anomalies in real time.
Sampling is compressed to around 20 steps via DPM-Solver (Lu et al., 2022, vendored as third_party/dpm-solver under the MIT license), which is on the faster end for diffusion models. Thresholds are adaptively determined by SCS / MACS (Multi-Scale Adaptive Confidence Segments), automatically adjusting thresholds based on the data.
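The scoring and thresholding flow described above can be sketched in a few lines of NumPy. This is only an illustration under loud assumptions: a known clean signal stands in for the diffusion model's reconstruction, and a per-segment median + MAD rule stands in for SCS / MACS, whose actual algorithms are not reproduced here. The function names are hypothetical.

```python
import numpy as np

def rowwise_mae(observed, reconstructed):
    """Mean absolute error across sensors for each timestamp (row)."""
    diff = np.asarray(observed, dtype=float) - np.asarray(reconstructed, dtype=float)
    return np.abs(diff).mean(axis=1)

def segment_thresholds(scores, segment_len=100, n_mads=6.0):
    """Per-segment threshold: median + n_mads * MAD, computed separately for each segment."""
    scores = np.asarray(scores, dtype=float)
    thresholds = np.empty_like(scores)
    for start in range(0, len(scores), segment_len):
        seg = scores[start:start + segment_len]
        med = np.median(seg)
        mad = np.median(np.abs(seg - med))
        thresholds[start:start + segment_len] = med + n_mads * mad
    return thresholds

# Synthetic data: 500 rows x 3 sensors, with two injected point anomalies.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 20, 500))[:, None] * np.ones((1, 3))
observed = clean + rng.normal(0, 0.05, size=(500, 3))
observed[120] += 2.0     # spike on every sensor
observed[400, 1] += 3.0  # spike on a single sensor

scores = rowwise_mae(observed, clean)  # `clean` plays the role of the reconstruction
flags = scores > segment_thresholds(scores)
print(np.flatnonzero(flags))
```

Because the threshold is recomputed per segment, it follows the local noise level instead of relying on one global cutoff, which is the general idea behind adaptive thresholding.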
Anticipated Primary Use Cases
To understand the design philosophy of NV-Tesseract, it helps to know its anticipated primary use cases. The officially promoted Cognite × NVIDIA integration case study (reaction tank water level prediction for Celanese, a chemical and specialty materials company) is representative: the primary target is industrial plants handling massive multivariate data from hundreds to thousands of sensors. The previous article covers this case study in detail, so this article focuses on running the model locally.
License and Distribution
The license is Apache 2.0, with LICENSE / NOTICE / THIRD_PARTY_LICENSES.md properly managed. One caveat: while the Hugging Face model card body states Apache 2.0, an operational note saying "research and development only" is also written alongside it, so when adopting for production use, it's safer to get final confirmation from NVIDIA (the license file itself is Apache 2.0 with conditions permitting commercial use; the note may just be a nuance regarding the recommended operational scope).
Running It on DGX Spark aarch64
The following are logs from actually running it on DGX Spark. The verification environment is as follows.
| Item | Value |
|---|---|
| Hardware | DGX Spark (NVIDIA GB10, aarch64, 128GB UMA) |
| Kernel / Driver | Linux 6.17.0-1021-nvidia / NVIDIA driver 580.159.03 |
| CUDA | 13.0 |
| Python | 3.12.3 |
| Package Manager | uv 0.11.22 (aarch64-unknown-linux-gnu) |
| Verification Commit | 1231541 (repository 2.8 MB) |
Setup follows the official README — just git clone and run uv sync for each module.
git clone https://github.com/NVIDIA/NV-Tesseract.git
cd NV-Tesseract
# venv for forecasting
cd forecasting
uv sync --python 3.12
cd ..
# venv for ad_diffusion
cd ad_diffusion
uv sync --python 3.12
forecasting/ and ad_diffusion/ are each configured to have independent venvs, with torch dependency specifications managed separately. This difference in dependency specifications was the deciding factor in whether or not the GPU could be utilized, as described later.
Running forecasting/sdk/quick_example.py
The bundled forecasting/sdk/quick_example.py is enough to verify that the forecasting side works. It is a sample that runs three modes in sequence (standard forecasting, DARR mode, and interpretability mode) against bundled ETTh-type data.
cd forecasting
uv run python sdk/quick_example.py
Upon execution, three files are auto-downloaded from Hugging Face — standardizer.pkl / moment_head_512_6hr.pt / run8_best_model_cr.pt — and forecast CSVs are output for each mode. When interpretability mode is enabled, explanation.json + explanation_report.pdf + lag_horizon_*.csv are generated under sdk/interpretability_output/run_<timestamp>/, visualizing lag contribution for each prediction timestep.
However, at the time of verification (commit 1231541), it took 6 minutes and 5 seconds on my machine. Investigating the reason, I found that the forecasting/pyproject.toml at that time specified torch as >=2.0.0 plus a pin to torch==2.4.0 with the mac-mps extras, and on aarch64 + Blackwell the general PyPI build of torch==2.4.0 was CPU-only. torch.cuda.is_available() returned False, meaning the GB10's GPU was not being used.
>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.__version__
'2.4.0+cpu'
The fix is to relax the torch dependency range on the forecasting side to torch>=2.10.0 and apply the cu130 wheel index, which enables GPU operation (torch 2.12.1+cu130 recognizes sm_121). Re-running quick_example.py in this state completes in 19 seconds with weights already cached — roughly 19 times faster than the 6 minutes 5 seconds of the CPU fallback, showing how much a single dependency specification change matters. The ad_diffusion side (described later) has a dependency range of torch>=1.13.0 with no pinning, so the GPU was available there, and that contrast confirms the issue was with the dependency specification.
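A quick way to confirm which build a venv actually resolved is to look at the local tag of the version string (`+cpu` versus `+cu130`) together with the CUDA checks used above. The helper below is a small sketch for that purpose; `build_flavor` and `describe_torch` are names I made up, and it only relies on `torch.__version__`, `torch.cuda.is_available()`, and `torch.cuda.get_device_capability()`.

```python
import importlib.util

def build_flavor(version: str) -> str:
    """Return the local build tag of a version string, e.g. 'cpu' or 'cu130'."""
    _, sep, local = version.partition("+")
    return local if sep else "unknown"

def describe_torch() -> dict:
    """Summarize the installed torch build, or report that torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch
    info = {
        "installed": True,
        "version": torch.__version__,
        "flavor": build_flavor(torch.__version__),
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # (major, minor) compute capability of the current device
        info["capability"] = torch.cuda.get_device_capability()
    return info

print(build_flavor("2.4.0+cpu"))     # the CPU-only build seen before the fix
print(build_flavor("2.12.1+cu130"))  # the CUDA 13.0 build seen after the fix
print(describe_torch())
```

Running it inside each module's venv (for example via `uv run python`) makes the difference between the two dependency specifications visible at a glance.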
Note that community PR #24 (relaxing to torch>=2.7.0 and removing the mac-mps pin) addressed this issue, and I had also joined the comments with GB10 operation data. It was merged upstream while I was writing this article (2026-07-03). After a fresh clone of the post-merge main and a clean uv sync, torch 2.12.1+cu130 was resolved and the GPU was recognized without any issues, and quick_example.py completed in a comparable 17 seconds with weights already cached. Anyone trying it from now on should be able to avoid this pitfall.
Running ad_diffusion/examples/quick_example.py
For the ad_diffusion side, run ad_diffusion/examples/quick_example.py. It's a sample that auto-generates sample_timeseries.csv (500 samples × 3 sensors) and performs anomaly detection with DPM-Solver 20 steps + SCS adaptive thresholding.
cd ad_diffusion
uv run python examples/quick_example.py
This ran fully on CUDA 13 + sm_121 (Blackwell) without any modifications. With torch 2.11.0+cu130 + triton 3.6.0, torch.cuda.is_available() = True and torch.cuda.get_device_capability() = (12, 1) were returned, and inference ran on the GPU.
Execution time was 34 seconds, with the breakdown approximately as follows.
| Phase | Time |
|---|---|
| Weight download from Hugging Face | 5 seconds |
| Synthetic data generation | A few seconds |
| DPM-Solver 20-step inference | 26 seconds |
| Post-processing + result output | A few seconds |
500 samples × 3 sensors in 26 seconds works out to approximately 52 ms/sample. Detection results were 16 out of 500 samples (3.20%). Since the ground truth labels are auto-generated together with the synthetic data in this demo, the accuracy figures need to be re-evaluated with real data, but it's sufficient for getting a feel for the speed of point anomaly detection.
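The figures above are simple ratios, and the same bookkeeping extends to precision and recall once real labels are available. The helpers below are a plain-Python sketch; the function names and the five-element label arrays are made up for illustration and are not results from the demo.

```python
def throughput_ms_per_sample(inference_seconds, n_samples):
    """Average inference time per sample in milliseconds."""
    return 1000.0 * inference_seconds / n_samples

def detection_rate(n_flagged, n_samples):
    """Share of samples flagged as anomalous, in percent."""
    return 100.0 * n_flagged / n_samples

def precision_recall(y_true, y_pred):
    """Precision and recall for binary anomaly labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(throughput_ms_per_sample(26, 500))  # 52.0
print(detection_rate(16, 500))            # 3.2

# Toy labels only, to show the call shape.
toy_true = [0, 1, 1, 0, 0]
toy_pred = [0, 1, 0, 1, 0]
print(precision_recall(toy_true, toy_pred))  # (0.5, 0.5)
```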
What to Look At When Using It for Business
Now that I've been able to get hands-on with NV-Tesseract for the first time, I find it easiest to evaluate for business use along three axes: degree of tuning freedom, domain adaptation, and interpretability.
Fine-tuning Hyperparameters in Detail
You can finely adjust inference parameters and post-processing to suit your use case. The measured 52 ms/sample mentioned earlier is with default settings, and there's still room to optimize for AD-Diffusion by adjusting DPM sampling steps, FP8/INT8 quantization, and batching. The freedom that comes with OSS also allows you to write your own threshold, window, and post-processing aggregation to align with the same evaluation axes as Chronos-2 / TimesFM.
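As one example of a self-written post-processing step, row-wise flags can be merged into events so that a burst of adjacent detections is reported once. The sketch below is my own illustration, not something bundled with NV-Tesseract; `flags_to_events` and `max_gap` are hypothetical names.

```python
def flags_to_events(flags, max_gap=0):
    """Merge flagged row indices into (start, end) intervals.

    Flagged rows separated by at most `max_gap` unflagged rows are joined into one event.
    """
    events = []
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        if events and i - events[-1][1] <= max_gap + 1:
            events[-1][1] = i  # extend the current event
        else:
            events.append([i, i])  # start a new event
    return [tuple(e) for e in events]

row_flags = [0, 1, 1, 0, 1, 0, 0, 0, 1]
print(flags_to_events(row_flags))             # [(1, 2), (4, 4), (8, 8)]
print(flags_to_events(row_flags, max_gap=1))  # [(1, 4), (8, 8)]
```

Aggregating at the event level like this is also a convenient way to line results up with residual-based detectors that report windows instead of single rows.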
Improving Accuracy Through Domain Adaptation (Fine-tuning)
examples/finetune_example.py is bundled for both forecasting and ad_diffusion, with standard support for warm-starting the Cross-Channel head (run8_best_model_cr.pt). When comparing foundation models on public benchmarks (mild general-purpose time series like ETTh1), raw win/loss results may vary, but even if you lose there, you have the option of fine-tuning on your own domain data to adapt it toward your primary use case. As mentioned earlier, NV-Tesseract's intended primary use case is large-scale multivariate sensor data, so I think it's a design where comparing with domain adaptation is more revealing of its true nature than forcing comparisons on general benchmarks.
Meeting Accountability Requirements with Interpretability Reports
As seen in the forecasting quick_example, simply adding interpretability=True outputs lag contributions as a JSON and PDF report. In manufacturing settings where you need to explain "why did it produce this forecast," the ability to output interpretable materials alongside model outputs is a strength. With foundation model-only approaches like Chronos-2 / TimesFM 2.5, you need to set up SHAP or attention visualization separately, so this difference matters in practice.
Summary
NV-Tesseract has now been released as OSS under Apache 2.0, provided with all elements in place: public Hugging Face distribution, bundled examples/finetune_example.py, and interpretability via Lag-Horizon Attribution. I was able to confirm that both forecasting and ad_diffusion run on actual DGX Spark hardware, and it has joined the lineup of time series foundation model options.
Looking at the NVIDIA ecosystem as a whole, a three-layer structure is taking shape: DAQIRI (an I/O platform that streams from sensors to GPU memory with zero-copy) at the data acquisition layer, external platforms like Cognite Data Fusion at the data infrastructure layer, and NV-Tesseract at the inference layer. The vision of building "sensor → zero-copy GPU transfer → time series inference → knowledge graph integration → agent automation" as a single stack in manufacturing settings is starting to become a realistic prospect, spurred by this OSS release.
Reference Links
NVIDIA Official
- NVIDIA/NV-Tesseract GitHub Repository
- Hugging Face: nvidia/nv-tesseract-forecasting
- Hugging Face: nvidia/nv-tesseract-ad-diffusion
- New NVIDIA NV-Tesseract Time Series Models (NVIDIA Developer Blog)
- NVIDIA DAQIRI — Sensor → GPU zero-copy I/O platform
Cognite × NVIDIA Integration
- Cognite and NVIDIA Operationalize NV-Tesseract to Transform Industrial Forecasting (Announced 2026-03-16, includes Celanese case study)
