Detecting documentation gaps before AI model release? A breakdown of how the NVIDIA MCG Toolkit works

2026.05.30

Hello, I'm Mori Shige from Classmethod's Manufacturing Business Technology Department.

A post titled "How to Automate AI Model Documentation with the NVIDIA MCG Toolkit" was published on NVIDIA's technical blog. It introduces a tool for automatically generating AI model documentation (so-called model cards). As I read through it, I found myself thinking, "This is interesting not just as a generation tool, but as a way to find gaps in documentation before release," so I decided to organize my thoughts on it.

https://developer.nvidia.com/blog/how-to-automate-ai-model-documentation-with-the-nvidia-mcg-toolkit/

Since the MCG Toolkit itself is still in early access and I cannot yet run it locally, this article takes a slightly broader perspective than a tool introduction and focuses on how to systematize accountability when publishing or adopting AI models.

The Era of Requiring Pre-Release Documentation for AI Models Has Arrived

Writing READMEs and API documentation when releasing software has become standard practice. The same thing is now beginning to be required of AI models as well.

The backdrop is regulatory movement. California's AB-2013 (Generative Artificial Intelligence: Training Data Transparency Act) took effect on January 1, 2026, and requires generative AI developers to disclose information such as an overview of training data, the number of data points, the presence of copyrighted works, and how personal information is handled. The EU AI Act likewise imposes transparency obligations and requires publication of copyright summaries for training data. The trend is solidifying: simply "building and releasing a model" is no longer sufficient.

The items that need to be explainable include roughly the following:

  • What the model is for (intended use)
  • What data it was trained on
  • Which use cases it is and isn't suited for
  • What its limitations and risks are
  • How bias, privacy, and safety are handled

The document that organizes these has traditionally been called a "model card." NVIDIA has been pursuing transparency efforts through model cards for some time, and the MCG Toolkit is a natural extension of that. The intended audience is not limited to developers. It also includes procurement managers, risk assessors, and policy staff, the people who decide whether or not to adopt a model. This is where AI model documentation differs slightly from software READMEs: it is increasingly written with the assumption that it will be audited.

Model Cards and the Model Card++ Format

Traditional model cards were documents that summarized a model's use cases, performance, constraints, and license on a single sheet. What the MCG Toolkit generates is an expanded format called Model Card++.

Model Card++ consists of an Overview plus four sub-cards: Bias, Explainability, Privacy, and Safety & Security.

The key point is that it doesn't stop at a simple model overview—it has bias, explainability, privacy, and safety and security as independent sections covering risk and accountability. These are precisely the areas that regulations care about, so it makes sense that the format is built around filling in these four sections from the start.

One more detail that's small but important: the output passes through a CycloneDX-compliant structured JSON before being converted to Markdown. CycloneDX is the standard used in SBOM (Software Bill of Materials), which makes it easier to integrate AI model documentation in a machine-readable format with other systems. The idea is that you can produce both human-readable Markdown and machine-processable JSON.
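To make the "both JSON and Markdown" idea concrete, here is a minimal Python sketch of my own. It assumes a small dictionary loosely shaped like a CycloneDX machine-learning-model component. The field values, the subset of keys, and the `render_markdown` helper are illustrative assumptions, not the MCG Toolkit's actual schema or code.

```python
import json

# Hypothetical structured record, loosely shaped like a CycloneDX
# "machine-learning-model" component. All values are made up.
card = {
    "bomFormat": "CycloneDX",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "example-model",
            "modelCard": {
                "considerations": {
                    "useCases": ["Summarizing maintenance logs"],
                    "technicalLimitations": ["English text only"],
                }
            },
        }
    ],
}


def render_markdown(bom: dict) -> str:
    """Render the first component of the record as a small Markdown document."""
    component = bom["components"][0]
    considerations = component["modelCard"]["considerations"]
    lines = [f"# {component['name']}", ""]
    for heading, key in [("Use cases", "useCases"),
                         ("Limitations", "technicalLimitations")]:
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in considerations.get(key, []))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


machine_readable = json.dumps(card, indent=2)  # for other systems
human_readable = render_markdown(card)         # for reviewers
```

Both outputs come from the same record, so the Markdown a reviewer reads cannot drift away from the JSON that other systems consume.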

What Is the NVIDIA MCG Toolkit?

MCG stands for Model Card Generator, and as the name suggests, it is a tool for generating model cards. Its biggest feature is that rather than writing from scratch, it reads directly from existing sources to generate documentation.

It accepts two types of input:

  • URL specification: GitHub / GitLab / Hugging Face / any public web page
  • File upload: ZIP / PDF / DOCX / Markdown

In other words, the idea is that you can simply feed in an existing repository or materials and it will produce a draft model card from them. In addition to an interactive UI, a REST API is also available, making it usable for both manual review and CI/CD integration.

While the MCG Toolkit itself is currently in early access, the Model Card++ template and various transparency cards that serve as the basis for its outputs are published as open source in the NVIDIA/Trustworthy-AI repository. You can try out the templates without waiting for the tool itself.

Reading the Mechanism as a Three-Stage Pipeline

The MCG Toolkit is a containerized pipeline consisting of three stages: Ingestion → Extraction → Rendering. A central orchestrator receives requests for URLs or files and calls each stage in sequence.

The official NVIDIA architecture diagram is summarized in Figure 1 of the original article, so I recommend taking a look at it to get a sense of the overall flow. Here is a rough walkthrough of each stage.

In Ingestion, inputs are fetched, split into chunks, and classified into documents, configuration files, and code. In Extraction, those classified chunks are passed through a RAG (Retrieval-Augmented Generation) pipeline. Here, Nemotron RAG running on NVIDIA's inference microservices (NIM) handles embedding (llama-nemotron-embed-1b-v2) and reranking (llama-nemotron-rerank-500m-v2), while GPT-OSS-120B handles the core extraction. A subtle refinement is the use of separate retrievers for code, configuration, and documents, giving priority to sources with richer signals.
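As a rough mental model of the Ingestion stage, the sketch below classifies files by extension and splits them into fixed-size chunks. The extension mapping, the chunk size, and the function names are my own assumptions for illustration; the actual rules the toolkit uses are not described in the article.

```python
from pathlib import PurePosixPath

# Assumed extension-to-category mapping (hypothetical, for illustration only).
CATEGORY_BY_SUFFIX = {
    ".md": "document", ".pdf": "document", ".txt": "document",
    ".json": "config", ".yaml": "config", ".yml": "config", ".toml": "config",
    ".py": "code", ".cu": "code", ".cpp": "code",
}


def classify(path: str) -> str:
    """Return 'document', 'config', 'code', or 'other' based on the file suffix."""
    suffix = PurePosixPath(path).suffix.lower()
    return CATEGORY_BY_SUFFIX.get(suffix, "other")


def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def ingest(files: dict[str, str]) -> dict[str, list[str]]:
    """Group the chunks of every file under its category."""
    grouped: dict[str, list[str]] = {}
    for path, text in files.items():
        grouped.setdefault(classify(path), []).extend(chunk(text))
    return grouped


# Made-up repository contents: path -> text.
repo = {
    "README.md": "A" * 450,
    "config.yaml": "hidden_size: 4096",
    "train.py": "print('training')",
}
grouped = ingest(repo)
```

Keeping the three categories separate at this point is what would later allow a dedicated retriever per category.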

What I found most compelling here is the inclusion of a validation step rather than directly accepting the extraction results. A check runs before the output is finalized as JSON, and fields that cannot be filled with confidence are not guessed at. They are explicitly marked as "not found" or "information not available." Finally, in Rendering, the structured JSON is fed into the Markdown template to produce the finished output containing the Overview and four sub-cards.
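The "do not guess" behavior can be sketched as a small validation pass. This is my own illustration under stated assumptions: the `Extraction` type, the confidence score, the `has_source` flag, and the 0.7 threshold are hypothetical and do not come from the article.

```python
from dataclasses import dataclass

NOT_FOUND = "not found"


@dataclass
class Extraction:
    value: str
    confidence: float  # assumed 0.0-1.0 score reported by the extractor
    has_source: bool   # assumed flag: a retrieved chunk supports the value


def validate(fields: dict[str, Extraction], threshold: float = 0.7) -> dict[str, str]:
    """Keep well-supported values; replace everything else with a placeholder."""
    validated = {}
    for name, item in fields.items():
        supported = item.has_source and item.confidence >= threshold
        keep = supported and bool(item.value.strip())
        validated[name] = item.value if keep else NOT_FOUND
    return validated


# Made-up extraction results.
raw = {
    "license": Extraction("Apache-2.0", 0.95, True),
    "training_data": Extraction("Probably web text", 0.40, True),
    "known_bias": Extraction("", 0.90, False),
}
result = validate(raw)
```

Only `license` survives here. The low-confidence and unsupported fields become explicit gaps instead of plausible-sounding guesses.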

Less a Generation Tool, More a Way to Find Documentation Gaps

The MCG Toolkit is a "tool that writes model cards," but it can equally be used as a "tool that visualizes how incomplete documentation is before release."

This is clearly demonstrated in experimental results published by NVIDIA itself. First, here are the results for MC++ Overview generation when public model repositories were run through a standard test.

| Model | Generation time | Completion | Accuracy |
| --- | --- | --- | --- |
| NVIDIA Nemotron Nano 8B | 56 sec | 97% | 92% |
| NVIDIA Cosmos Reason 2 | 86 sec | 94% | 82% |
| NVIDIA Parakeet | 65 sec | 92% | 87% |
| NVIDIA Proteina | 52 sec | 94% | 82% |
| Third-party models (DeepSeek-V3 / Evo2 / Gemma / Llama) | ~80 sec | ~89% | ~80% |

Completion is the proportion of fields filled with meaningful content, and accuracy is the proportion of non-placeholder answers that were correct. Most repositories completed in around one minute, with completion rates of 92–97% and accuracy of 82–92% for NVIDIA models. Overall, average completion was 91% and strict accuracy, measured only on verifiable fields, was 76%.
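The two metrics are easy to mix up, so here is a small Python sketch of how I read the definitions above. The placeholder strings follow the wording used in this article. The example card and reference answers are made up, and the exact-match comparison is a simplifying assumption, since NVIDIA's actual scoring method is not described here.

```python
PLACEHOLDERS = {"not found", "information not available"}


def is_filled(value: str) -> bool:
    """A field counts as filled when it is non-empty and not a placeholder."""
    text = value.strip().lower()
    return bool(text) and text not in PLACEHOLDERS


def completion(card: dict[str, str]) -> float:
    """Proportion of all fields filled with meaningful content."""
    return sum(is_filled(v) for v in card.values()) / len(card)


def accuracy(card: dict[str, str], reference: dict[str, str]) -> float:
    """Proportion of non-placeholder answers that match the reference."""
    answered = {k: v for k, v in card.items() if is_filled(v)}
    if not answered:
        return 0.0
    return sum(reference.get(k) == v for k, v in answered.items()) / len(answered)


# Made-up generated card and hand-checked reference answers.
generated = {
    "license": "Apache-2.0",
    "architecture": "Transformer",
    "training_data": "not found",
    "intended_use": "Chat assistant",
}
reference = {
    "license": "Apache-2.0",
    "architecture": "Transformer",
    "training_data": "Public web text",
    "intended_use": "Code completion",
}

completion_rate = completion(generated)         # 3 of 4 fields filled
accuracy_rate = accuracy(generated, reference)  # 2 of 3 answers correct
```

Note that the placeholder field lowers completion but is excluded from the accuracy denominator, so honest gaps are not counted as wrong answers.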

What's interesting here is what happens when documentation is removed. When all documentation (.pdf / .md / .txt) was stripped from the same repositories, leaving only code, and the repositories were reprocessed, the average completion rate across five models dropped from 91% to 61%, and the strict accuracy measured only on verifiable fields dropped from 76% to 28%.

I think it's important to think carefully about what these numbers mean. The reason completion stays at 61% even without documentation is that the tool can still extract a reasonable amount of meaningful information from code, configuration files, and repository structure alone. On the other hand, the significant drop in accuracy shows that documentation contributes heavily to filling fields correctly. In other words, good READMEs and configuration files produce good model cards.
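To put the two drops side by side, the following short calculation (my own arithmetic on the published figures) expresses each as both a percentage-point drop and a relative drop.

```python
def drop(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent)."""
    points = before - after
    return points, round(points / before * 100, 1)


completion_drop = drop(91, 61)  # (30, 33.0)
accuracy_drop = drop(76, 28)    # (48, 63.2)
```

Completion loses about a third of its value, while strict accuracy loses nearly two thirds, which is why I read documentation as mattering more for correctness than for coverage.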

And this is where the validation behavior described earlier becomes relevant. If information is insufficient, the tool shows it as a gap rather than guessing, so looking at the finished model card immediately reveals which areas are heading toward release without adequate explanation. Practical applications might include the following:

  • Pre-release checks to identify documentation gaps just before publication
  • Integration into CI/CD release gates that block releases when too many fields are empty
  • Due diligence material when adopting third-party models, measuring the completeness of explanations
  • A starting point for customer-facing documentation

The fact that it "confronts you with what's missing" rather than just "generating and finishing" is valuable for teams in the middle of writing documentation.
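The release-gate idea from the list above could look roughly like the following. This is a sketch under my own assumptions: the card is a flat field-to-text dictionary, the placeholder strings match the wording quoted earlier, and the 20% limit is an arbitrary example value. The function names are hypothetical and not part of the toolkit.

```python
PLACEHOLDERS = {"not found", "information not available"}


def missing_fields(card: dict[str, str]) -> list[str]:
    """Names of fields that are empty or hold a placeholder."""
    return sorted(k for k, v in card.items()
                  if not v.strip() or v.strip().lower() in PLACEHOLDERS)


def gate(card: dict[str, str], max_missing_ratio: float = 0.2) -> tuple[bool, list[str]]:
    """Pass only when the share of missing fields stays within the limit."""
    missing = missing_fields(card)
    return len(missing) / len(card) <= max_missing_ratio, missing


def main(card: dict[str, str]) -> int:
    """Return a process exit code: 0 lets the release continue, 1 blocks it."""
    passed, missing = gate(card)
    if not passed:
        print("Release blocked; undocumented fields:", ", ".join(missing))
    return 0 if passed else 1


# Made-up generated card with two gaps out of five fields.
card = {
    "intended_use": "Chat assistant",
    "license": "Apache-2.0",
    "training_data": "not found",
    "bias_analysis": "Information not available",
    "limitations": "English only",
}
exit_code = main(card)
```

In a CI job, the return value of `main` would be passed to `sys.exit`, so that a card with too many gaps fails the pipeline step and names the fields that still need writing.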

Viewing This Through the Lens of AI Governance and Accountability

Let me revisit the four sub-cards (Bias / Explainability / Privacy / Safety & Security) from the perspective of regulatory compliance.

What AB-2013 and the EU AI Act require, broadly speaking, is the ability to explain "what this model was trained on, what risks it carries, and how it can be used safely." The sub-cards of Model Card++ correspond almost one-to-one with these questions. Privacy covers data handling and personal information, Bias covers analysis of skew, Safety & Security covers safe and secure use, and Explainability covers how decisions can be explained. When you're unsure what to write to meet regulatory requirements, this very division of sections functions as a checklist.

Another thing I find important from a practitioner's perspective is that this is not solely a document for the development team. In manufacturing projects as well, decisions about whether to adopt AI are not made by on-site engineers alone—quality assurance, procurement, and sometimes legal are involved too. For those people to judge "is this model safe to use?", a structured model card works as a common language. Hand-written READMEs tend to vary in granularity depending on the author, but having everything in a consistent format makes comparison and auditing much easier.

The Design for Self-Hosted Deployment and How to Extend Its Applications

When dealing with governance documents, where the data is processed cannot be ignored. In many situations, sending model code and internal materials to an external SaaS is something you'd want to avoid from a confidentiality standpoint.

The MCG Toolkit is designed with this concern in mind. It is provided as a containerized service that can be set up with a single command. The orchestrator, Ingestion, Extraction, and sub-card generation stages each run as separate containers, and a database and task queue are included as well. There is no lock-in to a specific cloud, and it is said to run on-premises, in a private cloud, or on Kubernetes. The ability to keep everything under your own management when handling sensitive models or internal code is a significant advantage.

As a real-world example, Oracle is one of the first partners to integrate it into production infrastructure. The configuration involves placing MCG pods and NIM pods on OCI's Kubernetes (OKE), using Llama-3.3-Nemotron-Super-49B-v1 for the extraction model, having Nemotron RAG handle embedding and reranking, and hosting and validating GPT-OSS-120B on both a dedicated AI cluster (2×H100) and on-demand. It's a fairly substantial configuration that conveys the assumption of real production use.

What I find interesting about extending its applications is that there are three axes along which components can be swapped.

  • Models: The endpoints for language models, embedding, and reranking can be swapped. Different NIM instances or compatible APIs can be specified to meet performance, cost, or data residency requirements.
  • Templates: Since the output is determined by Markdown templates, they can be swapped to match not just Model Card++ but also internal standards or new regulatory formats, without touching the extraction logic.
  • Guides: Field-level guides specifying what to extract and how to write it can be updated as a knowledge base. Even when regulations or industry requirements change, you can keep up without touching the core code.

With this level of separation, use cases emerge beyond simply using Model Card++ as-is—such as adapting it into a template for an internal AI review process or an industry-specific checklist. The design philosophy of updating templates and guides rather than the pipeline when new disclosure requirements arise is practitioner-friendly.
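The separation described above can be pictured as a configuration object in which each axis is just a replaceable value. The sketch below is my own illustration: `PipelineConfig`, the localhost endpoint URLs, the guide text, and both templates are hypothetical, and Python's `string.Template` stands in for whatever templating the toolkit actually uses.

```python
from dataclasses import dataclass, replace
from string import Template


@dataclass(frozen=True)
class PipelineConfig:
    llm_endpoint: str        # hypothetical URLs; swap these to change models
    embedding_endpoint: str
    rerank_endpoint: str
    template: Template       # swap this to change the output format
    guides: dict[str, str]   # per-field guidance on what to extract


def render(config: PipelineConfig, fields: dict[str, str]) -> str:
    """Fill the configured template; unknown placeholders are left untouched."""
    return config.template.safe_substitute(fields)


base = PipelineConfig(
    llm_endpoint="http://localhost:8000/v1",
    embedding_endpoint="http://localhost:8001/v1",
    rerank_endpoint="http://localhost:8002/v1",
    template=Template("# $name\n\nIntended use: $intended_use\n"),
    guides={"intended_use": "State the primary task in one sentence."},
)

# Same extraction settings, different output format for an internal review.
internal = replace(
    base,
    template=Template("[AI review] $name / use: $intended_use / owner: $owner"),
)

fields = {"name": "example-model", "intended_use": "Defect detection"}
public_card = render(base, fields)
internal_card = render(internal, fields)
```

Because the internal template asks for an `owner` that was never extracted, `$owner` stays visible in the output, which is the same "show the gap" behavior applied at the template level.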

Summary

I've organized the NVIDIA MCG Toolkit not as a tool introduction but from the perspective of "how to systematize the process of publishing and auditing AI models." Here are four key takeaways.

First, the era of requiring pre-release documentation for AI models has arrived. With AB-2013 and the EU AI Act as backdrop, model cards are becoming documents written with the assumption that they will be audited.

Second, the MCG Toolkit generates Model Card++ from existing repositories and materials, and is built to fill in not just the Overview but also the accountability sections of bias, explainability, privacy, and safety from the outset.

Third, this tool can be used not only as a generator but as a way to identify documentation gaps. When information is insufficient, it returns "not found" rather than guessing, so you can see before release which areas lack explanation. The figures showing accuracy dropping from 76% to 28% simply by removing documentation reaffirm the obvious: good documentation makes good model cards.

Fourth, it is containerized and can run entirely on-premises or in a private cloud, and because models, templates, and guides can be swapped, it can be grown to fit internal AI governance and review processes.

Situations where you need to judge "is it okay to adopt this model?" or "is it safe to release our own model?" will certainly increase going forward. In those situations, having a mechanism to organize explanations in a consistent format and systematically identify what's missing should make discussions considerably easier to advance. The article left me with the sense that not only the development of AI models but also the process of releasing them has become a target for automation. Once the tool becomes more widely available, I'd love to actually run it and see for myself.

