Detecting Documentation Gaps Before AI Model Release? I Organized How the NVIDIA MCG Toolkit Works

This post looks at automating model documentation with the NVIDIA MCG Toolkit, treating it less as a generation tool and more as "a tool for finding gaps in documentation before publication." It covers how to systematize AI model accountability, from regulatory compliance to practical operations.
森茂洋 / Hiroshi Morishige
2026.05.30
This page has been translated by machine translation. View original
Hello, I'm Morishige from Classmethod's Manufacturing Business Technology Department.
An article titled "How to Automate AI Model Documentation with the NVIDIA MCG Toolkit" was published on NVIDIA's technical blog. It introduces a tool for automatically generating AI model documentation (so-called model cards), and as I read through it, I thought "this is interesting not just as a generation tool, but as a way to find gaps in documentation before publication," so I've organized my thoughts here.
https://developer.nvidia.com/blog/how-to-automate-ai-model-documentation-with-the-nvidia-mcg-toolkit/
Since the MCG Toolkit itself is still in early access and not yet something I can run locally, this time I've summarized things from a slightly broader perspective than a tool introduction: "how do we systematize accountability when publishing or adopting AI models?"
The Era Where AI Models Need Pre-Release Documentation
Writing a README or API documentation when releasing software has become standard practice. The same is now being demanded of AI models.
The driving force behind this is regulation. California's AB-2013 (Generative Artificial Intelligence: Training Data Transparency Act), which took effect on January 1, 2026, requires generative AI developers to disclose information such as an overview of training data, the number of data points, the presence of copyrighted works, and how personal information is handled. The EU AI Act likewise imposes transparency obligations and requires publishing copyright-related summaries of training data, and the view that "just building and releasing a model" is not enough is taking hold.
The items that need to be explainable include roughly the following:
What the model is for (intended use)
What data it was trained on
What use cases it is and isn't suited for
What limitations and risks exist
How bias, privacy, and safety are handled
A document that organizes these items has traditionally been called a "model card." NVIDIA has been advancing transparency efforts through model cards for some time, and the MCG Toolkit is an extension of that work. The intended audience isn't just developers either. It includes procurement managers, risk assessors, and policy officers — people who decide whether to adopt a model. This is where AI model documentation differs somewhat from a software README: AI model documentation is becoming "documentation written with the assumption it will be audited."
Model Cards and the Model Card++ Format
Traditional model cards were documents that summarized a model's use cases, performance, constraints, and license on a single page. What the MCG Toolkit generates is an expanded format called Model Card++.
Model Card++ consists of an Overview plus four subcards: Bias, Explainability, Privacy, and Safety & Security.
The key point is that it doesn't stop at a simple model overview — it has bias, explainability, privacy, and safety as independent sections covering risk and accountability. Since these are exactly what regulations care about, I think it's rational that the format is built with the expectation of filling in these four sections from the start.
Another small but important detail is that the output goes through CycloneDX-compliant structured JSON before being rendered as Markdown. CycloneDX is a standard used in SBOM (Software Bill of Materials), making it easier to integrate AI model documentation with other systems in a machine-readable form. The idea is that both human-readable Markdown and tool-processable JSON can be produced.
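To make the "one structure, two outputs" idea concrete, here is a minimal sketch. It assumes a heavily trimmed document loosely following the CycloneDX ML-BOM shape (a component of type machine-learning-model carrying a modelCard object). The field values and the to_markdown helper are my own placeholders, not actual MCG Toolkit output or API.

```python
import json

# Hypothetical, trimmed document loosely following the CycloneDX ML-BOM shape.
# The values are illustrative placeholders, not real MCG Toolkit output.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "example-model",
            "modelCard": {
                "considerations": {
                    "useCases": ["Summarizing internal design documents"],
                    "technicalLimitations": ["Not evaluated on non-English text"],
                }
            },
        }
    ],
}


def to_markdown(document: dict) -> str:
    """Render the machine-readable structure as human-readable Markdown."""
    lines = []
    for component in document["components"]:
        lines.append(f"# {component['name']}")
        considerations = component["modelCard"]["considerations"]
        for key, items in considerations.items():
            lines.append(f"## {key}")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


machine_readable = json.dumps(bom, indent=2)  # for tools and other systems
human_readable = to_markdown(bom)             # for reviewers
print(human_readable)
```

Because the Markdown is derived from the JSON rather than written separately, the two views cannot drift apart.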
What Is the NVIDIA MCG Toolkit?
MCG stands for Model Card Generator, and as the name suggests, it's a tool that generates model cards. Its biggest feature is that rather than writing from scratch by hand, it reads directly from existing sources to generate content.
It accepts two types of input:
URL specification: GitHub / GitLab / Hugging Face / any public web page
File upload: ZIP / PDF / DOCX / Markdown
In other words, the idea is that you can feed in an existing repository or materials as-is, and it will draft a model card from them. In addition to an interactive UI, a REST API is also available, making it usable for both manual review and CI/CD integration.
While the MCG Toolkit itself is currently in early access, the Model Card++ templates and various transparency cards that serve as generation blueprints are publicly available as open source in the NVIDIA/Trustworthy-AI repository. So you can try out the templates themselves without waiting for the tool.
Reading the Mechanism as a Three-Stage Pipeline
The MCG Toolkit is a containerized pipeline consisting of three stages: Ingestion → Extraction → Rendering. A central orchestrator receives requests for URLs or files and calls each stage in sequence.
The official NVIDIA architecture diagram is summarized in Figure 1 of the original article, so looking at that alongside this should help you grasp the overall flow. Here's a rough breakdown of each stage.
In Ingestion, inputs are fetched, split into chunks, and classified into types: documents, configuration files, and code. In Extraction, those classified chunks are passed through a RAG (Retrieval-Augmented Generation) pipeline. Here, Nemotron RAG running on NVIDIA's inference microservices (NIM) handles embedding (llama-nemotron-embed-1b-v2) and reranking (llama-nemotron-rerank-500m-v2), while GPT-OSS-120B handles the core of extraction. A subtle design choice is that separate retrievers are used for code, configuration, and documents, prioritizing sources with richer signals.
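As a rough picture of the Ingestion stage, the sketch below splits files into chunks and sorts them into the three types mentioned above. The extension sets, the chunk size, and the function names are all my assumptions for illustration; the toolkit's real classification rules are not described in the article.

```python
from pathlib import PurePosixPath

# Assumed extension-to-type mapping (hypothetical, for illustration only).
CODE_EXT = {".py", ".cu", ".cpp", ".sh"}
CONFIG_EXT = {".json", ".yaml", ".yml", ".toml"}
DOC_EXT = {".md", ".txt", ".pdf", ".docx"}


def classify(path: str) -> str:
    """Classify a file as document, configuration, code, or other by extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in CODE_EXT:
        return "code"
    if suffix in CONFIG_EXT:
        return "configuration"
    if suffix in DOC_EXT:
        return "document"
    return "other"


def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (a deliberately naive splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def ingest(files: dict[str, str]) -> dict[str, list[str]]:
    """Group chunks by type so each type can later get its own retriever."""
    buckets: dict[str, list[str]] = {"document": [], "configuration": [], "code": []}
    for path, text in files.items():
        kind = classify(path)
        if kind in buckets:
            buckets[kind].extend(chunk(text))
    return buckets
```

Keeping the three buckets separate is what would allow a later stage to query code, configuration, and documents with different retrievers.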
What resonated with me most here is the validation step inserted before accepting extraction results. Before finalizing output as JSON, a check is performed, and fields that cannot be filled with confidence are not guessed at. They are explicitly marked as "not found" or "information not available." Finally, in Rendering, the structured JSON is fed into a Markdown template to produce the finished output containing the Overview and four subcards.
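The "don't guess, mark it as not found" behavior followed by rendering can be sketched as below. This assumes each extracted candidate carries a confidence score and that a simple threshold decides acceptance. That scoring scheme, the threshold, and the function names are hypothetical; the article only says that fields which cannot be filled with confidence are marked explicitly.

```python
NOT_FOUND = "not found"


def validate(extracted: dict, required_fields: list[str],
             min_confidence: float = 0.7) -> dict:
    """Accept a value only when it is present and confident enough.

    Anything else is marked explicitly instead of being guessed.
    """
    card = {}
    for field in required_fields:
        candidate = extracted.get(field)
        if candidate and candidate["confidence"] >= min_confidence:
            card[field] = candidate["value"]
        else:
            card[field] = NOT_FOUND
    return card


def render(card: dict) -> str:
    """Feed the validated structure into a (very small) Markdown template."""
    lines = ["# Overview"]
    for field, value in card.items():
        lines.append(f"- {field}: {value}")
    return "\n".join(lines)


# Hypothetical extraction results: one solid answer, one weak guess, one missing.
extracted = {
    "license": {"value": "Apache-2.0", "confidence": 0.95},
    "training_data": {"value": "probably web text", "confidence": 0.30},
}
card = validate(extracted, ["license", "training_data", "intended_use"])
print(render(card))
```

The weak guess and the missing field both end up as visible holes in the rendered card, which is exactly what makes the gaps easy to spot.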
Less a Generation Tool, More a Tool for Finding Documentation Gaps
The MCG Toolkit is "a tool that writes model cards" and at the same time can be used as "a tool that visualizes how insufficient pre-publication documentation is."
This is clearly shown in experimental results published by NVIDIA itself. First, here are the MC++ Overview generation results when publicly available model repositories were run through the standard test.


Model	Generation Time	Completion	Accuracy
NVIDIA Nemotron Nano 8B	56 sec	97%	92%
NVIDIA Cosmos Reason 2	86 sec	94%	82%
NVIDIA Parakeet	65 sec	92%	87%
NVIDIA Proteina	52 sec	94%	82%
Third-party models (DeepSeek-V3 / Evo2 / Gemma / Llama)	~80 sec	~89%	~80%

Completion is the proportion of fields filled with meaningful content, and Accuracy is the proportion of non-placeholder answers that were correct. Most repositories finished in around one minute. For NVIDIA models, completion was 92–97% and accuracy 82–92%, while third-party models came in at roughly 89% and 80%. Overall completion was 91%, and strict accuracy, measured only on verifiable fields, was 76%.
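The two metrics can be written down directly from those definitions. The sketch below is my own reading of them, assuming a flat field-to-text card and a hand-checked reference to compare against. The placeholder strings, the exact-match comparison, and the example values are assumptions, not NVIDIA's evaluation code.

```python
PLACEHOLDERS = {"not found", "information not available"}


def is_placeholder(value: str) -> bool:
    """Treat empty strings and explicit placeholder markers as unfilled."""
    text = value.strip().lower()
    return not text or text in PLACEHOLDERS


def completion(card: dict[str, str]) -> float:
    """Share of fields filled with meaningful (non-placeholder) content."""
    filled = [v for v in card.values() if not is_placeholder(v)]
    return len(filled) / len(card)


def accuracy(card: dict[str, str], reference: dict[str, str]) -> float:
    """Share of non-placeholder answers that match the reference."""
    answered = {k: v for k, v in card.items() if not is_placeholder(v)}
    if not answered:
        return 0.0
    correct = sum(1 for k, v in answered.items() if reference.get(k) == v)
    return correct / len(answered)


# Hypothetical card: three of four fields filled, two of those three correct.
card = {
    "license": "Apache-2.0",
    "intended_use": "Chat assistant",
    "training_data": "not found",
    "parameters": "8B",
}
reference = {
    "license": "Apache-2.0",
    "intended_use": "Code completion",
    "training_data": "Public web text",
    "parameters": "8B",
}
print(f"completion={completion(card):.0%} accuracy={accuracy(card, reference):.0%}")
```

Note that a field marked "not found" lowers completion but does not count against accuracy, which is why the two numbers can move independently.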
What's interesting here is the change when documentation is removed. When all documentation (.pdf / .md / .txt) was deleted from the same repositories, leaving only code, and the repositories were reprocessed, the average completion rate across 5 models fell from 91% to 61%, and the strict accuracy measured only on verifiable fields fell from 76% to 28%.
I think how you read these numbers matters. The reason completion still remains at 61% even without documentation is that the tool can still pick up a reasonable amount of meaningful information from just code, configuration files, and repository structure. On the other hand, the large drop in accuracy shows that documentation makes a big contribution to correctly filling fields. In other words, good READMEs and configuration files make good model cards.
And this is where the validation behavior described earlier comes into play. If information is insufficient, rather than guessing to fill the gap, it presents it as a hole — so looking at the completed model card makes it immediately clear "where things are being published with insufficient explanation." In practical terms, some possible applications include:
Pre-release checks to identify documentation gaps just before publication
Incorporating into CI/CD release gates to halt when too many fields are empty
Using as screening material when adopting third-party models to measure how thorough their explanations are
Using as a draft basis for customer-facing documentation
The fact that it "confronts you with what's missing" rather than just "generating and finishing" is valuable for teams in the middle of writing documentation.
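The release-gate idea from the list above could look something like the following. It is a sketch under my own assumptions: that the generated card is available as a JSON file, that unfilled fields carry one of the placeholder strings mentioned earlier, and that a 20% gap ratio is the cutoff. The file layout, threshold, and function names are hypothetical.

```python
import json

PLACEHOLDERS = {"not found", "information not available"}


def count_gaps(node) -> tuple[int, int]:
    """Return (placeholder_fields, total_fields) over all string leaves."""
    if isinstance(node, str):
        return (1 if node.strip().lower() in PLACEHOLDERS else 0, 1)
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return (0, 0)  # numbers, booleans, and null are not counted as fields
    results = [count_gaps(child) for child in children]
    return (sum(e for e, _ in results), sum(t for _, t in results))


def gate(card: dict, max_gap_ratio: float = 0.2) -> bool:
    """Pass only when the share of unfilled fields stays within the limit."""
    empty, total = count_gaps(card)
    return total > 0 and empty / total <= max_gap_ratio


def run_gate(path: str, max_gap_ratio: float = 0.2) -> int:
    """Load a card from disk and return a process exit code (0 = pass)."""
    with open(path, encoding="utf-8") as handle:
        card = json.load(handle)
    empty, total = count_gaps(card)
    print(f"{empty} of {total} fields are unfilled")
    return 0 if gate(card, max_gap_ratio) else 1
```

In a CI step, calling sys.exit(run_gate("model_card.json")) would then stop the release when too many fields are still holes.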
Viewing Through the Lens of AI Governance and Accountability
Let's take another look at the four subcards (Bias / Explainability / Privacy / Safety & Security) in the context of regulatory compliance.
What AB-2013 and the EU AI Act mentioned at the beginning broadly require is the ability to explain "what this model was trained on, what risks it carries, and how to use it safely." The Model Card++ subcards correspond almost one-to-one to these questions. Privacy covers data handling and personal information management, Bias covers analysis of biases, Safety & Security covers safety and security, and Explainability covers explainability of decision rationale. When you're unsure what to write to meet regulatory requirements, the section structure itself functions as a checklist.
Another thing I think is important from a practical standpoint is that this is not just a document for the development team. In manufacturing projects too, the decision to incorporate AI involves not just field engineers but quality assurance, procurement, and sometimes legal. Structured model cards work as a common language for people like these to judge "is it okay to use this model?" Hand-written READMEs vary in granularity depending on the person, but when everything follows a fixed format, comparison and auditing become much easier.
A Design That Can Run in Your Own Environment and How to Extend It
Since we're dealing with governance documents, where data is processed cannot be ignored. Sending model code or internal materials to an external SaaS service is something you'd often want to avoid from a confidentiality standpoint.
The MCG Toolkit is built with this consideration in mind. It's provided as a containerized service that can be set up with a single command. The orchestrator, Ingestion, Extraction, and subcard generation stages each run as separate containers, with a database and task queue included. There is no lock-in to a specific cloud, and it's stated that it can run on-premises, on your own cloud, or on Kubernetes. Being able to handle confidential models and internal code entirely within your own control is a major advantage.
As a real-world example, Oracle is one of the first partners to integrate it into production infrastructure. The configuration involves placing MCG pods and NIM pods on OCI's Kubernetes (OKE), adopting Llama-3.3-Nemotron-Super-49B-v1 for the extraction model, having Nemotron RAG handle embedding and reranking, and GPT-OSS-120B being hosted and validated on both a dedicated AI cluster (2×H100) and on-demand. It's quite a full-scale configuration, conveying that it's designed for actual production use.
What's interesting about extending its applications is that replacements are possible along three axes:
Model: Language model, embedding, and reranking endpoints can be swapped out. You can specify different NIM or compatible APIs to match performance, cost, and data residency requirements
Template: Since output is determined by Markdown templates, you can swap them to match not just Model Card++ but also internal standards or new regulatory formats, without touching the extraction logic
Guide: Field-level guides for what to pick up and how to write can be updated as a knowledge base. When regulations or industry requirements change, you can keep up without touching the core code
With this level of separation, uses beyond adopting Model Card++ as-is come into view, such as creating templates for your own AI review process or industry-specific checklists. The design philosophy, that when new disclosure requirements emerge you update the templates and guides rather than the pipeline, strikes me as very practical.
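One way to picture the three axes is as independent fields of a single configuration object, where changing one leaves the others untouched. The class, field names, endpoint URLs, and file paths below are all hypothetical and are not the MCG Toolkit's actual configuration format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    # Model axis: swappable inference endpoints (hypothetical URLs).
    llm_endpoint: str
    embedding_endpoint: str
    rerank_endpoint: str
    # Template axis: the Markdown template that shapes the output.
    template_path: str
    # Guide axis: field-level guidance on what to extract and how to phrase it.
    guide_path: str


base = PipelineConfig(
    llm_endpoint="http://llm.internal.example/v1",
    embedding_endpoint="http://embed.internal.example/v1",
    rerank_endpoint="http://rerank.internal.example/v1",
    template_path="templates/model_card_pp.md",
    guide_path="guides/model_card_pp.yaml",
)

# Adapting to an internal review format touches only the template and guide;
# the model endpoints and the pipeline itself stay as they are.
internal_review = replace(
    base,
    template_path="templates/internal_review.md",
    guide_path="guides/internal_review.yaml",
)
```

The frozen dataclass is a deliberate choice here: each variant is a new object, so a regulatory-specific configuration cannot silently modify the default one.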
Summary
I've organized the NVIDIA MCG Toolkit not as a tool introduction but from the perspective of "how do we systematize the process of publishing and auditing AI models?" Here are four key takeaways.
First, we are entering an era where AI models require pre-release documentation. Against the backdrop of AB-2013 and the EU AI Act, model cards are becoming documents written with the assumption they will be audited.
Second, the MCG Toolkit generates Model Card++ from existing repositories and materials, and is built to fill in accountability sections for bias, explainability, privacy, and safety from the start, in addition to the Overview.
Third, this tool can be used not only as a generator but also as a tool for finding documentation gaps. When information is insufficient, it returns "not found" rather than guessing, so you can see before publication where explanations are lacking. The figure showing accuracy dropping from 76% to 28% just by removing documentation reaffirms the obvious: good documentation makes good model cards.
Fourth, since it's containerized and can run entirely on-premises or in your own cloud, and since models, templates, and guides can be swapped out, it can be grown to fit your organization's AI governance and review processes.
Situations where you need to judge "is it okay to adopt this model?" or "is it safe to publish our own model?" will definitely increase going forward. In those situations, having a mechanism that aligns explanations in a fixed format and mechanically identifies what's missing should make discussions considerably easier to advance. This was an article that gave me a sense that not just the development of AI models themselves but also the publication process has become a target for automation. When the tool becomes more widely available, I'd like to actually run it and verify things for myself.
Reference Links
How to Automate AI Model Documentation with the NVIDIA MCG Toolkit | NVIDIA Technical Blog
NVIDIA/Trustworthy-AI (Model Card++ Templates / GitHub)
Enhancing AI Transparency and Ethical Considerations with Model Card | NVIDIA Technical Blog
NVIDIA Nemotron RAG (Hugging Face Collection)
CycloneDX
California AB-2013 (Generative AI: Training Data Transparency)