I tried text generation with the Foundation Models framework

2026.06.16

The other day, I published an article on image analysis using the multimodal capabilities of Foundation Models.

https://dev.classmethod.jp/articles/foundation-models-multimodal-image-analysis/

Looking back, however, I realized I had not yet written about the basic usage of text generation, so this article covers those basics before going further with multimodal.

In this article, I will introduce the basic text generation features of Foundation Models step by step. I hope it will be helpful for those who want to try similar experiments.

Verification Environment

  • MacBook Pro (16-inch, 2023), Apple M2 Pro
  • macOS Tahoe 26.5.1
  • Xcode 27.0 Beta
  • iPhone 17 Pro Simulator (iOS 27.0 Beta)
  • iPhone 16e physical device (iOS 27.0 Beta)

About Text Generation with Foundation Models

The Foundation Models framework, introduced at WWDC25, enables on-device inference on devices that support Apple Intelligence. Because it does not communicate with external servers, it suits privacy-conscious app development.

Liro Ossa presented a method for replacing diary content with emoji using Foundation Models at "try! Swift Tokyo 2026."

https://dev.classmethod.jp/articles/please-save-genmoji/#foundation-models

The main use cases for text generation include the following.

  • Text summarization, paraphrasing, and proofreading
  • In-app chat and question answering
  • Text classification and analysis of user input
  • Automatic generation of templated content

Note that the framework only runs on devices compatible with Apple Intelligence. Please refer to Apple's official page for the list of compatible devices.

Implementation Steps

Step 1: Project Setup

Create a new iOS project in Xcode and use the FoundationModels framework. No additional SPM dependencies are required, as it can be used as a system framework.

No special configuration is needed in Info.plist.

To run the sample code, first add a simple screen that executes a process when a button is tapped and displays the result as text. The processing described in the following steps will be added to action1().

import SwiftUI
import FoundationModels

struct ContentView: View {
    @State private var text: String = ""

    var body: some View {
        ScrollView {
            VStack(spacing: 16) {
                Text(text)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .padding()
                Button("Run", action: action1)
            }
        }
    }

    func action1() {
        // Add Foundation Models processing here
    }
}

Step 2: Model Availability Check and Session Creation

Use SystemLanguageModel.default to get the device's default model, and check isAvailable before using it. All subsequent code is added inside action1().

// Check if the device supports Apple Intelligence
guard SystemLanguageModel.default.isAvailable else {
    text = "Apple Intelligence is not available"
    return
}
let session = LanguageModelSession()
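
When isAvailable is false, it can help to tell the user why. The following is a minimal sketch, assuming the availability property and its unavailable reasons as published in the iOS 26 SDK; availabilityMessage() is a hypothetical helper name, and the message strings are only examples.

```swift
import FoundationModels

/// Returns a user-facing message explaining why the model cannot be used,
/// or nil when it is ready. (`availabilityMessage` is a hypothetical helper.)
func availabilityMessage() -> String? {
    switch SystemLanguageModel.default.availability {
    case .available:
        return nil
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence"
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings"
    case .unavailable(.modelNotReady):
        return "The model is not ready yet (it may still be downloading)"
    case .unavailable(let reason):
        // Any reason not listed above, for example one added in a newer SDK.
        return "Model unavailable: \(reason)"
    }
}
```

Inside action1(), this could stand in for the guard: if availabilityMessage() returns a message, assign it to text and return.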

Step 3: Basic Text Generation

By passing a prompt to session.respond(to:), you can retrieve the generated text with response.content.

Task {
    do {
        let response = try await session.respond(to: "Please tell me the appeal of iOS app development in one sentence")
        text = response.content
        print(response.content)
    } catch {
        text = "Error: \(error.localizedDescription)"
        print("Error: \(error)\n\(String(reflecting: error))")
    }
}

To check for variation in the output, I ran the same prompt 5 times. The processing time is the difference between Date() values taken before and after execution.

Response | Processing Time
The appeal of iOS app development is that it can enrich the user experience with a simple and intuitive interface. | 3347.9 ms
The appeal of iOS app development is that it can improve the user experience with a simple and intuitive interface. | 2953.1 ms
The appeal of iOS app development is that it can enrich the user experience with a simple and intuitive interface. | 3385.4 ms
The appeal of iOS app development is that it can achieve intuitive design and smooth performance tailored to user needs. | 2122.7 ms
The appeal of iOS app development is that it can leverage a simple and intuitive design along with a powerful ecosystem. | 2097.5 ms

The same prompt sometimes returned exactly the same sentence and sometimes a different one. As with cloud LLMs, generation is probabilistic and not fully deterministic.
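
If you want to reproduce this measurement, or try reducing the variation, something like the following may work. It is a sketch, not the code used for the table above: runRepeatedly and its parameters are hypothetical names, and it assumes GenerationOptions(sampling: .greedy) behaves as in the iOS 26 SDK, where greedy sampling always picks the most likely token.

```swift
import Foundation
import FoundationModels

/// Runs the same prompt several times and prints each response with its elapsed time.
/// `runRepeatedly` and its parameters are hypothetical names for this sketch.
func runRepeatedly(prompt: String, times: Int, greedy: Bool) async {
    // Greedy sampling picks the most likely token at every step,
    // so repeated runs should vary less than with the default sampling.
    let options = greedy ? GenerationOptions(sampling: .greedy) : GenerationOptions()

    for _ in 0..<times {
        // Use a fresh session per run so earlier answers do not become context.
        let session = LanguageModelSession()
        let start = Date()
        do {
            let response = try await session.respond(to: prompt, options: options)
            let elapsedMs = Date().timeIntervalSince(start) * 1000
            print("\(String(format: "%.1f", elapsedMs)) ms: \(response.content)")
        } catch {
            print("Error: \(error)")
        }
    }
}
```

For example, calling await runRepeatedly(prompt: "Please tell me the appeal of iOS app development in one sentence", times: 5, greedy: true) from a Task would print five timed responses.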

Full source code for Steps 1–3
import SwiftUI
import FoundationModels

struct ContentView: View {
    @State private var text: String = ""

    var body: some View {
        ScrollView {
            VStack(spacing: 16) {
                Text(text)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .padding()
                Button("Run", action: action1)
            }
        }
    }

    func action1() {
        // Check if the device supports Apple Intelligence
        guard SystemLanguageModel.default.isAvailable else {
            text = "Apple Intelligence is not available"
            return
        }
        let session = LanguageModelSession()

        Task {
            do {
                let response = try await session.respond(to: "Please tell me the appeal of iOS app development in one sentence")
                text = response.content
                print(response.content)
            } catch {
                text = "Error: \(error.localizedDescription)"
                print("Error: \(error)\n\(String(reflecting: error))")
            }
        }
    }
}

Step 4: Real-time Display with Streaming

While respond(to:) returns the full text after generation is complete, using streamResponse(to:) allows you to receive the text incrementally as it is being generated. This is particularly effective for improving UX during long text generation.

Add action2() to ContentView and switch the button action from action1 to action2 to verify.

func action2() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession()

    Task {
        do {
            text = ""
            let stream = session.streamResponse(to: "Please list 3 benefits of learning Swift")
            for try await partial in stream {
                text = partial.content
            }
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}

Since partial.content holds all of the text generated so far, not just the newest fragment, real-time display can be achieved simply by assigning it to text on each iteration.
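
If you need only the newly generated part (for example, to append it to a log), you can take the difference between consecutive cumulative values. The following sketch uses plain Swift and a hard-coded array standing in for the partial.content values; newlyAddedText is a hypothetical helper name.

```swift
/// Returns the text newly added in `current` relative to `previous`.
/// Falls back to the whole of `current` if it does not extend `previous`.
func newlyAddedText(previous: String, current: String) -> String {
    guard current.hasPrefix(previous) else { return current }
    return String(current.dropFirst(previous.count))
}

// Simulated cumulative snapshots, standing in for partial.content values.
let snapshots = ["Swift", "Swift is", "Swift is safe."]
var previous = ""
var chunks: [String] = []
for snapshot in snapshots {
    chunks.append(newlyAddedText(previous: previous, current: snapshot))
    previous = snapshot
}
print(chunks) // → ["Swift", " is", " safe."]
```

For on-screen display, overwriting the text as in action2() remains the simpler choice.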

Step 5: Setting a System Prompt with instructions

Using LanguageModelSession(instructions:), you can configure character settings equivalent to a system prompt in cloud LLMs. By changing instructions, you can control the tone and granularity of responses to the same prompt.

Add action3() to ContentView to verify the behavior. First, here is the result of asking a question without instructions.

// Without instructions
let session = LanguageModelSession()
let r1 = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
// → AutoLayout is a framework for automatically adjusting the position of UI elements in iOS and macOS app development.
//    This allows you to achieve designs that accommodate different device sizes and OS versions.
//    It saves the effort of manually adjusting layouts in code.

Next is the result of asking the same question with instructions configured.

// With instructions
let session = LanguageModelSession(
    instructions: "You are an iOS development expert. Answer concisely using technical terminology."
)
let r2 = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
// → AutoLayout is a framework for automatically adjusting the layout of UI elements.
//    It is used with the Swift language and Objective-C, minimizing device dependency.
//    It uses Constraints to control the distance and placement between elements.

Both responses had a 3-sentence structure with similar length, but the response with instructions included technical terms such as "Constraints," "device dependency," and "Swift language and Objective-C." While the change was not as dramatic as with cloud LLMs, a difference at the vocabulary level was confirmed.

This example set an "iOS development expert" persona, but the same feature can also be used to build a roleplay chat app. For example, setting instructions to a samurai from the Sengoku period returns responses in a tone consistent with that character.

let session = LanguageModelSession(
    instructions: "You are a samurai living in the Sengoku period. You have been granted permission to speak directly to your lord. Answer the question concisely."
)
let response = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
// → This one does not possess such knowledge as a samurai of the Sengoku period. However, if you wish to know,
//    it is a design technique used for smartphones and computers.

The complete action3() is as follows.

func action3() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession(
        instructions: "You are an iOS development expert. Answer concisely using technical terminology."
    )

    Task {
        do {
            let response = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
            text = response.content
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}

Step 6: Structured Output with @Generable

By attaching the @Generable macro to a Swift struct, you can receive the model's output as an instance of that type. The framework converts the type information into a JSON schema and passes it to the model.

Here is an example of analyzing app review text and organizing it by category. The @Guide macro is used to communicate the meaning of properties to the model in natural language; it is not required but can be used when you want to improve output quality. The description of @Guide is written in English following the official documentation samples.

AppReviewAnalysis is defined at the top level of the file (outside ContentView).

@Generable
struct AppReviewAnalysis {
    @Guide(description: "Overall sentiment: positive, negative, or neutral")
    var sentiment: String
    @Guide(description: "Key positive points mentioned in the review")
    var positivePoints: [String]
    @Guide(description: "Issues or complaints mentioned in the review")
    var issues: [String]
}

Add action4() to ContentView to verify the behavior.

func action4() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession()

    Task {
        let reviewText = """
            It starts up fast and is easy to use. I also like the design.
            However, there were too many notifications and it was hard to find the settings.
            """
        do {
            let response = try await session.respond(
                generating: AppReviewAnalysis.self
            ) {
                reviewText
            }
            print(response.content.sentiment)       // → positive
            print(response.content.positivePoints)  // → ["Fast startup", "Easy to use", "Nice design"]
            print(response.content.issues)          // → ["Too many notifications and hard to find settings"]
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}

The following analysis results were obtained. Running it multiple times, sentiment and issues were completely consistent, with only minor variation in the expression of positivePoints.

positive
["Fast startup", "Easy to use", "Design is to my liking"]
["Too many notifications and hard to find settings"]

The output is more stable than with free-form text generation (Step 3). This is because @Generable converts the type information into a JSON schema and guided generation keeps the model's output within the constraints of the type.

Passing a negative review with the same code changes the content of sentiment and positivePoints.

let reviewText = """
    It is completely useless. It crashes every time it starts up,
    and my entered data disappears too. I hope for improvements.
    """
// → negative
// → []
// → ["Crashes every time it starts up", "Entered data disappears"]

positivePoints became an empty array, and crashes and data loss were listed in issues. It was confirmed that the content of the structured data switches according to the tone of the input text.

Since unstructured text can be extracted as Swift types, the result is easy to pass to subsequent processing. Note that @Generable type information consumes the context window. The more properties there are and the longer the @Guide descriptions, the more context is consumed, so it is advisable to omit unnecessary properties.
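
As a possible refinement, sentiment could be constrained to a fixed set of values by using a @Generable enum instead of a String, and the arrays could be capped in length. This is a sketch under the assumption that @Generable enums and the .maximumCount guide are available as in the iOS 26 SDK; Sentiment, AppReviewAnalysisV2, and label(for:) are hypothetical names, and I have not measured how this affects output quality.

```swift
import FoundationModels

// Hypothetical variant of AppReviewAnalysis for illustration.
@Generable
enum Sentiment {
    case positive
    case negative
    case neutral
}

@Generable
struct AppReviewAnalysisV2 {
    @Guide(description: "Overall sentiment of the review")
    var sentiment: Sentiment
    // .maximumCount limits how many elements the model may generate.
    @Guide(description: "Key positive points mentioned in the review", .maximumCount(3))
    var positivePoints: [String]
    @Guide(description: "Issues or complaints mentioned in the review", .maximumCount(3))
    var issues: [String]
}

/// Because sentiment is an enum, the compiler checks that every case is handled.
func label(for sentiment: Sentiment) -> String {
    switch sentiment {
    case .positive: return "Positive review"
    case .negative: return "Negative review"
    case .neutral: return "Neutral review"
    }
}
```

The call site stays the same as in action4(), with generating: AppReviewAnalysisV2.self in place of AppReviewAnalysis.self.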

Step 7: Multi-turn Conversation

LanguageModelSession automatically retains the conversation history within the session. By continuing to send requests to the same session, you can achieve a conversation that carries over context.

Add action5() to ContentView to verify the behavior.

func action5() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession()

    Task {
        do {
            // First question
            let r1 = try await session.respond(to: "Please tell me the difference between SwiftUI and UIKit")
            print(r1.content)

            // Second question: ask while inheriting the previous context
            let r2 = try await session.respond(to: "Then, which should I choose for a new project?")
            print(r2.content)
            text = r2.content
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}

The first output was as follows. A comparison covering 4 items (developer experience, flexibility, dependencies, and learning curve) was returned.

The main differences between SwiftUI and UIKit are as follows.

1. **Developer Experience**
   ...(rest omitted)

The second output was as follows.

For a new project, I recommend choosing **SwiftUI**. SwiftUI improves development efficiency and enables intuitive UI design. However, if complex features or customization are required, it is also worth considering UIKit. Please choose based on the scale and needs of your project.

In response to the second question, which begins with "Then," and does not restate the topic, the model recommended SwiftUI based on the comparison from the first question. This confirms that context is carried over within the session. If a new session is created, however, the context is reset.

In addition, combining this with the character configuration via instructions introduced in Step 5 makes it easier to create a roleplay chat app where a consistent persona continues throughout the conversation.
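
A minimal sketch of that combination might look like the following. RoleplayChat is a hypothetical type, not part of the framework; the only point is that a single LanguageModelSession is created once with instructions and then reused for every message.

```swift
import FoundationModels

/// Keeps one session alive so that both the persona and the history persist.
/// `RoleplayChat` is a hypothetical type for this sketch.
@MainActor
final class RoleplayChat {
    private let session: LanguageModelSession

    init(persona: String) {
        // The persona is passed once as instructions and applies to every turn.
        session = LanguageModelSession(instructions: persona)
    }

    /// Sends one user message and returns the reply, reusing the same session
    /// so that earlier turns remain available as context.
    func send(_ message: String) async throws -> String {
        let response = try await session.respond(to: message)
        return response.content
    }
}
```

For example, create RoleplayChat(persona: "You are a samurai living in the Sengoku period. Answer the question concisely.") once, keep it in a property, and call try await chat.send(...) for each user input. Creating a new RoleplayChat resets the conversation.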

Operation Verification

Confirm in advance that Apple Intelligence is enabled on the physical device.

  1. Settings app → "Apple Intelligence & Siri" → Turn on "Apple Intelligence"
  2. Confirm that the language and region are set to a supported language such as English (US)
  3. Wait until the model download is complete

Once the above preparations are done, tapping the button will return a response after a few seconds. Since processing is done on-device, no communication with external networks occurs.

Notes

Context Window

The context window is smaller compared to cloud LLMs. Errors may occur if you continue a long conversation or pass a large amount of text at once. Since @Generable type definitions also consume context, it is best to limit properties to the minimum necessary.
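
One way to handle an exceeded context window is to catch the error and continue in a fresh session. The sketch below assumes the error is thrown as LanguageModelSession.GenerationError.exceededContextWindowSize, as in the iOS 26 SDK. The error log in the Troubleshooting section of this article shows a different domain name, so the type name may differ in the SDK used here. respondWithReset is a hypothetical helper.

```swift
import FoundationModels

/// Sends a prompt and, if the context window is exceeded, retries once in a fresh session.
/// Assumption: the error type is LanguageModelSession.GenerationError (iOS 26 SDK).
/// Returns the session to use for the next turn together with the response text.
func respondWithReset(
    session: LanguageModelSession,
    prompt: String
) async throws -> (session: LanguageModelSession, text: String) {
    do {
        let response = try await session.respond(to: prompt)
        return (session, response.content)
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // The history is discarded here: the model no longer knows the earlier turns.
        let fresh = LanguageModelSession()
        let response = try await fresh.respond(to: prompt)
        return (fresh, response.content)
    }
}
```

Any other error is rethrown to the caller. Note that if the prompt alone is too long, the retry will fail as well.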

Japanese Prompts

Japanese prompts often return Japanese responses, but the language is not guaranteed. If you want responses in Japanese, it is most reliable to explicitly state "Please answer in Japanese."
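
One place to put that request is the session's instructions, so that it applies to every turn. The following is a sketch: makeJapaneseSession is a hypothetical helper, and it assumes that supportsLocale(_:) exists on SystemLanguageModel as in the iOS 26 SDK.

```swift
import Foundation
import FoundationModels

/// Creates a session that asks for Japanese responses,
/// or returns nil if the model reports that Japanese is not supported.
/// `makeJapaneseSession` is a hypothetical helper for this sketch.
func makeJapaneseSession() -> LanguageModelSession? {
    let japanese = Locale(identifier: "ja_JP")
    // Assumption: supportsLocale(_:) is available as in the iOS 26 SDK.
    guard SystemLanguageModel.default.supportsLocale(japanese) else {
        return nil
    }
    return LanguageModelSession(instructions: "Please answer in Japanese.")
}
```

Even with this instruction the response language is a request to the model rather than a guarantee, so it is worth checking the output in your own environment.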

Troubleshooting

LanguageModelError Occurs

The following type of error may occur.

Error Domain=FoundationModels.LanguageModelError

The main causes and remedies are as follows.

Cause | Remedy
Apple Intelligence is disabled | Turn on Apple Intelligence from Settings
Model download not complete | Wait for the download to complete and retry
Context window exceeded | Shorten the prompt or conversation history

SensitiveContentAnalysisML Error Occurs in Simulator

When running in the simulator, the following type of error may occur.

End sanitizeText with error: Error Domain=com.apple.SensitiveContentAnalysisML Code=15
  └─ SafetyGuardrailTextSanitizerBackend: Resource (Local Model Asset) unavailable error.
     └─ GenerativeError Code=1020000 "Resource (Local Model Asset) unavailable error."

Error Domain=FoundationModels.LanguageModelError Code=-1
  "The operation couldn't be completed. (com.apple.SensitiveContentAnalysisML error 15.)"

This error can occur even when isAvailable returns true. Judging from the error log, isAvailable appears to check only the readiness of the main language model, whereas all text generation also passes through a safety filtering sub-model (SafetyGuardrailTextSanitizerBackend). The error appears to occur when the asset for that sub-model cannot be found.

Because the simulator uses the host Mac's model assets, some components may be missing if the versions of Xcode, the iOS Simulator, and macOS do not match.

This was resolved by testing on a physical device.

Summary

Using the Foundation Models framework, the basic text generation features were verified.

  • Simple text generation with respond(to:)
  • Real-time display with streamResponse(to:)
  • System prompt configuration with LanguageModelSession(instructions:)
  • Structured output with @Generable + @Guide
  • Multi-turn conversation within the same session

As an overall impression from actually trying these out, while there are limitations compared to cloud LLMs such as not being able to select the model and a smaller context window, I found it appealing that this level of functionality is available without any external communication.

Additionally, at WWDC26, AFM 3 Core Advanced, a 20B parameter on-device model, and AFM 3 Cloud Pro for cloud inference were also announced. All verifications in this article were done with a 3B model, but even so, it provided practical responses and I felt it was sufficiently promising. I look forward to the day when I can try the 20B model through the Foundation Models API.

As a next step, for those who want to try the multimodal feature that allows images in addition to text as input, please also refer to the following article.

https://dev.classmethod.jp/articles/foundation-models-multimodal-image-analysis/
