
I tried text generation with the Foundation Models framework
The other day, I published an article on image analysis using the multimodal capabilities of Foundation Models.
Looking back, however, I realized I had never written about the basic usage of text generation, so I decided to cover those basics before going further with multimodal input.
In this article, I introduce the basic text generation features of Foundation Models step by step. I hope it helps anyone who wants to try similar experiments.
Verification Environment
- MacBook Pro (16-inch, 2023), Apple M2 Pro
- macOS Tahoe 26.5.1
- Xcode 27.0 Beta
- iPhone 17 Pro Simulator (iOS 27.0 Beta)
- iPhone 16e physical device (iOS 27.0 Beta)
About Text Generation with Foundation Models
The Foundation Models framework, introduced at WWDC25, enables on-device inference on devices that support Apple Intelligence. Since it does not communicate with external servers, it can be used for privacy-conscious app development.
Liro Ossa presented a method for replacing diary content with emoji using Foundation Models at "try! Swift Tokyo 2026."
The main use cases for text generation include the following.
- Text summarization, paraphrasing, and proofreading
- In-app chat and question answering
- Text classification and analysis of user input
- Automatic generation of templated content
Note that the framework only runs on devices that support Apple Intelligence. Please refer to Apple's official page for the list of compatible devices.
Implementation Steps
Step 1: Project Setup
Create a new iOS project in Xcode and use the FoundationModels framework. Because it is a system framework, no additional SPM dependencies are required.
No special configuration is needed in Info.plist.
First, add a simple screen that runs a process when a button is tapped and displays the result as text. The processing described in the following steps goes inside action1().
import SwiftUI
import FoundationModels
struct ContentView: View {
    @State private var text: String = ""
    var body: some View {
        ScrollView {
            VStack(spacing: 16) {
                Text(text)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .padding()
                Button("Run", action: action1)
            }
        }
    }
    func action1() {
        // Add Foundation Models processing here
    }
}
Step 2: Model Availability Check and Session Creation
Use SystemLanguageModel.default to get the device's default model, and check isAvailable before using it. All subsequent code in Steps 2 and 3 goes inside action1().
// Check if the device supports Apple Intelligence
guard SystemLanguageModel.default.isAvailable else {
    text = "Apple Intelligence is not available"
    return
}
let session = LanguageModelSession()
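isAvailable only gives a yes/no answer. When it helps to tell the user why the model cannot be used, the availability property of SystemLanguageModel can be switched over instead. The following is a minimal sketch: availabilityMessage() and its message strings are hypothetical names introduced for illustration, and the canImport / #available guards are assumptions added only so that the snippet also compiles where the framework is missing.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical helper: returns a human-readable explanation of the
/// on-device model's availability.
func availabilityMessage() -> String {
    #if canImport(FoundationModels)
    if #available(iOS 26.0, macOS 26.0, visionOS 26.0, *) {
        switch SystemLanguageModel.default.availability {
        case .available:
            return "The on-device model is ready"
        case .unavailable(.deviceNotEligible):
            return "This device does not support Apple Intelligence"
        case .unavailable(.appleIntelligenceNotEnabled):
            return "Apple Intelligence is turned off in Settings"
        case .unavailable(.modelNotReady):
            return "The model is still downloading or being prepared"
        case .unavailable(let other):
            // Any reason not listed above
            return "The model is unavailable: \(other)"
        }
    }
    #endif
    // Reached when the framework or a supported OS version is missing
    return "The FoundationModels framework is not available in this environment"
}
```

Inside action1(), the guard could then set text = availabilityMessage() in its else branch instead of a fixed string.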
Step 3: Basic Text Generation
By passing a prompt to session.respond(to:), you can retrieve the generated text with response.content.
Task {
    do {
        let response = try await session.respond(to: "Please tell me the appeal of iOS app development in one sentence")
        text = response.content
        print(response.content)
    } catch {
        text = "Error: \(error.localizedDescription)"
        print("Error: \(error)\n\(String(reflecting: error))")
    }
}
To check for variation in the output, I ran the same prompt five times. The processing time in the table is the difference between Date() values taken before and after the call.
| Response | Processing Time |
|---|---|
| The appeal of iOS app development is that it can enrich the user experience with a simple and intuitive interface. | 3347.9 ms |
| The appeal of iOS app development is that it can improve the user experience with a simple and intuitive interface. | 2953.1 ms |
| The appeal of iOS app development is that it can enrich the user experience with a simple and intuitive interface. | 3385.4 ms |
| The appeal of iOS app development is that it can achieve intuitive design and smooth performance tailored to user needs. | 2122.7 ms |
| The appeal of iOS app development is that it can leverage a simple and intuitive design along with a powerful ecosystem. | 2097.5 ms |
The same prompt sometimes returned exactly the same sentence and sometimes a different one. This confirms that, as with cloud LLMs, generation is probabilistic rather than fully deterministic.
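The processing times above were taken as the difference between two Date() values. One way to package that measurement is sketched below; measureMilliseconds is a hypothetical helper name, not part of the framework, and it works with any async operation.

```swift
import Foundation

/// Hypothetical helper: runs an async operation and returns its result together
/// with the elapsed wall-clock time in milliseconds (Date() before and after).
func measureMilliseconds<T>(
    _ operation: () async throws -> T
) async rethrows -> (value: T, milliseconds: Double) {
    let start = Date()
    let value = try await operation()
    let elapsed = Date().timeIntervalSince(start) * 1000
    return (value, elapsed)
}

// Usage inside the Task of action1() (requires FoundationModels):
// let timed = try await measureMilliseconds {
//     try await session.respond(to: "Please tell me the appeal of iOS app development in one sentence")
// }
// print(timed.value.content, "\(timed.milliseconds) ms")
```

Because Date() reflects wall-clock time, the figures include scheduling overhead and should be read as rough indications rather than precise benchmarks.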
Full source code for Steps 1–3
import SwiftUI
import FoundationModels
struct ContentView: View {
    @State private var text: String = ""
    var body: some View {
        ScrollView {
            VStack(spacing: 16) {
                Text(text)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .padding()
                Button("Run", action: action1)
            }
        }
    }
    func action1() {
        // Check if the device supports Apple Intelligence
        guard SystemLanguageModel.default.isAvailable else {
            text = "Apple Intelligence is not available"
            return
        }
        let session = LanguageModelSession()
        Task {
            do {
                let response = try await session.respond(to: "Please tell me the appeal of iOS app development in one sentence")
                text = response.content
                print(response.content)
            } catch {
                text = "Error: \(error.localizedDescription)"
                print("Error: \(error)\n\(String(reflecting: error))")
            }
        }
    }
}
Step 4: Real-time Display with Streaming
While respond(to:) returns the full text after generation is complete, using streamResponse(to:) allows you to receive the text incrementally as it is being generated. This is particularly effective for improving UX during long text generation.
Add action2() to ContentView and switch the button action from action1 to action2 to verify.
func action2() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession()
    Task {
        do {
            text = ""
            let stream = session.streamResponse(to: "Please list 3 benefits of learning Swift")
            for try await partial in stream {
                text = partial.content
            }
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}
Since partial.content is the cumulative value of the text generated so far, real-time display can be achieved simply by overwriting with text =.
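To illustrate why overwriting is the right choice with cumulative values, here is a small simulation that does not call the framework at all. cumulativeSnapshots(of:) and render(_:) are hypothetical stand-ins: the AsyncStream only imitates a stream whose elements each contain the full text so far.

```swift
/// Hypothetical stand-in for a response stream: every element is the
/// complete text generated so far, not just the newest fragment.
func cumulativeSnapshots(of chunks: [String]) -> AsyncStream<String> {
    AsyncStream { continuation in
        var soFar = ""
        for chunk in chunks {
            soFar += chunk
            continuation.yield(soFar)
        }
        continuation.finish()
    }
}

/// Consumes the stream twice over in one pass: once by overwriting
/// (as action2() does) and once by appending (the mistake to avoid).
func render(_ snapshots: AsyncStream<String>) async -> (overwritten: String, appended: String) {
    var overwritten = ""
    var appended = ""
    for await snapshot in snapshots {
        overwritten = snapshot   // correct for cumulative snapshots
        appended += snapshot     // would duplicate earlier text
    }
    return (overwritten, appended)
}
```

With the chunks "Swift ", "is ", and "fun", overwriting ends with "Swift is fun", while appending ends with "Swift Swift is Swift is fun".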
Step 5: Setting a System Prompt with instructions
With LanguageModelSession(instructions:), you can give the session a role or persona, equivalent to a system prompt in cloud LLMs. Changing instructions lets you control the tone and granularity of the responses to the same prompt.
Add action3() to ContentView to verify the behavior. First, here is the result of asking a question without instructions.
// Without instructions
let session = LanguageModelSession()
let r1 = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
// → AutoLayout is a framework for automatically adjusting the position of UI elements in iOS and macOS app development.
// This allows you to achieve designs that accommodate different device sizes and OS versions.
// It saves the effort of manually adjusting layouts in code.
Next is the result of asking the same question with instructions configured.
// With instructions
let session = LanguageModelSession(
    instructions: "You are an iOS development expert. Answer concisely using technical terminology."
)
let r2 = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
// → AutoLayout is a framework for automatically adjusting the layout of UI elements.
// It is used with the Swift language and Objective-C, minimizing device dependency.
// It uses Constraints to control the distance and placement between elements.
Both responses were three sentences of similar length, but the response with instructions included technical terms such as "Constraints," "device dependency," and "Swift language and Objective-C." While the change was not as dramatic as with cloud LLMs, there was a clear difference at the vocabulary level.
This example set an "iOS development expert" persona, but the same feature can also power a roleplay chat app. For example, setting instructions to a samurai from the Sengoku period returns responses in a tone consistent with that character.
let session = LanguageModelSession(
    instructions: "You are a samurai living in the Sengoku period. You have been granted permission to speak directly to your lord. Answer the question concisely."
)
let response = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
// → This one does not possess such knowledge as a samurai of the Sengoku period. However, if you wish to know,
// it is a design technique used for smartphones and computers.
The full picture of action3() is as follows.
func action3() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession(
        instructions: "You are an iOS development expert. Answer concisely using technical terminology."
    )
    Task {
        do {
            let response = try await session.respond(to: "What is AutoLayout? Please explain in 2-3 sentences.")
            text = response.content
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}
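If an app switches between several personas, the instruction strings can be kept in one place. The sketch below is only one possible arrangement: Persona and its cases are hypothetical names, and the strings are the two used in this step.

```swift
/// Hypothetical set of personas for a roleplay chat.
enum Persona: CaseIterable {
    case iosExpert
    case sengokuSamurai

    /// Text intended to be passed as the session's instructions.
    var instructions: String {
        switch self {
        case .iosExpert:
            return "You are an iOS development expert. Answer concisely using technical terminology."
        case .sengokuSamurai:
            return "You are a samurai living in the Sengoku period. You have been granted permission to speak directly to your lord. Answer the question concisely."
        }
    }
}

// Usage inside an action (requires FoundationModels):
// let session = LanguageModelSession(instructions: Persona.sengokuSamurai.instructions)
```

Keeping the strings in an enum makes it straightforward to offer a persona picker in the UI, since CaseIterable provides Persona.allCases.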
Step 6: Structured Output with @Generable
By attaching the @Generable macro to a Swift struct, you can receive the model's output as an instance of that type. The framework converts the type information into a JSON schema and passes it to the model.
Here is an example of analyzing app review text and organizing it by category. The @Guide macro is used to communicate the meaning of properties to the model in natural language; it is not required but can be used when you want to improve output quality. The description of @Guide is written in English following the official documentation samples.
AppReviewAnalysis is defined at the top level of the file (outside ContentView).
@Generable
struct AppReviewAnalysis {
    @Guide(description: "Overall sentiment: positive, negative, or neutral")
    var sentiment: String
    @Guide(description: "Key positive points mentioned in the review")
    var positivePoints: [String]
    @Guide(description: "Issues or complaints mentioned in the review")
    var issues: [String]
}
Add action4() to ContentView to verify the behavior.
func action4() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession()
    Task {
        let reviewText = """
            It starts up fast and is easy to use. I also like the design.
            However, there were too many notifications and it was hard to find the settings.
            """
        do {
            let response = try await session.respond(
                generating: AppReviewAnalysis.self
            ) {
                reviewText
            }
            print(response.content.sentiment) // → positive
            print(response.content.positivePoints) // → ["Fast startup", "Easy to use", "Nice design"]
            print(response.content.issues) // → ["Too many notifications and hard to find settings"]
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}
The following analysis results were obtained. Across multiple runs, sentiment and issues were completely consistent, with only minor variation in the wording of positivePoints.
positive
["Fast startup", "Easy to use", "Design is to my liking"]
["Too many notifications and hard to find settings"]
The output is more stable than free-form text generation (Step 3). This is because @Generable converts the type information into a JSON schema and guided generation keeps the model's output within the type's constraints.
Passing a negative review with the same code changes the content of sentiment and positivePoints.
let reviewText = """
It is completely useless. It crashes every time it starts up,
and my entered data disappears too. I hope for improvements.
"""
// → negative
// → []
// → ["Crashes every time it starts up", "Entered data disappears"]
positivePoints became an empty array, and crashes and data loss were listed in issues. It was confirmed that the content of the structured data switches according to the tone of the input text.
Since unstructured text can be extracted as Swift types, it is easy to incorporate into subsequent processing. Note that @Generable type information consumes the context window. The more properties there are and the longer the @Guide descriptions, the more is consumed, so it is advisable to omit unnecessary properties.
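As an example of such subsequent processing: because sentiment is declared as a String, downstream code may want to map it onto a closed set of values before branching on it. The following is a minimal sketch in plain Swift; Sentiment, init(modelOutput:), and needsFollowUp(sentiment:issues:) are hypothetical names, and the follow-up rule is just an illustrative assumption.

```swift
import Foundation

/// Hypothetical closed set of sentiments matching the @Guide description.
enum Sentiment: String {
    case positive, negative, neutral

    /// Normalizes the model's string (whitespace, letter case);
    /// returns nil for any label outside the three expected values.
    init?(modelOutput: String) {
        let normalized = modelOutput
            .trimmingCharacters(in: .whitespacesAndNewlines)
            .lowercased()
        self.init(rawValue: normalized)
    }
}

/// Illustrative rule: a review needs follow-up when it is negative,
/// lists at least one issue, or carries an unrecognized sentiment label.
func needsFollowUp(sentiment: String, issues: [String]) -> Bool {
    guard let parsed = Sentiment(modelOutput: sentiment) else {
        return true  // unknown label: have a person look at it
    }
    return parsed == .negative || !issues.isEmpty
}

// Usage with the result from action4():
// let analysis = response.content
// if needsFollowUp(sentiment: analysis.sentiment, issues: analysis.issues) { ... }
```

Treating an unrecognized label as "needs follow-up" is a conservative default; an app could equally log it and fall back to neutral.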
Step 7: Multi-turn Conversation
LanguageModelSession automatically retains the conversation history within the session. By continuing to send requests to the same session, you can achieve a conversation that carries over context.
Add action5() to ContentView to verify the behavior.
func action5() {
    guard SystemLanguageModel.default.isAvailable else {
        text = "Apple Intelligence is not available"
        return
    }
    let session = LanguageModelSession()
    Task {
        do {
            // First question
            let r1 = try await session.respond(to: "Please tell me the difference between SwiftUI and UIKit")
            print(r1.content)
            // Second question: ask while inheriting the previous context
            let r2 = try await session.respond(to: "Then, which should I choose for a new project?")
            print(r2.content)
            text = r2.content
        } catch {
            text = "Error: \(error.localizedDescription)"
            print("Error: \(error)\n\(String(reflecting: error))")
        }
    }
}
The first output was as follows. It returned a comparison covering four items: developer experience, flexibility, dependencies, and learning curve.
The main differences between SwiftUI and UIKit are as follows.
1. **Developer Experience**
...(abbreviated below)
The second output was as follows.
For a new project, I recommend choosing **SwiftUI**. SwiftUI improves development efficiency and enables intuitive UI design. However, if complex features or customization are required, it is also worth considering UIKit. Please choose based on the scale and needs of your project.
In response to the second question, which begins with "Then," the model recommended SwiftUI based on the comparison from the first answer. This confirms that the context within the session was carried over. If a new session is created, however, the context is reset.
In addition, combining this with the character configuration via instructions introduced in Step 5 makes it easier to create a roleplay chat app where a consistent persona continues throughout the conversation.
Operation Verification
Confirm in advance that Apple Intelligence is enabled on the physical device.
- Settings app → "Apple Intelligence & Siri" → Turn on "Apple Intelligence"
- Confirm that the language and region are set to a supported language such as English (US)
- Wait until the model download is complete
Once the above preparations are done, tapping the button will return a response after a few seconds. Since processing is done on-device, no communication with external networks occurs.
Notes
Context Window
The context window is smaller compared to cloud LLMs. Errors may occur if you continue a long conversation or pass a large amount of text at once. Since @Generable type definitions also consume context, it is best to limit properties to the minimum necessary.
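One way to cope with a long conversation is to start a fresh session and retry once when the context overflows. The sketch below keeps that retry logic in plain Swift so it can be read independently of the framework; retryAfterReset and its parameter names are hypothetical. The commented usage assumes that the overflow surfaces as LanguageModelSession.GenerationError.exceededContextWindowSize.

```swift
/// Hypothetical helper: runs `attempt`; if it throws an error for which
/// `isContextOverflow` returns true, calls `reset` and tries exactly once more.
/// Any other error, or a second failure, is propagated to the caller.
func retryAfterReset<T>(
    isContextOverflow: (Error) -> Bool,
    reset: () -> Void,
    attempt: () async throws -> T
) async throws -> T {
    do {
        return try await attempt()
    } catch let error where isContextOverflow(error) {
        reset()
        return try await attempt()
    }
}

// Usage with Foundation Models (assumption: the overflow is reported as
// LanguageModelSession.GenerationError.exceededContextWindowSize):
//
// var session = LanguageModelSession()
// let answer = try await retryAfterReset(
//     isContextOverflow: { error in
//         if let e = error as? LanguageModelSession.GenerationError,
//            case .exceededContextWindowSize = e { return true }
//         return false
//     },
//     reset: { session = LanguageModelSession() },  // drops the history
//     attempt: { try await session.respond(to: prompt).content }
// )
```

Note that resetting discards the earlier conversation, and if the prompt alone is too large the second attempt fails as well, so shortening the input is still necessary in that case.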
Japanese Prompts
Japanese prompts often return Japanese responses, but the language is not guaranteed. If you want responses in Japanese, it is most reliable to explicitly state "Please answer in Japanese."
Troubleshooting
LanguageModelError Occurs
The following type of error may occur.
Error Domain=FoundationModels.LanguageModelError
The main causes and remedies are as follows.
| Cause | Remedy |
|---|---|
| Apple Intelligence is disabled | Turn on Apple Intelligence from Settings |
| Model download not complete | Wait for the download to complete and retry |
| Context window exceeded | Shorten the prompt or conversation history |
SensitiveContentAnalysisML Error Occurs in Simulator
When running in the simulator, the following type of error may occur.
End sanitizeText with error: Error Domain=com.apple.SensitiveContentAnalysisML Code=15
└─ SafetyGuardrailTextSanitizerBackend: Resource (Local Model Asset) unavailable error.
└─ GenerativeError Code=1020000 "Resource (Local Model Asset) unavailable error."
Error Domain=FoundationModels.LanguageModelError Code=-1
"The operation couldn't be completed. (com.apple.SensitiveContentAnalysisML error 15.)"
This error can occur even when isAvailable returns true. Judging from the error log, isAvailable appears to check only the readiness of the main language model, whereas all text generation also passes through a safety-filtering sub-model (SafetyGuardrailTextSanitizerBackend). The error seems to occur when that sub-model's asset cannot be found.
The simulator uses the host Mac's model assets, so if the versions of Xcode, the iOS Simulator, and macOS do not match, some components may be missing.
This was resolved by testing on a physical device.
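Since the failure only showed up in the simulator, it can be convenient to surface a hint there. The following sketch uses Swift's targetEnvironment(simulator) compile-time condition; isSimulatorBuild, simulatorNotice(), and the wording of the notice are hypothetical.

```swift
/// True when the code was compiled for a simulator target.
var isSimulatorBuild: Bool {
    #if targetEnvironment(simulator)
    return true
    #else
    return false
    #endif
}

/// Hypothetical helper: returns a hint to show next to an error message
/// when running in the simulator, or nil on a physical device.
func simulatorNotice() -> String? {
    isSimulatorBuild
        ? "Running in the simulator: model assets come from the host Mac, so try a physical device if generation fails."
        : nil
}

// Usage in a catch block:
// text = "Error: \(error.localizedDescription)" + (simulatorNotice().map { "\n\($0)" } ?? "")
```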
Summary
Using the Foundation Models framework, the basic text generation features were verified.
- Simple text generation with respond(to:)
- Real-time display with streamResponse(to:)
- System prompt configuration with LanguageModelSession(instructions:)
- Structured output with @Generable + @Guide
- Multi-turn conversation within the same session
My overall impression after trying these out is that there are limitations compared to cloud LLMs, such as not being able to choose the model and a smaller context window, but it is appealing that this level of functionality is available without any external communication.
Additionally, at WWDC26, AFM 3 Core Advanced, a 20B parameter on-device model, and AFM 3 Cloud Pro for cloud inference were also announced. All verifications in this article were done with a 3B model, but even so, it provided practical responses and I felt it was sufficiently promising. I look forward to the day when I can try the 20B model through the Foundation Models API.
As a next step, for those who want to try the multimodal feature that allows images in addition to text as input, please also refer to the following article.
