Building Code350 Workspace: Multi-LLM Integration with Vercel AI SDK

I'm building Code350 Workspace, an AI-powered coding platform, and documenting the entire process publicly. This isn't just content marketing—it's a deliberate SEO strategy. AI search tools like ChatGPT, Perplexity, and Google's AI Overviews are increasingly citing LinkedIn posts and technical blogs. By building in public and writing about it, I'm creating a citation network that helps Code350 get discovered.
This blog post covers the technical implementation of our multi-model AI assistant. On Monday, I'm publishing a LinkedIn article about how AI search changed my job hunt as a self-taught developer. The two pieces work together: this post provides the technical foundation, the LinkedIn article shows the real-world impact. Both will get indexed by AI search engines, creating a web of references that reinforces Code350's presence in AI-driven search results.
Why Multiple AI Models?
Code350 Workspace offers three AI models: Claude Sonnet 4.5, OpenAI's GPT-5.2 Codex, and xAI's Grok 4. I'm not going to pretend this was some carefully orchestrated strategic decision. Claude Sonnet is the best general-purpose coding assistant available. I included Codex as an alternative because it excels at code analysis, and Grok because I wanted to experiment with newer models.
The real benefit of multiple models isn't that each one has a perfect niche—it's that users have options. Some developers prefer OpenAI's style. Some want to try the newest thing. And as AI models evolve, having the infrastructure to support multiple providers means I can adapt without rewriting the entire system.
The Tech Stack
Code350 Workspace is built on Next.js with the App Router, TypeScript, Supabase for data and authentication, and Stripe for payments. The AI layer uses the Vercel AI SDK, which provides a unified interface for multiple LLM providers.
Here's why each choice matters:
Next.js App Router: Server components and streaming responses make the AI chat feel instant. When a user asks a question, the first tokens appear in under a second because we're streaming the response as it generates.
Vercel AI SDK: Instead of writing custom integrations for Claude, OpenAI, and xAI separately, the AI SDK provides a single streamText function that works with all of them. Adding a new model is a one-line change.
Supabase: Multi-tenant architecture with row-level security (RLS) means user data stays isolated without complex application logic. A user's AI conversations, projects, and tokens are theirs—the database enforces this at the query level.
Stripe: Subscription tiers with token allowances. Webhooks handle subscription events, while we track consumption across multiple models with different pricing in Supabase. A minimal handler sketch follows this list.
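The sketch below shows the general shape of that webhook route in the App Router. The stripe client setup and the updateUserTier helper are hypothetical stand-ins, not code from Code350:

// app/api/stripe/webhook/route.ts (illustrative path)
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Hypothetical helper: map a Stripe price to a tier and persist it in Supabase.
async function updateUserTier(customerId: string, priceId: string) {
  // ... upsert into a subscriptions table ...
}

export async function POST(req: Request) {
  const payload = await req.text();
  const signature = req.headers.get("stripe-signature")!;

  // Verify the signature before trusting anything in the body.
  const event = stripe.webhooks.constructEvent(
    payload,
    signature,
    process.env.STRIPE_WEBHOOK_SECRET!
  );

  if (
    event.type === "customer.subscription.created" ||
    event.type === "customer.subscription.updated"
  ) {
    const sub = event.data.object as Stripe.Subscription;
    await updateUserTier(sub.customer as string, sub.items.data[0].price.id);
  }

  return new Response("ok");
}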
How the Multi-Model System Works
The implementation is cleaner than you might expect. Here's the core model registry from lib/ai/client.ts:
import { gateway } from "@ai-sdk/gateway";

// Costs are USD per million tokens.
export const AVAILABLE_MODELS = {
  "anthropic/claude-3.5-haiku": {
    id: "anthropic/claude-3.5-haiku",
    name: "Claude Haiku 3.5",
    provider: "anthropic",
    inputCost: 0.8,
    outputCost: 4.0,
    contextWindow: 200_000,
    description: "Fast and efficient for simple tasks",
    tier: "free",
  },
  "anthropic/claude-sonnet-4.5": {
    id: "anthropic/claude-sonnet-4.5",
    name: "Claude Sonnet 4.5",
    provider: "anthropic",
    inputCost: 3.0,
    outputCost: 15.0,
    contextWindow: 200_000,
    description: "Best for complex coding tasks",
    tier: "pro",
  },
  "openai/gpt-5.2-codex": {
    id: "openai/gpt-5.2-codex",
    name: "GPT-5.2 Codex",
    provider: "openai",
    inputCost: 2.5,
    outputCost: 10.0,
    contextWindow: 128_000,
    description: "Specialized for code analysis",
    tier: "pro",
  },
  "xai/grok-4": {
    id: "xai/grok-4",
    name: "Grok 4",
    provider: "xai",
    inputCost: 2.0,
    outputCost: 8.0,
    contextWindow: 131_072,
    description: "Fast alternative with strong reasoning",
    tier: "pro",
  },
} as const;

// Deriving the ID type from the registry keeps every call site in sync.
export type ModelId = keyof typeof AVAILABLE_MODELS;

export function getAIModel(modelId: ModelId) {
  return gateway(modelId);
}
Every model is defined once with its pricing, capabilities, and access tier. The gateway function from Vercel's AI SDK handles routing requests to the correct provider. Adding a new model means adding one entry to this object.
Tier-Based Access Control
Free users get Claude Haiku. Paid users can choose any model. The server validates this on every request:
const userTier = await getUserTier(user.id);

// Reject explicit requests for models outside the user's tier.
if (requestedModel && !canUseModel(userTier, requestedModel)) {
  return new Response(
    JSON.stringify({
      error: "This model is not available on your plan.",
      code: "MODEL_NOT_AVAILABLE",
    }),
    { status: 403 }
  );
}

// Otherwise use the requested model, or fall back to the tier default.
const modelId = (requestedModel ?? DEFAULT_MODELS[userTier]) as ModelId;
The client sends a model preference. The server checks whether the user's subscription tier allows it: an explicit request for a disallowed model is rejected with a 403 (so the UI can surface an upgrade prompt), and a request with no preference falls back to the tier's default model. This prevents someone from modifying the frontend to request Claude Sonnet 4.5 on a free account.
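For reference, here's a minimal sketch of the helpers used above, assuming the two tiers from the registry; the real Code350 implementations may differ:

import { AVAILABLE_MODELS, type ModelId } from "@/lib/ai/client";

type UserTier = "free" | "pro";

// Fallbacks when no model preference is sent.
export const DEFAULT_MODELS: Record<UserTier, ModelId> = {
  free: "anthropic/claude-3.5-haiku",
  pro: "anthropic/claude-sonnet-4.5",
};

export function canUseModel(tier: UserTier, modelId: ModelId): boolean {
  // Pro unlocks every model; free accounts are limited to free-tier models.
  return tier === "pro" || AVAILABLE_MODELS[modelId].tier === "free";
}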
Context Caching (The Real Cost Saver)
Anthropic's prompt caching reduces costs by 90% for repeated tokens. Here's how we enable it:
import { streamText } from "ai";

// systemPrompt, sanitizedMessages, and tools come from earlier in the route handler.
const isAnthropic = modelId.startsWith("anthropic/");

const result = streamText({
  model: getAIModel(modelId),
  system: systemPrompt,
  messages: sanitizedMessages,
  tools,
  // Prompt caching is Anthropic-specific; other providers ignore this.
  providerOptions: isAnthropic
    ? { anthropic: { cacheControl: { type: "ephemeral" } } }
    : undefined,
  onFinish: async ({ usage, providerMetadata }) => {
    const cacheInfo = providerMetadata?.anthropic;
    console.log("[Chat API] Cache read tokens:", cacheInfo?.cacheReadInputTokens);
    // Consume tokens from the user's balance
  },
});
For Anthropic models, we pass cacheControl: { type: "ephemeral" } in provider options. The system prompt and tool definitions get cached after the first request. Subsequent requests with the same prompt only pay for new tokens—cached input tokens cost $0.30 per million instead of $3.00.
This only works because we keep the system prompt and tool list stable. If we rebuilt them on every request (like injecting a timestamp), we'd destroy the cache and lose the savings.
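As an illustration (the actual Code350 prompt is different), the cache-friendly pattern is a module-level constant:

// Built once at module load: every request sends a byte-identical prefix,
// so Anthropic can serve it from cache.
export const SYSTEM_PROMPT = [
  "You are the Code350 Workspace coding assistant.",
  "Prefer small, reviewable changes and explain your reasoning.",
].join("\n");

// Anti-pattern: rebuilding the prompt per request, e.g.
// `${SYSTEM_PROMPT}\nCurrent time: ${new Date().toISOString()}` --
// the prefix now changes every request, so cache reads never happen.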
Client-Side Model Selection
Users pick their model through a dropdown in the chat interface. It's a simple popover component that shows available models based on their subscription tier:
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

const { messages, sendMessage, status } = useChat({
  transport: new DefaultChatTransport({
    api: "/api/chat",
    // body is resolved per request, so the latest model selection is sent.
    body: () => ({
      projectId,
      modelId: selectedModel,
      context: { currentFile: currentFilePath, fileContent: currentFileContent },
    }),
  }),
  onFinish: async ({ message }) => {
    // Persist the finished assistant message to Supabase
  },
});
When a user changes models, the selectedModel state updates and the next request uses that model. Locked models appear greyed-out with an "Upgrade" badge: visible but not clickable, which doubles as a built-in upsell.
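A simplified sketch of one dropdown row, reusing the registry from earlier; the props, markup, and class names are illustrative:

import { AVAILABLE_MODELS, type ModelId } from "@/lib/ai/client";

type UserTier = "free" | "pro";

function ModelOption({
  modelId,
  userTier,
  onSelect,
}: {
  modelId: ModelId;
  userTier: UserTier;
  onSelect: (id: ModelId) => void;
}) {
  const model = AVAILABLE_MODELS[modelId];
  const locked = userTier === "free" && model.tier !== "free";

  return (
    <button
      disabled={locked}
      className={locked ? "opacity-50" : ""}
      onClick={() => onSelect(modelId)}
    >
      {model.name}
      {locked && <span className="badge">Upgrade</span>}
    </button>
  );
}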
Real Technical Challenges
Building a multi-tenant AI coding platform isn't just about wiring up API calls. Here are the problems that actually required thought:
Lazy Tool Loading
The workspace AI has 37 tools: file operations, git commands, terminal access, diagnostics, and more. Sending all of them on every request burned tokens and slowed response times.
The solution: keyword-based tool filtering. We scan the user's message for specific terms and only load relevant tool categories:
const GIT_KEYWORDS = /\b(git|push|commit|branch|merge|pull|deploy)\b/i;
const TERMINAL_KEYWORDS = /\b(run|terminal|npm|install|package|test)\b/i;
const DIAGNOSTICS_KEYWORDS = /\b(error|bug|fix|debug|diagnostic)\b/i;
// Core tools always included; specialized categories load on demand
// ~10-20 tools per request instead of 37
This creates tension with prompt caching—changing the tool list busts the cache. The trick is that similar messages (same intent) trigger the same keywords, loading the same tools, generating the same cache key. It's not perfect, but it reduces token usage by 40-60% without destroying cache hit rates.
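A minimal sketch of the filter itself, with CORE_TOOLS and TOOLS_BY_CATEGORY as hypothetical stand-ins for the real tool maps:

// Placeholder tool maps; in the real codebase these hold AI SDK tool() definitions.
declare const CORE_TOOLS: Record<string, unknown>;
declare const TOOLS_BY_CATEGORY: Record<
  "git" | "terminal" | "diagnostics",
  Record<string, unknown>
>;

const TOOL_TRIGGERS = {
  git: GIT_KEYWORDS,
  terminal: TERMINAL_KEYWORDS,
  diagnostics: DIAGNOSTICS_KEYWORDS,
} as const;

function selectTools(userMessage: string) {
  const tools = { ...CORE_TOOLS }; // file operations etc. are always included
  for (const [category, pattern] of Object.entries(TOOL_TRIGGERS)) {
    if (pattern.test(userMessage)) {
      Object.assign(tools, TOOLS_BY_CATEGORY[category as keyof typeof TOOLS_BY_CATEGORY]);
    }
  }
  return tools;
}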
Context Window Management
Long conversations consume tokens fast. We cap history at 8 messages or 6,000 tokens, whichever hits first, but always keep the last 2 messages so the model doesn't lose immediate context:
export const CONTEXT_LIMITS = {
  maxMessages: 8,         // hard cap on history length
  maxHistoryTokens: 6000, // estimated token budget for history
  minMessages: 2,         // always keep the most recent exchange
  charsPerToken: 3.5,     // rough characters-per-token estimate
};
The algorithm: take the last N messages, then drop the oldest while the history is over the token budget and still has more than the minimum number of messages. Keeping that minimum prevents a user who just pasted 8,000 tokens of code from losing their entire conversation history.
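In code, the whole pass is a slice plus a drop loop; a sketch assuming plain-text messages and the character heuristic above:

type ChatMessage = { role: "user" | "assistant"; content: string };

const estimateTokens = (text: string) =>
  Math.ceil(text.length / CONTEXT_LIMITS.charsPerToken);

export function trimHistory(messages: ChatMessage[]): ChatMessage[] {
  // Keep at most the last maxMessages.
  const recent = messages.slice(-CONTEXT_LIMITS.maxMessages);

  // Drop the oldest while over budget, but never below minMessages.
  while (
    recent.length > CONTEXT_LIMITS.minMessages &&
    recent.reduce((sum, m) => sum + estimateTokens(m.content), 0) >
      CONTEXT_LIMITS.maxHistoryTokens
  ) {
    recent.shift();
  }
  return recent;
}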
Multi-Modal Attachments
Images, PDFs, and code files all funnel through a single attachment format. Pasting an image or pasting more than 2,000 characters of text auto-converts to an attachment:
const PASTE_TEXT_ATTACHMENT_THRESHOLD = 2000;
// Long pastes become "pasted-text.txt" attachments instead of inline text
The API route maps each attachment to the AI SDK's content format (type: "image", type: "file", type: "text"). This keeps the chat input clean and makes it easy to support new attachment types.
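A sketch of that mapping; the Attachment shape is illustrative, while the output objects follow the AI SDK's content-part format:

type Attachment =
  | { kind: "image"; data: string; mediaType: string }              // base64 or URL
  | { kind: "file"; data: string; mediaType: string; name: string } // e.g. PDFs
  | { kind: "text"; text: string };                                 // long pastes

function toContentPart(att: Attachment) {
  switch (att.kind) {
    case "image":
      return { type: "image" as const, image: att.data };
    case "file":
      return { type: "file" as const, data: att.data, mediaType: att.mediaType };
    case "text":
      return { type: "text" as const, text: att.text };
  }
}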
Token Billing with Sponsorship
In collaborative projects, the project owner can sponsor a collaborator's AI usage:
let chargeUserId = user.id;
let isSponsored = false;

if (projectId) {
  const chargeInfo = await getTokenChargeUser(user.id, projectId);
  chargeUserId = chargeInfo.chargeUserId; // could be the project owner
  isSponsored = chargeInfo.isSponsored;
}

await consumeTokensWithSponsorship(chargeUserId, totalTokens, isSponsored);
The collaborator uses the AI, but the project owner's token balance gets debited. This required careful transaction handling to prevent double-billing or missed charges.
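One way to keep the debit atomic is to push it into a single Postgres function called over RPC; the function name and arguments here are hypothetical, not Code350's actual schema:

import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // server-side only
);

// consume_tokens would be a Postgres function doing one UPDATE with a
// balance check, so concurrent requests can't double-bill or miss a charge.
async function consumeTokensWithSponsorship(
  chargeUserId: string,
  totalTokens: number,
  isSponsored: boolean
) {
  const { error } = await supabase.rpc("consume_tokens", {
    p_user_id: chargeUserId,
    p_amount: totalTokens,
    p_sponsored: isSponsored,
  });
  if (error) throw new Error(`Token debit failed: ${error.message}`);
}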
Why Building in Public Works
Here's the strategy: every feature I build becomes a blog post. Every blog post gets shared on LinkedIn. Every LinkedIn post gets distributed to Twitter, Threads, and Facebook. AI search tools are increasingly citing this content—especially LinkedIn posts and technical blogs between 500 and 2,000 words.
This creates a self-reinforcing loop:
- Build a feature (multi-model AI integration)
- Write about it (this blog post)
- Share it publicly (LinkedIn article referencing this post)
- AI search tools index both pieces
- When someone searches "AI coding education" or "Vercel AI SDK tutorial," Code350 appears
The LinkedIn article I'm publishing Monday covers how AI search changed my job hunt as a self-taught developer. It references this blog post as an example of building in public. Both pieces get cited by ChatGPT, Perplexity, and Google's AI Overviews. Together, they create a citation network that amplifies Code350's visibility.
This isn't theory—LinkedIn became ChatGPT's fifth most-cited source between December 2025 and February 2026. For professional queries, it's the top-cited domain across all AI search platforms. Content that shares practical knowledge between 500 and 2,000 words gets the most traction. Posts don't need to go viral—consistency and relevance matter more than engagement metrics.
What's Next for Code350
Code350 Workspace is in soft launch with a founding member discount (40% off forever for the first 50 users). The platform supports real-time collaboration, GitHub integration, and token-based billing across multiple AI models.
Short-term roadmap:
- Recording coding courses that integrate with the workspace
- Pursuing a university research partnership (targeting ~$150K in funding)
- Enhanced SEO automation across all Code350 products
- Potential Slack bot integration for team analytics
The bigger goal: build a self-sustaining product ecosystem where education feeds workspace usage, which generates consulting opportunities. Eventually, this replaces freelance and tutoring income entirely.
Conclusion
Building Code350 in public serves two purposes: it helps the product get discovered through AI search, and it creates content that demonstrates the platform's value. This blog post will get indexed by AI search engines. The LinkedIn article I publish Monday will reference it. Both will appear when recruiters or potential customers search for AI coding tools, Next.js education, or multi-LLM integration.
If you're building a SaaS product, consider documenting your technical decisions publicly. Write about your architecture choices, share code snippets, explain your reasoning. AI search tools are changing how people discover products and developers. The more specific and practical your content, the more likely it gets cited.
Follow my build on LinkedIn—I post weekly updates on Code350's development, technical deep dives, and lessons learned from building a coding education platform as a self-taught developer. Monday's article covers how AI search tools like ChatGPT are reshaping job hunting and professional visibility.
#AI #WebDev #NextJS #BuildInPublic #TypeScript