LLMs

Implementation strategies for Large Language Models, focusing on practical business applications and AI integration solutions.

LLMs Posts

Featured image: a bar chart benchmarking GPT-5.5's agentic coding performance, with 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, both well short of a dashed 95% threshold labeled "where you can stop reviewing."
May 7, 2026 • Frank Valcarcel

GPT-5.5 Is Here. Should You Pause Your Software Project?

Within days of the release, three clients asked the same question: should we pause the project and rebuild around this? The answer is almost never. Here is why.

Illustration: a small knight in weathered medieval armor strides through a swirling cloud of scattered wooden alphabet letters, representing Token Guard, a GitHub Action that counts the tokens in LLM instruction files committed to repositories.
February 10, 2026 • Frank Valcarcel

Token Guard: Keeping Your Agent Context Lean in CI

Token Guard is a GitHub Action that counts tokens in your agent context files and enforces limits in CI. Here’s why we check agent context into our repos, and why keeping it lean matters for team collaboration.
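The gist can be sketched in a few lines. This is an illustrative toy, not Token Guard's actual implementation: the chars/4 estimate stands in for a real tokenizer such as tiktoken, and the file names and 4000-token budget are made-up examples.

```python
# Toy sketch of the idea behind Token Guard: count tokens in agent
# context files and fail CI when a budget is exceeded.
def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a common rough heuristic for English text;
    # a real check would use an actual tokenizer library
    return max(1, len(text) // 4)

def check_budget(files: dict[str, str], budget: int = 4000):
    # sum estimated tokens across all checked-in context files
    counts = {name: estimate_tokens(body) for name, body in files.items()}
    total = sum(counts.values())
    return total, total <= budget

# Example: two hypothetical context files, well under a 4000-token budget
total, ok = check_budget({"AGENTS.md": "x" * 8000, "CLAUDE.md": "y" * 4000})
```

A CI step would run a check like this on every push and exit nonzero when `ok` is false, so context bloat gets caught in review rather than discovered later.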

Code snippet showing a Python Pydantic MovieReview model with typed fields (title, rating, summary, pros, cons) and OpenAI's response_format parameter for structured outputs, syntax highlighted on a dark editor background
November 12, 2025 • Frank Valcarcel

How to Get Guaranteed JSON from LLMs with Structured Outputs

Tired of parsing flaky JSON from LLM responses? OpenAI’s Structured Outputs feature guarantees your responses match your schema exactly. Here’s how to use it with Pydantic, when to choose it over function calling, and the gotchas you’ll encounter in production.
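The core pattern looks like this. The `MovieReview` fields are illustrative (taken from the featured image, not the post body), and the sketch assumes Pydantic v2 and the OpenAI Python SDK's `parse` helper:

```python
# A Pydantic schema that Structured Outputs will guarantee the response matches
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

# Because the API enforces the schema, validating the returned JSON back into
# the model cannot fail on shape or types; simulated payload shown here:
payload = '{"title": "Arrival", "rating": 4.5, "summary": "Quiet, cerebral sci-fi.", "pros": ["score"], "cons": []}'
review = MovieReview.model_validate_json(payload)

# The call itself (requires an API key; shown for shape only):
# from openai import OpenAI
# client = OpenAI()
# completion = client.beta.chat.completions.parse(
#     model="gpt-4o-2024-08-06",
#     messages=[{"role": "user", "content": "Review the movie Arrival."}],
#     response_format=MovieReview,
# )
# review = completion.choices[0].message.parsed
```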

Featured image: a 2D projection of a vector embedding space on a dark background, with color-coded topical clusters (Pricing, Onboarding, API Docs, Policies, Support, Release Notes) and a bright central query point connected to its k=4 nearest neighbors, illustrating why similarity search reliably pulls the right chunks out of a much larger corpus.
August 19, 2025 • Frank Valcarcel

RAG Fundamentals: What It Is and When to Use It

RAG is the most common pattern for putting an LLM in front of your own data, and the most commonly misunderstood. Here is what it is, when it is the right tool, and how the pieces fit together.
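The retrieval step at the heart of RAG can be sketched in pure Python. This is a toy under stated assumptions: real systems embed text with a model and search a vector store, while the three chunks and their vectors here are invented for illustration:

```python
import math

def cosine(a, b):
    # cosine similarity: how close two embedding vectors point
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: each chunk mapped to a made-up 3-dimensional embedding
corpus = {
    "Refunds are issued within 14 days.": [0.9, 0.1, 0.0],
    "The API rate limit is 100 req/min.": [0.1, 0.9, 0.1],
    "Onboarding takes about one week.":   [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    # rank chunks by similarity to the query and keep the top k
    ranked = sorted(corpus, key=lambda c: cosine(query_vec, corpus[c]), reverse=True)
    return ranked[:k]

# A query about refunds lands nearest the refund chunk in vector space
chunks = retrieve([0.95, 0.05, 0.0], k=1)
prompt = "Answer using only this context:\n" + "\n".join(chunks) + "\n\nQ: What is the refund window?"
```

The retrieved chunks are stuffed into the prompt, so the LLM answers from your data rather than from its training set.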

A software developer reviews syntax-highlighted code and color-coded evaluation logs across multiple screens, suggesting an active model comparison and benchmarking workflow.
July 22, 2025 • Frank Valcarcel

How to Choose an LLM When Every Model Claims State of the Art

Benchmark scores don’t tell you whether a model will work for your business. Here’s how to evaluate LLMs on the three axes that actually matter: quality, throughput, and cost.

The Pydantic.ai logo: a stylized pink starfish icon beside black 'PydanticAI' text on a cyan-to-lavender gradient background.
December 11, 2024 • Frank Valcarcel

Pydantic.ai: Building Smarter, Type-Safe AI Agents

The team that brought type safety to Python web development with Pydantic has just unveiled their take on AI development: Pydantic.ai. This new framework reimagines how we build AI applications by bringing Pydantic’s legendary validation capabilities to the world of Large Language Models.

Conceptual illustration: a chat bubble icon at the center of an intricate blue-tinted maze, representing the many considerations involved in evaluating Large Language Models for commercial applications.
September 12, 2024 • Frank Valcarcel

Benchmarking AI: Evaluating Large Language Models (LLMs)

Large Language Models like GPT-4 are revolutionizing AI, but their power demands rigorous assessment. How do we ensure these marvels perform as intended? Welcome to the crucial world of LLM evaluation.

Let's work together

Tell us about your project and how Cuttlesoft can help. Schedule a consultation with one of our experts today.

Contact Us