Session: Testing GenAI Applications
Software engineers are increasingly using Generative AI to build applications such as chatbots, content generation tools, and even agents. Testing these applications can be tricky, especially when tests run against a billable account, and it gets trickier still when you want your application to perform reliably.
As OpenAI's API is the lingua franca, we'll detail how its SDK gives developers reach beyond GPT to models such as Llama and DeepSeek, thanks to platform emulation and tools like Ollama. We'll dig into the trickiness of paid accounts and the unpredictability of chat responses, and the techniques OpenTelemetry's tests use to mitigate them. Finally, we'll review some challenges higher up the stack in AI Assistants and Agents, such as tool hallucinations, and briefly cover practices around quality, including evaluations.
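To give a flavor of that reach, here is a minimal sketch (not taken from the session materials) of the OpenAI Python SDK pointed at Ollama's OpenAI-compatible endpoint, so a locally served Llama model stands in for a billable account. The model name and prompt are placeholders.

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API at /v1; the api_key is required by
# the SDK but ignored by Ollama, so any placeholder value works.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

response = client.chat.completions.create(
    model="llama3.2",  # any model pulled locally, e.g. `ollama pull llama3.2`
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```

Because only the base URL and model name change, the same test code can target GPT in production and a local model in CI.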
You’ll leave with an overview of the challenges GenAI developers encounter and a few places to start bullet-proofing your codebase.