KAYAK AI Chat
Omnipresent AI experience boosting search and conversion
TL;DR
Problem
KAYAK's standalone AI chatbot had strong capabilities but near-zero retention and almost no organic traffic.
Solution
Pivoted to an embedded conversational AI experience, de-risked with a low-cost A/B test before scaling site-wide.
Team
Design: Tai H, Jeongmin K
PM: John L
Front end: Jokūbas S, Goda G
Back end: Elica F
Impact
+14% booking revenue
+123% engagement
At a Glance
Onboarding to AI Chat
Standalone page → embedded drawer, zero context switching
Multi-intent output
Single rich text response → simultaneous flight + hotel widgets with follow-ups
Responsive chat
Congested mobile modal → streamlined single-line input with curated prompts
👻 The AI Ghost Town
Why does no one come back?
In 2024, KAYAK launched a standalone AI chatbot: KAYAK.ai. As a separate product from the core booking platform, it could answer complex travel questions, match deals to niche preferences, and check live flight status, then hand users off to KAYAK.com to complete their booking.
However, even with heavy investment in marketing and AI infrastructure, KAYAK.ai gained very little organic traffic, and, most critically, almost no users came back.
KAYAK.ai became a ghost town within a month of launch.
Signals of opportunity
We dug into the referral data between KAYAK.ai and the core platform. A small group of KAYAK.ai users did make it to core KAYAK, and their behavior stood out: 5x higher click-through rates than users from other referral channels.
The users who experienced AI-assisted search showed dramatically stronger intent. But the standalone model couldn't get enough of them there.
KAYAK.ai's technology wasn't the problem. Asking users to go somewhere new for it was.
🧪 Testing the waters: NLP Search
From finding to hypothesis
If the problem was asking users to go somewhere new, not the AI itself, then the fix wasn't improving KAYAK.ai. It was removing the destination barrier entirely.
That reframed the question from "How do we make KAYAK.ai better?" to a testable hypothesis: users are more willing to adopt AI features and convert when the experience is embedded in a familiar flow.
AI Chat vs. AI search mode: long-term goal and quick test
We had a larger vision. But a persistent chat experience (multi-turn conversation, new UI components, new interaction patterns) carried significant dev cost and migration risk. If that experiment failed, the sunk cost would be high and the learnings hard to isolate.
We needed the cheapest possible test of the core thesis before making the expensive architectural commitment.
The obvious move was to let users type natural language directly into KAYAK's search form. But the existing form routes to structured queries: origin, destination, dates, passengers. Accepting freeform input would require a new backend pathway to decide, for every query, whether a user is typing a natural language prompt or a structured airport query. That's a much larger scope than adding a parallel mode.
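To make the routing problem concrete, here is a minimal, hypothetical heuristic (not KAYAK's actual backend) showing the guess the shared form would have to make on every query:

```python
import re

# Illustrative only: a shared input field must decide which pathway a string
# belongs to. IATA-style codes and short "CITY to CITY" shorthand look
# structured; longer conversational text looks like a natural-language prompt.
def looks_structured(text: str) -> bool:
    stripped = text.strip()
    if re.fullmatch(r"[A-Z]{3}", stripped):  # e.g. "JFK"
        return True
    # short route shorthand, e.g. "NYC to Paris"
    return bool(re.search(r"\bto\b", stripped)) and len(stripped.split()) <= 5
```

A parallel AI mode sidesteps this classification entirely: each input field has exactly one pathway, so neither funnel inherits the other's ambiguity.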
A separate AI mode gave us two things: risk isolation and signal clarity. If the experiment broke, only the experiment broke, not the primary search funnel. And a distinct mode produced clean XP data, making it possible to attribute any conversion lift specifically to AI-assisted search.
The test
NLP Search: natural language input embedded directly in the core KAYAK search flow. Same site. Same results page. One new input type. The AI meets users where they already are, inside the flow they already trust.
Through this fast and low-cost MVP experiment, we were able to validate whether a larger vision was worth building.
One input field. Same search flow. Cheapest way to know if the thesis holds.
🦾 Prompt Engineering: Classify vs. Infer
Before shipping anything, we had to solve the hardest design problem: not what the AI looks like, but how it works in our product.
We explored two approaches to generating AI responses to user prompts: a parametric classifier vs. an intent inference engine.
Parametric classifier
The parametric classifier mapped every prompt into 9 fixed scenarios based on date and destination completeness. It was precise but brittle: it assumed users think in search parameters, couldn't handle prompts outside the search paradigm, and would scale exponentially with every new dimension.
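A hypothetical reconstruction of that structure (the scenario names are invented for illustration): 3 date states x 3 destination states = 9 hand-mapped cells, and a third dimension would triple the table.

```python
from enum import Enum

class Completeness(Enum):
    MISSING = 0
    PARTIAL = 1
    COMPLETE = 2

# Illustrative scenario table: one hand-written response per cell. Adding a
# new dimension (say, traveler info) multiplies the table to 27 cells.
SCENARIOS = {
    (Completeness.COMPLETE, Completeness.COMPLETE): "run_full_search",
    (Completeness.COMPLETE, Completeness.PARTIAL): "ask_for_exact_destination",
    (Completeness.MISSING, Completeness.COMPLETE): "ask_for_dates",
    # ...six more hand-written cells, one per remaining combination
}

def classify(dates: Completeness, destination: Completeness) -> str:
    # Prompts outside the search paradigm have no cell to land in
    return SCENARIOS.get((dates, destination), "unhandled")
```

The `"unhandled"` fallback is the failure mode in miniature: anything that isn't a parameter-shaped search falls off the table.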
Intent inference engine
We decided to build a far more resilient intent inference model: infer what the user is trying to do into rough buckets (search, explore, ask a question, plan across verticals), then resolve ambiguity within each category through follow-up questions and clarification.
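The two-stage flow can be sketched as follows. This is a toy approximation: in production the bucketing would be an LLM call, stubbed here with keyword rules, and the follow-up copy is invented for illustration.

```python
from enum import Enum, auto

class Intent(Enum):
    SEARCH = auto()    # parameter-shaped query
    EXPLORE = auto()   # open-ended inspiration
    QUESTION = auto()  # factual ask, answer directly
    PLAN = auto()      # cross-vertical trip planning

# Stage 1: bucket the intent (LLM in production; keyword stub here).
def infer_intent(prompt: str) -> Intent:
    p = prompt.lower()
    if "?" in p:
        return Intent.QUESTION
    if any(w in p for w in ("somewhere", "anywhere", "ideas")):
        return Intent.EXPLORE
    if " and " in p:  # cross-vertical, e.g. "flights and hotels"
        return Intent.PLAN
    return Intent.SEARCH

# Stage 2: resolve ambiguity within the bucket via a follow-up question.
FOLLOW_UPS = {
    Intent.SEARCH: "Which dates work for you?",
    Intent.EXPLORE: "Any budget or vibe in mind?",
    Intent.QUESTION: None,  # no clarification needed; answer directly
    Intent.PLAN: "Want me to start with flights or hotels?",
}
```

Unlike the scenario table, new prompt types degrade gracefully here: an unfamiliar prompt still lands in a rough bucket and gets a clarifying question instead of falling off a grid.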
Simulate user flow in a custom GPT
To communicate the AI system's behavior to ENG and PM, I built a custom GPT with a system prompt that simulated the intent inference engine's behavior.
Unlike a static Figma file, the custom GPT let stakeholders type real prompts and see simulated responses immediately, giving us a much faster way to iterate on the system prompt before committing any dev effort.
This custom GPT simulation aligned ENG and PM around a shared behavioral contract before any design mocks were created, and became the playbook for design iteration.
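A heavily abridged, hypothetical sketch of what such a simulation prompt can look like; the actual system prompt is not reproduced in this case study.

```python
# Hypothetical, abridged simulation prompt for a custom GPT (illustrative
# wording, not the production prompt).
SIMULATION_PROMPT = """\
You simulate KAYAK's intent inference engine. For every user message:
1. Bucket the intent: SEARCH, EXPLORE, QUESTION, or PLAN (cross-vertical).
2. If the bucket is ambiguous or key parameters are missing, ask ONE
   clarifying question instead of guessing.
3. Otherwise, describe the widget(s) you would render and the results page
   each widget would link to.
Never return raw result lists; summarize and point to a page instead.
"""
```

Encoding the behavior as a prompt rather than a mock is what made it testable: anyone could probe an edge case by typing it.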
🟢 Green Light to the Final Lap
We launched NLP Search as a tracer bullet experiment in weeks. Here's what the data came back with.
- +14% increase in real booking revenue (excluding ads)
- +2% lift in total conversion
- Neutral impact on core metrics; no downside risk materialized
The revenue signal confirmed the thesis: embedding AI in a familiar flow converts. But the data also revealed a structural ceiling. Users who engaged with NLP Search submitted one query and stopped. There was no way to refine, follow up, or explore across verticals. The architecture was single-turn by design, and usage patterns reflected exactly that.
🚀 Scaling to Chat Drawer
NLP Search was a scoped experiment. What came next was a platform shift.
With the revenue signal and the single-turn ceiling identified, we built what we originally envisioned: an omnipresent AI chat experience across KAYAK.com. The Chat v1 drawer launched site-wide, accessible from the front door, results pages, and detail pages. For the first time, users could have a conversation with KAYAK, not just search it.
Left vs. Right?
In emerging AI chat interfaces, panel placement signals the AI's role to users.
Left panels (Lovable, ChatGPT Canvas) position AI as a creative partner: users iterate through multiple rounds of prompting and co-creation to fine-tune the artifact the LLM generates.
Right panels, on the other hand, position AI as a contextual assistant. It allows users to stay focused on the main content while keeping the assistant easily discoverable and non-intrusive. KAYAK Chat is the latter: users focus on search results, chat assists on demand.
Integrating with existing architecture
KAYAK already had a right-panel drawer for Trips. Reusing it meant zero new interaction models, and co-locating Chat with Trips created a natural handoff: plan in chat, save to Trips in one motion.
🧩 Designing for Multiple Intents
With the architecture in place, the design challenge moved inside the conversation.
When responsive design breaks
Early iterations displayed full search result lists directly in the chat thread. In a narrow right panel, this made conversations unusably long and text-heavy. The chat panel was competing with the results page instead of complementing it.
We needed a response format that worked within the panel's constraints while keeping users focused on the core content area, where results render at full fidelity.
What users actually want from chat
We tested 3 widget design patterns to determine how much information should live in the chat panel vs. the results page. After a round of user testing with 9 participants (individual unmoderated tests via usertesting.com), Pattern B scored highest.
But the more valuable finding was a consistent split in how users think about chat:
The majority treated chat as a navigation layer, not a destination. That finding became the design constraint for everything that followed: the chat panel's job is to orient users and move them to the right page, not to replace it.
Building search verticals with an interactive component library
This constraint directly shaped the solution. Instead of rendering results inside the chat, each AI response generates a vertical widget: a compact visual anchor in the chat panel linked to a full results page on the core content area.
The widget serves two roles simultaneously. In the chat panel, it acts as a scannable navigation point instead of a wall of text. On the core content area, search results render in their production layout, where users already know how to evaluate and act on them. A viewing badge on each widget tracks which result pages a user has visited and which remain unviewed.
Each widget maps to one results page with a definitive URL. When a user applies a filter, that produces a new URL, so a new widget is generated. This keeps the relationship between chat and results predictable: one widget, one page, one state.
For multi-vertical queries ("flights and hotels in Tokyo next week"), the system stacks widgets, each linked to its own result set. Users navigate between verticals without losing conversational context.
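The one-widget-one-page-one-state rule above can be modeled as an append-only thread of immutable widgets. This is an illustrative data-structure sketch, with invented URL paths, not KAYAK's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable: a widget's state never changes after creation
class Widget:
    vertical: str   # e.g. "flights" or "hotels"
    url: str        # definitive results-page URL, filter state included
    viewed: bool = False

# A filter change produces a new URL, so a new widget is appended to the
# thread; the old widget (and the page state it points to) is never mutated.
def apply_filter(thread: list, widget: Widget, params: str) -> Widget:
    sep = "&" if "?" in widget.url else "?"
    new = Widget(widget.vertical, widget.url + sep + params)
    thread.append(new)
    return new
```

Multi-vertical queries fall out of the same model for free: the system simply appends one widget per vertical, each carrying its own URL.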
The strongest delight signal from research: this multi-query capability. "I don't think all chatbots would offer that."
Reflections
Chat v1 Results
After launching Chat v1 as an embedded drawer on KAYAK.com, user engagement jumped +123% compared to NLP Search. Messages per user rose from 1.25 to 2.25 (an 80% increase), signaling that users found enough value to keep the conversation going.
Key takeaways
- Ask before answering. One clarifying step costs seconds and saves users from dead-end sessions.
- Classify intent, not query. A deal-hunter and a trip-planner typing the same prompt need entirely different responses.
- Replace recaps with actions. 7/9 users called text summaries "noise"; proactive follow-ups outperformed every time.
- Prototype with real data. Static mocks can't test intent-driven systems. Design for unknown outputs, not known inputs.
The most valuable work wasn't the UI; it was the behavioral specs, classifier logic, and research that made the UI possible. When those pieces clicked, the pixels followed.