KAYAK AI Chat

Omnipresent AI experience boosting search and conversion

TL;DR

Problem

KAYAK's standalone AI chatbot had strong capabilities but near-zero retention and almost no organic traffic.

Solution

Pivoted to an embedded conversational AI experience, de-risked with a low-cost A/B test before scaling site-wide.

Team

Design: Tai H, Jeongmin K

PM: John L

Front end: Jokūbas S, Goda G

Back end: Elica F

Impact

+14% booking revenue

+123% engagement

At a Glance

Onboarding to AI Chat

Standalone page → embedded drawer, zero context switching

Multi-intent output

Single rich text response → simultaneous flight + hotel widgets with follow-ups

Responsive chat

Congested mobile modal → streamlined single-line input with curated prompts

👻 The AI Ghost Town

Why does no one come back?

In 2024, KAYAK launched a standalone AI chatbot: KAYAK.ai. As a separate product from the core booking platform, it could answer complex travel questions, match deals to niche preferences, and check live flight status, then hand users off to KAYAK.com to complete their booking.

However, even with heavy investment in marketing and AI infrastructure, KAYAK.ai gained very little organic traffic and, most critically, almost no users came back.

KAYAK.ai became a ghost town within a month of launch.

Signals of opportunity

We dug into the referral data between KAYAK.ai and the core platform. A small group of KAYAK.ai users did make it to core KAYAK, and their behavior stood out: 5x higher click-through rates than users from other referral channels.

The users who experienced AI-assisted search showed dramatically stronger intent. But the standalone model couldn't get enough of them there.

KAYAK.ai's technology wasn't the problem. Asking users to go somewhere new for it was.

🦾 Prompt Engineering: Classify vs. Infer

Before shipping anything, we had to solve the hardest design problem: not what the AI looks like, but how it works in our product.

We explored two approaches to generating AI responses to user prompts: a parametric classifier and an intent inference engine.

Parametric classifier

The parametric classifier mapped every prompt, by date and destination completeness, into 9 fixed scenarios. It was precise but brittle: it assumed users think in search parameters, couldn't handle prompts outside the search paradigm, and its number of scenarios would grow exponentially with every new dimension.
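A minimal sketch of why the scenario count grows this way. It assumes each dimension is graded on three completeness levels, which is an inference from the 9 scenarios (3 × 3) rather than the documented taxonomy. The level names and the `travelers` dimension are hypothetical.

```python
from itertools import product

# Assumption: every dimension is graded on three completeness levels.
LEVELS = ("missing", "partial", "complete")


def scenarios(dimensions):
    """Enumerate every fixed scenario a parametric classifier would need."""
    return list(product(LEVELS, repeat=len(dimensions)))


two_dims = scenarios(["date", "destination"])
# "travelers" is a hypothetical extra dimension, added only to show the growth.
three_dims = scenarios(["date", "destination", "travelers"])

print(len(two_dims), len(three_dims))  # 9 27
```

Each added dimension multiplies the number of scenarios to design and maintain by three, which is the brittleness described above.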

Intent inference engine

We decided to build a far more resilient intent inference model: sort what the user is trying to do into a rough bucket (search, explore, ask a question, plan across verticals), then resolve ambiguity within each bucket through follow-up questions and clarification.
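The two-step behavior (bucket first, clarify second) can be sketched roughly as below. The keyword rules are a hypothetical stand-in for the model's inference, and the function names, field names, and follow-up wording are illustrative assumptions, not the production system prompt.

```python
from dataclasses import dataclass
from typing import Optional

BUCKETS = ("search", "explore", "question", "plan")


@dataclass
class Turn:
    bucket: str
    follow_up: Optional[str]  # clarifying question, or None if nothing is missing


def infer_bucket(prompt: str) -> str:
    """Keyword stand-in for intent inference (hypothetical rules)."""
    text = prompt.lower()
    verticals = [v for v in ("flight", "hotel", "car") if v in text]
    if len(verticals) > 1:
        return "plan"      # spans several verticals
    if text.rstrip().endswith("?"):
        return "question"
    if len(verticals) == 1:
        return "search"
    return "explore"


def respond(prompt: str, known: dict) -> Turn:
    """Pick a bucket, then ask only for what that bucket still needs."""
    bucket = infer_bucket(prompt)
    if bucket in ("search", "plan") and "dates" not in known:
        return Turn(bucket, "When are you planning to travel?")
    if bucket == "explore" and "origin" not in known:
        return Turn(bucket, "Where would you be flying from?")
    return Turn(bucket, None)


print(respond("flights and hotels in Tokyo", {}))
print(respond("Is my flight on time?", {}))
```

The point of the structure is that a prompt the rules have never seen still lands in some bucket, and ambiguity is resolved with a question instead of a dead end.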

Simulate user flow in a custom GPT

To communicate the AI system's behavior to ENG and PM, I started in Figma, mapping the intent inference engine as a jobs-to-be-done flow: how the system parses a user's input into different query types, and where a vertical search branches off to a second API call that generates the search URL. I then translated that static flow into a custom GPT so the behavior could actually be run.

Unlike with a static Figma file, stakeholders could type real prompts and see simulated responses immediately, which gave us a much faster way to iterate on the system prompt before committing any dev effort.

This custom GPT simulation aligned ENG and PM as a shared behavioral contract before any design mocks were created, and became the playbook for design iteration.

🟒 Green Light to the Final Lap

We launched NLP Search as a tracer bullet experiment in weeks. Here's what the data came back with.

  • +14% increase in real booking revenue (excluding ads)
  • +2% lift in total conversion
  • Neutral impact on core metrics: no downside risk materialized

The revenue signal confirmed the thesis: embedding AI in a familiar flow converts. But the data also revealed a structural ceiling. Users who engaged with NLP Search submitted one query and stopped. There was no way to refine, follow up, or explore across verticals. The architecture was single-turn by design, and usage patterns reflected exactly that.

🚀 Scaling to Chat Drawer

NLP Search was a scoped experiment. What came next was a platform shift.

With the revenue signal and the single-turn ceiling identified, we built what we originally envisioned: an omnipresent AI chat experience across KAYAK.com. The Chat v1 drawer launched site-wide, accessible from the front door, results pages, and detail pages. For the first time, users could have a conversation with KAYAK, not just search it.

Left vs. Right?

In emerging AI chat interfaces, panel placement signals the AI's role to users.

Left panels (Lovable, ChatGPT Canvas) position AI as a creative partner. Users rely on multiple rounds of prompting and co-creation to fine-tune the artifact generated by the LLM.

Right panels, on the other hand, position AI as a contextual assistant. This placement lets users stay focused on the main content while keeping the assistant easily discoverable and non-intrusive. KAYAK Chat is the latter: users focus on search results, chat assists on demand.

Integrating with existing architecture

KAYAK already had a right-panel drawer for Trips. Reusing it meant zero new interaction models, and co-locating Chat with Trips created a natural handoff: plan in chat, save to Trips in one motion.

🩹 Three Problems in One Reply

Chat v1 was live across the site. Then a single flight reply showed everything still wrong with it.

Three problems surfaced in one response:

  • Unstable layout. The summary is the last thing the model generates, yet it rendered at the top, so every answer reflowed and the hierarchy collapsed the moment results arrived.
  • Lost in the panel. The full results list was crammed into the narrow chat column. Fullstory replays showed users scrolling the results away and losing the page they came from.
  • Orphaned saving. The save control lived inside chat, cut off from the core booking flow where users actually manage trips.

The third was the easy call: we removed saving from chat entirely and let the existing Trips drawer own it. The other two were real design problems to tackle: a response structure that stops reflowing, and a navigation model that keeps the page in view.

🧱 Defining a Coherent Response

First problem: when the AI can say anything, how does a response stay coherent?

Multi-turn chat made the output non-deterministic. Where NLP Search returned one structured result set, an open conversation could return anything: a list, a comparison, a clarifying question, a multi-vertical plan. With the summary landing last and jumping to the top, every response reflowed.

So instead of designing one layout, I defined a fixed response anatomy: a constant order and lifecycle that any answer could pour into. A transient status and acknowledgement hold the wait. Then a persistent block settles in a fixed order: the result widget first, the summary pinned beneath it, then a follow-up. The summary still generates last, but it renders in its reserved slot, so the response stops jumping.

The lifecycle also controls cost. The status and acknowledgement are static, generic strings with no per-turn model call, while the widget and summary are the dynamic, generated parts. Holding the wait with a static layer means the waiting state adds no model cost, no matter how many turns a user takes.
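A minimal sketch of the response anatomy as a data structure, assuming three persistent slots rendered in a fixed order. The class name, slot names, and static strings are hypothetical and only illustrate the ordering rule, not the shipped component.

```python
from dataclasses import dataclass, field

# Persistent slots always render in this order, whatever order content arrives in.
SLOT_ORDER = ("widgets", "summary", "follow_up")

# Static, generic strings: no per-turn model call (wording is hypothetical).
STATUS = "Searching..."
ACK = "Got it, working on that."


@dataclass
class Response:
    slots: dict = field(default_factory=dict)

    def fill(self, slot, content):
        """Store generated content in its reserved slot."""
        if slot not in SLOT_ORDER:
            raise ValueError(f"unknown slot: {slot}")
        self.slots[slot] = content

    def render(self):
        """Transient layer while waiting, then slots in fixed order."""
        if not self.slots:
            return [STATUS, ACK]
        return [(name, self.slots[name]) for name in SLOT_ORDER if name in self.slots]


reply = Response()
print(reply.render())                     # transient layer holds the wait
reply.fill("widgets", ["flights"])
reply.fill("follow_up", "Want hotels too?")
reply.fill("summary", "placeholder summary text")  # generated last
print([name for name, _ in reply.render()])
```

Because position comes from `SLOT_ORDER` and not from arrival time, the late summary lands beneath the widget instead of pushing everything down.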

One slot stayed deliberately loose. The result widgets are variable and optional: a question returns none, a single search returns one, a multi-vertical prompt returns several. The anatomy held that flex, but what each widget should actually be, and how a user moves across several result pages without losing their place, was the next problem.

What users actually want from chat

I tested 3 widget patterns with 9 users (unmoderated, via usertesting.com) to find out how much of a result belongs in chat versus the page. Pattern B, a condensed summary with thumbnails, won.

The scorecard was the small win. The real finding was how differently users framed the panel:

  • Chat-as-Wayfinder (5/9): chat is the fastest way to the right page.
  • Chat-as-Destination (2/9): I can act right inside chat.
  • Results-Page Purist (2/9): the chat is an afterthought.
  • Reference Builder (theoretical): a thinking aid, not an interface. They front-load their trip, constraints, and preferences into chat, then hand that understanding off to conventional search to find and compare. They'd want a persistent, resizable panel they can return to, with filters and browsing left untouched.

The wayfinder majority settled the direction: chat is a navigation layer, not a destination, built to orient and route users, not replace the page.

Insights for after the /ai migration

  • Adaptive vertical widget by modality (for Chat-as-Destination): make widgets responsive to chat mode, more detail in fullscreen, a compact link block in the drawer.
  • Collapsible, resizable chat panel (for Reference Builder): a persistent panel they can reference without it intruding on results, with rich responses they can return to as saved context, and a way to hand off what the AI understood into a traditional search form.

Building search verticals with an interactive component library

The wayfinder finding directly shaped the solution. Instead of rendering results inside the chat, each AI response generates a vertical widget: a compact visual anchor in the chat panel linked to a full results page in the core content area.

The widget serves two roles simultaneously. In the chat panel, it acts as a scannable navigation point instead of a wall of text. In the core content area, search results render in their production layout, where users already know how to evaluate and act on them. Because one prompt can now generate several widgets, a viewing badge on each tracks which result pages a user has visited and which remain unviewed, so they can move between them without losing track.

Each widget maps to one results page with a definitive URL. When a user applies a filter, that produces a new URL, so a new widget generates. This keeps the relationship between chat and results predictable: one widget, one page, one state.

A mixed-intent prompt like “flights and hotels in Tokyo next week” stacks a widget per vertical, each linked to its own result set and carrying its own viewed state. Users navigate between verticals without losing conversational context.
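The "one widget, one page, one state" rule can be sketched as a registry keyed by results URL. This is a simplified model under stated assumptions: the URL scheme, parameter names, and class names are hypothetical and do not reflect KAYAK's actual routing.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Widget:
    vertical: str
    url: str
    viewed: bool = False  # drives the viewing badge


class ChatThread:
    def __init__(self):
        self.widgets = {}  # url -> Widget, kept in insertion order

    def add_results(self, vertical, params):
        """Return the widget for this exact results state, creating it if new."""
        # Hypothetical URL scheme; sorting makes the URL stable for a given state.
        url = f"/{vertical}?{urlencode(sorted(params.items()))}"
        return self.widgets.setdefault(url, Widget(vertical, url))

    def open(self, url):
        self.widgets[url].viewed = True

    def unviewed(self):
        return [w.url for w in self.widgets.values() if not w.viewed]


thread = ChatThread()
flights = thread.add_results("flights", {"to": "TYO"})   # mixed-intent prompt:
hotels = thread.add_results("hotels", {"city": "Tokyo"})  # one widget per vertical
nonstop = thread.add_results("flights", {"to": "TYO", "stops": "0"})  # filter -> new URL
thread.open(flights.url)
print(thread.unviewed())
```

Keying on the URL means a repeated state reuses its widget and keeps its viewed badge, while any filter change yields a distinct widget.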

The strongest delight signal from research: this multi-query capability. “I don't think all chatbots would offer that.”

Reflections

Chat v1 Results

After launching Chat v1 as an embedded drawer on KAYAK.com, user engagement jumped 123% compared to NLP Search. Messages per user increased by 80%, signaling that users found enough value to keep the conversation going.

Key takeaways

  • Ask before answering. One clarifying step costs seconds and saves users from dead-end sessions.
  • Classify intent, not query. A deal-hunter and a trip-planner typing the same prompt need entirely different responses.
  • Replace recaps with actions. 7/9 users called text summaries “noise”; proactive follow-ups outperformed them every time.
  • Prototype with real data. Static mocks can't test intent-driven systems. Design for unknown outputs, not known inputs.

The most valuable work wasn't the UI. It was the behavioral specs, classifier logic, and research that made the UI possible. When those pieces clicked, the pixels followed.