
Further Thoughts on Measuring AEO

A deeper dive into the tools, gaps, and honest limitations of AEO analytics. Plus: what a dream measurement platform would look like if someone actually built it.

What is AEO analytics? AEO analytics is the practice of measuring how often, how accurately, and in what context AI answer engines cite your brand when users ask questions relevant to your business. It includes tracking citation frequency, share of voice across AI platforms, AI referral traffic and conversion rates, crawl activity from AI bots, and the sentiment of AI-generated brand mentions.


Our previous post on measuring AEO laid out the playbook: the core metrics, the tools, the report structure, the honest limitations. This post goes further. We have been living inside this measurement problem with clients for the past several months, and there is more to say about the current state of the tools, what the conversion data actually looks like, where the real attribution gaps are, and what we think should exist but does not.

This is not a recap. If you have not read the first post, start there. This one assumes you already know the basics and want to go deeper.


What Does the Conversion Data Actually Say?

The headline numbers are eye-catching: AI-referred visitors convert at 3 to 5 times the rate of organic search visitors. Multiple independent studies using different methodologies land in this range. Semrush found a 4.4x conversion rate advantage. Microsoft Clarity, analyzing over 1,200 publisher sites, found AI referrals converting at up to 3x the rate of traditional channels. Ahrefs reported that 0.5% of their total visitors came from AI but drove 12.1% of signups, which works out to roughly 23x.

Those numbers are real, but they need context.

First, the absolute volume is still small. AI referral traffic accounts for roughly 1% of total website traffic across most sites, according to Conductor’s 2026 benchmarks. The conversion premium is striking, but the denominator is tiny. That said, the growth rate is not tiny. Contentsquare measured 623% year-over-year growth in LLM referral traffic. Other analyses put the figure at 527% YoY. A channel growing at that pace does not stay small for long. But right now, you are not replacing your organic funnel with this. You are adding a high-quality channel on top of it.

Second, ChatGPT dominates the referral mix. About 87% of all AI referral traffic comes from ChatGPT specifically. Perplexity, Claude, and Gemini split the rest. This matters because each platform cites content differently and attracts users at different stages of the buying process. Optimizing equally across all AI engines means spending most of your effort on platforms that drive a fraction of the actual traffic.

Third, and this is the one most people miss: not everyone agrees the conversion advantage is real. An Amsive study analyzing first-party data across multiple sites found that the difference between organic and LLM conversion rates was not statistically significant (p = 0.794) when controlled for site-level variability. The average was close (4.87% LLM vs 4.60% organic), but the consistency across sites was not strong enough to call it meaningful.

This does not invalidate the other studies. Different methodologies, different sample compositions, different definitions of conversion. But it does mean you should present the conversion data to clients as promising and directional, not as settled science. The honest version is: AI-referred traffic appears to convert well, the theoretical reasons for that make sense (users arriving via AI recommendation are further down the funnel by default), and the data from multiple sources supports it, but the sample sizes are still small and the variance across sites is real.
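To see why a gap like 4.87% vs 4.60% can fail to reach significance, consider a two-proportion z-test with illustrative visit counts (the sample sizes below are assumptions for the sketch, not Amsive's actual data):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates; returns (z, p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal CDF
    return z, p_value

# ~4.87% of 2,000 AI visits vs ~4.60% of 50,000 organic visits (illustrative numbers)
z, p = two_proportion_z(97, 2000, 2300, 50000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With a few thousand AI-referred visits, a quarter-point difference in conversion rate is statistically indistinguishable from noise. This is the arithmetic behind "promising and directional."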


What Tools Are Available Now?

Since our last post, the AEO measurement category has grown considerably. G2 reports that the Answer Engine Optimization software category grew 2,000% in a single year. Otterly, which we mentioned in the previous post, was named to G2’s 2026 Best Software Awards and has surpassed 20,000 users. The market has more options now, and the distinctions between them are getting clearer.

Here is where things stand across the major categories.

Dedicated AI visibility platforms are the core of the stack. These are the tools purpose-built to track citations across multiple LLMs.

| Tool | Starting Price | LLMs Tracked | Standout Feature |
|------|----------------|--------------|------------------|
| Cairrot | $39/mo | ChatGPT, Perplexity, Claude, Gemini, DeepSeek, Grok | Only tool tracking 5+ LLMs under $100; free API on all plans |
| Otterly.AI | $29/mo | ChatGPT, Perplexity, AI Overviews, AI Mode, Gemini, Copilot | Prompt research tool that surfaces real user queries |
| Scrunch AI | Enterprise | ChatGPT, Claude, Perplexity, Gemini, Meta AI | Data API for CRM and BI integration; 500+ brand clients |
| Profound | Enterprise | Multi-LLM + ChatGPT Shopping | Persona-based journey simulation; e-commerce shopping optimization |

Free tools have gotten better and are a legitimate starting point.

| Tool | What It Measures |
|------|------------------|
| HubSpot AEO Grader | Brand visibility score across ChatGPT, Perplexity, and Gemini. Five dimensions: sentiment, presence quality, recognition, share of voice, market position. |
| Bing Webmaster Tools AI Performance | Page-level citation data for Copilot. The only official first-party citation dashboard from a search engine provider. |
| Microsoft Clarity | AI bot crawl activity (server-side, not JS-dependent) and AI referral traffic segmentation. |
| GA4 Custom Channel Groups | AI referral traffic as its own channel with conversion tracking. Fifteen minutes to set up. |
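That GA4 custom channel group amounts to matching the referrer hostname against a list of AI platform domains. A minimal sketch of the same classification logic in Python (the domain list is illustrative and incomplete; check your own referral reports for the exact hostnames each platform sends):

```python
from urllib.parse import urlparse

# Illustrative AI referrer domains; this list is an assumption, not exhaustive.
AI_REFERRER_DOMAINS = {
    "chatgpt.com", "chat.openai.com", "perplexity.ai",
    "gemini.google.com", "claude.ai", "copilot.microsoft.com",
}

def is_ai_referral(referrer_url: str) -> bool:
    """Return True if the referrer hostname matches a known AI platform domain."""
    host = urlparse(referrer_url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in AI_REFERRER_DOMAINS)

print(is_ai_referral("https://chatgpt.com/"))           # True
print(is_ai_referral("https://www.perplexity.ai/page")) # True
print(is_ai_referral("https://www.google.com/search"))  # False
```

The same regex-style condition works inside GA4's channel group UI; the point is that the whole "channel" is just a referrer-domain match.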

One paid tool worth calling out separately: Contentsquare now offers model-level AI referral attribution on its Growth plans and above, meaning you can see which specific LLM sent a visitor, not just that they came from "AI." If you already use Contentsquare for behavioral analytics, this is worth enabling.

Integrated SEO/AEO platforms have matured since our last post. Semrush’s AI Toolkit now provides visibility tracking and competitive benchmarking alongside traditional keyword data. Ahrefs Brand Radar monitors AI crawler traffic with a free tier. Frase scores content for both Google rankings and AI citations simultaneously and runs audits across eight AI platforms. If you already pay for one of these tools, check whether AEO features have been added since you last looked. Several shipped updates in Q1 2026.

One new entrant worth noting: HubSpot acquired Xfunnel, signaling that AEO data will eventually be a native part of the HubSpot marketing stack. For HubSpot customers, this likely means AI visibility metrics will appear alongside your existing channel reporting without needing a separate tool. That integration is not fully shipped yet, but the direction is clear.


Where Are the Real Attribution Gaps?

We covered the basics of the attribution problem in the last post. Here is where we have gotten more specific about what is actually missing.

There is no Google Search Console for LLMs. This is the single biggest gap in the entire category. There is no universal, free portal from OpenAI, Anthropic, or Google that shows impression-level query data to site owners. Bing’s AI Performance dashboard covers Copilot, and that is it. You cannot see how many times your brand appeared inside a ChatGPT conversation, what prompts triggered it, or whether the user found the answer helpful.

Every other gap follows from this one.

The dark search funnel is real and probably bigger than you think. Discovered Labs published a useful framework for this: buyers research inside ChatGPT and Perplexity, form vendor shortlists based on AI citations, then arrive at your site via branded or direct search. Your attribution software logs this as organic or direct and misses the moment demand was actually created. The fix they propose (and that we have started using with clients) is a three-layer approach: LLM leading indicators (mention rate, citation rate, share of voice), traffic signals (branded vs non-branded organic growth), and self-reported attribution collected at the point of conversion.

That last piece, self-reported attribution, is more important than most teams treat it. Adding "How did you first hear about us?" to your forms and sales scripts captures pipeline that no analytics tool can see. It is low-tech. It works.
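The free-text answers do need light normalization before they tell you anything. A rough sketch, where the keyword buckets are assumptions to tune against your own responses:

```python
import re
from collections import Counter

# Keyword buckets are illustrative assumptions; adapt to your actual form answers.
BUCKETS = [
    ("ai_assistant", r"chatgpt|gpt|perplexity|claude|gemini|copilot|\bai\b"),
    ("search",       r"google|bing|search"),
    ("social",       r"linkedin|twitter|\bx\b|reddit|youtube"),
    ("referral",     r"friend|colleague|recommend"),
]

def bucket_response(answer: str) -> str:
    """Map one 'How did you first hear about us?' answer to a channel bucket."""
    text = answer.lower()
    for name, pattern in BUCKETS:
        if re.search(pattern, text):
            return name
    return "other"

responses = ["Saw you in a ChatGPT answer", "Googled it", "A colleague mentioned you"]
print(Counter(bucket_response(r) for r in responses))
```

Even a crude tally like this surfaces dark-funnel demand that GA4 logs as direct or organic.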

You cannot measure why. Current tools tell you whether you were cited. They do not tell you why you were cited instead of a competitor, or why you were not cited when you should have been. Was it your schema markup? Your domain authority? The recency of your content? The structure of your FAQ? The fact that three other authoritative sites linked to your page? Nobody knows with certainty. This is the "citation causality" problem, and solving it would be the most valuable thing anyone could build in this space.

Cross-model inconsistency is invisible. Citation rates, sentiment, and brand mention patterns vary enormously across AI platforms. One analysis by Superlines found variation of up to 615x across different AI engines for the same queries, which is an extreme case, but even the typical variance is significant. A brand that dominates Perplexity might be invisible in ChatGPT. There is no unified framework for understanding why models diverge in their citation behavior, and most tools present an aggregate view that hides platform-level differences.


What Would a Dream AEO Measurement Platform Look Like?

We think about this a lot, partly because we want it to exist and partly because the gap between what is needed and what is available is large enough to be interesting.

Here is what we would build if we were building it.

A query-level citation feed. Not "you were cited 47 times this month" but a feed showing which prompts triggered your brand, across which models, in what context (primary recommendation, alternative, passing mention), and which specific page on your site was referenced. This is the Google Search Console equivalent for LLMs and it is the foundation everything else builds on.
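If such a feed existed, each record might look something like this. The schema is purely hypothetical; no vendor ships this today:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CitationEvent:
    """One hypothetical record in a query-level citation feed."""
    prompt: str          # the user query that triggered the citation
    model: str           # e.g. "chatgpt", "perplexity", "gemini"
    context: str         # "primary_recommendation" | "alternative" | "passing_mention"
    cited_url: str       # the specific page on your site that was referenced
    observed_at: datetime

event = CitationEvent(
    prompt="best AEO monitoring tools",
    model="chatgpt",
    context="primary_recommendation",
    cited_url="https://example.com/aeo-tools",
    observed_at=datetime(2026, 3, 1, 12, 0),
)
print(event.model, event.context)
```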

Citation causality testing. A system that connects content changes to citation outcomes. You restructure an FAQ section, add a comparison table, or update your schema. Two weeks later, the tool shows whether your citation rate changed for related queries and correlates the specific changes to the shift. This would transform AEO from "make improvements and hope" into an actual optimization loop with feedback. Nobody has this. Whoever builds it first will own the category.

Dark funnel attribution modeling. Connect LLM citation data with CRM pipeline data. When a lead comes in through branded search, the system checks whether that company or contact was exposed to AI responses mentioning your brand in the preceding 30 days. Pair this with survey data and self-reported attribution and you have a probabilistic model that gives finance teams something to evaluate. Not perfect. Better than nothing, which is what most teams have today.
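A toy version of that exposure check, assuming you keep a log of citation events keyed by company domain (every name and date here is hypothetical):

```python
from datetime import date, timedelta

# Hypothetical citation log: (company_domain, date the brand appeared in an AI answer)
citation_log = [
    ("acme.com", date(2026, 3, 1)),
    ("acme.com", date(2026, 3, 10)),
    ("globex.com", date(2025, 11, 2)),
]

def was_exposed(company_domain: str, lead_date: date, window_days: int = 30) -> bool:
    """True if the company appeared in logged AI citations within the lookback window."""
    cutoff = lead_date - timedelta(days=window_days)
    return any(
        domain == company_domain and cutoff <= seen <= lead_date
        for domain, seen in citation_log
    )

print(was_exposed("acme.com", date(2026, 3, 15)))    # cited 5 days earlier
print(was_exposed("globex.com", date(2026, 3, 15)))  # exposure too old to count
```

Joined against CRM lead data, a flag like this is the raw input to the probabilistic model described above.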

Entity gap analysis with content briefs. Show not just that competitors are getting cited where you are not, but what specific entity relationships, content structures, and authority signals differentiate their cited content from yours. Then generate a content brief: "To compete for this query cluster, you need a page that defines X, includes a comparison table covering Y and Z, and cites primary research on this topic." Turn competitive intelligence into a work order.

Brand accuracy monitoring. Real-time alerts for when AI models say something incorrect about your brand. Wrong pricing, discontinued products, outdated leadership, inaccurate founding story. Misinformation in AI responses compounds because other models may use those responses as training signals. Early detection matters.

We are not building this (we are a consultancy, not a SaaS company), but if someone reading this is, we would like to be early customers.


A Practical Note on Where This Leaves You

The gap between the value of AI visibility and our ability to measure that value is large. Only about 16% of brands systematically track AI search performance at all, per McKinsey. The tools are young. The attribution is imperfect. The data is incomplete.

None of that is a reason to wait.

Set up GA4 AI referral tracking this week. Run a baseline manual audit this month. Pick one dedicated monitoring platform (Otterly and Cairrot are both solid entry points at under $40/month). Add self-reported attribution to your conversion forms. Measure what you can, acknowledge what you cannot, and build the institutional knowledge that comes from paying attention to a channel while it is still small enough to learn.

The brands doing this now are building two advantages that compound. The first is AI visibility itself, which appears to reward consistency and early presence. The second is measurement literacy: understanding what the data means, what it does not mean, and how to make decisions from imperfect signals. That second advantage is the one most people underestimate.


A Note on This Post

This is the third post in our AEO series, following SEO Is Dead. Long Live AEO. (the strategy) and How Do You Measure AEO? (the measurement playbook). Like those posts, this one is structured to follow the AEO principles it describes: question-based headings, answer-first paragraphs, structured tables with specific tool names and pricing, and citable definitions at the top. Specific data points are included with their sources because AI agents prefer content that makes their answers more credible. The limitations are stated plainly because that is how we write about everything.


Arrow & Bell helps technology companies build and scale AI-first digital operations, including AEO strategy and measurement.


Related reading: SEO Is Dead. Long Live AEO. | How Do You Measure AEO? | What Does AI See When It Looks at Your Website?