How to Audit Your Brand's AI Visibility: The Complete 50-Query Framework
Most brands have no idea where they stand in AI recommendations. This is the complete audit framework — 50 queries across ChatGPT, Claude, Perplexity, and Gemini, a scoring rubric that produces a real Visibility Score, and a 90-day action plan to close your competitive gap.
A brand manager at a mid-market SaaS company recently told me something that stuck with me.
"We rank number one on Google for our three biggest keywords. We show up in every review roundup. Our NPS is 68. And then I asked ChatGPT for the best tools in our category — and it named four competitors and never mentioned us once. That was six months ago. I still don't know what to do about it."
She's not alone. The shift to AI-first research is fast, it's uneven, and it rewards brands that move quickly to understand where they stand. The problem is that most brand teams have no systematic way to measure AI visibility. They run one query, get one result, and either panic or dismiss it.
What they need is an audit — a structured, repeatable process that produces a real baseline score, a competitive map, and a clear set of priorities. That's what this guide builds.
By the end, you'll have: a 50-query framework covering all four major AI platforms, a scoring rubric that produces a single 0–100 Visibility Score, step-by-step audit protocols for ChatGPT, Claude, Perplexity, and Gemini, a competitive benchmarking methodology, and a 90-day action plan template. Run this once and you'll know exactly where you stand. Run it quarterly and you'll know whether your GEO investments are working.
Why You Need a Baseline (Before You Do Anything Else)
Before you build a single piece of content for GEO, optimize a single page, or pursue a single citation, you need to know your starting point. Without a baseline, you're flying blind — you can't tell whether your efforts are working, where your biggest gaps are, or which platform deserves your attention first.
A proper baseline gives you three things. First, a quantified score — not "we're doing okay" but "our Visibility Score is 31/100 on ChatGPT and 58/100 on Perplexity." Second, a competitive position — not "we're behind our competitors" but "Competitor A scores 72/100 and we score 31/100 on the same query set, a 41-point gap." Third, a prioritized action list — not "we need to do more content" but "we're missing from nine category recommendation queries where Competitor B is named first, and those nine queries are where we should focus next quarter."
Every GEO decision that comes after this audit should be traceable back to specific gaps and specific scores. That's what turns brand building from guesswork into a measurable program.
The 50-Query Framework
Fifty queries sounds like a lot. It isn't — not when you're running across four platforms and trying to build a statistically meaningful baseline. The queries break into five categories of ten. Each category tests a different dimension of your AI visibility.
The 50-query total is not arbitrary. Below 30 queries, variance from individual AI responses creates too much noise. Above 80, you hit diminishing returns. Fifty gives you meaningful coverage without the audit becoming a multi-day project.
Category 1: Brand Awareness (sample queries)

- "Tell me about [Your Brand]"
- "Is [Your Brand] legit?"
- "[Your Brand] vs [Competitor]"
- "[Your Brand] reviews"
- "What do people think of [Your Brand]?"

These tell you your base entity recognition — does the AI know you exist at all?
Category 2: Category Recommendation (sample queries)

- "Best [category] tools in 2026"
- "Top [category] platforms for [use case]"
- "What should I use for [job to be done]?"
- "Recommend a [category] solution"
- "Which [category] tool is worth paying for?"

This is where most purchase decisions happen. Being absent here is invisible lost revenue.
Category 3: Problem-Based Queries (sample queries)

- "How do I solve [pain point your product addresses]?"
- "I'm struggling with [problem] — what helps?"
- "What's the fastest way to [desired outcome]?"
- "Tools to help with [specific challenge]"
- "[Problem] — what are my options?"

These reach buyers deep in problem mode — the highest purchase intent, and often blind to specific brand names.
Category 4: Comparison and Evaluation (sample queries)

- "[Your Brand] vs [Competitor A] vs [Competitor B]"
- "Pros and cons of [Your Brand]"
- "Who uses [Your Brand]?"
- "Is [Your Brand] good for [specific use case]?"
- "Alternatives to [Competitor] — what else is out there?"

Evaluation-stage queries. Positive framing here moves buyers from consideration to decision.
Category 5: Authority and Expertise (sample queries)

- "Who are the experts in [your niche]?"
- "Which [category] brands do professionals use?"
- "Best [category] for enterprise teams"
- "What does [industry] think about [relevant topic]?"
- "Who should I follow to learn about [your topic area]?"

These reveal your authority positioning — whether AI sees you as a credible voice, not just a product.
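Once you've chosen your terms, the query set is mechanical to generate. Here is a minimal sketch that expands a few of the templates above with brand-specific values; the brand and category names are illustrative, and the template list is a sample you would extend to the full fifty:

```python
# Sketch: expand audit query templates with your brand/category terms.
# Templates are a sample drawn from the five categories above.
TEMPLATES = [
    "Tell me about {brand}",
    "Is {brand} legit?",
    "{brand} vs {competitor}",
    "Best {category} tools in 2026",
    "Top {category} platforms for {use_case}",
    "How do I solve {pain_point}?",
    "{brand} vs {competitor} vs {competitor_b}",
    "Who are the experts in {niche}?",
]

def build_queries(terms: dict) -> list[str]:
    """Fill every template with the audit's brand and category terms."""
    return [t.format(**terms) for t in TEMPLATES]

queries = build_queries({
    "brand": "AcmeCRM",            # illustrative brand name
    "competitor": "RivalCRM",
    "competitor_b": "OtherCRM",
    "category": "CRM",
    "use_case": "small sales teams",
    "pain_point": "losing track of leads",
    "niche": "sales automation",
})
for q in queries:
    print(q)
```

Keeping the templates in one place means every quarterly re-run uses exactly the same wording, which is what makes scores comparable over time.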
Every single audit query must be run in a fresh, logged-out or incognito session. AI platforms personalize responses based on your prior conversation history, your account profile, and your geographic location. A query you run while logged in may return a completely different result than the same query run by a potential customer. Your audit needs to reflect what a stranger sees — not what the algorithm shows you because it knows you've been researching this topic for weeks. Set up a dedicated audit profile once and use it every time.
Scoring Rubric: The Visibility Score Calculator
Every query gets scored on a 0–5 scale. Across 50 queries, your maximum raw score is 250 per platform. We normalize that to 0–100. Here's the scoring table:
| Score | Condition | Notes |
|---|---|---|
| 0 | Brand not mentioned at all | Complete invisibility on this query |
| 1 | Brand mentioned as a footnote or afterthought | "There are others like [Brand]" — low weight |
| 2 | Brand mentioned in a list (3rd position or lower) | Presence, but not leading |
| 3 | Brand mentioned in a list (1st or 2nd position) | Strong placement baseline |
| +1 bonus | Mention includes a direct link or citation | Platform trusts your source material |
| +1 bonus | Mention includes specific positive language | "highly rated", "trusted by", "widely used" |
| −1 penalty | Mention includes negative or warning language | "some users report issues", "mixed reviews" |
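The rubric translates directly into a small calculator. This is a minimal sketch: the clamping to the 0–5 range and the 250-point normalization follow the table above, but the function and field names are my own:

```python
def query_score(base: int, has_citation: bool = False,
                positive_language: bool = False,
                negative_language: bool = False) -> int:
    """Score one query per the rubric: base 0-3, +1 citation,
    +1 positive language, -1 negative language, clamped to 0-5."""
    s = base + has_citation + positive_language - negative_language
    return max(0, min(5, s))

def visibility_score(query_scores: list[int]) -> float:
    """Normalize raw points (max 5 x 50 = 250) to a 0-100 Visibility Score."""
    return round(sum(query_scores) / (5 * 50) * 100, 1)

# Example: 50 queries -- 30 misses, 10 weak list placements,
# 10 top placements with a citation and positive language.
scores = ([query_score(0)] * 30
          + [query_score(2)] * 10
          + [query_score(3, has_citation=True, positive_language=True)] * 10)
print(visibility_score(scores))  # 28.0
```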
Benchmark tiers:
Platform-by-Platform Audit Protocols
Each platform needs to be audited differently. The mechanics of how they retrieve and present information are different enough that copy-pasting a single protocol across all four will produce misleading results. Here's the exact step-by-step for each.
ChatGPT

1. Open a new chat — never reuse conversations for audit queries (memory and context contaminate results)
2. Use the GPT-4o model — this is what the majority of paying users interact with
3. Disable memory if you have it enabled: Settings → Personalization → Memory → Off
4. Paste your query exactly. Do not rephrase or add context.
5. Record: brand mentioned (Y/N), position in list, exact phrasing used, any citations or links provided
6. Close and open a new chat before the next query
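The record-keeping in step 5 works fine in a spreadsheet, but a simple append-only CSV log keeps every run in one comparable file. A sketch, with field names of my own choosing:

```python
import csv
import os
from dataclasses import dataclass, asdict, fields

@dataclass
class AuditRecord:
    query: str        # the exact query text
    platform: str     # ChatGPT / Claude / Perplexity / Gemini
    mentioned: bool   # brand mentioned (Y/N)
    position: int     # position in the list, 0 if not mentioned
    phrasing: str     # exact phrasing the model used
    citations: str    # links or citations, semicolon-separated

def append_record(path: str, record: AuditRecord) -> None:
    """Append one audit observation; write a header row on first use."""
    header = [f.name for f in fields(AuditRecord)]
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=header)
        if new_file:
            writer.writeheader()
        writer.writerow(asdict(record))

append_record("audit_log.csv", AuditRecord(
    query="Best CRM tools in 2026", platform="ChatGPT",
    mentioned=True, position=3, phrasing="also worth a look",
    citations="https://example.com/review",
))
```

The same log format works for all four platforms, which makes the later gap analysis a straightforward filter-and-compare.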
Claude

1. Use claude.ai — new conversation for each query, no Projects context
2. Use the default model (Claude 3.5 Sonnet or whatever is current) — do not select specialized modes
3. Claude does not pull live web data by default — responses reflect training data, making it a pure entity-recognition test
4. Claude is notably more conservative with brand mentions than ChatGPT — a mention here carries high authority weight
5. Note whether Claude qualifies its recommendation (e.g., "based on my training data") — this language reveals confidence level
6. Pay attention to which sources Claude would cite if it could — often mentioned in passing as "according to [publication]"
Perplexity

1. Use perplexity.ai — default search mode (not Focus modes like Academic or YouTube)
2. Perplexity cites sources in real time — this is the most citation-rich audit you will run
3. For each response, record not just whether you're mentioned, but which sources Perplexity cites that mention you
4. If you're not cited, note which competitor sources ARE cited — those are your highest-priority citation targets
5. Run queries twice at different times of day — Perplexity's live search means results can vary by news cycle
6. Check whether your own website appears as a source — direct citation is the strongest possible signal
Gemini

1. Use gemini.google.com — standard Gemini 1.5 Pro (not Gemini Advanced), for consistency across audits
2. Gemini integrates Google Search signals — brands with strong Google presence have a natural advantage here
3. Note whether Gemini links out to sources — these citations reveal the Google index signals driving its answers
4. Gemini tends to favor review aggregators (G2, Capterra, Trustpilot) — audit your presence on these platforms separately
5. If Gemini mentions you with a Google Business Profile card, your local/entity signals are strong — note this
6. Gemini is more likely than Claude to hedge with "I'd recommend checking recent reviews" — factor this into sentiment scoring
Competitive Benchmark
Your Visibility Score only becomes actionable when you can compare it to someone else's. A score of 45/100 might mean you're the category leader in an emerging space, or it might mean you're 40 points behind every major competitor. Without the benchmark, you don't know which.
Run the identical 50-query set for your top 3 competitors. Score them using the same rubric. Then build a gap analysis matrix: for every query where a competitor is mentioned and you are not, that is a direct, prioritized content and citation opportunity.
Pay particular attention to the sources cited in competitor mentions. If ChatGPT says "Competitor A is widely recommended — sources like TechCrunch, G2, and r/SaaS consistently praise their [feature]" — those three platforms are exactly where you need to build your own presence. The AI is showing you its evidence. Use it.
| Query | Platform | Your Brand | Comp A | Comp B | Gap |
|---|---|---|---|---|---|
| "Best [category] tool" | ChatGPT | 2/5 | 5/5 | 3/5 | −3 |
| "Best [category] tool" | Perplexity | 4/5 | 5/5 | 4/5 | −1 |
| "Best [category] tool" | Claude | 0/5 | 4/5 | 2/5 | −4 |
| "[Problem] solution" | ChatGPT | 3/5 | 3/5 | 1/5 | 0 |
| "[Problem] solution" | Gemini | 1/5 | 5/5 | 3/5 | −4 |
Example gap analysis — negative gap scores are your highest-priority improvement targets.
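The gap column in the matrix is your score minus the best competitor score on the same query and platform. A sketch of the computation, using the example rows above:

```python
def gap(your_score: int, competitor_scores: list[int]) -> int:
    """Gap = your score minus the best competitor on the same query/platform."""
    return your_score - max(competitor_scores)

# (query, platform, your score, [Comp A, Comp B]) -- from the example matrix
rows = [
    ("Best [category] tool", "ChatGPT",    2, [5, 3]),
    ("Best [category] tool", "Perplexity", 4, [5, 4]),
    ("Best [category] tool", "Claude",     0, [4, 2]),
    ("[Problem] solution",   "ChatGPT",    3, [3, 1]),
    ("[Problem] solution",   "Gemini",     1, [5, 3]),
]

# Most negative gaps first: these are the highest-priority targets
prioritized = sorted(rows, key=lambda r: gap(r[2], r[3]))
for query, platform, yours, comps in prioritized:
    print(f"{query} / {platform}: gap {gap(yours, comps)}")
```

Sorting by the most negative gap surfaces the Claude and Gemini rows first, matching the "−4" rows flagged in the example matrix.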
Tracking Over Time: The Six Metrics That Matter
A single audit is a snapshot. The value compounds when you run it consistently and track the delta. Here are the six metrics to log every time you run your audit:
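Whatever metrics you settle on, the core trend signal is the per-platform delta between two audit runs. A minimal sketch (the platform scores below are illustrative, not from a real audit):

```python
def platform_deltas(baseline: dict, current: dict) -> dict:
    """Per-platform change in Visibility Score between two audit runs."""
    return {p: round(current[p] - baseline[p], 1) for p in baseline}

# Illustrative quarterly scores per platform (0-100 Visibility Scores)
q1 = {"ChatGPT": 31.0, "Claude": 22.0, "Perplexity": 58.0, "Gemini": 40.0}
q2 = {"ChatGPT": 38.5, "Claude": 22.0, "Perplexity": 63.0, "Gemini": 37.5}

print(platform_deltas(q1, q2))
# {'ChatGPT': 7.5, 'Claude': 0.0, 'Perplexity': 5.0, 'Gemini': -2.5}
```

A negative delta, like Gemini's here, is the early-warning signal this section is about: it tells you where visibility is decaying before it shows up in pipeline numbers.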
90-Day Action Plan
Once you have your baseline scores and competitive gap matrix, the 90-day plan almost writes itself. Prioritize actions that close your largest gaps on the platforms with the most purchase-intent traffic first.
Month 1: Baseline and Entity Foundations

- Run the full 50-query audit and record your baseline scores
- Identify the 10 queries with the largest competitive gap (you not mentioned, competitor mentioned)
- Audit your brand entity completeness: Wikipedia, Google Knowledge Panel, schema markup, Wikidata
- Fix all structural gaps in schema markup — add Organization, Product, and BreadcrumbList at minimum
- Submit or update your Wikipedia page if one does not exist or is thin
Month 2: Citations and Content

- For each of your 10 priority queries, identify which sources competitors are cited from
- Prioritize getting coverage on those exact sources: Reddit threads, review platforms, industry publications
- Publish 4–6 authoritative pieces targeting your highest-gap query categories
- Begin systematic review acquisition on G2, Capterra, Trustpilot, or category-appropriate platforms
- Pitch 2–3 original data stories or contributed articles to industry publications that appear in Perplexity citations
Month 3: Re-Audit and Iterate

- Re-run the full 50-query audit and compare scores to your Month 1 baseline
- Calculate delta scores per platform and per query category
- Double down on the platforms and query types showing the most improvement
- Identify any new queries where competitors have grown and you haven't — add to priority list
- Schedule quarterly audit cadence and assign ownership — GEO without tracking decays fast
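The schema-markup fix above (Organization, at minimum, plus Product and BreadcrumbList) can start from a small JSON-LD block in your site's `<head>`. This is a sketch with placeholder values, not a complete implementation; validate it against schema.org and Google's Rich Results Test before deploying:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "AcmeCRM",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q000000",
    "https://www.linkedin.com/company/example"
  ]
}
```

The `sameAs` links are what tie your site to your Wikidata and other entity records, which is exactly the entity-completeness work this month targets.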
When you have more gap-closing opportunities than bandwidth, use this tiebreaker: prioritize queries with the highest buyer intent (comparison and category recommendation queries beat problem-based queries beat brand queries), then prioritize the platforms with the largest audience reach for your category. For most B2B brands that order is: Perplexity first (high research intent, live citations), ChatGPT second (largest user base), Claude third (most conservative, so wins compound), Gemini fourth. For consumer brands, Gemini often moves higher because of its Google Shopping and local integration.
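That tiebreaker reads as a two-level sort: intent tier first, platform reach second. A sketch for a B2B brand, using the orderings from the paragraph above (the tier and rank numbers are my own encoding of it):

```python
# Intent tiers: lower number = higher buyer intent.
# Comparison and category-recommendation queries tie at the top tier.
INTENT = {"comparison": 0, "category": 0, "problem": 1, "brand": 2}

# B2B platform priority from the text; consumer brands may move Gemini up.
PLATFORM = {"Perplexity": 0, "ChatGPT": 1, "Claude": 2, "Gemini": 3}

def prioritize(opportunities: list[dict]) -> list[dict]:
    """Sort gap-closing opportunities: buyer intent first, then platform reach."""
    return sorted(opportunities,
                  key=lambda o: (INTENT[o["intent"]], PLATFORM[o["platform"]]))

backlog = [
    {"query": "[Problem] solution",      "intent": "problem",    "platform": "ChatGPT"},
    {"query": "Best [category] tool",    "intent": "category",   "platform": "Gemini"},
    {"query": "[Brand] vs [Competitor]", "intent": "comparison", "platform": "Perplexity"},
    {"query": "Tell me about [Brand]",   "intent": "brand",      "platform": "ChatGPT"},
]
for o in prioritize(backlog):
    print(o["query"], "on", o["platform"])
```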
Recommended Tools
- Best AI Rank Tracker Tools in 2026 — automate your audit with the right tool
- AI Search Monitoring Guide — set up ongoing monitoring after your audit

Run This Audit Automatically with Airo
Airo runs your 50-query audit across all four AI platforms every week, scores every mention automatically, tracks your competitive gap over time, and surfaces the specific queries and platforms where you're falling behind — no spreadsheets, no manual testing, no guesswork.
Start Your Free Audit