The tool trap every audit guide falls into
Open any technical SEO audit guide and you hit the same recommendation within the first 200 words: Screaming Frog. It is a good crawler. Its free tier caps at 500 URLs — and that limit counts every image, stylesheet, script, and page the crawler discovers. A modest blog with 300 pages and 400 images will exhaust the free limit before finishing the crawl.
The solution every guide recommends: buy a Screaming Frog license (£199/year), or upgrade to Semrush ($120+/month), or use Ahrefs ($129+/month). That is not a free audit — that is an expensive one with a free preamble.
There are three things no current top-ranking technical SEO audit guide delivers together:
- A complete audit with no URL cap and no subscription
- A GEO/AI search visibility audit integrated into the standard technical workflow — not treated as a separate discipline
- Log file analysis without a $500+/month enterprise tool
This guide covers all three. The primary crawler used throughout is Vexifa SEO — a free Windows desktop application with no crawl limit. Every other tool referenced is free. No subscription is required at any step.
What a technical SEO audit actually covers
Before crawling a single page, define the scope. A technical SEO audit covers the infrastructure layer — the signals that determine whether search engines and AI crawlers can find, interpret, and rank your content. It does not cover:
- Backlink analysis (a separate audit using tools like Ahrefs Webmaster Tools)
- Content quality or keyword targeting (a content audit)
- Conversion rate or UX issues (a CRO audit)
A technical SEO audit covers seven areas:
| # | Area | What you're checking | Primary tool |
|---|---|---|---|
| 1 | Crawlability | Can crawlers reach your pages? Redirect chains, robots.txt, broken links | Vexifa SEO |
| 2 | Indexation | Are the right pages indexed? Noindex issues, canonical errors, sitemap accuracy | GSC + Vexifa SEO |
| 3 | Core Web Vitals | LCP, INP, CLS thresholds — field data and lab data | GSC + PageSpeed Insights |
| 4 | Structured data | Schema implementation, rich result eligibility, errors | Rich Results Test + Vexifa SEO |
| 5 | AI/GEO visibility | AI crawler access, citation-ready content, GEO audit | Vexifa SEO (GEO audit) |
| 6 | Site architecture | Crawl depth, orphaned pages, internal link distribution | Vexifa SEO |
| 7 | Log file analysis | Actual Googlebot and AI bot behaviour, crawl distribution | GoAccess (free) |
Tools you'll need (all free)
- Vexifa SEO: Windows desktop app, free, unlimited crawl, AI audit, GEO visibility audit.
- Google Search Console — free, first-party indexation and CWV data. Requires site ownership verification.
- Google PageSpeed Insights — free, Core Web Vitals lab data with field data overlay.
- Google Rich Results Test — free structured data validator at search.google.com/test/rich-results
- GoAccess — free open-source log analyser for Step 7. Optional but recommended for larger sites.
- Chrome DevTools — built into Chrome, no install needed. Used for rendering and JS audit.
Total cost: $0. Time to complete a full audit for a typical small business site: approximately 4-6 hours for the first pass, with subsequent quarterly audits taking 2-3 hours once you have a baseline.
Crawl your entire site — no URL cap
The crawl is the foundation. Every finding in steps 2, 4, and 6 comes from crawl data. If your crawler stops at 500 URLs, your audit has blind spots.
Set up the crawl in Vexifa SEO
Open Vexifa SEO, enter your root domain (e.g. https://example.com), and start a full site crawl. Vexifa crawls all pages, images, stylesheets, scripts, and linked resources, logging the HTTP status code, response time, redirect chain, title, description, heading structure, canonical tag, noindex directive, and internal/external link data for each URL. There is no page limit.
For a site with 5,000 pages, a full crawl typically takes 15-30 minutes depending on server response times. Vexifa respects your server's crawl rate — you can configure the crawl delay in settings to avoid putting load on production during business hours.
What to look for in the crawl report
HTTP status codes. Export the full URL list and filter by status. Every 4xx response is a broken page. Every 3xx is a redirect — follow the chain to identify loops (A → B → A) and long chains (A → B → C → D) that slow crawling and dilute link equity. Best practice: no more than one redirect hop between any two URLs.
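Redirect chains. A redirect chain occurs when a URL redirects to a URL that also redirects. Googlebot follows only a limited number of hops (Google's documentation cites up to 10), and every extra hop adds latency, so long chains cause pages to be crawled later or not at all. Fix: update all internal links pointing to the original URL to point directly to the final destination, and point the redirect itself straight at that destination.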
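The chain and loop checks can be sketched in a few lines of Python. This is a minimal illustration, assuming you have exported the crawl's redirects as a simple source-to-target mapping; the `redirects` dictionary and both function names are hypothetical and not part of any tool's export format.

```python
def resolve_chain(start, redirects, max_hops=10):
    """Follow redirects from `start`. Returns (chain, is_loop)."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects and len(chain) <= max_hops:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # we came back to a URL already visited
        seen.add(current)
    return chain, False


def audit_redirects(redirects):
    """Label every redirecting URL as 'ok' (one hop), 'chain', or 'loop'."""
    report = {}
    for url in redirects:
        chain, is_loop = resolve_chain(url, redirects)
        hops = len(chain) - 1
        if is_loop:
            status = "loop"
        elif hops > 1:
            status = "chain"
        else:
            status = "ok"
        report[url] = {
            "status": status,
            "hops": hops,
            "final": None if is_loop else chain[-1],
        }
    return report


# Hypothetical export: source URL -> redirect target
redirects = {
    "/a": "/b", "/b": "/c", "/c": "/d",  # long chain
    "/x": "/y", "/y": "/x",              # loop
    "/old": "/new",                      # single hop
}
report = audit_redirects(redirects)
```

Every entry with status `chain` gives you the `final` URL that internal links should point to instead.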
Orphaned pages. These are pages that exist on the server and are indexed, but have no internal links pointing to them. Googlebot discovers them through the sitemap but cannot reach them through normal navigation — which means they receive no internal link equity. Fix: add relevant internal links from related content, or consolidate if the pages have no strategic value.
Crawl depth. Filter your crawl report by click depth from the homepage. Any page that requires more than 4 clicks to reach from the homepage is a crawl depth problem. Important commercial pages (product pages, key service pages) should be reachable in 2-3 clicks. Vexifa SEO's link graph visualisation makes structural depth problems immediately visible.
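If you prefer to verify depth and orphan findings yourself, a breadth-first search over the internal link graph reproduces both. The sketch below assumes a hypothetical export where `links` maps each page to the pages it links to, and `sitemap_urls` is the set of URLs in your sitemap; neither name comes from a specific tool.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from `home` to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to
links = {
    "/": ["/services", "/blog"],
    "/services": ["/services/audit"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": ["/blog/post-3"],
    "/blog/post-3": ["/blog/post-4"],
}
sitemap_urls = {"/", "/services", "/blog", "/blog/post-4", "/landing/old-promo"}

depths = click_depths(links, "/")
too_deep = sorted(url for url, depth in depths.items() if depth > 4)
# In the sitemap but unreachable by following links from the homepage
orphans = sorted(sitemap_urls - set(depths))
```

Here `/blog/post-4` sits 5 clicks deep and `/landing/old-promo` is never reached through links, so both would go on the fix list.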
Indexation audit: what Google has and what it's missing
Crawlability tells you whether bots can reach your pages. Indexation tells you whether Google has actually indexed them and whether the right pages are in the index.
GSC Page Indexing report
In Google Search Console, open the Indexing → Pages report. You will see two categories that require different responses:
- Crawled — currently not indexed: Google reached the page but decided not to index it. Causes: thin content, duplicate content, low PageRank signal, soft 404. Fix: improve content quality, consolidate thin pages, or add internal links from higher-authority pages.
- Discovered — currently not indexed: Google knows the URL exists (via sitemap or link) but has not crawled it yet. Often caused by crawl budget constraints or a deep crawl depth. Fix: add internal links from the homepage or high-authority pages, and ensure the URL is in your sitemap.
Noindex audit
The most expensive technical SEO error is accidentally noindexing a revenue page. Cross-reference your crawl data (which pages have a noindex directive) against your GSC indexed pages list. Any page that should be indexed but has <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex header needs immediate attention.
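To spot-check a single page outside the crawler, both noindex mechanisms can be detected with the Python standard library. This is a simplified sketch: it assumes you already have the page's HTML and response headers in hand, and it ignores user-agent-scoped header values such as `googlebot: noindex`. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def is_noindexed(html, headers):
    """True if the meta tag or the X-Robots-Tag header carries noindex (or none)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    blocking = {"noindex", "none"}  # "none" is shorthand for noindex, nofollow
    return bool(blocking & set(parser.directives + header_directives))
```

Run it against the templates of your revenue pages first, since those are where an accidental noindex costs the most.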
XML sitemap audit
Your sitemap should contain only URLs that return a 200 response and do not have a noindex directive. Submit your sitemap in GSC and check the coverage report for errors. Common issues:
- Redirected URLs in the sitemap (list final destinations only)
- Noindexed pages in the sitemap (contradictory signal)
- 404 pages listed in the sitemap (broken after a migration)
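The three sitemap issues above can be checked by joining the sitemap against crawl data. The sketch below assumes a standard `urlset` sitemap (not a sitemap index) and a hypothetical `crawl` dictionary keyed by URL with `status` and `noindex` fields; adapt the field names to whatever your crawler exports.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def audit_sitemap(xml_text, crawl):
    """Map each problematic sitemap URL to the reason it should not be listed."""
    problems = {}
    for url in parse_sitemap(xml_text):
        row = crawl.get(url)
        if row is None:
            problems[url] = "not crawled"
        elif 300 <= row["status"] < 400:
            problems[url] = "redirect"
        elif row["status"] != 200:
            problems[url] = f"status {row['status']}"
        elif row["noindex"]:
            problems[url] = "noindex"
    return problems


sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old</loc></url>
  <url><loc>https://example.com/private</loc></url>
  <url><loc>https://example.com/gone</loc></url>
</urlset>"""

# Hypothetical crawl export
crawl = {
    "https://example.com/": {"status": 200, "noindex": False},
    "https://example.com/old": {"status": 301, "noindex": False},
    "https://example.com/private": {"status": 200, "noindex": True},
    "https://example.com/gone": {"status": 404, "noindex": False},
}
problems = audit_sitemap(sitemap_xml, crawl)
```

An empty result means every listed URL returned 200 and is indexable.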
Canonical tag audit
Vexifa SEO surfaces canonical tag data in the crawl report. Check for:
- Missing canonicals — especially on paginated pages, category pages, and faceted navigation URLs
- Canonical pointing to a different domain — legitimate for syndicated content but a bug if unintentional
- Canonical pointing to a redirected URL — the canonical should always point to the final destination URL
- Soft 404s — pages that return a 200 but display "no results found" content; Google treats these like 404s after enough signals
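The first three canonical checks above lend themselves to a small classifier. This is a sketch under assumptions: `status_by_url` is a hypothetical lookup of HTTP status codes from your crawl, and cross-domain canonicals are flagged for review rather than treated as errors, since they can be legitimate. Soft 404 detection needs content inspection and is not covered here.

```python
from urllib.parse import urlsplit


def canonical_issue(url, canonical, status_by_url):
    """Return a short issue label for a page's canonical, or None if it looks fine."""
    if not canonical:
        return "missing"
    if urlsplit(canonical).netloc != urlsplit(url).netloc:
        return "cross-domain"  # review: fine for syndication, a bug otherwise
    status = status_by_url.get(canonical)
    if status is not None and 300 <= status < 400:
        return "points to redirect"
    return None


# Hypothetical status codes from the crawl
status_by_url = {
    "https://example.com/shoes": 200,
    "https://example.com/shoes-old": 301,
}
```

For example, `canonical_issue("https://example.com/shoes?color=red", "https://example.com/shoes-old", status_by_url)` returns `"points to redirect"`, which means the canonical should be updated to the final URL.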
Core Web Vitals: the 2026 thresholds
Core Web Vitals are the performance metrics Google uses as a ranking factor in its Page Experience signal. The 2026 thresholds are:
| Metric | What it measures | Good | Needs improvement | Poor |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | How quickly the main content loads | ≤ 2.5s | 2.5–4s | > 4s |
| INP (Interaction to Next Paint) | Responsiveness to user input | ≤ 200ms | 200–500ms | > 500ms |
| CLS (Cumulative Layout Shift) | Visual stability during load | ≤ 0.1 | 0.1–0.25 | > 0.25 |
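When you export metric values for many URLs, a small helper can apply the table consistently. The sketch below assumes values are in seconds for LCP, milliseconds for INP, and unitless for CLS, and that a value exactly on a boundary counts toward the better rating. The `rate` function is illustrative, not part of any Google tool.

```python
# (good upper bound, needs-improvement upper bound) per metric
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless score
}


def rate(metric, value):
    """Classify a Core Web Vitals value as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"
```

For example, `rate("INP", 350)` returns `"needs improvement"` and `rate("CLS", 0.3)` returns `"poor"`.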
Field data vs. lab data
There are two types of CWV data and the distinction matters for diagnosis:
Field data (CrUX) is collected from real Chrome users browsing your site over the past 28 days. This is what Google uses for ranking. Find it in GSC under Experience → Core Web Vitals. If your site has low traffic, CrUX data may be unavailable — in this case, lab data is your proxy.
Lab data is generated by running a synthetic test in a simulated environment. Google PageSpeed Insights shows both. Lab data is useful for diagnosing the cause of a poor metric, even if the numbers differ from field data.
Common CWV failure causes
LCP failures are most commonly caused by: large unoptimised hero images, render-blocking resources (JavaScript or CSS that blocks the main thread before the largest element can render), and slow server response times (TTFB over 600ms). Fix: compress images, use modern formats (WebP/AVIF), add fetchpriority="high" to your LCP element's image tag, and audit third-party scripts.
INP failures are caused by long JavaScript tasks that block the main thread. Fix: defer non-critical JavaScript, split long tasks, reduce third-party script load.
CLS failures are caused by elements that load after the page has rendered and push other elements around. Fix: define explicit width and height on all images and video embeds; avoid inserting DOM elements above existing content after load; use CSS contain for ad containers.
Structured data audit: schema that works for search and AI
Structured data has shifted from an optional enhancement to a foundational requirement in 2026. AI-generated search answers actively use schema markup to identify authoritative, citation-ready content. A page without properly implemented schema is structurally less likely to be cited in AI Overviews or Perplexity answers, regardless of its content quality.
Schema types to audit
Check for the presence and correctness of these schema types on relevant pages:
- Organization: on every page via sitewide include (or at minimum on homepage and about page). Must include name, url, and sameAs links to your verified social profiles.
- Article: on all blog posts and guides. Must include headline, datePublished, dateModified, and author.
- FAQPage: on pages with FAQ sections. Each question and answer is directly usable by AI search systems for cited answers.
- BreadcrumbList: on all non-homepage pages. Improves sitelinks and signals page hierarchy.
- Product and Offer: on all product or pricing pages. Required for rich results in e-commerce.
- HowTo: on step-by-step guide pages like this one.
- SoftwareApplication: on software product pages.
Validation
Test each important page template using Google Rich Results Test. Check for both errors (which prevent rich results) and warnings (which reduce eligibility). Common errors: missing required fields, incorrect property types, schema pointing to a different URL than the page's canonical.
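Before pasting markup into the Rich Results Test, a quick local check can catch missing fields. The sketch below only covers the Organization and Article fields this guide lists as required, and it assumes the JSON-LD is a single top-level object rather than an `@graph` array. It is a pre-check, not a replacement for Google's validator.

```python
import json

# Required fields as listed in this guide, not the full schema.org definitions
REQUIRED = {
    "Organization": ["name", "url", "sameAs"],
    "Article": ["headline", "datePublished", "dateModified", "author"],
}


def missing_fields(jsonld_text):
    """Return the required fields that are absent or empty in a JSON-LD object."""
    data = json.loads(jsonld_text)
    required = REQUIRED.get(data.get("@type"), [])
    return [field for field in required if not data.get(field)]


article = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How to run a technical SEO audit",
    "datePublished": "2026-01-10",
    "author": {"@type": "Person", "name": "Example Author"},
})
```

Here `missing_fields(article)` returns `["dateModified"]`, pointing at the one field to add.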
FAQPage schema is particularly valuable for AI citation eligibility. AI search systems prefer content that provides explicit question-and-answer pairings in structured form. Every page with a FAQ section should have FAQPage JSON-LD — it takes under 10 minutes per page and directly improves your citation surface area in AI-generated answers.
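As a minimal sketch of what that JSON-LD looks like, the snippet below builds FAQPage markup from question-and-answer pairs using the schema.org `FAQPage`, `Question`, and `Answer` types. The helper names are hypothetical, and the sample pair is taken from this guide's own FAQ.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def script_tag(data):
    """Wrap JSON-LD in a script tag, escaping '</' so it cannot close the tag early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


faq = faq_jsonld([
    (
        "How often should I run a technical SEO audit?",
        "Run a full technical SEO audit quarterly.",
    ),
])
tag = script_tag(faq)
```

The answer text in the markup should match the visible answer on the page. Paste the output of `script_tag` into the page head or body and validate it with the Rich Results Test.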
AI crawler access & GEO visibility audit
This is the section missing from every other technical SEO audit guide. Traditional crawlability audits check whether Googlebot can access your pages. In 2026, you also need to check whether AI crawlers can access your pages — and whether your content is structured to be cited by AI-generated answers.
Understanding the three bot tiers
Not all AI bots are the same. There are three distinct categories, each with different implications for your audit:
- Training crawlers (GPTBot, ClaudeBot, Common Crawl's CCBot): these build AI training datasets. Blocking them is reasonable if you do not want your content in future model training data. They do not affect current AI search visibility.
- AI search crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot): these build the indexes that power AI search answers. Blocking these reduces your chances of appearing in AI search results. Google's AI Overviews draw on regular Googlebot crawling; Google-Extended (sometimes miswritten as Googlebot-Extended) is a robots.txt control token for Gemini usage and does not affect inclusion in Google Search.
- User-triggered AI fetchers (ChatGPT-User, Claude-User): these fire when a user asks an AI assistant to browse the web for real-time information. Blocking these means your site cannot be cited when users ask AI tools to research your products or services.
Audit your robots.txt
Open /robots.txt on your domain and check for any rules that block AI crawlers. Two errors are common: an overly broad rule (User-agent: * followed by Disallow: /), and a blanket Disallow: / added for a newer AI crawler without anyone considering whether that crawler feeds AI search results.
Specifically check for blocks on:
- GPTBot: affects OpenAI training (acceptable to block)
- OAI-SearchBot: affects ChatGPT search answers (consider unblocking)
- PerplexityBot: affects Perplexity AI answers (consider unblocking)
- Claude-SearchBot, anthropic-ai: affects Claude web search citations (consider unblocking)
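Reading robots.txt by eye gets error-prone once there are several user-agent groups. The sketch below uses Python's standard `urllib.robotparser` to report which of the agents listed above may fetch a given URL. The robots.txt content is a made-up example, and Python's parser is only an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "anthropic-ai"]


def ai_access(robots_txt, url, agents=AI_AGENTS):
    """Map each user agent to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical robots.txt: blocks a training crawler, and Perplexity by accident
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /admin/
"""
access = ai_access(robots_txt, "https://example.com/blog/seo-audit")
```

In this example GPTBot and PerplexityBot come back blocked while the other agents fall through to the `*` group and are allowed, which makes the unintended PerplexityBot block easy to spot.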
JavaScript rendering audit
AI crawlers are typically less sophisticated than Googlebot at rendering JavaScript. If your key content — product descriptions, article bodies, FAQ sections — is rendered client-side via JavaScript, AI crawlers may be indexing a blank page.
Check this quickly: open Chrome DevTools (F12), open the Command Menu (Ctrl+Shift+P, or Cmd+Shift+P on macOS), type "Disable JavaScript", run the command, and reload your page. If the core content disappears, it is client-side rendered and invisible to most AI crawlers. Re-enable JavaScript from the same menu when you are done. Fix: use server-side rendering (SSR) or static generation for content pages.
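The same check can be scripted for many pages by testing whether key phrases appear in the raw HTML the server returns, before any JavaScript runs. This is a rough sketch: it assumes you have already downloaded the unrendered HTML, and it treats text inside `script` and `style` tags as invisible. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def missing_from_raw_html(html, phrases):
    """Return the phrases that do not appear in the page's unrendered text."""
    parser = VisibleText()
    parser.feed(html)
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]


client_rendered = '<body><div id="root"></div><script>render("Unlimited crawl")</script></body>'
server_rendered = "<body><h1>Unlimited crawl</h1><p>No URL cap.</p></body>"
```

A non-empty result for a phrase you know is on the rendered page suggests that content depends on client-side JavaScript.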
Citation-ready content audit
Beyond crawler access, your content needs to be structured in a way that makes it easy for AI systems to extract and cite. Run the following checks:
- BLUF format (Bottom Line Up Front): does each page lead with a direct answer before expanding? AI systems strongly prefer content that answers the question in the first paragraph.
- Comparison tables: structured comparison tables are frequently cited verbatim in AI Overviews. Ensure key comparison content is in HTML table format, not images.
- FAQ sections with FAQPage schema: see Step 4.
- Explicit answer formatting: questions phrased as headings with direct answers in the paragraph below are more likely to be cited than flowing prose.
GEO visibility audit in Vexifa SEO
Vexifa SEO's GEO/AI search visibility audit automates the above checks and also tests actual AI citation presence — querying AI search systems for your brand name and core topic areas to verify whether your content is surfacing in AI-generated answers. Run this after fixing any robots.txt or rendering issues found manually.
The llms.txt file
An llms.txt file is a proposed convention, loosely analogous to robots.txt, that signals to AI systems which pages contain your primary content and how to interpret your site structure. It is not a ranking factor today, but its adoption is growing among sites that actively want AI citation. If you have structured, authoritative content, adding a basic llms.txt is a low-effort signal worth considering.
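For orientation, the sketch below generates a basic file in the Markdown layout described by the llms.txt proposal as I understand it: a title heading, a one-line summary in a blockquote, then sections of annotated links. Treat the layout as an assumption to verify against the current proposal; the site name, URLs, and function name are placeholders.

```python
def build_llms_txt(site_name, summary, sections):
    """Render a basic llms.txt: title, blockquote summary, sections of links."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only
llms_txt = build_llms_txt(
    "Example Co",
    "Free technical SEO guides and tools.",
    {
        "Guides": [
            ("Technical SEO audit", "https://example.com/guides/audit", "Full audit walkthrough"),
        ],
    },
)
```

Save the result as /llms.txt at the site root, alongside robots.txt.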
Site architecture & internal linking
Site architecture determines how link equity flows through your site and how efficiently crawlers move through your pages. Poor architecture means important pages receive less crawl attention and less internal link equity — both of which suppress rankings.
Crawl depth
Googlebot has a finite crawl budget for every site. Pages buried deep in the architecture — more than 4 clicks from the homepage — are crawled infrequently. Check the crawl depth map in Vexifa SEO:
- Homepage and top-level pages: 1 click
- Main category/section pages: 2 clicks
- Individual posts, products, service pages: 2-3 clicks
- Nothing important should sit more than 4 clicks from the homepage
If revenue pages (product pages, key landing pages) are 5+ clicks deep, add internal links from the homepage, navigation, or high-authority content pages to bring them within 3 clicks.
Internal link equity distribution
Vexifa SEO's link graph shows the internal link structure as a node graph. Pages with many incoming internal links receive more crawl attention and PageRank signal. Check whether your most important pages have the most internal links — not just any internal links, but contextual links from relevant, high-authority pages.
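A raw inlink count is a useful first pass before judging link quality. The sketch below counts distinct linking pages per target from a hypothetical `links` mapping of page to outgoing internal links. It counts each linking page once and ignores self-links, which is a simplification and not a PageRank calculation.

```python
from collections import Counter


def inlink_counts(links):
    """Count how many distinct pages link to each target (self-links ignored)."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical internal link graph
links = {
    "/": ["/pricing", "/blog", "/blog"],
    "/blog": ["/blog/post-1", "/pricing"],
    "/blog/post-1": ["/pricing", "/blog/post-1"],
}
counts = inlink_counts(links)
ranked = counts.most_common()
```

If a key revenue page does not appear near the top of `ranked`, it is a candidate for more contextual internal links.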
Anchor text
Review the anchor text distribution on internal links pointing to your key pages. Over-optimised anchor text (the same exact-match keyword on every internal link) is a quality signal concern. Varied, natural anchor text that describes the linked page accurately is correct.
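One way to quantify this is the share of internal links to a page that use its single most common anchor. The sketch below assumes you have a list of anchor texts for one target page; the function name and the sample anchors are illustrative, and where to draw the line is a judgment call, not a published threshold.

```python
from collections import Counter


def top_anchor_share(anchors):
    """Return the most common anchor (lowercased) and its share of all anchors."""
    counts = Counter(anchor.strip().lower() for anchor in anchors)
    anchor, hits = counts.most_common(1)[0]
    return anchor, hits / len(anchors)


# Hypothetical anchors pointing at one key page
anchors = ["SEO audit tool", "seo audit tool", "SEO Audit Tool", "SEO audit tool", "our crawler"]
anchor, share = top_anchor_share(anchors)
```

Here four of five links use the same exact-match phrase, so this page would be worth reviewing for more varied, descriptive anchors.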
Log file analysis without enterprise tools
Log file analysis is the most underused technique in technical SEO — and the one that separates shallow audits from thorough ones. Server logs show you what actually happened: which pages Googlebot visited, how often, whether it encountered errors, and (increasingly important) which AI crawlers are hitting your site.
This used to require enterprise tools like Botify or OnCrawl ($500-3,000+/month). It does not have to.
Accessing your server logs
How you access logs depends on your hosting:
- Apache/Nginx shared hosting: access log files via your hosting control panel (cPanel, Plesk) under Logs. Look for access_log or access.log.
- Cloudflare (with Logpush): Cloudflare Enterprise includes Logpush. On free/Pro plans, use Cloudflare's analytics as a proxy.
- VPS/Dedicated: SSH into your server and access /var/log/apache2/access.log or /var/log/nginx/access.log.
Analysing with GoAccess (free)
GoAccess is a free, open-source log analyser that runs in the terminal or generates an HTML report. Install it via your package manager (sudo apt install goaccess on Ubuntu) and run:
goaccess access.log --log-format=COMBINED -o report.html
The resulting HTML report shows: top requested URLs, HTTP status codes, visitor agents (including bot user agents), response times, and bandwidth. Filter by user agent to isolate Googlebot, OAI-SearchBot, PerplexityBot, and other crawlers.
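If you want the bot breakdown as raw numbers, the same filtering can be sketched in Python. The regular expression below assumes the standard combined log format, and the sample lines use simplified, made-up user-agent strings. User agents can be spoofed, so treat the counts as indicative unless you also verify the requesting IPs.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

BOTS = ["Googlebot", "OAI-SearchBot", "PerplexityBot", "GPTBot", "Claude-SearchBot"]


def bot_hits(lines):
    """Count requests per (bot, status code); lines from other agents are skipped."""
    hits = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        for bot in BOTS:
            if bot.lower() in agent:
                hits[(bot, match.group("status"))] += 1
                break
    return hits


# Illustrative log lines, not real traffic
sample = [
    '66.249.66.1 - - [10/Jan/2026:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2026:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jan/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 8801 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.9 - - [10/Jan/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
hits = bot_hits(sample)
```

To run it on a real file, pass an open file object: `bot_hits(open("access.log", encoding="utf-8", errors="replace"))`.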
What to look for in your logs
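- Crawl distribution imbalance: if Googlebot is spending 80% of its crawl budget on low-value pages (e.g., tag archives, parameter URLs) and ignoring your core content pages, this signals a crawl budget problem. Fix: block the low-value URL patterns in robots.txt to stop the crawling. Use noindex instead when the goal is removal from the index, since Googlebot must still crawl a page to see its noindex directive, and do not combine the two on the same URL.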
- 4xx errors in Googlebot logs: pages that return 404 or 410 when Googlebot visits them but appear fine in your browser (often a mobile vs. desktop rendering difference, or a geolocation issue). Fix: investigate and resolve the server-level discrepancy.
- AI bot crawl patterns: check whether OAI-SearchBot, PerplexityBot, and similar are crawling your site at all. If they are not appearing in your logs despite not being blocked in robots.txt, your site may have technical barriers (JavaScript rendering, authentication, slow response times) that prevent AI crawler access.
- Crawl frequency vs. content update frequency: if you publish new content daily but Googlebot only visits weekly, new pages are being discovered too slowly. Fix: add new URLs to your sitemap as they are published and keep each entry's lastmod date accurate so Googlebot gets a reliable freshness signal.
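The crawl distribution check in the first bullet can be approximated by grouping Googlebot's requested paths by their first directory. This sketch assumes you have already extracted the request paths for Googlebot from your logs; the function name and sample paths are illustrative.

```python
from collections import Counter


def section_share(paths):
    """Share of requests per top-level section, e.g. '/blog' or '/tag'."""
    sections = Counter()
    for path in paths:
        clean = path.split("?", 1)[0]            # drop the query string
        first = clean.strip("/").split("/", 1)[0]
        sections["/" + first if first else "/"] += 1
    total = sum(sections.values())
    if not total:
        return {}
    return {section: hits / total for section, hits in sections.items()}


# Hypothetical Googlebot request paths
googlebot_paths = ["/tag/a", "/tag/b", "/tag/c?page=2", "/tag/d", "/blog/post-1"]
shares = section_share(googlebot_paths)
```

In this sample, tag archives take 80% of the crawl and blog content 20%, which is the kind of imbalance worth acting on.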
Prioritising and acting on findings
A thorough technical audit surfaces dozens of issues. Not all of them matter equally. Use this triage framework to work through findings in the order of impact:
| Priority | Issue type | Examples | Urgency |
|---|---|---|---|
| P0 — Critical | Blocks crawling or indexation entirely | Entire site noindexed, robots.txt blocking Googlebot, homepage returning 500 | Fix immediately |
| P1 — High | Revenue pages missing from index; CWV failures | Key product pages crawled-not-indexed, LCP > 4s, broken canonical on homepage | Fix within days |
| P2 — Medium | Schema errors; AI crawler blocks; orphaned high-value pages | FAQPage schema failing validation, PerplexityBot blocked, 20 orphaned blog posts | Fix within 2-4 weeks |
| P3 — Low | Architecture improvements; minor schema warnings; cosmetic issues | Category pages 5 clicks deep, missing BreadcrumbList schema, missing meta descriptions on low-traffic pages | Fix in next quarter |
Vexifa SEO's AI audit module automatically categorises findings by priority and generates a fix recommendation for each issue. For complex sites, run the AI audit report first to get the prioritised list before diving into the raw crawl data.
Complete technical SEO audit checklist
Crawlability
- Run unlimited crawl — no tool-imposed URL cap
- Identify all 4xx pages and document which have internal links pointing to them
- Map redirect chains — resolve any chain longer than one hop
- Identify redirect loops
- Review robots.txt — confirm no accidental blocks on revenue pages
- Check for orphaned pages (no internal links)
- Verify XML sitemap contains only indexable 200-status URLs
- Submit sitemap to GSC and review coverage report
Indexation
- Review GSC Page Indexing — crawled-not-indexed list
- Review GSC Page Indexing — discovered-not-indexed list
- Audit all noindex directives — confirm only low-value pages are excluded
- Verify canonical tags point to correct final destination URLs
- Check for duplicate canonicals across paginated pages
- Identify and fix soft 404s
- Check GSC for manual actions
Core Web Vitals
- Check GSC Core Web Vitals — identify failing URLs
- Run PageSpeed Insights on homepage, key landing pages, highest-traffic pages
- LCP: identify LCP element and audit its load path
- INP: identify long JavaScript tasks; audit third-party scripts
- CLS: check for images/embeds without explicit dimensions
- Verify TTFB is under 600ms from primary geographies
- Check mobile CWV separately from desktop
Structured Data
- Validate Organization schema on homepage with Rich Results Test
- Validate Article schema on all blog/guide pages
- Validate FAQPage schema on all FAQ sections
- Validate BreadcrumbList on all non-homepage pages
- Validate Product/Offer schema on all product pages
- Check GSC Rich Results report for errors and impressions
- Confirm sameAs fields contain verified social profile URLs
AI / GEO Visibility
- Review robots.txt for AI search crawler blocks (OAI-SearchBot, PerplexityBot, Claude-SearchBot)
- Test JavaScript rendering — verify core content loads without JS enabled
- Check BLUF content format — key pages answer the question in paragraph one
- Verify comparison tables are in HTML format, not images
- Run Vexifa SEO GEO visibility audit
- Manually query ChatGPT and Perplexity for brand name + core service queries
- Consider adding llms.txt to signal AI-accessible content structure
Site Architecture
- No revenue page deeper than 3 clicks from homepage
- Identify all pages deeper than 4 clicks — evaluate for consolidation or link addition
- Check internal link distribution — most links pointing to most important pages
- Audit anchor text on internal links to key pages — varied and descriptive
- Identify topic clusters — related content interlinked correctly
Log File Analysis
- Access server logs for last 30 days
- Run GoAccess to generate crawl report
- Filter by Googlebot — identify which pages are crawled most and least frequently
- Filter by AI crawlers — verify OAI-SearchBot, PerplexityBot are accessing the site
- Identify 4xx errors Googlebot encountered (may differ from crawl-tool findings)
- Check crawl distribution — low-value pages consuming disproportionate crawl budget
Frequently asked questions
Can I do a complete technical SEO audit without paying for tools?
Yes. A complete audit — unlimited site crawl, indexation review, Core Web Vitals, structured data validation, GEO visibility, and log file analysis — can be done using Google Search Console (free), Google PageSpeed Insights (free), Google Rich Results Test (free), Vexifa SEO (free Windows desktop app with unlimited crawl and GEO audit), and GoAccess (free open-source log analyser). No SaaS subscription is needed for any step in this guide.
What is the difference between Screaming Frog free and Vexifa SEO?
Screaming Frog's free tier crawls a maximum of 500 URLs — counting every image, script, and stylesheet discovered, not just pages. A 200-page site with standard assets can exhaust this limit mid-crawl. Vexifa SEO is a free Windows desktop application with no crawl limit. It also includes a GEO/AI search visibility audit and AI SEO assistant, which Screaming Frog does not offer at any price tier.
How often should I run a technical SEO audit?
Run a full technical SEO audit quarterly. Check Core Web Vitals monthly via GSC's Experience report. Monitor index coverage weekly through GSC's Pages report. After any major site change — new theme, URL migration, CMS switch — run a full audit immediately before changes go live and again within two weeks of deployment.
Does a technical SEO audit cover backlinks and content?
No. A technical SEO audit covers the infrastructure layer: crawlability, indexation, performance, structured data, and site architecture. Backlink analysis is a separate process requiring a dedicated tool (Ahrefs Webmaster Tools is free for your own site). Content quality audit is also separate — assessing whether your pages match user intent and provide depth relative to competing pages.
What is a GEO visibility audit in SEO?
A GEO (Generative Engine Optimization) visibility audit checks whether your pages are cited in AI-generated search answers from Google AI Overviews, ChatGPT, and Perplexity. It covers: whether AI crawlers are blocked in your robots.txt, whether your content uses citation-ready formats (structured answers, FAQ schema, comparison tables), whether JavaScript rendering blocks AI indexing, and whether your brand and core topics appear in AI search results. Vexifa SEO includes a GEO visibility audit as part of its AI audit module.
Bottom line
A complete technical SEO audit — one that covers crawlability, indexation, Core Web Vitals, structured data, AI visibility, site architecture, and log files — does not require a SaaS subscription or a per-URL crawler limit.
The three capabilities that most audit tools cannot provide without payment:
- Unlimited crawl: Vexifa SEO, free, no cap
- GEO/AI visibility audit: Vexifa SEO's audit module, integrated into the same workflow
- Log file analysis without enterprise tools: GoAccess, free and open source
Run this audit quarterly. The first pass will surface issues. The second pass will show whether your fixes held. The third pass will start showing ranking and visibility improvements from the work you have done.