The tool trap every audit guide falls into

Open any technical SEO audit guide and you hit the same recommendation within the first 200 words: Screaming Frog. It is a good crawler. Its free tier caps at 500 URLs — and that limit counts every image, stylesheet, script, and page the crawler discovers. A modest blog with 300 pages and 400 images will exhaust the free limit before finishing the crawl.

The solution every guide recommends: buy a Screaming Frog license (£199/year), or upgrade to Semrush ($120+/month), or use Ahrefs ($129+/month). That is not a free audit — that is an expensive one with a free preamble.

There are three things no current top-ranking technical SEO audit guide delivers together:

  1. A complete audit with no URL cap and no subscription
  2. A GEO/AI search visibility audit integrated into the standard technical workflow — not treated as a separate discipline
  3. Log file analysis without a $500+/month enterprise tool

This guide covers all three. The primary crawler used throughout is Vexifa SEO — a free Windows desktop application with no crawl limit. Every other tool referenced is free. No subscription is required at any step.

What a technical SEO audit actually covers

Before crawling a single page, define the scope. A technical SEO audit covers the infrastructure layer: the signals that determine whether search engines and AI crawlers can find, interpret, and rank your content. It does not cover backlink analysis or content quality, which are separate processes.

A technical SEO audit covers seven areas:

# | Area | What you're checking | Primary tool
1 | Crawlability | Can crawlers reach your pages? Redirect chains, robots.txt, broken links | Vexifa SEO
2 | Indexation | Are the right pages indexed? Noindex issues, canonical errors, sitemap accuracy | GSC + Vexifa SEO
3 | Core Web Vitals | LCP, INP, CLS thresholds, in field data and lab data | GSC + PageSpeed Insights
4 | Structured data | Schema implementation, rich result eligibility, errors | Rich Results Test + Vexifa SEO
5 | AI/GEO visibility | AI crawler access, citation-ready content, GEO audit | Vexifa SEO (GEO audit)
6 | Site architecture | Crawl depth, orphaned pages, internal link distribution | Vexifa SEO
7 | Log file analysis | Actual Googlebot and AI bot behaviour, crawl distribution | GoAccess (free)

Tools you'll need (all free)

Free tools used in this guide
  • Vexifa SEO: Windows desktop app, free, unlimited crawl, AI audit, GEO visibility audit.
  • Google Search Console: free, first-party indexation and CWV data. Requires site ownership verification.
  • Google PageSpeed Insights: free, Core Web Vitals lab data with field data overlay.
  • Google Rich Results Test: free structured data validator at search.google.com/test/rich-results
  • GoAccess: free open-source log analyser for Step 7. Optional but recommended for larger sites.
  • Chrome DevTools: built into Chrome, no install needed. Used for rendering and JS audit.

Total cost: $0. Time to complete a full audit for a typical small business site: approximately 4-6 hours for the first pass, with subsequent quarterly audits taking 2-3 hours once you have a baseline.

Step 1. Crawl your entire site, with no URL cap

The crawl is the foundation. Every finding in steps 2, 4, and 6 comes from crawl data. If your crawler stops at 500 URLs, your audit has blind spots.

Set up the crawl in Vexifa SEO

Open Vexifa SEO, enter your root domain (e.g. https://example.com), and start a full site crawl. Vexifa crawls all pages, images, stylesheets, scripts, and linked resources, logging the HTTP status code, response time, redirect chain, title, description, heading structure, canonical tag, noindex directive, and internal/external link data for each URL. There is no page limit.

For a site with 5,000 pages, a full crawl typically takes 15-30 minutes depending on server response times. Vexifa respects your server's crawl rate — you can configure the crawl delay in settings to avoid putting load on production during business hours.

What to look for in the crawl report

HTTP status codes. Export the full URL list and filter by status. Every 4xx response is a broken page. Every 3xx is a redirect — follow the chain to identify loops (A → B → A) and long chains (A → B → C → D) that slow crawling and dilute link equity. Best practice: no more than one redirect hop between any two URLs.

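Redirect chains. A redirect chain occurs when a URL redirects to a URL that also redirects. Googlebot follows only a limited number of hops (Google documents a limit of 10), so long chains cause pages to be crawled later or, past that limit, not at all. Fix: update all internal links pointing to the original URL to point directly to the final destination, and collapse the redirects themselves into a single hop where you control them.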
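The chain and loop checks described above can be scripted once you have exported the redirecting URLs from your crawl. The following is a minimal sketch, assuming a hypothetical export reduced to a dictionary that maps each redirecting URL to its target; the function names and the sample paths are illustrative and not part of any tool's API.

```python
def trace_redirects(start, redirects, max_hops=10):
    """Follow `start` through the `redirects` mapping.

    Returns (chain, is_loop). `chain` lists every URL visited in order;
    `is_loop` is True if a URL repeats before a final destination is reached.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects and len(chain) <= max_hops:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True
        seen.add(current)
    return chain, False


def audit_redirects(redirects):
    """Report loops, and chains with more than one hop."""
    issues = {}
    for url in redirects:
        chain, is_loop = trace_redirects(url, redirects)
        hops = len(chain) - 1
        if is_loop:
            issues[url] = ("loop", chain)
        elif hops > 1:
            issues[url] = ("chain", chain)
    return issues


# Hypothetical crawl data: redirecting URL -> redirect target.
redirects = {
    "/a": "/b",
    "/b": "/c",
    "/c": "/d",
    "/x": "/y",
    "/y": "/x",
    "/old": "/new",
}
issues = audit_redirects(redirects)
for url, (kind, chain) in sorted(issues.items()):
    print(kind, " -> ".join(chain))
```

Single-hop redirects such as /old to /new are not reported, which matches the one-hop best practice; everything else is either a chain to flatten or a loop to break.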

Orphaned pages. These are pages that exist on the server and are indexed, but have no internal links pointing to them. Googlebot discovers them through the sitemap but cannot reach them through normal navigation — which means they receive no internal link equity. Fix: add relevant internal links from related content, or consolidate if the pages have no strategic value.
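Orphan detection is a set comparison: URLs you know exist (from the sitemap or server) minus URLs that receive at least one internal link. Here is a small sketch of that comparison, assuming hypothetical inputs: a list of known URLs and a list of (source, target) link pairs exported from the crawl.

```python
def find_orphans(known_urls, internal_links, homepage="/"):
    """Return known URLs that no other page links to.

    known_urls: URLs from the sitemap or server listing.
    internal_links: (source_url, target_url) pairs from the crawl.
    homepage: excluded, since it is the crawl entry point.
    """
    # Self-links do not count as a way to reach a page.
    linked = {target for source, target in internal_links if source != target}
    return sorted(
        url for url in set(known_urls)
        if url not in linked and url != homepage
    )


# Hypothetical data for illustration.
known_urls = ["/", "/pricing", "/blog/a", "/landing/old-campaign"]
internal_links = [
    ("/", "/pricing"),
    ("/", "/blog/a"),
    ("/blog/a", "/pricing"),
    ("/landing/old-campaign", "/landing/old-campaign"),
]
orphans = find_orphans(known_urls, internal_links)
print(orphans)
```

Each URL in the result is a candidate for either new internal links or consolidation.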

Crawl depth check

Filter your crawl report by click depth from the homepage. Any page that requires more than 4 clicks to reach from the homepage is a crawl depth problem. Important commercial pages (product pages, key service pages) should be reachable in 2-3 clicks. Vexifa SEO's link graph visualisation makes structural depth problems immediately visible.
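Click depth is the shortest path from the homepage through internal links, which a breadth-first search computes directly. The sketch below assumes a hypothetical link graph expressed as a dictionary of page to outgoing internal links, and uses the 4-click limit from this section as its default.

```python
from collections import deque


def click_depths(links, homepage="/"):
    """Breadth-first search from the homepage.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> minimum number of clicks from the homepage.
    Pages that cannot be reached at all are absent from the result.
    """
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(depths, limit=4):
    """Pages that need more than `limit` clicks to reach."""
    return sorted(url for url, depth in depths.items() if depth > limit)


# Hypothetical site structure.
links = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/2024"],
    "/blog/2024": ["/blog/2024/05"],
    "/blog/2024/05": ["/blog/2024/05/post"],
    "/blog/2024/05/post": ["/blog/2024/05/post/print"],
}
depths = click_depths(links)
deep_pages = too_deep(depths)
print(deep_pages)
```

Pages missing from the depth map entirely are unreachable by navigation, which is another way to surface orphans.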

Step 2. Indexation audit: what Google has and what it's missing

Crawlability tells you whether bots can reach your pages. Indexation tells you whether Google has actually indexed them and whether the right pages are in the index.

GSC Page Indexing report

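In Google Search Console, open the Indexing → Pages report. Two statuses there require different responses: "Crawled - currently not indexed" (Google fetched the page but chose not to index it) and "Discovered - currently not indexed" (Google knows the URL but has not crawled it yet).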

Noindex audit

The most expensive technical SEO error is accidentally noindexing a revenue page. Cross-reference your crawl data (which pages have a noindex directive) against your GSC indexed pages list. Any page that should be indexed but has <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex header needs immediate attention.
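If you want to spot-check individual pages outside the crawler, both noindex signals can be detected with the Python standard library. This is a simplified sketch: the function takes HTML and response headers you have already fetched, and it does not handle X-Robots-Tag values scoped to a specific user agent (such as "googlebot: noindex").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html, headers=None):
    """True if the page carries noindex in a meta tag or X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives = []
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_directives += [d.strip().lower() for d in value.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives


# Illustrative inputs.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(page))
print(is_noindexed("<html></html>", {"X-Robots-Tag": "noindex"}))
```

Run it over the URLs you expect to be indexed; any True result on a revenue page is the error this section warns about.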

XML sitemap audit

Your sitemap should contain only URLs that return a 200 response and do not have a noindex directive. Submit your sitemap in GSC and check the coverage report for errors. Common issues:

Canonical tag audit

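Vexifa SEO surfaces canonical tag data in the crawl report. Check that each canonical points to the correct final destination URL rather than to a redirect or an error page, and look for duplicate canonicals across paginated pages.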

Step 3. Core Web Vitals: the 2026 thresholds

Core Web Vitals are the performance metrics Google uses as a ranking factor in its Page Experience signal. The 2026 thresholds are:

Metric | What it measures | Good | Needs improvement | Poor
LCP (Largest Contentful Paint) | How quickly the main content loads | ≤ 2.5s | 2.5–4s | > 4s
INP (Interaction to Next Paint) | Responsiveness to user input | ≤ 200ms | 200–500ms | > 500ms
CLS (Cumulative Layout Shift) | Visual stability during load | ≤ 0.1 | 0.1–0.25 | > 0.25
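When you export metric values for many URLs, it helps to bucket them automatically. The sketch below encodes the thresholds from the table, treating the boundary values as inclusive (2.5s counts as good, 4s as needs improvement), which is how Google's published definitions read. The function and dictionary names are illustrative.

```python
# (good_max, needs_improvement_max) per metric.
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless score
}


def rate(metric, value):
    """Classify a Core Web Vitals value as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


# Hypothetical field values for one URL.
sample = {"LCP": 3.1, "INP": 180, "CLS": 0.31}
ratings = {metric: rate(metric, value) for metric, value in sample.items()}
print(ratings)
```

A URL passes Core Web Vitals only when all three metrics rate as good, so in this sample the page fails on both LCP and CLS.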

Field data vs. lab data

There are two types of CWV data and the distinction matters for diagnosis:

Field data (CrUX) is collected from real Chrome users browsing your site over the past 28 days. This is what Google uses for ranking. Find it in GSC under Experience → Core Web Vitals. If your site has low traffic, CrUX data may be unavailable — in this case, lab data is your proxy.

Lab data is generated by running a synthetic test in a simulated environment. Google PageSpeed Insights shows both. Lab data is useful for diagnosing the cause of a poor metric, even if the numbers differ from field data.

Common CWV failure causes

LCP failures are most commonly caused by: large unoptimised hero images, render-blocking resources (JavaScript or CSS that must load before the browser can paint the largest element), and slow server response times (TTFB over 600ms). Fix: compress images, use modern formats (WebP/AVIF), add fetchpriority="high" to your LCP element's image tag, and audit third-party scripts.

INP failures are caused by long JavaScript tasks that block the main thread. Fix: defer non-critical JavaScript, split long tasks, reduce third-party script load.

CLS failures are caused by elements that load after the page has rendered and push other elements around. Fix: define explicit width and height on all images and video embeds; avoid inserting DOM elements above existing content after load; reserve a fixed slot for ad containers (for example with min-height, optionally combined with CSS contain) so late-loading ads cannot push content down.

Step 4. Structured data audit: schema that works for search and AI

Structured data has shifted from an optional enhancement to a foundational requirement in 2026. AI-generated search answers actively use schema markup to identify authoritative, citation-ready content. A page without properly implemented schema is structurally less likely to be cited in AI Overviews or Perplexity answers, regardless of its content quality.

Schema types to audit

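Check for the presence and correctness of these schema types on relevant pages: Organization on the homepage, Article on blog and guide pages, FAQPage on FAQ sections, BreadcrumbList on all non-homepage pages, and Product/Offer on product pages.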

Validation

Test each important page template using Google Rich Results Test. Check for both errors (which prevent rich results) and warnings (which reduce eligibility). Common errors: missing required fields, incorrect property types, schema pointing to a different URL than the page's canonical.

AI citation note

FAQPage schema is particularly valuable for AI citation eligibility. AI search systems prefer content that provides explicit question-and-answer pairings in structured form. Every page with a FAQ section should have FAQPage JSON-LD — it takes under 10 minutes per page and directly improves your citation surface area in AI-generated answers.
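If your FAQ content already lives in a CMS or spreadsheet, the JSON-LD can be generated rather than hand-written. Below is a minimal sketch that builds a FAQPage object from question and answer pairs using schema.org's FAQPage, Question, and Answer types; the helper name is hypothetical, and the output still needs to be validated in the Rich Results Test.

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Sample pairs taken from this guide's own FAQ.
pairs = [
    ("How often should I run a technical SEO audit?",
     "Run a full technical SEO audit quarterly."),
    ("Does a technical SEO audit cover backlinks and content?",
     "No. It covers the infrastructure layer."),
]
markup = faq_jsonld(pairs)
print(markup)
```

Place the resulting JSON inside a script element with type="application/ld+json" in the page's HTML, and make sure the questions and answers match the visible FAQ text on the page.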

Step 5. AI crawler access & GEO visibility audit

This is the section missing from every other technical SEO audit guide. Traditional crawlability audits check whether Googlebot can access your pages. In 2026, you also need to check whether AI crawlers can access your pages — and whether your content is structured to be cited by AI-generated answers.

Understanding the three bot tiers

Not all AI bots are the same. There are three distinct categories, each with different implications for your audit:

Audit your robots.txt

Open /robots.txt on your domain and check for any rules that block AI crawlers. A common error is an overly broad User-agent: * group with Disallow: /, or a Disallow: / rule added for newer crawlers in recent years without anyone weighing what blocking them would cost.
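Reading robots.txt by eye gets error-prone once there are several user-agent groups. Python's standard-library urllib.robotparser can evaluate the rules for you. This is a rough sketch, assuming the three AI search crawler tokens named in this guide's checklist and a hypothetical example.com URL; note that Python's parser applies the first matching rule, which can differ from Google's longest-match behaviour on files that mix Allow and Disallow.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from this guide's checklist.
AI_SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot", "Claude-SearchBot"]


def blocked_bots(robots_txt, url, user_agents=AI_SEARCH_BOTS):
    """Return the user agents that robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in user_agents if not parser.can_fetch(ua, url)]


# Hypothetical robots.txt content.
sample_robots = """\
User-agent: *
Disallow: /admin/

User-agent: PerplexityBot
Disallow: /
"""

blocked = blocked_bots(sample_robots, "https://example.com/pricing")
print(blocked)
```

In this sample only PerplexityBot is blocked from the pricing page; the other two fall through to the general group, which only restricts /admin/.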

Specifically check for blocks on the AI search crawlers: OAI-SearchBot, PerplexityBot, and Claude-SearchBot.

JavaScript rendering audit

AI crawlers are typically less sophisticated than Googlebot at rendering JavaScript. If your key content — product descriptions, article bodies, FAQ sections — is rendered client-side via JavaScript, AI crawlers may be indexing a blank page.

Check this quickly: open Chrome DevTools (F12), open the Command Menu (Ctrl+Shift+P, or Cmd+Shift+P on macOS), run "Disable JavaScript", and reload your page. If the core content disappears, it is client-side rendered and invisible to most AI crawlers. Fix: use server-side rendering (SSR) or static generation for content pages.
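The same test can be approximated in bulk by checking whether key phrases appear in the raw HTML the server returns, before any script runs. The sketch below assumes you have already downloaded the HTML as a string; the class and function names are illustrative, and the phrase list is something you would supply per page template.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text outside script, style, and template elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def missing_from_raw_html(html, phrases):
    """Return the phrases that do not appear in the server-delivered text."""
    extractor = TextExtractor()
    extractor.feed(html)
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(extractor.parts).split()).lower()
    return [p for p in phrases if p.lower() not in text]


# Hypothetical server-rendered and client-rendered versions of a page.
ssr_html = "<html><body><h1>Blue Widget</h1><p>Ships in 2 days.</p></body></html>"
csr_html = ('<html><body><div id="root"></div>'
            '<script>render("Blue Widget")</script></body></html>')
phrases = ["Blue Widget", "Ships in 2 days"]
print(missing_from_raw_html(ssr_html, phrases))
print(missing_from_raw_html(csr_html, phrases))
```

An empty list means the content is present without JavaScript; any phrase returned for a key page points to client-side rendering that a non-rendering crawler will not see.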

Citation-ready content audit

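Beyond crawler access, your content needs to be structured in a way that makes it easy for AI systems to extract and cite. Check that key pages answer the question in the first paragraph (the BLUF, or bottom line up front, format), that comparison tables are real HTML tables rather than images, and that FAQ sections carry FAQPage schema.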

GEO visibility audit in Vexifa SEO

Vexifa SEO's GEO/AI search visibility audit automates the above checks and also tests actual AI citation presence — querying AI search systems for your brand name and core topic areas to verify whether your content is surfacing in AI-generated answers. Run this after fixing any robots.txt or rendering issues found manually.

The llms.txt file

A llms.txt file, analogous to robots.txt, signals to AI systems which pages contain your primary content and how to interpret your site structure. It is not yet a ranking factor, but its adoption is growing among sites that actively want AI citation. If you have structured, authoritative content, adding a basic llms.txt is a low-effort signal worth considering.

Step 6. Site architecture & internal linking

Site architecture determines how link equity flows through your site and how efficiently crawlers move through your pages. Poor architecture means important pages receive less crawl attention and less internal link equity — both of which suppress rankings.

Crawl depth

Googlebot has a finite crawl budget for every site. Pages buried deep in the architecture — more than 4 clicks from the homepage — are crawled infrequently. Check the crawl depth map in Vexifa SEO:

If revenue pages (product pages, key landing pages) are 5+ clicks deep, add internal links from the homepage, navigation, or high-authority content pages to bring them within 3 clicks.

Internal link equity distribution

Vexifa SEO's link graph shows the internal link structure as a node graph. Pages with many incoming internal links receive more crawl attention and PageRank signal. Check whether your most important pages have the most internal links — not just any internal links, but contextual links from relevant, high-authority pages.
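To put numbers behind the visual check, count how many distinct pages link to each URL and compare that against your list of priority pages. A small sketch follows, assuming a hypothetical export of (source, target) link pairs; the minimum of three inlinks is an arbitrary illustration, not a documented threshold.

```python
from collections import Counter


def inlink_counts(internal_links):
    """Count distinct source pages linking to each target."""
    # A set removes repeated links from the same page (nav plus footer, etc.)
    # and the comparison drops self-links.
    unique = {(src, dst) for src, dst in internal_links if src != dst}
    return Counter(dst for _, dst in unique)


def underlinked(priority_pages, counts, minimum=3):
    """Priority pages with fewer than `minimum` linking pages."""
    return sorted(p for p in priority_pages if counts[p] < minimum)


# Hypothetical crawl export.
internal_links = [
    ("/", "/pricing"),
    ("/blog/a", "/pricing"),
    ("/blog/b", "/pricing"),
    ("/", "/blog/a"),
    ("/", "/blog/a"),          # duplicate link from the same page
    ("/blog/a", "/services"),
]
counts = inlink_counts(internal_links)
weak = underlinked(["/pricing", "/services"], counts)
print(weak)
```

Raw counts say nothing about link context or the authority of the linking page, so treat the output as a shortlist for manual review.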

Anchor text

Review the anchor text distribution on internal links pointing to your key pages. Over-optimised anchor text (the same exact-match keyword on every internal link) is a quality signal concern. Varied, natural anchor text that describes the linked page accurately is correct.

Step 7. Log file analysis without enterprise tools

Log file analysis is the most underused technique in technical SEO — and the one that separates shallow audits from thorough ones. Server logs show you what actually happened: which pages Googlebot visited, how often, whether it encountered errors, and (increasingly important) which AI crawlers are hitting your site.

This used to require enterprise tools like Botify or OnCrawl ($500-3,000+/month). It does not have to.

Accessing your server logs

How you access logs depends on your hosting:

Analysing with GoAccess (free)

GoAccess is a free, open-source log analyser that runs in the terminal or generates an HTML report. Install it via your package manager (sudo apt install goaccess on Ubuntu) and run:

goaccess access.log --log-format=COMBINED -o report.html

The resulting HTML report shows: top requested URLs, HTTP status codes, visitor agents (including bot user agents), response times, and bandwidth. Filter by user agent to isolate Googlebot, OAI-SearchBot, PerplexityBot, and other crawlers.
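If you prefer a per-bot breakdown you can post-process yourself, the combined log format is simple enough to parse with a regular expression. The sketch below counts hits and 4xx responses per crawler; the sample log lines are made up for illustration, and matching on the user-agent string alone is an assumption, since user agents can be spoofed and a rigorous audit would verify bot IPs (for example via reverse DNS).

```python
import re
from collections import Counter, defaultdict

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

BOTS = ["Googlebot", "OAI-SearchBot", "PerplexityBot"]


def summarise(lines, bots=BOTS):
    """Return (hits, errors): per-bot Counters of paths and of 4xx paths."""
    hits = defaultdict(Counter)
    errors = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        for bot in bots:
            if bot.lower() in agent:
                hits[bot][match.group("path")] += 1
                if match.group("status").startswith("4"):
                    errors[bot][match.group("path")] += 1
    return hits, errors


# Made-up sample lines in combined log format.
sample_log = [
    '66.249.66.1 - - [10/Jan/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2026:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Jan/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
    '198.51.100.4 - - [10/Jan/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 8000 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
hits, errors = summarise(sample_log)
for bot in BOTS:
    print(bot, dict(hits[bot]), dict(errors[bot]))
```

A bot with zero hits over a 30-day window is either blocked or has not discovered the site, which is worth cross-checking against the robots.txt findings from Step 5.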

What to look for in your logs

Prioritising and acting on findings

A thorough technical audit surfaces dozens of issues. Not all of them matter equally. Use this triage framework to work through findings in the order of impact:

Priority | Issue type | Examples | Urgency
P0 (Critical) | Blocks crawling or indexation entirely | Entire site noindexed, robots.txt blocking Googlebot, homepage returning 500 | Fix immediately
P1 (High) | Revenue pages missing from index; CWV failures | Key product pages crawled-not-indexed, LCP > 4s, broken canonical on homepage | Fix within days
P2 (Medium) | Schema errors; AI crawler blocks; orphaned high-value pages | FAQPage schema failing validation, PerplexityBot blocked, 20 orphaned blog posts | Fix within 2-4 weeks
P3 (Low) | Architecture improvements; minor schema warnings; cosmetic issues | Category pages 5 clicks deep, missing BreadcrumbList schema, missing meta descriptions on low-traffic pages | Fix in next quarter

Vexifa SEO's AI audit module automatically categorises findings by priority and generates a fix recommendation for each issue. For complex sites, run the AI audit report first to get the prioritised list before diving into the raw crawl data.

Complete technical SEO audit checklist

Crawlability

  • Run unlimited crawl — no tool-imposed URL cap
  • Identify all 4xx pages and document which have internal links pointing to them
  • Map redirect chains — resolve any chain longer than one hop
  • Identify redirect loops
  • Review robots.txt — confirm no accidental blocks on revenue pages
  • Check for orphaned pages (no internal links)
  • Verify XML sitemap contains only indexable 200-status URLs
  • Submit sitemap to GSC and review coverage report

Indexation

  • Review GSC Page Indexing — crawled-not-indexed list
  • Review GSC Page Indexing — discovered-not-indexed list
  • Audit all noindex directives — confirm only low-value pages are excluded
  • Verify canonical tags point to correct final destination URLs
  • Check for duplicate canonicals across paginated pages
  • Identify and fix soft 404s
  • Check GSC for manual actions

Core Web Vitals

  • Check GSC Core Web Vitals — identify failing URLs
  • Run PageSpeed Insights on homepage, key landing pages, highest-traffic pages
  • LCP: identify LCP element and audit its load path
  • INP: identify long JavaScript tasks; audit third-party scripts
  • CLS: check for images/embeds without explicit dimensions
  • Verify TTFB is under 600ms from primary geographies
  • Check mobile CWV separately from desktop

Structured Data

  • Validate Organization schema on homepage with Rich Results Test
  • Validate Article schema on all blog/guide pages
  • Validate FAQPage schema on all FAQ sections
  • Validate BreadcrumbList on all non-homepage pages
  • Validate Product/Offer schema on all product pages
  • Check GSC Rich Results report for errors and impressions
  • Confirm sameAs fields contain verified social profile URLs

AI / GEO Visibility

  • Review robots.txt for AI search crawler blocks (OAI-SearchBot, PerplexityBot, Claude-SearchBot)
  • Test JavaScript rendering — verify core content loads without JS enabled
  • Check BLUF content format — key pages answer the question in paragraph one
  • Verify comparison tables are in HTML format, not images
  • Run Vexifa SEO GEO visibility audit
  • Manually query ChatGPT and Perplexity for brand name + core service queries
  • Consider adding llms.txt to signal AI-accessible content structure

Site Architecture

  • No revenue page deeper than 3 clicks from homepage
  • Identify all pages deeper than 4 clicks — evaluate for consolidation or link addition
  • Check internal link distribution — most links pointing to most important pages
  • Audit anchor text on internal links to key pages — varied and descriptive
  • Identify topic clusters — related content interlinked correctly

Log File Analysis

  • Access server logs for last 30 days
  • Run GoAccess to generate crawl report
  • Filter by Googlebot — identify which pages are crawled most and least frequently
  • Filter by AI crawlers — verify OAI-SearchBot, PerplexityBot are accessing the site
  • Identify 4xx errors Googlebot encountered (may differ from crawl-tool findings)
  • Check crawl distribution — low-value pages consuming disproportionate crawl budget

Frequently asked questions

Can I do a complete technical SEO audit without paying for tools?

Yes. A complete audit — unlimited site crawl, indexation review, Core Web Vitals, structured data validation, GEO visibility, and log file analysis — can be done using Google Search Console (free), Google PageSpeed Insights (free), Google Rich Results Test (free), Vexifa SEO (free Windows desktop app with unlimited crawl and GEO audit), and GoAccess (free open-source log analyser). No SaaS subscription is needed for any step in this guide.

What is the difference between Screaming Frog free and Vexifa SEO?

Screaming Frog's free tier crawls a maximum of 500 URLs — counting every image, script, and stylesheet discovered, not just pages. A 200-page site with standard assets can exhaust this limit mid-crawl. Vexifa SEO is a free Windows desktop application with no crawl limit. It also includes a GEO/AI search visibility audit and AI SEO assistant, which Screaming Frog does not offer at any price tier.

How often should I run a technical SEO audit?

Run a full technical SEO audit quarterly. Check Core Web Vitals monthly via GSC's Experience report. Monitor index coverage weekly through GSC's Pages report. After any major site change — new theme, URL migration, CMS switch — run a full audit immediately before changes go live and again within two weeks of deployment.

Does a technical SEO audit cover backlinks and content?

No. A technical SEO audit covers the infrastructure layer: crawlability, indexation, performance, structured data, and site architecture. Backlink analysis is a separate process requiring a dedicated tool (Ahrefs Webmaster Tools is free for your own site). Content quality audit is also separate — assessing whether your pages match user intent and provide depth relative to competing pages.

What is a GEO visibility audit in SEO?

A GEO (Generative Engine Optimization) visibility audit checks whether your pages are cited in AI-generated search answers from Google AI Overviews, ChatGPT, and Perplexity. It covers: whether AI crawlers are blocked in your robots.txt, whether your content uses citation-ready formats (structured answers, FAQ schema, comparison tables), whether JavaScript rendering blocks AI indexing, and whether your brand and core topics appear in AI search results. Vexifa SEO includes a GEO visibility audit as part of its AI audit module.

Bottom line

A complete technical SEO audit — one that covers crawlability, indexation, Core Web Vitals, structured data, AI visibility, site architecture, and log files — does not require a SaaS subscription or a per-URL crawler limit.

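The three capabilities that most audit tools cannot provide without payment are a complete crawl with no URL cap, a GEO/AI search visibility audit integrated into the technical workflow, and log file analysis without a $500+/month enterprise tool.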

Run this audit quarterly. The first pass will surface issues. The second pass will show whether your fixes held. The third pass will start showing ranking and visibility improvements from the work you have done.

Dave Rupe

Founder of Vexifa. Built Vexifa SEO after spending years running technical SEO audits with a combination of Screaming Frog, Semrush, and spreadsheets — and wanting a single tool that handled the full workflow without a monthly subscription.