Technical SEO Audit Without SaaS: 2026 Guide

Q: Can I do a complete technical SEO audit without paying for tools?

Yes. A complete technical SEO audit — including unlimited site crawl, indexation review, Core Web Vitals, structured data validation, GEO/AI visibility, and log file analysis — can be done using Google Search Console (free), Google PageSpeed Insights (free), Google Rich Results Test (free), Vexifa SEO (free Windows desktop app with unlimited crawl and GEO audit), and GoAccess (free open-source log analyser). No SaaS subscription required.

Q: What is the difference between Screaming Frog free and Vexifa SEO?

Screaming Frog's free tier crawls a maximum of 500 URLs. This includes every image, script, stylesheet, and page — so a site with 200 pages can exhaust the free limit before crawling the full site. Vexifa SEO is a free Windows desktop application with no crawl limit — it crawls your entire site regardless of size. Vexifa SEO also includes a GEO/AI search visibility audit and AI SEO assistant, which Screaming Frog does not offer.

Q: How often should I run a technical SEO audit?

For most small and medium-sized websites, run a full technical SEO audit quarterly. Check Core Web Vitals monthly using Google Search Console's Experience report. Monitor index coverage weekly via GSC's Pages report. After any major site update (new theme, migration, URL restructure), run a full audit immediately. If you use Vexifa SEO, recurring scheduled crawls can automate the ongoing monitoring layer.

Q: Does a technical SEO audit cover backlinks and content?

No. A technical SEO audit covers crawlability, indexation, site performance, structured data, and site architecture — the infrastructure layer. Backlink auditing and content auditing are separate processes. This guide covers technical SEO only. Backlink analysis requires a dedicated tool such as Ahrefs Webmaster Tools (free for your own site) or a paid backlink tool.

The tool trap every audit guide falls into

Open any technical SEO audit guide and you hit the same recommendation within the first 200 words: Screaming Frog. It is a good crawler. Its free tier caps at 500 URLs — and that limit counts every image, stylesheet, script, and page the crawler discovers. A modest blog with 300 pages and 400 images will exhaust the free limit before finishing the crawl.

The solution every guide recommends: buy a Screaming Frog license (£199/year), or upgrade to Semrush ($120+/month), or use Ahrefs ($129+/month). That is not a free audit — that is an expensive one with a free preamble.

There are three things no current top-ranking technical SEO audit guide delivers together:

A complete audit with no URL cap and no subscription
A GEO/AI search visibility audit integrated into the standard technical workflow — not treated as a separate discipline
Log file analysis without a $500+/month enterprise tool

This guide covers all three. The primary crawler used throughout is Vexifa SEO — a free Windows desktop application with no crawl limit. Every other tool referenced is free. No subscription is required at any step.

What a technical SEO audit actually covers

Before crawling a single page, define the scope. A technical SEO audit covers the infrastructure layer — the signals that determine whether search engines and AI crawlers can find, interpret, and rank your content. It does not cover:

Backlink analysis (a separate audit using tools like Ahrefs Webmaster Tools)
Content quality or keyword targeting (a content audit)
Conversion rate or UX issues (a CRO audit)

A technical SEO audit covers seven areas:

#	Area	What you're checking	Primary tool
1	Crawlability	Can crawlers reach your pages? Redirect chains, robots.txt, broken links	Vexifa SEO
2	Indexation	Are the right pages indexed? Noindex issues, canonical errors, sitemap accuracy	GSC + Vexifa SEO
3	Core Web Vitals	LCP, INP, CLS thresholds — field data and lab data	GSC + PageSpeed Insights
4	Structured data	Schema implementation, rich result eligibility, errors	Rich Results Test + Vexifa SEO
5	AI/GEO visibility	AI crawler access, citation-ready content, GEO audit	Vexifa SEO (GEO audit)
6	Site architecture	Crawl depth, orphaned pages, internal link distribution	Vexifa SEO
7	Log file analysis	Actual Googlebot and AI bot behaviour, crawl distribution	GoAccess (free)

Tools you'll need (all free)

Free tools used in this guide

Vexifa SEO — Windows desktop app, free, unlimited crawl, AI audit, GEO visibility audit. Download here.
Google Search Console — free, first-party indexation and CWV data. Requires site ownership verification.
Google PageSpeed Insights — free, Core Web Vitals lab data with field data overlay.
Google Rich Results Test — free structured data validator at search.google.com/test/rich-results
GoAccess — free open-source log analyser for Step 7. Optional but recommended for larger sites.
Chrome DevTools — built into Chrome, no install needed. Used for rendering and JS audit.

Total cost: $0. Time to complete a full audit for a typical small business site: approximately 4-6 hours for the first pass, with subsequent quarterly audits taking 2-3 hours once you have a baseline.

Crawl your entire site — no URL cap

The crawl is the foundation. Every finding in steps 2, 4, and 6 comes from crawl data. If your crawler stops at 500 URLs, your audit has blind spots.

Set up the crawl in Vexifa SEO

Open Vexifa SEO, enter your root domain (e.g. https://example.com), and start a full site crawl. Vexifa crawls all pages, images, stylesheets, scripts, and linked resources, logging the HTTP status code, response time, redirect chain, title, description, heading structure, canonical tag, noindex directive, and internal/external link data for each URL. There is no page limit.

For a site with 5,000 pages, a full crawl typically takes 15-30 minutes depending on server response times. Vexifa respects your server's crawl rate — you can configure the crawl delay in settings to avoid putting load on production during business hours.

What to look for in the crawl report

HTTP status codes. Export the full URL list and filter by status. Every 4xx response is a broken page. Every 3xx is a redirect — follow the chain to identify loops (A → B → A) and long chains (A → B → C → D) that slow crawling and dilute link equity. Best practice: no more than one redirect hop between any two URLs.

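Redirect chains. A redirect chain occurs when a URL redirects to a URL that also redirects. Googlebot follows only a limited number of hops (Google's documentation cites up to 10), and every extra hop adds latency, so long chains cause pages to be crawled later or not at all. Fix: update all internal links pointing to the original URL to point directly to the final destination, and collapse server-side redirects so each old URL goes straight to its final target.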
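The chain and loop check can be sketched in a few lines of Python. This is a minimal sketch, assuming the crawl export has been reduced to a dictionary that maps each redirecting URL to its target. The function name resolve_chain, the max_hops default, and the sample URLs are illustrative and not part of any tool's export format.

```python
def resolve_chain(url, redirects, max_hops=10):
    """Follow `url` through a redirect map; return (chain, is_loop)."""
    chain = [url]
    seen = {url}
    # Stop once the URL no longer redirects or the hop budget is used up.
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        if target in seen:
            # The target was already visited, so this is a loop.
            return chain + [target], True
        chain.append(target)
        seen.add(target)
    return chain, False


# Hypothetical crawl export: redirecting URL -> redirect target.
redirects = {
    "/a": "/b",
    "/b": "/c",
    "/c": "/d",
    "/x": "/y",
    "/y": "/x",
}

for start in ("/a", "/x"):
    chain, is_loop = resolve_chain(start, redirects)
    label = "loop" if is_loop else f"{len(chain) - 1} hop(s)"
    print(" -> ".join(chain), f"[{label}]")
```

Any chain reported with more than one hop is a candidate for the fix described above.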

Orphaned pages. These are pages that exist on the server and are indexed, but have no internal links pointing to them. Googlebot discovers them through the sitemap but cannot reach them through normal navigation — which means they receive no internal link equity. Fix: add relevant internal links from related content, or consolidate if the pages have no strategic value.
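Orphan detection is a set difference: every URL you know about (from the sitemap or GSC) minus every URL that receives at least one internal link. The sketch below assumes the crawl's link data is available as (source, target) pairs. The name find_orphans and the entry_points parameter are hypothetical.

```python
def find_orphans(known_urls, internal_links, entry_points=("/",)):
    """Return known URLs that no other page links to.

    known_urls: URLs from the sitemap or GSC export.
    internal_links: iterable of (source, target) pairs from the crawl.
    entry_points: URLs that are reachable without a link (e.g. the homepage).
    """
    # Self-links do not count as a way for a crawler to discover a page.
    linked = {target for source, target in internal_links if source != target}
    return sorted(set(known_urls) - linked - set(entry_points))


# Illustrative data only.
known = ["/", "/about", "/blog/post-1", "/old-landing"]
links = [("/", "/about"), ("/", "/blog/post-1"), ("/about", "/")]
print(find_orphans(known, links))
```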

Crawl depth check

Filter your crawl report by click depth from the homepage. Any page that requires more than 4 clicks to reach from the homepage is a crawl depth problem. Important commercial pages (product pages, key service pages) should be reachable in 2-3 clicks. Vexifa SEO's link graph visualisation makes structural depth problems immediately visible.
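Click depth is the shortest path from the homepage in the internal link graph, which a breadth-first search computes directly. A minimal sketch, assuming the crawl's links have been grouped into a dictionary of page to outgoing links (the link_graph shape and the sample URLs are assumptions for illustration):

```python
from collections import deque


def click_depths(link_graph, start="/"):
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Illustrative link graph: page -> pages it links to.
link_graph = {
    "/": ["/services", "/blog"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/blog/page-4"],
    "/blog/page-4": ["/blog/old-post"],
}

depths = click_depths(link_graph)
too_deep = [url for url, depth in depths.items() if depth > 4]
print(too_deep)
```

Pages that never appear in the result are unreachable from the homepage, which overlaps with the orphaned-page check.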

Indexation audit: what Google has and what it's missing

Crawlability tells you whether bots can reach your pages. Indexation tells you whether Google has actually indexed them and whether the right pages are in the index.

GSC Page Indexing report

In Google Search Console, open the Indexing → Pages report. You will see two categories that require different responses:

Crawled — currently not indexed: Google reached the page but decided not to index it. Causes: thin content, duplicate content, low PageRank signal, soft 404. Fix: improve content quality, consolidate thin pages, or add internal links from higher-authority pages.
Discovered — currently not indexed: Google knows the URL exists (via sitemap or link) but has not crawled it yet. Often caused by crawl budget constraints or a deep crawl depth. Fix: add internal links from the homepage or high-authority pages, and ensure the URL is in your sitemap.

Noindex audit

The most expensive technical SEO error is accidentally noindexing a revenue page. Cross-reference your crawl data (which pages have a noindex directive) against your GSC indexed pages list. Any page that should be indexed but has <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex header needs immediate attention.
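Both noindex locations can be checked with the Python standard library. This is a rough sketch, assuming you already have a page's HTML and response headers. The helper names are hypothetical, and it deliberately ignores user-agent-scoped header values such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html, headers=None):
    """True if the meta robots tag or the X-Robots-Tag header excludes the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = list(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            directives += [d.strip().lower() for d in value.split(",")]
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(page))
```

Running this over the URLs that should be earning revenue gives a quick list to compare against the GSC indexed pages export.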

XML sitemap audit

Your sitemap should contain only URLs that return a 200 response and do not have a noindex directive. Submit your sitemap in GSC and check the coverage report for errors. Common issues:

Redirected URLs in the sitemap (list final destinations only)
Noindexed pages in the sitemap (contradictory signal)
404 pages listed in the sitemap (broken after a migration)
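These three sitemap checks can be automated by joining the sitemap against crawl data. The sketch below assumes a standard sitemap using the sitemaps.org namespace and a crawl export reduced to a dictionary keyed by URL with status and noindex fields. That dictionary shape is an assumption for illustration, not any tool's export format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def sitemap_problems(xml_text, crawl):
    """Map each problematic sitemap URL to the reason it should not be listed."""
    problems = {}
    for url in sitemap_urls(xml_text):
        page = crawl.get(url)
        if page is None:
            problems[url] = "not found in crawl"
        elif page["status"] != 200:
            problems[url] = f"returns {page['status']}"
        elif page["noindex"]:
            problems[url] = "noindex"
    return problems


# Illustrative sitemap and crawl data.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old</loc></url>
  <url><loc>https://example.com/thanks</loc></url>
</urlset>"""

crawl = {
    "https://example.com/": {"status": 200, "noindex": False},
    "https://example.com/old": {"status": 301, "noindex": False},
    "https://example.com/thanks": {"status": 200, "noindex": True},
}

print(sitemap_problems(SAMPLE, crawl))
```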

Canonical tag audit

Vexifa SEO surfaces canonical tag data in the crawl report. Check for:

Missing canonicals — especially on paginated pages, category pages, and faceted navigation URLs
Canonical pointing to a different domain — legitimate for syndicated content but a bug if unintentional
Canonical pointing to a redirected URL — the canonical should always point to the final destination URL
Soft 404s — pages that return a 200 but display "no results found" content; Google treats these like 404s after enough signals
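The first three canonical checks are simple enough to script against a crawl export. A minimal sketch, assuming canonicals are recorded as absolute URLs and that redirecting URLs are available as a set. The function name and inputs are hypothetical.

```python
from urllib.parse import urlsplit


def canonical_issue(url, canonical, redirecting_urls):
    """Return a short description of the canonical problem, or None if it looks fine.

    Assumes `canonical` is an absolute URL (or empty/None when the tag is missing).
    """
    if not canonical:
        return "missing canonical"
    if urlsplit(canonical).netloc != urlsplit(url).netloc:
        return "canonical on a different domain"
    if canonical in redirecting_urls:
        return "canonical points to a redirected URL"
    return None


# Illustrative data only.
redirecting = {"https://example.com/old-pricing"}
print(canonical_issue("https://example.com/pricing?ref=nav",
                      "https://example.com/old-pricing", redirecting))
```

Cross-domain canonicals are only flagged here, not judged: whether one is intentional (syndication) still needs a human decision.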

Core Web Vitals: the 2026 thresholds

Core Web Vitals are the performance metrics Google uses as a ranking factor in its Page Experience signal. The 2026 thresholds are:

Metric	What it measures	Good	Needs improvement	Poor
LCP (Largest Contentful Paint)	How quickly the main content loads	≤ 2.5s	2.5–4s	> 4s
INP (Interaction to Next Paint)	Responsiveness to user input	≤ 200ms	200–500ms	> 500ms
CLS (Cumulative Layout Shift)	Visual stability during load	≤ 0.1	0.1–0.25	> 0.25
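When you export metric values in bulk, the thresholds in the table translate into a small rating function. This sketch treats a value exactly on a boundary as the better rating, which is how Google documents the thresholds. Field data is assessed at the 75th percentile of page loads, so that is the value to pass in. The names THRESHOLDS and rate are illustrative.

```python
# (good_max, poor_min) per metric: LCP in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric, value):
    """Classify a 75th-percentile metric value as good / needs improvement / poor."""
    good_max, poor_min = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= poor_min:
        return "needs improvement"
    return "poor"


for metric, value in [("LCP", 2.1), ("INP", 350), ("CLS", 0.3)]:
    print(metric, value, rate(metric, value))
```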

Field data vs. lab data

There are two types of CWV data and the distinction matters for diagnosis:

Field data (CrUX) is collected from real Chrome users browsing your site over the past 28 days. This is what Google uses for ranking. Find it in GSC under Experience → Core Web Vitals. If your site has low traffic, CrUX data may be unavailable — in this case, lab data is your proxy.

Lab data is generated by running a synthetic test in a simulated environment. Google PageSpeed Insights shows both. Lab data is useful for diagnosing the cause of a poor metric, even if the numbers differ from field data.

Common CWV failure causes

LCP failures are most commonly caused by: large unoptimised hero images, render-blocking resources (synchronous JavaScript or CSS in the head that delays rendering until it has downloaded and run), and slow server response times (TTFB over 600ms). Fix: compress images, use modern formats (WebP/AVIF), add fetchpriority="high" to your LCP element's image tag, make sure the LCP image is not lazy-loaded, and audit third-party scripts.

INP failures are caused by long JavaScript tasks that block the main thread. Fix: defer non-critical JavaScript, split long tasks, reduce third-party script load.

CLS failures are caused by elements that load after the page has rendered and push other elements around. Fix: define explicit width and height on all images and video embeds; avoid inserting DOM elements above existing content after load; reserve space for ad containers in advance (for example with a fixed min-height) so a late-loading ad cannot push content down.

Structured data audit: schema that works for search and AI

Structured data has shifted from an optional enhancement to a foundational requirement in 2026. AI-generated search answers actively use schema markup to identify authoritative, citation-ready content. A page without properly implemented schema is structurally less likely to be cited in AI Overviews or Perplexity answers, regardless of its content quality.

Schema types to audit

Check for the presence and correctness of these schema types on relevant pages:

Organization — on every page via sitewide include (or at minimum on homepage and about page). Must include name, url, and sameAs links to your verified social profiles.
Article — on all blog posts and guides. Must include headline, datePublished, dateModified, and author.
FAQPage — on pages with FAQ sections. Each question and answer is directly usable by AI search systems for cited answers.
BreadcrumbList — on all non-homepage pages. Improves sitelinks and signals page hierarchy.
Product and Offer — on all product or pricing pages. Required for rich results in e-commerce.
HowTo — on step-by-step guide pages like this one.
SoftwareApplication — on software product pages.

Validation

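Test each important page template using Google Rich Results Test. Check for both errors (which prevent rich results) and warnings (missing recommended properties, which do not block eligibility but can limit how the result is displayed). Common errors: missing required fields, incorrect property types, schema pointing to a different URL than the page's canonical. Keep in mind that valid markup does not guarantee a rich result: since 2023 Google has limited FAQ rich results to well-known government and health sites and has stopped showing HowTo rich results, so on most sites those two types serve as machine-readable structure rather than a visible search feature.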

AI citation note

FAQPage schema is particularly valuable for AI citation eligibility. AI search systems prefer content that provides explicit question-and-answer pairings in structured form. Every page with a FAQ section should have FAQPage JSON-LD — it takes under 10 minutes per page and directly improves your citation surface area in AI-generated answers.
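Generating the JSON-LD from the question and answer text avoids hand-editing errors. A minimal sketch using only the standard library (the function name faq_jsonld is hypothetical; the property names follow schema.org's FAQPage, Question, and Answer types):

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


pairs = [
    ("How often should I run a technical SEO audit?",
     "Run a full technical SEO audit quarterly."),
]
print(faq_jsonld(pairs))
```

Place the output inside a script element with type="application/ld+json" on the page, and make sure the questions and answers match the visible FAQ text.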

AI crawler access & GEO visibility audit

This is the section missing from every other technical SEO audit guide. Traditional crawlability audits check whether Googlebot can access your pages. In 2026, you also need to check whether AI crawlers can access your pages — and whether your content is structured to be cited by AI-generated answers.

Understanding the three bot tiers

Not all AI bots are the same. There are three distinct categories, each with different implications for your audit:

Training crawlers (GPTBot, and CCBot, which is Common Crawl's crawler): these build AI training datasets. Blocking them is reasonable if you do not want your content in future model training data. They do not affect current AI search visibility. Google-Extended belongs with this group, although it is a robots.txt token rather than a separate crawler: it controls whether Google may use your content for its Gemini models and does not affect Google Search.
AI search crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot): these build the indexes behind AI search answers. Blocking these reduces your chances of appearing in AI search results.
User-triggered AI fetchers (ChatGPT-User, Claude-User): these fire when a user asks an AI assistant to browse the web for real-time information. Blocking these means your site cannot be read, and so cannot be cited, when users ask AI tools to research your products or services.

Audit your robots.txt

Open /robots.txt on your domain and check for any rules that block AI crawlers. Two errors are common: an overly broad rule (User-agent: * followed by Disallow: /), and a Disallow: / block for a specific AI crawler that was added years ago without anyone considering its effect on AI search visibility.

Specifically check for blocks on:

GPTBot — affects OpenAI training (acceptable to block)
OAI-SearchBot — affects ChatGPT search answers (consider unblocking)
PerplexityBot — affects Perplexity AI answers (consider unblocking)
Claude-SearchBot, anthropic-ai — affects Claude web search citations (consider unblocking)
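This check can be scripted with Python's built-in robots.txt parser. The sketch below takes the robots.txt text as input and reports which of the listed user agents may not fetch a given path. One caveat: urllib.robotparser applies the first matching rule in a group, whereas Google uses the most specific match, so treat the result as a first pass rather than a definitive verdict. The bot list and sample file are illustrative.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot"]


def blocked_bots(robots_txt, path="/", bots=AI_BOTS):
    """Return the user agents in `bots` that robots.txt disallows for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, path)]


# Illustrative robots.txt content.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_bots(SAMPLE_ROBOTS))
```

In this sample, blocking GPTBot may be intentional, while the PerplexityBot block is the kind of rule worth reviewing.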

JavaScript rendering audit

AI crawlers are typically less sophisticated than Googlebot at rendering JavaScript. If your key content — product descriptions, article bodies, FAQ sections — is rendered client-side via JavaScript, AI crawlers may be indexing a blank page.

Check this quickly: open Chrome DevTools (F12), open the Command Menu (Ctrl+Shift+P, or Cmd+Shift+P on macOS), run "Disable JavaScript", and reload your page. If the core content disappears, it is client-side rendered and likely invisible to most AI crawlers. Fix: use server-side rendering (SSR) or static generation for content pages.
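The same test can be approximated outside the browser by checking whether key phrases appear in the raw HTML response, before any script runs. A rough sketch, assuming you have already fetched the HTML as a string. The class and function names are hypothetical, and text inside script and style elements is ignored because a crawler that does not execute JavaScript cannot read it as page content.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def missing_phrases(raw_html, phrases):
    """Return the phrases that do not appear in the un-rendered HTML text."""
    parser = VisibleText()
    parser.feed(raw_html)
    text = " ".join(parser.parts).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]


# Illustrative responses: a client-side rendered shell and a server-rendered page.
csr_shell = ('<html><body><div id="root"></div>'
             '<script>window.__DATA__ = {"title": "Blue Widget"}</script></body></html>')
ssr_page = "<html><body><h1>Blue Widget</h1><p>Ships in 2 days.</p></body></html>"

print(missing_phrases(csr_shell, ["Blue Widget"]))
print(missing_phrases(ssr_page, ["Blue Widget"]))
```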

Citation-ready content audit

Beyond crawler access, your content needs to be structured in a way that makes it easy for AI systems to extract and cite. Run the following checks:

BLUF format (Bottom Line Up Front): does each page lead with a direct answer before expanding? AI systems strongly prefer content that answers the question in the first paragraph.
Comparison tables: structured comparison tables are frequently cited verbatim in AI Overviews. Ensure key comparison content is in HTML table format, not images.
FAQ sections with FAQPage schema: see Step 4.
Explicit answer formatting: questions phrased as headings with direct answers in the paragraph below are more likely to be cited than flowing prose.

GEO visibility audit in Vexifa SEO

Vexifa SEO's GEO/AI search visibility audit automates the above checks and also tests actual AI citation presence — querying AI search systems for your brand name and core topic areas to verify whether your content is surfacing in AI-generated answers. Run this after fixing any robots.txt or rendering issues found manually.

The llms.txt file

An llms.txt file is a proposed convention, loosely analogous to robots.txt: a Markdown file at the site root that tells AI systems which pages contain your primary content and how to interpret your site structure. It is not a ranking factor, but its adoption is growing among sites that actively want AI citation. If you have structured, authoritative content, adding a basic llms.txt is a low-effort signal worth considering.

Site architecture & internal linking

Site architecture determines how link equity flows through your site and how efficiently crawlers move through your pages. Poor architecture means important pages receive less crawl attention and less internal link equity — both of which suppress rankings.

Crawl depth

Googlebot has a finite crawl budget for every site. Pages buried deep in the architecture — more than 4 clicks from the homepage — are crawled infrequently. Check the crawl depth map in Vexifa SEO:

Top-level pages linked from the homepage: 1 click
Main category/section pages: 2 clicks
Individual posts, products, service pages: 2-3 clicks
Nothing important deeper than: 4 clicks

If revenue pages (product pages, key landing pages) are 5+ clicks deep, add internal links from the homepage, navigation, or high-authority content pages to bring them within 3 clicks.

Internal link equity distribution

Vexifa SEO's link graph shows the internal link structure as a node graph. Pages with many incoming internal links receive more crawl attention and PageRank signal. Check whether your most important pages have the most internal links — not just any internal links, but contextual links from relevant, high-authority pages.
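If you prefer numbers to a graph, inbound link counts can be computed from the same crawl data. A minimal sketch, assuming internal links are available as (source, target) pairs. The function names and the min_links threshold are illustrative, not a recommended value.

```python
from collections import Counter


def inlink_counts(internal_links):
    """Count distinct pages linking to each target (self-links and duplicates ignored)."""
    unique_links = {(source, target) for source, target in internal_links if source != target}
    return Counter(target for _, target in unique_links)


def underlinked(priority_pages, internal_links, min_links=2):
    """Return priority pages that have fewer than `min_links` linking pages."""
    counts = inlink_counts(internal_links)
    return [page for page in priority_pages if counts[page] < min_links]


# Illustrative link data.
links = [
    ("/", "/pricing"),
    ("/blog/a", "/pricing"),
    ("/blog/a", "/pricing"),  # duplicate link on the same page
    ("/", "/blog/a"),
    ("/blog/a", "/blog/b"),
]

print(inlink_counts(links).most_common())
print(underlinked(["/pricing", "/blog/b"], links))
```

Counting linking pages rather than raw links keeps a footer or sidebar that repeats the same link from inflating the totals.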

Anchor text

Review the anchor text distribution on internal links pointing to your key pages. Over-optimised anchor text (the same exact-match keyword on every internal link) can look manipulative and tells crawlers little about the page. Aim for varied, natural anchor text that describes the linked page accurately.

Log file analysis without enterprise tools

Log file analysis is the most underused technique in technical SEO — and the one that separates shallow audits from thorough ones. Server logs show you what actually happened: which pages Googlebot visited, how often, whether it encountered errors, and (increasingly important) which AI crawlers are hitting your site.

This used to require enterprise tools like Botify or OnCrawl ($500-3,000+/month). It does not have to.

Accessing your server logs

How you access logs depends on your hosting:

Apache/Nginx shared hosting: access log files via your hosting control panel (cPanel, Plesk) under Logs. Look for access_log or access.log.
Cloudflare (with Logpush): Cloudflare Enterprise includes Logpush. On free/Pro plans, use Cloudflare's analytics as a proxy.
VPS/Dedicated: SSH into your server and access /var/log/apache2/access.log or /var/log/nginx/access.log.

Analysing with GoAccess (free)

GoAccess is a free, open-source log analyser that runs in the terminal or generates an HTML report. Install it via your package manager (sudo apt install goaccess on Ubuntu) and run:

goaccess access.log --log-format=COMBINED -o report.html

The resulting HTML report shows: top requested URLs, HTTP status codes, visitor agents (including bot user agents), response times, and bandwidth. The simplest way to isolate Googlebot, OAI-SearchBot, PerplexityBot, and other crawlers is to pre-filter the log by user agent (for example with grep) and run GoAccess on the filtered lines.
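For a quick per-bot tally without generating a full report, the combined log format can also be parsed directly. This is a sketch under assumptions: the regular expression targets the standard Apache/Nginx combined format, the sample lines are made up, and matching on the user-agent string alone does not prove a request came from the real crawler (user agents can be spoofed; Google recommends verifying Googlebot by reverse DNS).

```python
import re
from collections import Counter

# Combined log format: ip - user [time] "METHOD path PROTO" status bytes "referer" "agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

BOTS = ["Googlebot", "OAI-SearchBot", "PerplexityBot", "GPTBot"]


def bot_hits(lines, bots=BOTS):
    """Count requests per (bot, status code) based on the user-agent string."""
    hits = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        for bot in bots:
            if bot.lower() in agent:
                hits[(bot, match.group("status"))] += 1
                break
    return hits


# Made-up sample lines for illustration.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Jan/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2026:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 312 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jan/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 9000 "-" '
    '"Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.4 - - [10/Jan/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 9000 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for (bot, status), count in sorted(bot_hits(SAMPLE_LOG).items()):
    print(bot, status, count)
```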

What to look for in your logs

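Crawl distribution imbalance: if Googlebot is spending 80% of its crawl budget on low-value pages (e.g., tag archives, parameter URLs) and ignoring your core content pages, this signals a crawl budget problem. Fix: block crawl-wasting URL patterns in robots.txt, or noindex low-value pages that must remain crawlable. Do not apply both to the same URL: a page blocked in robots.txt is never fetched, so its noindex directive is never seen.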
4xx errors in Googlebot logs: pages that return 404 or 410 when Googlebot visits them but appear fine in your browser (often a mobile vs. desktop rendering difference, or a geolocation issue). Fix: investigate and resolve the server-level discrepancy.
AI bot crawl patterns: check whether OAI-SearchBot, PerplexityBot, and similar are crawling your site at all. If they are not appearing in your logs despite not being blocked in robots.txt, your site may have technical barriers (JavaScript rendering, authentication, slow response times) that prevent AI crawler access.
Crawl frequency vs. content update frequency: if you publish new content daily but Googlebot only visits weekly, new pages are being discovered late. Fix: keep accurate lastmod values in your XML sitemap so updates are signalled, and link new content from pages that Googlebot already crawls frequently.
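The crawl distribution check in the first item reduces to grouping a bot's requested paths by site section and computing each section's share. A minimal sketch, assuming you already have the list of paths requested by one crawler (for example extracted from the access log). Grouping by the first path segment is an assumption that suits sites with section-based URLs.

```python
from collections import Counter


def crawl_share(paths):
    """Percentage of requests per top-level section (first path segment)."""
    sections = Counter()
    for path in paths:
        clean = path.split("?", 1)[0]             # drop the query string
        first = clean.strip("/").split("/", 1)[0]  # "" for the homepage
        sections["/" + first] += 1
    total = sum(sections.values())
    return {section: round(100 * count / total, 1) for section, count in sections.items()}


# Illustrative list of paths requested by one crawler.
googlebot_paths = ["/tag/a", "/tag/b", "/tag/c?page=2", "/tag/d", "/pricing"]
print(crawl_share(googlebot_paths))
```

A section that takes most of the crawl share while holding little of the site's value is the imbalance described above.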

Prioritising and acting on findings

A thorough technical audit surfaces dozens of issues. Not all of them matter equally. Use this triage framework to work through findings in the order of impact:

Priority	Issue type	Examples	Urgency
P0 — Critical	Blocks crawling or indexation entirely	Entire site noindexed, robots.txt blocking Googlebot, homepage returning 500	Fix immediately
P1 — High	Revenue pages missing from index; CWV failures	Key product pages crawled-not-indexed, LCP > 4s, broken canonical on homepage	Fix within days
P2 — Medium	Schema errors; AI crawler blocks; orphaned high-value pages	FAQPage schema failing validation, PerplexityBot blocked, 20 orphaned blog posts	Fix within 2-4 weeks
P3 — Low	Architecture improvements; minor schema warnings; cosmetic issues	Category pages 5 clicks deep, missing BreadcrumbList schema, missing meta descriptions on low-traffic pages	Fix in next quarter

Vexifa SEO's AI audit module automatically categorises findings by priority and generates a fix recommendation for each issue. For complex sites, run the AI audit report first to get the prioritised list before diving into the raw crawl data.

Complete technical SEO audit checklist

Crawlability

Run unlimited crawl — no tool-imposed URL cap
Identify all 4xx pages and document which have internal links pointing to them
Map redirect chains — resolve any chain longer than one hop
Identify redirect loops
Review robots.txt — confirm no accidental blocks on revenue pages
Check for orphaned pages (no internal links)
Verify XML sitemap contains only indexable 200-status URLs
Submit sitemap to GSC and review coverage report

Indexation

Review GSC Page Indexing — crawled-not-indexed list
Review GSC Page Indexing — discovered-not-indexed list
Audit all noindex directives — confirm only low-value pages are excluded
Verify canonical tags point to correct final destination URLs
Check for duplicate canonicals across paginated pages
Identify and fix soft 404s
Check GSC for manual actions

Core Web Vitals

Check GSC Core Web Vitals — identify failing URLs
Run PageSpeed Insights on homepage, key landing pages, highest-traffic pages
LCP: identify LCP element and audit its load path
INP: identify long JavaScript tasks; audit third-party scripts
CLS: check for images/embeds without explicit dimensions
Verify TTFB is under 600ms from primary geographies
Check mobile CWV separately from desktop

Structured Data

Validate Organization schema on homepage with Rich Results Test
Validate Article schema on all blog/guide pages
Validate FAQPage schema on all FAQ sections
Validate BreadcrumbList on all non-homepage pages
Validate Product/Offer schema on all product pages
Check GSC Rich Results report for errors and impressions
Confirm sameAs fields contain verified social profile URLs

AI / GEO Visibility

Review robots.txt for AI search crawler blocks (OAI-SearchBot, PerplexityBot, Claude-SearchBot)
Test JavaScript rendering — verify core content loads without JS enabled
Check BLUF content format — key pages answer the question in paragraph one
Verify comparison tables are in HTML format, not images
Run Vexifa SEO GEO visibility audit
Manually query ChatGPT and Perplexity for brand name + core service queries
Consider adding llms.txt to signal AI-accessible content structure

Site Architecture

No revenue page deeper than 3 clicks from homepage
Identify all pages deeper than 4 clicks — evaluate for consolidation or link addition
Check internal link distribution — most links pointing to most important pages
Audit anchor text on internal links to key pages — varied and descriptive
Identify topic clusters — related content interlinked correctly

Log File Analysis

Access server logs for last 30 days
Run GoAccess to generate crawl report
Filter by Googlebot — identify which pages are crawled most and least frequently
Filter by AI crawlers — verify OAI-SearchBot, PerplexityBot are accessing the site
Identify 4xx errors Googlebot encountered (may differ from crawl-tool findings)
Check crawl distribution — low-value pages consuming disproportionate crawl budget

Frequently asked questions

Can I do a complete technical SEO audit without paying for tools?

Yes. A complete audit — unlimited site crawl, indexation review, Core Web Vitals, structured data validation, GEO visibility, and log file analysis — can be done using Google Search Console (free), Google PageSpeed Insights (free), Google Rich Results Test (free), Vexifa SEO (free Windows desktop app with unlimited crawl and GEO audit), and GoAccess (free open-source log analyser). No SaaS subscription is needed for any step in this guide.

What is the difference between Screaming Frog free and Vexifa SEO?

Screaming Frog's free tier crawls a maximum of 500 URLs — counting every image, script, and stylesheet discovered, not just pages. A 200-page site with standard assets can exhaust this limit mid-crawl. Vexifa SEO is a free Windows desktop application with no crawl limit. It also includes a GEO/AI search visibility audit and AI SEO assistant, which Screaming Frog does not offer at any price tier.

How often should I run a technical SEO audit?

Run a full technical SEO audit quarterly. Check Core Web Vitals monthly via GSC's Experience report. Monitor index coverage weekly through GSC's Pages report. After any major site change — new theme, URL migration, CMS switch — run a full audit immediately before changes go live and again within two weeks of deployment.

Does a technical SEO audit cover backlinks and content?

No. A technical SEO audit covers the infrastructure layer: crawlability, indexation, performance, structured data, and site architecture. Backlink analysis is a separate process requiring a dedicated tool (Ahrefs Webmaster Tools is free for your own site). Content quality audit is also separate — assessing whether your pages match user intent and provide depth relative to competing pages.

What is a GEO visibility audit in SEO?

A GEO (Generative Engine Optimization) visibility audit checks whether your pages are cited in AI-generated search answers from Google AI Overviews, ChatGPT, and Perplexity. It covers: whether AI crawlers are blocked in your robots.txt, whether your content uses citation-ready formats (structured answers, FAQ schema, comparison tables), whether JavaScript rendering blocks AI indexing, and whether your brand and core topics appear in AI search results. Vexifa SEO includes a GEO visibility audit as part of its AI audit module.

Bottom line

A complete technical SEO audit — one that covers crawlability, indexation, Core Web Vitals, structured data, AI visibility, site architecture, and log files — does not require a SaaS subscription or a per-URL crawler limit.

The three capabilities that most audit tools cannot provide without payment:

Unlimited crawl: Vexifa SEO, free, no cap
GEO/AI visibility audit: Vexifa SEO's audit module, integrated into the same workflow
Log file analysis without enterprise tools: GoAccess, free and open source

Run this audit quarterly. The first pass will surface issues. The second pass will show whether your fixes held. The third pass will start showing ranking and visibility improvements from the work you have done.

Dave Rupe

Founder of Vexifa. Built Vexifa SEO after spending years running technical SEO audits with a combination of Screaming Frog, Semrush, and spreadsheets — and wanting a single tool that handled the full workflow without a monthly subscription.
