Content Checks

recon-web runs five content checks that analyze how the site communicates with search engines, social platforms, and users through its on-page structure and metadata.


Handler: robots-txt

Fetches and parses the /robots.txt file, which tells search engine crawlers which paths they may crawl and which to skip. Extracts User-Agent rules, Disallow/Allow directives, Crawl-delay settings, and Sitemap references.
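The extraction step can be sketched roughly as follows. This is a minimal illustration, not recon-web's actual parser: the function name parse_robots and the shape of the returned dictionary are assumptions made for the example. It groups Allow, Disallow, and Crawl-delay values under the User-Agent lines that precede them and collects Sitemap references separately.

```python
from collections import defaultdict


def parse_robots(text: str) -> dict:
    """Parse robots.txt text into per-agent rule groups plus sitemap URLs.

    Hypothetical helper for illustration; the result layout is an assumption.
    """
    groups = defaultdict(lambda: {"allow": [], "disallow": [], "crawl_delay": None})
    sitemaps = []
    current_agents = []
    last_was_agent = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            # Consecutive User-Agent lines share one group of rules.
            if not last_was_agent:
                current_agents = []
            agent = value.lower()
            current_agents.append(agent)
            groups[agent]  # make sure the group exists even if it has no rules
            last_was_agent = True
            continue

        last_was_agent = False
        if field == "sitemap":
            sitemaps.append(value)
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent][field].append(value)
        elif field == "crawl-delay":
            for agent in current_agents:
                try:
                    groups[agent]["crawl_delay"] = float(value)
                except ValueError:
                    pass  # ignore non-numeric delays

    return {"groups": dict(groups), "sitemaps": sitemaps}
```

Calling parse_robots on the fetched file body returns the rules for each user agent, for example result["groups"]["*"]["disallow"].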

Why it matters: A misconfigured robots.txt can accidentally hide an entire site from search engines or, conversely, expose paths that should remain private. Attackers also read robots.txt to discover admin panels and sensitive directories.

Good result: File present, not blocking important pages, and not inadvertently revealing sensitive paths. A Sitemap directive points to the XML sitemap.

Bad result: Disallow: / blocking all crawlers (sometimes set during development and never removed), or listing paths like /admin, /wp-login, /api that help attackers find entry points. A missing file means crawlers are free to crawl everything they can reach, including staging content or internal API docs.
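The two bad patterns above could be flagged along these lines. This is a sketch under stated assumptions: robots_findings and SENSITIVE_HINTS are hypothetical names, the hint list is taken only from the example paths in the text, and the input is assumed to be the list of Disallow values for the wildcard (*) group.

```python
# Path prefixes taken from the examples in the text; a real list would be longer.
SENSITIVE_HINTS = ("/admin", "/wp-login", "/api")


def robots_findings(disallow_rules: list[str]) -> list[str]:
    """Return simple finding labels for a list of Disallow values."""
    findings = []
    if "/" in disallow_rules:
        # A bare "/" disallows the whole site for this user agent.
        findings.append("blocks-all")
    for rule in disallow_rules:
        if any(rule.lower().startswith(hint) for hint in SENSITIVE_HINTS):
            findings.append(f"reveals:{rule}")
    return findings
```

A prefix match is a deliberately crude heuristic: it reports that a path is advertised, not that the path is actually reachable or unprotected.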


Handler: sitemap

Finds and parses the XML sitemap. First checks for a sitemap reference in robots.txt, then tries common paths (/sitemap.xml, /sitemap_index.xml). Counts the number of URLs and extracts the sitemap structure.
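The lookup order and URL count described above might look like this in outline. It is a sketch, not the handler's real code: the function names are hypothetical, network fetching is left out, and the example assumes the standard sitemap layout in which each url or sitemap entry has a direct loc child.

```python
import xml.etree.ElementTree as ET

COMMON_PATHS = ("/sitemap.xml", "/sitemap_index.xml")  # fallbacks named in the text


def candidate_sitemap_urls(origin: str, robots_sitemaps: list[str]) -> list[str]:
    """robots.txt references first, then the common paths, without duplicates."""
    candidates = list(robots_sitemaps)
    candidates += [origin.rstrip("/") + path for path in COMMON_PATHS]
    return list(dict.fromkeys(candidates))  # preserves order while de-duplicating


def _local(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}urlset' down to 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def summarize_sitemap(xml_data) -> dict:
    """Report the root kind ('urlset' or 'sitemapindex') and the listed locations."""
    root = ET.fromstring(xml_data)
    locs = []
    for entry in root:          # <url> or <sitemap> elements
        for child in entry:     # only direct children, so image:loc etc. are skipped
            if _local(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return {"kind": _local(root.tag), "count": len(locs), "locs": locs}
```

When the root kind is sitemapindex, the locations are child sitemap files rather than pages, so a full count would require fetching and summarizing each of them in turn.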

Why it matters: Sitemaps help search engines discover all pages on your site and understand its hierarchy. Without one, search engines rely solely on link crawling, which is slower and less complete.

Good result: Sitemap found, well-formed XML, and contains current URLs. Sitemap index present for large sites with multiple sitemap files.

Bad result: No sitemap found (search engines may miss pages, especially on large or JavaScript-rendered sites). An outdated sitemap that still lists removed pages wastes crawl budget and can surface 404 pages in search results.


Handler: social-tags

Extracts social sharing metadata from the page HTML:

| Tag type | Platform | Key properties |
|---|---|---|
| OpenGraph (og:) | Facebook, LinkedIn, Discord | og:title, og:description, og:image, og:url |
| Twitter Cards (twitter:) | X (Twitter) | twitter:card, twitter:title, twitter:description, twitter:image |
| Standard meta | All platforms (fallback) | <title>, meta description |
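One way to pull these properties out of a page is shown below, using only Python's standard html.parser module. The class and function names are hypothetical, and the EXPECTED list simply mirrors the key properties listed above; it is not a statement of what recon-web requires.

```python
from html.parser import HTMLParser

# Keys mirrored from the table above; "title" and "description" are the fallbacks.
EXPECTED = (
    "og:title", "og:description", "og:image", "og:url",
    "twitter:card", "twitter:title", "twitter:description", "twitter:image",
    "title", "description",
)


class SocialTagParser(HTMLParser):
    """Collect og:*, twitter:*, meta description, and <title> text."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # OpenGraph uses property=, Twitter Cards and description use name=.
            key = (attr.get("property") or attr.get("name") or "").lower()
            content = attr.get("content")
            wanted = key.startswith(("og:", "twitter:")) or key == "description"
            if wanted and content is not None:
                self.tags.setdefault(key, content)  # keep the first occurrence

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.tags["title"] = self.tags.get("title", "") + data


def extract_social_tags(html: str) -> dict:
    parser = SocialTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def missing_tags(tags: dict) -> list[str]:
    """List expected keys that are absent or empty."""
    return [key for key in EXPECTED if not tags.get(key)]
```

Running missing_tags(extract_social_tags(html)) gives a quick list of gaps, such as a page that has og:title but no og:image.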

Why it matters: These tags control how the page appears when shared on social media. Missing tags mean platforms generate their own (often poor) preview, which reduces click-through rates.

Good result: Complete OpenGraph tags including og:image (with a large, high-quality image), og:title, og:description, and matching Twitter Card tags. Meta description present and concise (under 160 characters).

Bad result: Missing og:image (shared links show no preview image, which tends to lower engagement), wrong og:title or og:description (misleading previews), or no meta description (search engines pick their own snippet from the page text, which may not represent the page well).


Handler: linked-pages

Analyzes all links (<a href="...">) on the page. Counts internal links (same domain), external links (different domains), and identifies which external domains are linked to.
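The internal/external split can be illustrated with the standard library. This sketch makes two assumptions that the text does not confirm: "same domain" is treated as an exact hostname match (so a subdomain counts as external), and non-HTTP links such as mailto: are skipped. The names classify_links and _HrefCollector are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _HrefCollector(HTMLParser):
    """Collect the href value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(html: str, page_url: str) -> dict:
    collector = _HrefCollector()
    collector.feed(html)
    collector.close()

    page_host = urlsplit(page_url).hostname
    internal = 0
    external = Counter()
    for href in collector.hrefs:
        parts = urlsplit(urljoin(page_url, href))  # resolve relative links
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, and similar
        if parts.hostname == page_host:
            internal += 1
        else:
            external[parts.hostname] += 1

    return {
        "internal": internal,
        "external": sum(external.values()),
        "external_domains": dict(external),
    }
```

Resolving each href against the page URL first means relative links and fragment-only links are counted as internal rather than dropped.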

Why it matters: Link structure affects both SEO and security. A page dominated by external links can look spammy to search engines and users. Links to compromised or malicious sites can get your own domain penalized by search engines.

Good result: A healthy mix of internal links (good for navigation and SEO) with a reasonable number of external links to reputable domains. No broken links.

Bad result: Very high external link ratio (may indicate spam or a compromised page with injected links), links to known malicious domains, or an excessive total link count that suggests automated content.


Handler: seo

Performs a comprehensive on-page SEO audit by fetching the HTML and analyzing key ranking factors. Returns a score from 0 to 100 based on the issues found.

Factors checked:

| Factor | What it checks |
|---|---|
| Title tag | Present, correct length (50-60 chars) |
| Meta description | Present, correct length (120-160 chars) |
| Headings | Exactly one H1, logical heading hierarchy (H1 > H2 > H3) |
| Images | Alt text coverage on all <img> elements |
| Canonical URL | <link rel="canonical"> present and correct |
| Viewport meta | <meta name="viewport"> present for mobile support |
| Structured data | JSON-LD schema markup (enables rich snippets) |
| Hreflang | Language/region tags for multilingual sites |
| Meta robots | No unintentional noindex or nofollow |
| Content quality | Word count, text-to-HTML ratio |
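To make the idea of an issue-based score concrete, here is a small sketch covering three of the factors above. The length ranges come from the table; the penalty weights and the function name audit_basics are purely hypothetical, since the text does not say how recon-web weights individual issues.

```python
# Hypothetical weights for illustration only; the real scoring is not documented here.
PENALTIES = {"title": 20, "meta_description": 15, "h1": 15}


def audit_basics(title: str, description: str, h1_count: int) -> dict:
    """Check title length, meta description length, and H1 count."""
    issues = []

    if not title:
        issues.append("title: missing")
    elif not 50 <= len(title) <= 60:
        issues.append(f"title: length {len(title)} outside 50-60")

    if not description:
        issues.append("meta_description: missing")
    elif not 120 <= len(description) <= 160:
        issues.append(f"meta_description: length {len(description)} outside 120-160")

    if h1_count != 1:
        issues.append(f"h1: found {h1_count}, expected exactly 1")

    # Start from 100 and subtract one penalty per issue, never going below 0.
    penalty = sum(PENALTIES[issue.split(":")[0]] for issue in issues)
    return {"score": max(0, 100 - penalty), "issues": issues}
```

The inputs are assumed to have been extracted from the HTML already; a fuller audit would add the remaining factors with their own weights.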

Why it matters: On-page SEO directly affects search engine rankings. Missing titles, duplicate H1s, and images without alt text are among the most common issues that prevent pages from ranking well.

Good result: Score of 80 or above. Title and meta description present with correct length, single H1, all images have alt text, canonical URL set, structured data present.

Bad result: Score below 50. Typical issues include missing title tag, no H1 (or multiple H1s), images without alt text (accessibility and SEO issue), missing structured data (no rich snippets in search results), or thin content (fewer than 300 words).
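The thin-content check mentioned above can be approximated as follows. The 300-word threshold comes from the text; everything else is an assumption for the example, including the helper names and the definition of text-to-HTML ratio as visible text characters divided by total HTML characters.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def content_quality(html: str, thin_below: int = 300) -> dict:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()

    # Normalize whitespace so markup indentation does not inflate the counts.
    text = " ".join(" ".join(collector.chunks).split())
    words = len(text.split())
    ratio = len(text) / len(html) if html else 0.0
    return {"words": words, "text_to_html": ratio, "thin": words < thin_below}
```

Splitting on whitespace is a rough word count that works reasonably for space-delimited languages; it would undercount text in languages written without spaces.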