How to Find Orphan Pages on a Website


Orphan pages are web pages on your site that lack internal links pointing to them, making them inaccessible through normal navigation or crawling. These pages can harm your SEO, as search engines may not index them, and they can frustrate users who can’t find valuable content. Identifying and fixing them is crucial for a well-connected website.

If you’re not sure what orphan pages are or how they affect your SEO, start with our introductory blog post on the topic.

Now that you’re clear on what orphan pages are, let’s dive into how to find them on a website like a pro.

Tools for Identifying Orphan Pages

To pinpoint orphan pages effectively, you’ll need specialized tools designed to analyze your website’s structure and connectivity. Here are some of the most useful options, along with how each one helps and examples of its use:

  • Website Crawlers: Tools like Screaming Frog SEO Spider, Ahrefs, and SEMrush are essential for mapping your site’s structure. They crawl your website, mimicking search engine bots, and report on every page they can reach through links. Because a crawler alone can only discover linked pages, these tools find orphans by comparing the crawl against other URL sources. For example, Screaming Frog’s “Orphan Pages” report flags URLs that appear in your XML sitemap or connected Google Analytics/Search Console data but weren’t reached during the crawl. You can export this list to prioritize fixes.
  • Sitemap Generators: Tools like XML-Sitemaps.com or the sitemap feature in Yoast SEO create comprehensive sitemaps of your site. By comparing the sitemap to the pages found during a crawl, you can identify discrepancies. For instance, if a page appears in your server’s file structure but not in the sitemap or crawl, it’s likely an orphan.
  • Analytics Platforms: Google Analytics and Google Search Console can reveal pages that receive no traffic or impressions, which may indicate orphan status. For example, in Universal Analytics, navigate to Behavior > Site Content > All Pages (in GA4, use Reports > Engagement > Pages and screens) and filter for pages with zero sessions. Cross-reference these with your crawl data to confirm whether they’re unlinked.
  • Server Log Analyzers: Tools like Loggly or Splunk analyze server logs to show which pages are accessed by users or bots. Pages that never appear in logs, despite existing on the server, are potential orphans. For example, a blog post from an old campaign might exist but receive no requests, signaling it’s disconnected.

Using a combination of these tools ensures thorough detection. For instance, you might start with Screaming Frog to identify potential orphans, then cross-check with Google Analytics to confirm they’re not receiving traffic, and finally review server logs to ensure they’re not being accessed indirectly. This multi-tool approach minimizes false positives and gives you a clear picture of your site’s orphan pages.
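The multi-tool cross-check above boils down to simple set logic. Here’s a minimal sketch in Python; all of the URL lists are hypothetical placeholders standing in for the CSV exports you’d pull from each tool:

```python
# Hypothetical data standing in for each tool's export.
crawled = {"/about", "/old-product", "/blog/2020-promo", "/contact"}   # found on the server
linked = {"/about", "/contact"}                                        # have at least one inlink
no_traffic = {"/old-product", "/blog/2020-promo", "/legacy"}           # zero GA sessions
never_requested = {"/old-product", "/blog/2020-promo"}                 # absent from server logs

# Candidate orphans: pages that exist but nothing links to them.
candidates = crawled - linked

# Confirm against analytics and server logs to cut false positives.
confirmed = candidates & no_traffic & never_requested

print(sorted(confirmed))  # ['/blog/2020-promo', '/old-product']
```

A page has to fail all three checks before it’s treated as a true orphan, which is exactly how the layered approach minimizes false positives.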

Step-by-Step Process to Find Orphan Pages

Running a Website Crawl

Website crawlers are your first line of defense for finding orphan pages. Here’s how to use each of these tools effectively:

  • Screaming Frog SEO Spider:
    1. Download and install Screaming Frog (free version works for small sites; paid for larger ones).
    2. Enter your website’s URL in the input field and click “Start” to begin the crawl.
    3. Once the crawl is complete, go to the “Reports” menu and select “Orphan Pages.” Note that Screaming Frog detects orphans by comparing the crawl against your XML sitemap and any connected Google Analytics or Search Console data, so connect at least one of these sources in the crawl configuration first. The report lists pages found in those sources but not linked from any other page.
    4. Export the report as a CSV file for further analysis. For example, you might find an old product page (e.g., example.com/old-product) that’s no longer linked in your navigation or content.
    5. Check the “Internal” tab to verify if these pages are truly unlinked by ensuring they have zero inlinks.
  • Ahrefs:
    1. Log into Ahrefs and navigate to “Site Audit.”
    2. Set up a new project or select your website, then run a crawl.
    3. After the crawl, go to the “Issues” tab and look for “Orphan Pages” under the “Links” section.
    4. Ahrefs will list pages with no internal links pointing to them. For instance, you might discover a blog post like example.com/blog/2020-promo that was never linked after a site redesign.
    5. Use the “Export” feature to download the list for review.
  • SEMrush:
    1. Log into SEMrush and go to the “Site Audit” tool under the SEO Toolkit.
    2. Create a new project or select an existing one, then configure the crawl settings (ensure “Check broken links” and “Check internal links” are enabled).
    3. Run the audit, and once complete, navigate to the “Issues” tab. Look for the “Orphan Pages” issue, which lists pages with no incoming internal links.
    4. For example, you might find a landing page like example.com/spring-campaign that was created for a temporary promotion but never linked to the main site structure.
    5. Click on each issue to view details and export the list of orphan pages for further analysis or to share with your team.
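Whichever crawler you use, the exported CSV can be filtered the same way. Below is a hedged sketch of that step: the column names (“Address”, “Inlinks”) follow Screaming Frog’s “Internal” export, but they vary by tool and version, so treat them as assumptions and adjust to your own file:

```python
import csv
import io

# Stand-in for an exported crawl CSV; in practice, open the real file.
export = io.StringIO(
    "Address,Inlinks\n"
    "https://example.com/,12\n"
    "https://example.com/old-product,0\n"
    "https://example.com/spring-campaign,0\n"
)

# Keep only pages with zero incoming internal links.
orphan_candidates = [
    row["Address"]
    for row in csv.DictReader(export)
    if int(row["Inlinks"]) == 0
]

print(orphan_candidates)
# ['https://example.com/old-product', 'https://example.com/spring-campaign']
```

Filtering in a script rather than a spreadsheet makes it easy to re-run the check after every crawl.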

Checking Sitemaps and Links


Sitemaps and internal link audits help confirm orphan pages by revealing gaps in your site’s structure:

  • Using XML-Sitemaps.com:
    1. Visit XML-Sitemaps.com, enter your website URL, and generate a sitemap.
    2. Download the generated XML file and open it in a text editor or spreadsheet.
    3. Compare the URLs in the sitemap to the pages found by your crawler (e.g., Screaming Frog). Pages not listed in the sitemap but present on the server are potential orphans.
    4. For example, a page like example.com/hidden-page might appear in the server files but not in the sitemap, indicating it’s unlinked.
    5. Manually check your website’s navigation and content to ensure these pages aren’t linked elsewhere.
  • Yoast SEO (for WordPress Sites):
    1. In your WordPress dashboard, go to Yoast SEO > Tools > XML Sitemaps.
    2. Generate or view your existing sitemap.
    3. Download the sitemap and compare it with your crawler’s findings, similar to XML-Sitemaps.com.
    4. Additionally, use Yoast’s “Internal Linking” tool to identify pages with no incoming links. For instance, a page like example.com/standalone-guide might show zero internal links.
    5. Cross-check by searching your site’s content for links to these pages using your CMS’s search function.
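The sitemap comparison in the steps above can be scripted as a simple set difference. Here’s a minimal sketch; the sitemap snippet and crawled URLs are illustrative, not from a real site:

```python
import xml.etree.ElementTree as ET

# Sitemaps use this XML namespace, so element lookups must include it.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

# Collect every <loc> URL from the sitemap.
sitemap_urls = {
    loc.text for loc in ET.fromstring(sitemap_xml).iter(f"{SITEMAP_NS}loc")
}

# URLs your crawler (or a server file listing) found.
crawled_urls = {
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/hidden-page",
}

# Present on the server but missing from the sitemap: orphan candidates.
orphan_candidates = crawled_urls - sitemap_urls
print(sorted(orphan_candidates))  # ['https://example.com/hidden-page']
```

The same set difference run in the other direction (sitemap minus crawl) surfaces sitemap entries your crawler couldn’t reach, which is another common orphan signal.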

Reviewing Server Logs

Server logs offer a detailed, real-world view of which pages are accessed by users, bots, or crawlers, making them a powerful tool for identifying orphan pages that other methods might miss. Logs record every request made to your server, including the URL, timestamp, and source (user or bot). Pages that exist on your server but have no requests logged over a significant period are strong candidates for orphan status. 

Server logs are particularly useful for large or complex websites where crawlers might miss edge cases, such as pages only accessible via outdated external links or API endpoints. They also help validate findings from crawlers and analytics by grounding your analysis in actual server activity. To maximize accuracy, analyze logs over a longer period (e.g., 90 days) to account for seasonal traffic variations, and ensure your server logs are properly configured to capture all requests, including those from crawlers and bots.

Here’s how to use these tools, with additional details:

  • Using Loggly:
    1. Set up Loggly by integrating it with your server to collect logs (most web hosts provide access to raw log files, or you can configure Loggly to pull logs via API).
    2. In the Loggly dashboard, create a query to list all URLs accessed over a specific period, such as the last 30 or 90 days (e.g., use a query like http.status:200 to focus on successful page requests).
    3. Filter the results to show unique URLs and their request counts. Compare this list with the pages found by your crawler (e.g., Screaming Frog or SEMrush). Pages present in the crawler’s output but absent from Loggly’s results are likely orphans.
    4. For example, a page like example.com/deprecated-service might show zero requests over 90 days, confirming it’s not being accessed or linked. This could be a retired service page forgotten during a site update.
    5. Export the log data as a CSV for detailed comparison with other tools. You can also set up alerts in Loggly to monitor for pages with consistently zero requests, automating future orphan detection.
    6. Bonus tip: Use Loggly’s visualization tools to spot trends, such as entire directories (e.g., example.com/old-campaigns/) that receive no traffic, indicating multiple orphan pages.
  • Using Splunk:
    1. Install Splunk and upload your server log files (e.g., Apache or Nginx logs) or configure Splunk to collect logs in real-time from your server.
    2. Use Splunk’s Search Processing Language (SPL) to query all requested URLs. For example, run a query like index=web | stats count by uri to list all URLs and their request counts.
    3. Filter for pages with zero or very low request counts, then compare these with your crawler’s output. Pages that exist in the crawler’s data but have no requests in Splunk are potential orphans.
    4. For instance, an old campaign page like example.com/summer-sale-2019 might show no requests over the past six months, suggesting it’s unlinked and forgotten.
    5. Use Splunk’s visualization tools, such as charts or tables, to identify patterns, like entire subdirectories that are unaccessed. This can reveal bulk orphan pages, such as example.com/legacy-blog/*.
    6. Save the results as a report or export them for cross-referencing with sitemap and analytics data. Splunk’s advanced features, like machine learning add-ons, can also predict potential orphans by analyzing access trends over time.
    7. Additional step: Check for bot activity in logs (e.g., Googlebot requests) to ensure search engines aren’t accessing these pages indirectly, which could indicate a misconfigured sitemap or external link.
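If you’d rather not use a hosted tool, the same log review can be done with a short script. This is a rough sketch only: it parses Apache/Nginx combined-format lines with a regex, counts requests per URL, and flags known site pages that were never requested. The log lines and page list are made up for illustration:

```python
import re
from collections import Counter

# Matches the request path in Apache/Nginx combined log format lines.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

log_lines = [
    '203.0.113.5 - - [01/Mar/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [01/Mar/2024:10:00:02 +0000] "GET /about HTTP/1.1" 200 1024',
    '198.51.100.7 - - [02/Mar/2024:11:30:00 +0000] "GET / HTTP/1.1" 200 512',
]

# Count requests per URL path.
hits = Counter()
for line in log_lines:
    match = REQUEST_RE.search(line)
    if match:
        hits[match.group(1)] += 1

# Every page known to exist on the server (e.g. from a crawl or file listing).
all_pages = {"/", "/about", "/deprecated-service", "/summer-sale-2019"}

# Pages that never appear in the logs are orphan candidates.
never_requested = all_pages - set(hits)
print(sorted(never_requested))  # ['/deprecated-service', '/summer-sale-2019']
```

Run this over 90 days of real logs, as suggested above, and the never-requested set becomes a strong shortlist to cross-check against your crawler and analytics data.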

Conclusion

Finding and verifying orphan pages is a critical step in optimizing your website’s performance and SEO. By using tools like Screaming Frog, Ahrefs, SEMrush, sitemap generators, Google Analytics, and server log analyzers like Loggly or Splunk, you can systematically identify pages that lack internal links. Verifying these pages through manual checks, external link analysis, JavaScript inspections, indexing status, and analytics ensures you’re addressing true orphans without disrupting accessible content. Taking the time to fix or remove these disconnected pages will enhance your site’s crawlability, improve user experience, and boost search engine rankings, creating a seamless and efficient website structure that benefits both visitors and your SEO goals.

Meta description: To find orphan pages on a website, you need a variety of tools, from sitemaps to SEMrush, but it’s easier than most people think.
