10 common reasons Google does not index pages and how to fix each one







According to a study of 16 million pages by IndexCheckr, 61.94% of pages were not indexed by Google. For most of those pages, the problem was not that Google disliked the content. It was that the site gave Google a reason to skip it.
This guide covers the 10 most common reasons Google does not index pages and what to do about each one. Whether you manage a 20-page site or a 200,000-URL ecommerce platform, these are the issues that show up most often in audits. Fix them and you remove the barriers between your content and the search results that drive traffic, leads, and revenue.
Google Search Console is the starting point for every indexing investigation. Open the Page Indexing report under Indexing and look at the "Not indexed" section. Every entry in that list tells you a specific reason Google skipped a page. Use the URL Inspection Tool for individual pages when you need to see exactly how Googlebot last crawled and rendered a URL.
The Page Indexing report groups issues by type. The two status labels you will see most often are "Crawled, currently not indexed" and "Discovered, currently not indexed." The first means Google visited the page but chose not to add it. The second means Google found the URL but has not yet prioritized it for crawling.
Start with the issues affecting the most pages. Click each issue type in the report to see a list of affected URLs and a short description of the problem. Use the URL Inspection Tool on a representative URL from each group to understand what Googlebot actually sees.
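If you prefer to check a batch of representative URLs programmatically, the Search Console URL Inspection API exposes the same status labels. The sketch below is a minimal example using google-api-python-client; it assumes you already have OAuth credentials (`creds`) for an account verified on the property, and the property URL and URL list are placeholders.

```python
# Minimal sketch: batch-check index status for representative URLs with the
# Search Console URL Inspection API. SITE_URL and URLS are placeholders;
# `creds` is assumed to be an authorized credentials object for the property.
from googleapiclient.discovery import build

SITE_URL = "https://www.example.com/"  # property exactly as registered in Search Console
URLS = [
    "https://www.example.com/blog/sample-post/",
    "https://www.example.com/products/sample-product/",
]

def inspect_urls(creds):
    service = build("searchconsole", "v1", credentials=creds)
    for url in URLS:
        body = {"inspectionUrl": url, "siteUrl": SITE_URL}
        result = service.urlInspection().index().inspect(body=body).execute()
        status = result["inspectionResult"]["indexStatusResult"]
        # coverageState carries labels such as "Crawled - currently not indexed"
        print(url, "|", status.get("coverageState"), "|", status.get("lastCrawlTime"))
```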
For site-wide audits, tools like Screaming Frog SEO Spider, Ahrefs, or Semrush help you surface patterns across thousands of URLs at once, including redirect chains, orphan pages, and duplicate content that Search Console may not fully expose.
Fixing an indexing issue does not produce instant results. Pages that meet Google's quality and technical requirements after a fix typically re-enter the index within four to eight weeks. Requesting indexing via the URL Inspection Tool speeds up the process for individual pages, but it does not guarantee rapid crawling at scale. Pair every request with a fresh sitemap submission.

A noindex tag tells Googlebot not to include a page in the index. This is useful for admin pages and checkout flows. It is damaging when it appears on blog posts, product pages, or landing pages you want to rank. Developers often add noindex tags during staging and forget to remove them before launch.
This is one of the most common and most damaging indexing mistakes. A single noindex directive in a page template can silently remove thousands of pages from Google's index. The problem often goes unnoticed for weeks because traffic drops gradually rather than overnight.
In December 2024, Google updated its JavaScript SEO documentation to clarify that Googlebot may skip JavaScript execution entirely when it encounters a noindex tag in the original HTML. This means you cannot rely on JavaScript to remove a noindex tag dynamically. If the tag is present in the original page code, the page may never be indexed regardless of what your CMS or JavaScript logic does afterward.
Developers building on WordPress should also check the core "Discourage search engines from indexing this site" option under Settings → Reading, along with any global noindex toggles added by SEO plugins. Either one adds a noindex to every page on the site. Always verify these settings are off before any site launch.
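A quick way to catch leftover noindex directives is to check both the meta robots tag and the X-Robots-Tag response header on your key pages. Here is a minimal sketch assuming the `requests` and `beautifulsoup4` packages; the URL list is a placeholder.

```python
# Minimal sketch: flag URLs that carry a noindex directive in either the
# meta robots tag or the X-Robots-Tag response header.
import requests
from bs4 import BeautifulSoup

URLS = ["https://www.example.com/", "https://www.example.com/blog/sample-post/"]

for url in URLS:
    resp = requests.get(url, timeout=10)
    noindex_sources = []

    # 1. HTTP header: X-Robots-Tag can apply noindex to any content type
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        noindex_sources.append("X-Robots-Tag header")

    # 2. HTML: <meta name="robots"> and the googlebot-specific variant
    soup = BeautifulSoup(resp.text, "html.parser")
    for name in ("robots", "googlebot"):
        tag = soup.find("meta", attrs={"name": name})
        if tag and "noindex" in tag.get("content", "").lower():
            noindex_sources.append(f'meta name="{name}"')

    if noindex_sources:
        print(f"NOINDEX  {url}  ({', '.join(noindex_sources)})")
    else:
        print(f"ok       {url}")
```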
A robots.txt error prevents Googlebot from visiting pages at all. Unlike a noindex tag, which prevents indexing of pages Googlebot can reach, a robots.txt disallow rule stops the crawler before it even reads the page. If Google discovered the URL elsewhere, it shows up as "Blocked by robots.txt" in Search Console; if Google never discovered the URL, the page does not appear in the report at all, because Google never saw it.
Robots.txt errors are particularly dangerous because they are silent. Search Console flags pages it tried to crawl and found blocked, but it cannot report on pages it never discovered. Tools like Screaming Frog or WebSite Auditor are needed to identify pages that exist on the site but are blocked at the crawl level.
| Situation | Use robots.txt disallow | Use noindex tag |
|---|---|---|
| Staging or dev environment | Yes | Yes |
| Admin or login pages | Yes | No |
| Thin pages you want to hide from search | No | Yes |
| Duplicate parameter URLs | No | Yes with canonical |
| Pages in active development | Yes | Yes |
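You can test how your live robots.txt treats specific URLs with nothing more than the Python standard library. This sketch assumes placeholder URLs and checks rules against the Googlebot user agent.

```python
# Minimal sketch: test whether Googlebot is allowed to fetch a set of URLs
# according to the live robots.txt. Standard library only; URLs are placeholders.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://www.example.com/robots.txt"
URLS = [
    "https://www.example.com/products/red-shoes/",
    "https://www.example.com/checkout/",
]

parser = RobotFileParser(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt

for url in URLS:
    allowed = parser.can_fetch("Googlebot", url)
    print(("allowed " if allowed else "BLOCKED ") + url)
```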
When two or more URLs on your site return the same or very similar content, Google must decide which one to index. It often picks the wrong one, or skips both. Canonical tags solve this by explicitly telling Google which version is the authoritative page. Without them, duplicate content silently fragments your indexing and ranking signals.
Ecommerce and multi-location sites face this problem most often. A product page with filter parameters like /shoes?color=red and /shoes?color=blue may each return nearly identical content. Google's September 2025 Spam Update targeted this pattern specifically, flagging businesses that used identical location page templates across multiple cities, which led to direct indexing losses.
John Mueller has stated that "consistency is the biggest technical SEO factor." Mismatched canonical tags and conflicting internal links send opposing signals that confuse Google's indexing decisions.
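To spot mismatched canonicals on parameter URLs, it helps to compare what each variant declares against the clean URL you expect. A minimal sketch, assuming `requests` and `beautifulsoup4`; the URL pairs are placeholders.

```python
# Minimal sketch: verify that parameterised URLs declare the expected clean URL
# as their canonical.
import requests
from bs4 import BeautifulSoup

# (URL to check, canonical you expect it to declare)
CHECKS = [
    ("https://www.example.com/shoes?color=red",  "https://www.example.com/shoes"),
    ("https://www.example.com/shoes?color=blue", "https://www.example.com/shoes"),
]

for url, expected in CHECKS:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    declared = None
    for link in soup.find_all("link"):
        rels = [r.lower() for r in (link.get("rel") or [])]
        if "canonical" in rels:
            declared = link.get("href")
            break
    ok = declared and declared.rstrip("/") == expected.rstrip("/")
    print(f"{'ok' if ok else 'MISMATCH':8} {url} -> canonical: {declared}")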
Google uses content quality as a signal to decide whether to crawl and index a page. Pages with too little useful information, copied text, or content that does not serve a clear user need are deprioritized or removed from the index entirely. This is not a soft ranking preference. It is a hard filter.
Google's Gary Illyes confirmed this directly, stating: "The most important is quality. It's always quality. And I think externally, people don't necessarily want to believe it, but the quality, that's the biggest driver for most of the indexing and crawling decisions that we make." Google's combined core updates from 2023 and 2024 removed approximately 45% of low-quality content from search results.
"The fastest way to recover indexing coverage is to remove the pages dragging down your domain's quality signal. Merging five thin posts into one well-cited article consistently produces faster re-indexing than trying to pad each one individually."
Tanner Medina, Co-Founder and Chief Growth Officer, Launchcodex
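Word count is only a rough proxy for quality, but it is a fast way to build a shortlist of pages worth reviewing for consolidation. The sketch below assumes `requests` and `beautifulsoup4`; the threshold and URL list are placeholder assumptions, not a Google rule.

```python
# Minimal sketch: flag pages whose visible body text falls under a word-count
# threshold so they can be reviewed for merging or removal.
import requests
from bs4 import BeautifulSoup

URLS = ["https://www.example.com/blog/post-a/", "https://www.example.com/blog/post-b/"]
MIN_WORDS = 300  # arbitrary review threshold; tune to your content type

for url in URLS:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()  # strip non-content elements before counting
    words = len(soup.get_text(separator=" ").split())
    flag = "THIN" if words < MIN_WORDS else "ok  "
    print(f"{flag} {words:>5} words  {url}")
```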
Googlebot can render JavaScript, but it does so in a second pass that may take days or weeks. If your site relies on JavaScript to load the main content, key navigation, or important page text, Googlebot may index a nearly empty page on its first pass. Sites built on React, Vue, or Angular with client-side rendering are most at risk.
Google's Martin Splitt has explained the crawl impact directly: each JavaScript API request a page makes counts against the site's crawl budget. Sites that load content through multiple JavaScript API calls burn through crawl allocation faster than sites that serve the same content server-side.
In Google Search Console, use the URL Inspection Tool and click "Test Live URL." Then compare the rendered page to your actual page in a browser. If the Search Console version shows less content, JavaScript rendering is likely the blocker.
"When we audit SaaS products with indexing gaps, JavaScript rendering is the most overlooked root cause. Pricing pages, feature sections, and dynamic CTAs often load entirely through API calls, which means Googlebot sees a blank container and indexes nothing useful."
Derick Do, Co-Founder and Chief Product Officer, Launchcodex
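Outside Search Console, you can approximate the same comparison by measuring how much text exists in the raw HTML versus the rendered DOM. A minimal sketch, assuming `requests`, `beautifulsoup4`, and `playwright` are installed (plus `playwright install chromium`); the URL is a placeholder.

```python
# Minimal sketch: compare visible text in the raw HTML response against text
# available after JavaScript rendering. A large gap suggests the main content
# depends on client-side rendering.
import requests
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

URL = "https://www.example.com/pricing/"  # placeholder

def visible_words(html: str) -> int:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return len(soup.get_text(separator=" ").split())

raw_words = visible_words(requests.get(URL, timeout=10).text)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    rendered_words = visible_words(page.content())
    browser.close()

print(f"raw HTML: {raw_words} words | rendered: {rendered_words} words")
if rendered_words > raw_words * 2:
    print("Most content arrives via JavaScript; consider server-side rendering or prerendering.")
```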
Crawl budget is the number of URLs Googlebot will process from your site within a given period. When that budget is spent on duplicate or low-value URLs generated by filters, session IDs, or pagination, your most important pages get crawled less frequently or not at all. This is the primary indexing risk for large ecommerce and content-heavy sites.
Faceted navigation and URL parameters can multiply crawlable URLs tenfold. A clothing site with 5,000 products could generate 50,000 or more unique parameter combinations through color, size, and sort filters. Googlebot crawler traffic grew 96% from May 2024 to May 2025 according to Cloudflare data, and AI crawlers like GPTBot grew 305% over the same period. More bots competing for server resources makes crawl budget management more consequential than it was two years ago.
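One way to see whether parameter URLs are eating crawl budget is to tally Googlebot requests from your server access logs. The sketch below assumes a combined-format log at a placeholder path; a production version would also verify Googlebot via reverse DNS rather than trusting the user-agent string.

```python
# Minimal sketch: estimate how much Googlebot crawl activity goes to
# parameterised URLs by tallying requests from an access log.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # placeholder
request_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="ignore") as log:
    for line in log:
        if "Googlebot" not in line:  # naive filter; real audits verify by reverse DNS
            continue
        match = request_re.search(line)
        if match:
            path = match.group(1)
            counts["parameterised" if "?" in path else "clean"] += 1

total = sum(counts.values()) or 1
for bucket, count in counts.items():
    print(f"{bucket:14} {count:>8}  ({count / total:.0%} of Googlebot requests)")
```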

An orphan page is a page that no other pages on your site link to. Googlebot discovers most pages by following links. An orphan page may exist in your CMS, even in your sitemap, but Googlebot has no link path to reach it. Without discovery through internal links, it will sit at the bottom of the crawl queue indefinitely.
Orphan pages are common on large sites where content is published regularly without a structured internal linking strategy. Blog posts, campaign landing pages, and product pages added in bulk are the most frequent offenders.
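A practical way to surface orphans is to compare the URLs in your sitemap against the URLs a link-following crawler (such as Screaming Frog) actually reached. The sketch below assumes the crawl export is a CSV with a URL column named "Address"; adjust the column name, file path, and sitemap URL to your setup.

```python
# Minimal sketch: list sitemap URLs that never showed up in a link-following
# crawl, a strong signal they are orphan pages.
import csv
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
CRAWL_EXPORT = "crawl_export.csv"                    # placeholder crawler export

# 1. URLs the sitemap says exist
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
sitemap_urls = {loc.text.strip() for loc in tree.findall(".//sm:loc", ns) if loc.text}

# 2. URLs the crawler actually reached by following internal links
with open(CRAWL_EXPORT, newline="", encoding="utf-8") as f:
    crawled_urls = {row["Address"].strip() for row in csv.DictReader(f)}

# 3. In the sitemap but never reached through links = likely orphans
for url in sorted(sitemap_urls - crawled_urls):
    print("possible orphan:", url)
```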
A slow-loading page gives Googlebot fewer pages to crawl per session. Google has confirmed that page speed affects crawl rate. Poor Core Web Vitals are reported separately in Search Console's Core Web Vitals report. While speed alone rarely prevents indexing, sites with persistently slow response times see lower crawl frequency and, on competitive sites, indexing delays.
Core Web Vitals cover three performance signals: Largest Contentful Paint (LCP) for loading speed, Interaction to Next Paint (INP) for responsiveness, and Cumulative Layout Shift (CLS) for visual stability. Poor scores in competitive niches can contribute to deindexing when combined with other quality signals.
Use Google PageSpeed Insights for page-level diagnostics and the Core Web Vitals report in Search Console for a site-wide view. Pay particular attention to LCP on mobile, as Google's mobile-first indexing means the mobile performance score carries more weight.
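The same PageSpeed Insights data is available over a public API, which is convenient for checking a list of URLs. A minimal sketch; the URL is a placeholder, and an API key is optional for light usage but recommended for repeated runs.

```python
# Minimal sketch: pull field (Chrome UX Report) Core Web Vitals and the lab
# Lighthouse score for a URL from the public PageSpeed Insights API.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {
    "url": "https://www.example.com/",  # placeholder
    "strategy": "mobile",  # mobile-first indexing makes the mobile run the one to watch
}

data = requests.get(PSI_ENDPOINT, params=params, timeout=60).json()

# Real-user field data, present when the URL has enough Chrome UX Report traffic
field = data.get("loadingExperience", {}).get("metrics", {})
for metric, detail in field.items():
    print(f"{metric}: {detail.get('percentile')} ({detail.get('category')})")

# Lab score from the Lighthouse run bundled in the same response
lab = data.get("lighthouseResult", {}).get("categories", {}).get("performance", {})
print("Lighthouse performance score:", lab.get("score"))
```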
An XML sitemap tells Google which pages exist on your site and when they were last updated. Without a sitemap, Googlebot discovers pages only through links. A broken sitemap, or one that lists redirected, noindexed, or deleted URLs, can actively mislead Googlebot and slow down indexing of new and updated content.
Sitemaps are especially important for new sites, sites with large numbers of pages, and sites with frequently updated content. Google's crawl budget guidance recommends keeping sitemaps updated and free of URLs that return error codes, redirects, or non-canonical status.
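A quick health check is to confirm that every sitemap URL returns a 200 without redirecting. This sketch assumes `requests` and a placeholder sitemap URL; large sitemaps should be sampled or rate-limited, and you can switch HEAD to GET for servers that reject HEAD requests.

```python
# Minimal sketch: confirm every URL in the sitemap responds 200 without
# redirecting, so the sitemap is not feeding Google dead or redirected URLs.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns) if loc.text]

for url in urls:
    resp = requests.head(url, allow_redirects=False, timeout=10)
    if resp.status_code != 200:
        label = "redirect" if resp.status_code in (301, 302, 307, 308) else "error"
        print(f"{label:8} {resp.status_code} {url}")
```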
A manual action is a penalty applied directly by a Google reviewer when a site violates Google's spam guidelines. Unlike algorithmic filtering, manual actions are explicit. They appear in Google Search Console under Security and Manual Actions and directly suppress or remove affected pages from the index until the issue is resolved.
Common triggers include paid link schemes, hidden text, cloaking, sneaky redirects, and spammy user-generated content. Manual actions can affect individual pages or an entire domain. Either way, no amount of technical optimization will restore indexing until the root violation is addressed and a reconsideration request is submitted.
These two status labels appear more than any other in the Page Indexing report, and they require completely different responses. "Discovered – currently not indexed" is a crawl prioritization problem. "Crawled – currently not indexed" is a content quality problem. Treating one as the other wastes time and produces no improvement.
Both statuses appear in the Not Indexed section of the Page Indexing report in Google Search Console. They are not errors in the technical sense — Google is not reporting a broken tag or a blocked URL. It is reporting a judgment call. Understanding what drove that judgment is the only way to resolve it.
This status means Google found the URL, typically through your sitemap or an internal link, but has not yet crawled it. The page has not been visited. Google knows it exists and has put it in a queue, but it has not been prioritized for crawling.
The underlying cause is almost always crawl budget pressure or weak page authority signals. Google allocates crawl capacity based on a site's perceived value and server health. Pages with few or no internal links pointing to them, pages on sites with large volumes of low-value URLs, and pages on new domains with limited external authority all sit lower in the crawl queue.
What causes it:
- Few or no internal links pointing to the page, so it carries weak discovery signals.
- Large volumes of low-value or duplicate URLs consuming the site's crawl budget.
- A new domain with limited external authority, which keeps crawl allocation low.
- Server health or response-time issues that make Google throttle crawling.

How to fix it:
- Link to the page from crawled, high-traffic pages so Googlebot has a clear path to it.
- Reduce crawl waste by consolidating or blocking parameter and duplicate URLs.
- Keep the sitemap current and free of redirected, noindexed, or deleted URLs.
- Request indexing for a handful of priority URLs via the URL Inspection Tool.
A page stuck in "Discovered – currently not indexed" is usually a crawl access problem, not a content problem. Fix the internal linking and reduce crawl waste first. Content improvements will not move these pages if Googlebot has never visited them.
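To find the pages with the weakest internal linking, you can run a small link-following crawl and count how many distinct pages link to each URL. The sketch below assumes `requests` and `beautifulsoup4`, a placeholder start URL, and a page cap suited to small sites; larger sites are better served by a dedicated crawler's inlink report.

```python
# Minimal sketch: crawl a small site by following internal links and count how
# many distinct pages link to each URL. Pages with zero or very few inlinks are
# the ones most likely to sit in the "Discovered" queue.
from collections import defaultdict
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.example.com/"  # placeholder
MAX_PAGES = 500                          # cap for small sites

host = urlparse(START_URL).netloc
inlinks = defaultdict(set)
queue, seen = [START_URL], {START_URL}

while queue and len(seen) <= MAX_PAGES:
    page = queue.pop(0)
    try:
        soup = BeautifulSoup(requests.get(page, timeout=10).text, "html.parser")
    except requests.RequestException:
        continue
    for a in soup.find_all("a", href=True):
        target = urljoin(page, a["href"]).split("#")[0]
        if urlparse(target).netloc != host:
            continue  # only internal links count
        inlinks[target].add(page)
        if target not in seen:
            seen.add(target)
            queue.append(target)

# Weakest pages first: these need links from crawled, authoritative pages
for url, sources in sorted(inlinks.items(), key=lambda kv: len(kv[1]))[:20]:
    print(f"{len(sources):>3} inlinks  {url}")
```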
A page Google has not indexed cannot appear in AI Overviews, and it is largely invisible to other generative AI systems that rely on the Google index for source material. Fixing indexing issues is the first step in any strategy to appear in AI-generated answers.
AI Overviews appeared in nearly 20% of Google searches as of September 2025. Research shows that only 7.2% of domains appear in both Google AI Overviews and LLM results from platforms like ChatGPT or Perplexity. The gap is partly an indexing problem. Pages that are not in Google's index are not available as source material for AI systems that pull from it.
Structured data is a specific element that improves both indexing eligibility for rich results and visibility in AI-generated answers. Pages with valid schema markup show 30 to 40% higher visibility in AI-generated responses compared to pages without it. If your site does not use structured data for articles, products, FAQs, or local business information, adding it is a high-return fix that costs little to implement.
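A simple first check is whether your key templates emit JSON-LD at all and which schema.org types they declare. A minimal sketch with a placeholder URL; full validation should still go through Google's Rich Results Test.

```python
# Minimal sketch: check whether a page carries JSON-LD structured data and
# report the schema.org types it declares.
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/blog/sample-post/"  # placeholder

soup = BeautifulSoup(requests.get(URL, timeout=10).text, "html.parser")
blocks = soup.find_all("script", type="application/ld+json")

if not blocks:
    print("No JSON-LD structured data found.")
for block in blocks:
    try:
        data = json.loads(block.string or "")
    except json.JSONDecodeError:
        print("Invalid JSON-LD block (parsers will ignore it).")
        continue
    items = data if isinstance(data, list) else [data]
    for item in items:
        print("Declared @type:", item.get("@type"))
```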
The practice of optimizing content to appear in AI-generated answers is called GEO (Generative Engine Optimization). Indexing is the prerequisite. No GEO tactic works on a page Google has never added to its index.

Most indexing problems follow predictable patterns. A robots.txt rule that was never cleaned up. A CMS template that left noindex tags on live pages. A product catalog generating thousands of near-identical parameter URLs. These are solvable problems that produce measurable results when fixed.
Start with Google Search Console. The Page Indexing report shows you exactly what is wrong and where. Work through each issue type from most pages affected to least. Prioritize technical blocks like robots.txt and noindex errors first, because they prevent Google from seeing your content at all. Then address content quality, crawl budget, and site structure.
John Mueller stated plainly in 2025: "Consistency is the biggest technical SEO factor." Pages that send conflicting signals through mismatched canonicals, inconsistent internal links, and mixed HTTP/HTTPS status are the hardest for Google to process. Align every signal and Google's job becomes straightforward.
If your site has large-scale indexing gaps across hundreds or thousands of pages, a structured technical SEO audit paired with a content quality review is the most efficient path to recovery. At Launchcodex, this is one of the first diagnostic steps we run for clients whose organic traffic is underperforming relative to the volume of content they publish.
Most pages that meet Google's quality and technical requirements are indexed within four to eight weeks after fixes are applied. Pages on high-authority domains with strong internal linking may be indexed faster. Requesting indexing via the URL Inspection Tool can accelerate the process for individual pages, but it does not guarantee timing.
"Crawled, currently not indexed" means Google visited the page but chose not to add it to the index, usually due to thin content or quality signals. "Discovered, currently not indexed" means Google found the URL but has not yet prioritized it for crawling, often due to crawl budget constraints or low page authority.
Sitemaps and internal linking work together. Good internal linking helps Googlebot discover pages through crawl paths. A sitemap ensures Google knows about every page, including those that may not be well-linked yet. Both are recommended for most sites, and required for sites with more than a few hundred pages.
Yes. Google evaluates site quality at a domain level. If a large portion of a site contains thin, duplicate, or low-value content, it can reduce how frequently Google crawls and indexes the rest of the site. John Mueller has confirmed this directly.
Yes. AI Overviews pull from Google's index. A page that is not indexed cannot appear as a source in AI-generated answers. Fixing indexing issues is the foundational step in any GEO strategy.
Use the URL Inspection Tool in Google Search Console. Click "Test Live URL" and review the rendered HTML. Compare it against what you see in your browser. If the Search Console version shows less content, JavaScript rendering is likely preventing Googlebot from seeing your full page.





