How to properly index the pages of your site [Part 1]

Ever seen a scene like this?

Many pages submitted to Google, but few indexed. This situation is more common than it looks, and it affects not only large sites but smaller ones as well.

Today we begin a series of 3 posts in which we will demonstrate how to properly index your site, ensuring that everything that needs to be indexed by Google is found, and that whatever has no search value is left out.

In today’s post we will focus on the theory behind this process and the reasons why a page may fail to be indexed.

To begin, let’s answer a simple question:
Why is Google not indexing all of my site?

You built your sitemap with all your relevant pages and submitted it to Google, and even then only some of them were indexed. What could have happened?
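For reference, a sitemap is just an XML file listing the URLs you want Google to know about. Here is a minimal sketch of generating one with Python's standard library; the URLs are hypothetical placeholders, not from the original post.

```python
# Minimal sketch: build a sitemap.xml for a handful of pages.
# The example.com URLs are hypothetical placeholders.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    # Register the default sitemap namespace so tags are unprefixed.
    ET.register_namespace("", NS)
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/products/shuriken",
])
print(sitemap)
```

Submitting such a file tells Google the URLs exist, but as the rest of this post explains, that alone does not guarantee they get indexed.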

In general, there are a few factors that can keep pages out of the index:

Absence of a direct link path within your site (or a path made only of nofollow links)
Internally duplicate content
Externally duplicate content
"Poor" (thin) content

Absence of a path

In general, the first indexing happens during the crawl performed by Googlebot. At this point, Google scans your site, reaching all of your pages through the links on your site.

The first hypothesis that can explain a page not being indexed is the absence of a direct link to it, or the existence of a path made only of nofollow links. Google will simply never find out that the page exists.

Note that this is also the case for sites that only work through internal searches: Google is not able to guess what must be searched for in order to reach a given page.

It is therefore important that, in the navigation flow of your site, any page can be reached through dofollow links alone. This also matters for your link juice. If you want to understand a little more about internal linking, it is worth checking our article on the subject.
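The idea above can be sketched as a tiny reachability check: starting from the home page, follow only dofollow links and see which pages you can actually reach. The pages and HTML below are a hypothetical toy model, not a real crawler.

```python
# Sketch: which pages are reachable from "/" via dofollow links only?
# The site dict is a hypothetical in-memory model of a few pages.
from collections import deque
from html.parser import HTMLParser

class DofollowLinkParser(HTMLParser):
    """Collect href targets of <a> tags that are not rel="nofollow"."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "nofollow" in (attrs.get("rel") or ""):
            return  # skip nofollow links: Googlebot won't follow them
        if "href" in attrs:
            self.links.append(attrs["href"])

# "/promo" is only linked with rel="nofollow"; "/orphan" has no link at all.
site = {
    "/": '<a href="/about">About</a> <a rel="nofollow" href="/promo">Promo</a>',
    "/about": '<a href="/">Home</a>',
    "/promo": "",
    "/orphan": "",
}

def reachable(site, start="/"):
    seen, queue = {start}, deque([start])
    while queue:
        parser = DofollowLinkParser()
        parser.feed(site[queue.popleft()])
        for href in parser.links:
            if href in site and href not in seen:
                seen.add(href)
                queue.append(href)
    return seen

found = reachable(site)
print(sorted(found))  # prints ['/', '/about']
```

In this toy model, both "/promo" (nofollow-only path) and "/orphan" (no path at all) are invisible to a crawler that respects nofollow, which is exactly the problem described above.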

There are ways to get pages indexed without a direct link path; this is the case, for example, with landing pages for AdWords campaigns, which often have no internal path but are found by Google through the paid traffic they generate. However, these pages will not benefit from the authority of your main pages.

You can also ensure that a page is found by submitting a custom sitemap (referenced in your robots.txt) that points Google's crawler directly to it. With this approach, however, the page's link juice potential is still wasted.

Having a page found by Google does not guarantee that it will be indexed.

Internally duplicate content

This is the most common problem that keeps a page from being indexed.

The pages need not be 100% identical: if the content of two pages is mostly the same, Google may interpret them as being, in fact, the same page.

This has two consequences. The first is that one of them will not be indexed. The second is that the relevance of that content will be split between the two pages, i.e. the ranking of the indexed page will be very weak.

This issue of duplicate pages is often a structural failure of the site. It is common for sites to create multiple versions of the same page. This is often reflected in the sitemap as URLs that are submitted but never indexed.

An example would be two product pages that are essentially the same, differing only in the color of the shuriken. Google will identify this as duplicate content and will struggle to index the second or third URL.
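To illustrate, such variants often differ only in a query parameter. The shuriken URLs below are hypothetical reconstructions of the example, and the list of "cosmetic" parameters is an assumption; the sketch groups URL variants that collapse into the same underlying page.

```python
# Sketch: group URL variants that differ only in cosmetic query
# parameters. The shuriken URLs are hypothetical, and the parameter
# list is an assumption for illustration.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

COSMETIC_PARAMS = {"color", "utm_source", "utm_campaign"}

def canonical(url):
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k not in COSMETIC_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(query)))

urls = [
    "https://example.com/shuriken?color=black",
    "https://example.com/shuriken?color=silver",
    "https://example.com/shuriken?color=gold",
]

groups = {}
for url in urls:
    groups.setdefault(canonical(url), []).append(url)

# All three color variants collapse into one canonical page.
print(groups)
```

When one logical page is served under several such URLs, Google sees duplicates, which is exactly the situation described above.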

In some cases, the page's content is duplicated not internally, but externally.
Externally duplicate content

This is another common cause of non-indexing, but it has some gotchas.

Externally duplicate content is simply content that was copied from other websites and published on yours. It may or may not end up indexed.

It is worth reading Google's own announcement on the subject, which states that in certain cases, even if the content is the same between two pages, there is still a high probability that both will be indexed.

This is the case, for example, with e-commerce sites. The vast majority of online shops use standard product descriptions. This may well cause on-page optimization problems, but it does not mean the content will not be indexed; on the contrary, it probably will be.
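As a rough illustration of how near-duplicate text can be measured, here is a textbook technique: Jaccard similarity over word 3-grams (shingles). This is not Google's actual duplicate-detection algorithm, and the product descriptions below are made up.

```python
# Sketch: Jaccard similarity over word 3-grams (shingles).
# A common textbook technique, NOT Google's actual algorithm;
# the descriptions are hypothetical.
def shingles(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

stock = "Steel shuriken with four points and a polished finish"
copy = "Steel shuriken with four points and a polished finish"
rewrite = "Hand-forged four-point throwing star with a mirror polish"

print(jaccard(stock, copy))     # identical stock descriptions -> 1.0
print(jaccard(stock, rewrite))  # rewritten description -> much lower
```

A shop reusing the stock description verbatim scores 1.0 against every other shop doing the same, while a rewritten description scores far lower, which is why unique descriptions are a common on-page recommendation.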

However, in the case of long articles copied from other sites, it is likely that the content will not be indexed, and even if it is, it will rank badly.

Reusing pieces of content is different. For example, it is very common for several sites to publish the same rich content, such as an infographic, and that will not prevent the content from being indexed.

"Poor" content

On a website, it is very common to have those pages that are like the ugly duckling of your site: they have a single line of content and often offer no real value to your user.

These poor pages are also a frequent reason for the absence of indexing. If Google understands that a page has no value for your users, it will likely not be indexed.
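A first-pass audit for such pages can be as simple as flagging everything below a word-count threshold. The 50-word cutoff and the page texts below are assumptions for illustration; a real audit would also look at templates and actual value to the user.

```python
# Sketch: flag "thin" pages by visible word count.
# The threshold and page texts are assumptions for illustration.
THIN_WORD_THRESHOLD = 50

pages = {
    "/guide/indexing": "word " * 400,   # a long, substantial article
    "/tag/misc": "Posts tagged misc.",  # a one-line archive page
}

def is_thin(text, threshold=THIN_WORD_THRESHOLD):
    return len(text.split()) < threshold

thin_pages = sorted(url for url, text in pages.items() if is_thin(text))
print(thin_pages)  # pages worth reviewing (or noindexing)
```

Pages flagged this way are candidates for improvement, consolidation, or being kept out of the index, which the next posts discuss.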

This is primarily due to Google's Panda algorithm.

It is interesting to note that having a lot of poor content can have extremely harmful effects on your website if it is not properly handled. In the next posts we will talk a little more about this.

Now that you know the main reasons that lead content not to be indexed, you probably already have a hint of why your site has a large number of pages submitted to Google and few indexed.

In the next post we will show how to identify, in practice, exactly which pages have not been indexed and the causes behind it. You will also learn when you should not index a page!

Stay tuned and check out our upcoming posts! Got any questions? Ask us below!

