There are many parameters that can be measured in Google (G.) * or the other search engines (SE), but I focus on those that are useful for the SEO community in its search for better rankings.
Those who try to rank websites in G. need to measure the results of a certain promotion campaign or optimization trick. Otherwise, the analysis lacks precision and scientific certainty.**
G. has a number of measures against SEO experts and those who want to crack its algorithm. For instance, some factors interact with others in ways that are hard to understand or seemingly random. And there are some quite mysterious, hard-to-identify penalizations.
The all-powerful, anonymous G. Penalty Tribunal
If you are completely excluded from G., there is an appeal procedure. However, if you rank poorly, there is nobody to complain to.
Most experts agree on these penalizations:
- Over-optimizing penalty
- Duplication penalty
- Link to bad neighborhood penalty
- Hidden text penalty
- Other SE spam penalties
- Copyright infringement penalty
It is possible to detect whether a website has any of the suspect features. Duplication can be measured with a number of tools (see my SE Metrics page, http://foundfirst.com/seo/se-metrics.php ), links to a bad neighborhood can be checked manually or with a tool (see the bookmark in the SE Metrics page), and the same is true for other spam tricks.
However, SEOs and webmasters are prepared to take some chances when trying to obtain traffic for their sites. Thus, they need to detect when they have crossed the line and been penalized. Also, when a new consultant is brought in to improve a website's ranking, there is a clear need to look for penalizations.
Which leads to the main question in this article: How to detect penalization?
Complete penalization or banning is easy to spot. The site is no longer found in G., while Hotbot and other SEs still list it.
Partial penalization is the big threat to SEO practitioners everywhere, because it is very hard to identify and measure. In order to detect it, we need two parameters:
1) Real Ranking (SEPI or a similar one)
2) Deserved Ranking
Real Ranking
Real Ranking is easy to detect for a given keyword. But the matter gets complex when several keywords are involved, as is the case for most websites.
Even if 2 or 3 keywords in a website can rank well, others can be forgotten or penalized. So, it is necessary to have a parameter for global ranking value, considering several keywords.
Good SEOs can show important ranking results for hard-to-rank keywords. We (FoundFirst.com) modestly rank very well for web promotion campaigns (1st among 16 million competitors). Poor SEOs will show a first place in G. for Googleometry, which is an easy-to-rank keyword (2 competitors).
Keyword phrases pose a more hard-to-measure problem than keywords alone.
In another article I describe a Search Engine Positioning Index (SEPI), with a free SEPI calculator. It considers 2 main factors:
- position for every desired keyword
- difficulty degree for each keyword
I published it as a free service at http://www.domaingrower.com/domain-valuation.php. However, this index has a lot of room for improvement.
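As a rough illustration of how the two factors could be combined, the sketch below weights each keyword's position by its difficulty. The formula, the function names and the use of the logarithm of the competitor count as a difficulty measure are all assumptions made for this example; they are not the formula used by the published SEPI calculator.

```python
import math

def position_score(position, max_position=100):
    """Map a result position (1 = top) to a score in [0, 1]; unranked gives 0."""
    if position is None or position > max_position:
        return 0.0
    return (max_position - position + 1) / max_position

def difficulty(competitors):
    """Hypothetical difficulty measure: log10 of the number of competing pages."""
    return math.log10(max(competitors, 1) + 1)

def sepi_sketch(keywords):
    """keywords: list of (position, competitors) tuples, one per desired keyword.

    Returns a difficulty-weighted average of the position scores, in [0, 1].
    """
    total_weight = sum(difficulty(c) for _, c in keywords)
    if total_weight == 0:
        return 0.0
    weighted = sum(position_score(p) * difficulty(c) for p, c in keywords)
    return weighted / total_weight
```

With this weighting, a top position for a keyword with 16 million competitors moves the index far more than a top position for a keyword with 2 competitors, which is the behavior the index is meant to capture.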
Deserved Ranking
Everyone feels their page deserves a top ranking. However, we need a mathematical definition of deserved ranking.
To compute a 'Deserved Ranking' we would need a substitute for the real G. algorithm. An exact replica is not feasible, but we can compute keyword density, incoming links, domain age and PageRank. A recent approximation of the G. algo is listed in the references section of my SE Metrics resource page.
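A minimal sketch of such a substitute, assuming the four measurable signals are simply normalized and added with fixed weights. The weights, the caps and the names PageSignals, deserved_score and deserved_ranking are hypothetical placeholders for illustration; nothing here reflects the real G. algorithm.

```python
import math
from dataclasses import dataclass

@dataclass
class PageSignals:
    keyword_density: float   # fraction of words matching the keyword, 0..1
    incoming_links: int      # number of known incoming links
    domain_age_years: float  # age of the domain in years
    pagerank: int            # toolbar PageRank, 0..10

# Hypothetical weights: placeholders, not measured values.
WEIGHTS = {"density": 0.2, "links": 0.3, "age": 0.1, "pagerank": 0.4}

def deserved_score(s: PageSignals) -> float:
    """Combine the four signals into a single score between 0 and 1."""
    density = min(s.keyword_density / 0.05, 1.0)            # capped at an assumed 5% density
    links = min(math.log10(s.incoming_links + 1) / 4, 1.0)  # capped at about 10,000 links
    age = min(s.domain_age_years / 10, 1.0)                 # capped at 10 years
    pagerank = min(s.pagerank / 10, 1.0)
    return (WEIGHTS["density"] * density + WEIGHTS["links"] * links
            + WEIGHTS["age"] * age + WEIGHTS["pagerank"] * pagerank)

def deserved_ranking(pages: dict) -> list:
    """Return page names ordered from highest to lowest deserved score."""
    return sorted(pages, key=lambda name: deserved_score(pages[name]), reverse=True)
```

Sorting competing pages by this score gives each one a deserved position that can be compared with the position G. actually assigns.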
An easy way to estimate Deserved Ranking is to use one of the available open source SEs.
Thus:
Penalization = Deserved Ranking - Real Ranking
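The formula can be sketched in code once a sign convention is fixed. The example below assumes both rankings are given as result positions (1 = top) and converts them to a value where higher is better, so that a positive difference means the page ranks worse than it deserves. The conversion, the threshold and the function names are assumptions for illustration only.

```python
def rank_value(position, max_position=100):
    """Convert a result position (1 = top) into a value where higher is better.

    Pages not found within max_position get 0.
    """
    if position is None or position > max_position:
        return 0
    return max_position - position + 1

def penalization(deserved_position, real_position):
    """Penalization = Deserved Ranking - Real Ranking, on the rank_value scale.

    A positive result means the page ranks worse than it deserves.
    """
    return rank_value(deserved_position) - rank_value(real_position)

def suspected_penalties(keywords, threshold=20):
    """keywords: dict mapping keyword -> (deserved_position, real_position).

    Returns the keywords whose gap exceeds a hypothetical threshold.
    """
    gaps = {kw: penalization(d, r) for kw, (d, r) in keywords.items()}
    return {kw: gap for kw, gap in gaps.items() if gap > threshold}
```

A small gap is expected from the imprecision of any substitute algorithm; only a large and persistent gap across several keywords would point to a penalty.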
The Googleometry Project will define a Deserved Ranking and start identifying G. Penalizations on pages that infringe G. guidelines.
Besides detecting penalization, creating and improving the Deserved Ranking algo will be of enormous help to all SEOs.
First Results
I present some preliminary data here to tempt possible collaborators, partners and sponsors.
As a first step, I analyzed a number of my own sites, in which I know exactly the amount of duplication and spam.
I tested the G. operators: site:, inurl: and link: for several domains.
G. shows the "we have omitted some entries very similar..." message when it feels there are either too many pages or duplicated content.
My data are published here: http://domaingrower.com/duplication-penalty.xls , and there seem to be a few interesting conclusions.
For instance, only sites with fewer than 1000 indexed pages can be analyzed, because this is the maximum that G. will show.
It seems that G. is reluctant to show a high percentage of indexed pages, when duplication is present. The pages are listed but not shown.
While looking into supplemental results (after the message), the trend is the same.
While using the inurl: operator, there is also a trend to show fewer results in redundant sites. This operator shows fewer pages than the site: operator, but it is also influenced by duplication.
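One way to quantify these observations is the ratio between the pages G. actually lists and the pages it reports for an operator. The sketch below is an assumption about how such a measure could be computed by hand-collected counts; the function names are hypothetical and no real measurements are included.

```python
MAX_LISTED = 1000  # the article reports that G. lists at most 1000 results

def shown_ratio(reported, shown):
    """Return shown / reported, or None when the site cannot be analyzed.

    reported: page count G. reports for a site: or inurl: query
    shown:    results actually listed before the "omitted entries" message
    """
    if reported <= 0 or reported >= MAX_LISTED:
        return None
    return min(shown / reported, 1.0)

def rank_by_suspicion(sites):
    """sites: dict mapping domain -> (reported, shown).

    Returns the analyzable domains ordered from lowest to highest shown ratio.
    """
    ratios = {d: shown_ratio(r, s) for d, (r, s) in sites.items()}
    usable = {d: v for d, v in ratios.items() if v is not None}
    return sorted(usable, key=usable.get)
```

If the trend described above holds, sites with known duplication should appear at the start of the returned list, both for site: and for inurl: counts.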
Check the data to confirm or discuss these preliminary thoughts.
What the Googleometry (Google Metrics) Project will involve
We intend to run this project with support from subscribers. We will try to select them in order to exclude representatives from the SEs themselves.
We will run experiments aimed at the following goals:
- establish the best parameters to rank websites in G., including in-site and off-site parameters.
- establish the reasons for possible website penalization, both in general and for the subscribers' websites in particular.
- make suggestions to the subscribers that will allow them to improve their rankings
As support measures for the subscribers, we will:
- keep a Forum where the results will be discussed
- identify experts among the community and offer them privileged space for publication
- create tools that will help us achieve our goals
- issue alerts for those subscribers that risk penalization
- issue individual ranking recommendations to our subscribers
- run experiments on demand
Welcome to the Googleometry Project!
Notes
*: We chose G., but the results can usually be applied to most other engines. I have often verified that my good results in G. are followed by Yahoo and MSN, while the converse is not true, probably because G. has a more stringent penalization policy.
**: For instance, some experts say that the Keywords Metatag does not count for ranking results, while others do not agree. The only way to assess the theory is to run a controlled experiment and measure the results.
However, posting two similar pages that differ in only one variable, like a metatag, can trigger the duplication filter in G. and lead to distorted results, penalties or complete banning. Besides, G. and other search engines consider a number of different factors, and it is necessary to keep all of them in mind while running tests. For instance, a keyword in the Keywords Metatag might help rankings to the same degree as a keyword in the body of the page. Or in the upper body. Or lower body. That requires some complex testing.
Also, in order to index 5 experimental pages we need 5 incoming links. And links never have the same G. value. Link value is the subject of wide speculation and there are even tools to measure it (see references).
References
SE Metrics page - Link value tools, Duplication detection tools, SE Spam detection tools.
http://www.stuntdubl.com/2006/06/12/dupe-content/
Indexing/Search project results
http://sourceforge.net/softwaremap/trove_list.php?form_cat=93
A Piece of the G. Algorithm - Revealed
http://www.stuntdubl.com/2006/06/12/dupe-content/
White Paper on Duplicated Content
http://www.seomoz.org/blogdetail.php?ID=1467
Sergio Samoilovich runs FoundFirst.com, an SEO company in Buenos Aires. The company develops unique SEO software and tools.