Catha For Council Blog

Who else?

Archive for the 'Search Engine Optimization Info' Category

Reverse Engineering Search Engine Ranking Algorithms

Back in 1997 I did some research in an attempt to reverse-engineer the ranking algorithms used by search engines. In that year, the big ones included AltaVista, WebCrawler, Lycos, Infoseek, and a few others.

I was able to largely declare my research a success. In fact, it was so accurate that in one case I was able to write a program that produced the exact same search results as one of the search engines. This article explains how I did it, and how it is still beneficial today.

Step 1: Determine Rankable Traits

The first thing to do is make a list of what you want to measure. I came up with about 15 different possible ways to rank a web page. They included things like:

- keywords in title
- keyword density
- keyword frequency
- keyword in header
- keyword in ALT tags
- keyword emphasis (bold, strong, italics)
- keyword in body
- keyword in url
- keyword in domain or sub-domain
- criteria by location (density in title, header, body, or tail) etc

Step 2: Invent a New Keyword

The second step is to determine which keyword to test with. The key is to choose a word that does not exist in any language on Earth. Otherwise, you will not be able to isolate your variables for this study.

I used to work at a company called Interactive Imaginations, and our sites were Riddler.com and the Commonwealth Network. At the time, Riddler was the largest entertainment web site, and CWN was one of the most heavily trafficked sites on the net (in the top 3). I turned to my co-worker Carol and mentioned I needed a fake word. She gave me “oofness”. I did a quick search and it was not found on any search engine.

Note that a unique word can also be used to see who has copied content from your web sites onto their own. Since all of my test pages are gone (for many years now), a search on Google shows some sites that did copy my pages.

Step 3: Create Test Pages

The next thing to do was to create test pages. I took my home page for my now defunct Amiga search engine “Amicrawler.com” and made about 75 copies of it. I then numbered each file 1.html, 2.html… 75.html.

For each measurement criterion, I made at least 3 HTML files. For example, to measure keyword density in title, I modified the HTML titles of the first 3 files to look like this:

1.html:


2.html:
3.html:

The html files of course contained the rest of my home page. I then logged in my notebook that files 1 - 3 were keyword density in title files.

I repeated this type of HTML editing for about 75 or so files, until I had every criterion covered. The files were then uploaded to my web server and placed in the same directory so that search engines could find them.
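As a rough sketch of this step, the following Python builds three title variants from one template. The `{{TITLE}}` placeholder, the stand-in base page, and the repetition counts are assumptions made for illustration; they are not the titles used in the original test.

```python
KEYWORD = "oofness"  # the invented word from Step 2

# Hypothetical stand-in for the copied home page; {{TITLE}} marks the title slot.
BASE_PAGE = (
    "<html><head><title>{{TITLE}}</title></head>"
    "<body>Rest of the home page goes here.</body></html>"
)

def make_title_variants(base_html, keyword, counts=(1, 2, 3)):
    """Return {filename: html}, repeating the keyword in each title a different number of times."""
    pages = {}
    for number, repeats in enumerate(counts, start=1):
        title = " ".join([keyword] * repeats)
        pages[f"{number}.html"] = base_html.replace("{{TITLE}}", title)
    return pages

pages = make_title_variants(BASE_PAGE, KEYWORD)
for name in sorted(pages):
    print(name, pages[name].count(KEYWORD))
```

Keeping the body identical across the files is what isolates the one trait being measured.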

Step 4: Wait for Search Engines to Index Test Pages

Over the next few days, some of the pages started appearing in search engines. However, a site like AltaVista might only show 2 or 3 pages. Infoseek / Ultraseek at the time was doing real-time indexing, so I got to test everything right away. In some cases, I had to wait a few weeks or months for the pages to get indexed.

Simply typing the keyword “oofness” would bring up all pages indexed that had that keyword, in the order ranked by the search engine. Since only my pages contained that word, I would not have competing pages to confuse me.

Step 5: Study Results

To my surprise, most search engines had very poor ranking methodology. WebCrawler used a very simple word density scoring system. In fact, I was able to write a program that gave the exact same search results as WebCrawler. That’s right: give it a list of 10 URLs, and it would rank them in the exact same order as WebCrawler. Using this program, I could make any of my pages rank #1 if I wanted to. The problem, of course, was that WebCrawler did not generate any traffic even when I was listed at number 1, so I did not bother with it.
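A minimal sketch of what a word density ranker can look like is shown below. It is not the original program and not WebCrawler's actual formula; the tokenizer, the scoring function, and the sample pages are all assumptions for illustration.

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Fraction of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def rank_by_density(pages: dict[str, str], keyword: str) -> list[str]:
    """Order page names from highest to lowest keyword density."""
    return sorted(pages, key=lambda name: keyword_density(pages[name], keyword), reverse=True)

# Hypothetical page texts keyed by file name.
sample_pages = {
    "1.html": "oofness amiga search engine",
    "2.html": "oofness oofness amiga search",
    "3.html": "amiga search engine home page",
}
ranking = rank_by_density(sample_pages, "oofness")
print(ranking)
```

With a scoring function this simple, matching an engine's order comes down to adjusting the formula until the predicted order agrees with the observed one.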

AltaVista ranked pages highest when they had the most keywords in the HTML title. It ranked a few pages way at the bottom, but I don’t recall which criteria performed worst. The rest of the pages ranked somewhere in the middle. All in all, AltaVista only cared about keywords in the title. Everything else didn’t seem to matter.
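One way to read results like these is to average the observed position of each group of test pages. The sketch below assumes a notebook mapping from file name to the trait it tests and a made-up observed order; none of these values come from the original study.

```python
from collections import defaultdict

def mean_rank_by_trait(observed_order, trait_of):
    """Average 1-based result position for each tested trait (lower is better)."""
    positions = defaultdict(list)
    for position, page in enumerate(observed_order, start=1):
        positions[trait_of[page]].append(position)
    return {trait: sum(p) / len(p) for trait, p in positions.items()}

# Hypothetical notebook: which trait each test file was built to measure.
trait_of = {
    "1.html": "title",
    "2.html": "title",
    "3.html": "body",
    "4.html": "body",
}
# Hypothetical order returned by a search for the invented keyword.
observed_order = ["2.html", "1.html", "4.html", "3.html"]

averages = mean_rank_by_trait(observed_order, trait_of)
print(averages)
```

A trait whose pages cluster at the top of the results is a candidate ranking factor; traits scattered through the middle probably carry little weight.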

A few years later, I repeated this test with AltaVista and found it was giving high preference to domain names. So I added a wildcard to my DNS and web server, and put keywords in the sub-domain. Voila! All of my pages had #1 ranking for any keyword I chose. This of course led to one problem: competing web sites don’t like losing their top positions and will do anything to protect their rankings when it costs them traffic.

Other Methods of Testing Search Engines

I am going to quickly list some other things that can be done to test search engine algorithms. But these are all lengthy topics to discuss.

I tested some search engines by uploading large copies of the dictionary, and redirecting any traffic to a safe page. I also tested them by indexing massive quantities of documents (in the millions) under hundreds of domain names. I found in general that there are very few magic keywords found in most documents. The fact remains that a few search terms like “sex”, “britney spears”, etc. brought in traffic but most did not. Hence, most pages never saw any human traffic.

Drawbacks

Unfortunately there were some drawbacks to getting listed #1 for a lot of keywords. I found that it ticked off a lot of people who had competing web sites. They would usually start by copying my winning methodology (like placing keywords in the sub-domain), and then repeat the process themselves, and flood the search engines with 100 times more pages than the 1 page I had made. It made it worthless to compete for prime keywords.

And second, certain data cannot be measured. You can use tools like Alexa to determine traffic or Google’s site:domain.com to find out how many listings a domain has, but unless you have a lot of this data to measure, you won’t get any useable readings. What good is it for you to try and beat a major web site for a major keyword if they already have millions of visitors per day, you don’t, and it is part of the search engine ranking?

Bandwidth and resources can become a problem. I have had web sites where 75% of my traffic was search engine spiders. And they slammed my sites every second of every day for months. I would literally get 30,000 hits from the Google spider every day, in addition to other spiders. And contrary to what THEY believe, they aren’t as friendly as they claim.
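A rough way to measure this is to count how many requests carry a known crawler User-Agent. The sketch below assumes the User-Agent strings have already been pulled out of the server log; the marker list is a small illustrative sample and the log entries are made up.

```python
# Substrings that appear in the User-Agent of some well-known crawlers.
SPIDER_MARKERS = ("googlebot", "slurp", "msnbot")

def spider_share(user_agents):
    """Fraction of requests whose User-Agent matches a known spider marker."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    spider_hits = sum(
        any(marker in agent.lower() for marker in SPIDER_MARKERS)
        for agent in agents
    )
    return spider_hits / len(agents)

# Hypothetical User-Agent values, one per request.
sample_agents = [
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]
share = spider_share(sample_agents)
print(f"{share:.0%} of requests came from spiders")
```

Matching on User-Agent only catches spiders that identify themselves, so the real share can be higher.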

Another drawback is that if you are doing this for a corporate web site, it might not look so good.

For example, you might recall a few weeks ago when Google was caught using shadow pages, and of course claimed they were only “test” pages. Right. Does Google have no dev servers? No staging servers? Are they smart enough to make shadow pages hidden from normal users but not smart enough to hide dev or test pages from normal users? Have they not figured out how a URL or IP filter works? Those pages must have served a purpose, and they didn’t want most people to know about it. Maybe they were just weather balloon pages?

I recall discovering some pages that were placed by a hot online & print tech magazine (that wired us into the digital world) on search engines. They had placed numerous blank landing pages using font colors matching the background, which contained large quantities of keywords for their largest competitor. Perhaps they wanted to pay digital homage to CNET? Again, this was probably back in 1998. In fact, they were running articles at the time about how it is wrong to try and trick search engines, yet they were doing it themselves.

Conclusion

While this methodology is good for learning a few things about search engines, on the whole I would not recommend making this the basis for your web site promotion. The quantity of pages to compete against, the quality of your visitors, the shoot-first mentality of search engines, and many other factors will prove that there are better ways to do web site promotion.

This methodology can be used for reverse engineering other products. For example, when I worked at Agency.com doing stats, we used a product made by a major micro software company (you might be using one of their fine operating system products right now) to analyze web server logs. The problem was that it took more than 24 hours to analyze one day’s worth of logs, so it was never up to date. A little bit of magic and a little bit of Perl generated the same reports in 45 minutes. I got there simply by feeding the same logs into both systems until the results came out the same and every condition was accounted for.
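The compare-until-identical loop can be sketched as a small diff over two report summaries. The original work was done in Perl; this Python version, the metric names, and the numbers are assumptions for illustration only.

```python
def report_differences(reference: dict, candidate: dict) -> dict:
    """Return {metric: (reference_value, candidate_value)} for every mismatch."""
    differences = {}
    for metric in sorted(set(reference) | set(candidate)):
        reference_value = reference.get(metric)
        candidate_value = candidate.get(metric)
        if reference_value != candidate_value:
            differences[metric] = (reference_value, candidate_value)
    return differences

# Hypothetical summaries produced by the two systems from the same log file.
reference_report = {"page_views": 1200, "unique_hosts": 310, "errors_404": 17}
candidate_report = {"page_views": 1200, "unique_hosts": 305, "errors_404": 17}

differences = report_differences(reference_report, candidate_report)
print(differences)
```

Each remaining mismatch points at a rule the reference product applies that the replacement does not yet handle; an empty result means the two agree on that log.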

Copyright 2005 CheapBooks.com. All Rights Reserved. CheapBooks.com is a book price comparison shopping engine, allowing you to locate the cheapest prices on millions of books and ebooks.

So, Where Has Your Search Engine Been Today?

Visit Google, Yahoo, MSN or one of the lesser search engines, and you get a few million results for just about any search term. Despite this impressive depth of results, most users consider only a few of the web pages being pointed to. A lot of research indicates that most searchers exit search engine result pages to visit one of the top three results. That raises the question: What about the remaining million plus results?

We Need a Search-Engine to Search Search-Engine Results!

Based on the above premise, I set out on a mission to simplify search engine results. But, try as I might, I could not find an automated method to do so. I think that is logical; otherwise these multi-billion dollar behemoths would have done so themselves. So, I thought: What is the one thing that I can do which the Googles and Yahoos of the world cannot? And the quick answer was: I can use human / personal discretion in choosing search results. This would bypass the legion of search engine optimizers who keep building link popularity to rise up in search pages.

Can Human Selected Search Results Beat Algorithm Selected Search Results?

Tough to say, but you can look for yourself. Compare the Google results for Hair Removal (http://www.google.com/search?hl=en&q=Hair+Removal) and my selected results for Hair Removal (http://www.human-search-engine.com/6.html). There is some overlap, but the results that I display are a result of my personal visit to the listed pages.

Conclusions

As search engines become better and faster, there is a need for a human touch to search results. In this constant struggle between spammy (scammy?) search engine optimizers and search engine engineers, the searcher can be the victim.

Ajeet Khurana is a search engine enthusiast and the founder of the Human Search Engine http://www.human-search-engine.com/ He is also the search engines correspondent for the AIA content network. Read some of his search engine musings at: http://search-engines.allinfoabout.com In another life, Ajeet is the Business Majors correspondent for About.com, an online publication of the New York Times Company: http://businessmajors.about.com

Search Engine Optimization (SEO) - Boost Your Website Traffic

Search engines bring more than 80 percent of the traffic for small to medium websites. This shows how important it is for small and medium websites to optimize their structure and pages for search engines. Optimizing your website for search engines involves many aspects: website content, keywords, URLs, meta tags, back links, etc. Let’s go through them one by one.

Select the right keywords. You can pay a visit to your competitors’ websites (only those with top-ranked search engine placements). By analyzing their web content and meta tags, you can easily find out the keywords they are using. The Overture keyword selection tool can also provide you with valuable information. Open your favorite browser, enter http://inventory.overture.com/d/searchinventory/suggestion/, type your keywords or phrases in the search box and see how many times they were searched last month. Select those with high search frequency. Google also has a similar keyword selection tool at https://adwords.google.com/select/KeywordSandbox. You can try both of them and weigh the results against each other.

Target your web site content at selected keywords. After finalizing the keywords, you can now build up your web pages with them. But be careful: don’t overuse any keywords on your web pages. Overuse of keywords may make search engine spiders think you are spamming and get your website banned. How can you judge whether you are overusing a keyword? Go to http://www.gorank.com/seotools/ to check your keyword density and make sure it is within a reasonable range.

Make search engine friendly URLs. Although some of the search engines can follow all dynamic URLs, like http://www.scriptmenu.com/detail.php?id=25257, some of them still prefer static URLs ending in html, htm, etc. To make search engine friendly URLs, you can create real static pages, but you don’t have to. The web server’s URL rewrite engine can make this job much easier by reinterpreting the URLs before fetching the actual pages. If you need more help or tips on how to implement URL rewriting, follow the link http://www.scriptmenu.com/detail_24379.html and get a tutorial.
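The idea behind URL rewriting can be sketched in a few lines of Python. In practice this mapping usually lives in the web server configuration (for example, Apache's mod_rewrite) rather than in application code. The `/detail_<id>.html` to `/detail.php?id=<id>` pattern is an assumption inferred from the two example URLs above, not a documented rule of that site.

```python
import re

# Assumed pattern: a static-looking path such as /detail_24379.html
STATIC_DETAIL = re.compile(r"^/detail_(\d+)\.html$")

def rewrite(path: str) -> str:
    """Map a static-looking path to the dynamic script that actually serves it."""
    match = STATIC_DETAIL.match(path)
    if match:
        return f"/detail.php?id={match.group(1)}"
    return path  # anything else passes through unchanged

print(rewrite("/detail_24379.html"))
print(rewrite("/about.html"))
```

Visitors and spiders only ever see the static-looking address; the server translates it internally before running the script.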

Get quality backward links to your page. Although keyword optimization of your web pages can significantly improve your search engine placement, it is still far from sufficient to get your pages top ranked. You have to get some quality backward links to your website. You need at least 35 quality back links before Google will visit your web site and take a look at you. You can get these quality links by submitting your site to high-ranked web site directories or by writing some quality articles and then submitting them to high-ranked online article archives. Many other ways exist, but remember, only backward links from quality web sites count. Websites that are poorly indexed or have very low search engine rankings have no value to you.

Keep on improving your website. Through keyword optimization, URL optimization and quality backward links, your website should have gotten remarkable search engine placement. However, the placement is not static; your competitors are optimizing their websites and trying to kick you out of your current position. To maintain a good search engine ranking, you have to keep on improving your website. Keep on optimizing your website navigation, content and structure. Keep on getting more quality links from top-rated sites. The battle for top search engine positions will never end. Good luck, :-)

The Google Jagger Update

What has the Google Jagger update delivered to us as internet
users and how has it impacted search engine optimization
companies?

Jagger Update For Internet Users

The Google Jagger update was done to try and refine some of the
results that were showing up in the Google search results by
concentrating on weeding out those sites that were not
beneficial to the person searching for their specific search
term.

Google, and any search engine for that matter, wants to
deliver high quality unique and relevant sites. That means that
they needed to target scraper sites, monetized directory sites,
spam sites, link exchanges and Blog spam.

With the completion of the Google Jagger update you
should now see more relevant search engine results that are more
likely to be in line with your requirements. Less rubbish and
more good content was the hope and it seems that to a certain
degree this update has been effective and was beneficial in
general to the internet user whose preference is Google for
their search engine.

Jagger Update For Search Engine Optimization Companies

After much wailing and gnashing of teeth SEO companies are
starting to calm down... a little bit anyway. The Google Jagger
update was deemed to be the end for search engine optimization
companies and there was panic everywhere, especially in the
early stages of the update when sites seemed to be dropping like
flies out of the Google results pages.

You could hear the shouts of “they have gone too far
this time” and “people will start using MSN & Yahoo! more
and more now”, and it is open to interpretation whether
these two statements hold some truth. One thing is certain
though, and that is that Google had to do something: too many
black hat ‘SEO companies’ were abusing the previous
algorithms and negatively affecting the overall quality of the
Google search results.

Google created this monster initially by putting so much weight
on inbound links, and it was a relatively easy strategy for SEO
companies to use that exploited how Google was producing
results. Instantly there came wave after wave of irrelevant link
farms whose sole purpose was to exploit the Google algorithms.
It benefited nobody except the site owner who got away with it,
and many sites did... then along came Jagger.

The Jagger update seems to affect the way
SEO companies need to work by giving more weight to the
relevancy of inbound and outbound links,
filtering of unacceptable css spam techniques and demotion of
blog relevancy/weight.

Instead of thinking of quantity over quality we have now got to
be smarter. Just like in real life, a recommendation (link) from
a high quality company (site) is more beneficial and trustworthy
to you (your website) than one from a general ad lost in the
classifieds section of a tabloid (think link farms).

It seems that SEO companies have got to
now be more than manipulators; they have got to play more by the
rules.

Search is ever evolving, and so already there are people who are
exploiting loopholes in the Jagger update, but they are next on
Google’s hit list... may the battle commence!

About the Author:

Glenn Hodgkinson is a published, award winning authority in the
area of website development. Having worked on applications for
multinational corporations he now resides in Maine USA where he
owns a custom web site
design company called GB Interactive.