Google Hacks, 2nd Edition [Electronic resources] (text version)

Tara Calishain

Hack 87. Get Inside the PageRank Algorithm

Delve into the inner workings of the Google
PageRank algorithm and how it affects results.

PageRank
is the algorithm used by the Google search engine, originally
formulated by Sergey Brin and Larry Page in their paper
"The Anatomy of a
Large-Scale Hypertextual Web Search Engine." It is based on the premise, prevalent in the world of academia, that
the importance of a research paper can be judged by the number of
citations the paper has from other research papers. Brin and Page
have simply transferred this premise to its Web equivalent: the
importance of a web page can be judged by the number of hyperlinks
pointing to it from other web pages.


8.8.1. So What Is the Algorithm?


It may look daunting to non-mathematicians, but the
PageRank
algorithm is in fact elegantly simple and is calculated as follows:

$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$

PR(A) is the PageRank of a page A.

T1 ... Tn are the pages that link to page A.

PR(T1) is the PageRank of a page T1.

C(T1) is the number of outgoing links from the page T1.

d is a damping factor in the range 0 < d < 1, usually set to
0.85.


The PageRank of a web page is therefore calculated from the PageRanks
of all pages linking to it (its incoming links). Each of those
PageRanks is divided by the number of outgoing links on that linking
page. The results are summed, and the sum is multiplied by d and
added to (1 - d).
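The formula can be solved by iteration. Start every page at some value (here 1), apply the formula to every page at once, and repeat until the values settle. The following Python code is a minimal sketch of that approach. The function name `pagerank`, the fixed iteration count, and the three-page example site (pages A, B and C) are assumptions for illustration, not code from Brin and Page's paper.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page.

    `links` maps each page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pr = {page: 1.0 for page in links}  # every page starts at 1
    for _ in range(iterations):
        # Each page gets the (1 - d) term, then a d-weighted share of
        # the PageRank of every page that links to it.
        new = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new  # update all pages together (synchronous iteration)
    return pr


# Hypothetical site: A links to B and C, B links to C, C links to A.
site = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(site)
for page, value in sorted(pr.items()):
    print(page, round(value, 3))
```

With d = 0.85 this should print roughly A 1.163, B 0.644 and C 1.192. The three values add up to 3, the number of pages.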


8.8.2. And What Does This Mean?


From a search engine marketer's point of view, this
means there are two ways in which PageRank can affect the position of
your page on Google:

The number of incoming links. Obviously, the more of these, the
better. But there is another thing the algorithm tells us: no
incoming link can have a negative effect on the PageRank of the page
it points at. At worst, it can simply have no effect at all.

The number of outgoing links on the page that points to your page.
The fewer of these, the better. This is interesting: it means that,
given two pages of equal PageRank linking to you, one with 5 outgoing
links and the other with 10, you will get twice the increase in
PageRank from the page with only 5 outgoing links.
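Both observations follow from the d × PR(T) / C(T) term. A linking page T passes on its PageRank divided by its number of outgoing links, and that share can never be negative. The Python check below is a quick sketch of that term. The helper name `contribution` is hypothetical, and it assumes two linking pages that each have a PageRank of 4.

```python
def contribution(pr_linking_page, outgoing_links, d=0.85):
    """Share of PageRank one linking page passes to each page it links to."""
    return d * pr_linking_page / outgoing_links


from_five = contribution(4.0, 5)   # linking page with 5 outgoing links
from_ten = contribution(4.0, 10)   # linking page with 10 outgoing links
print(from_five, from_ten, from_five / from_ten)  # ratio is (about) 2
```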


At this point, we take a step back and ask ourselves just how
important PageRank is to the position of your page in the Google
search results.

The next thing that we can observe about the PageRank algorithm is
that it has nothing whatsoever to do with relevance to the search
terms queried. It is simply a single (admittedly important) part of
the entire Google relevance ranking algorithm.

Perhaps a good way to look at PageRank is as a multiplying factor
applied to the Google search results after all other computations
have been completed. The Google algorithm first calculates the
relevance of pages in its index to the search terms, and then
multiplies this relevance by the PageRank to produce a final list.
The higher your PageRank, therefore, the higher up the results you
will be, but there are still many other factors related to the
positioning of words on the page that must be considered first.
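The Python code below sketches this mental model in a few lines. It shows only the author's simplified view, not Google's actual ranking code. The relevance scores, PageRank values and the `rank_results` helper are all hypothetical, and the final score is simply relevance multiplied by PageRank.

```python
def rank_results(relevance, pagerank):
    """Order pages by relevance * PageRank (a simplified, hypothetical model)."""
    scores = {page: relevance[page] * pagerank[page] for page in relevance}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical query: page "b" has three times the PageRank of page "a",
# but "a" is far more relevant; page "c" has a high PageRank but no relevance.
relevance = {"a": 0.9, "b": 0.2, "c": 0.0}
pagerank = {"a": 1.0, "b": 3.0, "c": 8.0}
print(rank_results(relevance, pagerank))  # ['a', 'b', 'c']
```

In this model, PageRank reorders pages that are already relevant. A page with zero relevance still scores zero, no matter how high its PageRank.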


8.8.3. So What's the Use of the PageRank Calculator?


If no incoming link has a negative effect, surely I should just get
as many as possible, regardless of the number of outgoing links on
each linking page?

Well, not entirely. The PageRank
algorithm is cleverly balanced. Just as energy is conserved in every
physical reaction, PageRank is conserved in every calculation. For
instance, if a page with a starting PageRank of 4 has two outgoing
links on it, the PageRank it passes on is divided equally between
all of its outgoing links. In this case, 4 / 2 = 2 units of PageRank
are passed on to each of 2 separate pages, and 2 + 2 = 4, so the
total PageRank is preserved!
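The arithmetic in this example can be written out directly. The Python sketch below follows the example, so it also ignores the damping factor d. The helper name `distribute` and the page names are placeholders.

```python
def distribute(pr, outgoing_links):
    """Split a page's PageRank equally between its outgoing links."""
    share = pr / len(outgoing_links)
    return {target: share for target in outgoing_links}


shares = distribute(4.0, ["page_x", "page_y"])
print(shares)                # {'page_x': 2.0, 'page_y': 2.0}
print(sum(shares.values()))  # 4.0: nothing is gained or lost
```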


There are scenarios in which you may find that total PageRank is not
conserved after a calculation. For example, a page with no outgoing
links passes none of its PageRank on to other pages. PageRank itself
is supposed to represent a probability distribution, with the
individual PageRank of a page representing the likelihood of a
"random surfer" chancing upon it.

On a much larger scale, supposing Google's index
contains a billion pages, each with a PageRank of 1, the total
PageRank across all pages is equal to a billion. Moreover, each time
we recalculate PageRank, no matter what changes in PageRank may occur
between individual pages, the total PageRank across all one billion
pages will still add up to a billion.

First, this means that, although we may not be able to change the
total PageRank across all pages, we can affect how PageRank is
distributed between the pages of our own site through strategic
linking. For instance, we may want most of our visitors to come into
the site through our home page, so we would want our home page to
have a higher PageRank than other pages within the site. We should
also recall that all the PageRank of a page is passed on and divided
equally between the outgoing links on that page. We would therefore
want to keep as much combined PageRank as possible within our own
site, rather than passing it on to external sites and losing its
benefit. This means we would want any page with lots of external
links (i.e., links to other people's web sites) to have a lower
PageRank than other pages within the site, to minimize the amount of
PageRank that leaks to external sites. Also, bear in mind our earlier
statement that PageRank is simply a multiplying factor applied once
Google's other relevance calculations are complete. We would
therefore want our more keyword-rich pages to have a higher relative
PageRank as well.
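You can try out the effect of internal linking on leaked PageRank with the same iterative calculation. The Python sketch below compares two alternative link structures for one site. The three-page site, both link structures and the single external page are hypothetical. The external page is modeled as a page that links nowhere, so any PageRank that reaches it leaves the site.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


SITE = ("Home", "About", "Links")  # our own pages; "External" is not ours

# Design 1: About links to the Links page, which also links off-site.
design_1 = {"Home": ["About", "Links"], "About": ["Links"],
            "Links": ["Home", "External"], "External": []}
# Design 2: About links back to Home instead.
design_2 = {"Home": ["About", "Links"], "About": ["Home"],
            "Links": ["Home", "External"], "External": []}

for name, links in (("design 1", design_1), ("design 2", design_2)):
    pr = pagerank(links)
    site_total = sum(pr[page] for page in SITE)
    print(name, {p: round(pr[p], 3) for p in SITE}, round(site_total, 3))
```

In this model, design 2 gives the home page a higher PageRank and the Links page a lower one. It also keeps more combined PageRank within the three site pages.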

Second, if we assume that every new page in Google's
index begins its life with a PageRank of 1, there is a way we can
increase the combined PageRank of pages within our site: by
increasing the number of pages! A site with 10 pages will start life
with a combined PageRank of 10, which is then redistributed through
its hyperlinks. A site with 12 pages will therefore start with a
combined PageRank of 12. We can thus improve the PageRank of our site
as a whole by creating new content (i.e., more pages), and then
control the distribution of that combined PageRank through strategic
interlinking between the pages.
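The Python sketch below rests on two assumptions: every page starts at 1, and every page has at least one outgoing link, so no PageRank leaves the site. Under those assumptions, the combined PageRank of a self-contained site should settle at its number of pages, whatever the link structure. The two site generators, `hub_site` and `ring_site`, are hypothetical. In `hub_site`, page0 acts as the home page.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


def hub_site(n):
    """Hypothetical site: page0 links to every other page; they all link back."""
    pages = [f"page{i}" for i in range(n)]
    links = {pages[0]: pages[1:]}
    for page in pages[1:]:
        links[page] = [pages[0]]
    return links


def ring_site(n):
    """Hypothetical site: each page links only to the next page, in a loop."""
    return {f"page{i}": [f"page{(i + 1) % n}"] for i in range(n)}


for n in (10, 12):
    for build in (hub_site, ring_site):
        total = sum(pagerank(build(n)).values())
        print(n, build.__name__, round(total, 6))
```

Both layouts should print totals of 10 and 12. Only the way each total is shared between pages differs.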

And this is the purpose of the PageRank Calculator: to create a
model of the site on a small scale including the links between pages,
and see what effect the model has on the distribution of PageRank.


8.8.4. How Does the PageRank Calculator Work?


To get a better idea of the realities of PageRank, visit the
PageRank Calculator (http://www.markhorrell.com/seo/pagerank.asp).

It's simple, really. Start by typing in the number
of interlinking pages that you wish to analyze and hit Submit. I have
limited this number to just 20 pages to reduce the load on the
server. Even so, this should give a reasonable indication of how
strategic linking can affect the PageRank distribution.

Next, for ease of reference once the calculation has been performed,
provide a label for each page (e.g., Home Page, Links Page, Contact
Us Page, etc.), and again hit Submit.

Finally, use the list boxes to select which pages each page links to.
You can use Ctrl and Shift to highlight multiple selections.

You can also use this screen to change the initial PageRank of each
page. For instance, if one of your pages is supposed to represent
Yahoo!, you may wish to raise its initial PageRank to, say, 3.
In practice, however, the starting PageRank has no effect on the
final computed value. In other words, even if one page were to start
with a PageRank of 100, after many iterations of the equation its
final computed PageRank would converge to the same value as if it
had started with a PageRank of only 1!
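The calculator's own source is not shown in this hack. Its inputs (page labels, the links selected for each page, optional starting values, and d) map naturally onto the iterative calculation, though. The Python sketch below is an independent approximation built on that assumption. The page labels are hypothetical, and `MAX_PAGES` only mirrors the 20-page limit mentioned above. The sketch also checks the claim that a page starting at a PageRank of 100 ends up at the same value as one starting at 1.

```python
MAX_PAGES = 20  # mirrors the calculator's 20-page limit


def pagerank(links, d=0.85, iterations=200, start=None):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) from given start values."""
    if len(links) > MAX_PAGES:
        raise ValueError(f"at most {MAX_PAGES} pages are supported")
    pr = {page: 1.0 for page in links}
    if start:
        pr.update(start)  # e.g. raise one page's initial PageRank
    for _ in range(iterations):
        new = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


site = {
    "Home Page": ["Links Page", "Contact Us Page"],
    "Links Page": ["Home Page"],
    "Contact Us Page": ["Home Page"],
}
from_one = pagerank(site)
from_hundred = pagerank(site, start={"Home Page": 100.0})
for label in site:
    print(label, round(from_one[label], 3), round(from_hundred[label], 3))
```

Both columns should match: roughly 1.459 for the Home Page and 0.77 for each of the other two pages.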

You can play around with the damping factor d,
which defaults to 0.85, the value quoted in Brin and
Page's research paper.
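A page that nothing links to offers an easy way to see what d does. It receives only the (1 - d) term, so its PageRank is exactly 1 - d whatever the rest of the site looks like. The Python sketch below uses a hypothetical three-page site. It compares the default of 0.85 with an arbitrary alternative of 0.5.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


# "Orphan" links to Home, but no page links to "Orphan".
site = {"Orphan": ["Home"], "Home": ["About"], "About": ["Home"]}

for d in (0.85, 0.5):
    pr = pagerank(site, d=d)
    print(d, {page: round(value, 3) for page, value in pr.items()})
```

With the lower damping factor, the three values sit closer together. More of each page's PageRank comes from the fixed (1 - d) term and less from its links.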

Mark Horrell
