Hack 91. Remove Your Materials from Google

How to remove your content from Google's various web properties.
Some people are more than thrilled to have Google
index their sites. Other folks don't want the
GoogleBot anywhere near them. If you fall into the latter category
and the bot's already done its worst, there are
several things you can do to remove your materials from
Google's index. Each Google property, including Web Search,
Google Images, and Google Groups, has its own set of
methods.
8.12.1. Google Web Search
Here are several tips to avoid being listed.
8.12.1.1 Making sure your pages never get there to begin with
While you can take steps to remove your content from the Google index
after the fact, it's always much easier to make sure
the content is never found and indexed in the first place. Google's crawler obeys the robots exclusion
protocol, a set of instructions you put on your web site
that tells the crawler how to behave when it comes to your content.
You can implement these instructions in two ways: via a
META tag that you put on each page (handy when you
want to restrict access to only certain pages or certain types of
content) or via a robots.txt file that you
insert in your root directory (handy when you want to block some
spiders completely or want to restrict access to kinds or directories
of content). You can get more information about the robots exclusion
protocol and how to implement it at http://www.robotstxt.org/.
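To see how a crawler interprets robots.txt rules before you upload the file, you can run them through a parser. The following is a minimal sketch in Python using the standard library's urllib.robotparser module. The robots.txt contents, the allowed() helper, and the example.com URLs are hypothetical. Python's parser applies its own, simpler matching rules, so treat the result as a sanity check rather than a guarantee of how GoogleBot will read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, let everyone else in.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of fetching the file over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for url in ("http://www.example.com/index.html",
                    "http://www.example.com/private/notes.html"):
            print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

Here Googlebot may fetch index.html but nothing under /private/, while other crawlers fall through to the catch-all * record and may fetch both pages.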
8.12.1.2 Removing your pages after they're indexed
There are several things you can have removed from
Google's results.
Removing your entire site
Use the robots exclusion protocol, probably with
robots.txt.
Removing individual pages
Use the following META tag in the
HEAD section of each page you want to remove:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
Removing snippets
A snippet is the little excerpt of a page that
Google displays in its search results. To remove snippets, use the
following META tag in the HEAD
section of each page for which you want to prevent snippets:
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
Removing cached pages
To prevent Google from keeping cached versions of your
pages in its index, use the following META tag in
the HEAD section of each page for which you want
to prevent caching:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
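If you have added several of these tags to a page, a quick script can confirm which directives the page actually declares. This is a minimal sketch using Python's html.parser module. The RobotsMetaParser class, the meta_directives() function, and the sample page are hypothetical, and the sketch only reads tags whose NAME is GOOGLEBOT or the generic ROBOTS. It reports what the page says, not how Google will act on it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="GOOGLEBOT" or "ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("googlebot", "robots"):
            content = attributes.get("content") or ""
            # CONTENT is a comma-separated list such as "NOINDEX, NOFOLLOW".
            for token in content.split(","):
                token = token.strip().upper()
                if token:
                    self.directives.add(token)


def meta_directives(html: str) -> set:
    """Return the set of robots directives declared in an HTML page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


PAGE = """<HTML><HEAD>
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
<TITLE>Not for Google</TITLE>
</HEAD><BODY>...</BODY></HTML>"""

if __name__ == "__main__":
    print(sorted(meta_directives(PAGE)))
```

Running it prints ['NOARCHIVE', 'NOFOLLOW', 'NOINDEX']; a page carrying the NOSNIPPET tag would report that directive as well.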
8.12.1.3 Removing that content now
Once you implement these changes, the next time GoogleBot crawls your
web site (usually within a few weeks), it will remove or limit your
content according to your META tags and
robots.txt file. If you want your materials
removed right away, you can use the automatic remover at http://services.google.com:8882/urlconsole/controller.
You'll have to sign in with an account (requires an
email address and a password). Using the remover, you can request
that Google crawl your newly created robots.txt
file, or you can enter the URL of a page that contains exclusionary
META tags.
8.12.1.4 Reporting pages with inappropriate content
You may like your own content just fine, yet still find that you're
getting search results with explicit content even with SafeSearch
filtering turned on. Or you might find a site with a
misleading title tag and content completely unrelated to your search.
You have two options for reporting these sites to Google. Bear in
mind that there's no guarantee that Google will
remove the sites from the index, but it will investigate them. At
the bottom of each page of search results, you'll
see a "Dissatisfied? Help Us
Improve" link; follow it to a form for reporting
inappropriate sites. You can also send the URLs of explicit sites that
show up in SafeSearch results but shouldn't to
safesearch@google.com. If you
have more general complaints about a search result, you can send an
email to
8.12.2. Google Images
Google's image database is separate from its main search index. To remove items from
Google Images, use robots.txt to tell Google's image crawler, Googlebot-Image,
to stay away from your site. Add these lines to your robots.txt file:
User-agent: Googlebot-Image
Disallow: /
You can then use the automatic remover mentioned in the web search section
to have Google remove the images from its index quickly.
There may be cases where someone has put images for which you own the
copyright on their own server. You don't have access to their server to add a
robots.txt file, but you still need to stop Google
from indexing your content there. In this case, you need to contact
Google directly. Google has instructions for situations just like
this at http://www.google.com/removel; look at
Option 2, "If you do not have any access to the
server that hosts your image."
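You can check that the two-line Googlebot-Image record shuts out only the image crawler. The sketch below again leans on Python's standard urllib.robotparser module; the blocked_agents() helper and the image URL are hypothetical, and the parser's matching rules only approximate Google's own.

```python
from urllib.robotparser import RobotFileParser

# The two lines from this section, as a complete robots.txt file.
IMAGE_BLOCK = """\
User-agent: Googlebot-Image
Disallow: /
"""


def blocked_agents(robots_txt, agents, url):
    """Return the set of user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent for agent in agents if not parser.can_fetch(agent, url)}


if __name__ == "__main__":
    url = "http://www.example.com/photos/cat.jpg"
    print(blocked_agents(IMAGE_BLOCK, ["Googlebot", "Googlebot-Image"], url))
```

It prints {'Googlebot-Image'}: because the file has no record for plain Googlebot, the web crawler is unaffected and your pages stay in the main index.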
8.12.3. Google Groups
As with Google's web index, you can both prevent
material from being archived on Google and remove it after the
fact.
8.12.3.1 Preventing your material from being archived
To prevent your
material from being archived on Google, add the following line to the
headers of your Usenet posts:
X-No-Archive: yes
If you can't edit the headers of your post, make
that line the first line of the post itself.
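Usenet articles use the same header-and-body layout as email, so the two options above can be illustrated with Python's standard email package. This is a minimal sketch: the build_post() function, its can_edit_headers flag, and the addresses and newsgroup are hypothetical, and actually posting the article requires a news client or server, which the sketch does not cover.

```python
from email.message import EmailMessage


def build_post(sender, newsgroups, subject, body, can_edit_headers=True):
    """Assemble a Usenet-style article that asks archives not to keep it.

    Hypothetical helper: if headers can be edited, add X-No-Archive as a
    header; otherwise put the same text as the first line of the body.
    """
    post = EmailMessage()
    post["From"] = sender
    post["Newsgroups"] = newsgroups
    post["Subject"] = subject
    if can_edit_headers:
        post["X-No-Archive"] = "yes"
        post.set_content(body)
    else:
        # Fallback described above: the line goes first in the post itself.
        post.set_content("X-No-Archive: yes\n" + body)
    return post


if __name__ == "__main__":
    print(build_post("you@example.com", "alt.test", "Testing",
                     "Please don't archive this."))
```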
8.12.3.2 Removing materials after the fact
If you want materials removed after the fact, you have a couple of
options:
If the materials that you want removed were posted under an address
to which you still have access, you can use the automatic removal
tool mentioned earlier in this hack.
If the materials that you want removed were posted under an address
to which you no longer have access, you'll need to
send an email to civil or criminal laws that I am the person who posted each of the
foregoing messages or am authorized to request removal by the person
who posted those messages."
Your electronic signature.
8.12.4. Google Phonebook
You might not want your contact information made available via
the phonebook searches on Google. To remove it, you'll have to
follow one of two procedures, depending on whether the listing you
want removed is for a business or for a residential number.
If you want to remove a business phone number,
you'll need to send a request on your business
letterhead to:
Google PhoneBook Removal
1600 Amphitheatre Parkway
Mountain View, CA 94043
Be sure to include a phone number so that Google can reach you to
verify your request.
Removing a residential phone number is much simpler. Fill out the
form at http://www.google.com/help/pbremovall.
The form asks for your name, city and state, phone number, email
address, and reason for removal, chosen from three options: incorrect number,
privacy issue, or
"other."