
Robots and SEO | Robots File | Robots Tips and Tricks | SEO: Google, Yahoo, MSN

Many websites don't offer a robots.txt file, although this file is very important for
SEO if you want to have a good ranking on search engines.



Most SEO service providers offer robots.txt creation as one of their services, but after you read this article, you will know how to create the robots.txt file yourself!


What is robots.txt?

When a search engine comes to crawl your site, it looks for a special file called robots.txt
on your site. This file tells the search engine spider which web pages of your site should be indexed and which should be ignored.
The robots.txt file is a simple text file with no HTML code, and it must be placed in your root directory, e.g.
http://www.yourwebsite.com/robots.txt
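
Because the file always sits in the root directory, its address can be worked out from any page URL on the same site. The following is a minimal sketch using Python's standard urllib.parse module; the helper name robots_url and the sample page URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL on the example site.
print(robots_url("http://www.yourwebsite.com/products/list.html"))
# prints http://www.yourwebsite.com/robots.txt
```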
Next, let's look at how to create a robots.txt file.

How to create a robots.txt file?

The robots.txt file is a simple text file. Open a text editor to create it. The content consists of so-called “records”.

Each record contains the information for a specific search engine. It consists of two kinds of fields: one User-agent line and one or more Disallow lines. Here's an example:

User-agent: googlebot
Disallow: /cgi-bin/

This robots.txt file allows “googlebot”, the search engine spider of Google, to retrieve every page from your site except for the files in the “cgi-bin” directory, which googlebot will ignore.
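
One way to check how such a record is read is Python's standard urllib.robotparser module. This is only a sketch of how one parser interprets the record, not a guarantee of how every spider behaves; the two page paths are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The record from the example above.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "http://www.yourwebsite.com"
# Hypothetical paths, used only to exercise the rule.
cgi_ok = parser.can_fetch("googlebot", site + "/cgi-bin/search.cgi")
index_ok = parser.can_fetch("googlebot", site + "/index.html")

print("cgi-bin page allowed:", cgi_ok)    # False
print("index page allowed:", index_ok)    # True
```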

The Disallow command matches by prefix, so it works like a wildcard at the end of the path. If you enter

User-agent: googlebot
Disallow: /text

both “/text-desk/index.html” and “/text/index.html”, as well as all other files in the “text” directory, would be ignored by googlebot.
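
The prefix behaviour can be pictured as a plain "starts with" test. The function below is a simplified sketch of that rule, not the code any real spider runs; the name is_disallowed and the “/images/logo.gif” path are invented for the example.

```python
def is_disallowed(path: str, disallow_value: str) -> bool:
    """Apply the prefix rule; a blank Disallow value blocks nothing."""
    return bool(disallow_value) and path.startswith(disallow_value)


for path in ("/text-desk/index.html", "/text/index.html", "/images/logo.gif"):
    print(path, is_disallowed(path, "/text"))
# /text-desk/index.html True
# /text/index.html True
# /images/logo.gif False
```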

If you leave the Disallow line blank, you're telling the search engine that all files on your website may be indexed. In any case, you must enter a Disallow line for every User-agent record.

If you want to give all search engine spiders (such as Google, Yahoo, and so on) the same rights, use the
following robots.txt content:

User-agent: *
Disallow: /cgi-bin/
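
If you would rather generate the file than type it, the record layout is simple enough to build from a small table. The sketch below assumes a dictionary that maps each user agent name to the paths it should not crawl; build_robots_txt is a made-up helper, not part of any library.

```python
def build_robots_txt(records: dict[str, list[str]]) -> str:
    """Render records as robots.txt text.

    records maps a user agent name to the paths it must not crawl.
    An empty list yields a blank Disallow line, so every record
    still carries at least one Disallow line.
    """
    blocks = []
    for agent, paths in records.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}".rstrip() for path in (paths or [""])]
        blocks.append("\n".join(lines))
    # Separate records with a blank line.
    return "\n\n".join(blocks) + "\n"


content = build_robots_txt({"*": ["/cgi-bin/"]})
print(content)
```

Saving the returned text as robots.txt in the root directory of the site gives the same file as the hand-written version above.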

Where to find user agent names?

You can find user agent names in your log files by checking for requests to robots.txt. Generally, all search engine spiders should be given the same rights, so use “User-agent: *” as mentioned above.
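
As a rough illustration, the following sketch pulls user agent names out of access log lines that request robots.txt. It assumes the common "combined" log format, where the user agent is the last quoted field; your server's format may differ. The two sample lines use made-up addresses and dates.

```python
import re

# Assumes the "combined" access log format; adjust for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(log_lines):
    """Return the sorted user agent strings that requested /robots.txt."""
    agents = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group("path").split("?")[0] == "/robots.txt":
            agents.add(match.group("agent"))
    return sorted(agents)


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
for agent in robots_txt_agents(sample):
    print(agent)
```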

Things to avoid

If your robots.txt file isn't formatted properly, some or all of your files might not get indexed by search engines. To avoid this, follow these rules:

1. Don't use comments in the robots.txt file
Although comments are allowed in a robots.txt file, they might confuse some search engine spiders.
“Disallow: support # Don’t index the support directory” might be misinterpreted as “Disallow: support#Don’t index the support directory”.

2. White space is not allowed at the beginning of a line. For example, don't write

    User-agent: *
    Disallow: /text

but write this instead:

User-agent: *
Disallow: /text

3. For your robots.txt file to work, don't mix up the order of the commands. Don't write
Disallow: /text
User-agent: *

but use the right order:

User-agent: *
Disallow: /text

4. Don’t use more than one directory in one Disallow line:

User-agent: *
Disallow: /text /cgi-bin/ /images/

Search engine spiders cannot understand this format. The correct syntax should be as follows:
User-agent: *
Disallow: /text
Disallow: /cgi-bin/
Disallow: /images/

5. The file names on your server are case sensitive, so make sure they are in the right case. Don't write “Text” in the robots.txt file if the name of your directory is “text” on the server.

6. If you don't want search engine spiders to index any of the files in a particular directory, you don't have to list all the files like this:

User-agent: *
Disallow: /text/orders.html
Disallow: /text/technical.html
Disallow: /text/textdesk.html
Disallow: /text/index.htm

You should replace this with

User-agent: *
Disallow: /text

7. Don't rely on the “Allow” command

The original robots.txt standard has no “Allow” command, so some spiders may not understand it, even though major search engines such as Google now support it. List only the files you don't want indexed. All others will be indexed automatically if they are linked on your site.
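
Several of the mistakes above (leading white space, a Disallow line before its User-agent line, and more than one path in a Disallow line) are easy to check mechanically. The following is a small sketch of such a check; lint_robots_txt is an invented helper, and it covers only those three rules.

```python
def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for three formatting mistakes from the list above."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            seen_user_agent = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0]  # drop any trailing comment
        if not line.strip():
            continue  # comment-only line
        if line[0].isspace():
            warnings.append(f"line {number}: leading white space")
        field, _, value = line.strip().partition(":")
        field = field.strip().lower()
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                warnings.append(f"line {number}: Disallow before User-agent")
            if len(value.split()) > 1:
                warnings.append(f"line {number}: more than one path in Disallow")
    return warnings


bad = "  User-agent: *\nDisallow: /text /cgi-bin/ /images/\n\nDisallow: /text\nUser-agent: *\n"
for warning in lint_robots_txt(bad):
    print(warning)
```

A file that follows the rules, such as the corrected example under point 4, produces an empty list.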

Tips and tricks:

1. How to allow all search engine spiders to index all your files:

Use the following settings:
User-agent: *
Disallow:
The search engine spiders will index all your files.

2. How to prevent spiders from indexing any of your files:
User-agent: *
Disallow: /

Then the search engines will not index any of your website's files.

3. Where to find more complex examples.

View the robots.txt files of big Web sites to see more complex examples:

http://www.pal-stu.com/robots.txt
http://www.cnn.com/robots.txt
http://www.nytimes.com/robots.txt
http://www.spiegel.com/robots.txt
http://www.ebay.com/robots.txt
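
The two extreme settings from tips 1 and 2 can be compared side by side with Python's standard urllib.robotparser module. This is a sketch of how that one parser reads the files; the helper name allowed and the sample page URL are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Report whether agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # tip 1: blank Disallow line
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # tip 2: Disallow the root

# Hypothetical page on the example site.
page = "http://www.yourwebsite.com/text/index.html"
print(allowed(ALLOW_ALL, "googlebot", page))  # True
print(allowed(BLOCK_ALL, "googlebot", page))  # False
```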




If you want to have good rankings on search engines, your site should have a proper robots.txt file. Once search engines know what to do with your pages, they can give your site a good ranking.