Restrict Content from External Search Engines

The robots.txt file is a text file placed on your web server that tells web crawlers, such as Googlebot, whether they may access a page. Web crawlers are programs that traverse the web automatically. Search engines such as Google use them to index web content.

It works like this: a web crawler wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /some-course

The "User-agent: *" means this section applies to all robots. The "Disallow: /some-course" tells the robot that it should not visit /some-course or any other URL whose path begins with /some-course.
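To see how a crawler might interpret these rules, here is a minimal sketch using Python's standard-library urllib.robotparser module. It assumes the rules are supplied as a string rather than fetched from a server, and /some-course/module-1 is a hypothetical sub-page added for illustration. Real crawlers such as Googlebot use their own parsers, so treat this as an approximation of their behavior.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, one directive per line.
rules = """\
User-agent: *
Disallow: /some-course
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a crawler identifying itself as "Googlebot" may fetch each URL.
blocked = parser.can_fetch("Googlebot", "http://www.example.com/some-course")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/welcome.html")
# Hypothetical sub-page: Disallow matches by path prefix, so this is blocked too.
sub_page = parser.can_fetch("Googlebot", "http://www.example.com/some-course/module-1")

print(blocked)   # False
print(allowed)   # True
print(sub_page)  # False
```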

The robots.txt file can be edited directly from EthosCE.

Warning

Do not remove the lines of text that already exist in your robots.txt file. Add your changes to the bottom.

To edit the robots.txt file

  1. Log in as a user with the site admin role.

  2. Click the wrench to open the admin menu.

  3. Click "Configuration".

  4. Click "Search and metadata".

  5. Click "RobotsTxt".

  6. Scroll to the bottom of the "Contents of robots.txt" field.

  7. Add your new entries, one per line.

  8. Click "Save configuration".
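The effect of adding entries at the bottom can be sketched in Python before you make the change on your site. This is only an illustration: the existing contents shown here are a made-up two-line file, not the actual EthosCE default, and append_disallow_rules is a hypothetical helper. The sketch also assumes the existing file ends inside a "User-agent: *" section, so the appended lines become part of that section.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def append_disallow_rules(existing: str, paths: Iterable[str]) -> str:
    """Return the existing text unchanged, with Disallow lines added at the bottom."""
    body = existing.rstrip("\n")
    new_lines = [f"Disallow: {path}" for path in paths]
    return "\n".join([body, *new_lines]) + "\n"


# Hypothetical existing contents; your real file will contain more lines.
existing = "User-agent: *\nDisallow: /admin/\n"
updated = append_disallow_rules(existing, ["/some-course"])

# Check that the old rule still applies and the new one takes effect.
check = RobotFileParser()
check.parse(updated.splitlines())
old_rule_blocked = not check.can_fetch("Googlebot", "http://www.example.com/admin/")
new_rule_blocked = not check.can_fetch("Googlebot", "http://www.example.com/some-course")

print(updated)
```

The helper only ever appends, which mirrors the warning above about leaving the existing lines in place.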

Note that adding lines to the robots.txt file will not remove content that Google or other search engines have already indexed. It only asks crawlers not to visit those pages on future visits.