Search engines do not search the World Wide Web directly; instead, they search a database of web pages cached by spiders. Spiders, also known as robots or crawlers, are the part of a search engine that automatically fetches web pages from across the Web and stores them in that database, so the search engine has pages to display in its results.
When a web page is submitted to a search engine, its URL is added to the queue of pages the spider will visit. A spider can also reach a page through links on other pages: while crawling, it collects every link it finds and adds it to the queue.
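The queue-driven crawl described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real spider; `fetch_links` is a hypothetical helper supplied by the caller that returns the links found on a page.

```python
from collections import deque
from urllib.parse import urljoin

def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages breadth-first, queuing every newly discovered link.

    fetch_links is a caller-supplied (hypothetical) function that
    returns the link targets found on a given page.
    """
    queue = deque(seed_urls)   # URLs waiting to be visited
    visited = set()            # URLs already crawled
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue           # skip pages we have already crawled
        visited.add(url)
        for link in fetch_links(url):
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in visited:
                queue.append(absolute)     # enqueue for a later visit
    return visited
```

A real spider would fetch each URL over HTTP and parse the HTML for links; here the `visited` set plays the role of the search engine's record of crawled pages.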
When a spider visits a website, it first checks whether a robots.txt file exists in the site's root directory. If it does, the spider follows the rules specified there and does not visit the pages the file disallows. The spider then crawls the remaining pages one by one, storing each page's content (text, images, links, page title, description, meta keywords) in the search engine's index under the page's URL. Because most web pages contain links, this process never stops: spiders continually visit new pages as well as old ones, and when a spider re-crawls a page it replaces the older copy in the index with the latest one.
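Python's standard library ships a parser for exactly this robots.txt check, so the step a spider performs before crawling can be shown directly. The rules string below is illustrative, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content: everything under /private/ is off-limits.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # a spider would fetch /robots.txt instead

# The spider consults the parser before visiting each URL.
allowed = parser.can_fetch("*", "https://example.com/index.html")
blocked = parser.can_fetch("*", "https://example.com/private/data.html")
```

In a live crawler, `RobotFileParser.set_url(...)` plus `read()` would download the site's actual robots.txt; every queued URL is then passed through `can_fetch` before being requested.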