Ever wondered how Google works? How does it show the most relevant results when you type in a query? The web has billions of pages and websites, so how does the search engine return results within a fraction of a second? Wouldn’t it be interesting to know how it searches for our query in such a huge repository of data? Google has published a couple of good resources on this, including an infographic and a video by Matt Cutts. So let’s get started with the world of “How Google works”.
1) Crawling and Indexing
The journey of a query starts before you ever type a search, with crawling and indexing the web of trillions of documents.
Long before you search for anything on the web, search engines such as Google send out bots, which are automated programs that browse the web and discover pages. This process is called crawling. Your query is later answered from the index built out of those crawled pages, not by crawling the live web at search time.
Google uses software known as “web crawlers” to discover publicly available webpages. The most well-known crawler is called “Googlebot.” Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers. (Google’s definition)
Google stores information about the pages its crawlers bring back and organizes it by the words that appear on each page, not by the keywords a particular user happens to enter. Because this index is built ahead of time, a query can be answered by looking up its words in the index instead of scanning the web again. This process of organizing pages by the words they contain is called indexing.
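To make "go from link to link" concrete, here is a minimal sketch of a crawler's traversal logic. It is an illustration only, not how Googlebot is implemented: the `TOY_WEB` dictionary and the `crawl` function are hypothetical names, and the "web" is an in-memory graph rather than real HTTP requests.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

def crawl(seed, web):
    """Visit pages breadth-first, following links and skipping pages already seen."""
    seen = {seed}
    queue = deque([seed])
    visited_order = []
    while queue:
        url = queue.popleft()
        visited_order.append(url)
        # "Follow links on those pages": enqueue every link not discovered yet.
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited_order

print(crawl("a.example/", TOY_WEB))
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `a.example/about` does here.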
Google essentially gathers the pages during the crawl process and then creates an index, so we know exactly how to look things up. Much like the index in the back of a book, the Google index includes information about words and their locations. When you search, at the most basic level, our algorithms look up your search terms in the index to find the appropriate pages.
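The book-index comparison above maps onto a data structure usually called an inverted index. The sketch below is a simplified assumption of what "words and their locations" could look like. `build_index` and `lookup` are hypothetical helper names, and a real search index is far more elaborate.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to {url: [positions of that word on the page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(position)
    return index

def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], {}))
    for word in words[1:]:
        results &= set(index.get(word, {}))
    return results

# Made-up pages, for illustration only.
pages = {
    "a.example/": "how search engines work",
    "b.example/": "search tips and tricks",
    "c.example/": "cooking tips",
}
index = build_index(pages)
print(sorted(lookup(index, "search tips")))
```

Answering a query only touches the entries for the query's words, which is why a lookup stays fast even when the number of indexed pages is huge.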
- Google navigates the web by crawling.
- It follows links from page to page.
- It sorts the pages by their content and other factors.
- It keeps track of everything in the index.
2) Algorithms
You want the answer, not trillions of webpages. Algorithms are computer programs that look for clues to give you back exactly what you want.
For a typical query, there are thousands, if not millions, of webpages with helpful information. Algorithms are the computer processes and formulas that take your questions and turn them into answers. Today Google’s algorithms rely on more than 200 unique signals or “clues” that make it possible to guess what you might really be looking for. These signals include things like the terms on websites, the freshness of content, your region and PageRank.
- User types a query to search.
- Algorithms get to work looking for clues to better understand what you mean.
- Based on these clues, it pulls relevant documents from the index.
- It ranks the results based on various factors (such as SafeSearch, site quality and user context).
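The ranking step in the list above can be pictured as combining several signals into one score per page. The sketch below is only a toy: the signal names, the weights in `WEIGHTS` and the example values are all made up for illustration, and Google's actual signals and how they are combined are not public.

```python
# Hypothetical weights; the real signals and their weighting are not public.
WEIGHTS = {"term_match": 0.5, "freshness": 0.2, "link_score": 0.3}

def score(signals):
    """Combine a page's signals into a single number using a weighted sum."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

def rank(candidates):
    """Return candidate URLs ordered from highest to lowest score."""
    return sorted(candidates, key=lambda url: score(candidates[url]), reverse=True)

# Made-up signal values (0.0 to 1.0) for pages already pulled from the index.
candidates = {
    "a.example/": {"term_match": 0.9, "freshness": 0.2, "link_score": 0.5},
    "b.example/": {"term_match": 0.6, "freshness": 0.9, "link_score": 0.9},
    "c.example/": {"term_match": 0.3, "freshness": 0.5, "link_score": 0.1},
}
print(rank(candidates))
```

Here `b.example/` ends up first even though `a.example/` matches the query terms better, because its freshness and link signals outweigh the difference. That is the basic idea behind using many signals rather than term matching alone.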
3) Fighting Spam
Every day, millions of useless spam pages are created. We fight spam through a combination of computer algorithms and manual review.
Spam sites attempt to game their way to the top of search results through techniques like repeating keywords over and over, buying links that pass PageRank or putting invisible text on the screen. This is bad for search because relevant websites get buried, and it’s bad for legitimate website owners because their sites become harder to find. The good news is that Google’s algorithms can detect the vast majority of spam and demote it automatically. For the rest, we have teams who manually review sites.
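As an illustration of how an algorithm might spot one of these tricks, "repeating keywords over and over", here is a deliberately naive check. It is an assumption made for this post, not Google's method: `top_word_share`, `looks_stuffed`, the `threshold` and the `min_words` cutoff are all hypothetical.

```python
from collections import Counter

def top_word_share(text):
    """Fraction of the text taken up by its single most frequent word."""
    words = text.lower().split()
    if not words:
        return 0.0
    most_common_count = Counter(words).most_common(1)[0][1]
    return most_common_count / len(words)

def looks_stuffed(text, threshold=0.3, min_words=8):
    """Flag text in which one word dominates; very short texts are never flagged."""
    if len(text.split()) < min_words:
        return False
    return top_word_share(text) > threshold

print(looks_stuffed("cheap shoes cheap shoes cheap cheap cheap buy cheap"))
print(looks_stuffed("our store sells running shoes and hiking boots for every season"))
```

A check this simple would misfire on ordinary text dominated by common words and would miss the other tricks mentioned above, such as bought links and invisible text, which is one reason manual review still has a role.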
Google is always fighting spam to keep the results as relevant as possible. The majority of spam removal is automatic, and Google examines other questionable documents by hand. If reviewers find spam, they take manual action and then notify the site owner, so the owner can fix the problem and let Google know.
Overall, this is a great resource from Google for people who are new to the topic. It gives a very basic idea of how search works, and it can be very helpful for people who just want an overview of the process instead of going deep into technicalities. Here’s a video by Matt Cutts.