Personal Evaluations of Search Engines:
Google, Yahoo! and Live search
Department of Computer Science
Date of Evaluation: Fall, 2006
Abstract
In many ways, search engines have become the most important tool for our information seeking. Because of their tremendous economic value, search engine companies constantly make major efforts to improve their search results. Measuring search effectiveness is thus an important issue. Although many evaluations of different search engines have been done in the past, they mainly used fixed sets of queries and judged the relevance of each returned page with a panel of human judges [1, 3, 4, 5, 8, 9]. The results were often measured by precision and recall, as in information retrieval. However, this evaluation method is by no means ideal, because relevance does not equal user satisfaction. User satisfaction can only be measured using queries drawn from the user's daily information needs and based on his/her personal assessment of the utility of the returned results. An ideal evaluation is thus a personal evaluation. In this article, I describe and summarize personal evaluations by 25 people of three major search engines: Google, Yahoo! and Live search.
1. Introduction
The motivation for this evaluation was mainly to satisfy my own curiosity: how different are the main search engines in terms of their search effectiveness? Over the years, I have heard people say both that they are very different and that they are quite similar. I decided to find out for myself. One day in summer 2006, I decided to force myself to use only Live search.
As we all know, due to the information redundancy on the Web, it is easy to find a huge number of relevant pages for almost any query. Thus page relevance is no longer a major issue. The usefulness/utility of each top-ranked page to each individual user becomes the key. The evaluation of usefulness can only be done based on queries derived from the user's personal information needs and his/her personal perception of the utility of the returned results. It was this belief that guided this evaluation. Such an evaluation can truly tell us why users choose one search engine over another. The results also show the weaknesses of each search engine, and hence tell each search engine company where it should focus its efforts in order to improve its search effectiveness.
2. Evaluation Setup
The evaluation was conducted in September 2006 with the 25 students in a data mining and text mining class at the Department of Computer Science.
The students were split into three groups, and the evaluation was done over a span of three weeks. The students were asked to use only one search engine in each week; other search engines could be employed only if the designated search engine was unable to give a satisfactory result. In order not to favor any search engine, and to prevent positive or negative sentiment about one search engine from affecting the evaluation of the next, each group was assigned a different search engine in each week, with the following schedule:
            Week 1         Week 2         Week 3
Group 1:    Google         Live search    Yahoo!
Group 2:    Yahoo!         Google         Live search
Group 3:    Live search    Yahoo!         Google
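The rotation above is a 3x3 Latin square: in any given week no two groups use the same engine, and over the three weeks every group uses every engine exactly once. A minimal sketch of how such a counterbalanced assignment can be generated (my own construction for illustration; the article simply lists the assignment):

```python
# Counterbalanced (Latin-square) assignment of engines to groups by week:
# rotating the engine list by one position per week guarantees that each
# week all three engines are in use, and each group sees each engine once.
ENGINES = ["Google", "Yahoo!", "Live search"]

def schedule(group, week):
    """Engine for a 0-indexed group in a 0-indexed week."""
    return ENGINES[(group - week) % len(ENGINES)]

if __name__ == "__main__":
    for w in range(3):
        row = ", ".join(f"Group {g + 1}: {schedule(g, w)}" for g in range(3))
        print(f"Week {w + 1}: {row}")
```

Any rotation direction works; the only property that matters for the study design is that the assignment is a Latin square.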
We did not have a user interface system that could hide the identities of the search engines, so the students knew which search engine they were using. Thus, the evaluation could be slightly affected by their preconceptions about each search engine. However, I explicitly told the students to be as fair as possible, not to be influenced by any preconception of which search engine is better, and not to factor the efficiency (or speed) of each system into their evaluation (efficiency is nevertheless an important issue; see Section 4).
As indicated above, our evaluation had no fixed queries. The students were asked to perform their daily searches as usual, based on their daily information needs, without any change. The only requirement was that they stick to the same search engine for the week, turning to another search engine only if the first one did not give good results. They were also asked to record two pieces of simple information for each query: the type of query and the level of personal satisfaction. No precision or rank positions of the search results were measured.
In the existing literature, two main types of queries have been identified [2, 7]: navigational queries and informational queries.
A navigational query is one that usually has only one satisfactory result; that is, there is a unique page that the user is looking for. For example, the user types "CNN" and expects to find the Web site of CNN.com, or types the name of a researcher and expects to find his/her homepage.
An informational query can have a few or many appropriate results, with varying degrees of relevance or utility and varying degrees of authority. Many times the user may need to read a few pages to get the complete information. For example, the user types the topic "search engine evaluation" and expects to be provided with pages related to the topic.
Note that the taxonomies of search queries proposed in [2, 7] include sub-categories at multiple levels, as well as another large category called transactional queries. However, there is still no general consensus on the classification. I used only the two most frequent types in order to keep the evaluation simple and less burdensome for the students (and not to confuse them).
I used only three levels of personal satisfaction: completely satisfied, partially satisfied and not satisfied (again for simplicity). Each student decided for him/herself the satisfaction level for the results of each query, without any given criteria.
On possible bias in the evaluation: I believe that for navigational queries, bias from the students' preconceptions was unlikely, because they knew exactly which pages they were looking for. For informational queries, some slight bias may exist, but it should be minimized by my warning above. As we will see later, it is on navigational queries that Google has a huge lead over both Yahoo! and Live search.
3. Evaluation Results
Table 1 gives the results for navigational queries, and Table 2 gives the results for informational queries. In each table, columns 2, 3 and 4 are the results for the three search engines. The first data row shows the number of not-satisfied queries for each search engine; the next two rows give the corresponding numbers for partially satisfied and completely satisfied queries. The number in parentheses in each cell is the percentage of that column's queries falling into the cell. The final row gives the total number of queries for each search engine.
Table 1: Results for navigational queries.

Navigational query      Google       Yahoo!       Live search
Not satisfied           15 (4%)      37 (14%)     49 (19%)
Partially satisfied     39 (11%)     60 (23%)     70 (27%)
Completely satisfied    303 (85%)    166 (63%)    141 (54%)
Total                   357          263          260
Table 2: Results for informational queries.

Informational query     Google       Yahoo!       Live search
Not satisfied           137 (21%)    103 (21%)    110 (21%)
Partially satisfied     93 (14%)     149 (30%)    162 (31%)
Completely satisfied    416 (65%)    247 (49%)    257 (48%)
Total                   646          499          529
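As a quick sanity check (mine, not part of the article), each percentage in the tables is simply the cell count divided by its column total. Recomputing Table 1 from the raw counts reproduces the printed percentages exactly:

```python
# Recompute the Table 1 percentages from the raw counts.
# Row order: not satisfied, partially satisfied, completely satisfied.
table1 = {
    "Google":      [15, 39, 303],
    "Yahoo!":      [37, 60, 166],
    "Live search": [49, 70, 141],
}

def percentages(counts):
    """Each count as a rounded percentage of the column total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

if __name__ == "__main__":
    for engine, counts in table1.items():
        print(engine, percentages(counts), "total =", sum(counts))
```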
From Table 1, we observe that for navigational queries, Google is dramatically better than Yahoo! and Live search: it completely satisfied 85% of such queries, versus 63% for Yahoo! and 54% for Live search.
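The article reports only counts and percentages. As an added illustration (treating each query as an independent trial, which the study design does not strictly guarantee, since each student contributed many queries), a pooled two-proportion z-test on the completely-satisfied navigational counts from Table 1 (303/357 for Google vs. 166/263 for Yahoo!) shows the gap is far too large to be sampling noise:

```python
import math

def two_proportion_z(success1, n1, success2, n2):
    """Pooled two-proportion z statistic for comparing two rates."""
    p1, p2 = success1 / n1, success2 / n2
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

if __name__ == "__main__":
    # Completely satisfied navigational queries, Google vs. Yahoo! (Table 1).
    z = two_proportion_z(303, 357, 166, 263)
    print(f"z = {z:.2f}")  # values above ~3 are overwhelming evidence
```

The statistic comes out above 6, so even under this simplified independence assumption the navigational-query difference is highly significant.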
For informational queries, Google is also better than the other two. However, the difference is much smaller than that for navigational queries. Yahoo! and Live search performed quite similarly on these queries.
From the results in both tables, we can clearly see why Google dominates the marketplace. Google's key strength is its ability to find the right page for almost every navigational query!
We noticed that students had more informational queries than navigational queries. This difference may simply be due to the fact that they were students and thus searched for information on more complex topics. For the general public, the situation may be reversed. We also note that there were more queries for Google than for Yahoo! and Live search.
4. Before and After the Evaluation
Right before the evaluation, I asked the students which search engines they were using daily. Everyone answered Google. Only 4 students said they had tried Yahoo! search a few times before. No student had used Live search. After the evaluation, I asked the students two questions:
1. Has the evaluation changed your perceptions on the three search engines?
2. Would you consider switching to a different search engine?
The answer to question 1 was yes for 80% of the students. They said that Yahoo! and Live search were better than they had expected.
A final note is that the students raised the issue that both Yahoo! and Live search were slower than Google.
5. Conclusion
This article reported a group of 25 personal evaluations of three search engines: Google, Yahoo! and Live search.
Final thoughts: Google was remarkably better than Yahoo! and Live search in the September 2006 evaluation. However, Google may be reaching the limit of the current search paradigm, and further improvement will take much more effort (possibly an exponential amount of effort for a linear gain). It is thus time for both Yahoo! and Live search to catch up, which is easier to do. Based on the evaluation results, I believe it is going to be very hard for either of them to overtake Google (unless Google makes bad decisions), but getting close to Google is very likely, and in my opinion that will happen in the not-too-distant future.
Acknowledgements
I would like to thank the 25 students in my fall 2006 CS583 class for their participation in the evaluation. Xiaowen Ding helped analyze the evaluation results. Some discussions with Zijian Zheng (Microsoft) and Ramakrishnan Srikant (Google) helped improve the presentation.
References