<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Seeing lots of Wikipedia in your Google searches?</title>
	<atom:link href="http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/</link>
	<description>In pursuit of The Idea</description>
	<lastBuildDate>Fri, 12 Mar 2010 15:32:11 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: goabroad</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-18507</link>
		<dc:creator>goabroad</dc:creator>
		<pubDate>Wed, 29 Apr 2009 13:14:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-18507</guid>
		<description>Good work. I like seeing Wiki in Google search results, as it gives you the most relevant and comprehensive information.</description>
		<content:encoded><![CDATA[<p>Good work. I like seeing Wiki in Google search results, as it gives you the most relevant and comprehensive information.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 垂死爭扎</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1180</link>
		<dc:creator>垂死爭扎</dc:creator>
		<pubDate>Mon, 13 Nov 2006 14:34:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1180</guid>
		<description>&lt;strong&gt;Wikipedia conquers search engines...&lt;/strong&gt;

When people search with Google or Yahoo, they find that the first few results always include a Wikipedia entry, and the trend keeps growing....</description>
		<content:encoded><![CDATA[<p><strong>Wikipedia conquers search engines&#8230;</strong></p>
<p>When people search with Google or Yahoo, they find that the first few results always include a Wikipedia entry, and the trend keeps growing.&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: WikiAngela</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1139</link>
		<dc:creator>WikiAngela</dc:creator>
		<pubDate>Sat, 11 Nov 2006 01:45:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1139</guid>
		<description>&lt;strong&gt;MSN doesn’t like Wikipedia… or Encarta...&lt;/strong&gt;

[Trackback] Jure Čuhalev has published some interesting results from a small study of how three search engines rank Wikipedia in a search for Wikipedia article titles.
...
A really odd result that Jure sent me by email is that MSN does not link to.....</description>
		<content:encoded><![CDATA[<p><strong>MSN doesn&#8217;t like Wikipedia&#8230; or Encarta&#8230;</strong></p>
<p>[Trackback] Jure Čuhalev has published some interesting results from a small study of how three search engines rank Wikipedia in a search for Wikipedia article titles.<br />
&#8230;<br />
A really odd result that Jure sent me by email is that MSN does not link to&#8230;..</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: eszter</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1131</link>
		<dc:creator>eszter</dc:creator>
		<pubDate>Fri, 10 Nov 2006 06:34:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1131</guid>
		<description>I&#039;ve posted this comment over on Nick Carr&#039;s blog and Micropersuasion, but thought I&#039;d add it here as well. I agree with Philipp that the methodology is flawed.

The starting point needs to be what users search for not what Wikipedia covers. By sampling from existing Wikipedia entries you are sampling on the dependent variable. By definition the study is controlling for the fact that a relevant Wikipedia entry exists using that query since you derived the search terms from existing Wikipedia titles. Queries on those exact terms are going to favor pages that have the term in the title. But who is to say that people search for those topics using those terms?

You could try using the AOL data for some possibilities (like Philipp suggests), but we don&#039;t really know how representative AOL users are of all Internet users. You could get some ideas from Google&#039;s Zeitgeist (as per Bertil&#039;s suggestion), although that will only give you extremely common topics that may have tons of results and so may well be atypical results not reflecting the likelihood of a Wikipedia result for less common terms and topics. 

I do research on how users look for various types of information online. If interested, we could discuss the possibility of you using some of the terms people in my study - average Internet users - entered on search forms for various types of content. I may not have quite the sample size you&#039;re looking for, but I&#039;d have some queries from real folks. (I also happen to know what they clicked on when using a particular search engine so that could also be interesting additional data.)</description>
		<content:encoded><![CDATA[<p>I&#8217;ve posted this comment over on Nick Carr&#8217;s blog and Micropersuasion, but thought I&#8217;d add it here as well. I agree with Philipp that the methodology is flawed.</p>
<p>The starting point needs to be what users search for not what Wikipedia covers. By sampling from existing Wikipedia entries you are sampling on the dependent variable. By definition the study is controlling for the fact that a relevant Wikipedia entry exists using that query since you derived the search terms from existing Wikipedia titles. Queries on those exact terms are going to favor pages that have the term in the title. But who is to say that people search for those topics using those terms?</p>
<p>You could try using the AOL data for some possibilities (like Philipp suggests), but we don&#8217;t really know how representative AOL users are of all Internet users. You could get some ideas from Google&#8217;s Zeitgeist (as per Bertil&#8217;s suggestion), although that will only give you extremely common topics that may have tons of results and so may well be atypical results not reflecting the likelihood of a Wikipedia result for less common terms and topics. </p>
<p>I do research on how users look for various types of information online. If interested, we could discuss the possibility of you using some of the terms people in my study &#8211; average Internet users &#8211; entered on search forms for various types of content. I may not have quite the sample size you&#8217;re looking for, but I&#8217;d have some queries from real folks. (I also happen to know what they clicked on when using a particular search engine so that could also be interesting additional data.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jure Cuhalev</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1128</link>
		<dc:creator>Jure Cuhalev</dc:creator>
		<pubDate>Thu, 09 Nov 2006 22:56:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1128</guid>
		<description>Philipp Lenssen: I agree with you and I&#039;m working on a better methodology. Still, the methodology is fully disclosed, so at least I&#039;m not pulling results/queries out of the air and claiming something.</description>
		<content:encoded><![CDATA[<p>Philipp Lenssen: I agree with you and I&#8217;m working on a better methodology. Still, the methodology is fully disclosed, so at least I&#8217;m not pulling results/queries out of the air and claiming something.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Micro Persuasion</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1118</link>
		<dc:creator>Micro Persuasion</dc:creator>
		<pubDate>Thu, 09 Nov 2006 18:08:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1118</guid>
		<description>&lt;strong&gt;Study Finds Google Favors Wikipedia...&lt;/strong&gt;

Use Google? Well there&#039;s an 81% likelihood that you will see results from Wikipedia in the top 10 search results, according to an analysis performed recently by Jure Cuhalev. He ran 1000 random terms from Wikipedia into Google, MSN Search...</description>
		<content:encoded><![CDATA[<p><strong>Study Finds Google Favors Wikipedia&#8230;</strong></p>
<p>Use Google? Well there&#8217;s an 81% likelihood that you will see results from Wikipedia in the top 10 search results, according to an analysis performed recently by Jure Cuhalev. He ran 1000 random terms from Wikipedia into Google, MSN Search&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Undercurrent</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1116</link>
		<dc:creator>Undercurrent</dc:creator>
		<pubDate>Thu, 09 Nov 2006 15:11:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1116</guid>
		<description>&lt;strong&gt;Wikipedia search engine dominance revisited...&lt;/strong&gt;

Recently many have noted that Wikipedia articles show up on the first page of hits in the big search engines (samples: Nicholas Carr, myself.) g. (a.k.a. Jure Cuhalev, according to Wikipedia Signpost), checked the tendency more thoroughly by reviewing ...</description>
		<content:encoded><![CDATA[<p><strong>Wikipedia search engine dominance revisited&#8230;</strong></p>
<p>Recently many have noted that Wikipedia articles show up on the first page of hits in the big search engines (samples: Nicholas Carr, myself.) g. (a.k.a. Jure Cuhalev, according to Wikipedia Signpost), checked the tendency more thoroughly by reviewing &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philipp Lenssen</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1049</link>
		<dc:creator>Philipp Lenssen</dc:creator>
		<pubDate>Wed, 01 Nov 2006 11:04:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1049</guid>
		<description>I think to determine &quot;how much Wikipedia people see on top of Google&quot; you&#039;d have to change your methodology -- e.g. use actual AOL query data (and even then you&#039;d have the big constraint that AOL searchers may not be typical, but it would be a start, and as a bonus you&#039;d also know what they clicked on).

The fact that searching for Wikipedia titles often brings up Wikipedia doesn&#039;t, IMO, yield relevant results, unless you want to show that Wikipedia has lots of pages indexed in search engines (over 53 million in Google, according to Google&#039;s &quot;site&quot; operator). But lots of pages indexed does not mean lots of pages will show up in search results. For Wikipedia, we all *know* that&#039;s the case from our searching experience, but to come up with statistically relevant data you&#039;d have to use real sample queries for probing.</description>
		<content:encoded><![CDATA[<p>I think to determine &#8220;how much Wikipedia people see on top of Google&#8221; you&#8217;d have to change your methodology &#8212; e.g. use actual AOL query data (and even then you&#8217;d have the big constraint that AOL searchers may not be typical, but it would be a start, and as a bonus you&#8217;d also know what they clicked on).</p>
<p>The fact that searching for Wikipedia titles often brings up Wikipedia doesn&#8217;t, IMO, yield relevant results, unless you want to show that Wikipedia has lots of pages indexed in search engines (over 53 million in Google, according to Google&#8217;s &#8220;site&#8221; operator). But lots of pages indexed does not mean lots of pages will show up in search results. For Wikipedia, we all *know* that&#8217;s the case from our searching experience, but to come up with statistically relevant data you&#8217;d have to use real sample queries for probing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jure Cuhalev</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1047</link>
		<dc:creator>Jure Cuhalev</dc:creator>
		<pubDate>Tue, 31 Oct 2006 23:41:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1047</guid>
		<description>Bertil, thanks for the comments. I&#039;ll email you about the details, but until then here are quick answers to the questions:

1. About ask.com: I would love to include more search engines, but only the &quot;big three&quot; offer public APIs that I could use to query the data without having to write my own search engine scraper.

If you know of any other search engines that offer some sort of API or another way to automatically query for data, I would be happy to include them.

2. What kind of dynamic data? I have another version that I have tested but not yet published, in which I take queries from WP:RecentChanges within a certain time window so that I query only for active pages. Those numbers would probably give even more pro-Wikipedia results.

If there is a good source of such data, it would certainly be interesting to run the study on it.

3. Sure, Zeitgeist sounds like a good idea, but it&#039;s probably easier for you to just do it manually than for me to feed it into my system.

If you can email me details on how to get more detailed Zeitgeist information, I would be *very* happy to repeat it on that dataset.</description>
		<content:encoded><![CDATA[<p>Bertil, thanks for the comments. I&#8217;ll email you about the details, but until then here are quick answers to the questions:</p>
<p>1. About ask.com: I would love to include more search engines, but only the &#8220;big three&#8221; offer public APIs that I could use to query the data without having to write my own search engine scraper.</p>
<p>If you know of any other search engines that offer some sort of API or another way to automatically query for data, I would be happy to include them.</p>
<p>2. What kind of dynamic data? I have another version that I have tested but not yet published, in which I take queries from WP:RecentChanges within a certain time window so that I query only for active pages. Those numbers would probably give even more pro-Wikipedia results.</p>
<p>If there is a good source of such data, it would certainly be interesting to run the study on it.</p>
<p>3. Sure, Zeitgeist sounds like a good idea, but it&#8217;s probably easier for you to just do it manually than for me to feed it into my system.</p>
<p>If you can email me details on how to get more detailed Zeitgeist information, I would be *very* happy to repeat it on that dataset.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bertil</title>
		<link>http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/comment-page-1/#comment-1045</link>
		<dc:creator>Bertil</dc:creator>
		<pubDate>Tue, 31 Oct 2006 22:33:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.kiberpipa.org/~gandalf/blog/?p=66#comment-1045</guid>
		<description>Hi!
Great study: I&#039;ve only skimmed it so far, but there are three &quot;obvious&quot; remarks to be made.

1. What about Ask.com? Have you considered smaller, alternative search engines for comparison purposes? I&#039;m thinking of an open-source one (sic) that might be a good baseline.

2. Please, please, do it again, to have some dynamic data... I&#039;d love to help, if it is too much work. (You&#039;ve got my e-mail, though it&#039;s not public, right?)

3. Could you use Zeitgeist info, instead of a Wikipedia-biased query file?
This only has 10 items or so, http://www.google.com/press/zeitgeist.html
but I believe you might obtain a list of the top 100, unweighted, sorted alphabetically, from one of the four big search engines; you can even sign an agreement not to publish it.

I might post another comment once I have finished reading it in full detail.</description>
		<content:encoded><![CDATA[<p>Hi!<br />
Great study: I&#8217;ve only skimmed it so far, but there are three &#8220;obvious&#8221; remarks to be made.</p>
<p>1. What about Ask.com? Have you considered smaller, alternative search engines for comparison purposes? I&#8217;m thinking of an open-source one (sic) that might be a good baseline.</p>
<p>2. Please, please, do it again, to have some dynamic data&#8230; I&#8217;d love to help, if it is too much work. (You&#8217;ve got my e-mail, though it&#8217;s not public, right?)</p>
<p>3. Could you use Zeitgeist info, instead of a Wikipedia-biased query file?<br />
This only has 10 items or so, <a href="http://www.google.com/press/zeitgeist.html" rel="nofollow">http://www.google.com/press/zeitgeist.html</a><br />
but I believe you might obtain a list of the top 100, unweighted, sorted alphabetically, from one of the four big search engines; you can even sign an agreement not to publish it.</p>
<p>I might post another comment once I have finished reading it in full detail.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
