Web Science/Part2: Emerging Web Properties/Search Engine Ecosystem

Survival of the fittest
 * Fit for whom?
 * Search engine operator, search users, advertisers
 * Unfit for spammers
 * Key performance indicators (multi-criteria optimization problem!)
 * Value per click
 * User: usability, relevance of search results, coverage of the Web
 * Operator: advertising revenues, low cost and scalable technical infrastructure, low personnel costs
 * Advertiser: click-through and conversion rate

What is a search engine?

 * Why is it important?
 * What is keyword search?

Search engine history

 * Archie, 1990
 * Gopher, 1991
 * WebCrawler, Lycos, Yahoo search 1994
 * AltaVista search 1996
 * Google search 1998
 * Sequels: Baidu, Yandex, Bing
 * Alternatives: ask.com, wolframalpha.com
 * Vertical search: for products: amazon.com, for people: peoplefinder.com, for egosearch (identity theft prevention): garlik.com, ...

Search system architecture

 * what is a web crawler
 * what is a search index (inverted index)
 * (for now) blackbox ranking
 * binary search relevance
 * interface (auto completion, search results,...)
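
The index and the binary relevance step listed above can be illustrated with a small sketch. It is a minimal example under stated assumptions, not how any particular search engine works: the tokenizer is naive whitespace splitting, and the names `build_inverted_index` and `boolean_search` as well as the three sample documents are made up for illustration.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def boolean_search(index, query):
    """Return ids of documents containing every query term (binary relevance)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini corpus: document id -> text.
docs = {
    1: "web search engine",
    2: "web crawler follows links",
    3: "search index for the web",
}
index = build_inverted_index(docs)
hits = boolean_search(index, "web search")  # documents 1 and 3 contain both terms
```

A document is either a hit or not; ordering the hits is left to the ranking step, which is treated as a black box at this point.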

Ranking in search I: application of tf-idf

 * show how tf-idf can be used for ranking
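
One way tf-idf can be turned into a ranking is sketched below. The variant used here is an assumption (term frequency divided by document length, idf as the natural logarithm of N divided by document frequency); other weightings are common. The function name `tf_idf_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(docs, query):
    """Score each document by the sum of tf * idf over the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        counts = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if term in counts:
                tf = counts[term] / len(terms)      # relative term frequency
                idf = math.log(n_docs / df[term])   # rarer terms weigh more
                score += tf * idf
        scores[doc_id] = score
    return scores

# Hypothetical mini corpus.
docs = {
    "a": "search engine ranking",
    "b": "search search search",
    "c": "cooking recipes",
}
scores = tf_idf_scores(docs, "search ranking")
ranking = sorted(scores, key=scores.get, reverse=True)
```

Document "b" repeats "search" but never mentions the rarer term "ranking", so it ends up below "a"; document "c" matches nothing and scores zero.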

ranking in search II: random surfer model

 * explaining random surfer model
 * Figure: PageRank random surfer animation (Pagerank Random Surfer animation.svg)
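
The random surfer model can be approximated numerically by power iteration, as in the following sketch. Several choices are assumptions made for illustration: the damping factor 0.85, the fixed number of iterations, the uniform jump from pages without outlinks, and the four-page link graph. Every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate random-surfer visit probabilities by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: the surfer jumps to any page uniformly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
rank = pagerank(links)
```

In this graph page C collects links from A, B and D and therefore receives the highest score, while D, which nobody links to, keeps only the random-jump share.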

Comparison: tf-idf vs. random surfer

 * random surfer + tf-idf
 * showing how to combine the two models
 * even more methods can be included
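
A simple way to combine a query-dependent score (such as tf-idf) with a query-independent one (such as the random surfer score) is a weighted sum of normalized scores. The sketch below assumes exactly that; the weight 0.7, the max-normalization, the function names and all score values are illustrative assumptions, and real engines blend many more signals in ways that are not public.

```python
def normalize(scores):
    """Scale scores to [0, 1] by dividing by the maximum (if positive)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {key: 0.0 for key in scores}
    return {key: value / top for key, value in scores.items()}

def combine(text_scores, link_scores, weight=0.7):
    """Weighted sum of query-dependent and query-independent scores."""
    text = normalize(text_scores)
    link = normalize(link_scores)
    return {
        page: weight * text[page] + (1.0 - weight) * link.get(page, 0.0)
        for page in text
        if text[page] > 0  # only pages that match the query at all
    }

# Made-up example scores for four pages.
text_scores = {"A": 0.10, "B": 0.50, "C": 0.45, "D": 0.0}
link_scores = {"A": 0.37, "B": 0.20, "C": 0.39, "D": 0.04}
combined = combine(text_scores, link_scores)
ranking = sorted(combined, key=combined.get, reverse=True)
```

Page B has the best text match, but C is almost as relevant and much better linked, so C moves to the top of the combined ranking. Further signals could be added as extra weighted terms.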

relevance is a choice: Trust issues with search engines

 * understand that algorithms are programmed by humans and it is up to us to trust a search engine / choose one
 * manipulations will be hard to detect (magic keyword: Barack Obama)
 * large search engines are among the most powerful institutions on the web (financially, but also with regard to impact)

SPAM and SEO

 * understand that search results can be manipulated
 * metadata (schema.org)

multi stakeholder system

 * search engine
 * end user
 * web site owner
 * advertiser
 * (web master (SEO))

economics of a search engine

 * understand the concept of keyword based advertising
 * understand the auction system of keywords
 * understand the model of shared economy and man-in-the-middle business models
 * taken from Strategy_for_Information_Markets/Search_engine_business_models and Vickrey_auction
 * http://onlinelibrary.wiley.com/doi/10.1111/j.1540-6261.1961.tb02789.x/pdf
 * Generalized_second-price_auction
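
The keyword auction can be sketched as a generalized second-price auction: slots go to the highest bidders, and each winner pays the next-highest bid per click. This is a simplified sketch under stated assumptions: there is no reserve price, ties are not handled specially, ad quality is ignored, and the advertiser names and bids are invented.

```python
def gsp_auction(bids, slots):
    """Assign ad slots by bid; each winner pays the next-highest bid per click.

    bids maps advertiser name to bid per click.
    Returns a list of (advertiser, price_per_click), best slot first.
    """
    ordered = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    results = []
    for position in range(min(slots, len(ordered))):
        advertiser, _ = ordered[position]
        if position + 1 < len(ordered):
            price = ordered[position + 1][1]  # pay the bid of the next-ranked advertiser
        else:
            price = 0.0  # assumption: no reserve price
        results.append((advertiser, price))
    return results

# Hypothetical bids per click for one keyword.
bids = {"shop_a": 2.50, "shop_b": 4.00, "shop_c": 1.00}
outcome = gsp_auction(bids, slots=2)
```

Here `shop_b` wins the top slot and pays 2.50 per click, and `shop_a` takes the second slot at 1.00. With a single slot the scheme reduces to a Vickrey (second-price) auction.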

personalization of search results

 * key methods of personalization (using a cookie)
 * graph view of user interests
 * collaborative filtering
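
Collaborative filtering can be illustrated with a small user-based sketch: results clicked by similar users are suggested to the current user. The similarity measure (Jaccard overlap of click sets), the function names and the click data are assumptions chosen for brevity; production systems use far richer profiles.

```python
def jaccard(a, b):
    """Overlap of two sets: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(clicks, user, top_n=3):
    """Suggest items clicked by similar users that `user` has not clicked yet."""
    own = clicks[user]
    scores = {}
    for other, items in clicks.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        for item in items - own:
            # Items from more similar users accumulate a higher score.
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

# Made-up click histories, e.g. collected per cookie.
clicks = {
    "alice": {"python.org", "numpy.org"},
    "bob": {"python.org", "numpy.org", "scipy.org"},
    "carol": {"recipes.example", "python.org"},
}
suggestions = recommend(clicks, "alice")
```

Because bob's clicks overlap with alice's more than carol's do, the page only bob visited is suggested first.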

Technologies for your own search engine

 * Hadoop
 * Solr
 * Nutch
 * Elasticsearch



Key to the success of the leading search engines was winning two competitions: one for search users and one for advertising customers. Both competitions will be explained in the next two weeks.

Advertising
Stakeholders
 * advertiser
 * customer
 * content owner/portal
 * advertising network

Intermediaries:
 * markets (ebay)
 * advertising networks (doubleclick,...)

push out advertisement service from the portal into ad network
 * customer: more exact profile, better ad targeting
 * content owner/portal: better targeted ads lead to higher revenue
 * advertiser: higher click-through rate/conversion rate
 * ad network: valuable business model


 * Technology
 * Business model
 * Pricing, auctions
 * real-time bidding