Web Science/Part2: Emerging Web Properties/Ranking

Short info
3 dimensions of social capital:
 * Structural (links, PageRank and the random surfer)
 * Cognitive (content, tf-idf)
 * Relational (retweets on Twitter)
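
The structural dimension can be illustrated with a small random-surfer computation. The following is a minimal sketch, not a reference implementation: the function name `pagerank`, the damping value 0.85, the fixed iteration count and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for the random-surfer model.

    links maps each page to the list of pages it links to.
    With probability `damping` the surfer follows a random outgoing link,
    otherwise the surfer jumps to a page chosen uniformly at random.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: page D links to C, but nobody links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links ends up with the highest rank, while a page without incoming links only keeps the random-jump share.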

What we will look at next are: --oleamm (discuss • contribs) 01:05, 15 February 2014 (UTC)
 * User Interest
 * Attention
 * Innovation rate (production of new hashtags)

Short info from the video
Meme implies
 * Information
 * Replication Mechanism (social mechanism, communication, incentive)

On Twitter, a #hashtag is a part (proxy) of a meme, while the meme itself is the full message.

Entropy measures how complex (or diverse) information (or a user's interests) is. Entropy formula: H = -sum_i p_i * log2(p_i), where p_i is the relative frequency of term i.

User interests are represented as a vector. For each user interest vector, the entropy can be calculated.
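
As a minimal sketch of this calculation, assuming that an interest vector holds non-negative term counts (the function name `entropy` and the example vectors are made up for illustration):

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a vector of non-negative term counts.

    Computes H = -sum(p_i * log2(p_i)) with p_i = count_i / total.
    Terms with a zero count contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


focused_user = [4, 0, 0, 0]   # uses only one term
diverse_user = [1, 1, 1, 1]   # uses four terms equally often
print(entropy(focused_user))
print(entropy(diverse_user))
```

A user who only ever uses one term has entropy 0, and a user who spreads attention evenly over four terms has entropy log2(4) = 2 bits.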

User interests entropy vs system entropy (example on a diagram). Note: all vectors should have the same size (the number of terms held in the vector).

Similarity of user interests: cosine similarity between user interest vectors. The angle between two such vectors always lies between 0 and 90 degrees, since user interest vectors contain no negative values (a term cannot be used a negative number of times). Cosine similarity = 1 means the angle between the two vectors is 0, i.e. the users have similar interests. Cosine similarity = 0 means the vectors (interests) are not similar at all. Example: is a particular tweet correlated with a user's interests? (This can be used to suggest what is interesting for the user.)
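
A minimal sketch of the tweet example, assuming that both the user's interests and the tweet are represented as term-count vectors over the same vocabulary (the function name `cosine_similarity` and all vectors are hypothetical):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same size")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: an empty vector matches nothing
    return dot / (norm_u * norm_v)


# Assumed vocabulary: ["football", "music", "politics"]
user_interests = [5, 1, 0]
tweet_a = [1, 0, 0]   # a tweet about football
tweet_b = [0, 0, 2]   # a tweet about politics
print(cosine_similarity(user_interests, tweet_a))
print(cosine_similarity(user_interests, tweet_b))
```

Because all entries are non-negative, the result always lies between 0 and 1, so tweets can be ranked for a user by this value.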

We have interests (without fading: for simplicity, interests do not depend on time) and entropy (diversity).

Meme diffusion model explanation: tweets, screen and user memory size. Interesting point: more memory causes fewer memes.
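
The interplay of tweets, screen and memory can be explored with a toy simulation. The sketch below is only loosely inspired by the idea of limited attention and is not the model from the discussed paper: the function name `simulate`, the parameters `p_new` and `p_memory`, their default values and the ring-shaped follower network are all assumptions.

```python
import random
from collections import Counter, deque


def simulate(followers, steps=1000, p_new=0.1, p_memory=0.3,
             screen_size=5, memory_size=5, seed=0):
    """Toy meme diffusion with a finite screen and a finite memory per user.

    followers maps each user to the users who see his/her posts.
    Every user mentioned as a follower must also be a key of the dict.
    """
    rng = random.Random(seed)
    users = sorted(followers)
    # deque(maxlen=...) silently drops the oldest entry when it is full.
    screen = {u: deque(maxlen=screen_size) for u in users}
    memory = {u: deque(maxlen=memory_size) for u in users}
    posts = Counter()
    next_meme = 0
    for _ in range(steps):
        user = rng.choice(users)
        candidates = None
        if rng.random() >= p_new:
            # Re-post something already known: from memory or from the screen.
            if memory[user] and rng.random() < p_memory:
                candidates = memory[user]
            elif screen[user]:
                candidates = screen[user]
        if candidates:
            meme = rng.choice(list(candidates))
        else:
            # Invent a brand-new meme (also the fallback if nothing is known).
            meme = next_meme
            next_meme += 1
        posts[meme] += 1
        memory[user].append(meme)
        for follower in followers[user]:
            screen[follower].append(meme)
    return posts, screen, memory


# Hypothetical network: 10 users on a ring, each seen by the next two users.
ring = {i: [(i + 1) % 10, (i + 2) % 10] for i in range(10)}
posts, screen, memory = simulate(ring, steps=500)
print(len(posts), "distinct memes;", posts.most_common(3))
```

Running the function with different values of `memory_size` and `screen_size` is one way to explore how the number of distinct memes reacts to these limits in this toy setting.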

Discussed paper: Competition among memes in a world with limited attention (L. Weng, A. Flammini, A. Vespignani, F. Menczer). Link. --oleamm (discuss • contribs) 20:08, 11 February 2014 (UTC)

Short info from the video
W3C Meta Framework consisting of:
 * Identity Framework. Authentication, OAuth: client (application), resource owner (end user), server (e.g. Facebook).
 * Profile Framework. Distributed user profile.
 * Policy Framework. W3C P3P: what data is stored, how and by whom it is used, and how long it is stored. Usability is still an issue.
 * Content Framework. Cross-posting content (e.g. from Twitter to Facebook).
 * Analytics Framework. Tracking users.
 * Other

Second part.

Modelling the user. User characteristics (info, interests as vectors, and similarity). Elicitation, customization, stereotyping. --oleamm (discuss • contribs) 14:54, 15 February 2014 (UTC)

Web User Profiling and Recommendations
When you look back at the definition of Web Science, you see the users in the corresponding picture:



But up to this point we have mainly looked at:
 * 1) Information (e.g. Web page, tweet, meme)
 ** We have aggregated data about such information, e.g.
 *** terms (words) used
 *** (page)Rank
 *** Number of retweets
 * 2) Structure (e.g. page links, friendship links)
 ** We have aggregated data wrt structure, e.g.
 *** indegree
 *** outdegree
 *** degree distribution
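
The structural aggregates named in this list can be computed directly from a list of links. This is a minimal sketch; the function name `degree_statistics` and the toy edge list are assumptions made for illustration.

```python
from collections import Counter


def degree_statistics(edges):
    """Return indegree, outdegree and the indegree distribution.

    edges is a list of (source, target) pairs, e.g. page links.
    The distribution maps an indegree value to the number of nodes having it.
    """
    indegree = Counter()
    outdegree = Counter()
    nodes = set()
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
        nodes.update((source, target))
    # Looking up a missing key in a Counter yields 0, so nodes
    # without incoming links are counted under indegree 0.
    indegree_distribution = Counter(indegree[n] for n in nodes)
    return indegree, outdegree, indegree_distribution


toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
indegree, outdegree, distribution = degree_statistics(toy_edges)
print(dict(indegree))
print(dict(outdegree))
print(dict(distribution))
```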

But we have little aggregated data about the *user*, although in a technical-social system the representation of the user and his/her activities is central indeed!

In fact, whether we want to analyze the user from a Web Science perspective, or understand a user's behavior from a commercial point of view (e.g. for advertising, or for offering him/her a better Web service), or even if we had unethical/illegal objectives in tracking a user (e.g. to steal his/her identity), we need models that capture the different aspects of a user.

And if we are the user and want to protect ourselves from unethical or illegal behavior of others, we need to know what they do in order to be able to do so.

And if we want to provide an ethical and legal service, we let the user choose to what extent he/she wants to be tracked and for which purposes. Providing an ethical and legal service is not just an altruistic objective (though it may be one): organizations that behave unethically and gain a bad reputation may quickly lose their business. Or do you want your credit card number, your identity or your privacy to be stolen?

The Filter Bubble
Eli Pariser anecdotes

Herding: The Music Experiments Series
These experiments were performed as part of a larger set of experiments by the authors described at: http://www.princeton.edu/~mjs3/musiclab.shtml

The complement to actual herding is paid-for activity, e.g. Facebook like farms: http://www.technologyreview.com/view/530961/the-hidden-world-of-facebook-like-farms/

Remaining notes
 * Web search typically happens in two steps:
 ** 1. Sorting into relevant - not relevant
 ** 2. ranking according to
 *** any kind of ranking function
 *** page rank
 *** ...
 *** personalization
 * Personalization and Recommendation
 ** User profile
 ** User represented by vector
 ** Set of users represented by matrix
 ** Recommendations based on linear algebra
 * Examples
 ** product/music/... recommendations (slideshare: Steffen likes his own slides!)
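
The idea of representing users as vectors, a set of users as a matrix, and deriving recommendations with linear algebra can be sketched as follows. This is one simple variant (similarity-weighted sums over other users), chosen here for illustration; the names `cosine` and `recommend`, the item list and the matrix values are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally sized vectors (0.0 for empty vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(matrix, user, top_n=2):
    """Rank the items the user has not used yet.

    matrix maps each user to a row vector (1 = user liked the item, 0 = not).
    An item's score is the sum of the other users' values for that item,
    each weighted by that user's similarity to the target user.
    """
    target = matrix[user]
    scores = [0.0] * len(target)
    for other, row in matrix.items():
        if other == user:
            continue
        similarity = cosine(target, row)
        for i, value in enumerate(row):
            scores[i] += similarity * value
    unseen = [i for i, value in enumerate(target) if value == 0]
    return sorted(unseen, key=lambda i: scores[i], reverse=True)[:top_n]


items = ["jazz", "rock", "pop", "folk"]
user_item_matrix = {
    "ann": [1, 1, 0, 0],
    "bob": [1, 1, 1, 0],
    "cat": [0, 0, 1, 1],
}
best = recommend(user_item_matrix, "ann", top_n=1)
print([items[i] for i in best])
```

In this toy matrix, "ann" shares two items with "bob" and none with "cat", so the item that only "bob" adds ("pop") is ranked above the item that only "cat" likes ("folk").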