User talk:Oleamm/Search engines history and architecture

General remarks

 * You use an unusually high number of parentheses. Try to avoid them and instead move the information to a separate short sentence or a subordinate clause. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * You often write things without proving them. Try to back up more of your claims with existing literature. A great starting point for finding articles and books is Google Scholar. I added some "citation needed" markups from time to time. Bear in mind that adding sources for these sentences is a good start but far from enough. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * You might want to restructure your work as you use terms like "crawlers" before fully explaining them. Alternatively, you could at least note after a very short explanation that the architecture and crawlers are being explained in a later section. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * Sometimes, I miss a clear structure. What I mean by this is that one part ends and the next starts without any transition. Try to connect the sections and paragraphs better by informing the reader what will come next and why. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * As a rule of thumb, a section should cover at least half a page. Some of your sections are too short considering this rule. --Onse (discuss • contribs) 14:48, 28 January 2014 (UTC)
 * You should number your sections as is done in the example chapter. This helps because although section levels are clearly distinct in the wiki syntax, they are hardly distinguishable in the CSS formatting. I have not added any section numbers yet, as your structure might still change. --Onse (discuss • contribs) 13:16, 30 January 2014 (UTC)
 * When enumerating, make sure not to nest your sections too deeply. In general, one says that three levels mark the maximum depth. Hence, 1.5.1.1 should be avoided. --Onse (discuss • contribs) 23:50, 12 March 2014 (UTC)
 * You seem to be using graphics made by others, which are licenced under CC BY-SA 3.0. This is good in general, but I doubt that linking to the original work is enough in terms of crediting the author, especially if you think that this might be published as a (digital) book, which can be read offline. It is therefore advisable to reference the source. --Onse (discuss • contribs) 13:16, 30 January 2014 (UTC)
 * When first introducing a new term, it can help to put it in italics. --Onse (discuss • contribs) 13:53, 30 January 2014 (UTC)
 * You are using several abbreviations which you never spell in full. Hence, you might want to add a list of abbreviations. --Onse (discuss • contribs) 14:00, 30 January 2014 (UTC)
 * I like how you used a reference template that distinguishes through its syntax the information about publication like titles, authors and so on. As I did not know this template, I do not know how much the display of the references can be influenced. But personally speaking, I would prefer a citation style that would show more information about a publication – especially its authors. --Onse (discuss • contribs) 14:34, 30 January 2014 (UTC)
 * This might be regarded as personal preference, but I like the captions for tables to be below the tables, as is typical for scientific papers. You can do this by applying this CSS code to your caption: . --Onse (discuss • contribs) 19:55, 30 January 2014 (UTC)
 * Imprecise words like "often" or pseudo-argumentative words like "obviously" should be avoided in scientific writing. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * Dashes make reading unnecessarily hard. An example would be your sentence "PageRank – special quality ranking value that is calculated for each indexed document [on] the Web". Rewrite those sentences by simply using verbs. In this case, you could start the sentence e.g. with "PageRank is a special quality…" --Onse (discuss • contribs) 21:10, 10 March 2014 (UTC)
 * Some sentences are quite long. Split these up into two or more sentences as this eases reading and understanding. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)

Personalised search and filter bubbles
With your latest commit, you have reproduced one of the main topics of my part. We should definitely merge these two. --Onse (discuss • contribs) 17:18, 26 January 2014 (UTC)

Web Search Engine
This section could be renamed to something more meaningful. Even better would probably be to merge it with "What is a search engine?" as a general overview of search engines. It could also serve as an overview of the chapter, indicating which of the later sections will deal with which topic. This way, the reader will know how the document is structured and what information she will gain later. It would also justify that this section contains effectively no sources backing up your claims, because the information will be worked out in the later sections. --Onse (discuss • contribs) 13:27, 28 January 2014 (UTC)

History

 * I have no clue what you mean when you are saying that Gopher had a look. Try reformulating this. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * I also don't understand what you are trying to say later with "[Veronica] only provided search by Gopher menus, or wide area information server provided a full text search."--Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * The last part of Yahoo! is quite confusing. You state that it originally had no crawlers. This of course is true as Yahoo! started off as a Web directory and you earlier defined Web directories to be managed by hand. Therefore, the transition to a full-fledged search engine (which later e.g. used Bing) does not become clear. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * You claim that AltaVista used "a very advanced hardware for its back-end". What does that mean? Why or how was it advanced? How did it differ from previous search engines? --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * Also, you write that AltaVista's minimalistic interface was an innovation. Why? Were earlier sites all bloated with information? --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * If you write about alternative search engines, you might want to note search engines that focus a bit more on user privacy like DuckDuckGo, Startpage, Gibiru etc. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * "Wolfram Alpha is a computational knowledge engine and not a search engine" - then why do you list it as an alternative search engine? Either explain why it fits or leave it out. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)
 * You might want to motivate why meta search engines exist at all. In earlier days, they tried to present a more complete picture of the Web. Nowadays, it might be about showing unfiltered results or applying a "better" ranking algorithm. --Onse (discuss • contribs) 14:38, 28 January 2014 (UTC)

Architecture

 * This section could use a few more references. These are descriptions of more or less technically advanced topics, which should be backed up by literature. For a general approach, I can recommend Arasu et al., which you also referenced. --Onse (discuss • contribs) 13:37, 30 January 2014 (UTC)
 * Your figure 3 does not only show the "High-level architecture of a standard Web crawler" but rather the infrastructure and data flow for crawling. Either be more precise about that or highlight the Web crawler in the original figure. This remark also applies to figure 2. What I dislike about these graphics in general is that they neither follow a standardised modelling language nor have a legend. This leaves the reader guessing what the different shapes, arrows and annotations mean. --Onse (discuss • contribs) 13:45, 30 January 2014 (UTC)
 * "because various unnecessary situations can occur" I am not 100% sure what you are trying to say here but you are definitely looking for a different adjective than "unnecessary". Maybe it is "unwanted". --Onse (discuss • contribs) 13:59, 30 January 2014 (UTC)
 * "infinitive variance of HTTP GET parameters in each URL is possible" I honestly do not understand this sentence. Do you mean redirect-loops? --Onse (discuss • contribs) 13:59, 30 January 2014 (UTC)
 * "At the same time the amount of unique content is limited" – I have changed the word "bounded" to "limited" but I am still a bit unsure, what you are trying to say here. The following sentences do not seem to explain it further. Are you talking about mechanisms to detect (and filter out) duplicates? --Onse (discuss • contribs) 13:59, 30 January 2014 (UTC)
 * "DOM can be often violated" – you talk about DOM without introducing or at least shortly explaining it. Additionally, you might want to talk about the DOM specification or interface to make clear to the reader, why DOM can be "violated". This is just to prevent ambiguities created by the W3C when they named the interface a "model" (which could hardly be violated). --Onse (discuss • contribs) 14:10, 30 January 2014 (UTC)
 * You seem to name your questions for Web crawlers in parentheses. This is not clear when reading it for the first time. You could use italics and place the name in front: "Re-visit policy: How often should the crawler refresh each page?". Also, you should note beforehand that these questions can be understood as policies. Here, you should additionally make clear where you derived these names from. They would be of little use to the reader if you just made them up, but they can be of interest if they are commonly used in the literature. --Onse (discuss • contribs) 14:29, 30 January 2014 (UTC)
 * "using the minimal bypass of the entire collection of the documents" I don't understand this part. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * You talk about "backlinks" but do not explain what these are. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * PageRank is mentioned several times before the reader knows what it is. Either restructure your document, introduce it shortly in the beginning or write about general ranking algorithms. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * You write about MIME and do not explain it. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * "protect from double visiting one document" this should be more clear. The reader might get confused, why one would not want to visit a document twice as you talked about keeping the information about webpages up to date. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * "[URL normalisation] is the process of modifying URLs to some standard look and clearing ambiguous parameters" – I do not know which parameters are ambiguous and which parameters you refer to. You might want to write that URLs can (and should) be shortened by removing any parameters. This can be done by removing anything in a URL trailing a  or  . --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * "Search (or document) index parses and stores data [...]" – I would say that this is done by an indexer as an index is just a data structure. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * You might want to define what a "word" is. Do you mean a word from a programmer's or a linguist's perspective? --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * "eliminating the need to low scan" – what is a low scan? --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * "Existence of the index increase the amount of necessary storage and consumption of computing resources (to fill in and update the index each time when new version of documents arrive), but it is a reasonable fee for the possibility of rapid search." – this sentence is too long, complex and difficult to understand. Split it up into several sentences. Also, I think "fee" is not the word you are looking for but maybe "cost". --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
 * "But before creating the structure of the index it is very useful to look at a search process again" – this sounds as if the crawler would first have to look at the search process. Either restructure this part or make clear that you will explain some parts of the search process to the reader so that she can understand the crawler's actions more easily. --Onse (discuss • contribs) 14:55, 30 January 2014 (UTC)
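
The remarks above on URL normalisation and on avoiding double visits can be illustrated with a minimal Python sketch. It is only one possible reading, not the chapter's method: the function name `normalise_url` and the `keep_query` switch are hypothetical, and dropping every query parameter is an assumption that can merge pages that are in fact distinct.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str, keep_query: bool = False) -> str:
    """Return a simplified form of `url` (hypothetical helper).

    Assumptions of this sketch: scheme and host are lower-cased, the
    fragment is always removed, and the query string is removed unless
    keep_query is True.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    path = parts.path or "/"          # treat an empty path as the root
    query = parts.query if keep_query else ""
    return urlunsplit((scheme, netloc, path, query, ""))


# A crawler could keep the normalised forms in a set so that two
# spellings of the same address are fetched only once per crawl cycle.
seen = set()
for candidate in ["HTTP://Example.com/a/b?x=1#top", "http://example.com/a/b"]:
    seen.add(normalise_url(candidate))
```

After the loop, `seen` holds a single entry, because both spellings normalise to the same string. Revisiting a page later to refresh it is a separate decision and is not affected by this deduplication.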

Ranking in search

 * "Those algorithms allow search engines to return back to a user intelligent and accurate results" – do you mean personalised results or are you just thinking of "better" results? Also, in the previous sentence, you are talking about "much more complicated algorithms". I instead wrote "more complex algorithms" and you might want to add one sentence about how they are more complex. For this, you can e.g. mention that Google Inc. uses over 50 different factors to rank their results (I do not have the source at hand right now). --Onse (discuss • contribs) 14:03, 4 February 2014 (UTC)

tf-idf

 * I do not think that abbreviations make a good heading. A reader skimming your text or looking at the table of contents, might not understand what the topic is about. Instead, I would use the full word and introduce the abbreviation in the text. Again, as this is a new term, you might want to use italics. --Onse (discuss • contribs) 14:47, 4 February 2014 (UTC)
 * Using abbreviations as the first word in a sentence is tricky and can be hard to read when the first letter is not a capital one. Hence, I would refrain from starting sentences with "tf-idf". --Onse (discuss • contribs) 14:47, 4 February 2014 (UTC)
 * You talk about calculating the frequency of a "word" in documents and the formula only contains the calculation for a "term". You should decide on one of these two words. Additionally, you might want to add another sentence explaining what a term/word is. --Onse (discuss • contribs) 14:47, 4 February 2014 (UTC)
 * This part contains no sources, making me believe that you came up with this formula yourself. --Onse (discuss • contribs) 14:47, 4 February 2014 (UTC)
 * You use document frequency $$\text{df}(t,D)$$ without defining or explaining it. --Onse (discuss • contribs) 14:47, 4 February 2014 (UTC)
 * I have my doubts if $$|D|$$ should be $$1$$, because normally you would have a collection of many documents where $$t\in D$$. But maybe, it is just a misunderstanding. --Onse (discuss • contribs) 14:47, 4 February 2014 (UTC)
 * You multiply $$(-1)\cdot\textrm{tf}\cdot\textrm{idf}$$ to obtain tf-idf. The multiplication with $$-1$$ is not intuitive and should be explained. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * "[other tf-idf variations usually] differ in coefficients and implementation of logarithmic scales" – maybe you want to explain why this formula might not be sufficient for all use cases. For what purpose are changes made in other implementations and how do these changes affect the introduced calculation of tf-idf? --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
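
To make the points about $$\text{df}(t,D)$$, $$|D|$$ and the factor $$-1$$ easier to discuss, here is a small Python sketch of one common textbook variant of tf-idf. It is not necessarily the formula used in the chapter. The helper names, the whitespace tokenisation and the toy corpus are assumptions made for illustration only.

```python
import math


def tokenize(text):
    """Split on whitespace and lower-case (an assumed definition of 'term')."""
    return text.lower().split()


def tf(term, doc_tokens):
    """Relative frequency of `term` within one document."""
    return doc_tokens.count(term) / len(doc_tokens)


def df(term, corpus_tokens):
    """Document frequency: number of documents that contain `term`."""
    return sum(1 for doc in corpus_tokens if term in doc)


def idf(term, corpus_tokens):
    """Inverse document frequency, log(|D| / df(t, D))."""
    n_containing = df(term, corpus_tokens)
    if n_containing == 0:
        return 0.0                     # unseen term: treated as weight 0 here
    return math.log(len(corpus_tokens) / n_containing)


def tf_idf(term, doc_tokens, corpus_tokens):
    return tf(term, doc_tokens) * idf(term, corpus_tokens)


# Toy corpus, invented for this example.
corpus = [tokenize(d) for d in (
    "the web is large",
    "the crawler visits the web",
    "ranking uses links",
)]
score = tf_idf("the", corpus[1], corpus)   # 2/5 * log(3/2)
```

Two observations follow from this variant. If idf were instead written as $$\log\frac{\text{df}(t,D)}{|D|}$$, its value would never be positive, and multiplying by $$-1$$ would give the same result as above; that may be what the chapter intends. With $$|D|=1$$ and $$t\in D$$, idf is $$\log 1 = 0$$, so every score collapses to zero.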

Motivation

 * You start off with a definition and give the motivation afterwards. Normally, the motivation should be the first section because it introduces (motivates) a topic. Also, you explain what tf is the abbreviation for, although you did this in the previous section. This is another indicator that you might want to switch those two parts. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * You introduce the term "keyword summarisation" which I have never heard. Additionally, I cannot quickly find another source that uses this term. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * Stay consistent in the formatting! Sometimes, tf-idf is written without styling, other times it is written in math-font. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * Normally, I would expect that the motivation starts off with the problem and then introduces the tools to solve them. The way it is currently written, it resembles more a general description of tf-idf. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * You do not clearly define a keyword and a word, and you use them interchangeably. The text might be easier to understand if you stick to one term. Even better, when describing *term*-frequency, you might want to talk about terms. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * You explain that tf-idf can be calculated over a document and will then return a description of "a particular paragraph (or sentence, page)". Either talk about a document again or clarify that tf-idf can be applied to any string (say a sentence) that contains words. Also, you describe tf-idf as returning values which can be semantically interpreted as a description of the content. Although this can be the case, I would want to abstract from this. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * You talk about the "importance" of a word. When is a word important? You might mean the relevancy of a keyword in a document. Or you mean the descriptive insight it offers about the content it is calculated from. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)
 * "[tf-idf] also finds how often this word appears in other documents". You mean the right thing but it is irritating, because one only calculates tf-idf over one document. Maybe you can make more clear, how idf is a calculation over several documents whereas tf-idf is a relation of this with a concrete document. --Onse (discuss • contribs) 13:42, 7 March 2014 (UTC)

Example

 * As an example you use a quote. For scientific purposes, this is absolutely legitimate. But still, you will have to name the author. Either add the author to the quote or use a sentence of your own. --Onse (discuss • contribs) 17:52, 7 March 2014 (UTC)
 * I mentioned it formerly but the example makes it very clear: explicitly define a term (or word)! At this point, I cannot find out, why "the" is a term as well as "Web Science". I would separate terms at spaces or symbols. Hence, "Web Science" would be two terms. Then again, "Web" also appears as a separate term. This is irritating. --Onse (discuss • contribs) 17:52, 7 March 2014 (UTC)
 * You should explain, from where you obtained the number of documents $$D$$. --Onse (discuss • contribs) 17:52, 7 March 2014 (UTC)

Cosine similarity

 * "To do that cosine similarity between 2 vectors is used" – by whom? By all search engines? As a general technique? Do search engines just build on the cosine similarity or is this all the magic behind it? Please give a source (I guess it is Arasu et al.) for your approach. --Onse (discuss • contribs) 21:07, 9 March 2014 (UTC)
 * You should also note, that this is a limited interest-driven approach, which can naturally be extended or refined. For example, a query which uses an advanced syntax like the earlier mentioned binary operators cannot be handled this way. --Onse (discuss • contribs) 21:07, 9 March 2014 (UTC)
 * "In real calculations value n can be of several thousands" wouldn't $$n=|V|$$ (with V as vocabulary, say all known/indexed terms) and therefore way higher? Maybe, you also want to note if this can still be computed (e.g. sparse matrix representation) or whether a different approach is used. In any case, note why $$n$$ can be that much higher. --Onse (discuss • contribs) 21:07, 9 March 2014 (UTC)
 * "cosine similarity is still not enough for receiving appropriate search results" – I would prefer, if you would explain here why. What is it lacking or doing wrong? --Onse (discuss • contribs) 21:07, 9 March 2014 (UTC)
 * You end this part looking back on tf-idf. This is confusing as its topic was cosine similarity. Separate these two properly or open up the title with something like "Combining tf-idf with cosine similarity", "Finding relevant documents" or "Detecting documents matching the user's search query". --Onse (discuss • contribs) 20:54, 10 March 2014 (UTC)
 * I have never heard of BM25. If you mention it, at least reference it, so I can quickly look it up. Also, is there a reason, why you chose BM25 as an example? Is it frequently used? --Onse (discuss • contribs) 20:54, 10 March 2014 (UTC)
 * This section consists mainly of a formula and redundant descriptive text. You should probably extend it a bit or not make it a separate section. The same goes for the tf-idf section. --Onse (discuss • contribs) 20:54, 10 March 2014 (UTC)
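
Regarding the size of $$n$$ and sparse representations, the following Python sketch shows one way cosine similarity can be computed without materialising vectors of length $$|V|$$. It is an illustration under stated assumptions, not a description of what any search engine does: each vector is a hypothetical dictionary that maps a term to its weight, and absent terms count as zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors.

    `u` and `v` are dicts {term: weight}; terms missing from a dict are
    treated as having weight 0, so only shared terms enter the dot product.
    """
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0                     # convention chosen for empty vectors
    return dot / (norm_u * norm_v)


# Invented weights; in the chapter these would be tf-idf values.
query = {"web": 1.0, "search": 1.0}
document = {"web": 1.0}
similarity = cosine_similarity(query, document)   # 1 / sqrt(2)
```

The cost depends on the number of non-zero entries rather than on the vocabulary size, which is why a very large $$n$$ need not make the computation infeasible.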

PageRank

 * I realised that reformulating to remove grammar mistakes or make things easier to understand takes too much time for me. Therefore, please respect that I stopped doing this from this section onwards. Nevertheless, the formerly edited sections should give you an idea on how to continue. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * "special quality ranking value" – what makes PageRank "special"? If you do not explain it, such words do not add any benefit to the text for the reader. As a result, they should be left out. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * You talk about a "link graph structure" without introducing or explaining it. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * This section, like others before, does not give any reference. You could at least refer to Brin and Page.
 * It is debatable whether a metaphor like "scientific world" should be used. Personally, I would prefer a reformulation. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * This section again uses several unnecessary dashes and parentheses. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * You try explaining PageRank by giving three different real world examples. One analogy might help but in this case, you do not focus enough on a real explanation. Your examples could be generalised by introducing the Pareto principle. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * Also, this part is quite vague describing how things "should" be. Hence, this is a good example why you should back up your assumptions with scientific data. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * "there is no sense to link to a page which serves no interesting information" what about spamdexing? There is indeed reason to link to pages without interesting information. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * "another important factor, which is the weight of each page" the "weight" of a page lacks a real definition. At this point, I have no clue, if e.g. https://duckduckgo.com/ can be considered heavy or light. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * You define what a back-link is in parentheses as an embedded sentence. This should not be done with such a fundamental term. Instead, you should introduce this term earlier together with the explanation of a link graph structure. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * Your figure 4 refers to "[m]athematical PageRanks". What other PageRanks are there? How does a mathematical PageRank differ from the general PageRank you introduce in this section? --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * You use the word "hyperlink" twice in your document. Maybe just stick to "link" or always be that precise. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * "If a lot of other web pages start linking to page j, hence, inside the Web there is a belief that page j has a high-level of importance" I know what you want to say but you really have to reformulate this sentence. Also, this sentence is not true, if you think of spamdexing. This holds true especially when thinking of hacked web pages being manipulated to link to j. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * The part explaining linking between i, j and M has many overly long and complex sentences (including parentheses). --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * I would refrain from saying that a page is "important" when pages with high PR link to them. The problem here is how one would define importance. Intuitively, one would say a governmental page is more important than a Web site with funny cat pictures. Yet, in several cases, the PR for Web sites only hosting funny cat pictures will be higher. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * Your formula for PR lacks the explanation why PR(A) is calculated by summing PR(B), PR(C) et cetera. Stick to your formerly introduced example to ease the reader's understanding. Then it will become clear that B, C and D link to A. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * Your formula does not initialise the pages with a basic PR-value like $$\frac{1}{|pages|}$$. Therefore, your PR should always equal zero. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)
 * You specify this formula later on introducing a scaling factor. It could be noted in this section that this PR will have its problem with linking loops. Then, it could also be added that a later model will solve this problem. Again, the ordering of your sections seems a bit weird. If you would have first introduced the random surfer model, you could now build upon it. Instead, you explain the PR, take a step back to the random surfer model, just to finally get back to a more useful PR. Personally, I find this order and split up of PR hard to follow. --Onse (discuss • contribs) 21:24, 11 March 2014 (UTC)

Random surfer model

 * Remove the parentheses and add references. --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)
 * I don't feel the random surfer model to be properly introduced. You skip directly to an example. You should first explain in general that there is a user, who will randomly click on links placed on the Web page he is currently viewing. --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)
 * When introducing the transition matrix, you have to explain what $$a_{ij}$$ is. --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)
 * After giving the example, you jump to calculating PageRank. Why? Why introduce the random surfer model at all, especially after PageRank? What I am trying to say is that there is a motivation missing. The current transition just does not feel natural. --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)
 * "As it is known" – As mentioned in the general remarks, please refrain from using such phrases. Additionally, claiming that something is generally known is an indicator for the lack of a reference or proof. --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)
 * I do not think that the big left bracket serves any purpose in the set of equations. It can therefore be removed. --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)
 * I like the detail in which you explained the calculation of eigenvectors and eigenvalue. It is easy to follow and understand. But you should explain how you obtain $$c=\frac{1}{18}=\sum_{i=1}^{n}v_i \ni dim(v)=n$$ --Onse (discuss • contribs) 20:15, 12 March 2014 (UTC)

Power method

 * Your heading is "power method" but you do not mention or explain it in this section. --Onse (discuss • contribs) 23:44, 12 March 2014 (UTC)
 * You do not explain first why you suddenly start calculating $$A^{k}u$$. This can be confusing, although your calculations show your goal. You may elaborate on this further as this section is very short anyways. --Onse (discuss • contribs) 23:44, 12 March 2014 (UTC)
 * I have never seen the term "equilibrium value" used except in chemical contexts. Are you sure that this term is correct? Maybe explain it too. --Onse (discuss • contribs) 23:44, 12 March 2014 (UTC)
 * "So, the same PageRank vector can be computed in another alternative way" this is somewhat unsatisfying as a motivation. Combine it explicitly with the result of this section. Say that we will want to use this alternative calculation to approximate the PR in a faster and less resource-consuming manner. This means, I would write that this is a better or at least more practical approach in the introduction. --Onse (discuss • contribs) 23:44, 12 March 2014 (UTC)
 * This section and all other subsections of PageRank are missing in the table of contents. Also, you should rethink your structuring, because 1.5.2.1 is normally seen as nested too heavily. --Onse (discuss • contribs) 23:47, 12 March 2014 (UTC)
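
As a possible way to motivate the computation of $$A^{k}u$$, the sketch below applies a transition matrix repeatedly to a uniform start vector until the result stops changing. It is a minimal Python illustration and not the chapter's code. The three-page graph, the function names and the tolerance are assumptions. The matrix is taken to be column-stochastic, so entry `a[i][j]` is the probability of moving from page j to page i.

```python
def mat_vec(a, x):
    """Multiply matrix `a` (list of rows) by vector `x`."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_method(a, tol=1e-10, max_iter=1000):
    """Approximate the vector v with a v = v by computing a^k u for growing k."""
    n = len(a)
    u = [1.0 / n] * n                  # uniform start vector
    for _ in range(max_iter):
        nxt = mat_vec(a, u)
        # stop once one more multiplication barely changes the vector
        if sum(abs(p - q) for p, q in zip(nxt, u)) < tol:
            return nxt
        u = nxt
    return u


# Invented example: A links to B and C, B links to C, C links to A.
# Columns are the source pages A, B, C; rows are the target pages.
transition = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
ranks = power_method(transition)       # close to [0.4, 0.2, 0.4]
```

For this graph the iteration settles on the same vector that solving the eigenvector equations by hand would give, which is the sense in which it is an alternative computation. Each step needs only one matrix-vector product.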

Random walks modelling

 * You talk about random walks but don't explain them. You can't expect the reader to analyse Java code in hope that he will get a grasp of what random walks are. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * You could explain how you obtained your values for [A,E]. Simply say that these are the results of a former run of the shown code. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * You must credit René for the code as yours is heavily based on his (ID 1123777 – Revision as of 14:32, 10 December 2013) --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * This part is way too short to be a section on its own. You should merge several, especially as they all deal with a similar topic. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)

PageRank and the real Web

 * The newly introduced random moves are often called "jumps". You might want to mention that. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * You use $$n$$ without defining it e.g. as $$n=|\mathrm{pages}|$$. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * Normally, you define a probability $$s$$ to calculate the PR of the current page and a probability $$1-s$$ to perform a random jump. Also, what value for $$s$$ is typically used? Something around 0.8 and 0.9? --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
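
The roles of $$n$$, $$s$$ and the random jump can be made concrete with a short Python sketch of PageRank with damping. It is written under assumptions and is not the chapter's implementation: `links` is a hypothetical dictionary from each page to the pages it links to, every link target must itself be a key, pages without outgoing links are treated as linking to all pages, and the default `s=0.85` is a value often quoted in the literature rather than one taken from the chapter.

```python
def pagerank(links, s=0.85, iterations=100):
    """Iterate PR(p) = (1 - s)/n + s * sum(PR(q)/outdegree(q)) over pages q linking to p."""
    pages = sorted(links)
    n = len(pages)                     # n = |pages|
    pr = {p: 1.0 / n for p in pages}   # initial value 1/n for every page
    for _ in range(iterations):
        nxt = {p: (1.0 - s) / n for p in pages}   # share from random jumps
        for q in pages:
            targets = links[q] or pages           # no out-links: spread evenly
            share = s * pr[q] / len(targets)
            for p in targets:
                nxt[p] += share
        pr = nxt
    return pr


# Invented example graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
damped = pagerank(graph)
undamped = pagerank(graph, s=1.0, iterations=200)
```

With `s=1.0` the jump term vanishes and the result matches the plain random surfer model. With `s` below 1 every page keeps at least $$\frac{1-s}{n}$$, which is what protects the computation against link loops and pages without outgoing links.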

Relevance and Trust

 * This section is empty and starts directly with a subsection. You should introduce and motivate the subsections first. This is a special case though because – as mentioned earlier – this actually deals with my part. Hence, it should be removed from your part and I will not correct it. Instead, I will see whether I can incorporate some of your ideas in my text or if our texts mainly overlap. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)

Conclusion

 * This is less a conclusion than a summary or a tl;dr. You don't summarise a solution produced in this chapter to a formerly described problem. I would rename this section accordingly. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * You seem to explain most steps you took throughout this chapter. These can roughly be explained in the introduction. The final section should give a less text-focussed summation (and maybe an outlook). It therefore is not of interest, where there were examples within this chapter. Only the results and their relevance matter. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * Like formerly mentioned, the summarising words about my part should be removed. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * The outlook is one long sentence. Split it up. Also adapt the order of these sections. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)

Quiz questions

 * Add additional information to help a reader understand why some answers are correct and others are not. To do this, add a new line starting with "|| " after any answer that should show additional information. --Onse (discuss • contribs) 21:28, 13 March 2014 (UTC)
 * "We use idf to calculate the frequency of the word in a particular document." is not a proper answer to the question whether idf is really needed. Although it is wrong, it is still an unrelated statement. --Onse (discuss • contribs) 21:35, 13 March 2014 (UTC)