User:Renepick/test

unit 1
{Which two reasons make it difficult to answer the question: "What is the size of the World Wide Web?"} + It is not clear what "size" means. + It is underspecified what we understand as the World Wide Web - The dark web makes it hard to index all web pages - Once the size of something exceeds a petabyte it cannot be measured precisely
 * Even though the fact is true, this is not a correct answer with respect to the video.

{When looking at this picture of the web, what modelling choices could have been made?} + Web pages are nodes of a graph and edges are links + Domains are nodes of a graph and an edge exists for every link between domains + All of the above - None of the above
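The first modelling choice above can be sketched as a tiny adjacency-list graph. The page names and links below are invented purely for illustration; they are not from the video.

```python
# A minimal sketch of the "web pages as nodes, links as edges" model.
# Page names and links are invented for illustration.
web_graph = {
    "index.html": ["about.html", "blog.html"],
    "about.html": ["index.html"],
    "blog.html": ["index.html", "about.html"],
}

# Nodes are the pages, directed edges are the links between them.
num_nodes = len(web_graph)
num_edges = sum(len(targets) for targets in web_graph.values())
print(num_nodes, num_edges)  # 3 nodes, 5 directed edges
```

The same structure works for the second choice: replace page names with domain names and keep one edge per link between domains.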

{Which of the following viewpoints are commonly used when modelling the World Wide Web?} + software system - a diagram + graph of connected web pages - a mathematical function - a spider web + collection of text documents

{How could you measure the size of a distributed software system?} + Count how many computers have the software installed - Gather information about the computing time being used - You cannot do so + The lines of code of each installed software component are a good indicator

unit 2
{What are the fundamental types of models the web can be seen as?} - Descriptive Model - Predictive Model + Software System + Graph Model - Generative Model + Collection of Text Documents

{Which of the following are pros of looking at the web as a collection of text documents?} - Methods from the field of software engineering can be applied + The amount of information on the web can be quantified via entropy + Methods from natural language processing and information retrieval can be applied - Structural information can easily be analysed

{Which of the following are cons of looking at the web as a graph of connected web pages?} + Large amount of data to model + Ignores crucial information - Trust of web pages cannot be determined - Hard to have a good measure for the size of the web

{Which of the following statements about modelling the web are true?} + Generative models are created to understand why certain properties arise in the descriptive models of the web. - Descriptive models are created to understand why certain properties arise in the generative models of the web. + Descriptive models help us understand what properties the World Wide Web has. - When studying the web as a graph one must use generative models. - When studying the web as a collection of text documents one must use descriptive models. + Descriptive and generative models can be used to model collections of text documents as well as graphs of web pages.

unit 3
{How many words does the following sentence consist of: "John F. Kennedy visited New York."} + 3 + 4 + 5 + 6 + The correct answer depends on modelling choices that are not further specified here + All of the above are possible

{Assuming sentences end with punctuation marks and everything between two successive whitespaces is considered a word, how many sentences and words can be counted in the following sequence: "John F. Kennedy visited New York"} - 1 sentence with 6 words. - 1 sentence with 5 words. + 2 sentences with 2 words in the first sentence and 4 words in the second one - 2 sentences with 2 words in the first sentence and 3 words in the second one
 * Correct! Under the given assumptions, the first sentence consists of {John, F} and the second sentence consists of {Kennedy, visited, New, York}.
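The assumptions of this question can be sketched directly in code: words are whitespace-separated tokens, and a sentence ends at a punctuation mark. The regular expression below is one possible reading of those assumptions, not the course's reference implementation.

```python
import re

text = "John F. Kennedy visited New York"

# Words: everything between two successive whitespaces.
words = text.split()

# Sentences: assume a sentence ends at a punctuation mark ('.', '!' or '?').
sentences = [s for s in re.split(r"(?<=[.!?])\s*", text) if s]

print(len(words))      # 6 words
print(len(sentences))  # 2 "sentences": "John F." and "Kennedy visited New York"
print([len(s.split()) for s in sentences])  # 2 words, then 4 words
```

Note how the abbreviation "F." triggers a spurious sentence boundary, which is exactly why the counterintuitive answer is the correct one here.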

{You want to measure the size of the Simple English Wikipedia by counting words. Which of the following are strong assumptions with an impact on the result?} + Wikipedia is identical to the crawl of it + Words are separated by whitespace + All pages are reached by the crawler - The size should be measured in bytes
 * It will most certainly have changed after crawling.
 * There could be pages that are not interlinked and are invisible to a crawler!
 * Though one could argue about this statement, it is not an assumption that changes the result of the model.

unit 4
{Why is formulating a hypothesis so crucial in the process of scientific modelling?} + Formulating a hypothesis clears the path towards a clearly defined model + Often simplifying assumptions are baked into the hypothesis and into the model built afterwards

{Every minute 0.19305 words are generated on the Simple English Wikipedia} - True + False
 * On average this number is true, but that does not mean that exactly 0.19305 words are produced in every single minute. Read more carefully next time!

{Why should one have several runs of a generative probabilistic model?} - In the first run the caches need to warm up + To obtain statistical stability + Because random experiments can produce strong outliers in a single run - Because there is no cost in making sure the computer calculated correctly by running the experiment twice or more + Because every scientific experiment should be repeated more than once to avoid mistakes
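The outlier argument above can be made concrete with a toy experiment. The model below (repeated fair coin flips) is an invented stand-in for a generative probabilistic model, chosen only to show why one run is not enough.

```python
import random

# Toy generative model: one "run" flips 100 fair coins and reports
# the fraction of heads. This stands in for any probabilistic model.
random.seed(42)  # fixed seed so the sketch is reproducible

def one_run(n=100):
    return sum(random.random() < 0.5 for _ in range(n)) / n

# A single run can land far from the true value 0.5 (an outlier),
# while the average over many runs is statistically stable.
runs = [one_run() for _ in range(1000)]
average = sum(runs) / len(runs)
print(min(runs), max(runs), average)  # individual runs scatter; the mean is close to 0.5
```

The spread between `min(runs)` and `max(runs)` is what a single unlucky run could have reported; averaging many runs is what gives the stability the correct answers refer to.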