WikiJournal Preprints/When the Wikimedia movement challenges how to do science

See this article in French

About the influence of the field and the institutional framework
Alain Testart said: "The method, as a means, can only be subordinated to a finality: the study of a scientific object. The object justifies the method. It is therefore with it that we must begin when we ask ourselves: how can we define social anthropology?" Jean-Paul Colleyn, for his part, asserted that "there are as many anthropologies today as there are objects of study (anthropology of art, music, religion, health or perception)". Michael Singleton went further still, stating that "anthropology doesn't exist, [...] what really exists are anthropologists".

Following on from these assertions, and drawing on my own experience, I would add that beyond the objects of study and the personalities of researchers, there are ultimately as many possible anthropologies as there are research fields and institutional environments. Indeed, through my participant observation within the Wikimedia movement and my integration into the laboratoire d'anthropologie prospective of the University of Louvain, I gradually developed what Pierre-Joseph Laurent would no doubt call a double "informed familiarity". This double adaptation changed the way I look at science. This change is illustrated here through several postures, the first of which responds to the need to situate my work among the many coexisting, and often partisan, scientific disciplines grouped within the human and social sciences.

Opting for a science away from corporatism
In April 2011, I had the idea of writing my master's thesis, entitled Culture FR Wikipédia, Monographie ethnographique de la communauté des contributeurs actifs sur l'espace francophone de Wikipédia (an ethnographic monograph of the community of active contributors to the French-language space of Wikipedia), within Wikipedia itself. I thus hoped to kill two birds with one stone by writing my ethnography inside my field of observation, in a kind of recursive process. Unfortunately, this turned out to be impossible because of the first of the five "founding principles" of the encyclopedia project, which states that "Wikipedia is an encyclopedia". A trivial assertion at first glance, but one that ultimately makes it possible to agree on everything "that Wikipedia is not". Reading this content, I learned to my cost that "personal essays and unpublished works have no place on Wikipedia."

I was then redirected to another project called Wikiversity, which I did not know at the time, although it was one of more than a dozen collaborative projects known as "Wikipedia sister projects" (see figure 1 opposite). So I went to the Wikiversity home page and was very interested to discover that this project was a place dedicated to "sharing educational content and writing research papers".

After announcing my arrival in the project with a message posted on a kind of general forum called "la salle café", I looked for a place where I could situate my work. During this search, the user "Crochet.david", a teacher of electrical engineering and an administrator of the Wikiversity project who had already responded warmly to my arrival message, suggested on his talk page that I place my work among the "research works in sociology". This greatly surprised me, until I discovered the organizational chart of the Wikiversity project, in which anthropology appeared as a department of the faculty of sociology.

This situation seemed extremely complicated to me: not only did I have to ask my supervisor to agree to my writing the thesis online and in real time on a website, but I now also had to tell him that this thesis, produced as part of a master's degree in anthropology, would be published in a faculty of sociology. Knowing how sharp the split between sociologists and anthropologists was at my university, I felt somewhat helpless in the face of this situation.

I then tried to place my work in the anthropology department of Wikiversity without mentioning the faculty of sociology. But David Crochet (his real name) replied that "projects are associated with faculties, not departments." A debate then began and was moved to the salle café so that other members of the community could take part. At the end of the discussions, we concluded that I had to initiate a "prise de décision" (a formal community decision-making process) to rename the faculty of sociology.

At the time of this decision, JackPotte, a computer engineer and another administrator of the site, posted a message drawing our attention to the Universal Decimal Classification (UDC). In this version of the UDC, the term anthropology appeared several times: once in the field of social sciences (cultural anthropology) and again in the field of biology (physical anthropology). This information encouraged me all the more to rename the faculty of sociology as the faculty of socio-anthropology, so that a single, explicit word would bring sociology and anthropology together within the same faculty while excluding physical anthropology.

The unanimous acceptance of my proposal gave me a double satisfaction. On the one hand, I was able to present my dissertation project in good conditions; on the other, I was able to launch, and take part in for the first time, a decision-making process within the Wikimedia movement. However, this experience raised a number of questions for me. In particular, how could a separation between sociology and anthropology have come about, and how has it persisted until today?

Almost by coincidence, I found the answer to this question in a journal entitled "Socio-anthropologie", founded in 1997 by Pierre Bouvier with the ambition of addressing "the destructuring and recomposition that are at the heart of the contemporary world". In the first issue of this journal, Yves Grafmeyer is quoted as recalling that at one time "anthropology, the science of man, was mainly devoted to the study of primitive peoples". Later in the text one finds the idea that "anthropology is responsible for the study of societies without writing, in which exotic cultures are revealed, while sociology's role is to study societies that are advanced in urbanization and industrialization."

This answered my question about the origin of the divide between anthropology and sociology. But it only explains the origins: today, the expression "primitive peoples" has disappeared, and the notion of exoticism has lost all meaning now that an anthropology laboratory located in Belgium can bring together researchers from the four corners of the world. As for the so-called "advanced" societies in urbanization and industrialization, they have long since extended beyond the borders of the West.

Moreover, since the end of the twentieth century, anthropology has become increasingly interested in the Western and contemporary world. Among the first works attesting to this shift are, for example, the participant-observation studies of the world of work carried out by Pierre Bouvier, mentioned above. Along with Marc Augé, he was also one of the first French-speaking anthropologists to speak of a "Socio-anthropology of the Contemporary". Invoking exoticism or a supposed stage of societal advancement to separate anthropology from sociology therefore no longer makes sense today.

There remains the possibility of distinguishing sociology from anthropology by their methods. But here too, things are debatable. Following the emergence of the interactionist current within the Chicago school, anthropological practices such as ethnography and participant observation were adopted by sociology. In 1967, Harold Garfinkel, a Harvard-trained professor of sociology at UCLA, did not hesitate to use the term "ethnomethodology" to describe his method of work. Such paradigm shifts brought "the conflicts of methods in sociology" to light, and the very existence of these conflicts renders null and void any attempt to use method as the specific criterion distinguishing anthropology from sociology.

In truth, we are entitled to ask ourselves today: which sociologists are still forbidden from practising ethnography, case studies or other inductive approaches? And conversely, which anthropologists could still claim today that the quantitative analysis of field data and the formulation of initial research questions should be banned from any anthropological approach?

At the end of this line of reasoning, I am therefore tempted to believe that what separates anthropology from sociology today is nothing other than a certain "corporatism" maintained within our universities. This is no doubt the origin of the "basket of crabs" reputation (in French, "panier de crabes", a place of bitter infighting) that the political world attributes to academia. This is a very sad observation, since any sectarian attitude will always harm open-mindedness and exchange between researchers, and therefore, ultimately, the progress of knowledge and of science in general.

Finally, let us recall the concept of "complétude étude" (completeness of study) introduced in the foreword to this work and directly inspired by the thought of Ken Wilber, remarkably popularized in his book "A Brief History of Everything". Does this concept not invite us to break down the barriers between anthropology and sociology? Does it not also invite us to break down any other barriers separating scholars from different disciplines, in order to bring them together around the same universal cause: that of a complete four-dimensional study (cultural, social, psychological and intentional) of any scientific object?

Fortunately, as Rémi Bachelet, a lecturer at the École Centrale de Lille and a contributor to the project since September 2009, put it on Wikiversity, "we are far from the wars of disciplines!" This is no doubt why I felt free to conceive the concept of "complétude étude" and to integrate quantitative and statistical field data into a study that was initially intended to be purely ethnographic and was not originally concerned with such an overabundance of data.

Integrating statistical and textual Big Data analysis into an ethnographic study
This first field experience was followed by another question, this time about how to integrate into an ethnographic work, typically considered qualitative research, the multitude of quantitative or statistical data and discussion texts freely accessible in my field of study.

To clarify matters, it may be worth recalling that quantitative data, as opposed to qualitative data, describes something measurable. As a trivial example, take this quote from Rosie Stephenson-Goodknight about Wikipedia editors: "You can imagine probably 90 percent being men". Here, "90 percent" is quantitative information, while "men" is qualitative information. However, it should also be kept in mind that quantitative data can give rise to qualitative data and vice versa. The 29 notches on the Lebombo bone, the oldest tally stick known to date, are a very good example of this. On the one hand, these marks attest that the first human written manifestations were quantitative; on the other hand, their number (quantitative data) allows us to hypothesize that they were made by an African woman (qualitative data) keeping track of her menstrual cycle.

It is therefore important to stress that a so-called qualitative study cannot afford to ignore, or even neglect, quantitative data when it comes to the field. And, as mentioned earlier, the online space of the Wikimedia movement happens to be overflowing with an unfathomable amount of quantitative data, sometimes in raw form, sometimes in the form of statistical tables and illustrations that are freely accessible and reusable.

To understand this situation, one needs to know that the vast majority of websites hosting Wikimedia projects run on software called MediaWiki, and that this software instantly and automatically records every action performed by contributors from the moment the site is created. All this data is then archived and made accessible, with a few exceptions, to any user through chronologically ordered and configurable lists of hyperlinks in public logs and in page or contribution histories (see figures 2 and 3 below).

In addition to being archived and easily accessible, all this information is published under a Creative Commons CC BY-SA license. Under the terms of this license, the data on these pages are free to be exploited and republished, as is or in derived works. Only two conditions restrict this freedom: first, reusers must "give appropriate credit, provide a link to the license, and indicate if changes were made"; second, reusers who create derived works must "distribute [their] contributions under the same license as the original".

This Creative Commons license is a real godsend for researchers, and especially for statisticians, as shown by the multitude of websites presenting analyses, sometimes in real time, of data collected from Wikimedia sites through an application programming interface (API). These statistical analyses are published under a CC BY-SA license and are therefore available to researchers under the same conditions as those mentioned above.
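To give a concrete, if simplified, idea of such an API request, here is a minimal Python sketch that only builds the URL of a query to the MediaWiki Action API. The `list=usercontribs` module and its `ucuser`, `uclimit` and `ucprop` parameters belong to the real API; the helper function, the choice of wiki and the user name are illustrative assumptions, and fetching and paging through the results are deliberately left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def usercontribs_url(wiki_host: str, username: str, limit: int = 50) -> str:
    """Build a MediaWiki Action API URL listing a user's recent edits.

    Hypothetical helper: it only assembles the query string for the real
    `list=usercontribs` module; it does not contact the server.
    """
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "uclimit": limit,
        "ucprop": "ids|title|timestamp|comment",
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"

# Illustrative wiki and user name.
url = usercontribs_url("fr.wikiversity.org", "Crochet.david")
print(url)
```

Each contribution returned by such a query carries a revision identifier, which is what later makes it possible to point readers to a frozen version of a page.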

Beyond this profusion of quantitative and statistical data, the Wikimedia movement also produces an unfathomable amount of textual information that can form text corpora of considerable size. All this information is available in numerous discussion spaces scattered throughout the projects, on email mailing lists and, more recently, on the community space "Wikimedia Space", active as a discussion space from June 25, 2019 to February 18, 2020 and maintained thereafter as a simple blog.

This overabundance of textual information is not specific to the Wikimedia environment and seems rather to be linked to the digital context. Olivier Servais, an ethnographer of the World of Warcraft virtual universe, testifies to this when he asks questions similar to mine: "How then can this massive data management be reconciled with this qualitative ambition? How to make qualitative textual big data in this digital context?"

Faced with this question, I, like my supervisor no doubt, found myself between two extremes: either forgo an exhaustive treatment of statistical and textual data, at the risk of offering a partial and potentially false picture of reality, or embark on the computerized processing of quantitative data and linguistic corpora, at the risk this time of lacking skills, research time and computing power.

Through practice, and after several years of trial and error, I finally settled on a kind of recursive process, going back and forth between these two extremes. At times I worked on the computer and statistical processing of field data; at other times I pursued more classic ethnographic work based on participant observation, in which the classic semi-structured interview gave way to informal discussion within the digital space of Wikimedia.

While the computer processing provided information useful to my ethnographic work, the information gathered through participant observation and discussion in turn guided my choices in processing other data, and so on. In the end, this back-and-forth was also a good way to lighten the burden of research in a digital space, which can prove very stressful if one is not careful to compensate for the lack of variety in intellectual activity with physical activity.

As an example of quantitative data processing, here is a statistical analysis based on the financial reports published on the Wikimedia Foundation's website. This analysis produced a very telling histogram of the foundation's expenses (figure 4 below), which allowed me, on June 26, 2018, to update the French-language Wikipedia article on the Wikimedia Foundation. Based on a source dating from 2009, one could indeed read the following obsolete information: "Nearly half of the financial resources [of the foundation] are used to buy new servers and pay for hosting." Yet this information was already inaccurate in 2009, when it was already possible to foresee that a growing share of the foundation's budget would go to paying the salaries of its employees. The cost of hosting the editorial projects, on the other hand, has remained relatively, and counter-intuitively, stable since 2012.

It is therefore clear here that accounting and statistical work, however daunting it may seem to a researcher accustomed to qualitative studies, was necessary to correct the information that ethnographic observation alone would have provided. One could indeed have relied, for example, on the erroneous content, probably taken from the Wikipedia article, of a 2017 WikiMOOC video in which one could hear: "Where do the funds of the Wikimedia foundation come from? Because providing the technical infrastructure, the servers for the fifth most visited website in the world, is not free."

Let us now take another example, this time concerning the processing of textual data. Here we will use one of the 300 mailing lists organized by project and language community within the Wikimedia movement. All these email exchanges are archived month by month, preserved with their full history and made freely available under a CC BY-SA license on a site hosted by the Wikimedia Foundation. From the archives of the mailing list called "Wikimedia-l", known as a discussion space for the wider Wikimedia community, it is possible to quickly build textual corpora and submit them to automatic natural language processing.

The software chosen was TXM, a program developed by two French universities. A simple lexical query, for example, showed me that, compared with the word "the" (1,869,554 occurrences), the "@" sign appeared 879,105 times in the corpus, followed closely by the word "gmail" (877,346 occurrences). From this simple query we can infer that the vast majority of this list's participants write from a Google (Gmail) account.
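TXM performs such counts through its own interface. Purely to illustrate the logic of a lexical query, the following Python sketch, which does not use TXM, counts tokens in a tiny invented corpus (the three messages are placeholders, not Wikimedia-l data) and checks how often "gmail" directly follows "@".

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the Wikimedia-l archives.
messages = [
    "From: alice@gmail.com  Thanks for the update on the board.",
    "From: bob@gmail.com  I agree with the proposal.",
    "From: carol@example.org  See the minutes of the meeting.",
]

# Words are one token each; "@" and other punctuation become separate tokens.
TOKEN = re.compile(r"\w+|[^\w\s]")

def tokens(text):
    return [tok.lower() for tok in TOKEN.findall(text)]

def frequencies(texts):
    """Frequency of every token across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokens(text))
    return counts

def bigram_count(texts, first, second):
    """How often token `second` comes immediately after token `first`."""
    total = 0
    for text in texts:
        toks = tokens(text)
        total += sum(1 for a, b in zip(toks, toks[1:]) if (a, b) == (first, second))
    return total

freq = frequencies(messages)
print(freq["the"], freq["@"], freq["gmail"])    # 5 3 2
print(bigram_count(messages, "@", "gmail"))     # 2
```

Comparing the count of "@" with that of the "@ gmail" pair is one way to check how much of the address traffic is indeed Gmail.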

The most frequent first names in the list are "Gerard" (27,888), followed by "Erik" (21,924) and "David" (20,624). An analysis of their occurrences in the text then shows that "Gerard" is associated with "Gerard Meijssen" (11,096), who is the subject of an entry on Wikidata, but also with "David Gerard" (12,717), whose detailed user page can be found on Wikipedia, and that "Erik" is mainly associated with "Erik Moeller" (8,616), who is the subject of a Wikipedia article.

Thanks to this exercise, we can note, on the one hand, that participation in the mailing list is strongly associated with the use of a Gmail address, and, on the other hand, that it becomes possible to identify very active people within the list and even to find their email addresses. This information is obviously useful for ethnographic work, since it makes it possible to identify and contact key informants likely to give a broad, historical account of what happens in this discussion space.

Among its more advanced functions, TXM can also produce graphs, for example to visualize how the frequency of a word evolves over the course of the conversations. The example used here is the word "harassment" (harcèlement in French), whose number of occurrences on the mailing list can be seen to vary over time (see figure 5 below).
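TXM draws such curves itself. To show what a point on the curve counts, here is a minimal Python sketch, independent of TXM, that tallies occurrences of the word per month. The (month, message) pairs are invented placeholders, not data from Wikimedia-l, and matching a single spelling of the word is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical (month, message body) pairs; the real archives are monthly.
archive = [
    ("2012-03", "Harassment on talk pages must stop."),
    ("2012-03", "Agreed, harassment is a real problem."),
    ("2015-06", "New policy draft about harassment and civility."),
    ("2015-07", "Unrelated budget discussion."),
]

# Whole-word, case-insensitive match.
WORD = re.compile(r"\bharassment\b", re.IGNORECASE)

def monthly_counts(records):
    """Return {month: occurrences of the word}, sorted chronologically.

    Months present in the archive but without any occurrence are kept with 0,
    so the resulting series has no gaps for archived months.
    """
    counts = Counter()
    for month, text in records:
        counts[month] += len(WORD.findall(text))
    return dict(sorted(counts.items()))

series = monthly_counts(archive)
for month, n in series.items():
    print(month, n)
```

Keeping zero-count months matters when reading "successive waves": a gap in the series would otherwise look like missing data rather than silence.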

This graph showed me that the issue of harassment encountered in my ethnographic observations is not an epiphenomenon peculiar to the community of editors of the English-language Wikipedia, and that it appeared relatively early and in successive waves in the history of the Wikimedia movement [N 9]. In light of this analysis, it seems useful to raise the phenomenon again during field conversations and to remain vigilant in observing it. To save time and avoid overburdening field actors, returning to the textual corpus with TXM also makes a full-text analysis possible. In figure 6 below, we can see how, starting from the content of the Wikimedia-l mailing list, TXM's concordance tool displays the list of text extracts containing the word "harassment", each centered on that word.
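The concordance (or keyword-in-context) view can be approximated in a few lines of Python. This sketch is not TXM's implementation: the window width and the sample sentences are arbitrary assumptions, and it uses a window of characters rather than the word-level context a textometry tool would offer.

```python
import re

def concordance(text, keyword, width=20):
    """Keyword-in-context lines: each hit centered, with `width` characters on each side."""
    lines = []
    pattern = rf"\b{re.escape(keyword)}\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

sample = ("Several editors reported harassment last year. "
          "The board discussed harassment again in the spring.")
for line in concordance(sample, "harassment"):
    print(line)
```

Aligning the keyword in a fixed column is what lets the eye scan the left and right contexts quickly, which is the point of a concordance.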

Since harassment has become a central theme within the Wikimedia movement, it becomes worthwhile to make it the subject of a case study in order to better illustrate the phenomenon. One could start, for example, from one's own experience, or from a testimony such as the well-documented one found on one of the user pages of a contributor going by the user name "Idéalités".

After examining things in detail, it then becomes possible to start again with an even more thorough textometric analysis, this time based on new corpora built from the discussion spaces in which the name "Idéalités" appears. The aim would be to extract new information that could counterbalance Idéalités' account, while gaining quick, targeted access to the comments she exchanged with other contributors.

As we have seen, the method proposed here rests on a back-and-forth between the different approaches brought together under the concept of "completeness of study". Let us now consider what stance to adopt when referencing information extracted from digital archives, so as to make it searchable and verifiable by readers. Thanks to this methodological commitment, readers will be able to form their own opinion from these sources and possibly diverge from what they are told in this book.

Producing a "webography" for verifiability purposes
"Verifiability" in the Wikimedia universe can be seen as a particular variant of the empirical and theoretical refutability (or falsifiability) introduced by Karl Popper in his demarcation between science and non-science. While Karl Popper asks scientists to provide their peers with as much information as possible for testing a theory, so that its scientific character can be judged by its degree of refutability, Wikipedians, for their part, have established a rule of verifiability according to which "information can only be mentioned if readers can verify it".

What seems indispensable to French-speaking Wikipedians, therefore, "is that all information likely to be contested, as well as all theories, opinions, claims or arguments, be attributed to an identifiable and verifiable source". In short, it follows that within this encyclopedic project, "any claim that is or may be disputed must be explicitly attributed to a publication of quality. An unverifiable assertion may be deleted. In the event of a dispute, it is up to the person who wishes to insert information to mention the source".

What Karl Popper's proposal and the Wikipedian rule have in common, then, is a search for refutability through deferred experimentation. They differ in that Popper's method concerns theories, whereas the Wikipedian rule concerns information. The Wikipedian rule of verifiability therefore no longer calls for providing as much corroborating information as possible, but simply for citing one's sources, on the understanding that "any content, questioned or likely to be questioned, must be supported by an annotation leading to one or more references that are based on reliable and clearly identified sources".

The position of Karl Popper was criticized by Jean-Claude Passeron, who argued that such epistemological expectations are incompatible with "the empirical relevance of sociological statements [which] can only be defined in a situation where information on the world is collected through historical observation, never through experimentation." It is true that the reader of a scientific work in the social sciences will never be able to relive at the same time, and therefore under identical circumstances, the experience or observation of a phenomenon described by an author. This is precisely why Jean-Claude Passeron introduced the term "historicity", in order to give the historical sciences a regime of truth distinct from that of the so-called natural sciences.

To this temporal epistemological impasse can be added a spatial one, in the case of ethnographic work carried out in remote areas or areas that are difficult for the reader to access. In socio-anthropology, some authors speak of an "ethnographic pact" in which "only ethnologists feel free to explain how they have been able to draw from a unique experience a body of knowledge whose validity they ask everyone to accept."

In this regard, one of the most famous controversies concerns the writings of Carlos Castañeda. Translated into 17 languages and having sold 8 million copies, Castañeda's 15 books are today regarded as a fabricated autobiographical work. In them, Castañeda describes teachings received from a mysterious shaman named Don Juan Matus, of whom no one has ever been able to find a trace. Robert Marshall summarizes the history of this controversy in a few lines:

"The books' status as serious anthropology went almost unchallenged for five years. Skepticism increased in 1972 after Joyce Carol Oates, in a letter to the New York Times, expressed bewilderment that a reviewer had accepted Castaneda's books as nonfiction. The next year, Time published a cover story revealing that Castaneda had lied extensively about his past. Over the next decade, several researchers, most prominently Richard de Mille, son of the legendary director, worked tirelessly to demonstrate that Castaneda's work was a hoax."

Such an episode therefore raises the question of where the boundary between ethnography and fiction lies. To this question, Karl Popper would answer that the ethnographer's experience must be reproduced by going back to the field, whereas Jean-Claude Passeron would answer that this is impossible. Indeed, will the informants still be alive? Will they not have changed their minds or points of view? And what reader could go in search of Don Juan Matus, Castañeda's shaman? All these dead ends push readers to accept the ethnographic pact or, failing that, to regard what they read as potentially a work of fiction.

However, with the new informational framework brought about by the digital revolution, these impasses may gradually diminish, and even disappear completely in the case of a study based solely on observation of the Web. This is all the more true for the present study, carried out within a fully transparent and archived digital space, as partly described above. Thanks to the MediaWiki software, which saves the entire history of online activity and makes it freely accessible to all, it becomes possible to give readers access to the information exactly as the researcher found it. Moreover, in this very specific context, the constraint of historicity raised by Passeron disappears completely, since archived information is by definition frozen in time and therefore undergoes no alteration between the moment it is collected and the moment it is rediscovered by the reader.

Concretely, all that is needed is to provide hyperlinks or, more precisely, permalinks, which redirect readers to Web pages that remain in the state in which the researcher examined them. In the MediaWiki interface, these permalinks can be reached via the "Permanent link" item located in the left-hand column of every project page. It is also possible to link to a page displaying the "differences between versions", called "diffs" in Wikipedia jargon. These "diff" pages, in which what has been removed is highlighted in a left-hand frame and what has been added appears in bold in a right-hand frame, are all accessible from a content page's history and make it possible to view the state of the page directly before and after a change (see figure 7). The main advantage of this method over permalinks is that the name of the author of the change and the exact time it was made are directly visible, without any further steps.
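As a hedged illustration, both kinds of links can be built from a revision number alone. MediaWiki's `index.php` accepts an `oldid` parameter for a fixed revision, and `diff=prev` combined with `oldid` for the change introduced by that revision; the helper names, the page title and the revision number below are hypothetical.

```python
from urllib.parse import urlencode

def permalink(host: str, title: str, revid: int) -> str:
    """URL of one frozen revision of a page (MediaWiki `oldid` parameter)."""
    return f"https://{host}/w/index.php?" + urlencode({"title": title, "oldid": revid})

def diff_link(host: str, revid: int) -> str:
    """URL of the diff showing what revision `revid` changed compared with the previous one."""
    return f"https://{host}/w/index.php?" + urlencode({"diff": "prev", "oldid": revid})

# Hypothetical page title and revision number, for illustration only.
print(permalink("fr.wikipedia.org", "Example page", 123456789))
print(diff_link("fr.wikipedia.org", 123456789))
```

Because a revision number never changes once assigned, a link built this way keeps showing the same text even if the page is edited afterwards.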

On Wikipedia, linking to "diff" pages is common practice when raising a complaint with the community. In the context of a "contestation du statut d'administrateur" (a challenge to administrator status), it is also clearly stated that "a challenge must be explained and supported by diffs or log entries, otherwise it is not valid". These "diffs" and activity log pages thus allow everyone to validate or "refute" the accusations made against a site administrator. Typically, such a challenge contains links to comments or actions that are contrary to the rules and recommendations in force within the projects.

Once again, the epistemic universe of Wikimedia inspired the way I organized my ethnographic work. Concretely, here is how I resolved to cite the sources used in this work whenever they come from the Web. Each piece of information taken from a Web page is systematically followed by a note call in the form of a superscript number preceded by the capital letter W. These note calls lead readers to the permalinks through which they can consult the sources in the same state as when I observed them.
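The numbering scheme can be sketched as a small bookkeeping helper. This is a hypothetical illustration, not the tool used for this work; in particular, reusing the same W number when a source is cited twice is an assumption, and the note calls are returned as plain text ("W1") rather than typeset as superscripts.

```python
class Webography:
    """Hypothetical helper that assigns W-numbered note calls to Web sources."""

    def __init__(self) -> None:
        self.entries: list[str] = []        # permalinks, in order of first citation
        self.numbers: dict[str, int] = {}   # permalink -> its W number

    def cite(self, permalink: str) -> str:
        """Return the note call for a source, numbering new sources in order."""
        # Assumption: a source cited twice keeps the number it first received.
        if permalink not in self.numbers:
            self.entries.append(permalink)
            self.numbers[permalink] = len(self.entries)
        return f"W{self.numbers[permalink]}"

    def render(self) -> str:
        """Webography section: one numbered line per source."""
        return "\n".join(f"W{n}. {url}" for n, url in enumerate(self.entries, start=1))

web = Webography()
first = web.cite("https://fr.wikiversity.org/w/index.php?oldid=1")   # hypothetical permalink
second = web.cite("https://fr.wikipedia.org/w/index.php?oldid=2")    # hypothetical permalink
again = web.cite("https://fr.wikiversity.org/w/index.php?oldid=1")
print(first, second, again)  # W1 W2 W1
print(web.render())
```

Keeping the W series separate from the ordinary notes is what allows the webography to be printed as its own section next to the bibliography.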

When the information comes from a MediaWiki page, two cases are possible. If it comes from an organizational page, the reference points to the permalink of the version consulted. If it concerns the statements or actions of a field actor, the reference points either to the "diff" page showing the differences between the versions before and after the edit, or to the user's action log. Finally, if the page does not come from a MediaWiki site, the reference points to an archived version of the page kept and viewable on the Internet Archive website. The outcome of this process is a webography section that sits alongside the traditional bibliography of any scientific work, making it easy to distinguish, within this work, what was produced from primary sources from what was produced from secondary or tertiary sources.
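The referencing rules just described amount to a simple decision procedure, sketched below in Python. The `Source` type, its field names and the example values are hypothetical; the URL patterns follow MediaWiki's `oldid` and `diff=prev` parameters and the Wayback Machine's `web.archive.org/web/<timestamp>/<url>` form, and the user-log case is left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Source:
    """Hypothetical description of a cited Web source."""
    kind: str            # "organizational", "actor" or "external"
    host: str = ""       # MediaWiki host, e.g. "fr.wikipedia.org"
    revid: int = 0       # revision number on a MediaWiki site
    url: str = ""        # original address of a non-MediaWiki page
    snapshot: str = ""   # Wayback Machine timestamp, YYYYMMDDhhmmss

def reference_for(src: Source) -> str:
    """Pick the kind of stable link the referencing rules call for."""
    if src.kind == "organizational":
        # Organizational page: permalink to the version consulted.
        return f"https://{src.host}/w/index.php?oldid={src.revid}"
    if src.kind == "actor":
        # Statement or action of a field actor: diff of the edit itself.
        return f"https://{src.host}/w/index.php?diff=prev&oldid={src.revid}"
    if src.kind == "external":
        # Non-MediaWiki page: archived copy on the Internet Archive.
        return f"https://web.archive.org/web/{src.snapshot}/{src.url}"
    raise ValueError(f"unknown source kind: {src.kind!r}")

print(reference_for(Source("organizational", host="fr.wikiversity.org", revid=42)))
print(reference_for(Source("external", url="https://example.org/report", snapshot="20180626000000")))
```

Raising an error for an unknown kind, rather than guessing, keeps an unclassified source from silently receiving a link that will not stay frozen.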

Unfortunately, what can be offered to any Internet reader for primary webographic sources is impossible for secondary bibliographic sources. For a long time now, these sources have been subject to a dramatic commodification that makes access to them, including digital access, conditional on payment and therefore limited. For some, this situation is the result of an "oligopoly of publishers who derive maximum benefit from the fact that scientific laboratories and researchers are evaluated according to the journals or publishing houses where they publish their results".

In such a context, and following on from what we discussed at the beginning of this section, the "ethical questions concerning scientific publication" are now joined by epistemic questions. These concern the refutability or verifiability of the secondary sources cited in a work. Such sources are supposed to exist, of course, but since readers cannot access them, they are asked to sign a new pact. This pact brings together three acceptances:

 * the existence of the sources;
 * the fact that they have been fully exploited without being misused, either by distorting what was said or by omitting the context in which it was originally expressed;
 * the fact that the verifiability of these sources is subject to commodification, and therefore inevitably excludes the most financially deprived people.

Is such a pact, and especially its third acceptance, not ultimately even more problematic than the ethnographic pact discussed earlier? Should we really accept that practising science, and refuting what it says, remains the prerogative of a very limited part of our human community? Is it not time, at last, to think of a science in which all information would be unconditionally accessible to all and respectful of all?

Aspiring to open and transparent science
As we shall see, issues of openness and transparency are not new in scientific research. A movement has long existed around the expression "Open Science", notably since the appearance in 1999 of the site openscience.org. That site is dedicated to the writing and dissemination of free and open source scientific software. The English expression is translated into French as "science ouverte". It should not be confused with "Free Science", the name of a magazine published under copyright.

The open science movement can be considered the heir of the free software movement launched by Richard Stallman in the 1980s. Stallman announced his operating system project, GNU, on 27 September 1983 in the net.unix-wizards newsgroup via Arpanet. To promote it, he invoked the Golden Rule, rephrasing and contextualizing it in these terms: "If I like a program, I must share it with others". An excerpt from the book "Richard Stallman et la révolution du logiciel libre" will help us discover the origins and challenges of the free software movement: "Stallman proposes to classify copyrighted works into three categories.

The first, functional, includes computer software, dictionaries, manuals.

The second category includes works that serve as testimony, for example scientific or historical documents. Their function could be undermined if both authors and readers were free to modify them at will. This category also includes works of self-expression (diaries, autobiographies, etc.). These are not intended to serve as evidence, but modifying them would amount to falsifying a person's memories or opinions, which Stallman considers ethically unjustifiable.

Finally, the third category concerns works of art and entertainment. Stallman believes that the rights granted to the users of each work should be adapted to the type of work. Thus for the first category of functional works, users should be granted the unlimited right to make modified versions of them.

For the second and third categories, the user's rights should be modulated according to the author's wishes. However, Stallman insists that, regardless of the category of work, the freedom to copy and redistribute non-commercially should apply in full and at all times. If that means letting Internet users print out a hundred copies of an article, image, song or book and then e-mailing the copies to a hundred strangers, then so be it."

Here is another excerpt, this time from a book entitled "Science ouverte, le défi de la transparence". It shows how the open science movement has appropriated Stallman's legacy: "Well beyond open access, open science extends over a very broad field and takes into account, in an effort of renewal and modernization, all the issues of research and its consequences, such as the openness and management of research data, the openness and interoperability of software, the transparency of evaluations, the encouragement of citizen participation in research and the freedom of access to teaching subjects."

This quote already shows how closely the Wikimedia movement meets the expectations of open science. On the one hand, its project for the free sharing of human knowledge relies on the free software MediaWiki, which offers both openness and interoperability. On the other hand, MediaWiki's automated archiving system, presented earlier, gives the Wikimedia digital environment an unparalleled degree of transparency.

In concrete terms, this transparency is ensured by the history page associated with every page produced by the MediaWiki software. Figure 8 below is a screenshot of the history page of the Wikipedia article entitled "Meta-Wiki". It shows a list of revisions in reverse chronological order (most recent first), each line containing the following elements:


 * a "cur" link pointing to a page showing the differences between that version and the current version of the page;
 * a "prev" link pointing to a page showing the differences from the previous version, in which added and removed text are highlighted;
 * the exact date and time of the change, as a link to the version of the page archived immediately after the change was made;
 * the username of the author of the modification, followed in brackets by a "talk" link to their discussion page and a "contribs" link to a chronological list of all their modifications within the project;
 * for editors who are not logged in, the IP address of their Internet connection instead of a username, as a link to a page listing all the changes made from this address within the project. It is followed in brackets by a "talk" link to a discussion page for exchanges with the holder of a fixed IP address or, for a dynamic IP address, with its successive users;
 * in the case of a minor modification, the letter "m" in bold type;
 * the size of the page after the modification and the size of the modification, both expressed in bytes;
 * in parentheses, the summary of the changes written by the author or, when a section was edited, the section title automatically inserted by the system;
 * and finally, in parentheses, an "undo" link that reverts the modification and a "thank" link that sends a notification of thanks to the author.
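The information displayed on a history page can also be retrieved programmatically. The sketch below assumes the MediaWiki Action API (`api.php` with `action=query&prop=revisions`). It only builds the request URL, without fetching it. It also shows how the byte change on each history line can be derived from the sizes of consecutive revisions. The sample revision data are hypothetical.

```python
from urllib.parse import urlencode


def history_query_url(site: str, title: str, limit: int = 10) -> str:
    """URL of an Action API request listing the latest revisions of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # ids, date, author, summary, size and minor-edit flag, as on a history line
        "rvprop": "ids|timestamp|user|comment|size|flags",
        "rvlimit": limit,
        "format": "json",
    }
    return f"https://{site}/w/api.php?{urlencode(params)}"


def size_changes(revisions):
    """Return (revid, size, delta) tuples for revisions listed newest first.

    The delta is the size difference with the next-older revision in the list.
    It is None for the oldest listed revision, whose parent is not available here.
    """
    rows = []
    for i, rev in enumerate(revisions):
        older = revisions[i + 1] if i + 1 < len(revisions) else None
        delta = rev["size"] - older["size"] if older is not None else None
        rows.append((rev["revid"], rev["size"], delta))
    return rows


if __name__ == "__main__":
    print(history_query_url("en.wikipedia.org", "Meta-Wiki"))
    # Hypothetical revisions, newest first.
    sample = [{"revid": 3, "size": 1500}, {"revid": 2, "size": 1450}, {"revid": 1, "size": 1200}]
    for row in size_changes(sample):
        print(row)
```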

On the English Wikipedia, a set of links to external statistical analysis tools always appears at the top of history pages, just after the "Filter revisions" box, as shown in figure 8 below. In order of appearance, these "External tools" make it possible to:


 * find who added a given passage to the page (two alternative tools are offered);
 * see all the contributions of a given author to the article;
 * view general statistics about the article, such as the rate of edits and the amount of text contributed by each author;
 * view an analysis of the article's pageviews;
 * log in to a restricted-access website for managing the interface.

These history pages also make it possible to visualize the editing of a page and its evolution over time, as shown in video 1 below.

All this shows the degree of transparency that can be guaranteed in the publishing projects supported by the Wikimedia movement. Some might describe the many features of the MediaWiki software as a "fantasy of technology". In the very specific context of open science, however, they appear as a solution to the "challenge of transparency". I also see them as a unique, free and open opportunity to write my scientific work in a space that fully meets the demands of the open science movement, without any additional effort.

It is possible to take things even further by creating, for example, a study laboratory such as the Laboratory for the Study of the Wikimedia Movement, in which I invite everyone to take part in studying the Wikimedia movement. Such a space makes it possible to share publicly a whole set of resources discovered or produced during research. For lack of space, these resources could not otherwise be published in any standard editorial format.

Respecting the privacy of field actors
One may wonder whether such a level of transparency within Wikimedia projects poses a privacy problem. To answer this question, simply click on the "terms of use" hyperlink at the bottom of every page of the Wikimedia projects. This link leads to a general information page containing another link, this time to a page dedicated to the privacy policy adopted by the Wikimedia Foundation.

In Italy, as in many other countries, the legal responsibility of the Wikimedia Foundation for the editorial projects it supports is limited to its status as host. It in no way extends to that of publisher. Nevertheless, the Wikimedia Foundation, and by extension the Wikimedia movement, is very concerned with protecting the privacy and personal data of the users of the projects it hosts.

For example, the French-speaking Wikipedia has a page entitled "Wikipedia:droit de disparaître" (right to disappear). This page long anticipated the "right to be forgotten", or more precisely the "right to erasure". That right was set out in 2016 in Article 17 of Regulation (EU) 2016/679 of the European Parliament and of the Council, also called the General Data Protection Regulation (GDPR). Somewhat unexpectedly, however, the Wikimedia Foundation publicly condemned the arrival of this regulation. Applied to the content of its editorial projects, the foundation sees it as an open door to the manipulation of information on the Web. In practice, a request to delete information related to a user account will be granted. A request to delete information contained in an article about the same user will be refused.

In terms of privacy protection, several other options are available to users of the Wikimedia digital space in which, it should already be noted, it is not necessary to provide an e-mail address to open a user account. The first and most popular protection is to create a user account with a pseudonym so that changes and actions made are not attributed to one's own identity. The second option, more common among less active users, is to contribute to projects without logging in. In this case, instead of the user's pseudonym, the IP address of the Internet connection used by the user will appear.

However, this second option is less respectful of a user's privacy. From a simple IP address, any Internet user can find out either the organization using it, if this information is public, or, for a private connection, the nearest city and the contact details of the company that provided the connection. For example, for the IPv4 address 130.104.34.155, the site whatismyipaddress.com indicates that it is used by the Catholic University of Louvain. Entering the address 176.164.50.155 on the site fr.geoipview.com displays a map pointing to the city of Blois in France.
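Geolocating an address requires external registries or databases, such as those queried by the sites mentioned above. Some of what an IP address reveals can nonetheless be checked locally. The sketch below uses Python's standard `ipaddress` module to tell IPv4 from IPv6 and to test whether an address belongs to a given network block. The block 130.104.0.0/16 attributed to the university is an assumption made for illustration, not information taken from the sites above.

```python
import ipaddress


def ip_version(address: str) -> int:
    """Return 4 or 6 depending on the kind of IP address."""
    return ipaddress.ip_address(address).version


def belongs_to(address: str, block: str) -> bool:
    """Check whether an address falls inside a network block in CIDR notation.

    An IPv6 address is never inside an IPv4 block (the check returns False).
    """
    return ipaddress.ip_address(address) in ipaddress.ip_network(block)


if __name__ == "__main__":
    # Assumed, illustrative network block of the university.
    UNIVERSITY_BLOCK = "130.104.0.0/16"
    # The last address is an IPv6 documentation address.
    for addr in ("130.104.34.155", "176.164.50.155", "2001:db8::1"):
        print(addr, ip_version(addr), belongs_to(addr, UNIVERSITY_BLOCK))
```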

IPv6 addresses, more frequently used by mobile connections, are harder to geolocate. Whatever the situation, however, a mandated person can always contact the Internet Service Provider (ISP) of an IP address. The ISP can reveal the identity of the customer who used the address at a specific time, for example when a modification was recorded on a Wikimedia site. In France, the information linking IP addresses to customers must be kept for at least one year, although this varies with the legislation of each state. On the Wikimedia servers, by contrast, the IP addresses behind user accounts, visible only to persons mandated by the foundation, are permanently deleted after only three months.
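A little date arithmetic makes the difference between these two retention periods concrete. In this sketch, "three months" is approximated as 90 days and the French minimum of one year as 365 days. Both values are simplifying assumptions, and the dates in the example are hypothetical.

```python
from datetime import date, timedelta

# Simplifying assumptions for the two retention periods mentioned in the text.
WIKIMEDIA_RETENTION = timedelta(days=90)    # "three months"
FRENCH_ISP_RETENTION = timedelta(days=365)  # "at least one year" (minimum)


def still_retained(edit_day: date, today: date, retention: timedelta) -> bool:
    """True if data recorded on edit_day is still within the retention period on today."""
    return today - edit_day <= retention


if __name__ == "__main__":
    edit_day, today = date(2019, 5, 31), date(2019, 9, 15)  # 107 days apart
    print("Wikimedia:", still_retained(edit_day, today, WIKIMEDIA_RETENTION))
    print("French ISP:", still_retained(edit_day, today, FRENCH_ISP_RETENTION))
```

With these dates, the IP data would already be gone on Wikimedia's side. Under the French minimum, the provider would still hold the link between the address and its customer.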

A last option, for those who do not particularly wish to contribute anonymously, is to create a user account under their own name. This personal choice must be weighed carefully, since part of one's life is then exposed to the connected world, potentially irreversibly. We must never forget that any information disclosed on the Web can be saved by someone on their computer. It may then reappear elsewhere on the Web one day, despite its deletion. Videos banned from the Web are a good example: they disappear and reappear over and over again.

However, displaying one's true identity when contributing to Wikimedia projects is not only a disadvantage. It also secures the authorship of one's writings and thus protects them from plagiarism. In most cases, these writings are published under a CC BY-SA license, which protects them from being appropriated under copyright. Finally, such a choice can also meet ethical obligations, for example those related to scientific research.

In short, all these options and provisions guarantee an "à la carte" management of the privacy of Wikimedians and their personal data. They also allow some users in countries subject to censorship and repression to connect through virtual private networks (VPNs). They can do so without revealing their identity or the address of the foreign connection they use to reach the sites. Last but not least, all these provisions foster a climate conducive to freedom of expression and dialogue, which is yet another advantage for researchers.

Writing the research in a dialogical process
Dialogical writing in socio-anthropology is not a new concept. The anthropologist Mondher Kilani was already discussing it in the 1990s, citing as examples the writings of Philippe Descola, Jeanne Favret-Saada and his own. He describes his own experience as follows:

"My text is not the evocation of an irreducible subjective experience. It is as much the product of a "truth" negotiated with the oasians as a construction explicitly addressed to a distant audience for which I reconstruct the different contexts of this negotiation".

More recently, Frédéric Laugrand developed intergenerational knowledge transmission workshops (ATIS). These aim at the co-construction of knowledge between participating researchers and actors, in a dynamic of transmission directed at young people through "doing as if". The final goal of this process is to produce documents in the form of verbatim transcripts, which are ultimately validated by the participants.

In this work, the aim is not so much such a collective production of knowledge as a negotiation similar to the one described by Kilani. The co-construction of ideas is thus clearly separated from the writing of the resulting text, although spelling or syntax corrections are always welcome. As Kilani put it before me, the aim was therefore "a dialogical writing that places personal testimony and the voice of others at the center of the anthropological narrative". Others, however, were not given the opportunity to take part in writing this narrative. Here again, the socio-technical device implemented within the Wikimedia projects was of great help to me.

All the editorial sites supported by the Wikimedia movement are collaborative spaces in which knowledge sharing is established through the mundane gestures of editing and mutual monitoring. For example, when I consulted the article "Open Science" on Wikipedia while writing this text, I did not hesitate to reformulate its introductory sentence. I gave the modification the following edit summary: "Reformulation of the first sentence for a better understanding." Once saved, this modification became visible on the "diff" page discussed earlier, whose screenshot, as a reminder, appears in figure 7.

As soon as the change is saved, it is signalled to all registered users who have added the article to their watchlist by clicking on the small star between the "View history" tab and the "Search Wikipedia" box. This watchlist (see figure 9 below) looks very much like the history page of an article. The difference is that it lists the changes made to all the articles the user wishes to follow. Users who have configured their notification preferences accordingly also receive this information, with a link to the "diff" page, directly in their mailbox. Such a technical device therefore reinforces the dialogue between editors of Wikimedia projects whenever they disagree about a modification made to a page. In case of disagreement, the best practice is to click on the "Discussion" tab at the top of every Wikimedia project page. This opens the discussion page associated with the page being edited, where the debate can begin.

For my PhD thesis, hosted on the Wikiversity project, I decided to take advantage of this feature to encourage dialogue about my research reports. I invited actors of the Wikimedia movement to comment in one of two places:

 * the main discussion page, for remarks on the thesis as a whole, which used a structured discussion system to make things easier for people unfamiliar with wikicode;
 * the discussion pages associated with each chapter of my book.

To make these pages as visible as possible, I added the following mention below the title of each chapter: "[ React to the content of this chapter ]".

Later on, I went further and encouraged actors of the movement to enter into dialogue about my research. To do so, I regularly posted invitations on the main forum-type spaces of the Wikimedia movement. For information, here is the discussion that followed my first message, entitled "Notice of work in progress", posted on the French Wikipedia "village pump" on 31 May 2019:

Hello, I have started writing a PhD thesis on the Wikimedia movement, published on Wikiversity. The first chapter of this work, devoted to methodology, is now ready to be reviewed by those active in the movement. The formatting of the text is not finished and the spelling must be deplorable, but I would like to submit it for feedback before an upcoming meeting with my support committee for a confirmation test. I therefore invite all interested people to react freely on the chapter's discussion page. If you feel like it, you can also correct any spelling mistakes while reading. I would be most grateful. Thanking you in advance and wishing you all a great day. Yours sincerely, Lionel Scheepmans ✉ Contact Sorry for my dysorthographia, dyslexia and "dys"traction. 31 May 2019 at 01:43 (CEST)
 * Interesting, but, apart from the spelling, who writes the thesis, the PhD student or the Wikipedia community? - Siren - (talk) 31 May 2019 at 14:12 (CEST)
 * Hello Siren, to answer the question: at the level of words and sentences, it's the PhD student. At the level of knowledge and ideas, it's the PhD student and the community: the Wikipedia community, but also the community of all the projects supported by the foundation. If the question is being asked, it is probably because things are not clear enough, so I'm going to try to rephrase them more explicitly. Moreover, this very interaction between us already partly illustrates the idea of a dialogical construction of knowledge. In the context of my PhD, unfortunately, it cannot be similar to what happens on Wikipedia. This work leads to a diploma, and in the academic world around me, to be awarded the title of doctor, one has to defend a thesis written on one's own. That said, Jimbo Wales received an honorary doctorate from my university without having written a thesis. Also, there are many other people who know much more about the Wikimedia movement than I do, and it would be foolish and presumptuous of me not to invite them into a dialogue about the writing of my thesis. A big thank you already for the spelling corrections, and have a nice end of the day! Lionel Scheepmans ✉ Contact Sorry for my dysorthographia, dyslexia and "dys"traction. 31 May 2019 at 23:45 (CEST)
 * Wow, I went to take a quick look at this page and, oh, I'm hooked! Hooked. Absolutely fascinating stuff, I highly recommend reading it! Obviously, like all theses in a field that is not mine, it's so dense that it will take my poor mind a while to absorb everything, but the thoughts are already flowing.
 * For example, I love the basic idea that the object of research, anchored in the real of true reality, shapes the methodologies and not the other way round, which is what we are normally taught. I do agree, however, that our tendency to set strict, well-laid-out, universal frameworks comes from an era (let's say since the 18th century) in which we favour creating categories even before putting objects in them, a willingness to regulate everything, to classify and universalize everything, to produce empty frameworks. Very Newtonian. Perhaps linked to all representations of time (the temporal environment), but not necessarily.
 * I also like, intuitively, the reflection on the imaginary and its constructive force! A last point on the first chapter (I see that there have been many additions): it says that the social sciences, unlike the other sciences called hard sciences, do not claim to define a set of absolute parameters that make experiments reproducible. But in my opinion, neither do the other sciences. They claim, they claim to make things reproducible, but it's just a useful tool. The parameters are subject to the same variations, but the hard sciences also tend to have applications and therefore want to be operational. It's a bit like reasoning by cutting the path into small steps (like Descartes) to reach the goal, while being well aware that the path as such doesn't exist: we created it to solve the problem.--Dil (talk) 31 May 2019 at 23:57 (CEST)
 * Thank you for that encouraging feedback, Dil! Lionel Scheepmans ✉ Contact Sorry for my dysorthographia, dyslexia and "dys"traction. 2 June 2019 at 01:42 (CEST)

In addition to the discussion forums, I also used the Echo notification system set up within the Wikimedia digital space. This system makes it possible to notify a registered user from any page of the Wikimedia sites. The notification appears at the top of every Wikimedia project page once the user is logged in. Depending on the user's preferences, it can also arrive in their email inbox.

Concretely, creating the hyperlink "Psychoslave" in this text produced on Wikiversity is enough for the user going by the pseudonym "Psychoslave" to be notified that I mention him here. He will then know that I wish to draw his attention to this page, in this case to thank him for his interest in my work. Had I not wanted to attract his attention, I could have written his username without creating a hyperlink. In that case, Psychoslave would have had to carry out a laborious search, with an internal or external search engine, to find the place on Wikiversity where I talk about him.

Finally, in order to make the dialogical device work, I had to master at least some of the language of my interlocutors. Didn't Joseph-Marie de Gérando, one of the precursors of modern anthropology, write in the journal of the Société des observateurs de l'homme: "The first way to get to know the Savages [a common expression at the time] well is to become in some way one of them; and it is by learning their language that one will become their fellow citizen"?

With more than 300 language versions of Wikipedia, the Wikimedia movement appears as an extremely polyglot meeting place. Fortunately, as is usually the case in this type of cosmopolitan community, English comes to everyone's rescue as a lingua franca. However, in the Wikimedia digital space, knowledge of a natural language is not always enough. As we have already seen, some practice of wikicode may be necessary to take part in discussions in the many digital spaces run by the MediaWiki software. Here again, fortunately, understanding wikicode is not an insurmountable obstacle for newcomers.

However, knowledge of wikicode will often prove insufficient to fully understand the issues at stake in the digital world. It is equally insufficient for discussing the socio-technical environment of the Wikimedia movement comfortably with the actors involved in its management. These discussions often require a minimum knowledge of the vocabulary and grammar of various computer languages. A passive knowledge of these languages is all the more indispensable if one does not want to ignore the warning Lawrence Lessig published in his famous article entitled "Code is law": "This code, or architecture, sets the terms on which life in cyberspace is experienced. It determines how easy it is to protect privacy, or how easy it is to censor speech. It determines whether access to information is general or whether information is zoned. It affects who sees what, or what is monitored. In a host of ways that one cannot begin to see unless one begins to understand the nature of this code, the code of cyberspace regulates."

Tom Boellstorff, the anthropologist in Second Life, organized discussion groups in his virtual house called "Ethnographia". Somewhat similarly, the efforts I invested in a dialogical construction of my doctoral thesis proved fruitful. They were a precious opportunity to confront my own vision of the movement with the "emic" point of view of actors on the ground, while allowing them to react in case of problems. This process did not ultimately lead to an exceptional number of exchanges. It did, however, generate a large number of consultations of my work: nearly 1,150 between 15 October 2019 and 15 March 2020. This figure is reassuring if one adheres to the well-known adage: "He who says nothing, consents".
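Consultation figures of this kind can be obtained from the Wikimedia pageviews REST API. The sketch below assumes its `per-article` endpoint, whose parameters are project, access type, agent type, article, granularity, start date and end date. The page title is a hypothetical placeholder. A thesis spread over several chapter pages would require one request per page. The code only builds the URL and computes the length of the observation period and a daily average.

```python
from datetime import date
from urllib.parse import quote

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project: str, title: str, start: date, end: date) -> str:
    """Daily pageviews by human users (agent "user") for one page, all access methods."""
    article = quote(title.replace(" ", "_"), safe="")  # titles must be percent-encoded
    return f"{API}/{project}/all-access/user/{article}/daily/{start:%Y%m%d}/{end:%Y%m%d}"


def daily_average(total_views: int, start: date, end: date) -> float:
    """Average views per day over the period from start to end."""
    return total_views / (end - start).days


if __name__ == "__main__":
    start, end = date(2019, 10, 15), date(2020, 3, 15)
    # Hypothetical page title, for illustration only.
    print(pageviews_url("fr.wikiversity", "Recherche:Exemple", start, end))
    print((end - start).days, "days,", round(daily_average(1150, start, end), 1), "views per day")
```

The period covers 152 days, so roughly 1,150 consultations correspond to an average of about 7.6 views per day.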

A final advantage of the Wikimedia digital environment is that all these exchanges, like all the content of its editorial projects, can be freely used in a study without prior authorization. The only sine qua non condition for enjoying this freedom is to publish one's own work under a CC BY-SA license as well. This respects the "ShareAlike" condition imposed by that license. This condition, also called "copyleft", is of primary importance, because it alone guarantees that free content remains free after reuse.

Unfortunately, most studies on Wikipedia are published in journals or books under copyright. Perhaps the most emblematic case is the book "Common Knowledge? An Ethnography of Wikipedia". It includes a significant number of quotations from Wikipedia, yet it was published under copyright in 2015 by Stanford University Press. This breach of the Creative Commons CC BY-SA license is all the more surprising given its author, Dariusz Jemielniak. He was elected to the board of the Wikimedia Foundation in the year the book was published, and was even re-elected in 2017. According to an experienced user of the Wikimedia Commons project, Jemielniak and Stanford University Press run the risk of a complaint one day from one or more users harmed by this abuse.

A call for contributions
This lesson is drawn from the encounter between two epistemic communities: the academic world and the Wikimedia movement. Provided that we overcome the oppositional stance of a certain intellectual elite, the socio-technical environment specific to Wikimedia projects can appear as a source of inspiration for epistemic communities in the academic world. Beyond mere inspiration, one could even see it as a great opportunity to dream of a different kind of science. Such a science would be non-corporatist, multi-paradigmatic, verifiable by everyone, open, freely accessible, transparent, dialogical, participatory and respectful of all.

Unfortunately, the academic community has so far seemed reluctant to get involved in producing new knowledge within the Wikimedia universe. It is true that the movement has mainly focused on developing the encyclopedia Wikipedia. It has consequently concentrated on collecting, synthesizing and disseminating pre-existing knowledge. More recently, the Wikidata project has attracted a great deal of attention, along with institutional and financial means, to pursue its ambition of developing the semantic web. Then come the eight other Wikimedia projects. For these, the 2020 Community Wishlist Survey, exceptionally limited to "projects outside Wikipedian content", can serve as an indicator of the importance the community of Wikimedia contributors attaches to them.

Out of 72 wishes submitted, the Wikisource project comes first with 28 proposals. It is followed by Wiktionary with 20 proposals, then by the Wikiversity project with 11. Unfortunately for them, none of the remaining five projects received more than five proposals. If we add Wikipedia and Wikidata to these eight projects, Wikiversity ranks only fifth in terms of interest from the Wikimedian community. Yet it is the only project truly dedicated to the production of new knowledge.

Yet things are changing on the English-speaking Wikiversity following the birth of the scientific journal "WikiJournal", which was recognized by the Open Publishing Awards in the "open publishing models" category in 2019. Maybe even a new Wikimedia project specifically dedicated to scientific publication will soon see the light of day within the movement. Isn't this a great opportunity for academics around the world to take their place in the Wikimedia movement, no longer as passive users, but as active contributors, for the benefit of a science that is just waiting to be emancipated?

Acknowledgements
First of all, thanks to the whole Wikimedia movement for giving us so much freely usable material for thinking and research in the humanities and social sciences.

Many thanks to Olivier Servais, Pierre-Joseph Laurent, Christophe Lazarro and Emmanuel Wathelet, members of my PhD steering committee.

Thanks also to www.DeepL.com/Translator (free version) for the first translation draft.

And all my gratitude to the users for the native English proofreading of my work.

Competing interests
None

Ethics statement
Explained in the paper.