Web tracking by academic publishers

Providers of electronic texts often gather data on their readers, as is the case with e-readers. Among academic publishers, web tracking is part of a general trend towards data-driven management of research and higher education, in which data are collected and sold by private companies.

New sources of revenue
In the 2010s, major commercial publishers began performing data analytics in addition to providing content. This is the case in particular for Elsevier, Pearson and Cengage. In 2018, the three leading research data analytics vendors were Clarivate, Digital Science (a division of the Holtzbrinck Publishing Group), and Elsevier. These companies sell research intelligence data tools to universities, research funders, and governments.

For example, Elsevier sells the information system Pure to universities, claiming that it provides a "comprehensive overview of all their research activities" by aggregating "information from all their data sources". The 2020 partnership of Elsevier with Dutch research institutions bundles a Publish and Read contract with research intelligence services. In 2018, Elsevier won a contract to collect data for the European Commission's Open Science Monitor. The Irish Science Foundation bases its strategy on data it purchases from Elsevier.

Improved services
Tracking readers allows publishers to improve services, for example by providing targeted reading suggestions, or by adapting search results to personal profiles.

Integrating the research workflow
The acquisition of research workflow tools by big publishers has been attributed to a strategy of research workflow embedment, in other words vertical integration of academic infrastructure. For example, Elsevier acquired the reference manager Mendeley in 2013 and the preprint server SSRN in 2016.

It has been theorized that this integration leads to a data-driven organization of research. The focus is no longer the scholarly article, but the individual researcher, whose online behaviour generates valuable data.

Protecting copyright
The pirate website Sci-Hub threatens the subscription revenues of publishers. Sci-Hub downloads articles from publishers' websites using genuine university credentials. Some publishers have claimed that this is a threat to universities' network security, and have founded the Scholarly Networks Security Initiative to combat it. The initiative advertised tools for tracking users before declaring in 2021 that it does not advocate the use of spyware.

Standard methods
Academic publishers use standard methods of web tracking. They gather information on users who connect to their websites, such as login data, browser fingerprints or IP addresses. Additional information can come from third-party cookies that publishers' websites place on users' computers. A 2019 study of 15 publisher websites found an average of "18 third-party assets being loaded on their article pages".
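
As an illustration of what "third-party assets" means here, the following minimal Python sketch lists the assets on a page that are loaded from a host other than the page's own. It is not the method used in the 2019 study. The class and function names, the example URLs, and the sample HTML are all hypothetical, and comparing bare hostnames is a simplifying assumption: a careful audit would compare registrable domains and would also observe requests made by scripts at run time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collects URLs referenced by script, img, iframe and link tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("script", "img", "iframe"):
            url = attr_map.get("src")
        elif tag == "link":
            url = attr_map.get("href")
        else:
            url = None
        if url:
            # Resolve relative references against the page address.
            self.assets.append(urljoin(self.page_url, url))


def third_party_assets(page_url, html):
    """Return asset URLs whose host differs from the page's host.

    Assumption: "third party" is approximated by a hostname mismatch.
    """
    page_host = urlparse(page_url).hostname
    collector = AssetCollector(page_url)
    collector.feed(html)
    return [
        asset
        for asset in collector.assets
        if urlparse(asset).hostname not in (None, page_host)
    ]


# Hypothetical article page, used only for illustration.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="https://analytics.example.net/tag.js"></script>
</head><body>
<img src="https://ads.example.org/pixel.gif">
<img src="figures/fig1.png">
</body></html>
"""

found = third_party_assets("https://publisher.example.com/article/1", SAMPLE_HTML)
print(len(found), found)
```

In this sample, the stylesheet and the figure are served by the page's own host and are ignored, while the script and the tracking pixel count as third-party assets.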

Data collection is facilitated by tools that are ostensibly designed to help readers access the literature, such as GetFTR, an academic implementation of single sign-on.

Data on individual users can be aggregated using "audience tools", i.e. commercial software from companies such as Adobe, Oracle or Neustar.

Specialized tools
Systems for managing academic libraries, such as Alma from Ex Libris or the systems provided by OCLC, can perform data collection. Libraries can become dependent on such systems.

Objections
Web tracking by academic publishers has been criticized for:
 * Infringing on academics' privacy.
 * Threatening academic freedom.
 * Transforming science into a data analytics business, while driving the development of new monopolies.
 * Informing governments about dissident intellectuals.

Protests and statements

 * In 2021, the League of European Research Universities issued a data statement with the aim of tackling the "increasing dependence on dominant platform companies".
 * A 2021 petition demanded that publishers "stop tracking science", and asked research institutions to sign the DORA declaration.
 * A 2021 statement by the Invest in Open Infrastructure organization, also supported by other organizations, called for more oversight and regulation of Clarivate after its acquisition of ProQuest, with the aim of reining in "surveillance capitalism" in scientific research.
 * In 2021, the American Library Association issued a Resolution on the Misuse of Behavioral Data Surveillance in Libraries.