Portal:Complex Systems Digital Campus/E-Department on Epistemology of Integrative Sciences

Formal epistemology, experimentation, machine learning

Introduction

Large cohorts of complex entities are increasingly available, especially in medicine, in the social sphere and in the environment. The sheer size of these databases makes it very difficult to reconstruct their multiscale dynamics through the multiple downward and upward influences. For such a task, the help of a formal epistemology and of computers is indispensable for complex systems scientists, generalizing the kind of open science performed by the high-energy physics community.

The task of understanding a phenomenon amounts to finding a reasonably precise and concise approximation of that phenomenon and its behavior, such that it can be grasped by the human brain. As it is, human intuition cannot handle the intrinsic properties of complex systems unaided. Ideally, optimal formal techniques provide us with candidate concepts and relations, which can then serve as a basis for human experimental work. When the optimal forms found by the theory do not match the optimal concepts for the human brain, the reason for this discrepancy will itself be the subject of further investigation.

Understanding complex systems thus requires defining and implementing a specific formal and applied epistemology. New methods and tools have to be developed to assist experimental design and interpretation for:
• Identifying relevant entities at a given time and space scale.
• Characterizing interactions between entities.
• Assessing and formalizing the system behavior.

The strategy from experimental design to post hoc data analysis should reconcile the hypothesis-driven and data-driven approaches by:
• Defining protocols to produce data adequate for the reconstruction of multiscale dynamics.
• Bootstrapping through the simultaneous building of a theoretical framework for further prediction and experimental falsification.
• A functional approach at different levels, leading to the construction of adequate formalisms at these levels.

There is no theoretical guarantee that one formal level will be deducible in any way from another, but this does not matter: phenomenological reconstruction steps are preferable at each relevant level for the comprehension of the system.

Collecting observations is a necessary part of the methodology. However, there comes a point at which it is no longer relevant to go on collecting observations without knowing whether they are really required for understanding the system behavior. Phenomenological reconstruction leads to data parameterisation, and the obtained measurements should allow further detection and tracking of transient and recurrent patterns. These features themselves only make sense if they are integrated into a model aiming to validate hypotheses. We expect here to find a model consistent with the observations. The mere fact of building the model necessitates the formalization of hypotheses on the system behavior and underlying processes. Part of the understanding comes from there. More comes from the possibility of validating the model's predictions through experimentation. This last point is depicted on the right-hand side of the graph below. A minimal computational sketch of this reconstruction-and-validation loop is also given below.

[Graph: Formal & Applied Epistemology]
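The following sketch is only an illustration of the loop described above, on synthetic data: observations are parameterised, recurrent patterns are proposed by clustering, a simple model is fitted, and its predictions are confronted with held-out observations. The signal, window length, number of clusters and model are all hypothetical choices, not prescribed by the epistemology itself.

```python
# Minimal sketch of the reconstruction-and-validation loop (hypothetical data and parameters).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# 1. Observations: a noisy periodic signal stands in for real measurements.
t = np.arange(2000)
signal = np.sin(2 * np.pi * t / 50) + 0.3 * rng.normal(size=t.size)

# 2. Parameterisation: sliding windows turn the raw series into feature vectors.
window = 25
X = np.lib.stride_tricks.sliding_window_view(signal, window).copy()

# 3. Pattern detection: clustering proposes candidate recurrent patterns.
patterns = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# 4. Hypothesis / model: predict the value that follows each window.
X_pairs, y = X[:-1], signal[window:]
X_train, y_train = X_pairs[:-500], y[:-500]
X_test, y_test = X_pairs[-500:], y[-500:]          # held-out segment
model = Ridge(alpha=1.0).fit(X_train, y_train)

# 5. Validation: prediction error on unseen data plays the role of falsification.
mse = np.mean((model.predict(X_test) - y_test) ** 2)
print("pattern cluster sizes:", np.bincount(patterns.labels_))
print("held-out prediction MSE:", round(float(mse), 4))
```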

The integration of computer science is an essential component of this epistemology. Computer science should provide:
• Exploratory tools for a data-driven approach. Unsupervised learning provides the human with candidate patterns and relations that unaided human intuition would not grasp. Active machine learning is concerned with determining which experiments are best suited to test a model, which is at the heart of the above epistemology.
• Tools for comparison between the model (hypothesis-driven) and the observations. Supervised learning corresponds to exploring the model parameter space for a good fit to the data. Auto-supervised learning is used when a temporal aspect allows the continuous correction of model predictions with the observed data corresponding to these predictions (a minimal sketch of such online correction is given below).

Computer science methods and tools are required in the following steps:
• Human-machine interaction: visualization of, and interaction with, data, ontologies and simulations.
• Building ontologies of relevant functional entities at different levels.
• Constructing hypotheses, formalizing relations between entities, designing models.
• Validating the models.

We expect certain fundamental properties from computer science methods and tools:
• Generic tools should be as independent as possible from a logical (interpretation) framework. In particular, because of the varying cultural habits of different disciplines and the specificities of each system, it is preferable to propose a collection of independent and adaptable tools rather than an integrated environment, which would not cover all cases anyway.
• Independence should also apply to the software itself, for the usage, evolution and adaptation of the tools to specific needs. This makes free/libre software a necessary but not sufficient condition.
• Tools need to be useful for a specialist as well as usable by a non-specialist, for example by providing domain-specific features that bring real added value to the specialist as extensions (modules, etc.) of the generic tools.
• Readiness for use: the preconditions for applying a tool should be minimal, and the tool should not require a large engineering effort before it can be used.

Main Challenges

1. Computer tools for exploration and formalization

The first challenge is to identify the computer as an exploration and formalization tool and to integrate it into an epistemology of complex systems. Some research domains currently correspond to this approach and need to be extended. Computational mechanics and its causal state reconstruction is one candidate technique that could possibly automate the phenomenological reconstruction, but there are challenges concerning its practical applicability: for example, finding a practical algorithm for the continuous case, or building significant statistical distributions from a limited number of samples (relative to the size of the search space). Statistical complexity can also be considered a useful exploratory filter to identify the promising zones and interacting entities in the system (a rough sketch of causal-state reconstruction and statistical complexity is given below). Another research domain that could be integrated into the epistemology is the quantification of the generalization capabilities of learning systems (e.g. Vapnik et al.). Automated selection of the most promising hypotheses and/or data instances is the topic of active learning. Its application is particularly straightforward for exploring the behavior of dynamical computer models, but more challenging for a multiscale complex system (see the active-learning sketch below).
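As a first illustration, here is a minimal sketch of the "auto-supervised" setting described above (the terminology is this section's; the data, window length and learner are hypothetical): at each time step the model predicts the next observation from the recent past, the real observation then arrives, and the prediction error is immediately used to correct the model online.

```python
# Online correction of model predictions with incoming observations (illustrative sketch).
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)
t = np.arange(5000)
series = np.sin(2 * np.pi * t / 100) + 0.1 * rng.normal(size=t.size)   # stand-in observations

window = 10
model = SGDRegressor(learning_rate="constant", eta0=0.01)
errors = []
for i in range(window, len(series)):
    past = series[i - window:i].reshape(1, -1)       # what the model sees before time i
    if i > window:                                   # the model needs one update before predicting
        errors.append(abs(float(model.predict(past)[0]) - series[i]))
    model.partial_fit(past, series[i:i + 1])         # immediate correction with the new observation

print("mean absolute error, first 500 vs last 500 steps:",
      round(float(np.mean(errors[:500])), 3),
      round(float(np.mean(errors[-500:])), 3))
```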
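The next sketch is a rough, discrete-alphabet approximation of causal-state reconstruction and of the statistical complexity mentioned above: histories whose empirical future distributions are close are merged into one candidate causal state, and the statistical complexity is the entropy of the state occupation probabilities. The Markov-chain data, the history/future length and the merging tolerance are all illustrative assumptions; this is not a full computational mechanics algorithm.

```python
# Crude causal-state reconstruction and statistical complexity on a binary series (sketch only).
import itertools
from collections import Counter, defaultdict
import numpy as np

rng = np.random.default_rng(2)

# Stand-in data: a two-state Markov chain over {0, 1}; its true causal states
# are determined by the last symbol alone.
prev, symbols = "0", []
for _ in range(20000):
    p_one = 0.2 if prev == "1" else 0.7
    prev = "1" if rng.random() < p_one else "0"
    symbols.append(prev)
series = "".join(symbols)

L = 2                                    # history/future length: a modelling choice
futures = ["".join(f) for f in itertools.product("01", repeat=L)]
counts = defaultdict(Counter)
for i in range(L, len(series) - L):
    counts[series[i - L:i]][series[i:i + L]] += 1

# Empirical distribution over futures, conditioned on each observed history.
dists = {h: np.array([c[f] for f in futures], float) / sum(c.values())
         for h, c in counts.items()}

# Greedy merge: histories whose future distributions are close (total variation
# below a tolerance) are treated as the same candidate causal state.
tol, states, assignment = 0.1, [], {}
for h, p in dists.items():
    for s, q in enumerate(states):
        if 0.5 * np.abs(p - q).sum() < tol:
            assignment[h] = s
            break
    else:
        assignment[h] = len(states)
        states.append(p)

# Statistical complexity: Shannon entropy of the causal-state occupation probabilities.
occupancy = Counter(assignment[series[i - L:i]] for i in range(L, len(series) - L))
p_state = np.array(list(occupancy.values()), float) / sum(occupancy.values())
complexity = float(-(p_state * np.log2(p_state)).sum())
print(len(states), "candidate causal states; statistical complexity ~", round(complexity, 3), "bits")
```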
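Finally, a minimal active-learning sketch, using query-by-committee as one concrete selection criterion (the "experiment" here is a hypothetical function with a sharp regime change, standing in for a costly simulation or field experiment): a small committee of regressors is trained on the experiments run so far, and the next experiment is the candidate input on which the committee disagrees most.

```python
# Active learning by query-by-committee on a stand-in experiment (illustrative sketch).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

def run_experiment(x):
    """Hypothetical stand-in for a costly experiment or model run, with a sharp
    regime change around x0 = 0.5 (think of a collapse threshold)."""
    return np.tanh(10 * (x[..., 0] - 0.5)) + 0.1 * x[..., 1]

candidates = rng.uniform(0, 1, size=(2000, 2))       # experiments we could run
X = rng.uniform(0, 1, size=(10, 2))                  # a few initial experiments
y = run_experiment(X)

for _ in range(30):
    # Train a small committee and measure its disagreement on each candidate.
    committee = [RandomForestRegressor(n_estimators=20, random_state=s).fit(X, y)
                 for s in range(5)]
    preds = np.stack([m.predict(candidates) for m in committee])
    disagreement = preds.std(axis=0)
    nxt = candidates[disagreement.argmax()]           # most informative candidate
    X = np.vstack([X, nxt])
    y = np.append(y, run_experiment(nxt))

print("x0 coordinates of the queried experiments:", np.round(np.sort(X[10:, 0]), 2))
```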
The problem may be, for instance, to determine response surfaces leading to a major change of behavior (the collapse of an ecosystem, for instance). When the system is high-dimensional, the search space is huge and finding the most informative experiments becomes crucial. Some analysis techniques are inherently multiscale (e.g. fractal/multifractal formalisms) and would need to be integrated as well. Dynamical regimes are an essential part of complex systems, where sustained non-stationary and/or transient phenomena maintain the state out of static equilibrium. Some of the existing mathematical and algorithmic tools should be adapted to this dynamic setting, and new ones may have to be created specifically. Research is also needed on how to integrate these dynamical aspects directly into the experimental and formal parts of the above epistemology.

2. Computer-assisted human interactions

The computer has become a necessary component of the scientific epistemology, as an extension of the human experimentalist, who remains at the center of the loop. Three kinds of interactions involving humans and machines might be considered:
• Machine to human: This corresponds (in particular) to visualization needs. The human sensory system (sight, hearing, etc.) is exceedingly powerful for some tasks, such as detecting patterns in an image, but quite poor for tasks like visualizing relations in high-dimensional spaces and graphs. Research is needed to provide the human with an adequate representation of a complex system, in a form suitable for the human sensory system (a small dimensionality-reduction sketch is given after this list).
• Human to machine: The feedback and control that an unaided human can exert on a complex system are similarly limited. For example, when the human is used as the discriminating element for repeated decision-making (e.g. attributing or selecting fitness criteria for model parameters), the bottleneck is the human's limited capacity to handle a large number of such decisions on a time scale corresponding to the optimal execution of the algorithm. As a parallel to the visualization problem, human interaction capabilities over a high-dimensional simulation are relatively poor, especially with conventional devices such as a mouse and keyboard. Finding controls (software or hardware) adapted to human morphology and limitations is another part of this human/complex-system interaction challenge.
• Human to human: The computer should help human communication. For instance, knowledge from domain experts is often lost when non-specialist computer scientists formalize and create the experiments that the experts need. Ideally, the computer should be a tool that enhances, not hampers, cross-disciplinary communication, as well as being directly usable by the experts themselves for designing experiments and models and for running simulations. But the use of the computer as a facilitator of human-to-human relations is not limited to interdisciplinary aspects. The computer should become an integral part of the collaborative process necessary to handle complex systems.
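As a small illustration of the machine-to-human direction mentioned in the list above, the following sketch projects a high-dimensional set of system states down to two dimensions so that a human can look for structure that is invisible in the raw coordinates. The data are synthetic and PCA is used only because it is simple and standard; any other embedding or interactive visualization front end could take its place.

```python
# Projecting high-dimensional system states to 2-D for human inspection (illustrative sketch).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)

# Stand-in data: 500 "system states" in 50 dimensions, drawn from two regimes.
regime_a = rng.normal(0.0, 1.0, size=(250, 50))
regime_b = rng.normal(0.0, 1.0, size=(250, 50)) + np.linspace(0, 3, 50)
states = np.vstack([regime_a, regime_b])

projection = PCA(n_components=2).fit_transform(states)

# The 2-D coordinates can now be handed to any plotting or interactive tool;
# printing the per-regime means is enough to show that the projection
# separates the two regimes.
print("regime A mean:", np.round(projection[:250].mean(axis=0), 2))
print("regime B mean:", np.round(projection[250:].mean(axis=0), 2))
```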