User:Douglas R. White/Social-circles network model

The generative model of feedback networks, studied by White, Kejžar, Tsallis, Farmer, and White, or social-circles network model, defines a class of random graphs generated by simple processes that are common to edge formation and feedback loops in social circles. This class is distinct from the small-world network and the scale-free network models in network analysis but also captures many of the characteristics of real-world social networks.

Highlights

 * A feedback network is one where active nodes or agents send tokens or communications through their network to help locate potential partners for new ties.
 * When a previously unconnected partner is located given constraints of distance a new tie is formed, creating a feedback loop.
 * Failing to locate a partner within the network, the active node engaged in partner selection recruits a new partner from outside the existing network and forms a tie.
 * Like many real-world networks, feedback networks modeled by this process evolve large networks with cohesive ties (social circles). In this way, small worlds with cohesive subgroups and relatively short distances may be generated, scale-free networks may be generated, or a variety of other network topologies may be generated.
 * The generative feedback network model simulates these processes and their network outcomes as new nodes and edges are added to an initial network.
 * Only three parameters are used to govern network formation in this probabilistic generative model. These govern levels of node activity, distance decay in how soon network traversal fails to find a partner, and the extent to which local hubs are used in network traversal.
 * The model provides an explanation for how hubs are formed in networks, how well-traveled edges are formed, how cycles and cohesion are formed, and how other features of real-world networks may result from different combinations of three basic network parameters.
 * Empirical network data can be fitted to the parameters of the general model.

Rationale
A social-circles network models by a graph-generating process the formation of cohesion and dispersion in connected networks as a function of acitvity and feedback. The process begins with a single node. The next events in the evolution of the network are that: (1) a node is chosen randomly (proportional to the number of current links of each node raised to power alpha) to emit a feedback token, (2) the token attempts to travel by a non-circular path through the current network by choosing its next neighbor randomly, proportional to the number of edges of the neighbor minus the node already traversed, raised to power gamma; (3) the token travels through the network until it reaches a random distance d, proportional to 1 over d raised to a power beta, and  failing that, the source node that originated the token gains an edge to a node that is added to the network (dispersion of edges), but if the token succeeds in reaching a target at distance d, a new 'feedback' edge is added between the source and the target.

The three parameters of the model are alpha, beta, gamma. The notion of 'feedback' in this model is very general and may involve intentionality on the part of source nodes as agents looking for a partner or simply nodes in a network emitting signals randomly that contain information that may enable the source and target to lock into a feedback relation, represented by the formation of a new edge. Whether searches or signals have the capacity to locate a target in an existing network is affected by the current size and topology of the network, the distance decay (beta) in traversal, and the intelligence (gamma) exhibited in the method of search. Failure to locate targets results in a generic substitution (new node and an edge connecting to it), like looking in the phone book for a doctor rather that asking for a referral.

The generative model
At each step three stochastic processes are run. A node i is selected, a distance d (an imaginary number of steps to get to a searched node j) is chosen, and the search path of distance d is generated.
 * A node i is selected with a probability proportional to its degree
 * $$p_\alpha(i)=\frac{[deg(i)]^\alpha}{\sum_{m=1}^N [deg(m)]^\alpha}$$


 * A distance d is chosen with the probability
 * $$p_\beta(d)=\frac{d^{-\beta}}{\sum_{m=1}^\infty m^{-\beta}}$$
 * where d represents the number of steps necessary to create a new cycle. If the number of steps cannot be concluded, a new node with a new edge is addded.


 * A search path of length d is generated. At a given moment a node r is a current node on the path. The next node is selected among the neighbors of r that were not yet visited. The probability of selecting a certain neighbor (say l) is proportional to their 'unused degree'. The 'unused degree', u(l), is considered a degree of the node from which all the already visited neighbors are subtracted. The probability is given by
 * $$p_\gamma(l)=\frac {[1+u(l)^{\gamma}]} {\sum_{m=1}^N[1+u(m)^{\gamma}]}$$

Results
Social-circles network simulation with varying model parameters results in a variety of networks whose topology in terms of network density, clustering, cohesion, and degree distribution varies in ways that resemble the variety of real-world networks. Study of the degree distributions and edge-traversal frequencies shows fit to the statistical distributions of Tsallis entropy, familiar to nonextensive physics. This provides a baseline test of the model to real-world networks, one that contrasts with that of the scale-free network, for which one baseline test is whether the degree distribution follows a power law.

The Tsallis entropy distribution asymptotes to a power law for large degree but bends toward the exponential distribution in the lower part of the degree distribution.

Degree Distributions
Parameters $$\alpha, \beta, \gamma$$ in the social circles model are shown to generate networks with degree distributions that fit a general family of curves that combine power-law with exponential tendencies. The probability density function for degree $$k$$ in this family, with parameters $$\delta, \kappa, q $$, and $$p_0$$, is given by:


 * $$p(k)=p_0k^\delta e_q^{-k/\kappa}$$

where the q-exponential function $$e^{-x}_q$$ with parameter $$q$$ is defined as


 * $$e^{-x}_q = (1+(1-q)x)^{1/(1-q)}  	(e_1^x = e^x) $$

The q-exponential function arises naturally as the solution of the equation $$dk/dt=k^q$$, which is $$k=e^{-x}_q = [1+(1-q) a t]^{1/(1-q)}$$. For the case where $$q=1$$, $$k=e^{at}$$. The function is related to the stationary solution of a nonlinear Fokker-Planck equation known as the porous medium equation. It describes correlated diffusion known as Darcy's law, the scientific basis of fluid permeability used in the earth sciences. This is a constitutive equation that describes the flow of a fluid through a porous medium as formulated by Henry Darcy to describe results of experiments on the flow of water through beds of sand (see references [2,3,4]). In the density function, $$p_0$$ coincides with $$p(0)$$ if and only if $$\delta=0$$; $$\kappa$$ is a characteristic degree. In a generalization of the water-flow analogy, the density function for degree would reflect a q-exponential component for which q=1 is omnidirectional flow, and as q increases there is initially a steep low-dimensional gradient that changes to flat and high dimensional as q grows to infinity. The other component corresponding to $$p_0k^\delta$$ would reflect a characteristic magnitude of flow also affected by q. A loose analogy is to traders whose activities etch the routes of their commercial networks, amplified by gradient intensities and the heterogeneity of terrain.

Retrofitting
The parameters $$\delta, \kappa,  q$$, and $$p_0$$ of the fitted models, which may also be applied to the degree distributions of empirical networks, were found to closely approximate 1-to-1 functions of the generating parameters, as given by the approximations
 * $$\delta = -1.5 \alpha $$
 * $$q = 1+.64\alpha + 6\alpha\gamma(\beta-1)^2$$
 * $$\kappa = 235\gamma(1-\alpha)(\beta-1)$$
 * $$p_0 = 2 + \alpha/4 - \beta  -1.3\gamma$$

These allow approximate solutions for the generating parameters that fit observed empirical degree of networks:
 * $$\alpha = - 2\delta/3$$
 * $$\beta = 3/2 + p_o/2 + q/9$$
 * $$\gamma = 5|(q/k)-.5|^{1/2} + 5p0+ 11q/k$$

The last of these approximations (for $$\gamma$$) is the least accurate (more precise estimates may yet be found). The authors of the paper have provided a more exact optimal matching between the four parameters fitted from empirical degree distributions of actual networks (assuming they follow the $$q$$-exponential) in spreadsheet form. This makes the social circles model a prime candidate for accounting for the degree distributions of empirical networks in terms of a generative network model with the three parameters $$\alpha, \beta, \gamma$$: activity bias of agents, distance decay, and navigability of the network. Maximum likelihood estimates (MLE) of the four $$\delta, \kappa,  q$$, and $$p_0$$ parameters for empirical networks and goodness-of-fit tests can establish the statistical relationship between model and data. Shalizi (2007) provides an MLE procedure for the Pareto II or q-exponential parameter fitting.

Replication and extensions
Kejžar in 2007 replicated these results by simulating feedback networks up to N=5000 nodes, extended the findings to:
 * 1) Degree distributions for directed edges (indegree, outdetree). These were similar to those of the undirected model, especially for $$\gamma=0$$ but differed toward higher q when $$\gamma \rightarrow 1$$
 * 2) Densification. Growth of edges relative to nodes is initially linear, but soon changees to approximate a slightly superlinear power-law (for $$\alpha=0, \beta=1.6, \gamma=0.5, e\approx 0.08+n^{1.08}$$, for example, where e are edge numbers and n numbers of node). Even with 1 billion nodes, however, this is scarcely more than an average degree of 5. Up to 6000 nodes, the average degree is less than two. For large degrees, however, the degree distribution approaches a power law with an exponent $$1/(q-1) -\delta$$. Densification is a natural consequence.
 * 3) Average shortest paths. If l is the average shortest path and n the number of nodes, l shrinks with n if $$\gamma=1$$ but rises if $$\gamma=0$$ and a weighted sum of $$\alpha, \gamma$$ determines the crossover (low $$\alpha$$, high $$\gamma$$ has the most rapid shrinkage). In every case there tends to be a "shrinking limit" as $$n \to \infty$$
 * 4) Variance of shortest paths. Most of the shortest paths between nodes approach the "shrinking limit."

In short, the generative feedback model has realistic and nicely continuous characteristics, and these simulations show no evidence that the density function for degree k is not the correct function.

Links

 * Simulated data in a Pajek Network Project *.paj file for 1250 nodes paramaters 0-2-0
 * Walk-through of a network simulation, edge by edge with the same parameters
 * INSNA International Network for Social Network Analysis -- professional association of network analysts