JMLR: Workshop and Conference Proceedings 20 (2011) 1–16 Asian Conference on Machine Learning Learning Attribute-weighted Voter Model over Social Networks Yuki Yamagishi b08107@u-shizuoka-ken.ac.jp School of Management and Information University of Shizuoka Kazumi Saito k-saito@u-shizuoka-ken.ac.jp School of Management and Information University of Shizuoka Kouzou Ohara ohara@it.aoyama.ac.jp Department of Integrated Information Technology Aoyama Gakuin University Masahiro Kimura kimura@rins.ryukoku.ac.jp Department of Electronics and Informatics Ryukoku University Hiroshi Motoda motoda@ar.sanken.osaka-u.ac.jp Institute of Scienti?c and Industrial Research Osaka University Editor: Chun-Nan Hsu and Wee Sun Lee Abstract We propose an opinion formation model, an extension of the voter model that incorporates the strength of each node, which is modeled as a function of the node attributes. Then, we address the problem of estimating parameter values for these attributes that appear in the function from the observed opinion formation data and solve this by maximizing the likelihood using an iterative parameter value updating algorithm, which is e?cient and is guaranteed to converge. We show that the proposed algorithm can correctly learn the dependency in our experiments on four real world networks for which we used the assumed attribute dependency. We further show that the in?uence degree of each node based on the extended voter model is substantially di?erent from that obtained assuming a uniform strength (a naive model for which the in?uence degree is known to be proportional to the node degree), and is more sensitive to the node strength than the node degree even for a moderate value of the node strength. Keywords: voter model, in?uence degree, attribute dependency. 1. Introduction The growth of Internet has enabled to form various kinds of large-scale social networks, and a variety of information, e.g. news, ideas, hot topics, rumors, innovations, etc. spreads in the form of "word-of-mouth" communications. It is noticeable to observe how much they a?ect our daily life style. The spread of information has been studied by many researchers (Newman et al., 2002; Newman, 2003; Gruhl et al., 2004; Domingos, 2005; Leskovec et al., 2006; Kimura et al., 2009, 2010a). The information di?usion models widely used are the independent cascade (IC) (Goldenberg et al., 2001; Kempe et al., 2003; Kimura et al., 2009) and the linear threshold (LT) c 2011 Y. Yamagishi, K. Saito, K. Ohara, M. Kimura & H. Motoda. Yamagishi Saito Ohara Kimura Motoda (Watts, 2002; Watts and Dodds, 2007) models. Both have been used to solve such problems as the in?uence maximization problem (Kempe et al., 2003; Kimura et al., 2007) and the contamina- tion minimization problem (Kimura et al., 2009; Tong et al., 2010). These two models focus on di?erent information di?usion aspects. The IC model is sender-centered and each active node in- dependently in?uences its inactive neighbors with given di?usion probabilities. The LT model is receiver-centered and a node is in?uenced by its active neighbors if their total weight exceeds the threshold for the node. Thus, it can be said that the IC model emphasizes "information push" and the LT model "information pull". In this paper, we address a di?erent kind of information di?usion, which is "opinion formation", i.e., spread of opinions. A well studied model for opinion dynamics is the voter model which has the same key property with the LT model that a node decision is in?uenced by its neighbor's decision, i.e., a person changes his or her opinion by the opinions of his or her neighbors. The basic voter model is de?ned on an undirected network and allows to have only two opinions. Each node adopts the opinion of a randomly chosen neighbor at each subsequent discrete time-step. There has been a variety of work on the voter model. Dynamical properties of the basic model, including how the degree distribution and the network size a?ect the mean time to reach consensus, have been exten- sively studied (Liggett, 1999; Sood and Redner, 2005) from mathematical point of view. Several variants of the voter model are also investigated (Castellano et al., 2009; Yang et al., 2009) and non equilibrium phase transition is analyzed from physics point of view. Yet another line of work ex- tends the voter model and combines it with a network evolution model (Holme and Newman, 2006; Crandall et al., 2008). The work which is most in?uential to this work is by Even-Dar and Shapria (2007) who in- vestigated the in?uence maximization problem at a given target time. They showed that the most natural heuristic solution, which picks the nodes in the network with the highest degree, is indeed the optimal solution. We extended the basic voter model to be able to handle multiple opinions and asynchronous time delay and, in doing so, introduced the value for each opinion to re?ect the fact that people are a?ected by the importance of the opinion, e.g., quality, brand, authority, etc. (Kimura et al., 2010b). We called this model "Value-weighted Voter Model with Multiple Opin- ions (VwVM)", and addressed the problem of predicting the expected opinion share at a target time from the observed opinion formation data. We further addressed the problem of detecting the change of the opinion values from the observed data (Saito et al., 2011) In this work, we introduce another factor which we call strength of each node. This is di?erent from the value of opinion that was introduced in Kimura et al. (2010b). It is based on the observation that a person is in?uenced not only by what each opinion is about but also by who holds/says that opinion. Some persons are more in?uential than others, and we consider this degree of in?uence by the strength value that is associated with each node. Here, we must note that the in?uence meant here can better be named as direct in?uence and is di?erent from what has been used in previous studies which can better be named as indirect in?uence. In information di?usion the in?uence degree of a node is de?ned as the expected number of active nodes at the end of the random process of the information di?usion that originated from the node (Kempe et al., 2003; Kimura et al., 2007). In particular, in opinion formation, it is de?ned as the expected number of nodes that hold the same opinion with the starting node at the end of the random process of the opinion formation (Even-Dar and Shapria, 2007). We distinguish the strength of a node, i.e., direct in?uence, from the in?uence degree of the node, i.e., indirect in?uence. The strength we de?ne here 2 Learning Attribute-weighted Voter Model over Social Networks is assumed to be intrinsic to each node and is determined independently of the result of information di?usion or opinion formation. The problem we want to solve in this paper is to learn this strength from the observed opinion formation data and investigate how it a?ects the in?uence degree. In principle it is possible to learn the strength of all the nodes in the network from the observed data, given the generative model of opinion formation, by maximizing the likelihood of the observed data being generated. However, the number of nodes is huge and we need prohibitively large amount of observation data to avoid the over?tting problem. We rather assume and think it more natural that the strength is determined by the attributes of each node and its attribute dependency is more or less uniform across the nodes, and try to learn the parameters that de?ne this attribute dependency from the data. We call this model " Attribute-weighted Voter Model with Multiple Opinions (AwVM)" in contrast to the model we previously de?ned, "Value-weighted Voter Model with Multiple Opinions (VwVM)". We derived a very e?cient parameter updating algorithm to maximize the likelihood function that is guaranteed to converge, and tested the performance of the algorithm on four real world networks assuming the attribute dependency of the parameters to be of a particular form. The algorithm can correctly estimate the strength of each node by way of node attributes through a learned function. We further show that the in?uence degree of each node based on the AwVM is substantially di?erent from a naive AwVM that assumes a uniform strength throughout the nodes for which the in?uence degree is known to be proportional to the node degree, and there appears to be no simple heuristic to approximate the in?uence degree with good accuracy unless the network is dense. The sensitivity analysis indicates that as the degree of non-uniformity of the strength becomes greater, the in?uence degree becomes progressively more sensitive to the node strength than the node degree, and even for a moderate value of the non-uniformity of node strength, it is more a?ected by the node strength than by the node degree. 2. Opinion Formation Models Let G = (V, E) be an undirected (bidirectional) network with self-loops, where V and E (? V * V) are the sets of all the nodes and links in the network, respectively. For a node v ∈ V, let Γ(v) denote the set of neighbors of v in G, that is, Γ(v) = {u ∈ V; (u, v) ∈ E}. Note that v ∈ Γ(v). 2.1. Basic Voter Model According to the work of Even-Dar and Shapria (2007), we recall the de?nition of the basic voter model with two opinions on networks G. In the voter model, each node of G is endowed with two states; opinions 1 and 2. The opinions are initially assigned to all the nodes in G, and the evolution process unfolds in discrete time-steps t = 1, 2, 3, · · · as follows: At each time-step t, each node v picks a random neighbor u and adopts the opinion that u holds at time-step t ? 1. More formally, let ft : V → {1, 2} denote the opinion distribution at time-step t, where ft(v) stands for the opinion of node v at time-step t. Then, f0 : V → {1, 2} is the initial opinion distribu- tion, and ft : V → {1, 2} is inductively de?ned as follows: For any v ∈ V, ft(v) = 1, with probability |n1(t, v)|/|Γ(v)|, ft(v) = 2, with probability |n2(t, v)|/|Γ(v)|, where nk(t, v) = {u ∈ Γ(v); ft?1(u) = k} is the set of v's neighbors that hold opinion k at time-step t ? 1 for k = 1, 2. 3 Yamagishi Saito Ohara Kimura Motoda Again, according to the work of Even-Dar and Shapria (2007), we de?ne the expected in?uence degree of each node v, denoted by σb(v). Consider an initial opinion distribution that f0(v) = 1 and f0(w) = 2 if w v. Namely, only v's opinion is 1 and that of all the other nodes is 2. Then σb(v) is de?ned as the expected number of nodes that hold the opinion 1 (node v's initial opinion) after enough time has passed. More formally, for a given network G = (V, E), we identify each node with a unique integer from 1 to |V|. Then we can de?ne the adjacency matrix A ∈ {0, 1}|V|*|V| by setting au,v = 1 if (u, v) ∈ E; otherwise au,v = 0. We also de?ne the diagonal matrix D, each element of which, dv,v = d(v), is the degree of node v, i.e., d(v) = |Γ(v)| = u∈V au,v. Here note that d(v) ≥ 1 due to self-loops. Let hv ∈ {0, 1}|V| be a vector whose v-th element is 1 and all the other elements are 0, and h ∈ {1}|V| be a vector whose elements are all 1. Now, we can calculate the expected number of nodes that hold opinion 1 at time t = 1 starting from a node v by hT v AD?1 h. Thus, the vector b of the expected in?uence degree, each element of which is bv = σb(v), is de?ned as a limiting solution of the following iterative process with the initial setting b(0) = h. b(t) = AD?1 b(t?1) . (1) Especially, in case of an undirected network, the analytical solution can be derived, i.e., bv = σb(v) = |V||Γ(v)|/ u∈V |Γ(u)|. The result clearly states that the in?uence degree bv of a node v is proportional to its node degree |Γ(v)|. 2.2. Attribute-weighted Voter Model We extend the basic voter model by allowing to hold K opinions (K ≥ 2). Further as explained in Section 1, we introduce the strength for each node v, denoted by sv 1. Then, we can de?ne the following probability of opinion adoption for the new voter model with the node strength. P( ft(v) = k) = u∈nk(t,v) su u∈Γ(v) su , (k = 1,K), (2) where nk(t, v) = {u ∈ Γ(v); ft?1(u) = k} is the set of v's neighbors that hold opinion k at time-step t ? 1 for k = 1, 2,K. Similarly to the basic voter model, we can de?ne the expected in?uence degree σa(v) for the new voter model, which is the expected number of nodes that hold the opinion k after enough time has passed when only the node v has the opinion k and all the other nodes have di?erent opinions at t = 0. To this end, we de?ne the diagonal matrix W, each diagonal element of which, wv,v = w(v), represents the total strength of v's neighbors, w(v) = u∈Γ(v) su. According to the arguments of the basic voter model, the vector a of the expected in?uence degree, each element of which is av = σa(v), is de?ned as a limiting solution of the following iterative process with the initial setting a(0) = h. a(t) = SW?1 a(t?1) , (3) where S is the strength matrix which is obtained by replacing au,v of A with au,vsu. Note that unlike Eq. (1), no analytical solution is known for Eq. (3), and how the strength sv a?ects the in?uence degree av is not clear. Thus, we solve it numerically by iteratively calculating Eq. (3) until the di?erence a(t) ? a(t?1) becomes a reasonably small value. The convergence is guaranteed by the 1. One of the anonymous reviewers pointed that our formulation is similar to the policy gradient learning in the rein- forcement learning, which we were not aware of, i.e., Eq. (2) corresponds to a Boltzman distribution policy, Eq. (3) to the transition model of a Markov decision problem (MDP), and estimating the strength of a node to learning the value function of the MDP. 4 Learning Attribute-weighted Voter Model over Social Networks Perron-Frobenius theorem because the network is connected. Further note that Eq. (3) is de?ned independently of the number of opinions K. In general, we don't know the adequate value for the strength of each node apriori. As one possible approach, we consider estimating each of them from a set of observed opinion formation results. However, as each node has its own strength, the number of parameters to learn is so huge that we need prohibitively large amount of training data to learn them all individually. As described in Section 1, we note that it is more natural to think that the strength is determined by the attributes of each node (person), e.g., occupation, physical appearance, income, social status, etc., and its attribute dependency is more or less uniform across the nodes. We, thus, propose an Attribute- weighted Voter Model with Multiple Opinions (AwVM) that explicitly considers the dependency of node strength on its attributes. We assume that each node can have multiple attributes. Let xv, j be a value that node v takes for the j-th attribute, and J the total number of the attributes. We denote the J-dimensional vector of the attribute values for each node v by xv. Then we propose to model the strength sv of node v by the following formula 2: sv = s(xv, θ) = exp(θT xv), (4) where θT = (θ1,θJ) is the J-dimensional parameter vector for the attributes to determine the strength value of each node. So far, we assumed a discrete time step. However, the actual opinion formation takes place in an asynchronous way along the continuous time axis, and the time stamps of the observed data are not equally spaced. Thus, there is a need to extend the model to make the state changes asynchronous. In order to describe the asynchronous voter model, we need to extend the de?nition of nk(t, v), the set of v's neighbors that hold opinion k at t ? 1, to be the one with the latest opinion before time t, i.e., nk(t, v) = {u ∈ Γ(v); ?t(u) = k}, where ?t(u) is the latest opinion of u before time t.3 Then, the evolution process of the asynchronous voter model is de?ned as follows: 1. At time 0, each node v independently decides its update time t according to some probabil- ity distribution such as an exponential distribution with parameter 1.4 The successive update time is determined similarly at each update time t. 2. At update time t, the node v adopts a new opinion according to Eq. (2). 3. The process is repeated from the initial time t = 0 until the next update-time exceeds a given ?nal-time T. 3. Learning Problem and Method We consider the problem of estimating the parameters for the attributes from observed data DT in time-span [0, T], where DT consists of a sequence of (v, t, k) such that node v updated its opinion to opinion k at time t for 0 ≤ t ≤ T.5 By estimating parameters, we can identify the in?uential nodes, 2. This is a simple smooth function with respect to θ that guarantees sv > 0. 3. Note that the opinion update takes place at time t and we need the distribution before the time t. 4. This assumes that the average delay time is 1. 5. The data come in sequence each time the update takes place, but in the formulation we treat them as a set for easiness of the mathematical treatment. 5 Yamagishi Saito Ohara Kimura Motoda i.e., those with large in?uence degree, as well as the relevant attributes for determining the strength of nodes. We formulate our problem for estimating the parameter values of the AwVM from a given observed opinion formation data DT . Based on the evolution process of our model (see Eq. (2)), we can obtain the log likelihood function, L(DT ; θ) = log ? ? ? ? ? ? ? ? (v,t,k)∈DT P( ft(v) = k) ? ? ? ? ? ? ? ? = (v,t,k)∈DT ? ? ? ? ? ? ? ?log ? ? ? ? ? ? ? ? u∈nk(t,v) exp(θT xu) ? ? ? ? ? ? ? ? ? log ? ? ? ? ? ? ? ? u∈Γ(v) exp(θT xu) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? . (5) Thus our estimation problem is formulated as a maximization problem of the objective function L(DT ; θ) with respect to θ. We derive an EM like iterative algorithm for obtaining the maximum likelihood estimators. Now, let ? θ be the current estimates of θ. Then, by considering the posterior probabilities, qv,t,k,u(θ) = exp(θT xu) w∈nk(t,v) exp(θT xw) , (v ∈ V, 0 ≤ t ≤ T, k = 1,K, u ∈ nk(t, v)), we can transform our objective function as follows: L(DT ; θ) = Q(θ; ? θ) ? H(θ; ? θ), (6) where Q(θ; ? θ) is de?ned by Q(θ; ? θ) = (v,t,k)∈DT ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈nk(t,v) qv,t,k,u(? θ) θT xu ? ? ? ? ? ? ? ? ? log ? ? ? ? ? ? ? ? u∈Γ(v) exp(θT xu) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? , (7) and H(θ; ? θ) is de?ned by H(θ; ? θ) = (v,t,k)∈DT u∈nk(t,v) qv,t,k,u(? θ) log qv,t,k,u(θ). (8) It is self-evident that H(θ; ? θ) is maximized when θ = ? θ, which is easily proved by noting the constraint u qv,t,k,u(? θ) = 1. Then, it follows L(DT ; θ) ? L(DT ; ? θ) ≥ Q(θ; ? θ) ? Q(? θ; ? θ). Thus, L(DT ; θ) always increases by repeatedly maximizing Q(θ; ? θ) with respect to θ and updat- ing ? θ. When Q(θ; ? θ) reaches a ?xed point, i.e., Q is the maximum at θ = ? θ for ? θ, it holds that (?Q(θ; ? θ)/?θ)θ= ? θ = (?L(DT ; θ)/?θ)θ= ? θ = 0, i.e., when Q reaches a ?xed point, L(DT ; θ) also reaches a maximum, not necessarily a global maximum. In order to derive our maximization algorithm for Q(θ; ? θ), we further de?ne the following prob- ability that the node v adopts the opinion of node u at time t whatever opinion u has: rv,t,u(θ) = exp(θT xu) w∈Γ(v) exp(θT xw) . Then, we can obtain the gradient vector of Q(θ; ? θ) with respect to θ as follows: ?Q(θ; ? θ) ?θ = (v,t,k)∈DT ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈nk(t,v) qv,t,k,u(? θ) xu ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈Γ(v) rv,t,u(θ) xu ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? . 6 Learning Attribute-weighted Voter Model over Social Networks Similarly, we can obtain the Hessian matrix with respect to θ as follows: ?2Q(θ; ? θ) ?θ?θT = ? (v,t,k)∈DT ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈Γ(v) rv,t,u(θ) xuxT u ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈Γ(v) rv,t,u(θ) xu ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈Γ(v) rv,t,u(θ) xu ? ? ? ? ? ? ? ? T ? ? ? ? ? ? ? ? ? . Thus we can obtain the following modi?cation vector Δθ for updating θ: Δθ = ? ?2Q(θ; ? θ) ?θ?θT ?1 ?Q(θ; ? θ) ?θ . (9) Here note that the following quadratic form of the Hessian matrix is non-positive for an arbitrary J-dimensional non-zero vector z = (z1,zJ), zT ?2Q(θ; ? θ) ?θ?θT z = ? (v,t,k)∈DT ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈Γ(v) rv,t,u(θ) (xT u z)2 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? u∈Γ(v) rv,t,u(θ) xT u z ? ? ? ? ? ? ? ? 2? ? ? ? ? ? ? ? ? ≤ 0 The Hessian matrix of Q is non-positive de?nite, and thus, the optimal solution of Q is obtained by using the Newton method. We can regard our estimation method as a variant of the EM algo- rithm. Namely, calculating Q by Eq. (7) and updating θ by Eq. (9) correspond to Expectation and Maximization steps, respectively. Which one in the |nk(t, v)| active parents actually activated its child is not known when |nk(t, v)| ≥ 2, which corresponds to the existence of the latent variables although they are not explicit in our formulation. We want to emphasize here that each time iter- ation proceeds the value of the likelihood function never decreases and the iterative algorithm is guaranteed to converge due to the convexity of Q. Below we summarize the algorithm of the proposed method. 1. Initialize parameter vector θ as θj = 0 for j = 1,J. 2. Calculate the gradient vector at the current parameter vector θ. 3. If the gradient vector is su?ciently small, i.e., ?Q(θ; ? θ)/?θ < η, output the parameter vector ? θ, and then terminate. Otherwise, go to 4. 4. Update the parameter vector θ by Eq. (9), and return to 2. Here η is a parameter for the termination condition. In our experiments, η is set to a su?ciently small number, i.e., η = 10?12. Finally, we brie?y discuss the computational complexity of the learning algorithm. Evidently the most computationally expensive part is the Hessian matrix, i.e., for each (v, t, k) ∈ DT and its corresponding neighbor u ∈ Γ(v), the required computational complexity is the square of the parameter size J. The average number of neighbors Γ(v) is |E|/|V|, and the expected number of elements in DT is T|V| (this is true for the exponential distribution with parameter 1). Thus, let M be the number of iterations required to obtain the solution; then the computational complexity of the algorithm is given by O(MT|E|J2). 4. Experiments First, we evaluated experimentally our learning algorithm using synthetic opinion formation data that were generated from four large real world networks for which we assumed a speci?c relation between the node attributes and the node strength (Eq. (4)). Second, we evaluated how the node strength a?ects the in?uence degree (the expected number of nodes that hold the same opinion with the starting node at the end of the random process of the opinion formation). 7 Yamagishi Saito Ohara Kimura Motoda 4.1. Network Datasets We used four large real networks, which are all bidirectionally connected. The ?rst one is a reader network of "Ameba"6 that is a Japanese blog service site. We crawled the reader lists of 117,374 blogs of the Ameba blog service site in June 2006, and collected a large connected network, which has 56,604 nodes and 1,071,080 directed links (the Ameblo network). The second one is a trackback network of Japanese blogs (Kimura et al., 2009) that has 12, 047 nodes and 79, 920 directed links (the blog network). The third one is a Coauthorship network (Palla et al., 2005) and has 12, 357 nodes and 38, 896 directed links (the coauthorship network). The last one is a network of people that was derived from the "list of people" within Japanese Wikipedia (Kimura et al., 2008), which has 9, 481 nodes and 245, 044 directed links (the Wikipedia network). 4.2. Experimental Setting For each network we generated synthetic opinion formation data DT of time span [0, T] in the fol- lowing way: 1) arti?cially generate node attributes and determine their values in a random manner; 2) determine a parameter vector θ which is assumed to be true; and then 3) generate DT multiple times for the given K and T by running the true AwVM that uses θ, each of which starts from the state in which the initial opinion of each node is selected uniformly at random from the K opinions. The values of K and T are varied as needed. We generated a total of 10 attributes for every node in each network, each with a real value of [?1, 1]. The true parameter vector θ was determined so that the distribution of sv becomes uniform, that is, the expected value of θT xv, i.e., θT xv becomes 0. As one such instance, we chose the parameter vector θ = (2.0, ?1.0, 1.0, ?2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0). Note that this is di?erent from limiting the number of attributes to 4.7 We conducted two kinds of experiments to evaluate the proposed method, one to evaluate the performance of the learning algorithm, and the other to evaluate the e?ect of introducing the strength on the in?uence degree. The learning performance was evaluated in terms of the accuracy of the parameter values ? θ that are estimated from DT by our learning algorithm with di?erent values for T and K, and its computation time. The e?ect of the strength was evaluated by comparing the in?uence degree (Eq. (3)) and the node ranking (with respect to the in?uence degree) of the AwVM that uses the learned parameters with the naive AwVM in which uniform strength i.e., sv = 1, is used, which is equivalent to the basic voter model with K opinions. Further, the node ranking of AwVM is compared with the ranking based on the node degree and other heuristic. We denote the true in?uence degree of AwVM as σa(v) (Section 2.2), the estimated in?uence degree of AwVM as ? σa(v) and the in?uence degree of the naive AwVM as σb(v) (Section 2.1). We terminated the iterative calculation of Eq. (3) if either of the following conditions is satis?ed: the number of iteration exceeds 1,000, or ||a(t) ? a(t?1)|| < 10?8. 4.3. Evaluation of Parameter Estimation Figure 1 shows the mean absolute error which is de?ned as ΣJ j=1|θj ? ? θj|/J. It is the average over 10 trials that were conducted on the 10 distinct opinion formation data DT , each generated inde- pendently for each combination of the corresponding T and K for each network. Figures 1(a) to 1(c) is to show how the time span [0, T] a?ects the results, and Figs. 1(d) to 1(f) is to show how 6. http://www.ameba.jp/ 7. The algorithm should be able to identify the useless attributes for which the parameter values are 0. 8 Learning Attribute-weighted Voter Model over Social Networks T Mean absolute error 0.000 0.004 0.008 0.012 0.016 10 20 30 Ameblo blog Coauthorship Wikipedia (a) K = 3 T Mean absolute error 0.000 0.004 0.008 0.012 0.016 10 20 30 Ameblo blog Coauthorship Wikipedia (b) K = 6 T Mean absolute error 0.000 0.004 0.008 0.012 0.016 10 20 30 Ameblo blog Coauthorship Wikipedia (c) K = 9 K Mean absolute error 0.000 0.004 0.008 0.012 0.016 3 6 9 Ameblo blog Coauthorship Wikipedia (d) T = 10 K Mean absolute error 0.000 0.004 0.008 0.012 0.016 3 6 9 Ameblo blog Coauthorship Wikipedia (e) T = 20 K Mean absolute error 0.000 0.004 0.008 0.012 0.016 3 6 9 Ameblo blog Coauthorship Wikipedia (f) T = 30 Figure 1: Mean absolute errors of the estimated parameter values for each network. Table 1: Computation time (sec.) and the number of iterations of the proposed learning algorithm (in parentheses). K = 3 K = 6 K = 9 T = 10 36.49 (24.9) 22.70 (16.0) 19.49 (13.5) Ameblo T = 20 71.58 (25.0) 45.70 (16.2) 40.70 (14.0) T = 30 103.51 (25.7) 70.45 (17.0) 62.85 (14.1) T = 10 2.24 (23.0) 1.76 (17.0) 1.58 (15.0) blog T = 20 4.55 (24.8) 3.61 (18.5) 3.36 (16.9) T = 30 6.77 (25.6) 5,62 (19.9) 5.26 (18.1) T = 10 1.06 (19.0) 0.97 (15.4) 0.90 (14.0) Coauthorship T = 20 2.10 (21.0) 1.91 (17.0) 1.83 (16.0) T = 30 2.97 (21.0) 2.79 (17.5) 2.71 (16.1) T = 10 8.07 (34.3) 5.69 (23.9) 4.95 (20.2) Wikipedia T = 20 17.14 (37.6) 12.75 (27.1) 11.08 (23.9) T = 30 27.84 (41.3) 19.96 (29.2) 18.55 (26.9) the number of opinions K a?ects the results. The mean absolute error is extremely small and in general decreases as both T and K increase. A larger T implies a larger training sample size of DT which naturally contributes to reducing the error. A larger K implies a larger chance of updating to a di?erent opinion at each update time which contributes to increasing the diversity of the data, again contributing to reducing the error. Said di?erently, when K is smaller, the time to reach local 9 Yamagishi Saito Ohara Kimura Motoda consensus gets earlier and the amount of data to be e?ective for learning gets smaller. The error for Ameblo network being the smallest is explained by the fact that this network has by far the largest number of nodes, indicating the largest expected number of samples which is T|V| as explained in the last paragraph in Section 3. Table 1 shows the computation time. The machine used is Intel(R) Xeon(R) CPU W5590 @3.33GHz with 32GB memory. The computation time roughly follows the results of computa- tional complexity analysis. Under the situation where the number of iteration M is about the same for the same K, which is the case here, the computation time is proportional to the number of links for the same T. Ameblo network takes 4 times longer than Wikipedia network, Wikipedia network takes 3 times longer than Blog network and Blog network takes twice longer than Coauthorship network. The number of opinions K a?ects favorably the computation time. This is because the larger diversity of the data as explained above accelerates the convergence of iterative procedure. Overall, we can say that our learning method can accurately estimate the parameter values of the AwVM, and its performance with respect to the time span and the number of opinions are interpretable. 4.4. Comparative Study on In?uence Degree We evaluated the e?ect of the node strength in terms of the in?uence degree. To this end, we introduced a new parameter c to control the true parameter by θc = cθ. The parameter c is a measure to indicate the non-uniformity of the node strength, and is called the non-uniformity parameter. It a?ects the value of the strength directly, too, but as Eq.(2) shows, what matters is the relative strength. We use c = 0.1, 0.5, and 1.0 in the following experiments. Figure 2(a) illustrates how the mean of the node strength and the standard deviation (non-uniformity) change with c. Both increase exponentially as c increases. The distribution of the node strength for c = 1.0 is much more non- uniform than that for c = 0.5 as shown in Fig. 2(b). The strength is almost uniform across the nodes for c = 0.1, but some nodes have the strength which is 100 times as high as the average for c = 1.0. This can be expected from the form of Eq.(4). We think such a deviation for c = 1.0 is not rare even in our real human relations. Now for each c, we de?ne the mean absolute error of the AwVM by a = 1 |V| v∈V |σa(v)? ? σa(v)|, and that of the naive model by b = 1 |V| v∈V |σa(v) ? σb(v)|. Figure 3 shows the results of the mean absolute errors a and b. Here, we used the 10 distinct opinion formation data, each generated with T = 30 and K = 3 for each network, and ? θ was estimated by our learning method for each trial. All of the results shown in Fig. 3 are the average over the 10 trials. The mean absolute error of the AwVM is reduced to 0.5 to 2% of the naive model for c = 1.0, and 10 to 20% even for c = 0.1 where the node strength has only a small e?ect (Fig. 3(a)). Fig. 3(b) shows the results of the standard deviation of the mean absolute errors. Both the mean and the standard deviation are of the same order, but the standard deviation is more sensitive to c, which can be expected by the increase in the standard deviation (non-uniformity) shown in Fig. 2(a). To do a more detailed analysis, we illustrate in Fig. 4 the actual in?uence degree of each node in the blog network for one particular run, randomly chosen from the 10 independent trials. The nodes in the horizontal axis are ranked in descending order of the true in?uence degree σa. Also from this result, we can ?nd that the di?erence between σa (black solid line) and ? σa (red cross "*") is quite small for any value of c, while the di?erence between σa and σb (blue plus "+") becomes 10 Learning Attribute-weighted Voter Model over Social Networks 0.1 0.3 0.5 0.7 1 0 2 4 6 8 10 12 parameter c node strength standard deviation mean (a) Change of the mean and the standard de- viation of node strength with the non- uniformity parameter c 2000 4000 6000 8000 10000 12000 10 ?2 10 ?1 10 0 10 1 10 2 node ranked by strength strength c = 1.0 c = 0.7 c = 0.5 c = 0.1 (b) The distribution of node strength with the non-uniformity parameter c = 0.1, 0.5, 0.7, 1.0. Figure 2: The relation between the node strength and the non-uniformity parameter c. Average of mean absolute error 1.0E-04 1.0E-03 1.0E-02 1.0E-01 1.0E+00 1.0E+01 0.1 0.5 1.0 Ameblo Ameblo Coauthorship Coauthorship blog blog Wikipedia Wikipedia parameter c (a) Average of the mean absolute errors of ? σa(v) and σb(v) 1.0E-04 1.0E-03 1.0E-02 1.0E-01 1.0E+00 1.0E+01 0.1 0.5 1.0 Ameblo Ameblo Coauthorship Coauthorship blog blog Wikipedia Wikipedia Standard deviation of mean absolute error parameter c (b) Standard deviation of the mean absolute errors of ? σa(v) and σb(v) Figure 3: Average and standard deviation of the mean absolute errors of ? σa(v) and σb(v) for each network. larger as c increases8. Especially, the di?erence between σa and σb is quite large for the top 100 nodes for c = 0.5 and c = 1.0. Note that the horizontal axis is logarithmic scale. This implies that for a majority of nodes the di?erence between σa(v) and σb(v) is very small. This explains why the mean absolute error is low as shown in Fig.3. We observed the same tendency for the other three networks, and here show only the results for the Coauthorship network in Fig. 5 for a reference. In summary, these results indicate that it is important to obtain the parameter values accurately in order to estimate the in?uence degree of each node in good accuracy. In case of the basic model, it has been proven that the in?uence degree σb(v) of a node is proportional to the degree of the node (Section 2.1). We explored to ?nd a measure that correlates well to the in?uence degree in case of AwVM. Figure 6 shows the relation between the node degree and the in?uence degree (nodes ranked according to the degree) and Fig. 7 the relation between the node strength and the in?uence degree (nodes ranked according to the strength), both for the same trial for the blog network as in Fig. 4. The black solid line ? d(v) in Fig. 6 represents the value of d(v) 8. Note that the scale of the vertical axis of Fig. 4(c) is much larger than the other two ?gures. 11 Yamagishi Saito Ohara Kimura Motoda (a) Non-uniformity parame- ter c = 0.1 (b) Non-uniformity parame- ter c = 0.5 (c) Non-uniformity parame- ter c = 1.0 Figure 4: Comparison of in?uence degree, σa(v), ? σa(v), and σb(v) for the blog network for one particular run, randomly selected from the 10 independent trials. (a) Non-uniformity parame- ter c = 0.1 (b) Non-uniformity parame- ter c = 0.5 (c) Non-uniformity parame- ter c = 1.0 Figure 5: Comparison of in?uence degree, σa(v), ? σa(v), and σb(v) for the Coauthorship network for one particular run, randomly selected from the 10 independent trials. that is normalized so that the total sum of d(v) over all nodes becomes equal to the total sum of the in?uence degree, and the black solid line ? s(v) in Fig. 7 represents the value of sv that is normalized in the same way. From Fig. 6, we can see that the node degree d(v) can be a good estimator of ? σa(v) for c = 0.1, but it does not work well for a larger value of c. This is because the AwVM becomes closer to the basic voter model as the parameter c becomes closer to 0. Indeed, in Fig. 6, σb overlaps with the curve that represents ? d(v) because the naive model with sv = 1.0, i.e., c = 0, is identical to the basic voter model. Conversely, Fig. 7 shows that sv can be a good estimator of ? σa(v) in case of c = 1.0, but it does not work well for a smaller c. Also in this case, the node degree ? d(v) = σb(v) can approximate ? σa(v) in good accuracy only for c = 0.19. These results suggest that although either the node strength sv or the node degree d(v) alone cannot be a good estimator of the in?uence degree of the AwVM, their combination could be a good estimator of the in?uence degree. To con?rm this hypothesis, we further investigated the relation between their product, i.e., g(v) = svd(v), which is referred to as the strength-weight degree, and the 9. It does not look so at ?rst glance, but note that the number of plots are the same for both σb(v) and ? σa(v) and there are many overlaps. 12 Learning Attribute-weighted Voter Model over Social Networks (a) Non-uniformity parame- ter c = 0.1 (b) Non-uniformity parame- ter c = 0.5 (c) Non-uniformity parame- ter c = 1.0 Figure 6: Relation between the node degree d(v) and the in?uence degree, ( ? σa(v) and σb(v)) for the blog network for one particular run, randomly selected from the 10 independent trials. (a) Non-uniformity parame- ter c = 0.1 (b) Non-uniformity parame- ter c = 0.5 (c) Non-uniformity parame- ter c = 1.0 Figure 7: Relation between the node strength sv and the in?uence degree ( ? σa(v) and σb(v)) for the blog network for one particular run, randomly selected from the 10 independent trials. in?uence degree for the AwVM. The result is shown in Fig. 8, where the black solid line ? g(v) repre- sents the value of g(v) that is normalized in the same way as ? d(v) in Fig. 6 (nodes ranked according to g(v)). Again, we used the results that are obtained from the same trial for the blog network as in Fig. 4. Unfortunately, this result refutes the aforementioned hypothesis. It can approximate the in- ?uence degree well only for c = 0.1 as d(v) does, but not for a larger c. We examined the other three networks and found that the results strongly depend on the sparseness of the network. Wikipedia and Ameblo networks are much denser than Blog network. They showed better results for a larger c. We then randomly deleted links from these two networks and con?rmed that the results become worse as more links are deleted. The reason behind this needs further investigation. From these results, we can conclude that the opinion formation process for the AwVM becomes quite complex due to the introduction of the node strength, which is not the case for the basic voter model, and it seems that there is no simple heuristic measure that is able to estimate the in?uence degree in good accuracy for a wide range of networks. The strength-weight degree can be a good measure in some cases but not in all. 13 Yamagishi Saito Ohara Kimura Motoda (a) c = 0.1 (b) c = 0.5 (c) c = 1.0 Figure 8: Relation between the product of the node strength and the node degree (g(v) = svd(v)) and the in?uence degree ( ? σa and σb) for the blog network for one particular run, randomly selected from the 10 independent trials. 4.5. Discussion We discuss the relation of the AwVM to the existing two models mentioned earlier. Based on the basic voter model Even-Dar and Shapria (2007) investigated the in?uence maximization problem. They assumed that there need some initial costs to make each node accept an initial opinion. This cost is completely di?erent from the node strength argued in this paper in that the initial cost of each node does not a?ect its in?uence degree at all. In contrast, the node strength directly a?ects the in?uence degree of each node, as shown in our experiments. Clearly, it is straightforward to incorporate initial costs to the AwVM. The theoretical analysis for the VwVM revealed that the opinion with the highest value wins and all the others die (winner-take-all process) for a situation where the local opinion share can be approximated by the average opinion share over the whole network, (e.g., the case of a complete network) (Kimura et al., 2010b). There does not exist an e?ective formula of calculating the in?uence degree for the VwVM, which corresponds to Eq. (3) for the AwVM, and obtaining the in?uence degree for the VwVM is computationally expensive. Clearly, node strength and opinion value are di?erent concept, and it is possible to introduce the opinion values to the AwVM. However, how the node strength a?ects the winner-take-all process remains as an open question. In this paper, we assumed that the network is static. Dynamic social network is easily handled by the current method by updating the neighbor nodes right before time t, incorporating the newly added/deleted nodes and updating the neighbors opinion distribution nk(t, v) accordingly. The same algorithm runs without any other changes. 5. Conclusion Opinion formation over a social network was analyzed by modeling the cascade of interactions of neighboring nodes as probabilistic process of state changes. We modeled this process as a variant of the well known voter model with emphasis given on the strength of each node, called Attribute- weighted Voter Model with Multiple Opinions (AwVM). The strength re?ects the degree of direct in?uence of the node, and we addressed the problem of estimating this strength from the observed opinion formation data. As each node has its own strength, the number of variables we want to 14 Learning Attribute-weighted Voter Model over Social Networks estimate is as large as the number of nodes in the network, which requires a prohibitively large amount of training data. We avoided this problem by assuming a functional dependency of the node strength on the small number of selected node attributes, which we believe to re?ect the reality, and learn the parameter values that specify the functional dependency without a need for such a large amount of data. The task was formulated as the maximum likelihood estimation problem, and an e?cient parameter value update algorithm that guarantees the convergence was derived. We evaluated the performance of the learning algorithm on four real world networks assuming a particular class of attribute dependency, and con?rmed that the dependency can be correctly learned. We further showed that the in?uence degree of each node (expected number of the nodes at the end of opinion formation process that have the same opinion as the starting node considered) based on our AwVM is substantially di?erent from that obtained assuming a model with uniform strength, i.e. without the strength, and the sensitivity analysis indicated that the in?uence degree is more sensitive to the node strength than the node degree even for a moderate value of the node strength. The results, at ?rst glance, suggested to use the strength-weight degree as a rough measure of approximating in?uence degree, but quantitative evaluation refuted this hypothesis. Introduction of node strength, which is quite natural and re?ects the reality, brings complexity to the opinion formation process, and there appears to be no simple heuristic measure that can predict the in?uence degree in good accuracy for a wide range of networks. Acknowledgments This work was partly supported by Asian O?ce of Aerospace Research and Development, Air Force O?ce of Scienti?c Research under Grant No. AOARD-10-4053, and JSPS Grant-in-Aid for Scienti?c Research (C) (No. 23500194). References C. Castellano, M. A. Munoz, and R. Pastor-Satorras. Nonlinear q-voter model. Physical Review E, 80:041129, 2009. D. Crandall, D. Cosley, D. Huttenlocner, J. Kleinberg, and S. Suri. Feedback e?ects between simi- larity and sociai in?uence in online communities. In Proceedings of KDD 2008, pages 160–168, 2008. P. Domingos. Mining social networks for viral marketing. IEEE Intell. Syst., 20:80–82, 2005. E. Even-Dar and A. Shapria. A note on maximizing the spread of in?uence in social networks. In Proceedings of WINE 2007, pages 281–286, 2007. J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12:211–223, 2001. D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information di?usion through blogspace. SIGKDD Explorations, 6:43–52, 2004. P. Holme and M. E. J. Newman. Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E, 74:056108, 2006. 15 Yamagishi Saito Ohara Kimura Motoda D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of in?uence through a social net- work. In KDD-2003, pages 137–146, 2003. M. Kimura, K. Saito, and H. Motoda. Minimizing the spread of contamination by blocking links in a network. In AAAI-08, pages 1175–1180, 2008. M. Kimura, K. Saito, and H. Motoda. Blocking links to minimize contamination spread in a social network. ACM Trans. Knowl. Discov. Data, 3:9:1–9:23, 2009. M. Kimura, K. Saito, and R. Nakano. Extracting in?uential nodes for information di?usion on a social network. In AAAI-07, pages 1371–1376, 2007. M. Kimura, K. Saito, R. Nakano, and H. Motoda. Extracting in?uential nodes on a social network for information di?usion. Data Min. and Knowl. Disc., 20:70–97, 2010a. M. Kimura, K. Saito, K. Ohara, and H. Motoda. Learning to predict opinion share in social net- works. In AAAI-10, pages 1364–1370, 2010b. J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. In EC'06, pages 228–237, 2006. T. M. Liggett. Stochastic interacting systems: contact, voter, and exclusion processes. Spriger, New York, 1999. M. E. J. Newman. The structure and function of complex networks. SIAM Rev., 45:167–256, 2003. M. E. J. Newman, S. Forrest, and J. Balthrop. Email networks and the spread of computer viruses. Phys. Rev. E, 66:035101, 2002. G. Palla, I. Der? enyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814–818, 2005. K. Saito, M. Kimura, K. Ohara, and H. Motoda. Detecting changes in opinion value distribution for voter model. In Proceedings of International Conference on Social Computing, Behavioral- Cultural Modeling, & Prediction (SBP11), pages 89–96. Spriger, LNAI 6589, 2011. V. Sood and S. Redner. Voter model on heterogeneous graphs. Physical Review Letters, 94:178701, 2005. H. Tong, B. A. Prakash, C. Tsoourakakis, T. Eliassi-Rad, C. Faloutsos, and D. H. Chau. On the vulnerability of large graphs. In ICDM 2010, pages 1091–1096, 2010. D. J. Watts. A simple model of global cascades on random networks. PNAS, 99:5766–5771, 2002. D. J. Watts and P. S. Dodds. In?uence, networks, and public opinion formation. J. Cons. Res., 34: 441–458, 2007. H. Yang, Z.. Wu, C. Zhou, T. Zhou, and B. Wang. E?ects of social diversity on the emergence of global consensus in opinion dynamics. Physical Review E, 80:046108, 2009. 16