Paper Review 5

By: Chengyi (Jeff) Chen

Article: End-to-End Influence Maximization in the Field


What is the main problem/task addressed by the paper?

Using the CHANGE framework to identify peer leaders to train in HIV prevention, so as to maximize the spread of that knowledge through social networks of homeless youth.


What was done before, and how does this paper improve on it?

While many algorithms have been proposed for influence maximization, none can be feasibly deployed by a service provider: existing algorithms require costly surveys of the entire social network of the youth to provide input data, and high performance computing resources to run the algorithm itself. Both requirements are crucial bottlenecks to widespread use of influence maximization in real world interventions.

In short, current algorithms have a high barrier to entry: they demand substantial time to map the youths' complete social network, expertise to select appropriate parameters, and computational power to run the algorithms. None of these are likely available to the resource-strained service providers who will ultimately deploy influence maximization.

  • Although algorithms like HEALER and DOSIM significantly outperformed status-quo heuristics, they unrealistically assume that the entire social network is available as input and are too computationally expensive to run in the field.

The authors validate that the data gathered is sufficiently accurate to enable influence maximization. Self-reported ties are subject to bias and forgetting [3], making it important to investigate whether they are accurate enough to find influential nodes. This point is of broader interest, since previous work on influence maximization has largely used self-reported ties [25, 27], but no previous field study has validated their accuracy for influence maximization.

  • Collected network data from 72 youth at a drop-in center by:

    1. CHANGE’s sampling mechanism (shown to be feasible in practice)

    2. Self-reports from the entire network

    3. Field observations by research staff

    4. Interviews with staff members

  • Intervention study of the entire CHANGE agent with an additional set of 64 homeless youth

    1. Network data collection

    2. Peer leader selection

    3. HIV awareness trainings for the selected peer leaders

  • Analysis of the real network data to explain why CHANGE can succeed while gathering such a small portion of the network

    • Friendship Paradox: A typical node’s neighbors have more ties than the node itself (CHANGE’s success exploits this attribute to produce sampled networks which are substantially more informative for influence maximization than a comparable number of uniformly random samples)

  • The research first constructs a probabilistic graph representing the social network structure of the homeless youth.

  • Subsequently, peer leaders are selected using the proposed Adaptive Greedy algorithm, which at each step chooses the youth who maximizes the expected gain in influence spread.

  • Next, exploiting the friendship paradox characteristic of homeless-youth social networks, the network sampling algorithm chooses \({\frac{M}{2}}\) youths to sample and survey, where \({M}\) is the total number of youths.

  • The third algorithm addresses the fact that the propagation probability \({p}\) is not actually known in practice, which makes the marginal-gain calculations in the first greedy algorithm unreliable when choosing the peer leaders who would maximize influence spread.
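The pipeline above can be sketched in simplified form. This is my own toy reconstruction, not the paper's implementation: the function names, the uniform-probability independent-cascade model, and the non-adaptive greedy loop are all simplifying assumptions (CHANGE's actual sampling and adaptive greedy procedures are more involved).

```python
import random

def friendship_paradox_sample(neighbors, m):
    """Survey m/2 uniformly random youths plus one random friend of each.
    Sketch of the friendship-paradox idea: a random friend of a random
    node tends to have more ties than a random node itself."""
    seeds = random.sample(list(neighbors), m // 2)
    surveyed = set(seeds)
    for s in seeds:
        if neighbors[s]:
            surveyed.add(random.choice(sorted(neighbors[s])))
    return surveyed

def simulate_spread(graph, seeds, p, trials=200):
    """Monte Carlo estimate of expected spread under an independent
    cascade with uniform propagation probability p."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, ()):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_peer_leaders(graph, k, p):
    """Repeatedly add the node with the largest estimated marginal gain
    in expected spread -- a simplified, non-adaptive stand-in for the
    paper's adaptive greedy selection."""
    chosen = []
    for _ in range(k):
        base = simulate_spread(graph, chosen, p)
        best, best_gain = None, float("-inf")
        for v in graph:
            if v in chosen:
                continue
            gain = simulate_spread(graph, chosen + [v], p) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen
```

On a toy graph, `greedy_peer_leaders(graph, 2, 0.5)` returns two nodes whose combined estimated spread the greedy loop found largest at each step.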


What is the one cool technique/idea/finding that was learned from this paper?

The coolest technique in the paper has got to be the third algorithm, used for Robust Parameter Selection, which addresses the fact that the propagation probability \({p}\) is unknown. First, the interval \({[0, 1]}\) is discretized into candidate values; the algorithm then selects the candidate \({p_i}\) whose adaptive-greedy solution has the best worst-case ratio of influence achieved using \({p_i}\) to the influence attainable if the true parameter \({p_j}\) were known. This heuristic identifies the peer leaders with far less runtime than running the full DOSIM algorithm.
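A minimal sketch of that minimax idea, under my own simplifying assumptions: seeds are chosen by brute force rather than adaptive greedy, spread is estimated by Monte Carlo, and the grid and function names are hypothetical.

```python
import random
from itertools import combinations

def spread(graph, seeds, p, trials=300):
    """Monte Carlo expected spread under an independent cascade with
    uniform propagation probability p."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, ()):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def best_seeds(graph, k, p):
    """Exhaustive best k-seed set assuming parameter p (fine for toy
    graphs; the paper uses greedy selection instead)."""
    return max(combinations(sorted(graph), k),
               key=lambda s: spread(graph, s, p))

def robust_parameter(graph, k, grid):
    """Pick the grid value p_i whose seed set maximizes the worst-case
    ratio of achieved spread to the spread attainable if the true
    parameter p_j were known -- a brute-force sketch of robust
    parameter selection over a discretized [0, 1]."""
    seed_sets = {p: best_seeds(graph, k, p) for p in grid}
    opt = {p: spread(graph, seed_sets[p], p) for p in grid}

    def worst_ratio(p_i):
        return min(spread(graph, seed_sets[p_i], p_j) / opt[p_j]
                   for p_j in grid)

    return max(grid, key=worst_ratio)
```

The returned value is the candidate whose seed set degrades least gracefully across all plausible true parameters, which is the robustness guarantee the paper is after.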


What part of the paper was difficult to understand?

It was hard to understand why propagating influence with a fixed probability \({p}\) is equivalent to first drawing a propagation probability \({q}\) from a distribution with \({E[q] = p}\) and then propagating with probability \({q}\).
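For a single activation attempt, the equivalence follows from the tower property: \({P(\text{fire}) = E[P(\text{fire} \mid q)] = E[q] = p}\). A quick Monte Carlo check (my own illustration, with a uniform distribution chosen arbitrarily as the mixing distribution) makes this concrete:

```python
import random

def fire_fixed(p, n=200_000):
    """Fraction of trials in which an edge fires with fixed probability p."""
    return sum(random.random() < p for _ in range(n)) / n

def fire_mixed(a, b, n=200_000):
    """Each trial, draw q ~ Uniform(a, b), then fire with probability q.
    By the tower property, P(fire) = E[P(fire | q)] = E[q] = (a + b) / 2."""
    hits = 0
    for _ in range(n):
        q = random.uniform(a, b)
        if random.random() < q:
            hits += 1
    return hits / n
```

Running both with \({p = 0.4}\) and \({q \sim \text{Uniform}(0.2, 0.6)}\) (so \({E[q] = 0.4}\)) yields empirical firing rates that agree up to Monte Carlo noise.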


Brainstorming ideas: What generalization or extension of the paper could be done? Or what are other applications that can benefit from the described approach? What more could have been done in this application space that could leverage AI? Does this give you an idea about a possible project a student could do?

I could definitely see the algorithms from this research applied to mapping the bias that arises during election season from positive or negative tweets and media coverage of candidates. Using the proposed algorithms, it might be possible to trace back the greatest sources of media influence and prevent these “peer leaders” from explicitly stating anything that could immediately sway their audience to vote for one candidate over another, making the election process “fairer.” For example, if 50% of all votes come from adults aged 25 to 50, and the primary news, media, and opinion outlets these voters are exposed to are channels such as YouTube and Twitter, then negative opinions on these platforms (whether synthetically or genuinely generated) can act as a very unwanted prior when those voters decide which candidate to support.