Controlling the dynamics of a network is an interesting task in many fields, especially social networks (where links represent some kind of interaction between two humans). For instance, influencing the transmission of opinions may lead to maximising the efficiency of a marketing campaign, restricting interactions between humans may avoid an epidemic outbreak, etc. An important factor in network controllability is the structure of such network that is determined by how nodes and links are added and deleted over time. If the controllability of a network is determined by how the network grew, controlling its growth can lead to a more controllable network.

In the paper Action selection in growing state spaces: Control of Network Structure Growth (by Dominik Thalmeier, Vicenç Gómez, and Hilbert J. Kappen), a method for controlling the growth of a network in such a way that the resulting network structure meets a set of measures (characteristics) is proposed. For instance, in online conversation threads, the h-index of the network created by comments and responses is closely related to the quality of the overall discussion; therefore, the growth control should aim to influence the creation of nodes and links in such a way that the h-index has a particular value.

The goal is to find the control function that minimises the total cost of controlling the growth of a network; this means computing what is the least costly action to perform at a state and time . Traditionally, the optimal control is computed using a **greedy local optimisation through Bellman’s equation**: . The solution is computed recursively for all possible states which is an infeasible task in large networks with many nodes.

The authors therefore propose a different approach: approximating the original problem as a Kullback–Leibler control problem, obtaining an **estimation of the optimal cost-to-go** (expected cumulative cost of starting at state and time and acting optimally until a time horizon is reached, parametrized by the temperature ) **using sampling** and using this value to choose the best action to perform at a state and time .

In the aforementioned paper, this method is illustraded in the context of Slashdot conversation threads. The goal: highlighting comments so that users are more prone to respond to them, **indirectly optimizing the structure of the conversation**. To conduct experiments for evaluating the framework, the authors use a realistic model of the uncontrolled dynamics based on the popularity, , novelty, , and root bias (only nonzero for the root), , of the posts.