We study a dynamic model of collective action in which agents interact and learn through a co-evolving social network. We consider two alternative scenarios that differ in how agents form their expectations. In a "benchmark" scenario, agents are assumed to be fully informed of the prevailing state. In the other, agents shape their expectations through a combination of local observation and social learning à la DeGroot. We completely characterize the long-run behavior of the system in both cases and show that only in the latter (arguably more realistic) scenario is there a significant long-run probability of successful collective action within a meaningful time scale. This, we argue, sheds light on the puzzle of how large populations can "achieve" collective action. Finally, we illustrate the empirical potential of the model by showing that it can be efficiently estimated for the so-called Egyptian Arab Spring using large-scale cross-sectional data from Twitter.