Probabilistic graphical models (PGMs) provide a way to represent variables (nodes) along with their conditional dependencies (edges) and therefore allow formalizing our knowledge of the interacting entities in the real world. Structure learning aims to discover the topology (and parameters) of a PGM that represents accurately a given dataset. Accuracy of representation can be measured by the likelihood that the PGM explains the observed data, which leads to maximum likelihood estimation (MLE). From an algorithmic point of view, one challenge faced by structure learning is that the number of possible structures is super-exponential in the number of variables. From a statistical perspective, it is important to find good regularizers in order to avoid over-fitting and to achieve better generalization performance. Regularizers aim to reduce the complexity of the PGM, which can be measured by its number of parameters. First, we present three regularizers for MLE of Gaussian Markov random fields (MRFs): local constancy for datasets where variables correspond to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images); variable selection for finding few interacting nodes from datasets with thousands of variables; and multi-task learning for a more efficient use of data which is available for multiple related tasks. For these regularizers, we show bounds of the eigenvalues of the optimal solution, convergence of block coordinate descent optimization, and connections to the continuous quadratic knapsack problem and the quadratic trust-region problem. Second, we focus on learning sparse discrete MRFs through MLE. In this case, computing the objective function as well as its gradient is NP-hard. We study the convergence rate of stochastic optimization of exact NP-hard objectives, for which only biased estimates of the gradient are available. We provide a convergence-rate analysis of deterministic errors and extend our analysis to biased stochastic errors. Third, we show general results for PGMs that allow understanding MLE with regularizers on the differences of parameters (e.g. sparse structural changes, time-varying models), the generalization ability of PGMs, and the use of PGM parameters as features in classification, dimensionality reduction and clustering. To this end, we show that the log-likelihood of several PGMs is Lipschitz continuous with respect to the parameters, and derive bounds on the Kullback-Leibler divergence, expected log-likelihood and Bayes error rate. Finally, we formalize and study the problem of learning the structure of graphical games from strictly behavioral data. We propose MLE of a generative model defined by the Nash equilibria of the game. The formulation brings out the interplay between goodness-of-fit and model complexity: good models capture the equilibrium behavior represented in the data while controlling the true number of equilibria, including those potentially unobserved. We provide a generalization bound for MLE. We discuss several optimization algorithms including convex loss minimization, sigmoidal approximations and exhaustive search. We formally prove that games in our hypothesis space have a small true number of equilibria, with high probability; thus, convex loss minimization is sound. We present experimental results on a wide range of real-world datasets: walking video sequences, motion capture, cardiac MRI, brain fMRI, gene expression, stock prices, world weather and congressional voting.