ProbInG: Distribution Recovery for Invariant Generation of Probabilistic Programs
Probabilistic programming is a new emerging paradigm adopted by high-tech giants, such as Google, Amazon and Uber, to simplify the development of AI/machine learning based applications, such as route planning and detecting cyber intrusions. Probabilistic programming languages include native constructs for sampling distributions allowing to freely mix deterministic and stochastic elements. The resulting flexible framework comes at the price of programs with behaviors hard to analyze, leading to unpredictable adverse consequences in safety-critical applications. One of the main challenges in the analysis of these programs is to compute invariant properties that summarize loop behaviors. Despite recent results, full automation of invariant generation is at its infancy and only targets expected values of the program variables, which is insufficient to recover the full probabilistic program behavior. Our project aims at developing novel and fully automated approaches to generate invariants over higher-order moments and the value distribution of program variables, without any user guidance. We will employ methods from symbolic summation, polynomial algebra and statistics and combine them with static analysis techniques. Our results will reduce the need of expert knowledge in ensuring the safety and security of computer systems and will cut the design costs of applications based on probabilistic programs, bringing crucial intellectual and economic benefits to our society.