Concurrency and static analysis

Lovato, Alberto

The thesis describes three important contributions developed during my doctoral course, all involving the use and the verification of concurrent Java code.

Binary decision diagrams (BDDs) are data structures for the representation of Boolean functions, which are of great importance in many fields. BDDs are the state-of-the-art representation for Boolean functions, and real-world applications generally rely on a BDD library to represent and manipulate them. It can be desirable to perform Boolean operations from different threads at the same time. To do this, the BDD library in use must let threads access BDD data safely, avoiding race conditions. We developed a Java BDD library that is fast in both single- and multi-threaded applications, and we use it in the Julia static program analyzer.

We defined a sound static analysis that identifies if and where a Java bytecode program lets data flow from tainted user input (including servlet requests) into critical operations that might give rise to injections. Data flow is a prerequisite for injections, but the user of the analysis must then gauge the actual risk of each flow: analysis approximations might lead to false alarms, and proper input validation might make actual flows harmless. Our analysis works by translating Java bytecode into Boolean formulas that express all possible explicit flows of tainted data. Targeting Java bytecode simplifies the semantics and its abstraction (many high-level constructs need not be considered explicitly) and lets us analyze programs whose source code is not available, as is typically the case in industrial contexts, such as banks, that use software developed by third parties.

The standard approach to preventing data races is to follow a locking discipline while accessing shared data: always hold a given lock when accessing a given shared datum. It is all too easy for a programmer to violate the locking discipline.
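The locking discipline just described can be made concrete with a minimal Java sketch. The classes `SharedCounter` and `LockingDisciplineDemo` and their methods are hypothetical illustrations, not code from the thesis or from Julia. The discipline here is "hold `lock` whenever `count` is read or written": `increment()` and `get()` follow it, while `getUnsafely()` silently breaks it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: the locking discipline is
// "hold `lock` whenever reading or writing `count`".
final class SharedCounter {
    private final Object lock = new Object();
    private int count; // shared datum, guarded by `lock`

    // Follows the discipline: the read-modify-write happens while holding `lock`.
    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    // Follows the discipline as well.
    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Violates the discipline: reads `count` without holding `lock`.
    // This is a data race whenever another thread may be calling increment().
    int getUnsafely() {
        return count;
    }
}

final class LockingDisciplineDemo {
    // Starts `threads` workers that each call increment() `perThread` times,
    // waits for all of them, and returns the final count read under the lock.
    static int runWorkers(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // Every increment holds the lock, so no update is lost.
        System.out.println(runWorkers(4, 10_000)); // prints 40000
    }
}
```

`getUnsafely()` compiles and usually returns a plausible value, yet nothing in the source records that `lock` is the guard of `count`, so neither the compiler nor a reviewer is forced to notice the violation.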
Therefore, tools are desirable for formally expressing the locking discipline and for verifying adherence to it. The book Java Concurrency in Practice (JCIP) proposed the @GuardedBy annotation to express a locking discipline. The original @GuardedBy annotation was designed for declaring simple, intra-class synchronization policies: @GuardedBy fields and methods are supposed to be accessed only while holding the appropriate lock, which is referenced by another field of the same class (or is this). In simple cases, a quick visual inspection of the class code is enough for the programmer to verify that the synchronization policy is correct. However, when one looks more closely at the meaning of this annotation, and tries to check and infer it, some ambiguities arise. Because of these ambiguities in the specification of @GuardedBy, different tools interpret it in different ways. Moreover, as commonly interpreted, the annotation does not prevent data races, and thus fails to meet its design goals.

We provide a formal specification that satisfies these design goals and prevents data races. We have also implemented our specification in the Julia analyzer, which uses abstract interpretation to infer valid @GuardedBy annotations for unannotated programs. The goal of this implementation is not to detect data races or to guarantee their absence: Julia determines what locking discipline a program uses, without judging whether the discipline is too strict or too lax for a particular purpose.
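As a hedged illustration of the kind of ambiguity mentioned above, the sketch below annotates a field with a local stand-in for JCIP's @GuardedBy (the real annotation is `net.jcip.annotations.GuardedBy`; the stand-in is declared here with runtime retention only so the example needs no dependency). The class `Inbox` and the helper `guardOf` are hypothetical. Every read of the field `messages` happens while holding `this`, yet the list it refers to escapes through `leak()` and is then mutated without the lock. This shows one way a reading that protects only the field name, rather than the value it refers to, can fail to prevent data races; it is not a reproduction of the thesis's formal definitions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the JCIP annotation, with runtime retention so that
// the demo below can read it back through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface GuardedBy {
    String value(); // the guarding lock, e.g. "this" or the name of a field
}

// Hypothetical class used only to illustrate the ambiguity.
final class Inbox {
    @GuardedBy("this")
    private final List<String> messages = new ArrayList<>();

    // Reads the field and mutates the list it refers to while holding `this`.
    synchronized void add(String m) {
        messages.add(m);
    }

    synchronized int size() {
        return messages.size();
    }

    // The field is read while holding `this`, so a reading of @GuardedBy that
    // constrains only accesses to the field name is satisfied. The list itself
    // escapes, and callers can then mutate it without the lock, which a
    // reading that protects the referenced value would forbid.
    synchronized List<String> leak() {
        return messages;
    }
}

final class GuardedByDemo {
    // Returns the guard declared on a field, or null if the field is unannotated.
    static String guardOf(Class<?> type, String fieldName) {
        try {
            GuardedBy g = type.getDeclaredField(fieldName).getAnnotation(GuardedBy.class);
            return g == null ? null : g.value();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Inbox inbox = new Inbox();
        inbox.add("hello");
        List<String> escaped = inbox.leak();
        // Mutation without holding the Inbox lock; harmless here only
        // because this main method is single-threaded.
        escaped.add("unguarded");
        System.out.println(guardOf(Inbox.class, "messages")); // prints this
        System.out.println(inbox.size()); // prints 2
    }
}
```

In a multi-threaded program, the unguarded `escaped.add(...)` could run concurrently with `add()` on the same `ArrayList`, which is exactly the kind of data race the annotation is meant to exclude.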