On Multi-Language Semantics: Semantic Models, Equational Logic, and Abstract Interpretation of Multi-Language Code

Buro, Samuele

Modern software development rarely takes place within a single programming language. Often, programmers appeal to cross-language interoperability. Benefits are two-fold: exploitation of novel features of one language within another, and cross-language code reuse. For instance, HTML, CSS, and JavaScript yield a form of interoperability, working in conjunction to render webpages. Some object oriented languages have interoperability via a virtual machine host (.NET CLI compliant languages in the Common Language Runtime, and JVM compliant languages in the Java Virtual Machine). A high-level language can interact with a lower level one (Apple's Swift and Objective-C). Whilst this approach enables developers to benefit from the strengths of each base language, it comes at the price of a lack of clarity of formal properties of the new multi-language, mainly semantic specifications. Developing such properties is a key focus of this thesis. Indeed, while there has been some research exploring the interoperability mechanisms, there is little development of theoretical foundations. In this thesis, we broaden the boundary functions-based approach à la Matthews and Findler to propose an algebraic framework that provides systematic and more general ways to define multi-languages, regardless of the inherent nature of the underlying languages. The aim of this strand of research is to overcome the lack of a formal model in which to design the combination of languages. Main contributions are an initial algebra semantics and a categorical semantics for multi-languages. We then give ways in which interoperability can be reasoned about using equations over the blended language. Formally, multi-language equational logic is defined, within which one may deduce valid equations starting from a collection of axioms that postulate properties of the combined language. Thus, we have the notion of a multi-language theory and part of the thesis is devoted to exploring the properties of these theories. This is accomplished by way of both universal algebra and category theory, giving us a very general and flexible semantics, and hence a wide collection of models. Classifying categories are constructed, and hence equational theories furnish each categorical model with an internal language. From this we establish soundness and completeness of the multi-language equational logic. As regards static analysis, the heterogeneity of the multi-language context opens up new and unexplored scenarios. In this thesis, we provide a general theory for the combination of abstract interpretations of existing languages in order to gain an abstract semantics of multi-language programs. As a part of this general theory, we show that formal properties of interest of multi-language abstractions (e.g., soundness and completeness) boil down to the features of the interoperability mechanism that binds the underlying languages together. We extend many of the standard concepts of abstract interpretation to the framework of multi-languages. Finally, a minor contribution of the thesis concerns language specification formalisms. We prove that longstanding syntactical transformations between context-free grammars and algebraic signatures give rise to adjoint equivalences that preserve the abstract syntax of the generated terms. Thus, we have methods to move from context-free languages to the algebraic signature formalisms employed in the thesis.