Formalizing the Core DOM in Isabelle/HOL
At its core, the Document Object Model (DOM) defines a tree-like data structure for representing documents in general and HTML documents in particular. It is the heart of any modern web browser. Formalizing the key concepts of the DOM is a prerequisite for the formal reasoning over client-side JavaScript programs and for the analysis of security concepts in modern web browsers.
As a first step towards a verified client-side web application stack, we model and formally verify the Document Object Model (DOM) in Isabelle/HOL. The Document Object Model (DOM) is the central data structure of all modern web browsers. At its core, the DOM defines a tree-like data structure for representing documents in general and HTML documents in particular. Thus, the correctness of a DOM implementation is crucial for ensuring that a web browser displays web pages correctly. Moreover, the DOM is the core data structure underlying client-side JavaScript programs, i.e, client-side JavaScript programs are mostly programs that read, write, and update the DOM.
We formalize the DOM as a shallow embedding in Isabelle/HOL using a typed data model for the node-tree. Furthermore, we formalize a typed heap for storing (partial) node-trees together with the necessary consistency constraints. Finally, we formalize the operations on this heap that allow manipulating node-trees.
For example, the HOL definitions of \(\text{adopt_node}\), i.e. the method that removes a node from its previous parent, if it had any, and assigns it to the new \(\text{ownerDocument}\), looks as follows: \[ \begin{array}{l} \color{blue}{\textbf{definition}}~\text{adopt_node}~::~\\ ~~~~\_~\text{document_ptr}_\text{Core_DOM}~\Rightarrow~\_~\text{node_ptr}_\text{Core_DOM}~\Rightarrow~\_~\text{dom_prog}\\ ~~\textbf{where}\\ ~~~~\text{adopt_node}~document~node~=~\text{do}~\{~\\ ~~~~~~parent\_opt~\leftarrow~\text{get_parent}~node;~\\ ~~~~~~(\text{case}~parent\_opt~\text{of}~\\ ~~~~~~~~\text{Some}~parent~\Rightarrow~\text{remove_child}~parent~node~\\ ~~~~~~|~\text{None}~\Rightarrow~\text{do}~\{~\\ ~~~~~~~~~~old\_document~\leftarrow~\text{get_owner_document}~(\text{cast}~node);~\\ ~~~~~~~~~~\text{remove_from_disconnected_nodes}~old\_document~node\\ ~~~~~~~~\});~\\ ~~~~~~\text{add_to_disconnected_nodes}~document~node~\\ ~~~~~\}~ \end{array} \] First, \(\text{adopt_node}\) tries to retrieve the parent of the node to be adopted. If the node has a parent node, it removes the node from the children list, otherwise it removes it from the list of disconnected nodes of the previous owner document. Finally, the node is now added to the disconnected nodes of the new document.
We can now formally prove important properties of \(\text{adopt_node}\) such as \[ \begin{array}{l} \color{blue}{\textbf{lemma}}~\text{adopt_node_children_remain_distinct}:\\ ~~\textbf{assumes}~\text{wellformed}:~\text{heap_is_wellformed}~h\\ ~~~~\textbf{and}~\text{parent_known}:~\And~parent.\\ ~~~~~~h~\vdash~\text{get_parent}~node\_ptr~\rightarrow_r~\text{Some}~parent\\ ~~~~~~\Longrightarrow~\text{is_known_ptr}_\text{Core_DOM}~parent\\ ~~~~\textbf{and}~\text{adopt_node}:~h~\vdash~\text{adopt_node}~\text{owner_document}~node_ptr~\rightarrow_h~h2\\ ~~~~\textbf{and}~\text{ptr_known}:~\text{is_known_ptr}_\text{Core_DOM}~ptr\\ ~~~~\textbf{and}~\text{children}:~h2~\vdash~\text{get_child_nodes}~ptr~\rightarrow_r~children\\ ~~\color{blue}{\textbf{shows}}~\text{distinct}~children\\ \end{array} \] This lemma states that after using \(\text{adopt_node}\), all children lists remain distinct.
Our machine-checked formalization of the DOM node tree has the following properties:
- It provides a consistency guarantee. Since all definitions in our formal semantics are conservative and all rules are derived, the logical consistency of the DOM node-tree is reduced to the consistency of HOL.
- It serves as a *technical basis for a proof system. Based on the derived rules and specific setup of proof tactics over node-trees, our formalization provides a generic proof environment for the verification of programs manipulating node-trees.
- It is executable*, which allows to validate its compliance to the standard by evaluating the compliance test suite on the formal model, and
- It is extensible, i.e, properties proven over the core DOM do not need to be re-proven for object-oriented extensions such as the HTML document model.
For more details, see our WWW paper [1]. The formalization is available in the Archive of Formal Proofs (AFP) [2].