Interactive online tutorial: Notebook

Parsing and Visiting

LibCST provides helpers to parse a source code string as a concrete syntax tree. To perform static analysis that identifies patterns in the tree, or to modify the tree programmatically, we can use the visitor pattern to traverse it. In this tutorial, we demonstrate a common three-step workflow for building an automated refactoring (codemod) application:

  1. Parse Source Code

  2. Build Visitor or Transformer

  3. Generate Source Code

Parse Source Code

LibCST provides several helpers to parse source code as a concrete syntax tree: parse_module(), parse_expression() and parse_statement() (see Parsing for more detail). The default CSTNode repr pretty-prints the tree so it is easy to read.

[2]:
import libcst as cst

cst.parse_expression("1 + 2")
[2]:
BinaryOperation(
    left=Integer(
        value='1',
        lpar=[],
        rpar=[],
    ),
    operator=Add(
        whitespace_before=SimpleWhitespace(
            value=' ',
        ),
        whitespace_after=SimpleWhitespace(
            value=' ',
        ),
    ),
    right=Integer(
        value='2',
        lpar=[],
        rpar=[],
    ),
    lpar=[],
    rpar=[],
)

Example: add typing annotation from pyi stub file to Python source

Type annotations (the typing module) were added in Python 3.5. Some Python applications keep type annotations in separate pyi stub files in order to support older Python versions. When an application drops support for those older versions, it is useful to automatically copy the type annotations from the pyi file into the source file. Here we demonstrate how to do that easily using LibCST. The first step is to parse the pyi stub and the source file as trees.

[3]:
py_source = '''
class PythonToken(Token):
    def __repr__(self):
        return ('TokenInfo(type=%s, string=%r, start_pos=%r, prefix=%r)' %
                self._replace(type=self.type.name))

def tokenize(code, version_info, start_pos=(1, 0)):
    """Generate tokens from a the source code (string)."""
    lines = split_lines(code, keepends=True)
    return tokenize_lines(lines, version_info, start_pos=start_pos)
'''

pyi_source = '''
class PythonToken(Token):
    def __repr__(self) -> str: ...

def tokenize(
    code: str, version_info: PythonVersionInfo, start_pos: Tuple[int, int] = (1, 0)
) -> Generator[PythonToken, None, None]: ...
'''

source_tree = cst.parse_module(py_source)
stub_tree = cst.parse_module(pyi_source)

Build Visitor or Transformer

For traversing and modifying the tree, LibCST provides Visitor and Transformer classes similar to those in the ast module. To implement a visitor (read-only) or transformer (read/write), subclass CSTVisitor or CSTTransformer (see Visitors for more detail). In the typing example, we need a visitor to collect the type annotations from the stub tree and a transformer to copy those annotations onto the function signatures. In the visitor, we implement visit_FunctionDef to collect annotations. Later, in the transformer, we implement leave_FunctionDef to add the collected annotations.

[4]:
from typing import List, Tuple, Dict, Optional


class TypingCollector(cst.CSTVisitor):
    def __init__(self):
        # stack for storing the canonical name of the current function
        self.stack: List[Tuple[str, ...]] = []
        # store the annotations
        self.annotations: Dict[
            Tuple[str, ...],  # key: tuple of canonical class/function name
            Tuple[cst.Parameters, Optional[cst.Annotation]],  # value: (params, returns)
        ] = {}

    def visit_ClassDef(self, node: cst.ClassDef) -> Optional[bool]:
        self.stack.append(node.name.value)

    def leave_ClassDef(self, node: cst.ClassDef) -> None:
        self.stack.pop()

    def visit_FunctionDef(self, node: cst.FunctionDef) -> Optional[bool]:
        self.stack.append(node.name.value)
        self.annotations[tuple(self.stack)] = (node.params, node.returns)
        # pyi files don't support inner functions; return False to stop the traversal
        return False

    def leave_FunctionDef(self, node: cst.FunctionDef) -> None:
        self.stack.pop()


class TypingTransformer(cst.CSTTransformer):
    def __init__(self, annotations):
        # stack for storing the canonical name of the current function
        self.stack: List[Tuple[str, ...]] = []
        # store the annotations
        self.annotations: Dict[
            Tuple[str, ...],  # key: tuple of canonical class/function name
            Tuple[cst.Parameters, Optional[cst.Annotation]],  # value: (params, returns)
        ] = annotations

    def visit_ClassDef(self, node: cst.ClassDef) -> Optional[bool]:
        self.stack.append(node.name.value)

    def leave_ClassDef(
        self, original_node: cst.ClassDef, updated_node: cst.ClassDef
    ) -> cst.CSTNode:
        self.stack.pop()
        return updated_node

    def visit_FunctionDef(self, node: cst.FunctionDef) -> Optional[bool]:
        self.stack.append(node.name.value)
        # pyi files don't support inner functions; return False to stop the traversal
        return False

    def leave_FunctionDef(
        self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
    ) -> cst.CSTNode:
        key = tuple(self.stack)
        self.stack.pop()
        if key in self.annotations:
            annotations = self.annotations[key]
            return updated_node.with_changes(
                params=annotations[0], returns=annotations[1]
            )
        return updated_node


visitor = TypingCollector()
stub_tree.visit(visitor)
transformer = TypingTransformer(visitor.annotations)
modified_tree = source_tree.visit(transformer)

Generate Source Code

Generating the source code from a CST tree is as easy as accessing the code attribute on Module. After code generation, we often run Black and isort to reformat the code and keep a consistent style.

[5]:
print(modified_tree.code)

class PythonToken(Token):
    def __repr__(self) -> str:
        return ('TokenInfo(type=%s, string=%r, start_pos=%r, prefix=%r)' %
                self._replace(type=self.type.name))

def tokenize(code: str, version_info: PythonVersionInfo, start_pos: Tuple[int, int] = (1, 0)
) -> Generator[PythonToken, None, None]:
    """Generate tokens from a the source code (string)."""
    lines = split_lines(code, keepends=True)
    return tokenize_lines(lines, version_info, start_pos=start_pos)

[6]:
# Use difflib to show the changes to verify type annotations were added as expected.
import difflib

print(
    "".join(
        difflib.unified_diff(py_source.splitlines(1), modified_tree.code.splitlines(1))
    )
)
---
+++
@@ -1,10 +1,11 @@

 class PythonToken(Token):
-    def __repr__(self):
+    def __repr__(self) -> str:
         return ('TokenInfo(type=%s, string=%r, start_pos=%r, prefix=%r)' %
                 self._replace(type=self.type.name))

-def tokenize(code, version_info, start_pos=(1, 0)):
+def tokenize(code: str, version_info: PythonVersionInfo, start_pos: Tuple[int, int] = (1, 0)
+) -> Generator[PythonToken, None, None]:
     """Generate tokens from a the source code (string)."""
     lines = split_lines(code, keepends=True)
     return tokenize_lines(lines, version_info, start_pos=start_pos)

For the sake of efficiency, we don’t want to rewrite the file when the transformer doesn’t change the source code. We can use deep_equals() to check whether two trees represent the same source code. Note that == compares tree objects by identity, not by representation.

[7]:
if not modified_tree.deep_equals(source_tree):
    ...  # write to file