Best Practices

While there are plenty of ways to interact with LibCST, we recommend some patterns over others. Various best practices are laid out here along with their justifications.

Avoid isinstance when traversing

Excessive use of isinstance implies that you should rewrite your check as a matcher or unroll it into a set of visitor methods. Often, you should make use of ensure_type() to make your type checker aware of a node’s type.

In a transform, matchers are usually easier to write than explicit instance checks: matching against a pattern and then extracting a value from a node’s child tends to be far more readable. Unfortunately, this clashes with type checkers, which do not understand that matches() guarantees a particular set of children. Instead of adding instance checks to satisfy them, use ensure_type(), which can be inlined and nested.

For example, if you have written the following:

from typing import Optional

import libcst as cst
import libcst.matchers as m

def get_identifier_name(node: cst.CSTNode) -> Optional[str]:
    if m.matches(node, m.Name()):
        assert isinstance(node, cst.Name)
        return node.value
    return None

You could instead write something like:

def get_identifier_name(node: cst.CSTNode) -> Optional[str]:
    return (
        cst.ensure_type(node, cst.Name).value
        if m.matches(node, m.Name())
        else None
    )

If you find yourself attempting to manually traverse a tree using isinstance, you can often rewrite your code using visitor methods instead. Nested instance checks can often be unrolled into visitor methods along with matcher decorators. This may entail adding additional state to your visitor, but the resulting code is far more likely to work after changes to LibCST itself. For example, if you have written the following:

class CountBazFoobarArgs(cst.CSTVisitor):
    """
    Given a set of function names, count how many arguments to those function
    calls are the identifiers "baz" or "foobar".
    """

    def __init__(self, functions: Set[str]) -> None:
        super().__init__()
        self.functions: Set[str] = functions
        self.arg_count: int = 0

    def visit_Call(self, node: cst.Call) -> None:
        # See if the call itself is one of our functions we care about
        if isinstance(node.func, cst.Name) and node.func.value in self.functions:
            # Loop through each argument
            for arg in node.args:
                # See if the argument is an identifier matching what we want to count
                if isinstance(arg.value, cst.Name) and arg.value.value in {"baz", "foobar"}:
                    self.arg_count += 1

You could instead write something like:

class CountBazFoobarArgs(m.MatcherDecoratableVisitor):
    """
    Given a set of function names, count how many arguments to those function
    calls are the identifiers "baz" or "foobar".
    """

    def __init__(self, functions: Set[str]) -> None:
        super().__init__()
        self.functions: Set[str] = functions
        self.arg_count: int = 0
        self.call_stack: List[Optional[str]] = []

    def visit_Call(self, node: cst.Call) -> None:
        # Push every call onto the stack, recording None when the callee is
        # not a plain name, so that pushes and pops always stay balanced
        self.call_stack.append(
            cst.ensure_type(node.func, cst.Name).value
            if m.matches(node.func, m.Name())
            else None
        )

    def leave_Call(self, original_node: cst.Call) -> None:
        # Pop the call we are leaving off the stack
        self.call_stack.pop()

    @m.visit(m.Arg(m.Name("baz") | m.Name("foobar")))
    def _count_args(self, node: cst.Arg) -> None:
        # See if the innermost enclosing call is one we're interested in, so we
        # can count the args we care about only in calls we care about. The
        # stack is empty when the Arg is not inside a call (e.g. a class base).
        if self.call_stack and self.call_stack[-1] in self.functions:
            self.arg_count += 1

While there is more code than the previous example, it is arguably easier to understand and maintain each part of the code. It is also more resilient to future changes to LibCST which change the tree shape. Note that LibCST is traversing the tree completely in both cases, so while the first appears to be faster, it is actually doing the same amount of work as the second.

Prefer updated_node when modifying trees

When you are using CSTTransformer to modify a LibCST tree, only return modifications to updated_node. The original_node parameter on any leave_<Node> method is provided for book-keeping and is guaranteed to be equal via == and is checks to the node parameter in the corresponding visit_<Node> method. Remember that LibCST trees are immutable, so the only way to make a modification is to return a new tree. Hence, by the time we get to calling leave_<Node> methods, we have an updated tree whose children have been modified. Therefore, you should only return original_node when you want to explicitly discard changes performed on the node’s children.

Say you wanted to rename every call whose callee is a plain name, such as a call to a global function. You might write the following:

class FunctionRenamer(cst.CSTTransformer):
    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        if m.matches(original_node.func, m.Name()):
            return original_node.with_changes(
                func=cst.Name(
                    "renamed_" + cst.ensure_type(original_node.func, cst.Name).value
                )
            )
        return original_node

Consider writing instead:

class FunctionRenamer(cst.CSTTransformer):
    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        if m.matches(updated_node.func, m.Name()):
            return updated_node.with_changes(
                func=cst.Name(
                    "renamed_" + cst.ensure_type(updated_node.func, cst.Name).value
                )
            )
        return updated_node

The version that returns modifications to original_node has a subtle bug. Consider the following code snippet:

some_func(1, 2, other_func(3))

Running the recommended transform will return us a new code snippet that looks like this:

renamed_some_func(1, 2, renamed_other_func(3))

However, running the version which modifies original_node will instead return:

renamed_some_func(1, 2, other_func(3))

That’s because the updated_node tree contains the modification to other_func. By returning modifications to original_node instead of updated_node, we accidentally discarded all the work done deeper in the tree.

Provide a config when generating code from templates

When generating complex trees it is often far easier to pass a string to parse_statement() or parse_expression() than it is to manually construct the tree. When using these functions to generate code, you should always use the config parameter in order to generate code that matches the defaults of the module you are modifying. The Module class even has a helper attribute config_for_parsing to make it easy to use. This ensures that line endings and indentation are consistent with the defaults in the module you are adding the code to.

For example, to add a print statement to the end of a module:

module = cst.parse_module(some_code_string)
new_module = module.with_changes(
    body=(
        *module.body,
        cst.parse_statement(
            "print('Hello, world!')",
            config=module.config_for_parsing,
        ),
    ),
)
new_code_string = new_module.code

Leaving out the config parameter means that LibCST will assume some defaults and could result in added code which is formatted differently than the rest of the module it was added to. In the above example, because we used the config from the already-parsed module, the print statement will be added with line endings matching the rest of the module. If we neglect the config parameter, we might accidentally insert a Windows line ending into a Unix file or vice versa, depending on what system we ran the code under.