
Working with Matchers

Matchers provide a flexible way of comparing LibCST nodes in order to build more complex transforms. See Matchers for the complete documentation.

Basic Matcher Usage

Let’s say you are visiting a LibCST Call node and you want to know if all arguments provided are the literal True or False. You look at the documentation and see that Call.args is a sequence of Arg, and each Arg.value is a BaseExpression. In order to verify that each argument is either True or False you would have to first loop over node.args, and then check isinstance(arg.value, cst.Name) for each arg in the loop before finally checking arg.value.value in ("True", "False").

Here’s a short example of that in action:

[2]:
import libcst as cst

def is_call_with_booleans(node: cst.Call) -> bool:
    for arg in node.args:
        if not isinstance(arg.value, cst.Name):
            # This can't be the literal True/False, so bail early.
            return False
        if cst.ensure_type(arg.value, cst.Name).value not in ("True", "False"):
            # This is a Name node, but not the literal True/False, so bail.
            return False
    # We got here, so all arguments are literal boolean values.
    return True

We can see from a few examples that this does work as intended. However, it is an awful lot of boilerplate that was fairly cumbersome to write.

[3]:
call_1 = cst.Call(
    func=cst.Name("foo"),
    args=(
        cst.Arg(cst.Name("True")),
    ),
)
is_call_with_booleans(call_1)

[3]:
True
[4]:
call_2 = cst.Call(
    func=cst.Name("foo"),
    args=(
        cst.Arg(cst.Name("None")),
    ),
)
is_call_with_booleans(call_2)

[4]:
False

Let’s try to do a bit better with matchers. Using them, we can get rid of both the isinstance check and the ensure_type call, like so:

[5]:
import libcst.matchers as m

def better_is_call_with_booleans(node: cst.Call) -> bool:
    for arg in node.args:
        if not m.matches(arg.value, m.Name("True") | m.Name("False")):
            # Oops, this isn't a True/False literal!
            return False
    # We got here, so all arguments are literal boolean values.
    return True

This is a lot shorter and easier to read as well! We made use of the fact that matchers handle instance checking for us in a safe way: a node of the wrong type simply fails to match. We also made use of the fact that matchers let us concisely express multiple match options with Python’s | (bitwise or) operator. We can also see that it still works on our previous examples:

[6]:
better_is_call_with_booleans(call_1)

[6]:
True
[7]:
better_is_call_with_booleans(call_2)

[7]:
False

We still have one more trick up our sleeve, though. Matchers don’t just let us specify which attributes we want to match on exactly. They also let us specify rules for matching sequences of nodes, like the sequence of Arg nodes that appears in Call.args. Let’s make use of that, turning our original is_call_with_booleans function into a single call to matches():

[8]:
def best_is_call_with_booleans(node: cst.Call) -> bool:
    return m.matches(
        node,
        m.Call(
            args=(
                m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
            ),
        ),
    )

We’ve turned our original function into a single call to matches(). As an added benefit, the matcher can be read from left to right in a way that makes sense in English: “match any call with zero or more arguments that are the literal True or False”. As we can see, it works as intended:

[9]:
best_is_call_with_booleans(call_1)

[9]:
True
[10]:
best_is_call_with_booleans(call_2)

[10]:
False

Matcher Decorators

You can already do a lot with just matches(). It lets you define the shape of the nodes you want to match, and LibCST takes care of the rest. However, you still need to include a lot of boilerplate in your visitors in order to identify which nodes you care about. Matcher Decorators help reduce that boilerplate.

Say you wanted to invert the boolean literals in function calls that match the above best_is_call_with_booleans. You could build something like the following:

[11]:
class BoolInverter(cst.CSTTransformer):
    def __init__(self) -> None:
        super().__init__()
        # How many matching calls we are currently nested inside.
        self.in_call: int = 0

    def visit_Call(self, node: cst.Call) -> None:
        if m.matches(node, m.Call(args=(
            m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
        ))):
            self.in_call += 1

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        if m.matches(original_node, m.Call(args=(
            m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
        ))):
            self.in_call -= 1
        return updated_node

    def leave_Name(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        if self.in_call > 0:
            if updated_node.value == "True":
                return updated_node.with_changes(value="False")
            if updated_node.value == "False":
                return updated_node.with_changes(value="True")
        return updated_node

We can try it out on a source file to see that it works:

[12]:
source = "def some_func(*params: object) -> None:\n    pass\n\nsome_func(True, False)\nsome_func(1, 2, 3)\nsome_func()\n"
module = cst.parse_module(source)
print(source)

def some_func(*params: object) -> None:
    pass

some_func(True, False)
some_func(1, 2, 3)
some_func()

[13]:
new_module = module.visit(BoolInverter())
print(new_module.code)

def some_func(*params: object) -> None:
    pass

some_func(False, True)
some_func(1, 2, 3)
some_func()

While this works, it’s not very elegant. We have to track where we are in the tree so that we know when it’s safe to invert boolean literals, which means we need a constructor to set up that state, and we have to duplicate the matching logic in visit_Call and leave_Call. We could refactor that logic into a helper like best_is_call_with_booleans above, but that only helps so much: the state tracking remains.

So, let’s try rewriting it with matcher decorators instead. Note that this includes changing the class we inherit from to MatcherDecoratableTransformer in order to enable the matcher decorator feature:

[14]:
class BetterBoolInverter(m.MatcherDecoratableTransformer):
    @m.call_if_inside(m.Call(args=(
        m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
    )))
    def leave_Name(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        if updated_node.value == "True":
            return updated_node.with_changes(value="False")
        if updated_node.value == "False":
            return updated_node.with_changes(value="True")
        return updated_node

[15]:
new_module = module.visit(BetterBoolInverter())
print(new_module.code)

def some_func(*params: object) -> None:
    pass

some_func(False, True)
some_func(1, 2, 3)
some_func()

Using matcher decorators, we successfully removed all of the boilerplate around state tracking! The only thing that leave_Name needs to concern itself with is the actual business logic of the transform. However, it still needs to check whether the node it received is a True or False literal, because leave_Name is called for every Name inside a matching call, including the Name in Call.func (some_func in this case). Let’s use another matcher decorator to make that problem go away:

[16]:
class BestBoolInverter(m.MatcherDecoratableTransformer):
    @m.call_if_inside(m.Call(args=(
        m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
    )))
    @m.leave(m.Name("True") | m.Name("False"))
    def invert_bool_literal(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        return updated_node.with_changes(value="False" if updated_node.value == "True" else "True")

[17]:
new_module = module.visit(BestBoolInverter())
print(new_module.code)

def some_func(*params: object) -> None:
    pass

some_func(False, True)
some_func(1, 2, 3)
some_func()

That’s it! Instead of using a leave_Name that is called for every Name node, we created a function decorated with leave() that is only called for Name nodes with the value True or False. We also decorate it with call_if_inside() to ensure that it only runs on Name nodes found inside function calls that take only boolean literals. Using two matcher decorators, we got rid of all of the state management as well as all of the cases where we needed to handle nodes we weren’t interested in.