Working with Matchers
Matchers provide a flexible way of comparing LibCST nodes in order to build more complex transforms. See Matchers for the complete documentation.
Basic Matcher Usage
Let’s say you are visiting a LibCST Call node and you want to know if all arguments provided are the literal True or False. You look at the documentation and see that Call.args is a sequence of Arg, and each Arg.value is a BaseExpression. In order to verify that each argument is either True or False you would have to first loop over node.args, and then check isinstance(arg.value, cst.Name) for each arg in the loop before finally checking arg.value.value in ("True", "False").
Here’s a short example of that in action:
[2]:
import libcst as cst

def is_call_with_booleans(node: cst.Call) -> bool:
    for arg in node.args:
        if not isinstance(arg.value, cst.Name):
            # This can't be the literal True/False, so bail early.
            return False
        if cst.ensure_type(arg.value, cst.Name).value not in ("True", "False"):
            # This is a Name node, but not the literal True/False, so bail.
            return False
    # We got here, so all arguments are literal boolean values.
    return True
We can see from a few examples that this does work as intended. However, it is an awful lot of boilerplate that was fairly cumbersome to write.
[3]:
call_1 = cst.Call(
    func=cst.Name("foo"),
    args=(
        cst.Arg(cst.Name("True")),
    ),
)
is_call_with_booleans(call_1)
[3]:
True
[4]:
call_2 = cst.Call(
    func=cst.Name("foo"),
    args=(
        cst.Arg(cst.Name("None")),
    ),
)
is_call_with_booleans(call_2)
[4]:
False
Let’s try to do a bit better with matchers. We can make a better function that takes advantage of matchers to get rid of both the instance check and the ensure_type call, like so:
[5]:
import libcst.matchers as m

def better_is_call_with_booleans(node: cst.Call) -> bool:
    for arg in node.args:
        if not m.matches(arg.value, m.Name("True") | m.Name("False")):
            # Oops, this isn't a True/False literal!
            return False
    # We got here, so all arguments are literal boolean values.
    return True
This is a lot shorter and is easier to read as well! We made use of the fact that matchers handle instance checking for us in a safe way: matching a node of the wrong type simply returns False. We also made use of the fact that matchers allow us to concisely express multiple match options by combining them with Python’s | operator. We can also see that it still works on our previous examples:
[6]:
better_is_call_with_booleans(call_1)
[6]:
True
[7]:
better_is_call_with_booleans(call_2)
[7]:
False
We still have one more trick up our sleeve though. Matchers don’t just allow us to specify which attributes we want to match on exactly. They also allow us to specify rules for matching sequences of nodes, like the list of Arg nodes that appears in Call. Let’s make use of that, turning our original is_call_with_booleans function into a call to matches():
[8]:
def best_is_call_with_booleans(node: cst.Call) -> bool:
    return m.matches(
        node,
        m.Call(
            args=(
                m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
            ),
        ),
    )
We’ve turned our original function into a single call to matches(). As an added benefit, the match node can be read from left to right in a way that makes sense in English: “match any call with zero or more arguments that are the literal True or False”. As we can see, it works as intended:
[9]:
best_is_call_with_booleans(call_1)
[9]:
True
[10]:
best_is_call_with_booleans(call_2)
[10]:
False
Matcher Decorators
You can already do a lot with just matches(). It lets you define the shape of nodes you want to match and LibCST takes care of the rest. However, you still need to include a lot of boilerplate in your visitors in order to identify which nodes you care about. Matcher decorators help reduce that boilerplate.
Say you wanted to invert the boolean literals in function calls which match the above best_is_call_with_booleans. You could build something that looks like the following:
[11]:
class BoolInverter(cst.CSTTransformer):
    def __init__(self) -> None:
        self.in_call: int = 0

    def visit_Call(self, node: cst.Call) -> None:
        if m.matches(node, m.Call(args=(
            m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
        ))):
            self.in_call += 1

    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.Call:
        if m.matches(original_node, m.Call(args=(
            m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
        ))):
            self.in_call -= 1
        return updated_node

    def leave_Name(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        if self.in_call > 0:
            if updated_node.value == "True":
                return updated_node.with_changes(value="False")
            if updated_node.value == "False":
                return updated_node.with_changes(value="True")
        return updated_node
We can try it out on a source file to see that it works:
[12]:
source = "def some_func(*params: object) -> None:\n    pass\n\nsome_func(True, False)\nsome_func(1, 2, 3)\nsome_func()\n"
module = cst.parse_module(source)
print(source)
def some_func(*params: object) -> None:
    pass

some_func(True, False)
some_func(1, 2, 3)
some_func()
[13]:
new_module = module.visit(BoolInverter())
print(new_module.code)
def some_func(*params: object) -> None:
    pass

some_func(False, True)
some_func(1, 2, 3)
some_func()
While this works, it’s not super elegant. We have to track where we are in the tree so we know when it’s safe to invert boolean literals, which means we have to create a constructor and duplicate the matching logic. We could refactor that into a helper like the best_is_call_with_booleans above, but it only makes things so much better.
So, let’s try rewriting it with matcher decorators instead. Note that this includes changing the class we inherit from to MatcherDecoratableTransformer in order to enable the matcher decorator feature:
[14]:
class BetterBoolInverter(m.MatcherDecoratableTransformer):
    @m.call_if_inside(m.Call(args=(
        m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
    )))
    def leave_Name(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        if updated_node.value == "True":
            return updated_node.with_changes(value="False")
        if updated_node.value == "False":
            return updated_node.with_changes(value="True")
        return updated_node
[15]:
new_module = module.visit(BetterBoolInverter())
print(new_module.code)
def some_func(*params: object) -> None:
    pass

some_func(False, True)
some_func(1, 2, 3)
some_func()
Using matcher decorators we successfully removed all of the boilerplate around state tracking! The only thing that leave_Name needs to concern itself with is the actual business logic of the transform. However, it still needs to check whether the value of the node should be inverted. This is because leave_Name runs for every Name inside a matching call, and the Call.func is itself a Name in this case. Let’s use another matcher decorator to make that problem go away:
[16]:
class BestBoolInverter(m.MatcherDecoratableTransformer):
    @m.call_if_inside(m.Call(args=(
        m.ZeroOrMore(m.Arg(m.Name("True") | m.Name("False"))),
    )))
    @m.leave(m.Name("True") | m.Name("False"))
    def invert_bool_literal(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        return updated_node.with_changes(value="False" if updated_node.value == "True" else "True")
[17]:
new_module = module.visit(BestBoolInverter())
print(new_module.code)
def some_func(*params: object) -> None:
    pass

some_func(False, True)
some_func(1, 2, 3)
some_func()
That’s it! Instead of using a leave_Name which modifies all Name nodes, we created a matcher visitor that only modifies Name nodes with the value of True or False. We decorate that with call_if_inside() to ensure we only run this on Name nodes found inside of function calls that only take boolean literals. Using two matcher decorators we got rid of all of the state management as well as all of the cases where we needed to handle nodes we weren’t interested in.