Metadata

Metadata APIs

LibCST ships with a metadata interface that defines a standardized way to associate nodes in a CST with arbitrary metadata while maintaining the immutability of the tree. The metadata interface is designed to be declarative and type safe. Here’s a quick example of using the metadata interface to get line and column numbers of nodes through the PositionProvider:

import libcst as cst
from libcst.metadata import MetadataWrapper, PositionProvider

class NamePrinter(cst.CSTVisitor):
    # Declare the providers whose metadata this visitor reads.
    METADATA_DEPENDENCIES = (PositionProvider,)

    def visit_Name(self, node: cst.Name) -> None:
        pos = self.get_metadata(PositionProvider, node).start
        print(f"{node.value} found at line {pos.line}, column {pos.column}")

wrapper = MetadataWrapper(cst.parse_module("x = 1"))
result = wrapper.visit(NamePrinter())  # should print "x found at line 1, column 0"

More examples of using the metadata interface can be found in the Metadata Tutorial.

Accessing Metadata

To work with metadata you need to wrap a module with a MetadataWrapper. The wrapper provides the resolve() and resolve_many() methods to generate metadata.

class libcst.metadata.MetadataWrapper

A wrapper around a Module that stores associated metadata for that module.

When a MetadataWrapper is constructed over a module, the wrapper will store a deep copy of the original module. This means MetadataWrapper(module).module == module is False.

This copying operation ensures that a node will never appear twice (by identity) in the same tree. This allows us to uniquely look up metadata for a node based on a node’s identity.

__init__(module: Module, unsafe_skip_copy: bool = False, cache: Mapping[ProviderT, object] = {}) → None
Parameters
  • module – The module to wrap. This is deeply copied by default.

  • unsafe_skip_copy – When true, this skips the deep cloning of the module. This can provide a small performance benefit, but you should only use this if you know that there are no duplicate nodes in your tree (e.g. this module came from the parser).

  • cache – The cache the wrapper should use when resolving metadata, for providers that need one (for example, the mapping returned by FullRepoManager.get_cache_for_path()).

property module

The module that’s wrapped by this MetadataWrapper. By default, this is a deep copy of the passed-in module.

mw = MetadataWrapper(module)
# Because `mw.module is not module`, you probably want to visit and do
# your analysis on `mw.module`, not `module`.
mw.module.visit(DoSomeAnalysisVisitor())

resolve(provider: Type[BaseMetadataProvider[_T]]) → Mapping[CSTNode, _T]

Returns a copy of the metadata mapping computed by provider.

resolve_many(providers: Collection[ProviderT]) → Mapping[ProviderT, Mapping[CSTNode, object]]

Returns a copy of the map from each provider in providers to the metadata mapping computed by that provider.

The returned map does not contain any metadata from undeclared metadata dependencies that the providers have.

visit(visitor: CSTVisitorT) → Module

Convenience method to resolve metadata before performing a traversal over self.module with visitor. See Module.visit().

visit_batched(visitors: Iterable[libcst._batched_visitor.BatchableCSTVisitor], before_visit: Optional[Callable[[CSTNode], None]] = None, after_leave: Optional[Callable[[CSTNode], None]] = None) → CSTNode

Convenience method to resolve metadata before performing a traversal over self.module with visitors. See libcst.visit_batched().

If you’re working with visitors, which extend MetadataDependent, metadata dependencies will be automatically computed when visited by a MetadataWrapper and are accessible through get_metadata().

class libcst.MetadataDependent

The low-level base class for all classes that declare required metadata dependencies. CSTVisitor and CSTTransformer extend this class.

METADATA_DEPENDENCIES = ()

The set of metadata dependencies declared by this class.

metadata

A cached copy of metadata computed by resolve(). Prefer using get_metadata() over accessing this attribute directly.

classmethod get_inherited_dependencies() → Collection[ProviderT]

Returns all metadata dependencies declared by classes in the MRO of cls that subclass this class.

Recursively searches the MRO of the subclass for metadata dependencies.

resolve(wrapper: MetadataWrapper) → Iterator[None]

Context manager that resolves all metadata dependencies declared by self (using get_inherited_dependencies()) on wrapper and caches it on self for use with get_metadata().

Upon exiting this context manager, the metadata cache on self is cleared.

get_metadata(key: Type[BaseMetadataProvider[_T]], node: CSTNode, default: _T = <object object>) → _T

Returns the metadata provided by the key if it is accessible from this visitor. Metadata is accessible in a subclass of this class if key is declared as a dependency by any class in the MRO of this class.

Providing Metadata

Metadata is generated through provider classes that can be passed to MetadataWrapper.resolve() or declared as a dependency of a MetadataDependent. These providers are then resolved automatically using methods provided by MetadataWrapper.

In most cases, you should extend BatchableMetadataProvider when writing a provider, unless you have a particular reason not to use a batchable visitor. Only extend from BaseMetadataProvider if your provider does not use the visitor pattern for computing metadata for a tree.

class libcst.BaseMetadataProvider

The low-level base class for all metadata providers. This class should be extended for metadata providers that are not visitor-based.

This class is generic. A subclass of BaseMetadataProvider[T] will provide metadata of type T.

gen_cache

Implement gen_cache to indicate that the metadata provider depends on a cache from an external system. This function will be called by FullRepoManager to compute the required cache object per file path.

set_metadata(node: CSTNode, value: _ProvidedMetadataT) → None

Records value as the metadata for node.

get_metadata(key: Type[BaseMetadataProvider[_MetadataT]], node: CSTNode, default: _T = <object object>) → _T

The same method as MetadataDependent.get_metadata(), except that metadata is accessed from self._computed in addition to self.metadata. See MetadataDependent.get_metadata().

class libcst.metadata.BatchableMetadataProvider

The low-level base class for all batchable visitor-based metadata providers. Batchable providers should be preferred when possible as they are more efficient to run compared to non-batchable visitor-based providers. Inherits from BatchableCSTVisitor.

This class is generic. A subclass of BatchableMetadataProvider[T] will provide metadata of type T.

class libcst.metadata.VisitorMetadataProvider

The low-level base class for all non-batchable visitor-based metadata providers. Inherits from CSTVisitor.

This class is generic. A subclass of VisitorMetadataProvider[T] will provide metadata of type T.

Metadata Providers

PositionProvider, ByteSpanPositionProvider, WhitespaceInclusivePositionProvider, ExpressionContextProvider, ScopeProvider, QualifiedNameProvider, ParentNodeProvider, and TypeInferenceProvider are currently provided. Each metadata provider may have its own custom data structure.

Position Metadata

There are two types of position metadata available. They both track the same position concept, but differ in terms of representation. One represents position with line and column numbers, while the other outputs byte offset and length pairs.

Line and column numbers are available through the metadata interface by declaring one of PositionProvider or WhitespaceInclusivePositionProvider. For most cases, PositionProvider is what you probably want.

Node positions are represented with CodeRange objects. See the above example.

class libcst.metadata.PositionProvider

Generates line and column metadata.

These positions are defined by the start and ending bounds of a node ignoring most instances of leading and trailing whitespace when it is not syntactically significant.

The positions provided by this provider should eventually match the positions used by Pyre for equivalent nodes.

class libcst.metadata.WhitespaceInclusivePositionProvider

Generates line and column metadata.

The start and ending bounds of the positions produced by this provider include all whitespace owned by the node.

class libcst.metadata.CodeRange
start : libcst._position.CodePosition

Starting position of a node (inclusive).

end : libcst._position.CodePosition

Ending position of a node (exclusive).

class libcst.metadata.CodePosition
line : int

Line numbers are 1-indexed.

column : int

Column numbers are 0-indexed.

Byte offset and length pairs can be accessed using ByteSpanPositionProvider. This provider represents positions using CodeSpan, which will contain the byte offsets of a CSTNode from the start of the file, and its length (also in bytes).

class libcst.metadata.ByteSpanPositionProvider

Generates offset and length metadata for nodes’ positions.

For each CSTNode this provider generates a CodeSpan that contains the byte-offset of the node from the start of the file, and its length (also in bytes). The whitespace owned by the node is not included in this length.

Note: offset and length measure bytes, not characters (which is significant, for example, in the case of Unicode characters encoded in more than one byte).

class libcst.metadata.CodeSpan

Represents the position of a piece of code by its starting position and length.

Note: This class does not specify the unit of distance - it can be bytes, Unicode characters, or something else entirely.

start

Offset of the code from the beginning of the file. Can be 0.

length

Length of the span.

Expression Context Metadata

class libcst.metadata.ExpressionContextProvider

Provides ExpressionContext metadata (mimics the expr_context in ast) for the following node types: Attribute, Subscript, StarredElement, List, Tuple and Name. Note that a Name may not always have a context because of the differences between ast and LibCST. E.g. attr is a Name in LibCST but a str in ast. To honor the ast implementation, we don’t assign a context to attr.

Three context types ExpressionContext.STORE, ExpressionContext.LOAD and ExpressionContext.DEL are provided.

class libcst.metadata.ExpressionContext

Used in ExpressionContextProvider to represent context of a variable reference.

LOAD = 1

Load the value of a variable reference.

>>> libcst.MetadataWrapper(libcst.parse_module("a")).resolve(libcst.metadata.ExpressionContextProvider)
mappingproxy({Name(
                  value='a',
                  lpar=[],
                  rpar=[],
              ): <ExpressionContext.LOAD: 1>})

STORE = 2

Store a value to a variable reference by Assign (=), AugAssign (e.g. +=, -=, etc), or AnnAssign.

>>> libcst.MetadataWrapper(libcst.parse_module("a = b")).resolve(libcst.metadata.ExpressionContextProvider)
mappingproxy({Name(
              value='a',
              lpar=[],
              rpar=[],
          ): <ExpressionContext.STORE: 2>, Name(
              value='b',
              lpar=[],
              rpar=[],
          ): <ExpressionContext.LOAD: 1>})

DEL = 3

Delete the value of a variable reference by del.

>>> libcst.MetadataWrapper(libcst.parse_module("del a")).resolve(libcst.metadata.ExpressionContextProvider)
mappingproxy({Name(
                  value='a',
                  lpar=[],
                  rpar=[],
              ): <ExpressionContext.DEL: 3>})

Scope Metadata

Scopes contain and separate variables from each other. Scopes enforce that a local variable name bound inside of a function is not available outside of that function.

While many programming languages are “block-scoped”, Python is function-scoped. New scopes are created for classes, functions, and comprehensions. Other block constructs like conditional statements, loops, and try…except don’t create their own scope.

There are five different types of scope in Python: BuiltinScope, GlobalScope, ClassScope, FunctionScope, and ComprehensionScope.

Figure: diagram showing how the five scopes above are nested in each other.

LibCST allows you to inspect these scopes to see what local variables are assigned or accessed within.

Note

Import statements bring new symbols into scope that are declared in other files. As such, they are represented by Assignment for scope analysis purposes. Dotted imports (e.g. import a.b.c) generate multiple Assignment objects — one for each module. When analyzing references, only the most specific access is recorded.

For example, the above import a.b.c statement generates three Assignment objects: one for a, one for a.b, and one for a.b.c. A reference for a.b.c records an access only for the last assignment, while a reference for a.d only records an access for the Assignment representing a.

class libcst.metadata.ScopeProvider

ScopeProvider traverses the entire module and creates the scope inheritance structure. It provides the scope of name assignments and accesses. It is useful for more advanced static analysis. E.g. given a FunctionDef node, we can check the type of its Scope to figure out whether it is a method defined in a class (ClassScope) or a regular module-level function (GlobalScope).

Scope metadata is available for most node types other than formatting information nodes (whitespace, parentheses, etc.).

class libcst.metadata.BaseAssignment

Abstract base class of Assignment and BuiltinAssignment.

name

The name of the assignment.

scope

The scope associated with the assignment.

property references

Return all accesses of the assignment.

class libcst.metadata.Access

An Access records an access of an assignment.

Note

This scope analysis only analyzes access via a Name or a Name node embedded in another node like Call or Attribute. It doesn’t support type annotations using SimpleString literals for forward references. E.g. in this example, the "Tree" isn’t parsed as an access:

class Tree:
    def __new__(cls) -> "Tree":
        ...

node : typing.Union[libcst._nodes.expression.Name, libcst._nodes.expression.Attribute, libcst._nodes.expression.BaseString]

The node of the access. A name is an access when the expression context is ExpressionContext.LOAD. This is usually the name node representing the access, except for: 1) dotted imports, when it might be the attribute that represents the most specific part of the imported symbol; and 2) string annotations, when it is the entire string literal.

scope : 'Scope'

The scope of the access. Note that an access could be in a child scope of its assignment.

is_annotation : bool

is_type_hint : bool

property referents

Return all assignments of the access.

record_assignment(assignment: libcst.metadata.scope_provider.BaseAssignment) → None

record_assignments(name: str) → None

class libcst.metadata.Assignment

An assignment records the name, CSTNode and its accesses.

node

The node of the assignment. It could be an Import, ImportFrom, Name, FunctionDef, or ClassDef.

class libcst.metadata.BuiltinAssignment

A BuiltinAssignment represents a value provided by Python as a builtin, including functions, constants, and types.

class libcst.metadata.Scope

Base class of all scope classes. A Scope object stores assignments from imports, variable assignments, function definitions, and class definitions. A scope has a parent scope which represents the inheritance relationship. That means an assignment in the parent scope is viewable from the child scope, and the child scope may override the assignment by using the same name.

Use name in scope to check whether a name is viewable in the scope. Use scope[name] to retrieve all viewable assignments in the scope.

Note

This scope analysis module only analyzes local variable names and doesn’t handle attribute names; for example, given a.b.c = 1, the local variable name a is recorded as an assignment instead of c or a.b.c. To analyze the assignment/access of arbitrary object attributes, we leave the job to a type inference metadata provider coming in the future.

parent

Parent scope. Note the parent scope of a GlobalScope is itself.

globals

Refers to the GlobalScope.

abstract __contains__(name: str) → bool

Check whether the name (a str) exists in the current scope, using name in scope.

abstract __getitem__(name: str) → Set[libcst.metadata.scope_provider.BaseAssignment]

Get the assignments for a given name (a str), using scope[name].

Note

Why does it return a set of assignments given a name instead of just one assignment?

Many programming languages differentiate variable declaration and assignment. Further, those programming languages often disallow duplicate declarations within the same scope, and will often hoist the declaration (without its assignment) to the top of the scope. These design decisions make static analysis much easier, because it’s possible to match a name against its single declaration for a given scope.

As an example, the following code would be valid in JavaScript:

function fn() {
  console.log(value);  // value is defined here, because the declaration is hoisted, but is currently 'undefined'.
  var value = 5;  // A function-scoped declaration.
}
fn();  // prints 'undefined'.

In contrast, Python’s declaration and assignment are identical and are not hoisted:

if conditional_value:
    value = 5
elif other_conditional_value:
    value = 10
print(value)  # possibly valid, depending on conditional execution

This code may raise a NameError if both conditional values are falsy. It also means that depending on the codepath taken, the original declaration could come from either value = ... assignment node. As a result, instead of returning a single declaration, we’re forced to return a collection of all of the assignments we think could have defined a given name by the time a piece of code is executed. For the above example, value would resolve to a set of both assignments.

get_qualified_names_for(node: Union[str, libcst._nodes.base.CSTNode]) → Collection[libcst.metadata.scope_provider.QualifiedName]

Get all QualifiedName objects in the current scope for a given CSTNode. The source of a qualified name can be either QualifiedNameSource.IMPORT, QualifiedNameSource.BUILTIN or QualifiedNameSource.LOCAL. Given the following example, c has qualified name a.b.c with source IMPORT, f has qualified name Cls.f with source LOCAL, a has qualified name Cls.f.<locals>.a, i has qualified name Cls.f.<locals>.<comprehension>.i, and the builtin int has qualified name builtins.int with source BUILTIN:

from a.b import c
class Cls:
    def f(self) -> "c":
        c()
        a = int("1")
        [i for i in c()]

We extend PEP-3155 (which defines __qualname__ for classes and functions only; a function namespace is followed by <locals>) to provide qualified names for all CSTNodes recorded by Assignment and Access. The namespace of a comprehension (ListComp, SetComp, DictComp) is represented with <comprehension>.

An imported name may be used in a type annotation written as a SimpleString. Resolving the qualified name of such a SimpleString is currently not supported, because the string could contain a complex type annotation that is hard to resolve, e.g. List[Union[int, str]].

property assignments

Return an Assignments object containing all assignments in the current scope.

property accesses

Return an Accesses object containing all accesses in the current scope.

class libcst.metadata.BuiltinScope

A BuiltinScope represents Python builtin declarations. See https://docs.python.org/3/library/builtins.html

class libcst.metadata.GlobalScope

A GlobalScope is the scope of a module. All module-level assignments are recorded in GlobalScope.

class libcst.metadata.FunctionScope

When a function is defined, it creates a FunctionScope.

class libcst.metadata.ClassScope

When a class is defined, it creates a ClassScope.

class libcst.metadata.ComprehensionScope

Comprehensions and generator expressions create their own scope. For example, in

[i for i in range(10)]

the variable i is only viewable within the ComprehensionScope.

class libcst.metadata.Assignments

A container to provide all assignments in a scope.

__iter__() → Iterator[libcst.metadata.scope_provider.BaseAssignment]

Iterate through all assignments with for i in scope.assignments.

__getitem__(node: Union[str, libcst._nodes.base.CSTNode]) → Collection[libcst.metadata.scope_provider.BaseAssignment]

Get the assignments for a given name (a str) or CSTNode with scope.assignments[node].

__contains__(node: Union[str, libcst._nodes.base.CSTNode]) → bool

Check whether a name (a str) or CSTNode has any assignment with node in scope.assignments.

class libcst.metadata.Accesses

A container to provide all accesses in a scope.

__iter__() → Iterator[libcst.metadata.scope_provider.Access]

Iterate through all accesses with for i in scope.accesses.

__getitem__(node: Union[str, libcst._nodes.base.CSTNode]) → Collection[libcst.metadata.scope_provider.Access]

Get the accesses for a given name (a str) or CSTNode with scope.accesses[node].

__contains__(node: Union[str, libcst._nodes.base.CSTNode]) → bool

Check whether a name (a str) or CSTNode has any access with node in scope.accesses.

Qualified Name Metadata

A qualified name provides an unambiguous name for locating the definition of a variable; it was introduced for classes and functions in PEP-3155. QualifiedNameProvider provides the possible QualifiedName values for a given CSTNode.

We don’t call it a fully qualified name because the name is relative to the current module and doesn’t take the hierarchy of the code repository into account.

For fully qualified names, there’s FullyQualifiedNameProvider which is similar to the above but takes the current module’s location (relative to some python root folder, usually the repository’s root) into account.

class libcst.metadata.QualifiedNameSource

An enumeration.

IMPORT = 1
BUILTIN = 2
LOCAL = 3

class libcst.metadata.QualifiedName
name : str

Qualified name, e.g. a.b.c or fn.<locals>.var.

source : scope_provider.QualifiedNameSource

Source of the name, either QualifiedNameSource.IMPORT, QualifiedNameSource.BUILTIN or QualifiedNameSource.LOCAL.

class libcst.metadata.QualifiedNameProvider

Compute the possible qualified names of a variable CSTNode (extends PEP-3155). It uses get_qualified_names_for() under the hood to get qualified names. Multiple qualified names may be returned, such as when we have conditional imports or an import shadows another. E.g., the provider finds a.b, d.e and f.g as possible qualified names of c:

>>> from textwrap import dedent
>>> import libcst as cst
>>> from libcst.metadata import MetadataWrapper, QualifiedNameProvider
>>> wrapper = MetadataWrapper(
...     cst.parse_module(dedent(
...     '''
...         if something:
...             from a import b as c
...         elif otherthing:
...             from d import e as c
...         else:
...             from f import g as c
...         c()
...     '''
...     ))
... )
>>> call = wrapper.module.body[1].body[0].value
>>> wrapper.resolve(QualifiedNameProvider)[call]
{
    QualifiedName(name="a.b", source=QualifiedNameSource.IMPORT),
    QualifiedName(name="d.e", source=QualifiedNameSource.IMPORT),
    QualifiedName(name="f.g", source=QualifiedNameSource.IMPORT),
}

For the qualified name of a variable in a function or a comprehension, please refer to get_qualified_names_for() for more detail.

static has_name(visitor: libcst._metadata_dependent.MetadataDependent, node: libcst._nodes.base.CSTNode, name: Union[str, libcst.metadata.scope_provider.QualifiedName]) → bool

Check whether any of the node’s qualified names matches name, which may be a str or a QualifiedName.

class libcst.metadata.FullyQualifiedNameProvider

Provide fully qualified names for CST nodes. Like QualifiedNameProvider, but the provided QualifiedName instances have absolute identifier names instead of local to the current module.

This provider is initialized with the current module’s fully qualified name, and can be used with FullRepoManager. The module’s fully qualified name itself is stored as a metadata of the Module node. Compared to QualifiedNameProvider, it also resolves relative imports.

Example usage:

>>> mgr = FullRepoManager(".", {"dir/a.py"}, {FullyQualifiedNameProvider})
>>> wrapper = mgr.get_metadata_wrapper_for_path("dir/a.py")
>>> fqnames = wrapper.resolve(FullyQualifiedNameProvider)
>>> {type(k): v for (k, v) in fqnames.items()}
{<class 'libcst._nodes.module.Module'>: {QualifiedName(name='dir.a', source=<QualifiedNameSource.LOCAL: 3>)}}

Parent Node Metadata

A CSTNode only has attributes that link to its child nodes, so only top-down tree traversal is possible by default. Sometimes you may want to access the parent CSTNode for more information, or traverse the tree bottom-up. ParentNodeProvider is provided for those use cases.

class libcst.metadata.ParentNodeProvider

Type Inference Metadata

Type inference automatically infers the data types of expressions to allow a deeper understanding of source code. In Python, type checkers like Mypy or Pyre analyze type annotations and infer types for expressions. TypeInferenceProvider is backed by the Pyre Query API, which requires setting up watchman for incremental typechecking. FullRepoManager is built to manage the inter-process communication with Pyre.

class libcst.metadata.TypeInferenceProvider

Access inferred type annotations through the Pyre Query API. It requires setting up watchman and starting a pyre server by running the pyre command. The inferred type is a string containing the type annotation. E.g. typing.List[libcst._nodes.expression.Name] is the inferred type of the name n in the expression n = [cst.Name("")]. All name references use the fully qualified name regardless of how the names are imported (e.g. import libcst; libcst.Name and import libcst as cst; cst.Name refer to the same name). Pyre infers the types of Name, Attribute and Call nodes. The inter-process communication with the Pyre server is managed by FullRepoManager.

class libcst.metadata.FullRepoManager
__init__(repo_root_dir: str, paths: Collection[str], providers: Collection[ProviderT], timeout: int = 5) → None

Given a project root directory with pyre and watchman set up, FullRepoManager handles the inter-process communication needed to read the full repository cache data required by metadata providers like TypeInferenceProvider.

Parameters
  • paths – a collection of paths to access full repository data.

  • providers – a collection of metadata provider classes that require access to full repository data. Currently supports TypeInferenceProvider and FullyQualifiedNameProvider.

  • timeout – number of seconds. Raises TimeoutExpired when the timeout is reached.

property cache

The full repository cache data for all metadata providers passed in the providers parameter when constructing FullRepoManager. Each provider is mapped to a mapping of path to cache.

resolve_cache() → None

Resolve cache for all providers that require it. Normally this is called by get_cache_for_path() so you do not need to call it manually. However, if you intend to do a single cache resolution pass before forking, it is a good idea to call this explicitly to control when cache resolution happens.

get_cache_for_path(path: str) → Mapping[ProviderT, object]

Retrieve cache for a source file. The file needs to appear in the paths parameter when constructing FullRepoManager.

manager = FullRepoManager(".", {"a.py", "b.py"}, {TypeInferenceProvider})
MetadataWrapper(module, cache=manager.get_cache_for_path("a.py"))

get_metadata_wrapper_for_path(path: str) → libcst.metadata.wrapper.MetadataWrapper

Create a MetadataWrapper given a source file path. The path needs to be relative to the project root directory. The source code is read and parsed into a Module for the MetadataWrapper.

manager = FullRepoManager(".", {"a.py", "b.py"}, {TypeInferenceProvider})
wrapper = manager.get_metadata_wrapper_for_path("a.py")