Finding Attribute Docstrings
I maintain a library, Magql-SQLAlchemy, that generates an API from SQLAlchemy models. As part of that, I wanted to use docstrings from the model classes in Python as the descriptions for objects, fields, and arguments in the API schema.
Docstrings are string literals written as the first statement of a module, class, or function. Typically they use """ triple quotes and span multiple lines. These are easy to get at runtime, as Python assigns them to the special __doc__ attribute of modules, classes, and functions.
class User(Model):
    """A user for authentication."""

User.__doc__  # the docstring value
Sphinx Autodoc and other documentation tools extend this pattern to allow docstrings written below attributes.
class User(Model):
    username: Mapped[str]
    """The unique name used to identify the user."""

User.username.__doc__  # Incorrect!
Python discards these strings at runtime rather than assigning them to the attribute's __doc__. (This is because name.__doc__ needs to refer to the doc of the thing name refers to, not the declaration of name itself.) I asked on Mastodon if anyone knew a library to extract these, and was pointed at LibCST, Griffe, pdoc, and astroid. However, these are all big dependencies aimed at general code analysis or documentation generation. I didn't want to add a huge dependency to my otherwise lightweight library just for this one feature.
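A quick way to see this for yourself is the sketch below. It uses a hypothetical plain Config class rather than a SQLAlchemy model, so the attribute is just an int, and its __doc__ is the int type's docstring rather than the string written under the attribute.

```python
class Config:
    retries: int = 3
    """How many times to retry a request."""


# The name refers to an int, so __doc__ is the int type's docstring.
print(Config.retries.__doc__ == int.__doc__)  # True

# The string is not the first statement in the body, so it isn't the class docstring either.
print(Config.__doc__)  # None
```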
Luckily, this discarding happens after parsing, so these strings are still visible in the AST. With a little experimentation, I was able to write a fairly short function to extract docstrings for class attributes using only Python's built-in inspect and ast modules.
import ast
import inspect
import textwrap
from itertools import pairwise  # Python 3.10+
from typing import Any


def get_attr_docs(cls: type[Any]) -> dict[str, str]:
    """Get any docstrings placed after attribute assignments in a class body."""
    cls_node = ast.parse(textwrap.dedent(inspect.getsource(cls))).body[0]

    if not isinstance(cls_node, ast.ClassDef):
        raise TypeError("Given object was not a class.")

    out = {}

    # Consider each pair of nodes.
    for a, b in pairwise(cls_node.body):
        # Must be an assignment then a constant string.
        # Passing a union (X | Y) to isinstance requires Python 3.10+.
        if (
            not isinstance(a, ast.Assign | ast.AnnAssign)
            or not isinstance(b, ast.Expr)
            or not isinstance(b.value, ast.Constant)
            or not isinstance(b.value.value, str)
        ):
            continue

        doc = inspect.cleandoc(b.value.value)

        if isinstance(a, ast.Assign):
            # An assignment can have multiple targets (a = b = v).
            targets = a.targets
        else:
            # An annotated assignment only has one target.
            targets = [a.target]

        for target in targets:
            # Must be assigning to a plain name.
            if not isinstance(target, ast.Name):
                continue

            out[target.id] = doc

    return out
inspect.getsource() gets the text from the source file corresponding to only that class. If the class is nested or otherwise indented, ast.parse() raises an IndentationError, so textwrap.dedent() removes the common leading whitespace first. ast.parse() always returns a Module, and assuming we really parsed a class, it should have a single ClassDef item in the body. Then inspect each pair of adjacent nodes in the class body to find an Assign | AnnAssign followed by an Expr > Constant > str. When we find such a pair, get the name(s) from the assignment node and the value from the constant node. inspect.cleandoc() removes indentation and strips leading and trailing space.
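To make those node shapes concrete, here is a small sketch that prints them. It parses a hypothetical User class from a string instead of calling inspect.getsource(), purely so the snippet runs anywhere, including a REPL where there's no source file to read.

```python
import ast
import textwrap

# Hypothetical class source, standing in for inspect.getsource(cls).
source = textwrap.dedent(
    '''
    class User:
        username: str
        """The unique name used to identify the user."""

        active = True
    '''
)

cls_node = ast.parse(source).body[0]

# The node type of each statement in the class body, in order.
shapes = [type(node).__name__ for node in cls_node.body]
print(shapes)  # ['AnnAssign', 'Expr', 'Assign']

# The docstring is an Expr wrapping a Constant whose value is a str.
doc_node = cls_node.body[1]
print(type(doc_node.value).__name__)  # Constant
print(doc_node.value.value)
```

The Assign for active has no Expr after it, so a pairwise scan finds nothing to attach to it.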
There are a couple of things I don't do that the big libraries do. First, there's another convention where #: prefixed comments above an attribute are also considered docstrings. I don't use this pattern, because the prefix makes multiline docs harder to write, especially with indentation within the doc. I also like the consistency of using """ strings for all docstrings. If I did want to support this, I'd do it by parsing the AST to note all lines that assignments start on, then looking at the preceding lines from inspect.getsourcelines() to collect those that start with the #: prefix.
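A rough sketch of that idea might look like the following. This isn't code from my library: get_comment_docs is a hypothetical name, and it takes the dedented class source as a string (what textwrap.dedent(inspect.getsource(cls)) would give you) rather than calling inspect.getsourcelines() itself, to keep the example self-contained. It only looks at comment lines immediately above an assignment.

```python
import ast
import textwrap


def get_comment_docs(source: str) -> dict[str, str]:
    """Collect ``#:`` comment blocks directly above class attribute assignments.

    ``source`` is the dedented source text of a single class.
    """
    lines = source.splitlines()
    cls_node = ast.parse(source).body[0]

    if not isinstance(cls_node, ast.ClassDef):
        raise TypeError("Given source was not a class.")

    out: dict[str, str] = {}

    for node in cls_node.body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue

        # lineno is 1-based, so lineno - 2 indexes the line above the assignment.
        i = node.lineno - 2
        found: list[str] = []

        # Walk upward while the lines are "#:" comments.
        while i >= 0 and lines[i].lstrip().startswith("#:"):
            found.append(lines[i].lstrip()[2:])
            i -= 1

        if not found:
            continue

        # Lines were collected bottom-up; restore order and drop shared indentation.
        doc = textwrap.dedent("\n".join(reversed(found))).strip()

        for target in targets:
            if isinstance(target, ast.Name):
                out[target.id] = doc

    return out


source = '''\
class User:
    #: The unique name used
    #: to identify the user.
    username: str

    # A regular comment, ignored.
    active = True
'''

print(get_comment_docs(source))
```

Only username ends up in the result, since the comment above active lacks the #: prefix.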
Second, Sphinx considers docstrings for assignments like self.name inside the __init__ method as well. For the SQLAlchemy model classes I'm dealing with, this wasn't needed. But it should work similarly to what I wrote, except I'd call inspect.getsource(cls.__init__) and would look a little further into the assignment target, which is an Attribute node rather than a Name, to get the name part of self.name.
Since I originally started this investigation by asking if a library already existed, I might release this as a library if I have time. If I did, I'd probably add those two additional features to make it more generally useful. But not every small function needs to be a library to be useful, and this code on its own does exactly what I need for this use case.