What arguments was Python called with?
Farewell to my first big PR for Werkzeug, made in 2018. Now that Python 3.9 is EOL, I'm able to remove that giant hack and replace it with a single line of code. Here's an explanation of what I did, and what replaces it.
Werkzeug's reloader has to determine what arguments to start a new process with, including the Python executable, the script, whether it was run with -m, and arguments passed to the script.
Unfortunately, Python doesn't expose the fact that it was run as python -m module. Instead, it replaces -m module with the path to the module's file in sys.argv. Python also behaves differently for python -m module vs python file.py: running a file adds the file's directory to sys.path, while -m adds the current directory. When the reloader was run with python -m werkzeug.serving, it would end up calling python /path/to/werkzeug/serving.py on reload, which added /path/to/werkzeug to sys.path. Werkzeug's own modules, such as http, then shadowed standard library modules of the same name and caused import conflicts.
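The difference is easy to reproduce. The following is a minimal sketch, not Werkzeug code: it builds a throwaway package (the names pkg, show.py, and run_both_ways are made up for illustration), runs the same module once with -m and once as a file, and reports what each process saw in sys.argv[0] and sys.path[0].

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical module body, used only for this demonstration.
SHOW = (
    "import json, sys\n"
    "print(json.dumps({'argv0': sys.argv[0], 'path0': sys.path[0]}))\n"
)


def run_both_ways() -> tuple[dict, dict]:
    """Run pkg/show.py with "-m pkg.show" and as a file; return both reports."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        pkg = os.path.join(tmp, "pkg")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        script = os.path.join(pkg, "show.py")

        with open(script, "w") as f:
            f.write(SHOW)

        def run(*args: str) -> dict:
            proc = subprocess.run(
                [sys.executable, *args],
                cwd=tmp,
                capture_output=True,
                text=True,
                check=True,
            )
            return json.loads(proc.stdout)

        return run("-m", "pkg.show"), run(script)


if __name__ == "__main__":
    as_module, as_file = run_both_ways()
    print("-m pkg.show :", as_module)
    print("pkg/show.py :", as_file)
```

In both runs sys.argv[0] ends in show.py, so it can't tell the two apart. Only sys.path[0] differs: the working directory for -m, and the pkg directory when run as a file.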
After enough digging through internals with the debugger, I found that it is possible to detect that Python was run with -m, and re-adjust sys.argv to reflect that. This took a couple weeks of investigation, as I discovered more and more ways to install a script and corner cases.
To determine if -m was used, I looked at __main__.__package__. __main__ is the module name given to the script that was run, regardless of the script's name or whether it was run with -m. For all the cases I could think of, if its __package__ is not None, then Python was run with -m; otherwise it was run as a file. When running with -m, the module name can be reconstructed by appending the name of the file in sys.argv[0], without its extension, to __main__.__package__.
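As a small illustration of that heuristic (a sketch with made-up file names top.py and pkg/sub.py, not part of Werkzeug), the snippet below starts Python three different ways and collects the repr of __main__.__package__ from each run.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical probe script: prints the repr of __main__.__package__.
PROBE = "import sys\nprint(repr(sys.modules['__main__'].__package__))\n"


def package_values() -> dict[str, str]:
    """Return repr(__main__.__package__) for three ways of starting Python."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "pkg")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()

        for path in (os.path.join(tmp, "top.py"), os.path.join(pkg, "sub.py")):
            with open(path, "w") as f:
                f.write(PROBE)

        def run(*args: str) -> str:
            proc = subprocess.run(
                [sys.executable, *args],
                cwd=tmp,
                capture_output=True,
                text=True,
                check=True,
            )
            return proc.stdout.strip()

        return {
            "python top.py": run("top.py"),
            "python -m top": run("-m", "top"),
            "python -m pkg.sub": run("-m", "pkg.sub"),
        }


if __name__ == "__main__":
    for command, value in package_values().items():
        print(f"{command:20} __package__ = {value}")
```

Running a file should give None, while -m gives a string: empty for a top-level module and the parent package name for a submodule. The empty-string case is why a reconstructed module name can start with a stray dot that has to be stripped.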
Beyond this, there are a few other things messing with the value. If the script is a pip-installed entry point on Windows, it's an exe, but sys.argv doesn't include the extension. python -m a.b might refer to a module a/b.py, a package a/b/__init__.py, or a/b/__main__.py. And pydevd, the backend for many IDE debuggers, incorrectly rewrites -m script to script instead of path/to/script.py.
import os
import sys
import typing as t


def _get_args_for_reloading() -> list[str]:
    """Determine how the script was executed, and return the args needed
    to execute it again in a new process.
    """
    rv = [sys.executable]
    py_script = sys.argv[0]
    args = sys.argv[1:]
    # Need to look at main module to determine how it was executed.
    __main__ = sys.modules["__main__"]

    # The value of __package__ indicates how Python was called. It may
    # not exist if a setuptools script is installed as an egg. It may be
    # set incorrectly for entry points created with pip on Windows.
    if getattr(__main__, "__package__", None) is None or (
        os.name == "nt"
        and __main__.__package__ == ""
        and not os.path.exists(py_script)
        and os.path.exists(f"{py_script}.exe")
    ):
        # Executed a file, like "python app.py".
        py_script = os.path.abspath(py_script)

        if os.name == "nt":
            # Windows entry points have ".exe" extension and should be
            # called directly.
            if not os.path.exists(py_script) and os.path.exists(f"{py_script}.exe"):
                py_script += ".exe"

            if (
                os.path.splitext(sys.executable)[1] == ".exe"
                and os.path.splitext(py_script)[1] == ".exe"
            ):
                rv.pop(0)

        rv.append(py_script)
    else:
        # Executed a module, like "python -m werkzeug.serving".
        if os.path.isfile(py_script):
            # Rewritten by Python from "-m script" to "/path/to/script.py".
            py_module = t.cast(str, __main__.__package__)
            name = os.path.splitext(os.path.basename(py_script))[0]

            if name != "__main__":
                py_module += f".{name}"
        else:
            # Incorrectly rewritten by pydevd debugger from "-m script" to "script".
            py_module = py_script

        rv.extend(("-m", py_module.lstrip(".")))

    rv.extend(args)
    return rv
While the answer to "how was this script called" seems like it should be as simple as "just check sys.argv", it turned out to be far more complex. And the code above still isn't complete: it doesn't capture the options that were passed to python besides -m, such as -u or -X. That's why Python 3.10 added sys.orig_argv, which captures exactly what was passed to python, before any rewriting was done. All of the above can be replaced, with more fidelity, by:
args = [sys.executable, *sys.orig_argv[1:]]
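To see what that one line preserves, here is a minimal sketch that assumes Python 3.10 or later. The probe module, the compare_argv helper, and the --port 5000 arguments are invented for the demonstration. It starts a child process with interpreter options plus -m and has the child report both sys.argv and sys.orig_argv.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical module body: reports both views of the command line.
PROBE = (
    "import json, sys\n"
    "print(json.dumps({'argv': sys.argv, 'orig_argv': sys.orig_argv}))\n"
)


def compare_argv() -> dict:
    """Run "python -u -X dev -m probe --port 5000" and return what it saw."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "probe.py"), "w") as f:
            f.write(PROBE)

        proc = subprocess.run(
            [sys.executable, "-u", "-X", "dev", "-m", "probe", "--port", "5000"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return json.loads(proc.stdout)


if __name__ == "__main__":
    seen = compare_argv()
    print("sys.argv     :", seen["argv"])
    print("sys.orig_argv:", seen["orig_argv"])
```

sys.argv starts with the rewritten path to probe.py and has lost -u, -X dev, and -m. sys.orig_argv[1:] still holds every option and argument in the order they were given, which is what a reloader needs to start an equivalent process.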