-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-32609: [CI] Add type checking infrastructure and CI workflow for type annotations #48618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
afe4699 to
30017ff
Compare
AlenkaF
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rok for working on this! Splitting up the main PR sounds like a great idea to me.
I only have one small question, otherwise the changes look good to me.
|
Yes, let's wait for Raul 👍 |
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @rok sorry it took me a while to review with holidays, release, etcetera
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
Revision: 28fba3a Submitted crossbow builds: ursacomputing/crossbow @ actions-8a459250fe
|
| ARROW_S3: "OFF" | ||
| ARROW_SUBSTRAIT: "OFF" | ||
| ARROW_WITH_OPENTELEMETRY: "OFF" | ||
| PYARROW_TEST_ANNOTATIONS: "ON" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raulcd do you think we actually want to be able to toggle these? I'm thinking we don't really need them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is worth it if instead of using a separate ci/scripts/python_test_type_annotations.sh we use ci/scripts/python_test.sh and we use the PYARROW_TEST_ANNOTATIONS variable to manage whether we want annotations to be tested or not as part of python testing.
If we decide we want to have two separate scripts I think this env var should be removed.
I lean towards having a single python test script that can allow us to also test annotations but I can find arguments for both so not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, agreed with your reasoning. I also introduced PYARROW_TEST_ANNOTATIONS to the windows script.
|
Thanks for review @raulcd ! I've responded to your points. |
| ARROW_S3: "OFF" | ||
| ARROW_SUBSTRAIT: "OFF" | ||
| ARROW_WITH_OPENTELEMETRY: "OFF" | ||
| PYARROW_TEST_ANNOTATIONS: "ON" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is worth it if instead of using a separate ci/scripts/python_test_type_annotations.sh we use ci/scripts/python_test.sh and we use the PYARROW_TEST_ANNOTATIONS variable to manage whether we want annotations to be tested or not as part of python testing.
If we decide we want to have two separate scripts I think this env var should be removed.
I lean towards having a single python test script that can allow us to also test annotations but I can find arguments for both so not an issue.
| # We first populate stub docstrings and then build the wheel | ||
| python setup.py build_ext --inplace | ||
| python -m pip install griffe libcst | ||
| python ../dev/update_stub_docstrings.py pyarrow-stubs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not necessary, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as #48618 (comment). It's a noop and we don't have to introduce python logic in stub PRs if we do here.
| @REM We first populate stub docstrings and then build the wheel | ||
| %PYTHON_CMD% setup.py build_ext --inplace | ||
| %PYTHON_CMD% -m pip install griffe libcst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we have real annotations this isn't necessary, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added running dev\update_stub_docstrings.py, so I suppose it's a noop now and we can keep it to make sure it doesn't fail before introducing actual stubs? This will keep CI logic out of stub PRs.
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
Revision: ee72560 Submitted crossbow builds: ursacomputing/crossbow @ actions-cd0acc6696
|
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
Revision: 19b54bc Submitted crossbow builds: ursacomputing/crossbow @ actions-f406282276
|
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
Revision: c23cb9c Submitted crossbow builds: ursacomputing/crossbow @ actions-286c26b719
|
|
@raulcd I've factored out |
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
Revision: 491ff2c Submitted crossbow builds: ursacomputing/crossbow @ actions-b6331706da
|
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
Revision: ba07ac8 Submitted crossbow builds: ursacomputing/crossbow @ actions-6c5a9f1e7e
|
…d script for including docstrings into stubfiles before building wheels. diff --git c/.github/workflows/python.yml i/.github/workflows/python.yml index e5d3679..4ca0f9b 100644 --- c/.github/workflows/python.yml +++ i/.github/workflows/python.yml @@ -239,6 +239,11 @@ jobs: - name: Test shell: bash run: ci/scripts/python_test.sh $(pwd) $(pwd)/build + - name: Test annotations + shell: bash + env: + PYARROW_TEST_ANNOTATIONS: "ON" + run: ci/scripts/python_test_type_annotations.sh $(pwd)/python windows: name: AMD64 Windows 2022 Python 3.13 @@ -296,3 +301,7 @@ jobs: shell: cmd run: | call "ci\scripts\python_test.bat" %cd% + - name: Test annotations + shell: cmd + run: | + call "ci\scripts\python_test_type_annotations.bat" %cd%\python diff --git c/ci/scripts/python_test_type_annotations.bat i/ci/scripts/python_test_type_annotations.bat new file mode 100644 index 0000000000..3446e32 --- /dev/null +++ i/ci/scripts/python_test_type_annotations.bat @@ -0,0 +1,38 @@ +@Rem Licensed to the Apache Software Foundation (ASF) under one +@Rem or more contributor license agreements. See the NOTICE file +@Rem distributed with this work for additional information +@Rem regarding copyright ownership. The ASF licenses this file +@Rem to you under the Apache License, Version 2.0 (the +@Rem "License"); you may not use this file except in compliance +@Rem with the License. You may obtain a copy of the License at +@Rem +@Rem http://www.apache.org/licenses/LICENSE-2.0 +@Rem +@Rem Unless required by applicable law or agreed to in writing, +@Rem software distributed under the License is distributed on an +@Rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +@Rem KIND, either express or implied. See the License for the +@Rem specific language governing permissions and limitations +@Rem under the License. + +@echo on + +set PYARROW_DIR=%1 + +echo Annotation testing on Windows ... + +@Rem Install library stubs +%PYTHON_CMD% -m pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil || exit /B 1 + +@Rem Install other dependencies for type checking +%PYTHON_CMD% -m pip install fsspec || exit /B 1 + +@Rem Install type checkers +%PYTHON_CMD% -m pip install mypy pyright ty || exit /B 1 + +@Rem Run type checkers +pushd %PYARROW_DIR% + +mypy +pyright +ty check diff --git c/ci/scripts/python_test_type_annotations.sh i/ci/scripts/python_test_type_annotations.sh new file mode 100755 index 0000000000..82610ce --- /dev/null +++ i/ci/scripts/python_test_type_annotations.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -ex +pyarrow_dir=${1} + +if [ "${PYARROW_TEST_ANNOTATIONS}" == "ON" ]; then + # Install library stubs + pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil + + # Install type checkers + pip install mypy pyright ty + + # Install other dependencies for type checking + pip install fsspec + + # Run type checkers + pushd ${pyarrow_dir} + mypy + pyright + ty check; +else + echo "Skipping type annotation tests"; +fi diff --git c/ci/scripts/python_wheel_macos_build.sh i/ci/scripts/python_wheel_macos_build.sh index bd61154..b64eee6 100755 --- c/ci/scripts/python_wheel_macos_build.sh +++ i/ci/scripts/python_wheel_macos_build.sh @@ -177,6 +177,11 @@ export CMAKE_PREFIX_PATH=${build_dir}/install export SETUPTOOLS_SCM_PRETEND_VERSION=${PYARROW_VERSION} pushd ${source_dir}/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel popd diff --git c/ci/scripts/python_wheel_validate_contents.py i/ci/scripts/python_wheel_validate_contents.py index 84fcaba..ee4a31a 100644 --- c/ci/scripts/python_wheel_validate_contents.py +++ i/ci/scripts/python_wheel_validate_contents.py @@ -35,6 +35,11 @@ def validate_wheel(path): assert not outliers, f"Unexpected contents in wheel: {sorted(outliers)}" print(f"The wheel: {wheels[0]} seems valid.") + candidates = [info for info in f.filelist if info.filename.endswith('compute.pyi')] + assert candidates, "compute.pyi not found in wheel" + content = f.read(candidates[0]).decode('utf-8', errors='replace') + assert '"""' in content, "compute.pyi missing docstrings (no triple quotes found)" + def main(): parser = argparse.ArgumentParser() diff --git c/ci/scripts/python_wheel_windows_build.bat i/ci/scripts/python_wheel_windows_build.bat index b4b7fed..3da7f60 100644 --- c/ci/scripts/python_wheel_windows_build.bat +++ i/ci/scripts/python_wheel_windows_build.bat @@ -135,6 +135,11 @@ pushd C:\arrow\python @Rem Build wheel %PYTHON_CMD% setup.py bdist_wheel || exit /B 1 +@Rem We first populate stub docstrings and then build the wheel +%PYTHON_CMD% setup.py build_ext --inplace +%PYTHON_CMD% -m pip install griffe libcst +%PYTHON_CMD% ..\dev\update_stub_docstrings.py pyarrow-stubs + @Rem Repair the wheel with delvewheel @Rem @Rem Since we bundled the Arrow C++ libraries ourselves, we only need to diff --git c/ci/scripts/python_wheel_xlinux_build.sh i/ci/scripts/python_wheel_xlinux_build.sh index a3fbeb3..977ef64 100755 --- c/ci/scripts/python_wheel_xlinux_build.sh +++ i/ci/scripts/python_wheel_xlinux_build.sh @@ -167,6 +167,11 @@ export ARROW_HOME=/tmp/arrow-dist export CMAKE_PREFIX_PATH=/tmp/arrow-dist pushd /arrow/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel echo "=== Strip symbols from wheel ===" diff --git c/compose.yaml i/compose.yaml index 2bd38a3..ae0a1d4 100644 --- c/compose.yaml +++ i/compose.yaml @@ -919,12 +919,14 @@ services: environment: <<: [*common, *ccache, *sccache] PYTEST_ARGS: # inherit + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: &python-conda-command [" /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-emscripten: # Usage: @@ -1001,6 +1003,7 @@ services: ARROW_S3: "OFF" ARROW_SUBSTRAIT: "OFF" ARROW_WITH_OPENTELEMETRY: "OFF" + PYARROW_TEST_ANNOTATIONS: "ON" SETUPTOOLS_SCM_PRETEND_VERSION: volumes: *ubuntu-volumes deploy: *cuda-deploy @@ -1008,7 +1011,8 @@ services: /bin/bash -c " /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow" + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python" debian-python: # Usage: @@ -1500,6 +1504,7 @@ services: python: ${PYTHON} shm_size: *shm-size environment: + PYARROW_TEST_ANNOTATIONS: "ON" <<: [*common, *ccache, *sccache] PARQUET_REQUIRE_ENCRYPTION: # inherit HYPOTHESIS_PROFILE: # inherit @@ -1510,7 +1515,8 @@ services: /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && mamba uninstall -y numpy && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-docs: # Usage: @@ -1530,13 +1536,15 @@ services: BUILD_DOCS_CPP: "ON" BUILD_DOCS_PYTHON: "ON" PYTEST_ARGS: "--doctest-modules --doctest-cython" + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: ["/arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && pip install -e /arrow/dev/archery[numpydoc] && archery numpydoc --allow-rule GL10,PR01,PR03,PR04,PR05,PR10,RT03,YD01 && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-dask: # Possible $DASK parameters: diff --git c/docs/source/developers/python/development.rst i/docs/source/developers/python/development.rst index d03b243..c23891e 100644 --- c/docs/source/developers/python/development.rst +++ i/docs/source/developers/python/development.rst @@ -42,7 +42,7 @@ Unit Testing ============ We are using `pytest <https://docs.pytest.org/en/latest/>`_ to develop our unit -test suite. After `building the project <build_pyarrow>`_ you can run its unit tests +test suite. After `building the project <building.html>`_ you can run its unit tests like so: .. code-block:: @@ -101,6 +101,74 @@ The test groups currently include: * ``s3``: Tests for Amazon S3 * ``tensorflow``: Tests that involve TensorFlow +Type Checking +============= + +PyArrow provides type stubs (``*.pyi`` files) for static type checking. These +stubs are located in the ``pyarrow-stubs/`` directory and are automatically +included in the distributed wheel packages. + +Running Type Checkers +--------------------- + +We support multiple type checkers. Their configurations are in +``pyproject.toml``. + +**mypy** + +To run mypy on the PyArrow codebase: + +.. code-block:: + + $ cd arrow/python + $ mypy + +The mypy configuration is in the ``[tool.mypy]`` section of ``pyproject.toml``. + +**pyright** + +To run pyright: + +.. code-block:: + + $ cd arrow/python + $ pyright + +The pyright configuration is in the ``[tool.pyright]`` section of ``pyproject.toml``. + +**ty** + +To run ty (note: currently only partially configured): + +.. code-block:: + + $ cd arrow/python + $ ty check + +Maintaining Type Stubs +----------------------- + +Type stubs for PyArrow are maintained in the ``pyarrow-stubs/`` +directory. These stubs mirror the structure of the main ``pyarrow/`` package. + +When adding or modifying public APIs: + +1. **Update the corresponding ``.pyi`` stub file** in ``pyarrow-stubs/`` + to reflect the new or changed function/class signatures. + +2. **Include type annotations** where possible. For Cython modules or + dynamically generated APIs such as compute kernels add the corresponding + stub in ``pyarrow-stubs/``. + +3. **Run type checkers** to ensure the stubs are correct and complete. + +The stub files are automatically copied into the built wheel during the build +process and will be included when users install PyArrow, enabling type checking +in downstream projects and for users' IDEs. + +Note: ``py.typed`` marker file in the ``pyarrow/`` directory indicates to type +checkers that PyArrow supports type checking according to :pep:`561`. + Doctest ======= diff --git c/python/MANIFEST.in i/python/MANIFEST.in index ed7012e..2840ba7 100644 --- c/python/MANIFEST.in +++ i/python/MANIFEST.in @@ -4,6 +4,7 @@ include ../NOTICE.txt global-include CMakeLists.txt graft pyarrow +graft pyarrow-stubs graft cmake_modules global-exclude *.so diff --git c/python/pyarrow-stubs/pyarrow/__init__.pyi i/python/pyarrow-stubs/pyarrow/__init__.pyi new file mode 100644 index 0000000000..2a68a51 --- /dev/null +++ i/python/pyarrow-stubs/pyarrow/__init__.pyi @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Type stubs for PyArrow. + +This is a placeholder stub file. +Complete type annotations will be added in subsequent PRs. +""" + +from typing import Any + +def __getattr__(name: str) -> Any: ... diff --git c/python/pyarrow/py.typed i/python/pyarrow/py.typed new file mode 100644 index 0000000000..13a8339 --- /dev/null +++ i/python/pyarrow/py.typed @@ -0,0 +1,16 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. diff --git c/python/pyproject.toml i/python/pyproject.toml index 899144d..9f62f02 100644 --- c/python/pyproject.toml +++ i/python/pyproject.toml @@ -84,11 +84,11 @@ zip-safe=false include-package-data=true [tool.setuptools.packages.find] -include = ["pyarrow"] +include = ["pyarrow", "pyarrow.*"] namespaces = false [tool.setuptools.package-data] -pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd"] +pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd", "py.typed"] [tool.setuptools_scm] root = '..' @@ -96,3 +96,39 @@ version_file = 'pyarrow/_generated_version.py' version_scheme = 'guess-next-dev' git_describe_command = 'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"' fallback_version = '24.0.0a0' + +# TODO: Enable type checking once stubs are merged +[tool.mypy] +files = ["pyarrow-stubs"] +mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs" +exclude = [ + "^pyarrow/", + "^benchmarks/", + "^examples/", + "^scripts/", +] + +# TODO: Enable type checking once stubs are merged +[tool.pyright] +pythonPlatform = "All" +pythonVersion = "3.10" +include = ["pyarrow-stubs"] +exclude = [ + "pyarrow", + "benchmarks", + "examples", + "scripts", + "build", +] +stubPath = "pyarrow-stubs" +typeCheckingMode = "basic" + +# TODO: Enable type checking once stubs are merged +[tool.ty.src] +include = ["pyarrow-stubs"] +exclude = [ + "pyarrow", + "benchmarks", + "examples", + "scripts", +] diff --git c/python/setup.py i/python/setup.py index a27bd3b..a25d2d7 100755 --- c/python/setup.py +++ i/python/setup.py @@ -121,8 +121,35 @@ class build_ext(_build_ext): def run(self): self._run_cmake() + self._copy_stubs() _build_ext.run(self) + def _copy_stubs(self): + """Copy .pyi stub files from pyarrow-stubs to the build directory.""" + build_cmd = self.get_finalized_command('build') + build_lib = os.path.abspath(build_cmd.build_lib) + + stubs_src = pjoin(setup_dir, 'pyarrow-stubs', 'pyarrow') + stubs_dest = pjoin(build_lib, 'pyarrow') + + if os.path.exists(stubs_src): + print(f"-- Copying stub files from {stubs_src} to {stubs_dest}") + for root, dirs, files in os.walk(stubs_src): + # Calculate relative path from stubs_src + rel_dir = os.path.relpath(root, stubs_src) + dest_dir = pjoin(stubs_dest, rel_dir) if rel_dir != '.' else stubs_dest + + # Create destination directory if needed + if not os.path.exists(dest_dir): + os.makedirs(dest_dir) + + # Copy .pyi files + for file in files: + if file.endswith('.pyi'): + src_file = pjoin(root, file) + dest_file = pjoin(dest_dir, file) + shutil.copy2(src_file, dest_file) + # adapted from cmake_build_ext in dynd-python # github.com/libdynd/dynd-python
Co-authored-by: Raúl Cumplido <[email protected]>
change bat lint add a popd and nicer logging for windows ReplaceElipsis -> DocstringInserter simplify remove sphinx
ba07ac8 to
5ce9011
Compare
|
@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64 |
|
@raulcd I think this is ready for another review whenever you have time. |
|
Revision: 5ce9011 Submitted crossbow builds: ursacomputing/crossbow @ actions-46e8b6d779
|
Rationale for this change
This is the first in series of PRs adding type annotations to pyarrow and resolving #32609.
What changes are included in this PR?
This PR establishes infrastructure for type checking:
py.typedmarker to enable type checkingAre these changes tested?
No. This is mostly a CI change.
Are there any user-facing changes?
This does not add any actual annotations (only
py.typedmarker) so user should not be affected.