Packaging with Python
This practical draws on material from:
- https://python-packaging.readthedocs.io
- https://setuptools.readthedocs.io
- https://packaging.python.org/
- https://github.com/pybind/pybind11
- https://pybind11.readthedocs.io/
- https://github.com/pypa/manylinux
- https://opensource.com/article/19/2/manylinux-python-wheels
- https://stackoverflow.com
Python pip
The Python Package Index (PyPI) is a repository of software for the Python programming language.
You can install the pip package manager with one of the following commands.
Some older distributions still use python2 as the default python; in that case, use the python3 command explicitly.
On Ubuntu
apt install python3-pip
On OSX
brew install python3
With docker:
docker run -it python:3.8-alpine sh
Installing packages with pip
You can then install any package available on https://pypi.org/
sudo pip3 install setuptools==50.0.2 # system-wide install
pip3 install setuptools==50.0.2 --user # user install (no sudo needed)
pip3 install twine wheel --user
pip is just another Python module:
python3 -m pip install --user --upgrade pip
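Because pip is a module, a script can also call it through the interpreter that should receive the packages. The sketch below only builds such a command line; the helper name pip_command is hypothetical and chosen for illustration.

```python
import sys


def pip_command(*args):
    """Build a pip command line bound to the running interpreter."""
    # sys.executable is the path of the current Python, so the package
    # lands in the same environment as the script that asks for it.
    return [sys.executable, "-m", "pip", *args]


# The resulting list can be passed to subprocess.run(..., check=True).
print(pip_command("install", "--user", "--upgrade", "pip"))
```

Using sys.executable rather than a bare pip3 avoids installing into a different Python than the one running your code.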
Packaging python projects
You can follow the official guide (https://packaging.python.org/tutorials/packaging-projects/)
Creating the package files
This is the recommended basic structure your project should have to easily build a pip package:
your_project/
├── LICENSE
├── README.md
├── example_pkg/
│ └── __init__.py
├── setup.py
└── tests/
But we can adapt it to follow the LBMC guide of good practices:
your_project/
├── LICENSE
├── README.md
└── src/
    ├── example_pkg/
    │   └── __init__.py
    ├── setup.py
    └── tests/
All your Python code goes in the example_pkg/ folder. The most important file for packaging is the setup.py file.
Here is a basic setup.py file:
import setuptools

with open("../README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="example-pkg-YOUR-USERNAME-HERE",  # Replace with your own username
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="A small example package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: CEA CNRS Inria Logiciel Libre License, version 2.1 (CeCILL-2.1)",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
)
- name is the distribution name of your package. This can be any name as long as it only contains letters, numbers, _ and -. It also must not already be taken on pypi.org. Be sure to update this with your username, as this ensures you won’t try to upload a package with the same name as one which already exists.
- version is the package version; see PEP 440 for more details on versions.
- author and author_email are used to identify the author of the package.
- description is a short, one-sentence summary of the package.
- long_description is a detailed description of the package. This is shown on the package detail page on the Python Package Index. In this case, the long description is loaded from README.md, which is a common pattern.
- long_description_content_type tells the index what type of markup is used for the long description. In this case, it’s Markdown.
- url is the URL for the homepage of the project. For many projects, this will just be a link to GitHub, GitLab, Bitbucket, or a similar code hosting service.
- packages is a list of all Python import packages that should be included in the Distribution Package. Instead of listing each package manually, we can use find_packages() to automatically discover all packages and subpackages. In this case, the list of packages will be example_pkg, as that’s the only package present.
- classifiers gives the index and pip some additional metadata about your package. In this case, the package is only compatible with Python 3, is licensed under the CeCILL-2.1 license, and is OS-independent. You should always include at least which version(s) of Python your package works on, which license your package is available under, and which operating systems your package will work on. For a complete list of classifiers, see https://pypi.org/classifiers/.
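When checking whether a name is already taken, keep in mind that the index compares names after normalization (the rule described in PEP 503): runs of -, _ and . are equivalent and case is ignored. A minimal sketch of that rule, where normalize_name is an illustrative helper name:

```python
import re


def normalize_name(name):
    """Normalize a distribution name following the PEP 503 rule."""
    # Any run of "-", "_" or "." becomes a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("example_pkg-YOUR.USERNAME"))  # example-pkg-your-username
```

Two names that normalize to the same string refer to the same project on the index.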
Creating distribution archives
Now you just have to run the following command in the same directory as the setup.py file:
python3 setup.py sdist bdist_wheel
It will create files in the dist/ directory:
dist/
example_pkg_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
example_pkg_YOUR_USERNAME_HERE-0.0.1.tar.gz
The .tar.gz file is a Source Archive whereas the .whl file is a Built Distribution. Newer pip versions preferentially install built distributions, but will fall back to source archives if needed. You should always upload a source archive and provide built archives for the platforms your project is compatible with.
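The name of a .whl file encodes what it is compatible with: distribution name, version, Python tag, ABI tag and platform tag, separated by dashes. The sketch below splits a file name into those fields; parse_wheel_filename is an illustrative helper, not a function from pip or wheel.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # An optional build tag may sit between the version and the Python tag.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


info = parse_wheel_filename("example_pkg_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl")
print(info["python"], info["abi"], info["platform"])  # py3 none any
```

Here py3-none-any means any Python 3, no ABI requirement and any platform, which is what a pure Python package produces.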
What can you do with those two files?
Install them
You can use either the .whl or the .tar.gz file to install your package:
pip3 install dist/example_pkg_YOUR_USERNAME_HERE-0.0.1.tar.gz --user
Upload them
You can upload your package to PyPI, but first you can test the upload on https://test.pypi.org/. As https://pypi.org is an archive, a broken package that you upload will stay there.
You first need to create an account at https://test.pypi.org/account/register/
Then use the twine tool that we installed earlier:
twine upload --skip-existing --repository testpypi dist/*
The output should look like this:
Uploading distributions to https://test.pypi.org/legacy/
Enter your username: [your username]
Enter your password:
Uploading example_pkg_YOUR_USERNAME_HERE-0.0.1-py3-none-any.whl
100%|█████████████████████| 4.65k/4.65k [00:01<00:00, 2.88kB/s]
Uploading example_pkg_YOUR_USERNAME_HERE-0.0.1.tar.gz
100%|█████████████████████| 4.25k/4.25k [00:01<00:00, 3.05kB/s]
To install your package from https://test.pypi.org you can use the following pip options:
pip install --index-url https://test.pypi.org/simple/ --no-deps example-pkg-YOUR-USERNAME-HERE --user
You should be able to open a python console anywhere and run:
>>> import example_pkg
When everything is OK, you can create an account on https://pypi.org and use the twine command without the --repository testpypi option.
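To check the installation from a script rather than from a console, you can ask the import system whether it can locate the package. A minimal sketch; is_importable is an illustrative helper name:

```python
import importlib.util


def is_importable(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "example_pkg" is the package name used in this practical.
print(is_importable("example_pkg"))
```

This only locates the package; it does not execute its code, so it is a cheap first check before running real tests.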
Creating executable software
You can also use pip to distribute executable software. To do that, you have to declare in the setup.py file which function to execute when your software is called (here, the main function of the __main__.py module).
setuptools.setup(
    ...
    entry_points={
        'console_scripts': ['example_pkg=example_pkg.__main__:main'],
    },
    ...
)
You can declare several executables in this list, each with the format EXECUTABLE_NAME=LIBRARY.FILE:FUNCTION
After the installation, calling example_pkg will run your software if your $PATH is correctly configured.
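For the entry point example_pkg.__main__:main to work, example_pkg/__main__.py must define a callable named main. Here is a minimal sketch of such a file; the --name option and the greeting are purely illustrative.

```python
import argparse


def main(argv=None):
    """Entry point referenced as example_pkg.__main__:main."""
    parser = argparse.ArgumentParser(prog="example_pkg")
    parser.add_argument("--name", default="world", help="who to greet")
    # With argv=None, argparse reads the arguments from the command line.
    args = parser.parse_args(argv)
    print("Hello, {}!".format(args.name))
    # The script generated at install time passes this value to sys.exit().
    return 0
```

Accepting an optional argv makes the function easy to test: main(["--name", "LBMC"]) prints the greeting without touching the real command line.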
Adding dependencies to your package
As your project grows more complex, you will split it into different files for code clarity.
Your __init__.py file should then list the modules (.py files) of the example_pkg directory in __all__:
#!/usr/bin/env python3
# -*-coding:Utf-8 -*
"""
idr library
"""
name = "midr"
__all__ = ["__main__", "idr", "samic", "archimedean", "archimedean_plots",
           "log", "narrowpeak", "raw_matrix", "auxiliary"]
As you don’t want to reinvent the wheel, you may also import other Python libraries (which can be installed with pip). You can specify a list of these libraries in the setup.py file:
setuptools.setup(
    ...
    install_requires=[
        'cmake>=3.18',
        'scipy>=1.3',
        'numpy>=1.16',
        'pynverse>=0.1',
        'pandas>=0.25.0',
        'mpmath>=1.1.0',
        'matplotlib>=3.0.0'
    ],
    ...
)
Don’t forget to specify the version of each dependency to ensure that the functions you use are present in the installed library.
If some packages are required for the installation of your package (here, for example, cmake), you should also add them to the install_requires list.
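A pitfall worth knowing when writing such a list: Python concatenates adjacent string literals, so a missing comma silently merges two requirements into one invalid entry. The short demonstration below uses requirement strings taken from this practical.

```python
# Adjacent string literals are concatenated by Python, so a missing
# comma merges two requirements into one invalid entry.
broken = [
    "cmake>=3.18"  # <- no comma here
    "scipy>=1.3",
    "numpy>=1.16",
]
fixed = [
    "cmake>=3.18",
    "scipy>=1.3",
    "numpy>=1.16",
]

print(len(broken), broken[0])  # 2 cmake>=3.18scipy>=1.3
print(len(fixed))  # 3
```

Putting one requirement per line, each ending with a comma, makes this mistake easy to spot.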
Sometimes you’ll want to use packages that are properly arranged with setuptools, but aren’t published to PyPI. In those cases, you can specify a list of one or more dependency_links URLs where the package can be downloaded, along with some additional hints, and setuptools will find and install the package correctly. Note that pip has ignored dependency_links since version 19.0.
setup(
    ...
    dependency_links=['http://github.com/user/repo/tarball/master#egg=package-1.0'],
    ...
)
pybind11 and other unnecessary complications
Sometimes, your code is slow and instead of blaming yourself for your poor algorithm, you can blame Python. pybind11 allows you to do just that.
pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code. Its goals and syntax are similar to the excellent Boost.Python library by David Abrahams: to minimize boilerplate code in traditional extension modules by inferring type information using compile-time introspection.
So great, you now have a lightweight interface to recode some of your functions in C/C++. But what about packaging? setuptools almost only understands Python (it can compile simple C/C++ code).
The https://github.com/pybind/cmake_example repository gives an example of how to use cmake within a setup.py script.
Ideally we want to:
- use setuptools to build standard PyPI packages
- use cmake to compile complex C/C++ libraries
- be able to include lots of C/C++ libraries (because writing C/C++ code is a pain, and some people do it better than ourselves)
Simple C/C++ code
The idea is to write a CMakeExtension class derived from the Extension class that overrides the default Extension attributes (we don’t want setuptools to try to do its own compilation on top of our cmake compilation), and then to use the information stored in CMakeExtension to run cmake as a subprocess within a CMakeBuild class.
import os
import re
import sys
import platform
import subprocess

from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
from distutils.version import LooseVersion


class CMakeExtension(Extension):
    def __init__(self, name, sourcedir=''):
        Extension.__init__(self, name, sources=[])
        self.sourcedir = os.path.abspath(sourcedir)


class CMakeBuild(build_ext):
    def run(self):
        try:
            out = subprocess.check_output(['cmake', '--version'])
        except OSError:
            raise RuntimeError("CMake must be installed to build the following extensions: " +
                               ", ".join(e.name for e in self.extensions))

        if platform.system() == "Windows":
            cmake_version = LooseVersion(re.search(r'version\s*([\d.]+)', out.decode()).group(1))
            if cmake_version < '3.1.0':
                raise RuntimeError("CMake >= 3.1.0 is required on Windows")

        for ext in self.extensions:
            self.build_extension(ext)

    def build_extension(self, ext):
        extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
        # required for auto-detection of auxiliary "native" libs
        if not extdir.endswith(os.path.sep):
            extdir += os.path.sep

        cmake_args = ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=' + extdir,
                      '-DPYTHON_EXECUTABLE=' + sys.executable]

        cfg = 'Debug' if self.debug else 'Release'
        build_args = ['--config', cfg]

        if platform.system() == "Windows":
            cmake_args += ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), extdir)]
            if sys.maxsize > 2**32:
                cmake_args += ['-A', 'x64']
            build_args += ['--', '/m']
        else:
            cmake_args += ['-DCMAKE_BUILD_TYPE=' + cfg]
            build_args += ['--', '-j2']

        env = os.environ.copy()
        env['CXXFLAGS'] = '{} -DVERSION_INFO=\\"{}\\"'.format(env.get('CXXFLAGS', ''),
                                                              self.distribution.get_version())
        if not os.path.exists(self.build_temp):
            os.makedirs(self.build_temp)
        subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env)
        subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp)


setup(
    name='cmake_example',
    version='0.0.1',
    author='Dean Moldovan',
    author_email='dean0x7d@gmail.com',
    description='A test project using pybind11 and CMake',
    long_description='',
    ext_modules=[CMakeExtension('cmake_example')],
    cmdclass=dict(build_ext=CMakeBuild),
    zip_safe=False,
)
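The platform-dependent part of build_extension is easier to follow once isolated as a pure function. The sketch below mirrors that logic under the same assumptions as the script; the function name cmake_arguments and its parameters are illustrative and not part of cmake_example.

```python
import os


def cmake_arguments(extdir, python_executable, cfg, system, is_64bit):
    """Return (configure arguments, build arguments) for one extension."""
    # Trailing separator, as in the build_extension method of the script.
    if not extdir.endswith(os.path.sep):
        extdir += os.path.sep
    cmake_args = [
        "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=" + extdir,
        "-DPYTHON_EXECUTABLE=" + python_executable,
    ]
    build_args = ["--config", cfg]
    if system == "Windows":
        # On Windows the script sets a per-configuration output directory.
        cmake_args.append(
            "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}".format(cfg.upper(), extdir)
        )
        if is_64bit:
            cmake_args += ["-A", "x64"]
        build_args += ["--", "/m"]
    else:
        cmake_args.append("-DCMAKE_BUILD_TYPE=" + cfg)
        build_args += ["--", "-j2"]
    return cmake_args, build_args


configure, build = cmake_arguments(
    "build/lib", "/usr/bin/python3", "Release", "Linux", True
)
print(configure)
print(build)
```

Keeping this logic free of side effects lets you inspect the exact cmake invocation for each platform without running a build.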
You just need to add your CMakeLists.txt to your src/ folder and put your .cpp file in a folder of your choice within the src/ folder (here a nested src/ folder).
cmake_minimum_required(VERSION 2.8.12)
project(cmake_example)
add_subdirectory(pybind11)
pybind11_add_module(cmake_example src/main.cpp)
Finally, you want your main.cpp file to be included in your package (by default, only the .py files are included). Therefore, you have to write a MANIFEST.in file in your src/ folder:
include README.md LICENSE
global-include CMakeLists.txt *.cmake
recursive-include src *
recursive-include pybind11/include *.h
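One way to confirm that MANIFEST.in did its job is to list the content of the generated source archive and look for the non-Python files. A minimal sketch using the standard tarfile module; the helper names sdist_members and missing_suffixes are illustrative.

```python
import tarfile


def sdist_members(sdist_path):
    """Return the file names stored in a .tar.gz source archive."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return archive.getnames()


def missing_suffixes(names, suffixes):
    """Return the suffixes that match no name in the archive listing."""
    return [s for s in suffixes if not any(n.endswith(s) for n in names)]


# Example with a listing written by hand; on a real build you would pass
# the result of sdist_members() on your own archive in dist/ instead.
listing = ["pkg-0.0.1/setup.py", "pkg-0.0.1/CMakeLists.txt"]
print(missing_suffixes(listing, ["CMakeLists.txt", "main.cpp"]))  # ['main.cpp']
```

An empty result means every expected file made it into the archive.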
From now on we will use the example of the midr project (https://gitbio.ens-lyon.fr/LBMC/sbdm/midr)
Built Distribution vs Source Archive
Built Distribution
A Distribution format containing files and metadata that only need to be moved to the correct location on the target system, to be installed. Wheel is such a format, whereas distutils’ Source Distribution is not, in that it requires a build step before it can be installed. This format does not imply that Python files have to be precompiled (Wheel intentionally does not include compiled Python files).
Advantages:
- Quick to install
Disadvantages:
- Can be system specific (especially with C/C++ dependencies)
Source Archive
An archive containing the raw source code for a Release, prior to creation of a Source Distribution or Built Distribution.
Advantages:
- Easy to build on any system
Disadvantages:
- You have to compile everything with each installation
Manylinux project
Linux comes in many variants and flavors, such as Debian, CentOS, Fedora, and Arch Linux. Each of these may use slight variations in shared libraries, such as libncurses, and core C libraries, such as glibc.
If you’re writing a C/C++ extension, then this could create a problem. A source file written in C and compiled on Ubuntu Linux isn’t guaranteed to be executable on a CentOS machine or an Arch Linux distribution. Do you need to build a separate wheel for each and every Linux variant?
The goal of the manylinux project is to provide a convenient way to distribute binary Python extensions as wheels on Linux. This effort has produced PEP 513, which is further enhanced by PEP 571 defining the manylinux2010_x86_64 and manylinux2010_i686 platform tags. PEP 513 defined the manylinux1_x86_64 and manylinux1_i686 platform tags, and the wheels were built on CentOS 5. CentOS 5 reached End of Life (EOL) on March 31st, 2017, and thus PEP 571 was proposed.
This means that instead of a Built Distribution file like midr-1.3.9-cp38-cp38-linux_x86_64.whl, which won’t be accepted by PyPI, we will get a midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl file.
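The difference between the two files is carried by the last field of the file name, the platform tag. The sketch below classifies a wheel from that tag alone; platform_kind is an illustrative helper and the categories are a simplification for this practical.

```python
def platform_kind(wheel_filename):
    """Classify a wheel by the platform tag at the end of its file name."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel: " + wheel_filename)
    platform_tag = wheel_filename[: -len(".whl")].split("-")[-1]
    if platform_tag == "any":
        return "pure Python"
    if platform_tag.startswith("manylinux"):
        return "manylinux"
    if platform_tag.startswith("linux"):
        return "plain linux"
    return "other platform"


print(platform_kind("midr-1.3.9-cp38-cp38-linux_x86_64.whl"))  # plain linux
print(platform_kind("midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl"))  # manylinux
```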
For this we will build the package within a manylinux container (hosted on quay.io):
docker run -it --volume $(pwd):/root/ quay.io/pypa/manylinux1_x86_64
cd /root/
The image has different versions of Python installed in /opt/python/
/opt/python/cp38-cp38/bin/pip3 install cmake
PATH=$PATH:/opt/_internal/cpython-3.8.5/bin/
/opt/python/cp38-cp38/bin/pip wheel ./ -w output
This will produce a binary wheel in output/. However, this will still not be a manylinux wheel, since it is possible to build wheels that accidentally depend on other libraries.
The auditwheel tool will take that wheel, audit it, and copy it to a manylinux name:
auditwheel repair output/midr*whl -w output
INFO:auditwheel.main_repair:Repairing midr-1.3.9-cp38-cp38-linux_x86_64.whl
INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64
INFO:auditwheel.wheeltools:New filename tags: manylinux1_x86_64
INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp38-cp38-linux_x86_64
INFO:auditwheel.wheeltools:New WHEEL info tags: cp38-cp38-manylinux1_x86_64
INFO:auditwheel.main_repair:
Fixed-up wheel written to /root/output/midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl
Then we can exit the container and fix the ownership of the output folder (maybe use singularity next time?):
sudo chown -R $USER:$USER output
mv output/* dist/
twine upload dist/midr-1.3.9-cp38-cp38-manylinux1_x86_64.whl --skip-existing --verbose
GL & HF