Distribution package vs. import package
A number of different concepts are commonly referred to by the word "package". This page clarifies the differences between two distinct but related meanings in Python packaging, "distribution package" and "import package".
What's a distribution package?
A distribution package is a piece of software that you can install. Most of the time, this is synonymous with "project". When you type pip install pkg, or when you write dependencies = ["pkg"] in your pyproject.toml, pkg is the name of a distribution package. When you search or browse PyPI, the most widely known centralized source for installing Python libraries and tools, what you see is a list of distribution packages. Alternatively, the term "distribution package" can be used to refer to a specific file that contains a certain version of a project.
Note that in the Linux world, a "distribution package", most commonly abbreviated as "distro package" or just "package", is something provided by the system package manager of the Linux distribution, which is a different meaning.
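As a rough illustration of what "distribution package" means in practice, the sketch below lists the distribution packages installed in the current environment using the standard library's importlib.metadata. The helper name installed_distributions is hypothetical, and the output depends entirely on what happens to be installed.

```python
from importlib import metadata


def installed_distributions():
    """Return a {distribution name: version} mapping for this environment.

    Hypothetical helper for illustration only.
    """
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no usable name
            result[name] = dist.metadata.get("Version")
    return result


if __name__ == "__main__":
    # Print one "name==version" line per installed distribution package.
    for name, version in sorted(installed_distributions().items()):
        print(f"{name}=={version}")
```

The names printed here are the ones you would pass to pip install, which are not necessarily the names you would use in an import statement.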
What's an import package?
An import package is a Python module. Thus, when you write import pkg or from pkg import func in your Python code, pkg is the name of an import package. More precisely, import packages are special Python modules that can contain submodules. For example, the numpy package contains modules like numpy.linalg and numpy.fft. Usually, an import package is a directory on the file system, containing modules as .py files and subpackages as subdirectories.
You can use an import package as soon as you have installed a distribution package that provides it.
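The difference between a package and a plain module can be checked at runtime: Python treats any module that has a __path__ attribute as a package. The sketch below uses standard-library modules so that it runs without installing anything; the helper name is_import_package is hypothetical.

```python
import importlib


def is_import_package(name):
    """Return True if the module called `name` is a package.

    A module counts as a package when it has a __path__ attribute.
    Hypothetical helper for illustration only.
    """
    module = importlib.import_module(name)
    return hasattr(module, "__path__")


print(is_import_package("json"))          # True: json is a package
print(is_import_package("json.decoder"))  # False: a submodule inside json
print(is_import_package("math"))          # False: a non-package module
```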
What are the links between distribution packages and import packages?
Most of the time, a distribution package provides a single import package (or non-package module), with a matching name. For example, pip install numpy lets you import numpy.
However, this is only a convention. PyPI and other package indices do not enforce any relationship between the name of a distribution package and the import packages it provides. (A consequence of this is that you cannot blindly install the PyPI package foo if you see import foo; this may install an unintended, and potentially even malicious, package.)
A distribution package could provide an import package with a different name. An example of this is the popular Pillow library for image processing. Its distribution package name is Pillow, but it provides the import package PIL. This is for historical reasons: Pillow started as a fork of the PIL library, and thus kept the import name PIL so that existing PIL users could switch to Pillow with little effort. More generally, a fork of an existing library is a common reason for differing names between the distribution package and the import package.
On a given package index (like PyPI), distribution package names must be unique. On the other hand, import packages have no such requirement. Import packages with the same name can be provided by several distribution packages. Again, forks are a common reason for this.
Conversely, a distribution package can provide several import packages, although this is less common. An example is the attrs distribution package, which provides both an attrs import package with a newer API, and an attr import package with an older but supported API.
How do distribution package names and import package names compare?
Import packages should have valid Python identifiers as their name (the exact rules are found in the Python documentation) [2]. In particular, they use underscores _ as word separators and they are case-sensitive.
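A quick way to check whether a candidate import name follows these rules is str.isidentifier(). The sketch below also rejects reserved keywords, which is an addition beyond what the text above states; the helper name is_valid_import_name is hypothetical.

```python
import keyword


def is_valid_import_name(name):
    """Check a dotted import name such as "numpy.linalg".

    Every dot-separated part must be a valid identifier. Rejecting
    keywords is an extra assumption made by this sketch.
    """
    parts = name.split(".")
    return all(
        part.isidentifier() and not keyword.iskeyword(part)
        for part in parts
    )


print(is_valid_import_name("awesome_package"))  # True
print(is_valid_import_name("awesome-package"))  # False: hyphen not allowed
print(is_valid_import_name("numpy.linalg"))     # True
```

Because identifiers are case-sensitive, awesome_package and Awesome_Package would be two different import packages.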
On the other hand, distribution packages can use hyphens - or underscores _. They can also contain dots ., which is sometimes used for packaging a subpackage of a namespace package. For most purposes, they are insensitive to case and to - vs. _ differences, e.g., pip install Awesome_Package is the same as pip install awesome-package (the precise rules are given in the name normalization specification).
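The normalization rule can be sketched in a few lines: runs of hyphens, underscores, and dots are collapsed into a single hyphen, and the result is lowercased. The function name normalize is chosen here for illustration.

```python
import re


def normalize(name):
    """Normalize a distribution package name for comparison.

    Runs of "-", "_" and "." become a single "-", then the name is
    lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("Awesome_Package"))  # awesome-package
print(normalize("awesome-package"))  # awesome-package
print(normalize("Zope.Interface"))   # zope-interface
```

Two distribution names refer to the same project on an index when their normalized forms are equal.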