Source distribution format
The current standard source distribution format is identified by the presence of a pyproject.toml file in the distribution archive. The layout of such a distribution was originally specified in PEP 517 and is formally documented here.
There is also the legacy source distribution format, implicitly defined by the behaviour of the distutils module in the standard library when executing setup.py sdist. This document does not attempt to standardise this format, except to note that if a legacy source distribution contains a PKG-INFO file using metadata version 2.2 or later, then it MUST follow the rules applicable to source distributions defined in the metadata specification.
Source distributions are also known as sdists for short.
Source trees
A source tree is a collection of files and directories -- like a version control system checkout -- which contains a pyproject.toml file that can be used to build a source distribution from the contained files and directories. PEP 517 and PEP 518 specify what pyproject.toml must contain for something to be deemed a source tree.
Source distribution file name
The file name of an sdist was standardised in PEP 625. The file name must be in the form {name}-{version}.tar.gz, where {name} is normalised according to the same rules as for binary distributions (see the binary distribution format specification), and {version} is the canonicalized form of the project version (see the version specifiers specification).
The name and version components of the filename MUST match the values stored in the metadata contained in the file.
Code that produces a source distribution file MUST give the file a name that matches this specification. This includes the build_sdist hook of a build backend.
Code that processes source distribution files MAY recognise source distribution files by the .tar.gz suffix and the presence of precisely one hyphen in the filename. Code that does this may then use the distribution name and version from the filename without further verification.
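The recognition rule above (a .tar.gz suffix and precisely one hyphen) can be sketched as follows. This is a minimal illustration, not part of the specification; the function name parse_sdist_filename is hypothetical, and the sketch does not check that the name and version are in normalised form.

```python
def parse_sdist_filename(filename: str) -> tuple[str, str]:
    """Split a standard sdist file name into (name, version).

    Hypothetical helper: raises ValueError if the file name does not
    look like {name}-{version}.tar.gz with exactly one hyphen.
    """
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz file: {filename!r}")
    stem = filename[: -len(suffix)]
    # Precisely one hyphen separates the name from the version.
    if stem.count("-") != 1:
        raise ValueError(f"expected exactly one hyphen: {filename!r}")
    name, version = stem.split("-")
    if not name or not version:
        raise ValueError(f"empty name or version: {filename!r}")
    return name, version
```

A file name with more than one hyphen is rejected here rather than guessed at, since such a name does not meet the recognition rule and may come from the legacy format.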
Source distribution file format
A .tar.gz source distribution (sdist) contains a single top-level directory called {name}-{version} (e.g. foo-1.0), containing the source files of the package. The name and version MUST match the metadata stored in the file. This directory must also contain a pyproject.toml in the format defined in the pyproject.toml specification, and a PKG-INFO file containing metadata in the format described in the core metadata specification. The metadata MUST conform to at least version 2.2 of the metadata specification.
No other content of an sdist is required or defined. Build systems can store whatever information they need in the sdist to build the project.
The tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names. In particular, source distribution files must be readable using the standard library tarfile module with the open flag 'r:gz'.
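As an illustration of the layout requirements above, the following sketch opens an archive with the 'r:gz' flag and checks for the single top-level directory, pyproject.toml and PKG-INFO. The function name check_sdist_layout is hypothetical, and the sketch deliberately does not parse PKG-INFO or verify the metadata version.

```python
import tarfile


def check_sdist_layout(path: str, name: str, version: str) -> list[str]:
    """Return a list of layout problems found (empty if none).

    Hypothetical helper: only inspects member names, assuming they are
    stored without a leading "./" prefix.
    """
    top = f"{name}-{version}"
    # The sdist must be readable with the 'r:gz' open flag.
    with tarfile.open(path, mode="r:gz") as archive:
        names = archive.getnames()

    problems = []
    top_level = {member.split("/")[0] for member in names}
    if top_level != {top}:
        problems.append(f"expected a single top-level directory {top!r}")
    for required in ("pyproject.toml", "PKG-INFO"):
        if f"{top}/{required}" not in names:
            problems.append(f"missing {top}/{required}")
    return problems
```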
Source distribution archive features
Because extracting tar files as-is is dangerous, and the results are platform-specific, archive features of source distributions are limited.
Unpacking with the data filter
When extracting a source distribution, tools MUST either use tarfile.data_filter() (e.g. TarFile.extractall(..., filter='data')), OR follow the Unpacking without the data filter section below.
As an exception, on Python interpreters where hasattr(tarfile, 'data_filter') is false (see PEP 706), tools that normally use that filter (directly or indirectly) MAY warn the user and ignore this specification. The trade-off between usability (e.g. fully trusting the archive) and security (e.g. refusing to unpack) is left up to the tool in this case.
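A minimal sketch of the rule above: use the data filter when the interpreter provides it, and otherwise make an explicit choice. The function name extract_sdist is hypothetical, and refusing to unpack on interpreters without the filter is only one of the permitted choices; a tool could equally warn and proceed.

```python
import tarfile


def extract_sdist(archive_path: str, destination: str) -> None:
    """Extract an sdist into destination using the 'data' filter.

    Hypothetical helper: on interpreters without tarfile.data_filter it
    takes the security side of the trade-off and refuses to unpack.
    """
    if not hasattr(tarfile, "data_filter"):
        raise RuntimeError(
            "this Python lacks tarfile.data_filter (PEP 706); "
            "refusing to unpack the sdist"
        )
    with tarfile.open(archive_path, mode="r:gz") as archive:
        archive.extractall(destination, filter="data")
```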
Unpacking without the data filter
Tools that do not use the data filter directly (e.g. for backwards compatibility, allowing additional features, or not using Python) MUST follow this section. (At the time of this writing, the data filter also follows this section, but it may get out of sync in the future.)
The following files are invalid in an sdist archive. Upon encountering such an entry, tools SHOULD notify the user, MUST NOT unpack the entry, and MAY abort with a failure:
Files that would be placed outside the destination directory.
Links (symbolic or hard) pointing outside the destination directory.
Device files (including pipes).
The following are also invalid. Tools MAY treat them as above, but are NOT REQUIRED to do so:
Files with a .. component in the filename or link target.
Links pointing to a file that is not part of the archive.
Tools MAY unpack links (symbolic or hard) as regular files, using content from the archive.
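The three mandatory rejections above (files outside the destination, links pointing outside it, and device files) could be checked per archive member roughly as follows. This is a sketch under stated assumptions, not the logic of the data filter itself: the names rejection_reason and _is_inside are hypothetical, POSIX-style paths are assumed, and the optional checks (a .. component, links to files missing from the archive) are not implemented.

```python
import os
import tarfile
from typing import Optional


def _is_inside(base: str, path: str) -> bool:
    """True if path resolves to base itself or to a location under it."""
    base = os.path.realpath(base)
    path = os.path.realpath(path)
    return os.path.commonpath([base, path]) == base


def rejection_reason(member: tarfile.TarInfo, dest: str) -> Optional[str]:
    """Return why a member must not be unpacked, or None if it passes."""
    name = member.name.lstrip("/")  # leading slashes are dropped
    if not _is_inside(dest, os.path.join(dest, name)):
        return "file would be placed outside the destination directory"
    if member.issym():
        # Symbolic link targets are relative to the link's own directory.
        target = os.path.join(dest, os.path.dirname(name), member.linkname)
        if not _is_inside(dest, target):
            return "symbolic link points outside the destination directory"
    elif member.islnk():
        # Hard link targets are relative to the archive root.
        target = os.path.join(dest, member.linkname.lstrip("/"))
        if not _is_inside(dest, target):
            return "hard link points outside the destination directory"
    if member.isdev():
        # isdev() covers character devices, block devices and FIFOs.
        return "device file or pipe"
    return None
```

A tool would call this for each member before unpacking it, notify the user when a reason is returned, skip that entry, and optionally abort.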
When extracting sdist archives:
Leading slashes in file names MUST be dropped. (This is nowadays standard behaviour for tar unpacking.)
For each mode (Unix permission) bit, tools MUST either:
    use the platform's default for a new file/directory (respectively),
    set the bit according to the archive, or
    use the bit from rw-r--r-- (0o644) for non-executable files or rwxr-xr-x (0o755) for executable files and directories.
High mode bits (setuid, setgid, sticky) MUST be cleared.
It is RECOMMENDED to preserve the user executable bit.
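The mode-bit rules above can be sketched with two small helpers. Both names are hypothetical. The first follows the "set the bit according to the archive" option while clearing the high bits; the second follows the 0o644/0o755 option, assuming that "executable" means the user executable bit is set in the archive.

```python
import stat


def mode_from_archive(mode: int) -> int:
    """Keep the archive's permission bits, clearing setuid, setgid and sticky."""
    return mode & 0o777


def simplified_mode(mode: int, is_dir: bool) -> int:
    """Use 0o755 for directories and executable files, 0o644 otherwise.

    Assumption: a file counts as executable when its user executable
    bit (stat.S_IXUSR) is set in the archive.
    """
    if is_dir or mode & stat.S_IXUSR:
        return 0o755
    return 0o644
```

Either helper preserves the user executable bit, as recommended, and neither can return a mode with the high bits set.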
Further hints
Tool authors are encouraged to consider how the hints for further verification in the tarfile documentation apply to their tool.
History
November 2020: The original version of this specification was approved through PEP 643.
July 2021: Defined what a source tree is.
September 2022: The filename of a source distribution was standardized through PEP 625.
August 2023: Source distribution archive features were standardized through PEP 721.