邻接表关系

Adjacency List Relationships

邻接列表 模式是一种常见的关系模式,其中一个表包含对自身的外键引用,换句话说就是一个 自引用关系 。这是在平面表中表示层次数据的最常见方法。其他方法包括 嵌套集合 ,有时称为“修改的先序”,以及 物化路径 。尽管修改的先序在SQL查询中的流畅性上具有吸引力,但由于并发性、降低复杂性以及修改的先序对于能够完全加载子树到应用空间的应用程序没有太大优势的原因,邻接列表模型可能是大多数层次存储需求中最合适的模式。

参见

本节详细介绍了单表版本的自引用关系。有关使用第二个表作为关联表的自引用关系,请参阅 自引用多对多关系 部分。

在本例中,我们将使用一个名为 Node 的映射类来表示树结构:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    children = relationship("Node")

使用这种结构,一个如下的图:

root --+---> child1
       +---> child2 --+--> subchild1
       |              +--> subchild2
       +---> child3

将表示为如下数据:

id       parent_id     data
---      -------       ----
1        NULL          root
2        1             child1
3        1             child2
4        3             subchild1
5        3             subchild2
6        1             child3

这里的 relationship() 配置与“普通”一对多关系的工作方式相同,唯一的例外是“方向”,即关系是一对多还是多对一,默认情况下假定为一对多。要建立多对一关系,需要添加一个额外的指令,称为 relationship.remote_side,它是一个 Column 或一组 Column 对象,指示那些应被视为“远程”的列:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    parent = relationship("Node", remote_side=[id])

在上面的代码中, id 列被应用为 parent relationship()relationship.remote_side,从而将 parent_id 确立为“本地”端,关系然后表现为多对一。

一如既往,可以使用两个由 relationship.back_populates 连接的 relationship() 构造将两个方向组合成双向关系:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    children = relationship("Node", back_populates="parent")
    parent = relationship("Node", back_populates="children", remote_side=[id])

参见

邻接表 - 更新后的 SQLAlchemy 2.0 实例

The adjacency list pattern is a common relational pattern whereby a table contains a foreign key reference to itself, in other words is a self referential relationship. This is the most common way to represent hierarchical data in flat tables. Other methods include nested sets, sometimes called “modified preorder”, as well as materialized path. Despite the appeal that modified preorder has when evaluated for its fluency within SQL queries, the adjacency list model is probably the most appropriate pattern for the large majority of hierarchical storage needs, for reasons of concurrency, reduced complexity, and that modified preorder has little advantage over an application which can fully load subtrees into the application space.

参见

This section details the single-table version of a self-referential relationship. For a self-referential relationship that uses a second table as an association table, see the section 自引用多对多关系.

In this example, we’ll work with a single mapped class called Node, representing a tree structure:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    children = relationship("Node")

With this structure, a graph such as the following:

root --+---> child1
    +---> child2 --+--> subchild1
    |              +--> subchild2
    +---> child3

Would be represented with data such as:

id       parent_id     data
---      -------       ----
1        NULL          root
2        1             child1
3        1             child2
4        3             subchild1
5        3             subchild2
6        1             child3

The relationship() configuration here works in the same way as a “normal” one-to-many relationship, with the exception that the “direction”, i.e. whether the relationship is one-to-many or many-to-one, is assumed by default to be one-to-many. To establish the relationship as many-to-one, an extra directive is added known as relationship.remote_side, which is a Column or collection of Column objects that indicate those which should be considered to be “remote”:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    parent = relationship("Node", remote_side=[id])

Where above, the id column is applied as the relationship.remote_side of the parent relationship(), thus establishing parent_id as the “local” side, and the relationship then behaves as a many-to-one.

As always, both directions can be combined into a bidirectional relationship using two relationship() constructs linked by relationship.back_populates:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    children = relationship("Node", back_populates="parent")
    parent = relationship("Node", back_populates="children", remote_side=[id])

参见

邻接表 - working example, updated for SQLAlchemy 2.0

复合邻接表

Composite Adjacency Lists

邻接列表关系的一个子类别是一个罕见的情况,其中一个特定列在连接条件的“本地(local)”和“远程(remote)”两侧都存在。下面的 Folder 类是一个示例;使用复合主键, account_id 列指向自身,以指示与父文件夹属于同一帐户的子文件夹;而 folder_id 则指向该帐户中的特定文件夹:

class Folder(Base):
    __tablename__ = "folder"
    __table_args__ = (
        ForeignKeyConstraint(
            ["account_id", "parent_id"], ["folder.account_id", "folder.folder_id"]
        ),
    )

    account_id = mapped_column(Integer, primary_key=True)
    folder_id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer)
    name = mapped_column(String)

    parent_folder = relationship(
        "Folder", back_populates="child_folders", remote_side=[account_id, folder_id]
    )

    child_folders = relationship("Folder", back_populates="parent_folder")

在上面的代码中,我们将 account_id 传递到 relationship.remote_side 列表中。relationship() 识别出这里的 account_id 列在两侧都有,并将“远程”列与 folder_id 列对齐,后者被识别为仅存在于“远程”一侧。

A sub-category of the adjacency list relationship is the rare case where a particular column is present on both the “local” and “remote” side of the join condition. An example is the Folder class below; using a composite primary key, the account_id column refers to itself, to indicate sub folders which are within the same account as that of the parent; while folder_id refers to a specific folder within that account:

class Folder(Base):
    __tablename__ = "folder"
    __table_args__ = (
        ForeignKeyConstraint(
            ["account_id", "parent_id"], ["folder.account_id", "folder.folder_id"]
        ),
    )

    account_id = mapped_column(Integer, primary_key=True)
    folder_id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer)
    name = mapped_column(String)

    parent_folder = relationship(
        "Folder", back_populates="child_folders", remote_side=[account_id, folder_id]
    )

    child_folders = relationship("Folder", back_populates="parent_folder")

Above, we pass account_id into the relationship.remote_side list. relationship() recognizes that the account_id column here is on both sides, and aligns the “remote” column along with the folder_id column, which it recognizes as uniquely present on the “remote” side.

自引用查询策略

Self-Referential Query Strategies

自引用结构的查询与其他查询一样工作:

# 获取所有名为 'child2' 的节点
session.scalars(select(Node).where(Node.data == "child2"))

但是,当尝试沿着树的一个级别到下一个级别的外键进行连接时,需要特别注意。在 SQL 中,从一个表到自身的连接要求表达式的至少一侧被“别名化(aliased)”,以便可以明确引用它。

回想一下 ORM 教程中的 选择 ORM 别名aliased() 构造通常用于提供 ORM 实体的“别名”。使用这种技术从 Node 到自身的连接如下所示:

from sqlalchemy.orm import aliased

nodealias = aliased(Node)
session.scalars(
    select(Node)
    .where(Node.data == "subchild1")
    .join(Node.parent.of_type(nodealias))
    .where(nodealias.data == "child2")
).all()
SELECT node.id AS node_id, node.parent_id AS node_parent_id, node.data AS node_data FROM node JOIN node AS node_1 ON node.parent_id = node_1.id WHERE node.data = ? AND node_1.data = ? ['subchild1', 'child2']

Querying of self-referential structures works like any other query:

# get all nodes named 'child2'
session.scalars(select(Node).where(Node.data == "child2"))

However extra care is needed when attempting to join along the foreign key from one level of the tree to the next. In SQL, a join from a table to itself requires that at least one side of the expression be “aliased” so that it can be unambiguously referred to.

Recall from 选择 ORM 别名 in the ORM tutorial that the aliased() construct is normally used to provide an “alias” of an ORM entity. Joining from Node to itself using this technique looks like:

from sqlalchemy.orm import aliased

nodealias = aliased(Node)
session.scalars(
    select(Node)
    .where(Node.data == "subchild1")
    .join(Node.parent.of_type(nodealias))
    .where(nodealias.data == "child2")
).all()
SELECT node.id AS node_id, node.parent_id AS node_parent_id, node.data AS node_data FROM node JOIN node AS node_1 ON node.parent_id = node_1.id WHERE node.data = ? AND node_1.data = ? ['subchild1', 'child2']

配置自引用预加载

Configuring Self-Referential Eager Loading

在正常查询操作期间,关系的预加载(eager loading)使用从父表到子表的连接或外连接进行,这样父表及其直接子集合或引用可以从单个 SQL 语句中填充,或通过第二个语句填充所有直接子集合。SQLAlchemy 的连接和子查询预加载在连接到相关项时始终使用别名表,因此与自引用连接兼容。然而,要对自引用关系使用预加载,SQLAlchemy 需要知道应该连接和/或查询多深的层次;否则,根本不会进行预加载。此深度设置通过 relationships.join_depth 配置:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    children = relationship("Node", lazy="joined", join_depth=2)


session.scalars(select(Node)).all()
SELECT node_1.id AS node_1_id, node_1.parent_id AS node_1_parent_id, node_1.data AS node_1_data, node_2.id AS node_2_id, node_2.parent_id AS node_2_parent_id, node_2.data AS node_2_data, node.id AS node_id, node.parent_id AS node_parent_id, node.data AS node_data FROM node LEFT OUTER JOIN node AS node_2 ON node.id = node_2.parent_id LEFT OUTER JOIN node AS node_1 ON node_2.id = node_1.parent_id []

Eager loading of relationships occurs using joins or outerjoins from parent to child table during a normal query operation, such that the parent and its immediate child collection or reference can be populated from a single SQL statement, or a second statement for all immediate child collections. SQLAlchemy’s joined and subquery eager loading use aliased tables in all cases when joining to related items, so are compatible with self-referential joining. However, to use eager loading with a self-referential relationship, SQLAlchemy needs to be told how many levels deep it should join and/or query; otherwise the eager load will not take place at all. This depth setting is configured via relationships.join_depth:

class Node(Base):
    __tablename__ = "node"
    id = mapped_column(Integer, primary_key=True)
    parent_id = mapped_column(Integer, ForeignKey("node.id"))
    data = mapped_column(String(50))
    children = relationship("Node", lazy="joined", join_depth=2)


session.scalars(select(Node)).all()
SELECT node_1.id AS node_1_id, node_1.parent_id AS node_1_parent_id, node_1.data AS node_1_data, node_2.id AS node_2_id, node_2.parent_id AS node_2_parent_id, node_2.data AS node_2_data, node.id AS node_id, node.parent_id AS node_parent_id, node.data AS node_data FROM node LEFT OUTER JOIN node AS node_2 ON node.id = node_2.parent_id LEFT OUTER JOIN node AS node_1 ON node_2.id = node_1.parent_id []