6.18. AVL树实现¶
6.18. AVL Tree Implementation
现在我们已经演示了保持 AVL 树平衡将大大提高性能,让我们看看如何扩展插入新键的过程。由于所有新键都作为叶节点插入树中,我们知道新叶的平衡因子为零,因此对刚插入的节点没有新的要求。但一旦新叶被添加,我们必须更新其父节点的平衡因子。新叶对父节点平衡因子的影响取决于叶节点是左孩子还是右孩子。如果新节点是右孩子,父节点的平衡因子将减少 1。如果新节点是左孩子,则父节点的平衡因子将增加 1。这个规则可以递归应用于新节点的祖父节点,并可能应用于每个祖先,一直到树的根节点。由于这是一个递归过程,让我们检查更新平衡因子的两个基本情况:
- 递归调用已到达树的根节点。
- 父节点的平衡因子已调整为零。你应该确认,一旦子树的平衡因子为零,那么其祖先节点的平衡状态不会改变。
我们将实现 AVL 树作为 BinarySearchTree
的子类。首先,我们将重写 _put
方法,并编写一个新的 update_balance
辅助方法。这些方法见 Listing 1
。你会注意到,_put
的定义与简单的二叉搜索树完全相同,只是在第 9 行和第 17 行添加了对 update_balance
的调用。
def _put(self, key, value, current_node):
if key < current_node.key:
if current_node.left_child:
self._put(key, value, current_node.left_child)
else:
current_node.left_child = AVLTreeNode(
key, value, 0, parent=current_node
)
self.update_balance(current_node.left_child)
else:
if current_node.right_child:
self._put(key, value, current_node.right_child)
else:
current_node.right_child = AVLTreeNode(
key, value, 0, parent=current_node
)
self.update_balance(current_node.right_child)
def update_balance(self, node):
if node.balance_factor > 1 or node.balance_factor < -1:
self.rebalance(node)
return
if node.parent:
if node.is_left_child():
node.parent.balance_factor += 1
elif node.is_right_child():
node.parent.balance_factor -= 1
if node.parent.balance_factor != 0:
self.update_balance(node.parent)
新的 update_balance
方法是大部分工作的核心。它实现了我们刚才描述的递归过程。首先检查当前节点是否不平衡到需要重新平衡(第 20 行)。如果是这种情况,则进行重新平衡,不需要进一步更新父节点。如果当前节点不需要重新平衡,则调整父节点的平衡因子。如果父节点的平衡因子非零,则算法继续递归地调用 update_balance
来更新树中向上的节点,直到根节点。
当需要对树进行重新平衡时,我们怎么做呢?高效的重新平衡是使 AVL 树正常工作而不牺牲性能的关键。为了将 AVL 树重新平衡,我们将对树执行一个或多个旋转操作。
要理解旋转是什么,让我们看看一个非常简单的例子。考虑 Figure 3
左侧的树。这个树不平衡,平衡因子为 -2。为了将这棵树平衡过来,我们将围绕节点 A 的子树进行左旋转。
进行左旋转的基本步骤如下:
- 提升右孩子(B)为子树的根。
- 将旧根(A)移动到新根的左孩子。
- 如果新根(B)已经有左孩子,则将其设置为新左孩子(A)的右孩子。注意:由于新根(B)是 A 的右孩子,因此 A 的右孩子在此时必定为空。这允许我们添加一个新节点作为右孩子,而不需要进一步考虑。
虽然这个过程在概念上比较简单,但代码的细节有点复杂,因为我们需要按正确的顺序移动所有内容,以确保二叉搜索树的所有属性都得到保留。此外,我们需要确保适当地更新所有父指针。
让我们看一个稍微复杂的树来说明右旋转。Figure 4
的左侧展示了一个左重且根节点的平衡因子为 2 的树。
进行右旋转的基本步骤如下:
- 提升左孩子(C)为子树的根。
- 将旧根(E)移动到新根的右孩子。
- 如果新根(C)已经有右孩子(D),则将其设置为新右孩子(E)的左孩子。注意:由于新根(C)是 E 的左孩子,因此 E 的左孩子在此时必定为空。这允许我们添加一个新节点作为左孩子,而不需要进一步考虑。
现在你已经了解了旋转的基本概念,接下来我们来看代码。Listing 2
展示了左旋转的代码(rotate_right
方法与 rotate_left
对称,因此我们将留给你研究 rotate_right
的代码)。在第 2 行,我们创建了一个临时变量来跟踪子树的新根。如前所述,新根是之前根的右孩子。现在已经将右孩子的引用存储在这个临时变量中,我们将旧根的右孩子替换为新根的左孩子。
下一步是调整两个节点的父指针。如果 new_root
有左孩子,则新左孩子的父节点变为旧根。新根的父节点设置为旧根的父节点。如果旧根是整棵树的根,则我们必须将树的根指针设置为这个新根。否则,如果旧根是左孩子,则我们将左孩子的父节点更改为新根;否则,我们将右孩子的父节点更改为新根(第 10-13 行)。最后,我们将旧根的父节点设置为新根。这是复杂的记账工作,因此我们鼓励你在查看 Figure 3
的同时跟踪这个函数。
def rotate_left(self, rotation_root):
new_root = rotation_root.right_child
rotation_root.right_child = new_root.left_child
if new_root.left_child:
new_root.left_child.parent = rotation_root
new_root.parent = rotation_root.parent
if rotation_root.is_root():
self._root = new_root
else:
if rotation_root.is_left_child():
rotation_root.parent.left_child = new_root
else:
rotation_root.parent.right_child = new_root
new_root.left_child = rotation_root
rotation_root.parent = new_root
rotation_root.balance_factor = (
rotation_root.balance_factor + 1 - min(new_root.balance_factor, 0)
)
new_root.balance_factor = (
new_root.balance_factor + 1 + max(rotation_root.balance_factor, 0)
)
最后,第 16-21 行需要一些解释。在这些行中,我们更新了旧根和新根的平衡因子。由于所有其他移动涉及整个子树的移动,因此所有其他节点的平衡因子都不会受到旋转的影响。但我们如何在不完全重新计算新子树高度的情况下更新平衡因子?Figure 5
和下面的推导应该能让你相信这些行是正确的。
Figure 5
展示了左旋转。B 和 D 是关键节点,A、C、E 是它们的子树。令 \(h_x\) 表示以节点 \(x\) 为根的子树的高度。根据定义,我们知道以下内容:
\(\begin{align} new\_bal(B) &= h_A - h_C \\ old\_bal(B) &= h_A - h_D \end{align}\)
但我们知道 D 的旧高度也可以表示为 \(1 + max(h_C, h_E)\),即 D 的高度比其两个孩子的最大高度多一。记住,\(h_C\) 和 \(h_E\) 没有变化。因此,让我们将其代入第二个方程,得到:
\(old\_bal(B) = h_A - (1 + max(h_C,h_E))\)
然后减去两个方程。以下步骤进行减法,并使用一些代数简化 \(new\_bal(B)\) 的方程。
$\begin{align} new_bal(B) - old_bal(B) &= h_A - h_C - (h_A - (1 + max(h_C,h_E))) \ new_bal(B) - old_bal(B) &=
h_A - h_C - h_A + (1 + max(h_C,h_E)) \ new_bal(B) - old_bal(B) &= h_A - h_A + 1 + max(h_C,h_E) - h_C \ new_bal(B) - old_bal(B) &= 1 + max(h_C,h_E) - h_C \end{align}$
接下来我们将 \(old\_bal(B)\) 移到方程的右侧,并利用 \(max(a,b)-c = max(a-c, b-c)\)。
\(new\_bal(B) = old\_bal(B) + 1 + max(h_C - h_C ,h_E - h_C)\)
但 \(h_E - h_C\) 与 \(-old\_bal(D)\) 相同。所以我们可以使用另一个恒等式 \(max(-a,-b) = -min(a,b)\)。所以我们可以使用以下步骤完成 \(new\_bal(B)\) 的推导:
\(\begin{align} new\_bal(B) &= old\_bal(B) + 1 + max(0 , -old\_bal(D)) \\ new\_bal(B) &= old\_bal(B) + 1 - min(0 , old\_bal(D)) \end{align}\)
现在我们已经将所有部分转换为已知的形式。如果我们记住 B 是 rotation_root
,而 D 是 new_root
,那么我们可以看到这正好对应于 Listing 2
第 16-18 行的声明,即:
rotation_root.balance_factor = (
rotation_root.balance_factor + 1 - min(new_root.balance_factor, 0)
)
类似的推导也可以给出右旋转后的节点 D 的平衡因子的方程。我们留给你作为练习。
现在你可能认为我们已经完成了。我们知道如何进行左旋转和右旋转,也知道何时进行左旋转或右旋转。但看看 Figure 6
。由于节点 A 的平衡因子为 -2,我们应该进行左旋转。但当我们在 A 周围进行左旋转时会发生什么呢?
Figure 7 <fig_badrotate>
向我们展示了,在左旋转之后,我们现在在另一个方向上不平衡。如果我们进行右旋转来纠正这种情况,我们将重新回到原点。
为了解决这个问题,我们必须使用以下规则集:
- 如果子树需要左旋转来使其平衡,首先检查右孩子的平衡因子。如果右孩子是左重的,则在右孩子上先进行右旋转,然后再进行原始的左旋转。
- 如果子树需要右旋转来使其平衡,首先检查左孩子的平衡因子。如果左孩子是右重的,则在左孩子上先进行左旋转,然后再进行原始的右旋转。
Figure 8
展示了这些规则如何解决 Figure 6
和 Figure 7
中遇到的困境。首先在节点 C 上进行右旋转,将树放置在一个位置,在 A 上进行左旋转将整个子树重新平衡。
实现这些规则的代码可以在我们的 rebalance
方法中找到,如 Listing 3
所示。上面的规则第 1 条由第 2 行开始的 if
语句实现。规则第 2 条由第 8 行开始的 elif
语句实现。
Listing 3 | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
discussion questions
让你有机会对需要先进行左旋转后进行右旋转的树进行重新平衡。此外,讨论问题还提供了机会对比 Figure 8
中的树进行一些更复杂的重新平衡。
通过始终保持树的平衡,我们可以确保 get
方法的运行时间为 \(O(\log_2(n))\)。但问题是我们的 put
方法的代价是多少?让我们分解 put
执行的操作。由于新节点作为叶子节点插入,更新所有父节点的平衡因子将需要最多 \(log_2(n)\) 次操作,每个树的层级一次。如果发现子树不平衡,则需要最多两个旋转来将树重新平衡。但每次旋转的时间复杂度是 \(O(1)\),因此即使是 put
操作也保持在 \(O(\log_2(n))\)。
此时,我们已经实现了一个功能完整的 AVL 树,除非你需要删除节点的能力。我们将节点的删除及其后续的更新和重新平衡留给你作为练习。
Now that we have demonstrated that keeping an AVL tree in balance is going to be a big performance improvement, let's look at how we will augment the procedure to insert a new key into the tree. Since all new keys are inserted into the tree as leaf nodes and we know that the balance factor for a new leaf is zero, there are no new requirements for the node that has just been inserted. But once the new leaf is added, we must update the balance factor of its parent. How this new leaf affects the parent’s balance factor depends on whether the leaf node is a left child or a right child. If the new node is a right child, the balance factor of the parent will be reduced by one. If the new node is a left child, then the balance factor of the parent will be increased by one. This rule can be applied recursively to the grandparent of the new node, and possibly to every ancestor, all the way up to the root of the tree. Since this is a recursive procedure, let's examine the two base cases for updating balance factors:
- The recursive call has reached the root of the tree.
- The balance factor of the parent has been adjusted to zero. You should convince yourself that once a subtree has a balance factor of zero, then the balance of its ancestor nodes does not change.
We will implement the AVL tree as a subclass of BinarySearchTree
. To begin, we will override the _put
method and write a new update_balance
helper method. These methods are shown in Listing 1
. You will notice that the definition for _put
is exactly the same as in simple binary search trees except for the addition of the calls to update_balance
on lines 9 and 17.
def _put(self, key, value, current_node):
if key < current_node.key:
if current_node.left_child:
self._put(key, value, current_node.left_child)
else:
current_node.left_child = AVLTreeNode(
key, value, 0, parent=current_node
)
self.update_balance(current_node.left_child)
else:
if current_node.right_child:
self._put(key, value, current_node.right_child)
else:
current_node.right_child = AVLTreeNode(
key, value, 0, parent=current_node
)
self.update_balance(current_node.right_child)
def update_balance(self, node):
if node.balance_factor > 1 or node.balance_factor < -1:
self.rebalance(node)
return
if node.parent:
if node.is_left_child():
node.parent.balance_factor += 1
elif node.is_right_child():
node.parent.balance_factor -= 1
if node.parent.balance_factor != 0:
self.update_balance(node.parent)
The new update_balance
method is where most of the work is done. This implements the recursive procedure we just described. It first checks to see if the current node is out of balance enough to require rebalancing (line 20). If that is the case then the rebalancing is done and no further updating to parents is required. If the current node does not require rebalancing then the balance factor of the parent is adjusted. If the balance factor of the parent is nonzero then the algorithm continues to work its way up the tree toward the root by recursively calling update_balance
on the parent.
When a rebalancing of the tree is necessary, how do we do it? Efficient rebalancing is the key to making the AVL Tree work well without sacrificing performance. In order to bring an AVL Tree back into balance, we will perform one or more rotations on the tree.
To understand what a rotation is, let's look at a very simple example. Consider the tree in the left half of Figure 3
. This tree is out of balance with a balance factor of -2. To bring this tree into balance we will use a left rotation around the subtree rooted at node A.
To perform a left rotation we essentially do the following:
- Promote the right child (B) to be the root of the subtree.
- Move the old root (A) to be the left child of the new root.
- If new root (B) already has a left child, then make it the right child of the new left child (A). Note: since the new root (B) was the right child of A, the right child of A is guaranteed to be empty at this point. This allows us to add a new node as the right child without any further consideration.
While this procedure is fairly easy in concept, the details of the code are a bit tricky since we need to move things around in just the right order so that all properties of a binary search tree are preserved. Furthermore, we need to make sure to update all of the parent pointers appropriately.
Let's look at a slightly more complicated tree to illustrate the right rotation. The left side of Figure 4
shows a tree that is left-heavy and with a balance factor of 2 at the root.
To perform a right rotation we essentially do the following:
- Promote the left child (C) to be the root of the subtree.
- Move the old root (E) to be the right child of the new root.
- If the new root (C) already has a right child (D) then make it the left child of the new right child (E). Note: since the new root (C) was the left child of E, the left child of E is guaranteed to be empty at this point. This allows us to add a new node as the left child without any further consideration.
Now that you have seen the rotations and have the basic idea of how a rotation works let us look at the code. Listing 2
shows the code for the left rotation (the rotate_right
method is symmetrical to rotate_left
so we will leave it to you to study the code for rotate_right
). In line 2 we create a temporary variable to keep track of the new root of the subtree. As we said before, the new root is the right child of the previous root. Now that a reference to the right child has been stored in this temporary variable, we replace the right child of the old root with the left child of the new.
The next step is to adjust the parent pointers of the two nodes. If new_root
has a left child then the new parent of the left child becomes the old root. The parent of the new root is set to the parent of the old root. If the old root was the root of the entire tree then we must set the root of the tree to point to this new root. Otherwise, if the old root is a left child then we change the parent of the left child to point to the new root; otherwise we change the parent of the right child to point to the new root. (lines 10-13). Finally we set the parent of the old root to be the new root. This is a lot of complicated bookkeeping, so we encourage you to trace through this function while looking at Figure 3
.
def rotate_left(self, rotation_root):
new_root = rotation_root.right_child
rotation_root.right_child = new_root.left_child
if new_root.left_child:
new_root.left_child.parent = rotation_root
new_root.parent = rotation_root.parent
if rotation_root.is_root():
self._root = new_root
else:
if rotation_root.is_left_child():
rotation_root.parent.left_child = new_root
else:
rotation_root.parent.right_child = new_root
new_root.left_child = rotation_root
rotation_root.parent = new_root
rotation_root.balance_factor = (
rotation_root.balance_factor + 1 - min(new_root.balance_factor, 0)
)
new_root.balance_factor = (
new_root.balance_factor + 1 + max(rotation_root.balance_factor, 0)
)
Finally, lines 16-21 require some explanation. In these lines we update the balance factors of the old and the new root. Since all the other moves involve moving entire subtrees, the balance factors of all other nodes are unaffected by the rotation. But how can we update the balance factors without completely recalculating the heights of the new subtrees? Figure 5
and the following derivation should convince you that these lines are correct.
Figure 5
shows a left rotation. B and D are the pivotal nodes and A, C, E are their subtrees. Let \(h_x\) denote the height of a particular subtree rooted at node \(x\). By definition we know the following:
\(\begin{align} new\_bal(B) &= h_A - h_C \\ old\_bal(B) &= h_A - h_D \end{align}\)
But we know that the old height of D can also be given by \(1 + max(h_C, h_E)\), that is, the height of D is one more than the maximum height of its two children. Remember that \(h_C\) and \(h_E\) have not changed. So, let us substitute that in to the second equation, which gives us
\(old\_bal(B) = h_A - (1 + max(h_C,h_E))\)
and then subtract the two equations. The following steps do the subtraction and use some algebra to simplify the equation for \(new\_bal(B)\).
\(\begin{align} new\_bal(B) - old\_bal(B) &= h_A - h_C - (h_A - (1 + max(h_C,h_E))) \\ new\_bal(B) - old\_bal(B) &= h_A - h_C - h_A + (1 + max(h_C,h_E)) \\ new\_bal(B) - old\_bal(B) &= h_A - h_A + 1 + max(h_C,h_E) - h_C \\ new\_bal(B) - old\_bal(B) &= 1 + max(h_C,h_E) - h_C \end{align}\)
Next we will move \(old\_bal(B)\) to the right-hand side of the equation and make use of the fact that \(max(a,b)-c = max(a-c, b-c)\).
\(new\_bal(B) = old\_bal(B) + 1 + max(h_C - h_C ,h_E - h_C)\)
But \(h_E - h_C\) is the same as \(-old\_bal(D)\). So we can use another identity that says \(max(-a,-b) = -min(a,b)\). So we can finish our derivation of \(new\_bal(B)\) with the following steps:
\(\begin{align} new\_bal(B) &= old\_bal(B) + 1 + max(0 , -old\_bal(D)) \\ new\_bal(B) &= old\_bal(B) + 1 - min(0 , old\_bal(D)) \end{align}\)
Now we have all of the parts in terms that we readily know. If we remember that B is rotation_root
and D is new_root
then we can see this corresponds exactly to the statement on lines 16-18 in Listing 2
, or:
rotation_root.balance_factor = (
rotation_root.balance_factor + 1 - min(new_root.balance_factor, 0)
)
A similar derivation gives us the equation for the updated node D as well as the balance factors after a right rotation. We leave these as an exercise for you.
Now you might think that we are done. We know how to do our left and right rotations, and we know when we should do a left or right rotation. But take a look at Figure 6
. Since node A has a balance factor of -2 we should do a left rotation. But what happens when we do the left rotation around A?
Figure 7 <fig_badrotate>
shows us that after the left rotation we are now out of balance the other way. If we do a right rotation to correct the situation we are right back where we started.
To correct this problem we must use the following set of rules:
- If a subtree needs a left rotation to bring it into balance, first check the balance factor of the right child. If the right child is left-heavy, then do a right rotation on right child followed by the original left rotation.
- If a subtree needs a right rotation to bring it into balance, first check the balance factor of the left child. If the left child is right-heavy, then do a left rotation on the left child followed by the original right rotation.
Figure 8
shows how these rules solve the dilemma we encountered in Figure 6
and Figure 7
. Starting with a right rotation around node C puts the tree in a position where the left rotation around A brings the entire subtree back into balance.
The code that implements these rules can be found in our rebalance
method, which is shown in Listing 3
. Rule number 1 from above is implemented by the if
statement starting on line 2. Rule number 2 is implemented by the elif
statement starting on line 8.
Listing 3 | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
The discussion questions
provide you the opportunity to rebalance a tree that requires a left rotation followed by a right. In addition the discussion questions provide you with the opportunity to rebalance some trees that are a little more complex than the tree in Figure 8
.
By keeping the tree in balance at all times, we can ensure that the get
method will run in order \(O(log_2(n))\) time. But the question is at what cost to our put
method? Let us break this down into the operations performed by put
. Since a new node is inserted as a leaf, updating the balance factors of all the parents will require a maximum of \(log_2(n)\) operations, one for each level of the tree. If a subtree is found to be out of balance, a maximum of two rotations are required to bring the tree back into balance. But each of the rotations works in \(O(1)\) time, so even our put
operation remains \(O(log_2(n))\).
At this point we have implemented a functional AVL tree, unless you need the ability to delete a node. We leave the deletion of the node and subsequent updating and rebalancing as an exercise for you.
创建日期: 2024年9月9日