5.4. 二分搜索¶

5.4. The Binary Search

中文英文

如果我们巧妙地进行比较，就可以更好地利用有序列表。在顺序搜索中，当我们与第一个项进行比较时，最多还有 \(n-1\) 个项需要查找，如果第一个项不是我们要找的项。那么，二分搜索将从中间项开始进行检查。如果该项正是我们要找的项，那么搜索结束。如果它不是正确的项，我们可以利用列表的有序性来消除一半的剩余项。如果我们要找的项大于中间项，那么整个左半部分（包括中间项）都可以从进一步的考虑中排除。如果我们要找的项在列表中，它一定在右半部分。

接下来我们可以对右半部分重复这个过程。先从中间项开始，将它与我们要找的项进行比较。同样地，我们要么找到它，要么再次将列表一分为二，从而排除一大部分可能的搜索范围。图3 展示了这个算法如何快速找到值 54。完整的函数如 CodeLens 3 所示。

活动: CodeLens 有序列表的二分搜索 (search3)
def binary_search(a_list, item):
    first = 0
    last = len(a_list) - 1

    while first <= last:
        midpoint = (first + last) // 2
        if a_list[midpoint] == item:
            return True
        elif item < a_list[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1

    return False


test_list = [0, 1, 2, 8, 13, 17, 19, 32, 42]

print(binary_search(test_list, 3))
print(binary_search(test_list, 13))

在进行分析之前，我们应该注意到这个算法是“分而治之”策略的一个极好例子。“分而治之”意味着我们将问题分解为更小的部分，以某种方式解决这些小部分，然后将整个问题重新组合起来得到结果。当我们对列表执行二分搜索时，首先检查中间项。如果我们要找的项小于中间项，我们可以简单地对原列表的左半部分执行二分搜索。同样地，如果该项更大，我们可以对右半部分进行二分搜索。无论哪种情况，这都是对二分搜索函数的递归调用，传递的是一个更小的列表。CodeLens 4 显示了递归版本的二分搜索。

活动: CodeLens 二分搜索--递归版本 (search4)
def binary_search_rec(a_list, item):
    if len(a_list) == 0:
        return False
    midpoint = len(a_list) // 2
    if a_list[midpoint] == item:
        return True
    elif item < a_list[midpoint]:
        return binary_search_rec(a_list[:midpoint], item)
    else:
        return binary_search_rec(a_list[midpoint + 1 :], item)


test_list = [0, 1, 2, 8, 13, 17, 19, 32, 42]

print(binary_search_rec(test_list, 3))
print(binary_search_rec(test_list, 13))

It is possible to take greater advantage of the ordered list if we are clever with our comparisons. In the sequential search, when we compare against the first item, there are at most \(n-1\) more items to look through if the first item is not what we are looking for. Instead of searching the list in sequence, a binary search will start by examining the middle item. If that item is the one we are searching for, we are done. If it is not the correct item, we can use the ordered nature of the list to eliminate half of the remaining items. If the item we are searching for is greater than the middle item, we know that the entire first (left) half of the list as well as the middle item can be eliminated from further consideration. The item, if it is in the list, must be in the second (right) half.

We can then repeat the process with the left half. Start at the middle item and compare it against what we are looking for. Again, we either find it or split the list in half, therefore eliminating another large part of our possible search space. Figure 3 shows how this algorithm can quickly find the value 54. The complete function is shown in CodeLens 3.

Image title — Figure 3: Binary Search of an Ordered List of Integers

Activity: CodeLens Binary Search of an Ordered List (search3)
def binary_search(a_list, item):
    first = 0
    last = len(a_list) - 1

    while first <= last:
        midpoint = (first + last) // 2
        if a_list[midpoint] == item:
            return True
        elif item < a_list[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1

    return False


test_list = [0, 1, 2, 8, 13, 17, 19, 32, 42]

print(binary_search(test_list, 3))
print(binary_search(test_list, 13))

Before we move on to the analysis, we should note that this algorithm is a great example of a divide and conquer strategy. Divide and conquer means that we divide the problem into smaller pieces, solve the smaller pieces in some way, and then reassemble the whole problem to get the result. When we perform a binary search of a list, we first check the middle item. If the item we are searching for is less than the middle item, we can simply perform a binary search of the left half of the original list. Likewise, if the item is greater, we can perform a binary search of the right half. Either way, this is a recursive call to the binary search function passing a smaller list. CodeLens 4 shows this recursive version.

Activity: CodeLens A Binary Search--Recursive Version (search4)
def binary_search_rec(a_list, item):
    if len(a_list) == 0:
        return False
    midpoint = len(a_list) // 2
    if a_list[midpoint] == item:
        return True
    elif item < a_list[midpoint]:
        return binary_search_rec(a_list[:midpoint], item)
    else:
        return binary_search_rec(a_list[midpoint + 1 :], item)


test_list = [0, 1, 2, 8, 13, 17, 19, 32, 42]

print(binary_search_rec(test_list, 3))
print(binary_search_rec(test_list, 13))

5.4.1. 二分搜索的分析¶

5.4.1. Analysis of Binary Search

中文英文

为了分析二分搜索算法，我们需要记住每次比较都会排除大约一半的剩余项。那么，这个算法最多需要多少次比较来检查整个列表呢？如果我们从 \(n\) 个项开始，第一次比较后大约还剩下 \(\frac{n}{2}\) 个项。第二次比较后，大约还剩下 \(\frac{n}{4}\) 个项。接着是 \(\frac{n}{8}\)，\(\frac{n}{16}\)，依此类推。我们能将列表分割多少次呢？表3 帮助我们找到答案。

表3：二分搜索的表格分析

比较次数	剩余项的近似数量
1	\(\frac {n}{2}\)
2	\(\frac {n}{4}\)
3	\(\frac {n}{8}\)
...
\(i\)	\(\frac {n}{2^i}\)

当我们将列表分割足够多次时，最终会剩下一个项。这个项要么是我们要找的，要么不是。不管是哪种情况，我们的搜索都结束了。达到这个点所需的比较次数是 \(i\)，满足 \(\frac {n}{2^i} = 1\)。解出 \(i\) 可以得到 \(i = \log{n}\)。因此，最大比较次数是相对于列表项数的对数，也就是说，二分搜索的时间复杂度是 \(O(\log{n})\)。

还有一个需要解决的分析问题。在上面的递归解决方案中，递归调用

binary_search_rec(a_list[:midpoint], item)

使用了切片操作符来创建列表的左半部分并将其传递给下一个调用（右半部分也是如此）。我们之前的分析假设切片操作是常数时间的。但实际上在 Python 中，它的时间复杂度是 \(O(k)\)。这意味着使用切片的二分搜索不会严格在对数时间内运行。不过，这可以通过传递列表及其起始和结束索引来解决。这些索引可以像 列表3 中那样计算。我们将这个实现留作练习。

尽管二分搜索通常比顺序搜索更好，但需要注意的是，对于较小的 \(n\) 值，排序的额外开销可能不值得。事实上，我们应该总是考虑排序是否成本效益高。如果我们可以排序一次并多次搜索，排序的成本就不那么显著了。然而，对于大型列表，哪怕只排序一次，其成本可能也会非常高，以至于从一开始就执行顺序搜索可能是最好的选择。

自检

Q 3Q 4

假设你有如下排序列表 [3, 5, 6, 8, 11, 12, 14, 15, 17, 18]，并使用递归二分搜索算法。以下哪组数字正确显示了查找键值 8 的比较顺序？

答案 a: 11, 5, 6, 8
答案 b: 12, 6, 11, 8
答案 c: 3, 5, 6, 8
答案 d: 18, 12, 6, 8

正确答案: b

反馈 a: 你可能犯了“差一”错误。记住列表的第一个位置是索引 0。
反馈 b: 二分搜索从中间开始，每次将列表对半分。
反馈 c: 二分搜索不会从开头开始顺序搜索，而是从中间开始，每次比较后将列表对半分。
反馈 d: 你似乎是从末尾开始并每次将列表对半分。

假设你有如下排序列表 [3, 5, 6, 8, 11, 12, 14, 15, 17, 18]，并使用递归二分搜索算法。以下哪组数字正确显示了查找键值 16 的比较顺序？

答案 a: 11, 14, 17
答案 b: 18, 17, 15
答案 c: 14, 17, 15
答案 d: 12, 17, 15

正确答案: d

反馈 a: 你可能犯了“差一”错误。记住列表的第一个位置是索引 0。
反馈 b: 记住二分搜索从中间开始并每次将列表对半分。
反馈 c: 你可能有点偏差了，请注意你是否使用整数运算正确计算了中点。
反馈 d: 二分搜索从中间开始，每次将列表对半分。当列表为空时，搜索结束。

To analyze the binary search algorithm, we need to recall that each comparison eliminates about half of the remaining items from consideration. What is the maximum number of comparisons this algorithm will require to check the entire list? If we start with \(n\) items, about \(\frac{n}{2}\) items will be left after the first comparison. After the second comparison, there will be about \(\frac{n}{4}\). Then \(\frac{n}{8}\), \(\frac{n}{16}\), and so on. How many times can we split the list? Table 3 helps us to see the answer.

Table 3: Tabular Analysis for a Binary Search

Comparisons	Approximate Number of Items Left
1	\(\frac {n}{2}\)
2	\(\frac {n}{4}\)
3	\(\frac {n}{8}\)
...
\(i\)	\(\frac {n}{2^i}\)

When we split the list enough times, we end up with a list that has just one item. Either that is the item we are looking for or it is not. Either way, we are done. The number of comparisons necessary to get to this point is \(i\) where \(\frac {n}{2^i} =1\). Solving for \(i\) gives us \(i=\log{n}\). The maximum number of comparisons is logarithmic with respect to the number of items in the list. Therefore, the binary search is \(O(\log{n})\).

One additional analysis issue needs to be addressed. In the recursive solution shown above, the recursive call

binary_search_rec(a_list[:midpoint], item)

uses slicing operator to create the left half of the list that is then passed to the next invocation (similarly for the right half as well). The analysis that we did above assumed that slicing takes constant time. However, we know that in Python it is actually \(O(k)\). This means that the binary search using slicing will not perform in strict logarithmic time. Luckily this can be remedied by passing the list along with the starting and ending indices. The indices can be calculated as we did in Listing 3. We leave this implementation as an exercise.

Even though a binary search is generally better than a sequential search, it is important to note that for small values of \(n\), the additional cost of sorting is probably not worth it. In fact, we should always consider whether it is cost effective to take on the extra work of sorting to gain searching benefits. If we can sort once and then search many times, the cost of the sort is not so significant. However, for large lists, sorting even once can be so expensive that simply performing a sequential search from the start may be the best choice.

Self Check

Q 3Q 4

Suppose you have the following sorted list [3, 5, 6, 8, 11, 12, 14, 15, 17, 18] and are using the recursive binary search algorithm. Which group of numbers correctly shows the sequence of comparisons used to find the key 8.

answer a: 11, 5, 6, 8
answer b: 12, 6, 11, 8
answer c: 3, 5, 6, 8
answer d: 18, 12, 6, 8

correct: b

feedback a: Looks like you might be guilty of an off-by-one error. Remember the first position is index 0.
feedback b: Binary search starts at the midpoint and halves the list each time.
feedback c: Binary search does not start at the beginning and search sequentially, its starts in the middle and halves the list after each compare.
feedback d: It appears that you are starting from the end and halving the list each time.

Suppose you have the following sorted list [3, 5, 6, 8, 11, 12, 14, 15, 17, 18] and are using the recursive binary search algorithm. Which group of numbers correctly shows the sequence of comparisons used to search for the key 16?

answer a: 11, 14, 17
answer b: 18, 17, 15
answer c: 14, 17, 15
answer d: 12, 17, 15

correct: d

feedback a: Looks like you might be guilty of an off-by-one error. Remember the first position is index 0.
feedback b: Remember binary search starts in the middle and halves the list.
feedback c: Looks like you might be off by one, be careful that you are calculating the midpont using integer arithmetic.
feedback d: Binary search starts at the midpoint and halves the list each time. It is done when the list is empty.

最后更新: 2024年9月13日
创建日期: 2024年9月9日