Flaky tests
A “flaky” test is one that exhibits intermittent or sporadic failures and seems to behave non-deterministically. Sometimes it passes, sometimes it fails, and it’s not clear why. This page discusses pytest features that can help, as well as other general strategies for identifying, fixing or mitigating flaky tests.
Why flaky tests are a problem
Flaky tests are particularly troublesome when a continuous integration (CI) server is being used, where all tests must pass before a new code change can be merged. If the test result is not a reliable signal – that is, a test failure does not reliably mean the code change broke the test – developers can become mistrustful of the results, which can lead to overlooking genuine failures. It is also a source of wasted time, as developers must re-run test suites and investigate spurious failures.
Potential root causes
System state
Broadly speaking, a flaky test indicates that the test relies on some system state that is not being appropriately controlled - the test environment is not sufficiently isolated. Higher level tests are more likely to be flaky as they rely on more state.
Flaky tests sometimes appear when a test suite is run in parallel (such as with pytest-xdist). This can indicate that a test is reliant on test ordering.
Perhaps a different test is failing to clean up after itself and leaving behind data which causes the flaky test to fail. Or the flaky test relies on data from a previous test that doesn’t clean up after itself, and in parallel runs that previous test is not always present.
Tests that modify global state typically cannot be run in parallel.
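As a minimal sketch of this kind of state leakage (the myapp.config module and its settings dictionary are hypothetical), the first two tests below interfere with each other depending on execution order; the autouse fixture shows one way to restore isolation:

    import pytest

    from myapp import config  # hypothetical module holding a global settings dict


    def test_enables_feature():
        # Mutates global state and never restores it.
        config.settings["feature_enabled"] = True
        assert config.is_feature_enabled()


    def test_default_is_disabled():
        # Passes when run alone, but fails when test_enables_feature happened
        # to run first in the same process and left the flag switched on.
        assert not config.is_feature_enabled()


    @pytest.fixture(autouse=True)
    def restore_settings():
        # One way to restore isolation: snapshot the global state and put it
        # back after every test.
        saved = dict(config.settings)
        yield
        config.settings.clear()
        config.settings.update(saved)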
Overly strict assertion
Overly strict assertions can cause problems with floating point comparison as well as timing issues. pytest.approx() is useful here.
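For example, a strict equality check on floats fails even though the result is correct to within rounding error; pytest.approx() compares within a tolerance instead:

    import pytest


    def test_float_sum():
        # 0.1 + 0.2 is actually 0.30000000000000004, so `== 0.3` fails.
        # approx() compares within a relative tolerance instead.
        assert 0.1 + 0.2 == pytest.approx(0.3)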
Thread safety
pytest is single-threaded, executing its tests always in the same thread, sequentially, never spawning any threads itself.
Even plugins which run tests in parallel, for example pytest-xdist, usually work by spawning multiple processes and running tests in batches, without using multiple threads.
It is of course possible (and common) for tests and fixtures to spawn threads themselves as part of their testing workflow (for example, a fixture that starts a server thread in the background, or a test which executes production code that spawns threads), but some care must be taken:
Make sure to eventually wait on any spawned threads – for example at the end of a test, or during the teardown of a fixture (a sketch follows at the end of this section).
Avoid using primitives provided by pytest (pytest.warns(), pytest.raises(), etc.) from multiple threads, as they are not thread-safe.
If your test suite uses threads and you are seeing flaky test results, do not discount the possibility that the test is implicitly using global state in pytest itself.
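As a sketch of the first point above, the fixture below starts an HTTP server in a background thread and joins that thread during teardown, so it never outlives the test that requested it (the fixture name and handler are illustrative only):

    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    import pytest


    class _Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Minimal handler so the server has something to serve.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")


    @pytest.fixture
    def background_server():
        # Port 0 lets the OS pick a free port, avoiding clashes in parallel runs.
        server = HTTPServer(("127.0.0.1", 0), _Handler)
        thread = threading.Thread(target=server.serve_forever)
        thread.start()
        yield f"http://127.0.0.1:{server.server_port}"
        # Teardown: stop the server and wait for the thread, so no stray
        # thread outlives the test that started it.
        server.shutdown()
        server.server_close()
        thread.join()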
Related features
Xfail strict
pytest.mark.xfail with strict=False can be used to mark a test so that its failure does not cause the whole build to break. This could be considered a form of manual quarantine, and is rather dangerous to use permanently.
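For example (the reason text is arbitrary):

    import pytest


    @pytest.mark.xfail(reason="flaky, see issue tracker", strict=False)
    def test_sometimes_times_out():
        # With strict=False, a failure is reported as XFAIL and a pass as
        # XPASS; neither outcome fails the build.
        ...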
PYTEST_CURRENT_TEST
PYTEST_CURRENT_TEST may be useful for figuring out “which test got stuck”. See the PYTEST_CURRENT_TEST environment variable documentation for more details.
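One way this could be used, sketched below under the assumption that a watchdog thread is acceptable in your conftest.py: a daemon thread periodically reports the variable, so a hung CI run still shows which test it was stuck in (the 60-second interval is arbitrary).

    # conftest.py (sketch)
    import os
    import sys
    import threading
    import time


    def _report_current_test(interval: float = 60.0) -> None:
        while True:
            time.sleep(interval)
            # pytest keeps this variable updated with "<nodeid> (<phase>)".
            current = os.environ.get("PYTEST_CURRENT_TEST", "<no test running>")
            print(f"[watchdog] currently in: {current}", file=sys.stderr)


    def pytest_configure(config):
        # Daemon thread: it must never keep the pytest process alive.
        threading.Thread(target=_report_current_test, daemon=True).start()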
Plugins
Rerunning any failed tests can mitigate the negative effects of flaky tests by giving them additional chances to pass, so that the overall build does not fail. Several pytest plugins support this:
pytest-replay: this plugin helps to locally reproduce crashes or flaky tests observed during CI runs.
Plugins that deliberately randomize test ordering can also help expose tests with state problems.
Other general strategies
Split up test suites
It is common to split a single test suite into two, such as unit vs. integration, and to use only the unit test suite as a CI gate. This also helps keep build times manageable, as high-level tests tend to be slower. However, it means that code which breaks the build can still be merged, so extra vigilance is needed when monitoring the integration test results.
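One way to implement such a split with plain pytest is a custom marker; the integration marker name below is arbitrary and would need to be registered in the project configuration to avoid unknown-mark warnings:

    import pytest


    @pytest.mark.integration
    def test_checkout_end_to_end():
        """Slower, higher-level test; more likely to be flaky."""


    def test_price_calculation():
        """Fast unit test; part of the CI gate."""


    # The CI gate then deselects the slower suite:
    #   pytest -m "not integration"
    # while the integration suite runs separately, for example on a schedule:
    #   pytest -m integration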
Video/screenshot on failure
For UI tests these are important for understanding what the state of the UI was when the test failed. pytest-splinter can be used with plugins like pytest-bdd and can save a screenshot on test failure, which can help to isolate the cause.
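pytest-splinter handles this automatically; the sketch below only illustrates the underlying idea with a standard reporting hook in conftest.py, assuming the failing test requested a browser fixture that exposes a save_screenshot() method (as Selenium-style drivers do):

    # conftest.py (sketch)
    import pytest


    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_makereport(item, call):
        outcome = yield
        report = outcome.get_result()
        if report.when == "call" and report.failed:
            # "browser" is an assumed fixture name; adapt to your driver fixture.
            browser = item.funcargs.get("browser")
            if browser is not None:
                # Name the screenshot after the test so failures are easy to match.
                browser.save_screenshot(f"{item.name}.png")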
Delete or rewrite the test
If the functionality is covered by other tests, perhaps the test can be removed. If not, perhaps it can be rewritten at a lower level which will remove the flakiness or make its source more apparent.
Quarantine
Mark Lapierre discusses the Pros and Cons of Quarantined Tests in a post from 2018.
CI tools that rerun on failure
Azure Pipelines (the Azure cloud CI/CD tool, formerly Visual Studio Team Services or VSTS) has a feature to identify flaky tests and rerun failed tests.
Research
This is a limited list; please submit an issue or pull request to expand it!
Gao, Zebao, Yalan Liang, Myra B. Cohen, Atif M. Memon, and Zhen Wang. “Making system user interactive tests repeatable: When and what should we control?.” In Software Engineering (ICSE), 2015 IEEE/ACM 37th IEEE International Conference on, vol. 1, pp. 55-65. IEEE, 2015. PDF
Palomba, Fabio, and Andy Zaidman. “Does refactoring of test smells induce fixing flaky tests?.” In Software Maintenance and Evolution (ICSME), 2017 IEEE International Conference on, pp. 1-12. IEEE, 2017. PDF in Google Drive
Bell, Jonathan, Owolabi Legunsen, Michael Hilton, Lamyaa Eloussi, Tifany Yung, and Darko Marinov. “DeFlaker: Automatically detecting flaky tests.” In Proceedings of the 2018 International Conference on Software Engineering. 2018. PDF
Dutta, Saikat and Shi, August and Choudhary, Rutvik and Zhang, Zhekun and Jain, Aryaman and Misailovic, Sasa. “Detecting flaky tests in probabilistic and machine learning applications.” In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), pp. 211-224. ACM, 2020. PDF
Resources
Eradicating Non-Determinism in Tests by Martin Fowler, 2011
No more flaky tests on the Go team by Pavan Sudarshan, 2012
The Build That Cried Broken: Building Trust in your Continuous Integration Tests talk (video) by Angie Jones at SeleniumConf Austin 2017
Test and Code Podcast: Flaky Tests and How to Deal with Them by Brian Okken and Anthony Shaw, 2018
Microsoft:
How we approach testing VSTS to enable continuous delivery by Brian Harry MS, 2017
Eliminating Flaky Tests blog and talk (video) by Munil Shah, 2017
Google:
Flaky Tests at Google and How We Mitigate Them by John Micco, 2016
Where do Google’s flaky tests come from? by Jeff Listfield, 2017