Creating projects

In this tutorial, we will walk through how to set up a new Python project in conda using an environment.yml file. This file will help you keep track of your dependencies and share your project with others. We cover how to create your project, add a simple Python program and update it with new dependencies.

Requirements

To follow along, you will need a working conda installation. Please head over to our installation guide for instructions on how to get conda installed if you do not have it.

This tutorial relies heavily on using your computer's terminal (Command Prompt or PowerShell on Windows), so it is also important to have a working familiarity with using basic commands such as cd and ls.

Creating the project's files

To start off, we will need a directory that will contain the files for our project. This can be created with the following command:

mkdir my-project

In this directory, we will now create a new environment.yml file, which will hold the dependencies for our Python project. In your text editor (e.g. VSCode, PyCharm, vim, etc.), create this file and add the following:

name: my-project
channels:
- defaults
dependencies:
- python

Let's briefly go over what each part of this file means.

Name

The name of your environment. Here, we have chosen the name "my-project", but this can be anything you want.

Channels

Channels specify where you want conda to search for packages. We have chosen the defaults channel, but you can also list others here, such as conda-forge or bioconda.

Dependencies

All the dependencies that you need for your project. So far, we have just added python because we know it will be a Python project. We will add more later.
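
If you would rather script the setup than use mkdir and a text editor, the two steps above can be sketched in Python. This is only an illustration: the create_project helper and the ENVIRONMENT_YML constant are hypothetical names introduced here, not part of conda.

```python
from pathlib import Path

# Same contents as the environment.yml shown above.
ENVIRONMENT_YML = """\
name: my-project
channels:
- defaults
dependencies:
- python
"""


def create_project(root: Path, name: str = "my-project") -> Path:
    """Create the project directory under `root` and write environment.yml in it.

    Hypothetical helper for illustration; returns the path of the written file.
    """
    project_dir = root / name
    project_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir my-project`
    env_file = project_dir / "environment.yml"
    env_file.write_text(ENVIRONMENT_YML, encoding="utf-8")
    return env_file
```

Calling create_project(Path(".")) from the directory where you keep your projects would produce the same layout as the manual steps.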

Creating our environment

Now that we have written a basic environment.yml file, we can create and activate an environment from it. To do so, run the following commands:

conda env create --file environment.yml
conda activate my-project
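
To confirm that the environment was created, one option is to inspect the output of conda env list --json. The sketch below assumes that this output is a JSON object with an "envs" list of environment paths, and that a named environment lives in a directory with the same name. The helper names env_exists and conda_env_list_json are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def env_exists(name: str, env_list_json: str) -> bool:
    """Return True if a listed environment path ends in a directory called `name`.

    Assumes the JSON has an "envs" key holding a list of paths.
    """
    env_paths = json.loads(env_list_json).get("envs", [])
    return any(Path(p).name == name for p in env_paths)


def conda_env_list_json() -> str:
    """Run `conda env list --json` and return its raw output (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With conda available, env_exists("my-project", conda_env_list_json()) should return True once the commands above have succeeded.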

Creating our Python application

Now that we have a new environment with Python installed, we can create a simple Python program. In your project directory, create a main.py file and add the following:

def main():
    print("Hello, conda!")


if __name__ == "__main__":
    main()

We can run our simple Python program by running the following command:

python main.py
Hello, conda!
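
If the program runs but you are unsure which Python executed it, a quick check like the following may help. It relies on the CONDA_DEFAULT_ENV environment variable, which conda activate sets to the active environment's name; the function names are hypothetical.

```python
import os
import sys
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the active conda environment name, or None if the variable is unset."""
    return environ.get("CONDA_DEFAULT_ENV")


def describe_interpreter() -> str:
    """Describe which Python interpreter is running this script."""
    return f"{sys.executable} (prefix: {sys.prefix})"
```

Inside the activated environment, active_conda_env() is expected to return "my-project", and the interpreter path should point into that environment's directory.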

Updating our project with new dependencies

If you want your project to do more than the simple example above, you can use one of the thousands of available packages on conda channels. To demonstrate this, we will add a new dependency so that we can pull in some data from the internet and perform a basic analysis.

To perform the data analysis, we will rely on the pandas package. To add it to our project, we need to update our environment.yml file:

name: my-project
channels:
- defaults
dependencies:
- python
- pandas  # <-- This is our new dependency

Once we have done that, we can run the conda env update command to install the new package:

conda env update --file environment.yml
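
To check that the update picked up the new package, you can ask Python for the installed version. This is a minimal sketch using the standard library's importlib.metadata; installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

Run inside the my-project environment, installed_version("pandas") should return a version string rather than None.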

Now that our dependencies are installed, we will download some data to use for our analysis. For this, we will use the U.S. Environmental Protection Agency's Walkability Index dataset available on data.gov. You can download this with the following command:

curl -O https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv

Tip

If you do not have curl, you can visit the above link with a web browser to download it.
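
Another alternative to curl is to download the file with Python's standard library. The sketch below uses the URL from the command above; filename_from_url and download are hypothetical helper names, and the download itself needs network access.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

DATA_URL = (
    "https://edg.epa.gov/EPADataCommons/public/OA/"
    "EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv"
)


def filename_from_url(url: str) -> str:
    """Return the last path component of the URL, as `curl -O` would name the file."""
    return PurePosixPath(urlparse(url).path).name


def download(url: str, dest_dir: Path = Path(".")) -> Path:
    """Stream the file at `url` into `dest_dir` and return the local path."""
    dest = dest_dir / filename_from_url(url)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Calling download(DATA_URL) from the project directory would save the CSV under the same file name that the analysis code expects.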

For our analysis, we are interested in knowing what percentage of U.S. residents live in highly walkable areas. This is a question that we can easily answer using the pandas library. Below is an example of how you might go about doing that:

import pandas as pd


def main():
    """
    Answers the question:

    What percentage of U.S. residents live in highly walkable neighborhoods?

    "15.26" is the threshold on the index for a highly walkable area.
    """
    csv_file = "./EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv"
    highly_walkable = 15.26

    df = pd.read_csv(csv_file)

    total_population = df["TotPop"].sum()
    highly_walkable_pop = df[df["NatWalkInd"] >= highly_walkable]["TotPop"].sum()

    percentage = (highly_walkable_pop / total_population) * 100.0

    print(
        f"{percentage:.2f}% of U.S. residents live in highly walkable neighborhoods."
    )


if __name__ == "__main__":
    main()

Update your main.py file with the code above and run it. You should get the following answer:

python main.py
10.69% of U.S. residents live in highly walkable neighborhoods.
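
The dataset has many columns, but the calculation only needs TotPop and NatWalkInd. As a possible variation, the sketch below loads just those two columns through the usecols parameter of pd.read_csv and runs the same calculation on a tiny made-up sample. The walkable_share function, the sample rows, and the OtherColumn column are invented for illustration.

```python
import io

import pandas as pd


def walkable_share(csv_source, threshold: float = 15.26) -> float:
    """Return the percentage of the population at or above the walkability threshold."""
    # Load only the two columns the calculation needs.
    df = pd.read_csv(csv_source, usecols=["TotPop", "NatWalkInd"])
    total_population = df["TotPop"].sum()
    walkable_population = df.loc[df["NatWalkInd"] >= threshold, "TotPop"].sum()
    return walkable_population / total_population * 100.0


# Made-up sample: 200 of 500 residents are at or above the threshold.
sample = io.StringIO(
    "OtherColumn,TotPop,NatWalkInd\n"
    "a,100,16.0\n"
    "b,300,10.0\n"
    "c,100,15.26\n"
)
print(f"{walkable_share(sample):.2f}%")  # prints 40.00%
```

Passing the path of the downloaded CSV instead of the sample should give the same percentage as main.py.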

Conclusion

You have just been introduced to creating your own data analysis project by using the environment.yml file in conda. As the project grows, you may wish to add more dependencies and also better organize the Python code into separate files and modules.

For even more information about working with environments and environment.yml files, please see Managing Environments.