Creating projects
In this tutorial, we will walk through how to set up a new Python project in conda using an environment.yml file. This file will help you keep track of your dependencies and share your project with others. We cover how to create your project, add a simple Python program, and update it with new dependencies.
Requirements
To follow along, you will need a working conda installation. Please head over to our installation guide for instructions on how to get conda installed if you do not have it.
This tutorial relies heavily on using your computer's terminal (Command Prompt or PowerShell on Windows), so it is also important to have a working familiarity with basic commands such as cd and ls.
Creating the project's files
To start off, we will need a directory that will contain the files for our project. This can be created with the following command:
mkdir my-project
Change into this directory (cd my-project) and create a new environment.yml file, which will hold the dependencies for our Python project. In your text editor (e.g. VSCode, PyCharm, vim, etc.), create this file and add the following:
name: my-project
channels:
  - defaults
dependencies:
  - python
Let's briefly go over what each part of this file means.
- Name
  The name of your environment. Here, we have chosen the name "my-project", but this can be anything you want.
- Channels
  Channels specify where you want conda to search for packages. We have chosen the defaults channel, but others such as conda-forge or bioconda can also be listed here.
- Dependencies
  All the dependencies that you need for your project. So far, we have just added python because we know it will be a Python project. We will add more later.
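If you would rather script this setup than create the file by hand, the same directory and file can be produced from Python. The following is a minimal sketch using only the standard library; the helper name create_project is hypothetical and is not part of conda.

```python
from pathlib import Path

# Contents of the environment.yml file described above.
ENVIRONMENT_YML = """\
name: my-project
channels:
  - defaults
dependencies:
  - python
"""


def create_project(root="my-project"):
    """Create the project directory and write environment.yml into it.

    `create_project` is a hypothetical helper used for illustration only.
    """
    project_dir = Path(root)
    project_dir.mkdir(parents=True, exist_ok=True)
    env_file = project_dir / "environment.yml"
    env_file.write_text(ENVIRONMENT_YML, encoding="utf-8")
    return env_file


if __name__ == "__main__":
    print(create_project())
```

Running the script a second time simply overwrites environment.yml with the same contents, so it is safe to repeat.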
Creating our environment
Now that we have written a basic environment.yml file, we can create and activate an environment from it. To do so, run the following commands:
conda env create --file environment.yml
conda activate my-project
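To confirm that the activation took effect, one option is to inspect the interpreter from inside Python. This is a small sketch, assuming the usual behavior where conda activate sets the CONDA_DEFAULT_ENV environment variable to the environment's name (or its path for prefix-based environments); the function name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env():
    """Return the value of CONDA_DEFAULT_ENV, or None if it is not set."""
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active environment:", active_conda_env())
    # sys.prefix is the directory of the environment that owns this interpreter.
    print("Interpreter prefix:", sys.prefix)
```

If the first line does not show my-project, the environment is probably not active in the current terminal session.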
Creating our Python application
With our new environment with Python installed, we can create a simple Python program. In your project folder, create a main.py file and add the following:
def main():
    print("Hello, conda!")


if __name__ == "__main__":
    main()
We can run our simple Python program by running the following command:
python main.py
Hello, conda!
Updating our project with new dependencies
If you want your project to do more than the simple example above, you can use one of the thousands of available packages on conda channels. To demonstrate this, we will add a new dependency so that we can pull in some data from the internet and perform a basic analysis.
To perform the data analysis, we will be relying on the Pandas package. To add this to our project, we will need to update our environment.yml file:
name: my-project
channels:
  - defaults
dependencies:
  - python
  - pandas  # <-- This is our new dependency
Once we have done that, we can run the conda env update command to install the new package:
conda env update --file environment.yml
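Before moving on, it can be worth checking that the new package is actually importable from the active environment. The snippet below is one way to do that with the standard library's importlib; the helper name is_installed is hypothetical.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if `module_name` can be found by the import system."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pandas"):
        print("pandas is available")
    else:
        print("pandas is missing; try running conda env update again")
```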
Now that our dependencies are installed, we will download some data to use for our analysis. For this, we will use the U.S. Environmental Protection Agency's Walkability Index dataset available on data.gov. You can download this with the following command:
curl -O https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
Tip
If you do not have curl, you can visit the above link with a web browser to download it.
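Another alternative to curl is to fetch the file from Python itself. The sketch below uses urllib.request.urlretrieve from the standard library and skips the download when the file already exists; the download helper is a hypothetical name, and the example call is left commented out because the file is large.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same URL as in the curl command above, split only to keep the line short.
DATA_URL = (
    "https://edg.epa.gov/EPADataCommons/public/OA/"
    "EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv"
)


def download(url, dest):
    """Download `url` to `dest` unless the file is already present."""
    dest = Path(dest)
    if not dest.exists():
        urlretrieve(url, str(dest))
    return dest


# Example call (commented out because it downloads a large file):
# download(DATA_URL, "EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv")
```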
For our analysis, we are interested in knowing what percentage of U.S. residents live in highly walkable areas. This is a question that we can easily answer using the pandas library. Below is an example of how you might go about doing that:
import pandas as pd


def main():
    """
    Answers the question:
    What percentage of U.S. residents live in highly walkable neighborhoods?

    "15.26" is the threshold on the index for a highly walkable area.
    """
    csv_file = "./EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv"
    highly_walkable = 15.26

    df = pd.read_csv(csv_file)

    # Total population across all rows in the dataset
    total_population = df["TotPop"].sum()
    # Population of the rows whose index meets the threshold
    highly_walkable_pop = df[df["NatWalkInd"] >= highly_walkable]["TotPop"].sum()

    percentage = (highly_walkable_pop / total_population) * 100.0

    print(f"{percentage:.2f}% of U.S. residents live in highly walkable neighborhoods.")


if __name__ == "__main__":
    main()
Update your main.py file with the code above and run it. You should get the following answer:
python main.py
10.69% of U.S. residents live in highly walkable neighborhoods.
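To check the filter-and-sum logic without loading the full dataset, the same calculation can be tried on a handful of rows. The following sketch uses only the standard library's csv module; the sample rows are made up for illustration and are not values from the EPA dataset, and percent_highly_walkable is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows for illustration; NOT values from the EPA dataset.
SAMPLE_CSV = """\
TotPop,NatWalkInd
1000,16.0
3000,10.0
1000,15.26
"""


def percent_highly_walkable(csv_text, threshold=15.26):
    """Return the percentage of population in rows at or above `threshold`."""
    total = 0.0
    walkable = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        population = float(row["TotPop"])
        total += population
        if float(row["NatWalkInd"]) >= threshold:
            walkable += population
    return walkable / total * 100.0


if __name__ == "__main__":
    print(f"{percent_highly_walkable(SAMPLE_CSV):.2f}%")
```

With the sample above, 2,000 of the 5,000 people fall at or above the threshold, so the function returns 40.0.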
Conclusion
You have just been introduced to creating your own data analysis project by using the environment.yml file in conda. As the project grows, you may wish to add more dependencies and also better organize the Python code into separate files and modules.
For even more information about working with environments and environment.yml files, please see Managing Environments.