重新编写视觉模块记录¶

约 3412 个字 • 265 行代码

由于以前的视觉部分的代码看起来很💩(感觉代码文件的结构很混乱，不够易读)，于是打算重新编写新的。

我先是分别询问了chatgpt和copilot关于如何在摄像头一直保持开启的状态下，切换使用不同的功能

对话记录

ChatGPTGithub Copilot

User:

i have a question, There is a competition where I need to keep the camera on all the time, but different stages require the use of the camera to complete different tasks. For example, the first stage requires gesture recognition, the second stage requires face pose estimation, and both stages also require object detection. I can implement all three functions mentioned above separately (meaning that i can only implement one function when camera turned on), but I would like to ask how to switch between using different functions while keeping the camera on? how should I organize my code?

ChatGPT:

To switch between different functions (gesture recognition, face pose estimation, and object detection) while keeping the camera on, you can use a modular approach in your code. Here's a general strategy:

Initialize the Camera: Set up the camera to capture frames continuously.
Create Functions for Each Task: Write separate functions for gesture recognition, face pose estimation, and object detection.
Control Flow to Switch Between Tasks: Use a control mechanism (e.g., a state machine, a flag, or a command listener) to determine which function to call on each frame.
Main Loop: In the main loop, capture frames and call the appropriate function based on the current task.

Here's a simplified example in Python using OpenCV and a basic state machine:

import cv2

# Initialize the camera
cap = cv2.VideoCapture(0)

# Define the functions for each task
def gesture_recognition(frame):
    # Implement gesture recognition here
    pass

def face_pose_estimation(frame):
    # Implement face pose estimation here
    pass

def object_detection(frame):
    # Implement object detection here
    pass

# State to control which function to use
current_task = "gesture_recognition"

# Main loop
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    if current_task == "gesture_recognition":
        gesture_recognition(frame)
    elif current_task == "face_pose_estimation":
        face_pose_estimation(frame)
    elif current_task == "object_detection":
        object_detection(frame)

    # Display the frame
    cv2.imshow('Frame', frame)

    # Check for user input to switch tasks
    key = cv2.waitKey(1) & 0xFF
    if key == ord('g'):
        current_task = "gesture_recognition"
    elif key == ord('f'):
        current_task = "face_pose_estimation"
    elif key == ord('o'):
        current_task = "object_detection"
    elif key == ord('q'):
        break

# Release the camera and close windows
cap.release()
cv2.destroyAllWindows()

Explanation:

Initialize the Camera:
- cap = cv2.VideoCapture(0) initializes the camera.
Define Functions:
- gesture_recognition, face_pose_estimation, and object_detection are placeholders for your actual implementation.
Control Flow:
- current_task is a variable that controls which function to call.
- User input (cv2.waitKey) switches between tasks.
Main Loop:
- Continuously captures frames.
- Calls the appropriate function based on the current task.
- Displays the frame and listens for user input to switch tasks.

This approach ensures that the camera remains on and the system can switch between different tasks based on the control flow. You can modify the control mechanism to suit the specific requirements of your competition.

RonaldLN:

i have a question, There is a competition where I need to keep the camera on all the time, but different stages require the use of the camera to complete different tasks. For example, the first stage requires gesture recognition, the second stage requires face pose estimation, and both stages also require object detection. I can implement all three functions mentioned above separately (meaning that i can only implement one function when camera turned on), but I would like to ask how to switch between using different functions while keeping the camera on? how should I organize my code?

GitHub Copilot:

要在保持摄像头开启的情况下切换不同的功能（如手势识别、面部姿态估计和物体检测），你可以采用以下步骤组织你的代码：

初始化摄像头：首先，初始化摄像头，确保它在整个应用程序中保持开启状态。
定义功能模块：为手势识别、面部姿态估计和物体检测各定义一个函数或类。这样可以在需要时调用相应的功能。
创建功能切换逻辑：根据比赛的阶段或用户的输入，决定何时切换到不同的功能。这可能涉及到监听某种信号或检查某个条件。
循环捕获帧：在一个循环中不断从摄像头捕获帧，并根据当前的功能需求，将捕获的帧传递给相应的功能模块处理。
处理结果：每个功能模块处理完帧之后，根据需要进行结果的显示、存储或进一步处理。

以下是一个简化的伪代码示例：

def init_camera():
    # 初始化摄像头
    pass

def gesture_recognition(frame):
    # 处理帧进行手势识别
    pass

def face_pose_estimation(frame):
    # 处理帧进行面部姿态估计
    pass

def object_detection(frame):
    # 处理帧进行物体检测
    pass

def main():
    camera = init_camera()
    current_stage = 1  # 假设1为手势识别，2为面部姿态估计，3为物体检测

    while True:
        frame = camera.capture_frame()

        if current_stage == 1:
            gesture_recognition(frame)
        elif current_stage == 2:
            face_pose_estimation(frame)
        elif current_stage == 3:
            object_detection(frame)

        # 根据某些条件更新current_stage来切换功能
        # 例如，基于用户输入或时间限制

if __name__ == "__main__":
    main()

这个结构允许你在不关闭摄像头的情况下，根据比赛的不同阶段或其他条件，灵活地切换不同的处理功能。

参考它们给出的回答，打算将每个功能封装成一个函数或者类，在主循环中需要开启使用功能就把当前从摄像头获取到的帧传入具体的函数或者类即可。

并且打算借助git的submodule功能，让代码看的更清晰且更容易维护

于是先创建了 vision 的仓库并clone，然后创建了 face_orientation_recognition 和 face_identification 两个仓库，再在 vision 中添加submodule

git submodule add face_orientation_recognition.git src/scripts/face_orientation_recognition

git submodule add face_identification.git src/scripts/face_identification

而由于不是很清楚python的包中的文件结构要如何组织才能比较清晰，所以询问了一下chatgpt

对话记录

User:

在python自己创建的包中，一般modules和utils 的子文件夹中，通常的含义和用途是什么？

ChatGPT:

在Python包中，modules和utils子文件夹通常有特定的用途和含义，尽管它们的确切用途可能会因项目而异。以下是它们的一些常见用途：

`modules` 文件夹

核心功能：
- modules 文件夹通常包含实现软件或库的核心功能的模块。这些模块可以是不同的组件或功能单元，它们一起组成了整个软件系统。
分离关注点：
- 每个模块可能专注于特定的功能或任务，例如数据处理、网络通信、用户认证等。这种分离有助于提高代码的可维护性和可读性。
命名约定：
- 这个文件夹的名称“modules”并没有强制性的约定，但它通常被用来表明这些文件是程序的主要组成部分。

`utils` 文件夹

实用工具：
- utils 文件夹通常包含各种实用工具函数或辅助函数，这些函数不是程序的核心功能，但在多个地方都可能会用到。它们通常是一些通用的、可以复用的代码片段。
代码复用：
- 将常用的辅助函数放在utils文件夹中，可以避免代码重复，提高代码复用性。例如，字符串处理、日期处理、文件操作等通用功能通常会放在这里。
组织辅助代码：
- 这个文件夹的存在使得主要模块的代码更加简洁，因为辅助功能被移到了一个独立的地方。这也有助于更好地组织代码。

具体例子

假设我们有一个名为my_package的包，下面是一个可能的文件夹结构：

my_package/
├── __init__.py
├── modules/
│   ├── data_processing.py
│   ├── network.py
│   └── authentication.py
└── utils/
    ├── string_utils.py
    ├── date_utils.py
    └── file_utils.py

data_processing.py、network.py和authentication.py可能包含处理数据、网络通信和用户认证的核心代码。
string_utils.py、date_utils.py和file_utils.py包含各种字符串处理、日期处理和文件操作的辅助函数。

这种结构有助于代码的模块化和组织，使得项目更容易维护和扩展。

User:

还有哪些常用和常见的子文件夹名呢？

ChatGPT:

在Python项目中，还有一些常见的子文件夹名称，它们通常用于特定的用途或组织代码。以下是一些常见的子文件夹名称及其用途：

常见子文件夹名称及用途

config：
- 用于存放配置文件和配置相关的代码。
- 例如，数据库配置、API密钥、应用程序设置等。
tests：
- 用于存放单元测试、集成测试和其他类型的测试代码。
- 通常包含各种测试用例来确保代码的正确性。
scripts：
- 用于存放各种脚本文件，这些脚本通常是用来执行一些特定任务或操作的。
- 例如，数据迁移脚本、自动化脚本等。
docs：
- 用于存放项目的文档和说明文件。
- 例如，用户手册、API文档、设计文档等。
examples：
- 用于存放示例代码和示例应用程序。
- 帮助用户了解如何使用库或框架。
data：
- 用于存放数据文件，例如CSV文件、JSON文件、数据库文件等。
- 常用于数据处理和机器学习项目。
assets：
- 用于存放静态资源，例如图片、样式表、JavaScript文件等。
- 常用于Web项目。
models：
- 用于存放数据模型相关的代码。
- 例如，ORM模型、机器学习模型等。
migrations：
- 用于存放数据库迁移文件。
- 常用于Django、Flask等使用ORM的Web框架。
lib：
- 用于存放外部库或依赖项。
- 例如，自定义的第三方库或依赖项。
bin：
- 用于存放可执行文件和脚本。
- 例如，命令行工具和执行脚本。

示例项目结构

下面是一个假设的Python项目结构示例：

my_project/
├── config/
│   ├── settings.py
│   └── secrets.py
├── data/
│   ├── input_data.csv
│   └── processed_data.json
├── docs/
│   ├── README.md
│   └── API.md
├── examples/
│   ├── example1.py
│   └── example2.py
├── lib/
│   └── custom_library.py
├── models/
│   ├── user_model.py
│   └── product_model.py
├── modules/
│   ├── data_processing.py
│   ├── network.py
│   └── authentication.py
├── scripts/
│   ├── migrate_data.py
│   └── cleanup.py
├── tests/
│   ├── test_data_processing.py
│   ├── test_network.py
│   └── test_authentication.py
└── utils/
    ├── string_utils.py
    ├── date_utils.py
    └── file_utils.py

这种结构清晰地组织了项目的各个部分，使得项目易于导航和维护。不同的文件夹根据其用途分类，使得代码管理更加高效。

User:

lib , models, modules, scripts, utils, config 这几个之间的区别，你能再详细说一下吗？

ChatGPT:

当然，以下是对 lib、models、modules、scripts、utils 和 config 这几个文件夹之间的详细区别和用途解释：

`lib`

用途： - 用于存放外部库或自定义的库文件。这些库文件可能是从外部引入的第三方库，也可能是项目中自定义的、可复用的代码模块。

典型内容： - 第三方库的包装器 - 自定义的工具库 - 不依赖于项目具体业务逻辑的通用库

示例：

lib/
├── custom_library.py
└── third_party_wrapper.py

`models`

用途： - 用于存放数据模型相关的代码。这些模型通常表示项目中使用的数据结构，例如数据库表、机器学习模型等。

典型内容： - ORM（对象关系映射）模型（如Django或SQLAlchemy的模型） - 机器学习模型的定义和训练代码 - 数据结构定义

示例：

models/
├── user_model.py
├── product_model.py
└── ml_model.py

`modules`

用途： - 用于存放实现软件或库的核心功能的模块。这些模块是项目的主要组成部分，包含主要的业务逻辑和功能实现。

典型内容： - 各种功能模块，如数据处理、网络通信、用户认证等 - 每个模块通常专注于特定的功能或任务

示例：

modules/
├── data_processing.py
├── network.py
└── authentication.py

`scripts`

用途： - 用于存放各种脚本文件，这些脚本通常是用来执行特定任务或操作的，例如数据迁移、自动化任务等。

典型内容： - 数据迁移脚本 - 自动化脚本 - 启动和部署脚本

示例：

scripts/
├── migrate_data.py
└── cleanup.py

`utils`

用途： - 用于存放各种实用工具函数或辅助函数，这些函数不是项目的核心功能，但在多个地方都可能会用到，提供辅助支持。

典型内容： - 通用的工具函数，例如字符串处理、日期处理、文件操作等 - 可以被多个模块复用的代码

示例：

utils/
├── string_utils.py
├── date_utils.py
└── file_utils.py

`config`

用途： - 用于存放配置文件和配置相关的代码。这些配置文件通常包含项目的各种设置和参数，例如数据库配置、API密钥等。

典型内容： - 应用程序设置 - 环境变量配置 - 安全和敏感信息（通常需要安全管理）

示例：

config/
├── settings.py
└── secrets.py

总结对比

lib：外部或自定义的库文件，不特定于项目的业务逻辑。
models：数据模型定义和相关代码，表示项目中的数据结构。
modules：核心功能模块，包含主要业务逻辑。
scripts：执行特定任务或操作的脚本文件。
utils：通用工具函数和辅助函数，提供支持和复用。
config：配置文件和相关代码，管理项目设置和参数。

这种分类和组织方式有助于保持项目结构清晰，便于维护和扩展。

编写 `face_orientation_recognition`¶

于是将 head-pose-estimation 中的 face_detection.py mark_detection.py pose_estimation.py 移动到 face_orientation_recognition/modules/ 下，utils.py assets/ 移动到 face_orientation_recognition/ 下

参照 main.py ，在 face_orientation_recognition/ 下编写 orientation_recognition.py

orientation_recognition.py

import cv2
import numpy as np

from modules.face_detection import FaceDetector
from modules.mark_detection import MarkDetector
from modules.pose_estimation import PoseEstimator
from utils import refine


class OrientationRecognizer:
    def __init__(self, frame_width, frame_height):
        ...

    def __call__(self, frame):
        ...


def get_orientation_angle(pose):
    ...

def get_adjustment(angle):
    ...

然后在 vision/ 下测试使用这个包

vision/test.py
import cv2
from face_orientation_recognition.face_orientation import OrientationRecognizer

cap = cv2.VideoCapture(0)

frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

estimator = OrientationRecognizer(frame_width, frame_height)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    has_face, estimated_frame, angle, adjustment = estimator(frame)

    if has_face:
        cv2.imshow('frame', estimated_frame)
        print(f'angle: {angle}, adjustment: {adjustment}')
    else:
        cv2.imshow('frame', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

出现了报错

Traceback (most recent call last):
  File "c:\Github\vision\src\scripts\test.py", line 2, in <module>
    from face_orientation_recognition.face_orientation import OrientationRecognizer
  File "c:\Github\vision\src\scripts\face_orientation_recognition\face_orientation.py", line 10, in <module>
    from modules.face_detection import FaceDetector
ModuleNotFoundError: No module named 'modules'

发现是需要使用相对导入

我参考了yolov8中的代码

ultralytics/ultralytics/trackers/byte_tracker.py at 5d479c73c2933fa5020632e1b2d5b77220dce156 · ultralytics/ultralytics (github.com)
byte_tracker.py
# Ultralytics YOLO 🚀, AGPL-3.0 license

import numpy as np

from ..utils import LOGGER
from ..utils.ops import xywh2ltwh
from .basetrack import BaseTrack, TrackState
from .utils import matching
from .utils.kalman_filter import KalmanFilterXYAH

将 face_orientation.py 修改成相对导入

import cv2
import numpy as np

from .modules.face_detection import FaceDetector
from .modules.mark_detection import MarkDetector
from .modules.pose_estimation import PoseEstimator
from .utils import refine

再次运行，出现了导入模型文件的报错

Traceback (most recent call last):
  File "c:\Github\vision\src\scripts\test.py", line 9, in <module>  
    estimator = OrientationRecognizer(frame_width, frame_height)    
  File "c:\Github\vision\src\scripts\face_orientation_recognition\face_orientation.py", line 35, in __init__
    self.face_detector = FaceDetector("assets/face_detector.onnx")  
  File "c:\Github\vision\src\scripts\face_orientation_recognition\modules\face_detection.py", line 67, in __init__
    assert os.path.exists(model_file), f"File not found: {model_file}"
AssertionError: File not found: assets/face_detector.onnx

于是询问copilot

对话记录

RonaldLN:

@terminal /explain #terminalLastCommand

Github Copilot:

根据您提供的终端最后运行命令的上下文，错误信息表明在初始化 OrientationRecognizer 对象时，尝试加载 face_detector.onnx 模型文件失败，因为无法在指定的路径 assets/face_detector.onnx 找到该文件。

这个问题通常是因为文件路径不正确或者文件确实不存在于预期的位置。解决这个问题的方法包括：

检查文件是否存在：首先，确保 face_detector.onnx 文件确实存在于项目的 assets 目录下。你需要在 C:\Github\vision 目录下查找 assets 目录，并确认 face_detector.onnx 文件是否在那里。
正确设置文件路径：如果 assets 目录和 face_detector.onnx 文件都存在，但仍然出现错误，可能是因为程序运行时的当前工作目录（CWD）不是你预期的目录。在这种情况下，你可以使用绝对路径或者相对于当前脚本文件的路径来引用模型文件。例如，如果你的脚本位于 c:\Github\vision\src\scripts 目录下，你可以这样修改代码：
```
import os

# 获取当前脚本文件的目录
current_dir = os.path.dirname(os.path.abspath(__file__))
# 构建模型文件的绝对路径
model_path = os.path.join(current_dir, "..", "..", "assets", "face_detector.onnx")

self.face_detector = FaceDetector(model_path)
```
这段代码使用 os.path 模块来构建 face_detector.onnx 文件的绝对路径，无论当前的工作目录是什么，都能正确地找到文件。
确保路径分隔符正确：在 Windows 系统上，路径分隔符是反斜杠 \，而在 UNIX-like 系统上（包括 Linux 和 macOS），路径分隔符是正斜杠 /。虽然 Python 通常可以正确处理这些差异，但在某些情况下，确保使用正确的路径分隔符可能会避免路径问题。os.path.join 方法可以自动处理这种差异，所以推荐使用。

通过上述方法之一，你应该能够解决找不到模型文件的问题。

于是进行修改

face_orientation.py

import os
...
class OrientationRecognizer:
    def __init__(self, frame_width, frame_height):
        ...
        # 获取当前脚本文件的目录
        current_dir = os.path.dirname(os.path.abspath(__file__))

        # Setup a face detector to detect human faces.
        self.face_detector = FaceDetector(os.path.join(current_dir, "assets", "face_detector.onnx"))

        # Setup a mark detector to detect landmarks.
        self.mark_detector = MarkDetector(os.path.join(current_dir, "assets", "face_landmarks.onnx"))
        ...

modules/pose_estimation.py

import os


# 获取当前脚本文件的目录
current_dir = os.path.dirname(os.path.abspath(__file__))
# 模型文件的绝对路径
model_path = os.path.join(current_dir, "..", "assets", "model.txt")

...

class PoseEstimator:
    ...
    def _get_full_model_points(self, filename=model_path):
        ...

再次运行，就能够使用摄像头进行检测了

然后继续写 face_identification ，有了 face_orientation_recognition 的经验，这个就简单很多了，很快就弄完了

最后更新: 2024-07-05
创建日期: 2024-07-05

重新编写视觉模块记录¶

Explanation:

modules 文件夹

utils 文件夹

具体例子

常见子文件夹名称及用途

示例项目结构

lib

models

modules

scripts

utils

config

总结对比

编写 face_orientation_recognition¶

评论

`modules` 文件夹

`utils` 文件夹

`lib`

`models`

`modules`

`scripts`

`utils`

`config`

编写 `face_orientation_recognition`¶