实现检测人脸朝向相对相机偏移程度的记录¶

约 2642 个字 • 287 行代码

由于比赛需要机器人走到志愿者的正面后，再向志愿者询问，所以打算通过检测人脸朝向相对于相机的偏移程度(人脸正对相机认为不偏移)，来判断机器人是否到达志愿者的正面

于是通过询问Microsoft Copilot，找到了一个比较好用的github的仓库

yinguobing/head-pose-estimation: Realtime human head pose estimation with ONNXRuntime and OpenCV. (github.com)

借助github copilot来理解代码

对话记录

RonaldLN:

@workspace /explain Write an explanation for the active selection as paragraphs of text.

main.py

"""Demo code showing how to estimate human head pose.

There are three major steps:
1. Detect and crop the human faces in the video frame.
2. Run facial landmark detection on the face image.
3. Estimate the pose by solving a PnP problem.

For more details, please refer to:
https://github.com/yinguobing/head-pose-estimation
"""
from argparse import ArgumentParser

import cv2

from face_detection import FaceDetector
from mark_detection import MarkDetector
from pose_estimation import PoseEstimator
from utils import refine

# Parse arguments from user input.
parser = ArgumentParser()
parser.add_argument("--video", type=str, default=None,
                    help="Video file to be processed.")
parser.add_argument("--cam", type=int, default=0,
                    help="The webcam index.")
args = parser.parse_args()


print(__doc__)
print("OpenCV version: {}".format(cv2.__version__))


def run():
    # Before estimation started, there are some startup works to do.

    # Initialize the video source from webcam or video file.
    video_src = args.cam if args.video is None else args.video
    cap = cv2.VideoCapture(video_src)
    print(f"Video source: {video_src}")

    # Get the frame size. This will be used by the following detectors.
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Setup a face detector to detect human faces.
    face_detector = FaceDetector("assets/face_detector.onnx")

    # Setup a mark detector to detect landmarks.
    mark_detector = MarkDetector("assets/face_landmarks.onnx")

    # Setup a pose estimator to solve pose.
    pose_estimator = PoseEstimator(frame_width, frame_height)

    # Measure the performance with a tick meter.
    tm = cv2.TickMeter()

    # Now, let the frames flow.
    while True:

        # Read a frame.
        frame_got, frame = cap.read()
        if frame_got is False:
            break

        # If the frame comes from webcam, flip it so it looks like a mirror.
        if video_src == 0:
            frame = cv2.flip(frame, 2)

        # Step 1: Get faces from current frame.
        faces, _ = face_detector.detect(frame, 0.7)

        # Any valid face found?
        if len(faces) > 0:
            tm.start()

            # Step 2: Detect landmarks. Crop and feed the face area into the
            # mark detector. Note only the first face will be used for
            # demonstration.
            face = refine(faces, frame_width, frame_height, 0.15)[0]
            x1, y1, x2, y2 = face[:4].astype(int)
            patch = frame[y1:y2, x1:x2]

            # Run the mark detection.
            marks = mark_detector.detect([patch])[0].reshape([68, 2])

            # Convert the locations from local face area to the global image.
            marks *= (x2 - x1)
            marks[:, 0] += x1
            marks[:, 1] += y1

            # Step 3: Try pose estimation with 68 points.
            pose = pose_estimator.solve(marks)

            tm.stop()

            # All done. The best way to show the result would be drawing the
            # pose on the frame in realtime.

            # Do you want to see the pose annotation?
            pose_estimator.visualize(frame, pose, color=(0, 255, 0))

            # Do you want to see the axes?
            pose_estimator.draw_axes(frame, pose)

            # Do you want to see the marks?
            # mark_detector.visualize(frame, marks, color=(0, 255, 0))

            # Do you want to see the face bounding boxes?
            # face_detector.visualize(frame, faces)

        # Draw the FPS on screen.
        cv2.rectangle(frame, (0, 0), (90, 30), (0, 0, 0), cv2.FILLED)
        cv2.putText(frame, f"FPS: {tm.getFPS():.0f}", (10, 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255))

        # Show preview.
        cv2.imshow("Preview", frame)
        if cv2.waitKey(1) == 27:
            break


if __name__ == '__main__':
    run()

Github Copilot:

已使用 1 引用

main.py: 1-124

这段代码是一个演示如何估计人类头部姿态的示例。它主要分为三个步骤：首先，检测并裁剪视频帧中的人脸；其次，对人脸图像运行面部标记检测；最后，通过解决PnP（Perspective-n-Point）问题来估计姿态。这个过程使用了OpenCV库和一些自定义的模块，如 FaceDetector 、 MarkDetector 和 PoseEstimator 。

首先，代码通过 argparse 库解析用户输入的参数，这些参数指定了要处理的视频文件或摄像头索引。然后，它初始化视频源，可以是摄像头或视频文件，并获取视频帧的尺寸。接下来，它初始化面部检测器、标记检测器和姿态估计器，这些都是基于预先训练的模型。

在主循环中，代码不断读取视频帧，并首先检查是否成功获取到帧。如果是从摄像头获取的帧，会将其翻转以模拟镜子效果。然后，它执行面部检测，并对检测到的每个面部进行标记检测和姿态估计。这里，它只处理第一个检测到的面部作为演示。标记检测的结果是68个面部标记点，这些点被用于姿态估计。姿态估计完成后，可以选择性地在视频帧上绘制姿态注释、坐标轴、标记点和面部边界框。

此外，代码还使用 cv2.TickMeter 来测量性能，即计算每秒可以处理多少帧（FPS）。最后，处理后的帧被显示在屏幕上，如果用户按下 Esc 键，则退出循环。

整个过程展示了如何结合使用面部检测、标记检测和姿态估计技术来估计视频中人类头部的姿态，这对于人机交互、增强现实等应用场景非常有用。

RonaldLN:

@workspace /explain Write an explanation for the active selection as paragraphs of text.

pose_estimation.py

    def solve(self, points):
        """Solve pose with all the 68 image points
        Args:
            points (np.ndarray): points on image.

        Returns:
            Tuple: (rotation_vector, translation_vector) as pose.
        """

        if self.r_vec is None:
            (_, rotation_vector, translation_vector) = cv2.solvePnP(
                self.model_points_68, points, self.camera_matrix, self.dist_coeefs)
            self.r_vec = rotation_vector
            self.t_vec = translation_vector

        (_, rotation_vector, translation_vector) = cv2.solvePnP(
            self.model_points_68,
            points,
            self.camera_matrix,
            self.dist_coeefs,
            rvec=self.r_vec,
            tvec=self.t_vec,
            useExtrinsicGuess=True)

        return (rotation_vector, translation_vector)

Github Copilot:

已使用 3 引用

pose_estimation.py: 49-73
.../cv2/__init__pyi: 5994
.../cv2/__init__pyi: 5996

这段代码定义了一个名为 solve 的方法，它的目的是使用OpenCV的 cv2.solvePnP 函数来解决头部姿态估计问题。这个方法接受一个名为 points 的参数，这是一个 np.ndarray 类型，代表图像上的68个面部标记点。

首先，方法检查 self.r_vec 方法，或者之前没有成功的旋转向量和平移向量的估计。在这种情况下，它会调用 cv2.solvePnP 函数，传入模型点（ self.model_points_68 ，这是一个预定义的68个点的3D模型），图像点（ points ），相机矩阵（ self.camera_matrix ），和畸变系数（ self.dist_coeefs ）。这次调用不使用外部猜测（ useExtrinsicGuess 参数），因为没有先前的旋转向量和平移向量可用作参考。调用结果是一个旋转向量和一个平移向量，它们被存储在 self.r_vec 和 self.t_vec 中，以便后续使用。

接下来，无论 self.r_vec 是否为 None ，代码都会再次调用 cv2.solvePnP 函数。这次调用传入相同的模型点、图像点、相机矩阵和畸变系数，但还额外传入了 self.r_vec 作为旋转向量和平移向量的初始估计，并将 useExtrinsicGuess 设置为 True 。这意味着函数会使用这些初始估计值作为解决PnP问题的起点，这通常可以提高估计的准确性和效率。

最后，方法返回更新后的旋转向量和平移向量作为一个元组。这两个向量描述了从模型坐标系到相机坐标系的变换，可以用来估计头部的姿态。

总的来说，这段代码通过两次调用 cv2.solvePnP ，一次可能不使用外部猜测，一次使用外部猜测，来提高头部姿态估计的准确性和稳定性。

然后询问如何获取到偏移的角度

对话记录

RonaldLN:

如何获取到人脸相对于相机偏移的角度(人脸正对相机视为偏移为0)

GitHub Copilot:

已使用 1 引用

main.py: 1-124

要获取人脸相对于相机的偏移角度（其中人脸正对相机视为偏移为0），可以通过解决一个称为“姿态估计”（Pose Estimation）的问题来实现。在提供的代码中，已经包含了姿态估计的步骤，主要通过 PoseEstimator 类完成。这个类通过解决一个称为PnP（Perspective-n-Point）问题来估计头部姿态，即通过已知的几个3D点在2D图像上的投影，来估计相机视角下这些3D点的空间位置和姿态。

以下是获取人脸相对于相机偏移角度的步骤概述：

检测人脸：使用 FaceDetector 类检测视频帧中的人脸。
检测面部标记：使用 MarkDetector 类检测人脸上的关键点（例如眼角、鼻尖等）。
姿态估计：使用 PoseEstimator 类，基于面部标记的2D坐标和预定义的3D模型点，解决PnP问题来估计头部姿态。

姿态估计的结果通常包括旋转和平移向量。旋转向量可以转换为欧拉角（Roll, Pitch, Yaw），分别表示绕X轴、Y轴和Z轴的旋转，这些角度可以用来表示人脸相对于相机的偏移角度。

Roll（滚转角）：表示头部的左右倾斜。
Pitch（俯仰角）：表示头部的上下倾斜。
Yaw（偏航角）：表示头部的左右转动。

在提供的代码中， pose_estimator.solve(marks) 方法返回的 pose 对象应该包含了旋转向量，你可以将这个旋转向量转换为欧拉角来获取偏移角度。转换可以使用OpenCV的 cv2.Rodrigues 函数和一些数学转换来完成。

以下是如何从旋转向量获取欧拉角的示例代码：

import cv2
import numpy as np

# 假设rotation_vector是从pose_estimator.solve(marks)获取的旋转向量
rotation_vector = pose[0]  # 这里的pose[0]应该替换为实际获取的旋转向量

# 将旋转向量转换为旋转矩阵
rotation_matrix, _ = cv2.Rodrigues(rotation_vector)

# 从旋转矩阵计算欧拉角
yaw, pitch, roll = cv2.decomposeProjectionMatrix(rotation_matrix)[-1]

print(f"Yaw (偏航角): {yaw}")
print(f"Pitch (俯仰角): {pitch}")
print(f"Roll (滚转角): {roll}")

请注意，这里的 pose[0] 应该替换为实际从 pose_estimator.solve(marks) 获取的旋转向量。此外，根据实际情况，可能需要对计算出的角度进行适当的转换或调整，以确保它们与你的应用场景相匹配。

将copilot提供的代码添加到了 main.py 中

main.py
# Any valid face found?
if len(faces) > 0:
    tm.start()

    # Step 2: Detect landmarks. Crop and feed the face area into the
    # mark detector. Note only the first face will be used for
    # demonstration.
    face = refine(faces, frame_width, frame_height, 0.15)[0]
    x1, y1, x2, y2 = face[:4].astype(int)
    patch = frame[y1:y2, x1:x2]

    # Run the mark detection.
    marks = mark_detector.detect([patch])[0].reshape([68, 2])

    # Convert the locations from local face area to the global image.
    marks *= (x2 - x1)
    marks[:, 0] += x1
    marks[:, 1] += y1

    # Step 3: Try pose estimation with 68 points.
    pose = pose_estimator.solve(marks)

    rotation_vector = pose[0]

    # 将旋转向量转换为旋转矩阵
    rotation_matrix, _ = cv2.Rodrigues(rotation_vector)

    # 从旋转矩阵计算欧拉角
    yaw, pitch, roll = cv2.decomposeProjectionMatrix(projection_matrix)[-1]

    print(f"Yaw (偏航角): {yaw}")
    print(f"Pitch (俯仰角): {pitch}")
    print(f"Roll (滚转角): {roll}")

    tm.stop()

但运行后出现报错

Traceback (most recent call last):
  File "C:\Github\head-pose-estimation\main.py", line 135, in <module>
    run()
  File "C:\Github\head-pose-estimation\main.py", line 100, in run
    yaw, pitch, roll = cv2.decomposeProjectionMatrix(rotation_matrix)[-1]
cv2.error: OpenCV(4.9.0) D:\a\opencv-python\opencv-python\opencv\modules\calib3d\src\calibration.cpp:3353: error: (-209:Sizes of input arguments do not match) Size of projection matrix must be 3x4! in function 'cvDecomposeProjectionMatrix'

继续询问copilot

对话记录

RonaldLN:

cv2.error: OpenCV(4.9.0) D:\a\opencv-python\opencv-python\opencv\modules\calib3d\src\calibration.cpp:3353: error: (-209:Sizes of input arguments do not match) Size of projection matrix must be 3x4! in function 'cvDecomposeProjectionMatrix'

how to fix it?

Github Copilot:

要解决这个问题，你需要确保传递给 cv2.decomposeProjectionMatrix 的矩阵大小为3x4。在你的代码中，rotation_matrix 是通过 cv2.Rodrigues 从旋转向量转换得到的，其大小为3x3。cv2.decomposeProjectionMatrix 期望的是一个投影矩阵，而不仅仅是一个旋转矩阵。

一个常见的解决方案是将旋转矩阵扩展为一个3x4的矩阵，通过添加一个3x1的零向量（或适当的平移向量，如果你有的话）作为第四列。这里，我们将使用一个零向量作为示例：

import numpy as np
import cv2

# 假设rotation_vector是从pose_estimator.solve(marks)获取的旋转向量
rotation_vector = pose[0]  # 这里的pose[0]应该替换为实际获取的旋转向量

# 将旋转向量转换为旋转矩阵
rotation_matrix, _ = cv2.Rodrigues(rotation_vector)

# 创建一个3x4的矩阵，其中包含旋转矩阵和一个3x1的零向量作为第四列
projection_matrix = np.hstack((rotation_matrix, np.zeros((3, 1))))

# 现在，projection_matrix的大小是3x4，可以传递给cv2.decomposeProjectionMatrix
_, _, _, _, _, _, euler_angles = cv2.decomposeProjectionMatrix(projection_matrix)

yaw, pitch, roll = euler_angles.flatten()

print(f"Yaw (偏航角): {yaw}")
print(f"Pitch (俯仰角): {pitch}")
print(f"Roll (滚转角): {roll}")

这段代码首先将旋转矩阵和一个3x1的零向量水平堆叠（ np.hstack ）来形成一个3x4的投影矩阵。然后，这个3x4的矩阵被传递给 cv2.decomposeProjectionMatrix ，这样就不会再出现尺寸不匹配的错误了。

于是参考它的回答，对原来的代码稍加修改

main.py
# Step 3: Try pose estimation with 68 points.
pose = pose_estimator.solve(marks)

rotation_vector = pose[0]

# 将旋转向量转换为旋转矩阵
rotation_matrix, _ = cv2.Rodrigues(rotation_vector)

# 创建一个3x4的矩阵，其中包含旋转矩阵和一个3x1的零向量作为第四列
projection_matrix = np.hstack((rotation_matrix, np.zeros((3, 1))))

# 从旋转矩阵计算欧拉角
yaw, pitch, roll = cv2.decomposeProjectionMatrix(projection_matrix)[-1]

print(f"Yaw (偏航角): {yaw}")
print(f"Pitch (俯仰角): {pitch}")
print(f"Roll (滚转角): {roll}")

再次运行就可以显示欧拉角了。

继续询问copilot如何通过欧拉角计算出偏移角度

对话记录

RonaldLN:

如何通过欧拉角来计算人脸相对于相机偏移的角度？(从而能够使机器人移动到人的正面)

Github Copilot:

已使用 1 引用

main.py: 104-109

要通过欧拉角计算人脸相对于相机的偏移角度，并使机器人移动到人的正面，你可以遵循以下步骤：

理解欧拉角：欧拉角是用来描述一个物体在三维空间中的方向的。在这个场景中，它们描述的是人脸相对于相机的方向。通常，欧拉角包括偏航角（Yaw）、俯仰角（Pitch）和滚转角（Roll）。
- 偏航角（Yaw）：描述的是左右转动的角度。如果偏航角大于某个阈值，说明人脸偏向相机的一侧，机器人需要向相反的方向移动。
- 俯仰角（Pitch）：描述的是上下倾斜的角度。如果俯仰角大于某个阈值，说明人脸偏向相机的上方或下方，机器人可能需要调整其高度或倾斜角度。
- 滚转角（Roll）：描述的是侧倾的角度。通常情况下，对于机器人来说，滚转角的调整不是必须的，除非机器人的头部设计允许侧倾。
计算偏移角度：基于偏航角和俯仰角，你可以计算出机器人需要移动的方向和距离。具体的计算方法取决于机器人的移动能力和所需的精确度。
设计移动策略：
- 如果偏航角较大，机器人需要左转或右转来调整方向，直到偏航角减小到接近零（或一个很小的阈值内）。
- 如果俯仰角较大，机器人需要上升或下降（如果可能的话），或者调整其摄像头的倾斜角度，直到俯仰角减小到接近零。
实施调整：根据计算出的偏移角度，生成控制机器人移动的命令。这可能包括转动角度、前进距离或调整高度等。

以下是一个简化的示例代码，展示如何根据偏航角和俯仰角来决定机器人的移动方向：

def adjust_robot_position(yaw, pitch):
    # 定义偏航角和俯仰角的阈值
    yaw_threshold = 10  # 偏航角阈值，单位为度
    pitch_threshold = 5  # 俯仰角阈值，单位为度

    # 根据偏航角调整方向
    if yaw > yaw_threshold:
        print("向右转")
        # 发送向右转的命令到机器人
    elif yaw < -yaw_threshold:
        print("向左转")
        # 发送向左转的命令到机器人

    # 根据俯仰角调整高度或倾斜角度
    if pitch > pitch_threshold:
        print("向下调整")
        # 发送向下调整的命令到机器人或调整摄像头角度
    elif pitch < -pitch_threshold:
        print("向上调整")
        # 发送向上调整的命令到机器人或调整摄像头角度

# 假设已经通过某种方式获取了偏航角和俯仰角
adjust_robot_position(yaw, pitch)

这个示例仅提供了一个基本的框架。实际的实现细节将取决于你的机器人平台的具体命令和控制接口。

然后经过测试发现，人脸向左或向右转影响的是之前代码中的 pitch ，于是大概编写代码

        if len(faces) > 0:
            ...
            # 从旋转矩阵计算欧拉角
            yaw, pitch, roll = cv2.decomposeProjectionMatrix(projection_matrix)[-1]

            print(f"Pitch: {pitch}")

            # 调整机器人的位置
            adjust_robot_position(pitch)


def adjust_robot_position(angle):
    # 定义阈值
    angle_threshold = 5  # 偏航角阈值，单位为度

    # 根据偏航角调整方向
    if angle > angle_threshold:
        print("向右转")
        # 发送向右转的命令到机器人
    elif angle < -angle_threshold:
        print("向左转")
        # 发送向左转的命令到机器人

最后更新: 2024-07-02
创建日期: 2024-07-02

实现检测人脸朝向相对相机偏移程度的记录¶

评论