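Making Pose Estimation More Practical

This post highlights a key problem that shows up when running inference with pose estimation algorithms, namely jitter, and discusses how to mitigate it. It also walks through an example that makes pose estimation more practical.

Keywords: human pose estimation, jitter, low-pass filter, signal.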

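Human pose estimation is one of the more challenging problems in computer vision. Its goal is to localize human body keypoints (such as the hips, shoulders, and wrists).

It has countless applications, including AR, VR-based games (such as Microsoft Kinect), interactive fitness, therapy, and motion capture. Frame-to-frame smoothness of the results is critical for any of these applications.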

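The Jitter Problem

Almost every pose estimation algorithm suffers from jitter during inference. Jitter is the high-frequency oscillation of a keypoint around a point, which is characteristic of a noisy signal.

Jitter can be attributed to the fact that we run inference frame by frame over the whole video input, and consecutive frames differ in occlusion (and contain a range of complex poses). Another cause may be inconsistent annotations in the training data, which lead to uncertainty in the pose estimates. Jitter causes the following problems: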

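Faulty and noisy data degrade the performance of any algorithm built on top of the keypoints.
The keypoints are too noisy to build any useful feature or application in a production environment.
The probability of getting false-positive data points is high.
For example, suppose you want to use pose estimation to build a stillness scorer (for people who are meditating). Jitter will significantly affect the score and lead to inaccurate results.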
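To make the stillness example concrete, here is a minimal sketch of how jitter alone can inflate a motion score. The `motion_score` function, the noise level, and the synthetic tracks are all assumptions made for illustration; they are not taken from the repository.

```python
import math
import random


def motion_score(track):
    """Mean frame-to-frame displacement of one keypoint; 0.0 means perfectly still."""
    steps = [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    return sum(steps) / len(steps)


random.seed(0)  # fixed seed so the sketch is reproducible

# A keypoint that does not move at all (normalized image coordinates).
still = [(0.5, 0.5)] * 100

# The same keypoint with small random jitter added to every frame
# (standard deviation of 0.005 is an arbitrary, assumed noise level).
jittery = [
    (0.5 + random.gauss(0, 0.005), 0.5 + random.gauss(0, 0.005))
    for _ in range(100)
]

print("still  :", motion_score(still))
print("jittery:", motion_score(jittery))
```

Even though the person is not moving in either track, the jittery one reports a clearly non-zero motion score, which is exactly what would break a stillness scorer.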

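Solutions to the Jitter Problem

Signal processing offers two main kinds of filter for attenuating parts of a signal.

Low-pass filter: a filter that attenuates all frequencies in the signal above a specified cutoff frequency and leaves the rest of the signal unchanged.

High-pass filter: a filter that attenuates all frequencies in the signal below a specified cutoff frequency and leaves the rest of the signal unchanged.

Natural human motion is a low-frequency signal, while jitter is a high-frequency signal. So, to deal with jitter, we can use a low-pass filter (LPF) to filter out the higher-frequency components.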
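As a rough illustration of the idea, the sketch below applies a first-order low-pass filter (exponential smoothing) to a synthetic keypoint coordinate. The signal, the 2 Hz cutoff, the 30 fps frame rate, and the `low_pass` and `roughness` helpers are assumptions chosen for the example, not values from the repository.

```python
import math


def low_pass(samples, cutoff_hz, fps):
    """First-order low-pass filter (exponential smoothing) over a list of samples."""
    dt = 1.0 / fps
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    out = [samples[0]]
    for x in samples[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))
    return out


def roughness(signal):
    """Mean absolute frame-to-frame change; higher means more jitter."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:])) / (len(signal) - 1)


fps = 30
# Slow 0.5 Hz "natural" motion plus jitter that flips sign on every frame.
clean = [math.sin(2 * math.pi * 0.5 * n / fps) for n in range(120)]
noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
smooth = low_pass(noisy, cutoff_hz=2.0, fps=fps)

print("roughness before:", roughness(noisy))
print("roughness after :", roughness(smooth))
```

A lower cutoff removes more jitter but also makes the output lag further behind fast movements, so the cutoff is a trade-off rather than a fixed constant.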

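Other ways to deal with jitter include using neural networks for pose refinement; one example is SmoothNet. However, an LPF is much easier to implement and use. One variant of the LPF is the One Euro (1€) filter, which is also very effective at filtering noisy signals in real time.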
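For readers who have not seen it, here is a minimal sketch of the One Euro filter idea: a first-order low-pass filter whose cutoff rises with the estimated speed of the signal, so slow motion is smoothed heavily and fast motion is followed with less lag. The class name `OneEuroFilterSketch`, its argument order, and its default values are assumptions for illustration and may differ from the implementation in the repository.

```python
import math


def smoothing_factor(dt, cutoff):
    """Smoothing factor of a first-order low-pass filter with the given cutoff (Hz)."""
    r = 2.0 * math.pi * cutoff * dt
    return r / (r + 1.0)


class OneEuroFilterSketch:
    """Illustrative One Euro filter for a single scalar signal."""

    def __init__(self, t0, x0, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # cutoff used when the signal is slow
        self.beta = beta              # how quickly the cutoff grows with speed
        self.d_cutoff = d_cutoff      # cutoff for smoothing the derivative
        self.t_prev = t0
        self.x_prev = x0
        self.dx_prev = 0.0

    def __call__(self, t, x):
        dt = t - self.t_prev  # must be positive

        # Estimate and smooth the speed of the signal.
        a_d = smoothing_factor(dt, self.d_cutoff)
        dx = (x - self.x_prev) / dt
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev

        # Faster motion -> higher cutoff -> less smoothing and less lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = smoothing_factor(dt, cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev

        self.t_prev, self.x_prev, self.dx_prev = t, x_hat, dx_hat
        return x_hat


fps = 30.0
# A keypoint that is actually still but jitters by +/-0.01 on every frame.
raw = [0.5 + 0.01 * (-1) ** n for n in range(90)]
one_euro = OneEuroFilterSketch(0.0, raw[0], min_cutoff=1.0, beta=0.0)
smoothed = [raw[0]] + [one_euro(n / fps, x) for n, x in enumerate(raw[1:], start=1)]

print("raw range     :", max(raw) - min(raw))
print("smoothed range:", max(smoothed[-20:]) - min(smoothed[-20:]))
```

In practice `min_cutoff` is usually tuned first to remove jitter at rest, and `beta` is then increased until the lag during fast motion is acceptable.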

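Movenet Pose Estimation

Let's start with some code to get the LPF working in Python. For illustration in this blog, I use TensorFlow's Movenet pose estimation model. This model is very fast and accurate.

Now, let's look at some simple functions that will be used for inference. The tflite model can be downloaded from here: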

https://tfhub.dev/google/lite-model/movenet/singlepose/thunder/tflite/float16/4?lite-format=tflite

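The Python API for running inference on a tflite model is available in tf.lite. (Reference: loading and running a model in Python with tflite.)

The complete code can be found in my GitHub repository: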

https://github.com/aakash2016/blog-codes/tree/master/motion-detection

import cv2
import numpy as np
import tensorflow as tf

# Initialize the TFLite interpreter
input_size = 256
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()


# Movenet model: Runs detection on an input image
def movenet(input_image):
    # TF Lite format expects tensor type of uint8.
    input_image = tf.cast(input_image, dtype=tf.uint8)

    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    interpreter.set_tensor(input_details[0]['index'], input_image.numpy())
    interpreter.invoke()  # Invoke inference.

    # Get the model prediction.
    kps = interpreter.get_tensor(output_details[0]['index'])
    return kps


# Obtain inference from the Movenet model
def get_inference(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Padding and Resizing the input image.
    # Note: pad() is not defined in this snippet; it is presumably a helper
    # from the repository linked above.
    image = pad(image, input_size, input_size)
    image = cv2.resize(image, (input_size, input_size))
    input_image = image

    # Movenet expects a [1, height, width, 3] tensor input
    input_image = np.expand_dims(input_image, axis=0)

    # Run model inference.
    kps = movenet(input_image)[0]
    return kps[0], image

The entire Python script can be found here:

https://github.com/aakash2016/blog-codes/blob/master/motion-detection/inference/movenet_infer.py
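It can help to see what the returned keypoints look like before filtering them. Movenet's single-pose output contains 17 keypoints in COCO order, each stored as (y, x, score) with coordinates normalized to [0, 1]. The sketch below converts such an array to pixel coordinates; the `to_pixels` helper and the 0.3 score threshold are assumptions for illustration, not part of the repository.

```python
# COCO keypoint order used by Movenet.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def to_pixels(kps, width, height, min_score=0.3):
    """Map normalized (y, x, score) rows to {name: (x_px, y_px)}.

    Keypoints whose score is below min_score (an assumed threshold) are dropped.
    """
    points = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, kps):
        if score >= min_score:
            points[name] = (int(round(x * width)), int(round(y * height)))
    return points


# Dummy output: a confident nose and 16 low-confidence keypoints.
dummy_kps = [[0.5, 0.25, 0.9]] + [[0.0, 0.0, 0.1]] * 16
print(to_pixels(dummy_kps, width=256, height=256))
```

The same function works on the NumPy array returned by the model, since it only iterates over rows of three values.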

Run inference locally with the following command (first run "cd motion-detection" after cloning):

python -m inference.movenet_infer --path file.mp4 --lpf n

Let's look at a sample inference result using the Movenet model:

Clearly, the inference looks quite accurate, and the latency is low.

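Now, let's go back to the jitter example from the beginning and see how to fix it. For demonstration, we use a low-pass filter. We could also use SciPy's popular signal processing module, scipy.signal, which supports different kinds of low-pass filters (for example, through the signal.lfilter function).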
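As a rough sketch of the SciPy route, the snippet below designs a Butterworth low-pass filter with `scipy.signal.butter` and applies it with `scipy.signal.lfilter` to a synthetic keypoint coordinate. The frame rate, the cutoff frequency, the filter order, and the test signal are assumed values for illustration only.

```python
import numpy as np
from scipy.signal import butter, lfilter

fps = 30.0       # assumed video frame rate
cutoff_hz = 3.0  # assumed cutoff frequency; tune it for your own motion

# Synthetic x coordinate of one keypoint: slow 0.5 Hz motion plus 10 Hz jitter.
t = np.arange(150) / fps
noisy = np.sin(2 * np.pi * 0.5 * t) + 0.2 * np.sin(2 * np.pi * 10.0 * t)

# Design a 2nd-order Butterworth low-pass filter and apply it causally.
b, a = butter(2, cutoff_hz, btype="low", fs=fps)
smooth = lfilter(b, a, noisy)


def roughness(signal):
    """Mean absolute frame-to-frame change; higher means more jitter."""
    return float(np.mean(np.abs(np.diff(signal))))


print("roughness before:", roughness(noisy))
print("roughness after :", roughness(smooth))
```

Note that `lfilter` is causal, so the smoothed signal lags slightly behind the input. For offline processing of a recorded video, `scipy.signal.filtfilt` can remove that lag, but it needs the whole signal up front and so does not suit real-time use.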

The 1€ LPF is used as follows:

# Excerpt from the main loop: frame2, j, lpf, num_kps, size, cap, draw_pose
# and OneEuroFilter are defined elsewhere in the full script.
while True:
    old_curr_kp, image = get_inference(frame2)
    curr_kp = [x[:] for x in old_curr_kp]  # deepcopy

    if j == 0:
        x_track = [OneEuroFilter(j, curr_kp[k][0], 0.6, 0.015) for k in range(num_kps)]  # track for all keypoints
        y_track = [OneEuroFilter(j, curr_kp[k][1], 0.6, 0.015) for k in range(num_kps)]

    if lpf and j > 1:
        for i in range(num_kps):
            ## x coordinate
            curr_kp[i][0] = x_track[i](j, curr_kp[i][0])

            ## y coordinate
            curr_kp[i][1] = y_track[i](j, curr_kp[i][1])

    output = draw_pose(image, curr_kp)
    output = cv2.cvtColor(output, cv2.COLOR_BGR2RGB)
    outimage = np.asarray(output, dtype=np.uint8)
    outimage = cv2.resize(outimage, size)

    prev_kp = curr_kp
    ret, frame2 = cap.read()
    cframe = cap.get(cv2.CAP_PROP_POS_FRAMES)
    j += 1

    if not ret:
        break

    k = cv2.waitKey(1)
    if k == ord('q') or k == 27:
        break

cap.release()
cv2.destroyAllWindows()