How to Install and Run MediaPipe on UNIHIKER


Introduction

MediaPipe

MediaPipe Solutions provides a set of libraries and tools that help you quickly implement artificial intelligence (AI) and machine learning (ML) technologies in your applications. You can instantly integrate these solutions into your app, customize them as needed, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open-source project, so you can further modify the solution code to meet the specific needs of your application.

With the MediaPipe Object Detector task, you can detect the presence and location of multiple objects in an image or video. For example, the object detector can identify the location of a dog in an image. This task uses a machine learning (ML) model to process image data, accepting either static images or continuous video streams as input and outputting a list of detection results. Each result represents an object detected in the image or video.
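As a minimal sketch (the model and image file names below are placeholders; the complete, commented script appears later in this article), the task API looks like this:

CODE
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Build an object detector from a TFLite model file.
options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path='efficientdet_lite0.tflite'),
    score_threshold=0.5)
detector = vision.ObjectDetector.create_from_options(options)

# Detect objects in a single image and list what was found.
result = detector.detect(mp.Image.create_from_file('example.jpg'))
for detection in result.detections:
    category = detection.categories[0]
    print(category.category_name, category.score, detection.bounding_box)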

 

The Models Used by MediaPipe

EfficientDet-Lite0

The EfficientDet-Lite0 model uses the EfficientNet-Lite0 backbone with a 320x320 input size and a BiFPN feature network. This model is trained on the COCO dataset, a large object detection dataset containing 1.5 million object instances and 80 object labels. You can refer to the complete list of supported labels. EfficientDet-Lite0 is available in int8, float16, or float32 formats. This model is recommended because it strikes a balance between latency and accuracy. It is both accurate and lightweight, making it suitable for many use cases.

 

EfficientDet-Lite2

The EfficientDet-Lite2 model uses the EfficientNet-Lite2 backbone with a 448x448 input size and a BiFPN feature network. This model is trained on the COCO dataset, a large object detection dataset containing 1.5 million object instances and 80 object labels. You can refer to the complete list of supported labels. EfficientDet-Lite2 is available as an int8, float16, or float32 model. It is generally more accurate than EfficientDet-Lite0, but it is also slower and requires more memory. This model is suitable for use cases where accuracy is more important than speed and size.

 

SSD MobileNetV2

The SSD MobileNetV2 model uses the MobileNetV2 backbone with a 256x256 input size and an SSD feature network. This model is trained on the COCO dataset, a large object detection dataset containing 1.5 million object instances and 80 object labels. You can refer to the complete list of supported labels. SSD MobileNetV2 is available in int8 and float32 formats. This model is faster and more lightweight than EfficientDet-Lite0, but typically less accurate. It is suitable for use cases that require a fast, lightweight model, even if it sacrifices some accuracy.

 

Custom Model

If you decide to build a model for this task, this section outlines the requirements for custom models. The custom model must be in TensorFlow Lite format and must include metadata describing the model's operational parameters.

 

 

UNIHIKER

The UNIHIKER is a new generation of domestically produced open-source hardware specifically designed for Python learning and use. It features a single-board computer architecture, integrating an LCD color screen, WiFi, Bluetooth, various common sensors, and a wealth of expansion interfaces. It also comes with a Linux operating system and a Python environment pre-installed with commonly used Python libraries, allowing teachers and students to start Python education with just two steps.

The UNIHIKER is a development board based on the RK3308 Arm 64-bit quad-core processor, with a clock speed of up to 1.2 GHz. It is equipped with 512 MB of DDR3 memory and 16 GB of eMMC storage, running the Debian 10 operating system. It supports 2.4 GHz Wi-Fi and Bluetooth 4.0 via the RTL8723DS chip. Additionally, the UNIHIKER integrates a GD32VF103C8T6 RISC-V coprocessor with a clock speed of 108 MHz, featuring 64 KB of Flash and 32 KB of SRAM.

The UNIHIKER includes various onboard components, such as Home and A/B buttons and a 2.8-inch color touchscreen with a resolution of 240x320. The device also features a capacitive silicon microphone, a PT0603 light-sensitive phototransistor, a passive buzzer, and a blue LED, and it integrates an ICM20689 six-axis sensor combining a three-axis accelerometer and a three-axis gyroscope.

In terms of interfaces, the UNIHIKER offers multiple connection options. It has a USB Type-C port for programming or powering the board via a PC. There is also a USB Type-A port for connecting external USB devices. Additionally, the board includes a microSD card slot for expanded storage, a 3-pin I/O supporting 3 channels of 10-bit PWM and 2 channels of 12-bit ADC, a dedicated 4-pin I2C interface, and 19 independent I/O GPIOs compatible with micro:bit, supporting various communication protocols and functions.

 

Running MediaPipe on the UNIHIKER

Running MediaPipe on the UNIHIKER is significant for several reasons:

1. Education and Practice: The UNIHIKER is a domestically produced open-source hardware designed specifically for Python learning and use, pre-installed with commonly used Python libraries. It is well-suited for teaching and practicing Python. Running MediaPipe on the UNIHIKER allows students and developers to gain a more intuitive understanding and application of artificial intelligence (AI) and machine learning (ML) technologies.

2. Efficient Development: MediaPipe provides an efficient set of libraries and tools for the rapid development of AI and ML applications. The UNIHIKER, equipped with an LCD color screen, WiFi, Bluetooth, and various sensors, offers a rich development environment and hardware support, making it possible to quickly validate and showcase AI applications using MediaPipe.

3. Hardware Performance: The UNIHIKER is based on the RK3308 Arm 64-bit quad-core processor, with 512 MB of DDR3 memory and 16 GB of eMMC storage, sufficient to support the operation of simple MediaPipe models. Its built-in six-axis sensor, microphone, and other sensors also provide valuable data input sources for AI applications.

4. Open Source and Customization: As an open-source project, MediaPipe allows users to deeply customize it according to their needs. The UNIHIKER's open-source hardware design also supports customization and expansion of both hardware and software, making it suitable for various application scenarios when combined with MediaPipe.

5. Portability and Integration: The UNIHIKER is compact and fully functional, making it ideal for portable AI application development and demonstration. Running MediaPipe on it enables real-time object detection and other AI features, applicable in various scenarios such as IoT devices, smart home systems, and educational robots.

By running MediaPipe on the UNIHIKER, you can fully utilize its hardware resources and software environment to quickly implement and showcase AI applications, providing strong support for education, research, and development.

This article will demonstrate how to deploy three small object detection models using MediaPipe on the UNIHIKER, including the following sections:

- Preparing the environment for running MediaPipe on the UNIHIKER.

- Running MediaPipe object detection code on the UNIHIKER.

- Accelerating performance using the int8 quantization version of the model.

- Testing the impact of different image resolutions on the performance of these models.

Preparation: Environment Configuration for Running MediaPipe

Step1 Download Miniforge

In the terminal, enter:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh

 

Once the download is complete, the terminal will display:

‘Miniforge3-Linux-aarch64.sh’ saved [74300552/74300552]

 

Step2 Install Miniforge

In the terminal, enter:

sudo bash Miniforge3-Linux-aarch64.sh

 

During the process, follow any prompts to press ENTER or type yes as needed. Once completed, the terminal will display:

Added mamba to /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

Thank you for installing Miniforge3!

 

In the terminal, enter:

source ~/.bashrc

 

After installation is complete, in the terminal enter:

conda

 

The terminal will display conda's usage information, confirming the installation.

Step3 Activate conda

In the terminal, enter: 

conda activate

 

The terminal prompt changes: before activation it is the plain shell prompt; afterwards it is prefixed with `(base)`, indicating that the base environment is active.

 

Step4 Create a MediaPipe environment in conda

In the terminal, enter:

conda create -n mediapipe python=3.10

 

 

Enter y when prompted to confirm.

Once the environment is created, the terminal displays the command for activating it.

 

Step5 Activate the MediaPipe environment

In the terminal, enter:

conda activate mediapipe

 

Step6 Install MediaPipe

Type in the terminal:

pip install mediapipe
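
To confirm the installation, you can print the package version:

python -c "import mediapipe; print(mediapipe.__version__)"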

Quick Start: Running MediaPipe Models

Step1 Activate the MediaPipe environment

In the terminal, enter:

conda activate mediapipe

 

Step2 Create a mediapipe folder

In the terminal, enter:

mkdir mediapipe

cd mediapipe

 

Step3 Download the models

You can download the models you need to run from the following URL:

https://ai.google.dev/edge/mediapipe/solutions/vision/object_detector/index?hl=zh-cn#models

As introduced in the Introduction, the official site provides three object detection models:

- EfficientDet-Lite0

- EfficientDet-Lite2

- SSDMobileNet-V2

You can download and run these three models separately.
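
For example, once you have copied a model's download link from that page, you can fetch it directly on the UNIHIKER (the URL below is a placeholder; substitute the actual link for the model and precision you chose):

wget -O efficientdet_lite0.tflite <model-download-url>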

 

Step4 Prepare images

We will continue using `bus.jpg` here. If you want to download it, you can enter the following command in the terminal:

wget https://ultralytics.com/images/bus.jpg

 

 

Step5 Code

Create a new file named `test_pic.py` and write the following code:

CODE
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import time

# Define color mappings
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # Blue, Green, Red in OpenCV's BGR order
DEFAULT_COLOR = (255, 255, 255)  # White for other categories

def visualize(original_image, detection_result, scale_ratio) -> np.ndarray:
    """Draws bounding boxes on the input image and return it.
    Args:
      original_image: The original input RGB image.
      detection_result: The list of all "Detection" entities to be visualized.
      scale_ratio: The ratio of the resized image size to the original image size.
    Returns:
      Image with bounding boxes.
    """
    # Dictionary to store category-color mappings
    category_color_map = {}

    for detection in detection_result.detections:
        # Get the category of the object
        category = detection.categories[0]
        category_name = category.category_name

        # Assign a color to the category if not already assigned
        if category_name not in category_color_map:
            if len(category_color_map) < len(COLORS):
                category_color_map[category_name] = COLORS[len(category_color_map)]
            else:
                category_color_map[category_name] = DEFAULT_COLOR

        color = category_color_map[category_name]

        # Draw bounding box
        bbox = detection.bounding_box
        start_point = int(bbox.origin_x / scale_ratio), int(bbox.origin_y / scale_ratio)
        end_point = int((bbox.origin_x + bbox.width) / scale_ratio), int((bbox.origin_y + bbox.height) / scale_ratio)
        cv2.rectangle(original_image, start_point, end_point, color, 3)

        # Draw label and score
        probability = round(category.score, 2)
        result_text = category_name + ' (' + str(probability) + ')'
        text_location = (int((MARGIN + bbox.origin_x) / scale_ratio),
                         int((MARGIN + ROW_SIZE + bbox.origin_y) / scale_ratio))
        cv2.putText(original_image, result_text, text_location, cv2.FONT_HERSHEY_PLAIN,
                    FONT_SIZE, color, FONT_THICKNESS)

    return original_image

def detect_objects(model_path, image_path, target_width, output_path, show_image):
    """Detect objects in an image and save/display the result.
    Args:
      model_path: Path to the model file.
      image_path: Path to the input image file.
      target_width: The width to resize the image for inference.
      output_path: Path to save the annotated image.
      show_image: Boolean flag to display the image after processing.
    """
    # STEP 1: Create an ObjectDetector from the model file.
    base_options = python.BaseOptions(model_asset_path=model_path)
    options = vision.ObjectDetectorOptions(base_options=base_options,
                                           score_threshold=0.5)
    detector = vision.ObjectDetector.create_from_options(options)

    # STEP 2: Define the constants used when drawing labels.
    global MARGIN, ROW_SIZE, FONT_SIZE, FONT_THICKNESS
    MARGIN = 10  # pixels
    ROW_SIZE = 10  # pixels
    FONT_SIZE = 1
    FONT_THICKNESS = 1

    # STEP 3: Read the local image
    original_image = cv2.imread(image_path)
    if original_image is None:
        print("Error: Could not read image.")
        return

    # Resize the image to a smaller size to speed up inference
    scale_ratio = target_width / original_image.shape[1]
    target_height = int(original_image.shape[0] * scale_ratio)
    resized_image = cv2.resize(original_image, (target_width, target_height))

    # OpenCV loads images as BGR; convert to the RGB format MediaPipe expects.
    rgb_image = cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)

    # STEP 4: Detect objects in the image.
    start_time = time.time()
    detection_result = detector.detect(mp_image)
    end_time = time.time()

    # Print inference time
    inference_time = end_time - start_time
    print(f'Inference time: {inference_time:.2f} seconds')

    # STEP 5: Process the detection result. In this case, visualize it.
    annotated_image = visualize(original_image, detection_result, scale_ratio)

    # Save the result
    cv2.imwrite(output_path, annotated_image)
    print(f'Result saved to {output_path}')

    # Display the result if needed
    if show_image:
        cv2.imshow('Object Detection', annotated_image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

# Example call
detect_objects(
    model_path='efficientdet_lite0.tflite',
    image_path='bus.jpg',
    target_width=640,
    output_path='bus_640.jpg',
    show_image=False
)

Here we use the `efficientdet_lite0.tflite` model downloaded in the previous step.
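
To test the other two models, change only the `model_path` argument (and, if you like, the `output_path`). For example, a call for the SSD model might look like this (the output filename is our own choice):

CODE
detect_objects(
    model_path='ssd_mobilenet_v2.tflite',
    image_path='bus.jpg',
    target_width=640,
    output_path='bus_640_ssd.jpg',
    show_image=False
)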

 

Step6 Run the code and check the results
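
In the terminal, with the mediapipe environment active, enter:

python test_pic.py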

 

You can see that the efficientdet_lite0 model takes 2.62 seconds to process the image resized to a width of 640 pixels. The processed image result is as follows:

 

 

Using the efficientdet_lite2 model takes 8.47 seconds:

 

 

 

Running the ssd_mobilenet_v2 model takes 1.88 seconds:

 

 

 

Summary

Different models have varying accuracy and speed:

- efficientdet_lite0 has average accuracy.

- efficientdet_lite2 is the most accurate of the three, but by far the slowest (8.47 s vs. 2.62 s).

- ssd_mobilenet_v2 offers better speed and accuracy.

 

 

Optimization: Use models with int8 quantization

We can accelerate inference by using the officially provided int8 models. The steps are as follows:

Step1 Download the official int8 models

URLs:

- efficientdet_lite0_int8.tflite: EfficientDet-Lite0 (int8)

- efficientdet_lite2_int8.tflite: EfficientDet-Lite2 (int8)

- ssd_mobilenet_v2_int8.tflite: SSDMobileNet-V2 (int8)

 

Step2 Code

CODE
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import time

# Define color mappings
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # Blue, Green, Red in OpenCV's BGR order
DEFAULT_COLOR = (255, 255, 255)  # White for other categories

def visualize(original_image, detection_result, scale_ratio) -> np.ndarray:
    ...  # unchanged from test_pic.py above

def detect_objects(model_path, image_path, target_width, output_path, show_image):
    ...  # unchanged from test_pic.py above

# Example call
detect_objects(
    model_path='efficientdet_lite0_int8.tflite',
    image_path='bus.jpg',
    target_width=640,
    output_path='bus_640.jpg',
    show_image=False
)

Step3 Results

efficientdet_lite0_int8.tflite takes 1.43 seconds:

 

 

 

efficientdet_lite2_int8.tflite takes 4.03 seconds:

 

 

 

ssd_mobilenet_v2_int8.tflite takes 1.82 seconds:

 

 

 

Summary

After quantization, the two EfficientDet models show a significant speed improvement (efficientdet_lite0: 2.62 s → 1.43 s; efficientdet_lite2: 8.47 s → 4.03 s), while ssd_mobilenet_v2's speed remains largely unchanged (1.88 s → 1.82 s). Accuracy is well maintained across all models.

Further Testing: Different Input Image Sizes

To further speed up inference, we can also reduce the size of the input images. The steps are as follows:

Step1 Code

CODE
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import time

# Define color mappings
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # Blue, Green, Red in OpenCV's BGR order
DEFAULT_COLOR = (255, 255, 255)  # White for other categories

def visualize(original_image, detection_result, scale_ratio) -> np.ndarray:
    ...  # unchanged from test_pic.py above

def detect_objects(model_path, image_path, target_width, output_path, show_image):
    ...  # unchanged from test_pic.py above

# Example call: test several input widths
for r in [640, 448, 320, 256, 128, 64]:
    print(r)
    detect_objects(
        model_path='efficientdet_lite0_int8.tflite',
        image_path='bus.jpg',
        target_width=r,
        output_path=f'bus_det_l0_{r}_int8.jpg',
        show_image=False)

Step2 Results

efficientdet_lite0.tflite

 

 

 

efficientdet_lite0_int8.tflite

 

 

 

efficientdet_lite2.tflite

 

 

 

efficientdet_lite2_int8.tflite

 

 

 

ssd_mobilenet_v2.tflite

 

 

 

ssd_mobilenet_v2_int8.tflite

 

 

Summary

It can be observed that reducing the input size does not significantly speed up the models. Additionally, these three models were trained at fixed resolutions:

- `efficientdet_lite0.tflite`: 320

- `efficientdet_lite2.tflite`: 448

- `ssd_mobilenet_v2.tflite`: 256

Larger input image sizes tend to reduce accuracy. Interestingly, although `ssd_mobilenet_v2.tflite` has the lowest training resolution, it still adapts relatively well to high-resolution images.

Conclusion

1. Running MediaPipe Object Detection Models on UNIHIKER: This article provides a detailed guide on deploying MediaPipe on UNIHIKER and tests several common models.

2. Speed Improvement with int8 Quantization on UNIHIKER: On UNIHIKER, int8 quantization yields significant speed improvements for the efficientdet_lite0 and efficientdet_lite2 models, with accuracy remaining largely unchanged. However, int8 quantization did not improve the speed of the ssd_mobilenet_v2 model (the correctness of the official int8 model is uncertain).

3. Model Recommendations for Object Detection with MediaPipe on UNIHIKER: Reducing the input size does not significantly increase model speed, and for some models larger input sizes decrease accuracy. For small to medium image resolutions (128-448), the efficientdet_lite0_int8.tflite model is recommended; for larger resolutions (>448), the ssd_mobilenet_v2.tflite model is recommended (see the sketch after this list).

4. Accuracy Based on Resolution: The efficientdet_lite0_int8.tflite model has the best accuracy with smaller resolutions. At larger resolutions, the efficientdet_lite2_int8.tflite model performs better in terms of accuracy.

5. Upcoming Tests: More tests will be conducted soon; stay tuned for updates.
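
To make recommendation 3 concrete, here is a hypothetical helper that encodes the thresholds above (the function is ours, not part of MediaPipe):

CODE
def recommend_model(input_width: int) -> str:
    """Return the model file this article recommends for a given input width."""
    if input_width <= 448:
        # Small to medium resolutions (128-448)
        return 'efficientdet_lite0_int8.tflite'
    # Larger resolutions (>448)
    return 'ssd_mobilenet_v2.tflite'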

 

If you need any help or want to join more discussions, feel free to join our Discord: https://discord.gg/PVAWBMPwsk