MediaPipe Solutions provides a set of libraries and tools that help you quickly implement artificial intelligence (AI) and machine learning (ML) technologies in your applications. You can instantly integrate these solutions into your app, customize them as needed, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open-source project, so you can further modify the solution code to meet the specific needs of your application.
With the MediaPipe Object Detector task, you can detect the presence and location of multiple objects in an image or video. For example, the object detector can identify the location of a dog in an image. This task uses a machine learning (ML) model to process image data, accepting either static images or continuous video streams as input and outputting a list of detection results. Each result represents an object detected in the image or video.
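As a quick orientation before the full script later in this article, here is a minimal sketch of the detection flow in Python. It uses the same Tasks API calls that appear in the full script below; the file names efficientdet_lite0.tflite and bus.jpg are just placeholders for a downloaded model and test image.

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Build a detector from a downloaded .tflite model (placeholder file name).
options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path='efficientdet_lite0.tflite'),
    score_threshold=0.5)
detector = vision.ObjectDetector.create_from_options(options)

# Load a test image from disk and run inference on it.
image = mp.Image.create_from_file('bus.jpg')
result = detector.detect(image)

# Each detection carries a bounding box and one or more scored categories.
for detection in result.detections:
    box = detection.bounding_box
    top = detection.categories[0]
    print(f'{top.category_name} ({top.score:.2f}) at '
          f'({box.origin_x}, {box.origin_y}), {box.width}x{box.height}')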
The EfficientDet-Lite0 model uses the EfficientNet-Lite0 backbone with a 320x320 input size and a BiFPN feature network. This model is trained on the COCO dataset, a large object detection dataset containing 1.5 million object instances and 80 object labels. You can refer to the complete list of supported labels. EfficientDet-Lite0 is available in int8, float16, or float32 formats. This model is recommended because it strikes a balance between latency and accuracy. It is both accurate and lightweight, making it suitable for many use cases.
The EfficientDet-Lite2 model uses the EfficientNet-Lite2 backbone with a 448x448 input size and a BiFPN feature network. This model is trained on the COCO dataset, a large object detection dataset containing 1.5 million object instances and 80 object labels. You can refer to the complete list of supported labels. EfficientDet-Lite2 is available as an int8, float16, or float32 model. It is generally more accurate than EfficientDet-Lite0, but it is also slower and requires more memory. This model is suitable for use cases where accuracy is more important than speed and size.
The SSD MobileNetV2 model uses the MobileNetV2 backbone with a 256x256 input size and an SSD feature network. This model is trained on the COCO dataset, a large object detection dataset containing 1.5 million object instances and 80 object labels. You can refer to the complete list of supported labels. SSD MobileNetV2 is available in int8 and float32 formats. This model is faster and more lightweight than EfficientDet-Lite0, but typically less accurate. It is suitable for use cases that require a fast, lightweight model, even if it sacrifices some accuracy.
If you decide to build a model for this task, this section outlines the requirements for custom models. The custom model must be in TensorFlow Lite format and must include metadata describing the model's operational parameters.
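One way to check whether a custom model carries this metadata is to dump it with the tflite-support package. This is a minimal sketch, assuming tflite-support is installed (pip install tflite-support) and custom_model.tflite is a hypothetical path to your own model:

from tflite_support import metadata

# Load the model file and print its packed metadata (hypothetical path).
displayer = metadata.MetadataDisplayer.with_model_file('custom_model.tflite')
print(displayer.get_metadata_json())                # model description, input/output tensor specs
print(displayer.get_packed_associated_file_list())  # associated files, e.g. the label list

If no metadata is packed into the model, this call raises an error, which is a quick sign that the file will not work with the Object Detector task.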
The UNIHIKER is a new generation of open-source hardware developed in China, designed specifically for learning and using Python. It features a single-board computer architecture, integrating an LCD color screen, WiFi, Bluetooth, various common sensors, and a wealth of expansion interfaces. It also comes with a Linux operating system and a Python environment pre-installed with commonly used Python libraries, allowing teachers and students to start Python teaching and learning in just two steps.
The UNIHIKER is a development board based on the RK3308 Arm 64-bit quad-core processor, with a clock speed of up to 1.2 GHz. It is equipped with 512 MB of DDR3 memory and 16 GB of eMMC storage, and runs the Debian 10 operating system. It supports 2.4 GHz Wi-Fi and Bluetooth 4.0 via the RTL8723DS chip. Additionally, the UNIHIKER integrates a GD32VF103C8T6 RISC-V coprocessor with a clock speed of 108 MHz, featuring 64 KB of Flash and 32 KB of SRAM.
The UNIHIKER includes various onboard components, such as Home and A/B buttons and a 2.8-inch color touchscreen with a resolution of 240x320. The device also features a capacitive silicon microphone, a PT0603 light-sensitive phototransistor, a passive buzzer, and a blue LED. It also integrates an ICM20689 six-axis sensor, which combines a three-axis accelerometer and a three-axis gyroscope.
In terms of interfaces, the UNIHIKER offers multiple connection options. It has a USB Type-C port for programming or powering the board via a PC. There is also a USB Type-A port for connecting external USB devices. Additionally, the board includes a microSD card slot for expanded storage, a 3-pin I/O supporting 3 channels of 10-bit PWM and 2 channels of 12-bit ADC, a dedicated 4-pin I2C interface, and 19 independent I/O GPIOs compatible with micro:bit, supporting various communication protocols and functions.
Running MediaPipe on the UNIHIKER is significant for several reasons:
1. Education and Practice: The UNIHIKER is an open-source hardware platform developed in China specifically for Python learning and use, pre-installed with commonly used Python libraries, making it well suited for teaching and practicing Python. Running MediaPipe on the UNIHIKER allows students and developers to gain a more intuitive understanding of, and hands-on experience with, artificial intelligence (AI) and machine learning (ML) technologies.
2. Efficient Development: MediaPipe provides an efficient set of libraries and tools for the rapid development of AI and ML applications. The UNIHIKER, equipped with an LCD color screen, WiFi, Bluetooth, and various sensors, offers a rich development environment and hardware support, making it possible to quickly validate and showcase AI applications using MediaPipe.
3. Hardware Performance: The UNIHIKER is based on the RK3308 Arm 64-bit quad-core processor, with 512 MB of DDR3 memory and 16 GB of eMMC storage, sufficient to support the operation of simple MediaPipe models. Its built-in six-axis sensor, microphone, and other sensors also provide valuable data input sources for AI applications.
4. Open Source and Customization: As an open-source project, MediaPipe allows users to deeply customize it according to their needs. The UNIHIKER's open-source hardware design also supports customization and expansion of both hardware and software, making it suitable for various application scenarios when combined with MediaPipe.
5. Portability and Integration: The UNIHIKER is compact and fully functional, making it ideal for portable AI application development and demonstration. Running MediaPipe on it enables real-time object detection and other AI features, applicable in various scenarios such as IoT devices, smart home systems, and educational robots.
By running MediaPipe on the UNIHIKER, you can fully utilize its hardware resources and software environment to quickly implement and showcase AI applications, providing strong support for education, research, and development.
This article will demonstrate how to deploy three small object detection models using MediaPipe on the UNIHIKER, including the following sections:
- Preparing the environment for running MediaPipe on the UNIHIKER.
- Running MediaPipe object detection code on the UNIHIKER.
- Accelerating performance using the int8 quantization version of the model.
- Testing the impact of different image resolutions on the performance of these models.
In the terminal, enter:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh
Once the download is complete, the terminal will display:
'Miniforge3-Linux-aarch64.sh' saved [74300552/74300552]
In the terminal, enter:
sudo bash Miniforge3-Linux-aarch64.sh
During the process, follow the prompts to press ENTER or type yes as needed. Once completed, the terminal will display:
Added mamba to /root/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
Thank you for installing Miniforge3!
In the terminal, enter:
source ~/.bashrc
After installation is complete, in the terminal enter:
conda
The terminal will display the conda usage and help information:
In the terminal, enter:
conda activate
Before activation, the prompt looks like this:
After activation, the prompt is prefixed with (base):
In the terminal, enter:
conda create -n mediapipe python==3.10
Enter y to confirm.
Once the environment is set up, the terminal displays:
In the terminal, enter:
conda activate mediapipe
In the terminal, enter:
pip install mediapipe
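To confirm the package was installed correctly, you can optionally print its version from within the activated environment:
python -c "import mediapipe as mp; print(mp.__version__)"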
In the terminal, enter:
conda activate mediapipe
In the terminal, enter:
mkdir mediapipe
cd mediapipe
You can download the models you need to run from the following URL:
https://ai.google.dev/edge/mediapipe/solutions/vision/object_detector/index?hl=zh-cn#models
As introduced in the Introduction, the official site provides three object detection models:
You can download and run these three models separately.
We will continue using `bus.jpg` here. If you want to download it, you can enter the following command in the terminal:
wget https://ultralytics.com/images/bus.jpg
Create a new file named test_pic.py and write the following code:
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import time

# Define color mappings
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # Red, Green, Blue
DEFAULT_COLOR = (255, 255, 255)  # White for other categories

def visualize(original_image, detection_result, scale_ratio) -> np.ndarray:
    """Draws bounding boxes on the input image and returns it.
    Args:
        original_image: The original input RGB image.
        detection_result: The list of all "Detection" entities to be visualized.
        scale_ratio: The ratio of the resized image size to the original image size.
    Returns:
        Image with bounding boxes.
    """
    # Dictionary to store category-color mappings
    category_color_map = {}
    for detection in detection_result.detections:
        # Get the category of the object
        category = detection.categories[0]
        category_name = category.category_name
        # Assign a color to the category if not already assigned
        if category_name not in category_color_map:
            if len(category_color_map) < len(COLORS):
                category_color_map[category_name] = COLORS[len(category_color_map)]
            else:
                category_color_map[category_name] = DEFAULT_COLOR
        color = category_color_map[category_name]
        # Draw bounding box (scale coordinates back to the original image size)
        bbox = detection.bounding_box
        start_point = int(bbox.origin_x / scale_ratio), int(bbox.origin_y / scale_ratio)
        end_point = int((bbox.origin_x + bbox.width) / scale_ratio), int((bbox.origin_y + bbox.height) / scale_ratio)
        cv2.rectangle(original_image, start_point, end_point, color, 3)
        # Draw label and score
        probability = round(category.score, 2)
        result_text = category_name + ' (' + str(probability) + ')'
        text_location = (int((MARGIN + bbox.origin_x) / scale_ratio),
                         int((MARGIN + ROW_SIZE + bbox.origin_y) / scale_ratio))
        cv2.putText(original_image, result_text, text_location, cv2.FONT_HERSHEY_PLAIN,
                    FONT_SIZE, color, FONT_THICKNESS)
    return original_image

def detect_objects(model_path, image_path, target_width, output_path, show_image):
    """Detect objects in an image and save/display the result.
    Args:
        model_path: Path to the model file.
        image_path: Path to the input image file.
        target_width: The width to resize the image for inference.
        output_path: Path to save the annotated image.
        show_image: Boolean flag to display the image after processing.
    """
    # STEP 1: Create an ObjectDetector from the model file.
    base_options = python.BaseOptions(model_asset_path=model_path)
    options = vision.ObjectDetectorOptions(base_options=base_options,
                                           score_threshold=0.5)
    detector = vision.ObjectDetector.create_from_options(options)
    # STEP 2: Define the drawing parameters used by visualize().
    global MARGIN, ROW_SIZE, FONT_SIZE, FONT_THICKNESS
    MARGIN = 10  # pixels
    ROW_SIZE = 10  # pixels
    FONT_SIZE = 1
    FONT_THICKNESS = 1
    # STEP 3: Read the local image
    original_image = cv2.imread(image_path)
    if original_image is None:
        print("Error: Could not read image.")
        return
    # Resize the image to a smaller size to speed up inference
    scale_ratio = target_width / original_image.shape[1]
    target_height = int(original_image.shape[0] * scale_ratio)
    resized_image = cv2.resize(original_image, (target_width, target_height))
    # Convert the image to the format required by MediaPipe (OpenCV loads BGR, MediaPipe expects RGB).
    rgb_image = cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)
    # STEP 4: Detect objects in the image.
    start_time = time.time()
    detection_result = detector.detect(mp_image)
    end_time = time.time()
    # Print inference time
    inference_time = end_time - start_time
    print(f'Inference time: {inference_time:.2f} seconds')
    # STEP 5: Process the detection result. In this case, visualize it.
    annotated_image = visualize(original_image, detection_result, scale_ratio)
    # Save the result
    cv2.imwrite(output_path, annotated_image)
    print(f'Result saved to {output_path}')
    # Display the result if needed
    if show_image:
        cv2.imshow('Object Detection', annotated_image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

# Example call
detect_objects(
    model_path='efficientdet_lite0.tflite',
    image_path='bus.jpg',
    target_width=640,
    output_path='bus_640.jpg',
    show_image=False
)
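Save the file, then run the script from the activated environment by entering the following in the terminal:
python test_pic.py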
Here we use the efficientdet_lite0.tflite model downloaded in the previous step.
You can see that the efficientdet_lite0 model takes 2.62 seconds to process an image at a width of 640 pixels. The processed image result is as follows:
Using the efficientdet_lite2 model takes 8.47 seconds:
Running the ssd_mobilenet_v2 model takes 1.88 seconds:
The models differ in accuracy and speed:
- efficientdet_lite0 has average accuracy.
- ssd_mobilenet_v2 offers better speed and accuracy.
We can use the int8 models provided by the official source for acceleration. The steps are as follows:
Download the following int8 model files from the models page linked above:
- efficientdet_lite0_int8.tflite: EfficientDet-Lite0 (int8)
- efficientdet_lite2_int8.tflite: EfficientDet-Lite2 (int8)
- ssd_mobilenet_v2_int8.tflite: SSD MobileNetV2 (int8)
Then change the example call in the script to point at the int8 model:
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import time
# Define color mappings
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)] # Red, Green, Blue
DEFAULT_COLOR = (255, 255, 255) # White for other categories
def visualize(original_image, detection_result, scale_ratio) -> np.ndarray:
    (... ...)

def detect_objects(model_path, image_path, target_width, output_path, show_image):
    (... ...)

# Example call
detect_objects(
    model_path='efficientdet_lite0_int8.tflite',
    image_path='bus.jpg',
    target_width=640,
    output_path='bus_640.jpg',
    show_image=False
)
The efficientdet_lite0_int8.tflite model takes 1.43 seconds:
The efficientdet_lite2_int8.tflite model takes 4.03 seconds:
The ssd_mobilenet_v2_int8.tflite model takes 1.82 seconds:
After quantization, the two EfficientDet models show a significant speed improvement, while the SSD MobileNetV2 model's speed remains largely unchanged. Accuracy is well maintained across all models.
The accuracy criteria used in this test are as follows (and similarly in the following sections):
To further speed up inference, we can also reduce the size of the input images. The steps are as follows:
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import time
# Define color mappings
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)] # Red, Green, Blue
DEFAULT_COLOR = (255, 255, 255) # White for other categories
def visualize(original_image, detection_result, scale_ratio) -> np.ndarray:
    (... ...)

def detect_objects(model_path, image_path, target_width, output_path, show_image):
    (... ...)

# Example call
for r in [640, 448, 320, 256, 128, 64]:
    print(r)
    detect_objects(
        model_path='efficientdet_lite0_int8.tflite',
        image_path='bus.jpg',
        target_width=r,
        output_path=f'bus_det_l2_{r}_int8.jpg',
        show_image=False)
efficientdet_lite0.tflite:
efficientdet_lite0_int8.tflite:
efficientdet_lite2.tflite:
efficientdet_lite2_int8.tflite:
ssd_mobilenet_v2.tflite:
ssd_mobilenet_v2_int8.tflite:
It can be observed that reducing the input size does not significantly speed up the models. Additionally, the training resolutions of these three models are fixed:
- `efficientdet_lite0.tflite`: 320
- `efficientdet_lite2.tflite`: 448
- `ssd_mobilenet_v2.tflite`: 256
Larger input image sizes tend to reduce accuracy. Interestingly, although `ssd_mobilenet_v2.tflite` has the lowest training resolution, it still adapts relatively well to high-resolution images.
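If you want to verify these fixed input resolutions yourself, you can read each model's input tensor shape directly from the TFLite file. The sketch below assumes TensorFlow is available in the environment; with the lighter tflite-runtime package, swap the import for from tflite_runtime.interpreter import Interpreter and drop the tf.lite prefix.

import tensorflow as tf

# Print the input tensor shape of each downloaded model, e.g. [1, 320, 320, 3].
for path in ['efficientdet_lite0.tflite', 'efficientdet_lite2.tflite', 'ssd_mobilenet_v2.tflite']:
    interpreter = tf.lite.Interpreter(model_path=path)
    shape = interpreter.get_input_details()[0]['shape']
    print(path, shape)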
1. Running MediaPipe Object Detection Models on UNIHIKER: This article provides a detailed guide on deploying MediaPipe on the UNIHIKER and tests several common models.
2. Speed Improvement with int8 Quantization on UNIHIKER: On the UNIHIKER, int8 quantization brings significant speed improvements for the efficientdet_lite0 and efficientdet_lite2 models, with accuracy remaining largely unchanged. However, int8 quantization did not bring a speed improvement for the ssd_mobilenet_v2 model (the correctness of the official model is uncertain).
3. Model Recommendations for Object Detection with MediaPipe on UNIHIKER: Reducing the input size does not significantly increase model speed, and for some models larger input sizes decrease accuracy. For small to medium image resolutions (128-448), the efficientdet_lite0_int8.tflite model is recommended. For larger image resolutions (>448), the ssd_mobilenet_v2.tflite model is recommended.
4. Accuracy Based on Resolution: The efficientdet_lite0_int8.tflite model has the best accuracy at smaller resolutions. At larger resolutions, the efficientdet_lite2_int8.tflite model performs better in terms of accuracy.
5. Upcoming Tests: More tests will be conducted soon; stay tuned for updates.
If you need any help or want to join more discussions, feel free to join our Discord: https://discord.gg/PVAWBMPwsk