| dc.description.abstract | Real-time multi-person facial expression recognition (FER) is increasingly important for
applications such as telemedicine, e-learning, workplace safety, and human–computer
interaction, particularly in resource-constrained CPU and edge-device environments.
Despite rapid advances, selecting FER pipelines that balance accuracy, latency, and
scalability without GPU acceleration remains a significant challenge. This paper
presents a PRISMA-guided systematic review of 34 peer-reviewed studies published
between 2015 and 2025, sourced from IEEE Xplore and ScienceDirect, with the aim
of identifying the most practical FER pipelines for real-time multi-person FER on
consumer-grade hardware with CPUs and integrated GPUs. The review evaluates
four widely adopted approaches, namely MTCNN, RetinaFace, MediaPipe Face
Detection, and DeepFace, using reported metrics such as face detection accuracy,
expression classification performance, latency, frames per second (FPS), multi-face
robustness, and CPU feasibility. The analysis reveals that MediaPipe-based pipelines
are consistently reported to achieve approximately 30–60 FPS on commodity CPUs,
enabling stable multi-face tracking with low computational overhead. In contrast,
RetinaFace demonstrates higher face detection accuracy, while DeepFace-based FER
pipelines achieve higher expression classification accuracy when combined with robust
face detection. However, both approaches typically operate at approximately 5–10 FPS on CPUs
without optimization, which limits their scalability in crowded or time-critical scenarios.
The review also identifies inconsistent evaluation protocols and incomplete hardware
reporting across studies, which hinder reproducibility and fair comparison. Overall, the
findings position MediaPipe as the most practical solution for real-time multi-person
FER on CPU and edge platforms and highlight the need for standardized evaluation
frameworks to support future research and deployment. | en_US |