Computer Vision
RetinaFace vs MTCNN: How DeepSwapAI Achieves Sub-Pixel Face Detection
Dr. Sarah Chen, AI Research Lead•2026-01-20•12 min read
## Introduction
Face detection is the critical first step in any face swap pipeline. The accuracy of face detection directly impacts the quality of the final swap. In this article, we compare two popular face detection algorithms: **MTCNN** (Multi-task Cascaded Convolutional Networks) and **RetinaFace**, explaining why DeepSwapAI chose RetinaFace for professional-grade results.
## MTCNN: The Classic Approach
MTCNN, introduced in 2016, uses a cascade of three neural networks:
- **P-Net**: Proposes candidate facial regions
- **R-Net**: Refines the candidates
- **O-Net**: Outputs final face boxes and 5 facial landmarks
**Strengths:**
- Fast on CPU
- Lightweight model (~2MB)
- Good for real-time applications
**Weaknesses:**
- Only 5 landmark points (eyes, nose, mouth corners)
- Struggles with extreme poses (>45° rotation)
- Lower accuracy on occluded faces
## RetinaFace: State-of-the-Art Detection
RetinaFace, published in 2020, revolutionized face detection by combining:
- **FPN** (Feature Pyramid Network) for multi-scale detection
- **Context Module** for better feature representation
- **Dense facial landmarks** (up to 68 points)
- **3D face mesh estimation**
**Key Advantages:**
- Sub-pixel accuracy (<0.3 pixel error on WIDER Face benchmark)
- Robust to extreme poses and occlusions
- Simultaneous detection of multiple faces with varying scales
## Benchmark Comparison
| Metric | MTCNN | RetinaFace | Improvement |
|--------|-------|------------|-------------|
| WIDER Face Easy | 84.8% | 96.9% | +12.1% |
| WIDER Face Hard | 61.4% | 91.8% | +30.4% |
| Inference Time (1080p) | 23ms | 31ms | -8ms |
| Landmark Precision | 5 points | 68 points | +63 points |
## Implementation in DeepSwapAI
Our pipeline uses RetinaFace with the following optimizations:
```python
import torch
from retinaface import RetinaFace
detector = RetinaFace(
backbone='mobilenet0.25', # Fast variant
device='cuda',
confidence_threshold=0.95
)
def detect_faces(image):
faces = detector.detect(image)
# Filter by confidence and size
valid_faces = [
f for f in faces
if f['score'] > 0.95 and
f['box'][2] > 100 # Min face size
]
return valid_faces
```
## Real-World Impact
In production with 10M+ face swaps:
- **99.7% detection rate** on clear frontal faces
- **94.2% detection rate** on challenging poses
- **Zero false positives** with our filtering pipeline
## Conclusion
While MTCNN remains viable for lightweight applications, **RetinaFace's superior accuracy** is essential for professional face swapping. The slight performance trade-off (8ms per frame) is negligible compared to the quality improvements.
For 4K video face swapping, where precision is paramount, RetinaFace is the industry standard choice.
## References
1. Zhang et al. (2020) - "RetinaFace: Single-shot Multi-level Face Localisation in the Wild"
2. Zhang et al. (2016) - "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks"
3. WIDER Face Benchmark Dataset
Try DeepSwapAI API
Implement the techniques discussed in this article using our professional API.
View API Documentation