Computer Vision, Object Detection, Occlusion Robustness, YOLOv8, EfficientDet, DETR, PyTorch, COCO mAP, Data Augmentation

Occluded Pet Detection in Domestic Environments

Studied whether occlusion-aware training improves indoor cat/dog detection under partial visibility using synthetic ADE20K indoor-object occlusions and a real occluded test set; evaluated YOLOv8, EfficientDet-D0, and DETR with COCO-style mAP.

Overview

Object detection study focused on domestic indoor settings where pets are frequently partially visible (occluded by furniture/household objects). Used Oxford-IIIT Pets (7,349 images, 37 breeds) with bounding boxes derived from segmentation masks, and compared clean training vs synthetic occlusion training. Training split: 3,312 images; validation: 368 images; clean test: 3,669 images. Evaluated transfer to a real occluded test set of 74 manually annotated images (personal + open-source), measuring COCO-style metrics.

Goal: evaluate whether synthetic occlusion augmentation transfers to real-world indoor occlusions, and how different detector families behave under partial visibility when architectures and hyperparameters are held constant.

My Role

What I Built

End-to-end training + evaluation pipeline (data prep, training runs, metrics, and qualitative analysis)
Synthetic occlusion generator: overlay ADE20K segmented indoor objects onto pet images with opacity blending
Dataset engineering: derive bounding boxes from segmentation masks and unify cat/dog breeds into a single “Pet” class
Real occluded indoor test set: collection + manual annotation + evaluation scripts
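
The box-from-mask step can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual code: `mask_to_bbox` and `make_annotation` are hypothetical names, and the mask encoding (1 = pet, 2 = background, 3 = ambiguous border, as I understand the Oxford-IIIT trimaps) and the choice to use only value 1 as foreground are assumptions.

```python
def mask_to_bbox(mask, foreground_values=(1,)):
    """Tight box (x_min, y_min, x_max, y_max) around foreground pixels.

    `mask` is a 2D sequence of integer labels (rows of pixels).
    x_max / y_max are exclusive. Returns None if no foreground pixel exists.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value in foreground_values:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def make_annotation(mask, breed_name):
    """Collapse any cat/dog breed into the single 'Pet' class.

    The breed name is kept only as metadata (an illustrative choice).
    """
    bbox = mask_to_bbox(mask)
    if bbox is None:
        return None
    return {"bbox": bbox, "label": "Pet", "source_breed": breed_name}


# Tiny 5x6 example: background everywhere except two pet pixels.
example = [[2] * 6 for _ in range(5)]
example[1][2] = 1
example[3][4] = 1
annotation = make_annotation(example, "Abyssinian")
```

With real images the same logic would normally be written with array operations; the nested-list version is used here only to keep the sketch self-contained.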

What I Owned End-to-End

Experiment design comparing clean vs occlusion-aware training under fixed training budgets
Occlusion augmentation realism controls (object-category filtering, minimum size thresholds, random placement with guaranteed overlap %)
Cross-model evaluation and analysis for YOLOv8, EfficientDet-D0, and DETR (including explaining failure modes)
Reporting: quantitative mAP tables + qualitative examples on real occlusions
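
One way to get "random placement with guaranteed overlap %" is rejection sampling: draw random positions until the occluder covers enough of the pet. The sketch below assumes overlap is measured against the pet's bounding box (not its mask), and the names `overlap_fraction` and `place_occluder`, the 20% default, and the retry limit are all hypothetical, not values taken from the project.

```python
import random


def overlap_fraction(pet_box, occ_box):
    """Fraction of the pet box's area covered by the occluder box."""
    px1, py1, px2, py2 = pet_box
    ox1, oy1, ox2, oy2 = occ_box
    inter_w = max(0, min(px2, ox2) - max(px1, ox1))
    inter_h = max(0, min(py2, oy2) - max(py1, oy1))
    pet_area = (px2 - px1) * (py2 - py1)
    return (inter_w * inter_h) / pet_area if pet_area > 0 else 0.0


def place_occluder(pet_box, occ_size, image_size,
                   min_overlap=0.2, max_tries=100, rng=None):
    """Sample a top-left position for the occluder inside the image.

    Returns the occluder box (x1, y1, x2, y2) once it covers at least
    `min_overlap` of the pet box, or None if no valid position is found
    within `max_tries` draws (or the occluder does not fit at all).
    """
    rng = rng or random.Random()
    img_w, img_h = image_size
    occ_w, occ_h = occ_size
    if occ_w > img_w or occ_h > img_h:
        return None
    for _ in range(max_tries):
        x = rng.randint(0, img_w - occ_w)
        y = rng.randint(0, img_h - occ_h)
        box = (x, y, x + occ_w, y + occ_h)
        if overlap_fraction(pet_box, box) >= min_overlap:
            return box
    return None


pet = (100, 100, 300, 300)
occluder_box = place_occluder(pet, occ_size=(150, 150),
                              image_size=(400, 400),
                              rng=random.Random(0))
```

Returning None instead of forcing a placement lets the caller skip the sample or try a different occluder, which avoids silently producing images below the overlap threshold.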

Technical Highlights

Architecture Decisions

YOLOv8 trained with pretrained weights (single-stage detector via Ultralytics)
EfficientDet-D0 with pretrained initialization (EfficientNet backbone + BiFPN multi-scale fusion)
DETR explored (transformer-based set prediction); evaluated but excluded from further runs due to poor convergence under the project budget
No architecture changes applied to isolate the effect of occlusion-aware training

Algorithms / Protocols / Constraints

COCO-style evaluation on the real occluded set: mAP@0.50 and mAP@[0.50:0.95] (plus precision/recall where applicable)
Synthetic occlusion augmentation: segmented ADE20K objects overlaid at random locations; ensured a minimum overlap percentage
Pragmatic indoor-object filtering: ADE20K has too many object categories to vet exhaustively, so some non-indoor occluders leaked through the filter
Annotation consistency: boxes derived directly from Oxford-IIIT segmentation masks
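
For readers unfamiliar with the two metrics: mAP@0.50 counts a prediction as correct when its IoU with a ground-truth box is at least 0.50, while mAP@[0.50:0.95] averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU criterion and the threshold grid; it is not a full AP computation (no confidence ranking or precision-recall integration), and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0


# The ten thresholds behind mAP@[0.50:0.95]: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """IoU thresholds at which this prediction would count as a match."""
    value = iou(pred_box, gt_box)
    return [t for t in COCO_IOU_THRESHOLDS if value >= t]


# A prediction that captures 83% of the ground-truth box.
gt = (0, 0, 100, 100)
pred = (0, 0, 100, 83)
passed = thresholds_passed(pred, gt)
```

This is why the stricter metric drops so sharply under occlusion: a box that clips the hidden part of the animal can still pass at 0.50 while failing most of the higher thresholds.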

Optimization Strategies

Pretrained initialization to stabilize convergence under limited compute budgets
Kept hyperparameters fixed to focus conclusions on the augmentation effect, not tuning
Selected checkpoints based on validation performance (clean validation set) to avoid test leakage

Tech Stack

Python, PyTorch, Ultralytics (YOLOv8), EfficientDet, DETR, COCO Evaluation Metrics, Kaggle GPU

Results / Learnings

What Worked

Built and evaluated an occlusion-aware training pipeline combining Oxford-IIIT Pets + ADE20K synthetic occlusions
Created a real occluded indoor test set (74 images) to measure transfer under realistic household occlusions
YOLOv8 on real occluded set: Clean Training mAP@0.50=0.471, mAP@[0.50:0.95]=0.229; Occlusion-Aware Training mAP@0.50=0.345, mAP@[0.50:0.95]=0.152
EfficientDet-D0 on real occluded set: Clean Training mAP@0.50=0.098, mAP@[0.50:0.95]=0.036; Occlusion-Aware Training mAP@0.50=0.112, mAP@[0.50:0.95]=0.043
DETR on real occluded set: mAP@0.50=0.079, mAP@[0.50:0.95]=0.016 (poor convergence under the available training budget)
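
To make the clean vs occlusion-aware comparison easier to read, the relative change per model can be computed directly from the numbers above. This is a small bookkeeping sketch; the dictionary layout and function names are illustrative, and DETR is omitted because only one figure per metric is reported for it.

```python
# mAP values on the real occluded set, copied from the results above.
RESULTS = {
    "YOLOv8": {
        "clean": {"mAP50": 0.471, "mAP50_95": 0.229},
        "occlusion_aware": {"mAP50": 0.345, "mAP50_95": 0.152},
    },
    "EfficientDet-D0": {
        "clean": {"mAP50": 0.098, "mAP50_95": 0.036},
        "occlusion_aware": {"mAP50": 0.112, "mAP50_95": 0.043},
    },
}


def relative_change_pct(clean, augmented):
    """Percent change of the occlusion-aware score relative to clean."""
    return (augmented - clean) / clean * 100.0


def summarize(results):
    """Per-model, per-metric relative change, rounded to one decimal."""
    summary = {}
    for model, runs in results.items():
        summary[model] = {
            metric: round(
                relative_change_pct(runs["clean"][metric],
                                    runs["occlusion_aware"][metric]), 1)
            for metric in runs["clean"]
        }
    return summary


summary = summarize(RESULTS)
for model, deltas in summary.items():
    print(model, deltas)
```

On these figures YOLOv8 drops by roughly 26.8% (mAP@0.50) and 33.6% (mAP@[0.50:0.95]) with occlusion-aware training, while EfficientDet-D0 gains roughly 14.3% and 19.4% from a much lower baseline. With only 74 test images, these percentages should be read as indicative rather than statistically firm.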

What I Learned

Synthetic occlusion augmentation can improve clean/validation metrics but may not transfer to real occlusion geometry/lighting
YOLOv8 was more robust to partial cues (e.g., faces/ears) even without explicit occlusion-aware training
EfficientDet improved slightly with occlusion-aware training on the real occluded set, but absolute localization quality remained low
Transformer-based detectors can be impractical without longer training schedules and/or larger datasets

Tradeoffs Considered

Prioritized isolating augmentation effects (fixed architectures/hyperparams) over maximizing peak performance
Used a small real occluded test set (74 images) for realism, at the cost of statistical confidence in the measured differences
Synthetic occlusions provide controlled coverage but introduce a domain gap vs real household occlusion patterns
