Computer Vision · Object Detection · Occlusion Robustness · YOLOv8 · EfficientDet · DETR · PyTorch · COCO mAP · Data Augmentation

Occluded Pet Detection in Domestic Environments

Studied whether occlusion-aware training improves indoor cat/dog detection under partial visibility using synthetic ADE20K indoor-object occlusions and a real occluded test set; evaluated YOLOv8, EfficientDet-D0, and DETR with COCO-style mAP.

Overview

An object detection study of domestic indoor settings, where pets are often partially occluded by furniture and household objects. I used Oxford-IIIT Pets (7,349 images, 37 breeds), derived bounding boxes from its segmentation masks, and compared training on clean images against training with synthetic occlusions. Splits: 3,312 training, 368 validation, and 3,669 clean test images. Transfer was evaluated with COCO-style metrics on a real occluded test set of 74 manually annotated images (personal and open-source).
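
Deriving a box from a segmentation mask amounts to taking the extent of the foreground pixels. The sketch below is illustrative and not the project's code: `mask_to_box` and its `foreground` parameter are hypothetical names, and it assumes the Oxford-IIIT trimap convention (1 = pet, 2 = background, 3 = unclassified border). Whether border pixels were counted as part of the pet is not stated above, so that choice is left as a parameter.

```python
from typing import Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel indices
Box = Tuple[int, int, int, int]


def mask_to_box(
    mask: Sequence[Sequence[int]],
    foreground: Tuple[int, ...] = (1,),  # assumption: trimap value 1 marks the pet
) -> Optional[Box]:
    """Return the tightest box around all foreground pixels, or None if there are none."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value in foreground:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pet pixels: skip the image rather than emit an empty box
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    toy_trimap = [
        [2, 2, 2, 2],
        [2, 1, 1, 2],
        [3, 1, 1, 2],
        [2, 2, 2, 2],
    ]
    print(mask_to_box(toy_trimap))                     # pet pixels only
    print(mask_to_box(toy_trimap, foreground=(1, 3)))  # pet plus border pixels
```

Including the border value widens the box slightly, which matters for the stricter IoU thresholds in mAP@[0.50:0.95].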

Goal: evaluate whether synthetic occlusion augmentation transfers to real-world indoor occlusions, and how different detector families behave under partial visibility when architectures and hyperparameters are held constant.

My Role

What I Built

  • End-to-end training + evaluation pipeline (data prep, training runs, metrics, and qualitative analysis)
  • Synthetic occlusion generator: overlay ADE20K segmented indoor objects onto pet images with opacity blending
  • Dataset engineering: derive bounding boxes from segmentation masks and unify cat/dog breeds into a single “Pet” class
  • Real occluded indoor test set: collection + manual annotation + evaluation scripts
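
The opacity blending in the occlusion generator can be sketched as a per-pixel weighted average applied only where the occluder's segmentation mask is set. This is a minimal pure-Python illustration, not the project's implementation: `blend_occluder`, the nested-list image layout, and the default `opacity` are assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), 0-255
Image = List[List[Pixel]]      # rows of pixels


def blend_occluder(
    image: Image,
    occluder: Image,
    mask: List[List[bool]],    # True where the occluder object is present
    top: int,
    left: int,
    opacity: float = 0.9,      # assumption: illustrative default, not from the project
) -> Image:
    """Paste `occluder` onto a copy of `image` at (top, left) with alpha blending."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be in [0, 1]")
    out = [list(row) for row in image]  # copy rows so the input stays untouched
    height = len(image)
    width = len(image[0]) if image else 0
    for dy, (occ_row, mask_row) in enumerate(zip(occluder, mask)):
        for dx, (occ_px, inside) in enumerate(zip(occ_row, mask_row)):
            y, x = top + dy, left + dx
            # Skip pixels outside the object mask or outside the image bounds.
            if not inside or not (0 <= y < height and 0 <= x < width):
                continue
            base = out[y][x]
            out[y][x] = tuple(
                round(opacity * o + (1.0 - opacity) * b)
                for o, b in zip(occ_px, base)
            )
    return out


if __name__ == "__main__":
    pet_image = [[(100, 100, 100)] * 2 for _ in range(2)]
    chair = [[(200, 0, 100), (0, 0, 0)]]
    chair_mask = [[True, False]]
    print(blend_occluder(pet_image, chair, chair_mask, top=0, left=1, opacity=0.5))
```

Using the object's segmentation mask rather than its bounding rectangle keeps the occluder's silhouette, so the background around the pasted object stays visible.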

What I Owned End-to-End

  • Experiment design comparing clean vs occlusion-aware training under fixed training budgets
  • Occlusion augmentation realism controls (object-category filtering, minimum size thresholds, random placement with guaranteed overlap %)
  • Cross-model evaluation and analysis for YOLOv8, EfficientDet-D0, and DETR (including explaining failure modes)
  • Reporting: quantitative mAP tables + qualitative examples on real occlusions
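
One way to get random placement with a guaranteed overlap is rejection sampling: draw a position, measure how much of the pet box the occluder covers, and retry until the minimum is met. The sketch below assumes overlap is measured as the fraction of the pet box covered by the occluder's bounding rectangle; that definition, the function names, and the default values are illustrative assumptions rather than details taken from the project.

```python
import random
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def covered_fraction(pet: Box, occluder: Box) -> float:
    """Fraction of the pet box area that lies under the occluder rectangle."""
    inter_w = min(pet[2], occluder[2]) - max(pet[0], occluder[0])
    inter_h = min(pet[3], occluder[3]) - max(pet[1], occluder[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    pet_area = (pet[2] - pet[0]) * (pet[3] - pet[1])
    return inter_w * inter_h / pet_area


def place_occluder(
    image_size: Tuple[int, int],     # (width, height)
    pet_box: Box,
    occluder_size: Tuple[int, int],  # (width, height)
    min_cover: float = 0.2,          # assumption: illustrative threshold
    max_tries: int = 200,
    rng: Optional[random.Random] = None,
) -> Optional[Box]:
    """Sample a position whose coverage of the pet box is at least `min_cover`."""
    rng = rng or random.Random()
    img_w, img_h = image_size
    occ_w, occ_h = occluder_size
    if occ_w > img_w or occ_h > img_h:
        return None  # occluder does not fit inside the image
    for _ in range(max_tries):
        x1 = rng.randint(0, img_w - occ_w)
        y1 = rng.randint(0, img_h - occ_h)
        candidate = (x1, y1, x1 + occ_w, y1 + occ_h)
        if covered_fraction(pet_box, candidate) >= min_cover:
            return candidate
    return None  # caller can pick a different occluder or skip the image


if __name__ == "__main__":
    box = place_occluder((200, 200), (50, 50, 150, 150), (80, 80),
                         rng=random.Random(0))
    print(box, covered_fraction((50, 50, 150, 150), box))
```

Returning `None` after a bounded number of tries keeps the generator from looping forever when an occluder is too small to reach the requested coverage.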

Technical Highlights

Architecture Decisions

  • YOLOv8 trained with pretrained weights (single-stage detector via Ultralytics)
  • EfficientDet-D0 with pretrained initialization (EfficientNet backbone + BiFPN multi-scale fusion)
  • DETR explored (transformer-based set prediction); evaluated but excluded from further runs due to poor convergence under the project budget
  • No architecture changes applied to isolate the effect of occlusion-aware training

Algorithms / Protocols / Constraints

  • COCO-style evaluation on the real occluded set: mAP@0.50 and mAP@[0.50:0.95] (plus precision/recall where applicable)
  • Synthetic occlusion augmentation: segmented ADE20K objects overlaid at random locations; ensured a minimum overlap percentage
  • Pragmatic indoor-object filtering: ADE20K has too many object categories to vet exhaustively, so some non-indoor objects leaked into the occluder pool
  • Annotation consistency: boxes derived directly from Oxford-IIIT segmentation masks
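
For reference, the two headline metrics can be sketched for a single class as follows. This is a simplified illustration of COCO-style scoring, not the evaluation code used in the project: it matches detections greedily in score order, uses 101-point interpolated precision, and ignores COCO details such as crowd regions, area ranges, and per-image detection limits. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]       # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(gts: Dict[str, List[Box]], dets: List[Detection],
                      threshold: float) -> float:
    """Single-class AP at one IoU threshold."""
    total_gt = sum(len(boxes) for boxes in gts.values())
    if total_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp_flags = []
    for img, _score, box in sorted(dets, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            if not used[img][j] and iou(box, gt) > best_iou:
                best_iou, best_j = iou(box, gt), j
        hit = best_j >= 0 and best_iou >= threshold
        if hit:
            used[img][best_j] = True  # each ground-truth box matches at most once
        tp_flags.append(hit)
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(tp_flags, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / total_gt)
    # Make precision non-increasing as recall grows (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sample the envelope at recall = 0.00, 0.01, ..., 1.00.
    total = 0.0
    for step in range(101):
        r = step / 100
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101


def coco_map(gts: Dict[str, List[Box]], dets: List[Detection]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    return sum(average_precision(gts, dets, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = {"img1": [(0, 0, 10, 10)]}
    dets = [("img1", 0.9, (0, 0, 10, 7.2))]  # IoU with the ground truth is 0.72
    print(average_precision(gts, dets, 0.50), coco_map(gts, dets))
```

In the toy case a box with IoU 0.72 counts as correct at five of the ten thresholds, which is why loosely localized detections can score well on mAP@0.50 while scoring much lower on mAP@[0.50:0.95].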

Optimization Strategies

  • Pretrained initialization to stabilize convergence under limited compute budgets
  • Kept hyperparameters fixed to focus conclusions on the augmentation effect, not tuning
  • Selected checkpoints based on validation performance (clean validation set) to avoid test leakage

Tech Stack

Python · PyTorch · Ultralytics (YOLOv8) · EfficientDet · DETR · COCO Evaluation Metrics · Kaggle GPU

Results / Learnings

What Worked

  • Built and evaluated an occlusion-aware training pipeline combining Oxford-IIIT Pets + ADE20K synthetic occlusions
  • Created a real occluded indoor test set (74 images) to measure transfer under realistic household occlusions
  • YOLOv8 on real occluded set: Clean Training mAP@0.50=0.471, mAP@[0.50:0.95]=0.229; Occlusion-Aware Training mAP@0.50=0.345, mAP@[0.50:0.95]=0.152
  • EfficientDet-D0 on real occluded set: Clean Training mAP@0.50=0.098, mAP@[0.50:0.95]=0.036; Occlusion-Aware Training mAP@0.50=0.112, mAP@[0.50:0.95]=0.043
  • DETR on real occluded set: mAP@0.50=0.079, mAP@[0.50:0.95]=0.016 (poor convergence under the available training budget)
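
To make the effect of occlusion-aware training easier to compare across models, the reported numbers can be turned into absolute and relative changes. The snippet below only restates the figures listed above; the `change` and `summarize` helpers are illustrative names.

```python
from typing import Dict, Tuple

# (clean training, occlusion-aware training) on the real occluded test set
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "YOLOv8": {
        "mAP@0.50": (0.471, 0.345),
        "mAP@[0.50:0.95]": (0.229, 0.152),
    },
    "EfficientDet-D0": {
        "mAP@0.50": (0.098, 0.112),
        "mAP@[0.50:0.95]": (0.036, 0.043),
    },
}


def change(clean: float, occlusion_aware: float) -> Tuple[float, float]:
    """Return (absolute change, relative change in percent), rounded for display."""
    absolute = occlusion_aware - clean
    relative = 100.0 * absolute / clean
    return round(absolute, 3), round(relative, 1)


def summarize(results: Dict[str, Dict[str, Tuple[float, float]]]):
    """Map each model and metric to its (absolute, relative %) change."""
    return {
        model: {metric: change(*pair) for metric, pair in metrics.items()}
        for model, metrics in results.items()
    }


if __name__ == "__main__":
    for model, metrics in summarize(RESULTS).items():
        for metric, (absolute, relative) in metrics.items():
            print(f"{model:16s} {metric:16s} {absolute:+.3f} ({relative:+.1f}%)")
```

Read this way, YOLOv8 loses roughly 27% of its mAP@0.50 and 34% of its mAP@[0.50:0.95] with occlusion-aware training, while EfficientDet-D0 gains roughly 14% and 19% from a much lower baseline.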

What I Learned

  • Synthetic occlusion augmentation can improve clean/validation metrics but may not transfer to real occlusion geometry/lighting
  • YOLOv8 was more robust to partial cues (e.g., faces/ears) even without explicit occlusion-aware training
  • EfficientDet improved slightly with occlusion-aware training on the real occluded set, but absolute localization quality remained low
  • Transformer-based detectors can be impractical without longer training schedules and/or larger datasets

Tradeoffs Considered

  • Prioritized isolating augmentation effects (fixed architectures/hyperparams) over maximizing peak performance
  • Used a small real occluded test set (74 images) for realism, at the cost of statistical confidence in the measured differences
  • Synthetic occlusions provide controlled coverage but introduce a domain gap vs real household occlusion patterns