
According to the Microsoft Azure AI Fundamentals (AI-900) official study guide and Microsoft Learn module “Identify features of Computer Vision workloads on Azure”, Object Detection is a specific computer vision capability used to identify and locate multiple types of objects within a single image. Unlike image classification, which assigns one label to an entire image, object detection identifies individual objects, their categories, and their positions using bounding boxes or polygons.
In practical terms, Object Detection combines two key outputs:
Classification – recognizing what the object is (for example, “car”, “person”, “dog”).
Localization – determining where the object appears in the image by drawing bounding boxes around it.
This technology is commonly used in scenarios such as traffic monitoring (detecting vehicles and pedestrians), retail shelf analysis (detecting products and inventory levels), and manufacturing quality control (identifying defective parts).
Microsoft’s Azure Cognitive Services – Custom Vision includes a dedicated Object Detection domain, which allows developers to train custom models to recognize multiple object types within a single image. The service uses deep learning techniques, particularly convolutional neural networks (CNNs), to process pixel patterns and spatial relationships for accurate detection.
For contrast:
Image Classification identifies only the overall category of an image (e.g., “This is a cat”).
Image Description generates captions summarizing the visual content (e.g., “A cat sitting on a couch”).
Optical Character Recognition (OCR) detects and extracts text from images, not physical objects.
Therefore, per the official AI-900 learning content and Azure documentation, when the goal is to identify multiple types of items within a single image, the correct AI workload is Object Detection.