Patent reveals the technology behind Microsoft's Captionbot


Captionbot is powered by Microsoft’s Cognitive Services. It analyses images and gives rudimentary descriptions of what it can see, using the Computer Vision API, the Emotion API and the Bing Image API.


The patent reveals that the system consists of a set of information modules and a set of sentence generation modules. Each information module operates on an image, or on metadata associated with the image, to produce image information. Each sentence generation module then operates on that image information to produce a sentence caption for the image. In this context a “sentence” can mean a sentence fragment, a complete sentence, and/or multiple sentences.
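To make the two-stage design concrete, here is a minimal Python sketch of how information modules and sentence generation modules might be wired together. It is an illustration under stated assumptions, not Microsoft's implementation: the `Image` class, the module functions (`people_module`, `location_module`, `date_module`, `who_where_sentence`, `when_sentence`) and the dictionary used as "image information" are all hypothetical, and the "detections" are precomputed strings standing in for a real vision model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical image record: pixels are omitted; we carry metadata plus
# precomputed detection labels so the sketch runs without a vision model.
@dataclass
class Image:
    metadata: Dict[str, str] = field(default_factory=dict)
    detections: List[str] = field(default_factory=list)

# "Image information" is modelled here as a flat dictionary of named facts.
ImageInfo = Dict[str, object]

# An information module maps an image (or its metadata) to image information.
InfoModule = Callable[[Image], ImageInfo]

# A sentence generation module maps image information to a caption fragment,
# or returns None when it has nothing to contribute.
SentenceModule = Callable[[ImageInfo], Optional[str]]


def people_module(image: Image) -> ImageInfo:
    """Hypothetical information module: counts 'person' detections."""
    return {"num_people": image.detections.count("person")}


def location_module(image: Image) -> ImageInfo:
    """Hypothetical information module: reads a place name from metadata."""
    place = image.metadata.get("location")
    return {"location": place} if place else {}


def date_module(image: Image) -> ImageInfo:
    """Hypothetical information module: reads a capture date from metadata."""
    date = image.metadata.get("date")
    return {"date": date} if date else {}


def who_where_sentence(info: ImageInfo) -> Optional[str]:
    """Hypothetical template: 'N people in PLACE.'"""
    n = info.get("num_people", 0)
    if not n:
        return None
    subject = "A person" if n == 1 else f"{n} people"
    place = info.get("location")
    return f"{subject} in {place}." if place else f"{subject}."


def when_sentence(info: ImageInfo) -> Optional[str]:
    """Hypothetical template: 'Taken on DATE.'"""
    date = info.get("date")
    return f"Taken on {date}." if date else None


def caption(image: Image,
            info_modules: List[InfoModule],
            sentence_modules: List[SentenceModule]) -> str:
    # Stage 1: every information module contributes facts about the image.
    info: ImageInfo = {}
    for module in info_modules:
        info.update(module(image))
    # Stage 2: every sentence module may emit a sentence; the caption can
    # therefore be a fragment, one sentence, or several sentences.
    sentences = [s for s in (m(info) for m in sentence_modules) if s]
    return " ".join(sentences)


if __name__ == "__main__":
    photo = Image(metadata={"location": "Seattle", "date": "2015-05-12"},
                  detections=["person", "person", "dog"])
    print(caption(photo,
                  [people_module, location_module, date_module],
                  [who_where_sentence, when_sentence]))
    # prints: 2 people in Seattle. Taken on 2015-05-12.
```

Keeping the two stages separate means new information sources (a face detector, GPS metadata) or new sentence templates can be added independently, which is the modularity the patent describes.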

As demonstrated by many online users, the system can fail miserably. 

Image Credit - twitter/David Sim


Image Credit - twitter/Richard Gadsden

Yet it is a huge leap forward for AI-driven applications.




Patent Information
Publication number: US 9317531
Patent Title: Autocaptioning of images
Publication date: 19 Apr 2016
Filing date: 18 Oct 2012
Inventors: Simon Baker; Krishnan Ramnath
Applicant: Microsoft Technology Licensing, LLC