Microsoft has developed a new image-captioning algorithm that exceeds human accuracy in certain limited tests. The AI system is being used to update Microsoft’s assistant app for visually impaired people and is being integrated into Microsoft tools such as Word, Outlook, and PowerPoint.
In these tools, the system will be used for tasks such as creating alt text for images. “Ideally, everyone would include alt text for all images in documents, on websites, and in social media, as this enables visually impaired people to access the content and participate in conversations,” said Saqib Shaikh, a software engineering manager with Microsoft’s AI team. “But people don’t. So there are applications that use image captioning to fill in alt text wherever it’s missing.”
Microsoft’s Image-captioning Caption Bot
These apps include Microsoft’s own Seeing AI, which was first launched in 2017. The Seeing AI app uses computer vision to describe the world to visually impaired people. It can detect household items, read and scan text, describe scenes, and even identify familiar people. The app can also be used to describe images in other applications.
Microsoft’s new image-captioning algorithm is expected to significantly improve the performance of the Seeing AI app. Rather than only identifying objects, the algorithm can look at a picture and describe both what objects it contains and how they relate to and interact with each other. According to Microsoft, the algorithm is twice as good as its previous image-captioning system.
Image Captioning Algorithm
The algorithm reportedly achieved the highest-ever scores on nocaps, an industry-leading image-captioning benchmark. The nocaps benchmark contains more than 166,000 human-generated captions describing around 15,100 images taken from the Open Images Dataset. The images span a range of categories, including people, food, and scenes, and algorithms are tested on their ability to generate captions for them. Although Microsoft stated that the new algorithm “describes images as well as people do,” this holds only for the limited set of images covered by the benchmark.
Harsh Agrawal, one of the creators of the nocaps benchmark, stated, “Surpassing human performance on nocaps is not an indicator that image captioning is a solved problem.” He also noted that the metrics used to evaluate performance on nocaps “only roughly correlate with human preferences,” and that the benchmark “only covers a small percentage of all the possible visual concepts.” “As with most benchmarks, [the] nocaps benchmark is only a rough indicator of the models’ performance on the task,” said Agrawal.
Image captioning has improved dramatically in recent years thanks to advances in artificial intelligence, and Microsoft’s algorithms are state-of-the-art. In addition to being integrated into Microsoft tools such as Word, Outlook, and PowerPoint, the image-captioning AI will also be available as a standalone model through Azure, Microsoft’s cloud and AI platform.