A few industry friends asked me how to choose a visual recognition API to enhance the analysis capabilities of their image/video products. It's hard to say, as there are no official benchmarks across the different industry providers such as Google, IBM, Microsoft, Clarifai, Amazon, etc. Meanwhile, each vendor is working hard in its own way to improve its capabilities. It is too early to tell who the winner is, as many other factors also shape enterprise solutions. The competition is heating up!
I recently came across a blog post, “How we chose our image recognition API,” that reports benchmarks for several visual recognition APIs. It is definitely not a rigorous test, but it gives a rough picture.
The test set is drawn from “Caltech 101”. The results show that IBM ranks #1 in accuracy (measured as the average number of returned tags that match the ground truth), #2 in response time (1.18 sec vs. 1.12 sec for Google), and has the highest confidence scores on the correctly matched tags. You can refer to the blog post for more details.
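To make these three measures concrete, here is a minimal sketch of how they could be computed from a set of API responses. It is my own illustration, not the blog's actual evaluation code: the `ApiResult` type, the `summarize` function, and the case-insensitive exact-match rule for tags are all assumptions, and it expects at least one result.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ApiResult:
    """One API response for one image (hypothetical structure)."""
    tags: Dict[str, float]  # returned label -> confidence score
    latency_s: float        # response time in seconds


def summarize(results: List[ApiResult], ground_truth: List[List[str]]) -> Dict[str, float]:
    """Compute average match count, latency, and confidence of correct tags."""
    match_counts = []
    correct_confidences = []
    for result, truth in zip(results, ground_truth):
        truth_set = {label.lower() for label in truth}
        # Assumption: a tag counts as correct on a case-insensitive exact match.
        hits = [conf for label, conf in result.tags.items()
                if label.lower() in truth_set]
        match_counts.append(len(hits))
        correct_confidences.extend(hits)
    return {
        "avg_match_count": mean(match_counts),
        "avg_latency_s": mean(r.latency_s for r in results),
        # No correct tag at all: report 0.0 instead of raising an error.
        "avg_correct_confidence": mean(correct_confidences) if correct_confidences else 0.0,
    }


if __name__ == "__main__":
    # Made-up responses for two images, only to show the call shape.
    demo_results = [
        ApiResult(tags={"cat": 0.9, "animal": 0.8, "car": 0.1}, latency_s=1.0),
        ApiResult(tags={"dog": 0.5}, latency_s=2.0),
    ]
    demo_truth = [["cat", "animal"], ["airplane"]]
    print(summarize(demo_results, demo_truth))
```

Running the same function over each vendor's responses to the same images would give numbers comparable in spirit to those in the blog post, though a real comparison would also need to handle synonyms and differences in tag vocabulary.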
Note that I am currently on sabbatical at the IBM TJ Watson Research Center, New York, as a Visiting Scientist in Cognitive Computing. I work in the same computer vision department that develops the core of the visual recognition APIs. However, all my comments are based on publicly available data.
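I am currently on a research sabbatical at the IBM TJ Watson Research Center, in the computer vision department of Watson AI. My colleagues are responsible for the core of IBM's image recognition; they apply a variety of methods to improve recognition accuracy and to greatly expand the range of objects and scenes that can be recognized. I happened to see that someone had benchmarked the image recognition APIs from the various vendors, which is worth a look.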
Also, each of these vendors' APIs comes with some free usage quota, so I encourage everyone to go and try them out!
https://www.youtube.com/watch?v=F6oaA6fauzY