"Google engineers have trained a machine learning algorithm to write picture captions using the same techniques it developed for language translation." (via MIT Tech Review)

Whoa. They say a picture is worth a thousand words, and it looks like Google has found an easier way to come up with them. This technology could change how search engines work: if Google can correctly caption images online, image search results should get a lot more accurate.

So, how does it work? Google's translation technology represents words as vectors, built by observing how often words appear next to each other. It assumes that, regardless of language, specific words have the same relationship to each other. For captions, the algorithm takes an image and generates a vector capturing the relationships between the words that describe it. That vector is plugged into the existing translation algorithm to produce the caption. The captions are then evaluated by humans to check their accuracy.

The project is called Neural Image Caption. I admit, vector space mathematics goes over my head, but this is pretty impressive. Has anyone else heard about this?
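For anyone curious what "observing how often words are next to each other" can look like in practice, here is a toy Python sketch. To be clear, this is not Google's code, and the real system uses trained neural networks rather than raw counts. The function names (`cooccurrence_vectors`, `cosine`) and the tiny corpus are made up for illustration. It only shows the basic idea that words used in similar contexts end up with similar vectors.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=1):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini corpus, invented for this example.
corpus = [
    "the dog chased the ball",
    "the cat chased the ball",
    "the dog ate the food",
    "the cat ate the food",
    "a plane flew over the city",
]

vectors = cooccurrence_vectors(corpus)
sim_dog_cat = cosine(vectors["dog"], vectors["cat"])
sim_dog_plane = cosine(vectors["dog"], vectors["plane"])

print("dog vs cat:  ", round(sim_dog_cat, 3))
print("dog vs plane:", round(sim_dog_plane, 3))
```

In this corpus "dog" and "cat" appear next to exactly the same words, so their vectors point the same way, while "dog" and "plane" share no neighbors at all. The captioning idea builds on the same kind of vector space, except the starting point is an image instead of a sentence in another language.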
I've never heard of this, but it's incredibly impressive. I wonder how this would work from language to language, or with pictures from another culture. Isn't it likely, then, that the picture might show something different than expected? I'm sure Google can learn, but it might be very interesting to see what it comes up with at first.
@nehapatel That's a good question. I think they can find a way to make it work, since they can already translate between languages (not perfectly, but it's decent). But I agree, this might be a big thing to tackle! Cultural significance might be hard to grasp just by identifying what's in a picture...