Computers can be trained to recognize the typical participants of action events described in written language. For example, a “knife” is a typical instrument of “cutting”, and a “cake” is something that is typically cut (a “patient” of “cutting”).
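As a rough illustration of how such preferences can be read off a pretrained language model (the model name and prompt sentences below are only assumptions for the example, not part of the project), a masked-word probe already ranks plausible instruments and patients highly:

```python
from transformers import pipeline

# Illustrative sketch only: probe a pretrained masked language model for
# typical event participants. Model choice and prompts are assumptions.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Which words does the model expect in the instrument slot of "cutting"?
for pred in fill("She was cutting the cake with a [MASK].", top_k=5):
    print(pred["token_str"], round(pred["score"], 3))

# A similar prompt probes the patient slot.
for pred in fill("He was cutting the [MASK] with a knife.", top_k=5):
    print(pred["token_str"], round(pred["score"], 3))
```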
Humans readily identify “cooking” as an action that typically takes place in a “kitchen”, or a “kitchen” as a place that is for cooking. The problem is that many other things can also happen in a kitchen. Computers find such typical location-action relationships particularly difficult to identify from language, and existing work relies mainly on machine learning over large volumes of text.
This project explores ways to improve typical-location prediction by adding image data to textual data, using machine learning techniques such as deep learning and reinforcement learning. It is part of an international collaboration with Prof. Vera Demberg at Saarland University, Germany.
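As a minimal sketch of what combining the two modalities could look like (the encoder dimensions, hidden size, label set, and fusion strategy below are placeholders, not the project's actual design), one could concatenate a sentence embedding with an image embedding and score candidate locations:

```python
import torch
import torch.nn as nn

# Hypothetical fusion model: dimensions and the number of location labels
# are illustrative assumptions, not choices made by this project.
class LocationFusionModel(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_locations=50):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),  # joint projection of both modalities
            nn.ReLU(),
            nn.Linear(hidden_dim, num_locations),         # one score per candidate location
        )

    def forward(self, text_emb, image_emb):
        # Concatenate text and image features, then score candidate locations.
        return self.fusion(torch.cat([text_emb, image_emb], dim=-1))

# Dummy usage: random features stand in for real encoder outputs.
model = LocationFusionModel()
text_emb = torch.randn(4, 768)    # e.g. sentence embeddings for "cooking" contexts
image_emb = torch.randn(4, 2048)  # e.g. pooled features from scene images
logits = model(text_emb, image_emb)
print(logits.shape)  # torch.Size([4, 50])
```

In practice, the choice of text encoder, image encoder, fusion strategy, and location label set would all be open research questions within the project.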