Mapping instructions to actions
An agent following instructions requires a robust understanding of language and its environment. In this talk I will describe two approaches to the problem of mapping instructions to actions. First, I will describe a semantic parsing approach, where language is mapped to an intermediate formal representation. This method allows us to explicitly model context-dependent phenomena, observe what representations the system recovers, and leverage expert knowledge. In the second part, I will propose a neural network method that jointly reasons about instructions and raw visual input obtained from a camera sensor. This approach does not require intermediate representations, planning procedures, or separate models for visual and language reasoning. Training uses reinforcement learning in a few-sample regime with reward shaping to exploit the training data. While the two approaches address a similar problem, they pose different challenges and offer different advantages.
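To make the first approach concrete, here is a toy sketch (not the speaker's actual system) of the two stages in the semantic parsing pipeline: an instruction is first mapped to an intermediate formal representation, which is then executed in the environment to produce actions. The lexicon, predicate names, and grid-world state are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical lexicon mapping phrases to formal predicates.
LEXICON = {
    "turn left": "ROTATE(-90)",
    "turn right": "ROTATE(90)",
    "move forward": "MOVE(1)",
}

def parse(instruction):
    """Map an instruction to a sequence of formal predicates."""
    return [LEXICON[p] for p in instruction.lower().split(" and ")
            if p in LEXICON]

def execute(logical_form, pose=(0, 0, 0)):
    """Execute predicates against a simple agent state (x, y, heading)."""
    x, y, heading = pose
    for pred in logical_form:
        if pred.startswith("ROTATE"):
            heading = (heading + int(pred[7:-1])) % 360
        elif pred.startswith("MOVE"):
            d = int(pred[5:-1])
            x += d * round(math.cos(math.radians(heading)))
            y += d * round(math.sin(math.radians(heading)))
    return (x, y, heading)

lf = parse("move forward and turn left")
print(lf)          # ['MOVE(1)', 'ROTATE(-90)']
print(execute(lf)) # (1, 0, 270)
```

The intermediate representation is what makes the system inspectable: one can read off exactly which predicates the parser recovered before any action is taken. The neural approach in the second part of the talk removes this stage and maps language and vision to actions directly.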
Bio:
Yoav Artzi is an Assistant Professor in the Department of Computer Science and Cornell Tech at Cornell University. His research focuses on learning expressive models for natural language understanding, most recently in situated interactive scenarios. He received the best paper award at EMNLP 2015 and a Google Faculty Award. Yoav holds a B.Sc. summa cum laude from Tel Aviv University and a Ph.D. from the University of Washington.