Prudence Pitch

13:34 27/11/2019 |

Total post : 1,458

Amazon scientists train an NLP model to select answers to questions from a set of answer candidates better than a baseline

(Tech) They say their Transfer and Adapt (TANDA) approach, which builds on Google’s Transformer, can be effectively adapted to new domains with a small amount of training data while achieving higher accuracy than traditional techniques.



By way of refresher, Transformers are a type of neural architecture introduced in a paper coauthored by researchers at Google Brain, Google’s AI research division. As do all deep neural networks, they contain functions (neurons) arranged in interconnected layers that transmit signals from input data and slowly adjust the synaptic strength (weights) of each connection. That’s how all AI models extract features and learn to make predictions, but Transformers uniquely have attention such that every output element is connected to every input element. The weightings between them are calculated dynamically, in effect.

TANDA, then, is a two-part training methodology that (1) adapts the Transformer model to a question-answering task and (2) tailors it to specific types of questions and answers. A large-scale, general-purpose data set - Answer Sentence Natural Questions, or ASNQ - is used to prime the system, after which a fine-tuning step adapts it to a target domain. As the researchers explain, ASNQ - which is derived from the Google Natural Questions data set - is much larger in size than existing corpora of its kind, with 57,242 questions in a set used for training the AI models and 2,672 questions in a validation set. And it contains negative examples in addition to positive examples, which help the model learn to identify best answers to given questions out of similar but incorrect ones.

To validate their approach, the Amazon researchers first tapped two popular NLP frameworks - Google’s BERT and Facebook’s RoBERTa - and measured accuracy with mean average precision and mean reciprocal recall, using the entire set of candidates for each question. They report that both the BERT and RoBERTa models with fine-tuning on TANDA provide a large improvement over state of the art, and that they’re an order of magnitude less affected by the insertion of noisy data.

In a second experiment, the team built four different corpora with questions sampled from Alexa customers’ interactions. They say that using TANDA with the aforementioned RoBERTa produces an even higher improvement than with BERT, and that TANDA remains robust against noise.


Post new