Prudence Pitch

13:13 08/10/2019 | 7newstar.com

Google Brain investigates an abstractive summarization system dubbed SummAE that is able to generalize from a small amount of training data to unseen textual examples

(Tech) Although it cannot yet summarize anything longer than a single five-sentence paragraph, the researchers claim it significantly improves upon the baseline and represents a major step toward human-level performance. The data set and code are freely available on GitHub, along with the configuration settings for the best model.

SummAE contains a denoising autoencoder that encodes sentences and paragraphs of the target text in a shared space. The decoder's input is prepended with a token signaling whether to decode a sentence or a paragraph, and the system generates a summary by decoding a sentence from an encoded paragraph.
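The control-token idea can be illustrated with a minimal Python sketch. The token strings and the function name are hypothetical, chosen only for illustration, and are not taken from the SummAE code; the encoder and decoder themselves are left out.

```python
# Hypothetical control tokens; the names are illustrative, not from the SummAE code.
SENTENCE_TOKEN = "<sent>"
PARAGRAPH_TOKEN = "<para>"


def build_decoder_input(target_tokens, mode):
    """Prepend a control token telling the decoder which kind of text to produce."""
    if mode == "sentence":
        prefix = SENTENCE_TOKEN
    elif mode == "paragraph":
        prefix = PARAGRAPH_TOKEN
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [prefix] + list(target_tokens)


# Training: each text is reconstructed in its own mode.
train_input = build_decoder_input(["the", "dog", "ran", "."], "sentence")

# Summarization: a paragraph is encoded, then decoding starts in sentence mode.
summary_start = build_decoder_input([], "sentence")
```

Because sentences and paragraphs share one latent space, switching the prefix at inference time is what would ask the decoder for a sentence-length output from a paragraph encoding.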

The researchers found that most traditional approaches to training the autoencoder produced long, multi-sentence summaries. To encourage the model to learn higher-level concepts disentangled from their original expression, the team employed two denoising approaches that substantially increased the number of training examples. They also experimented with an adversarial critic component that distinguishes between sentences and paragraphs, as well as two pretraining tasks that encouraged the encoder to learn how sentences follow one another within a paragraph.
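The article does not name the two denoising approaches, so the sketch below uses token masking and sentence shuffling purely as common examples of input noise; both choices, the mask token, and the function names are assumptions rather than the team's method.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token


def mask_tokens(tokens, prob, rng):
    """Replace each token with MASK independently with probability `prob`."""
    return [MASK if rng.random() < prob else tok for tok in tokens]


def shuffle_sentences(sentences, rng):
    """Return the paragraph's sentences in a random order (input is not modified)."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)
paragraph = [
    ["Tom", "lost", "his", "keys", "."],
    ["He", "found", "them", "later", "."],
]
# Corrupt the input; a denoising autoencoder would be trained to
# reconstruct the clean `paragraph` from `noisy`.
noisy = [mask_tokens(s, 0.2, rng) for s in shuffle_sentences(paragraph, rng)]
```

Since the noise is sampled afresh each time, one clean paragraph can yield many distinct corrupted inputs, which is one way such schemes increase the effective number of training examples.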

The researchers trained three variations of SummAE on ROCStories, a corpus of self-contained, diverse, non-technical, and concise prose. They split the original 98,159 training stories into three separate collections and gathered three human-written summaries for each of 500 validation examples and 500 test examples.

After 100,000 training steps with pretraining, the team reports that the best model significantly outperformed a baseline extractive sentence generator on the Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a set of metrics devised to evaluate automatic summarization.
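To make the metric concrete, here is a minimal sketch of ROUGE-1, the unigram-overlap variant of ROUGE. It is a simplified illustration on whitespace tokens, not the evaluation script the researchers used, and the example sentences are invented.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 between two token lists."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


reference = "tom found his lost keys".split()
candidate = "tom found the keys".split()
precision, recall, f1 = rouge_1(candidate, reference)
```

Here three of the four candidate words appear in the reference, giving a precision of 0.75, and three of the five reference words are covered, giving a recall of 0.6.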
