Microsoft AI & Research today shared what it called the largest Transformer-based language generation model, and open-sourced a deep learning library called DeepSpeed to simplify distributed training of large models.
With 17 billion parameters, Turing NLG is roughly twice the size of NVIDIA's Megatron, now the second-largest Transformer model, and more than ten times the size of OpenAI's GPT-2. Turing NLG achieves state-of-the-art performance on a range of NLP tasks.
Like Google's Meena, and as OpenAI initially did with GPT-2, Turing NLG is for now being shared only in private demos.
Language generation models built on the Transformer architecture predict the word that comes next in a sequence of text. They can be used to write stories, generate answers in complete sentences, and summarize text.
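As a rough intuition for that next-word objective, here is a minimal sketch using simple bigram counts. This is not a Transformer, and the corpus and function names are purely illustrative; models like Turing NLG learn a vastly richer version of the same task.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows each word in a
# tiny corpus, then predict the most frequent successor.
corpus = "the cat sat on the mat and the cat slept".split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed word after `word`, or None."""
    counts = successors.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" (follows "the" twice, "mat" once)
```

A large language model replaces these raw counts with a learned probability distribution over the whole vocabulary, conditioned on all preceding words rather than just the last one.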
Experts in the AI field have told VentureBeat that 2019 was a breakthrough year for NLP models using the Transformer architecture, an approach that led to advances in language generation and among GLUE benchmark leaders such as Facebook's RoBERTa, Google's XLNet, and Microsoft's MT-DNN.
Also today, Microsoft open-sourced DeepSpeed, a deep learning library optimized to let developers deliver low-latency, high-throughput inference.
DeepSpeed includes the Zero Redundancy Optimizer (ZeRO) for training models with 100 million parameters or more at scale, which Microsoft used to train Turing NLG.
"In addition to saving users time by summarizing documents and emails, T-NLG can enhance the Microsoft Office suite experience by offering writing assistance to authors and answering questions that readers may have about a document," Microsoft Research applied scientist Corby Rosset wrote in a blog post today.
Developers and machine learning practitioners alike can use DeepSpeed and ZeRO, since training large networks, such as those built on the Transformer architecture, can be expensive and can run into problems at scale.
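DeepSpeed jobs are typically driven by a JSON-style configuration in which ZeRO is switched on under a `zero_optimization` key. The sketch below is illustrative only; the batch size, learning rate, and stage are assumptions, not Microsoft's Turing NLG settings.

```python
# Hedged sketch of a DeepSpeed configuration enabling ZeRO.
# Values here are illustrative, not tuned for any real model.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    # ZeRO stage 1 partitions optimizer state across data-parallel
    # workers, so each GPU holds only a slice of the redundant state.
    "zero_optimization": {"stage": 1},
}

# In a real job (assumes deepspeed and a PyTorch model are available):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
# engine.backward(loss)
# engine.step()

print(ds_config["zero_optimization"]["stage"])  # 1
```

Higher ZeRO stages additionally partition gradients and parameters themselves, trading communication for further memory savings.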
In other natural language AI news, Google's DeepMind today released the Compressive Transformer, a long-range memory model, and PG19, a benchmark for analyzing the performance of book-length language generation.