구상준 김준성 박채훈 백영민 서상우 이주홍 장성보 정다운 정욱재 홍승환 | July 28, 2020 | #Machine_Learning
Hello! More than half of 2020, a year that has been turbulent in many ways, has already gone by. We have collected the Season 5 seminar slides presented by our scientists and engineers and are sharing them here.
The seminar ran weekly through April and May 2020, and as in previous seasons, speakers were free to choose any topic. As before, the Transformer came up often, driven by our ongoing search for an answer to the question "How can we make the most of the Transformer?" This season, two of our engineers also took part and presented a number of empirical papers on the question "How can we shrink a huge Transformer model without hurting its performance?", which made for an even more rewarding fifth season.
Dialogue Natural Language Inference (장성보)
Dialogue Natural Language Inference
Written by Sean Welleck et al. @ New York University & Facebook AI Research
Published @ ACL 2019
https://speakerdeck.com/scatterlab/dialogue-natural-language-inference
Unified Language Model Pre-training for Natural Language Understanding and Generation (서상우)
Unified Language Model Pre-training for Natural Language Understanding and Generation
Written by Li Dong et al. @ Microsoft Research
Published @ NeurIPS 2019
Numerical Reasoning with NLP Models (백영민)
Do NLP Models Know Numbers? Probing Numeracy in Embeddings
Written by Eric Wallace et al. @ Allen Institute for AI, Peking University & University of California, Irvine
Published @ EMNLP 2019
Injecting Numerical Reasoning Skills into Language Models
Written by Mor Geva et al. @ Tel Aviv University & Allen Institute for AI
Published @ ACL 2020
Deep Learning for Symbolic Mathematics
Written by Guillaume Lample and Francois Charton @ Facebook AI Research
Published @ ICLR 2020
https://speakerdeck.com/scatterlab/numerical-reasoning-with-nlp-models
Knowledge Distillation for BERT (정욱재)
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Written by Lu Hou et al. @ Huawei Noah’s Ark Lab
Preprinted in arXiv 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Written by Weijie Liu et al. @ Peking University, Tencent Research & Beijing Normal University
Published @ ACL 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Written by Victor Sanh et al. @ Hugging Face
Published @ 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019
Patient Knowledge Distillation for BERT Model Compression
Written by Siqi Sun et al. @ Microsoft Dynamics 365 AI Research
Published @ EMNLP 2019
https://speakerdeck.com/scatterlab/knowledge-distillation-for-bert
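All four papers above build on the same basic recipe: train a smaller student network to match a larger teacher's softened output distribution in addition to the ground-truth labels. Below is a minimal, hypothetical PyTorch sketch of that soft-label objective; the function name, temperature, and mixing weight are our own illustrative choices, not the exact losses used in DynaBERT, FastBERT, DistilBERT, or Patient KD.

```python
# Illustrative sketch of the generic soft-label distillation loss (hypothetical names).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soften both distributions with a temperature, then push the student toward
    # the teacher with KL divergence (scaled by T^2 to keep gradient magnitudes stable).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy on the hard labels keeps the student anchored to the task.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: a batch of 4 examples with 3 classes and random logits.
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
distillation_loss(student, teacher, labels).backward()
```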
8-Bit Quantization of Transformer Model (정욱재)
Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
Written by Aishwarya Bhandare et al. @ Artificial Intelligence Products Group, Intel Corp.
Published @ Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations - ICML 2019
https://speakerdeck.com/scatterlab/8-bit-quantization-of-transformer-model
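For context, the core of 8-bit post-training quantization is mapping float32 weights (and activations) onto int8 with a scale factor so the heavy matrix multiplications can run in integer arithmetic. The sketch below only illustrates the per-tensor symmetric scale/round/clip step on a weight matrix; it is a toy under our own assumptions, not the calibration procedure or the optimized int8 kernels described in the Intel paper.

```python
# Toy per-tensor symmetric int8 quantization of a weight matrix (illustrative only).
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                      # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, scale)).max())
```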
Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data (김준성)
Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data
Written by Yen-Chang Hsu et al. @ Georgia Institute of Technology & Samsung Research America
Published @ CVPR 2020
https://speakerdeck.com/scatterlab/generalized-odin
SYNTHESIZER: Rethinking Self-Attention in Transformer Models (정다운)
SYNTHESIZER: Rethinking Self-Attention in Transformer Models
Written by Yi Tay et al. @ Google Research Mountain View
Preprinted in arXiv 2020
https://speakerdeck.com/scatterlab/synthesizer-rethinking-self-attention-in-transformer-models
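SYNTHESIZER asks whether the query-key dot product is needed at all and replaces it with attention weights that are synthesized without token-to-token interaction. As a rough illustration of the "Random Synthesizer" variant, the sketch below treats the attention matrix as a learned parameter that is independent of the input; the class name and shapes are our own, not the paper's implementation.

```python
# Rough sketch of a "random synthesizer" style attention layer (hypothetical names/shapes).
import torch
import torch.nn as nn

class RandomSynthesizerAttention(nn.Module):
    def __init__(self, seq_len, d_model):
        super().__init__()
        # The attention matrix is a free parameter: no queries, no keys, no dot products.
        self.attn = nn.Parameter(torch.randn(seq_len, seq_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        weights = torch.softmax(self.attn, dim=-1)
        return weights @ self.value(x)          # mix value vectors with synthesized weights

layer = RandomSynthesizerAttention(seq_len=16, d_model=32)
print(layer(torch.randn(2, 16, 32)).shape)      # torch.Size([2, 16, 32])
```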
Byte-level BPE: Neural Machine Translation with Byte-Level Subwords (이주홍)
Neural Machine Translation with Byte-Level Subwords
Written by Changhan Wang et al. @ Facebook AI Research, New York University & CIFAR Global Scholar
Preprinted in arXiv 2019
https://speakerdeck.com/scatterlab/neural-machine-translation-with-byte-level-subwords
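The appeal of byte-level subwords is that the base vocabulary is just the 256 possible UTF-8 byte values, so any string, including Korean, emoji, or noisy user text, can be represented without an unknown token. The snippet below only shows that byte-level representation and its lossless round trip; learning the actual BPE merges on top of it is the part the paper is about.

```python
# Byte-level representation: every string becomes a sequence of symbols in [0, 255].
text = "안녕, Transformer!"
byte_ids = list(text.encode("utf-8"))    # each character expands to 1-4 byte symbols
print(byte_ids)                          # e.g. [236, 149, 136, ...]
print(bytes(byte_ids).decode("utf-8"))   # lossless round trip back to the original text
```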
Pruning Basics on Multi-Head Attention-based Models (홍승환)
Are Sixteen Heads Really Better than One?
Written by Paul Michel et al. @ Carnegie Mellon University & Facebook AI Research
Published @ NeurIPS 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Written by Elena Voita et al. @ Yandex, University of Amsterdam, University of Edinburgh, University of Zurich & Moscow Institute of Physics and Technology
Published @ ACL 2019
Reducing Transformer Depth on Demand with Structured Dropout
Written by Angela Fan et al. @ Facebook AI Research
Published @ ICLR 2020
https://speakerdeck.com/scatterlab/pruning-basics-on-multi-head-attention-based-models
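A recurring experimental tool in these papers is simply switching whole attention heads off and measuring how much the model actually misses them. Below is a minimal sketch of such a per-head mask applied to the output of a multi-head attention layer; it only illustrates the structured nature of head pruning, whereas the papers above decide what to remove with gradient-based importance scores, learned gates, or, in the structured-dropout paper, by dropping whole layers.

```python
# Toy structured head mask: zero out whole heads of a multi-head attention output.
import torch

def mask_heads(attn_output, head_mask, num_heads):
    """attn_output: (batch, seq, hidden); head_mask: (num_heads,) of 0./1. entries."""
    b, s, h = attn_output.shape
    per_head = attn_output.view(b, s, num_heads, h // num_heads)
    return (per_head * head_mask.view(1, 1, num_heads, 1)).view(b, s, h)

x = torch.randn(2, 5, 64)                                # output of an 8-head layer (toy)
mask = torch.tensor([1., 1., 0., 1., 0., 1., 1., 1.])    # prune heads 2 and 4
print(mask_heads(x, mask, num_heads=8).shape)            # torch.Size([2, 5, 64])
```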
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (박채훈)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Written by Colin Raffel et al. @ Google
Preprinted in arXiv 2019
What Can Neural Networks Reason About? (구상준)
What Can Neural Networks Reason About?
Written by Keyulu Xu et al. @ Massachusetts Institute of Technology, University of Maryland, Institute for Advanced Study & National Institute of Informatics
Published @ ICLR 2020
https://speakerdeck.com/scatterlab/what-can-neural-networks-reason-about
In Closing
We have shared the slides from our machine learning seminar held in the spring of 2020. Although each presentation covers a different topic, taken together they can be summed up as showing both the power of the Transformer and the kind of work it takes to bring that power out in practice.
We continue to put everything we have into building AI that is more human than human, and we hope to keep sharing the fruits of that effort with you. Thank you.