Pingpong Team ML Seminar, Round Five

Season 5 seminar materials from Pingpong's ML research scientists and engineers

구상준 김준성 박채훈 백영민 서상우 이주홍 장성보 정다운 정욱재 홍승환 | July 28, 2020 | #Machine_Learning

Hello! 2020, a year that has been turbulent in many ways, is already more than half over. We have gathered up the season 5 seminar presentation materials from our scientists and engineers and are sharing them here.

The seminar ran weekly through April and May of 2020 and, as in previous seasons, began with no restrictions on topic. As in the last season we covered Transformers heavily, which grew out of trying to answer the question, "How can we make good use of Transformers?" This time two engineers also joined us and presented many empirical papers on the question, "How can we shrink a huge Transformer model without hurting its performance?", which made our fifth season all the more rewarding.

Dialogue Natural Language Inference (장성보)

  • Dialogue Natural Language Inference

    • Written by Sean Welleck et al. @ New York University & Facebook AI Research

    • Published @ ACL 2019

https://speakerdeck.com/scatterlab/dialogue-natural-language-inference

Unified Language Model Pre-training for Natural Language Understanding and Generation (서상우)

  • Unified Language Model Pre-training for Natural Language Understanding and Generation

    • Written by Li Dong et al. @ Microsoft Research

    • Published @ NeurIPS 2019

https://speakerdeck.com/scatterlab/unified-language-model-pre-training-for-natural-language-understanding-and-generation

Numerical Reasoning with NLP Models (백영민)

  • Do NLP Models Know Numbers? Probing Numeracy in Embeddings

    • Written by Eric Wallace et al. @ Allen Institute for AI, Peking University & University of California, Irvine

    • Published @ EMNLP 2019

  • Injecting Numerical Reasoning Skills into Language Models

    • Written by Mor Geva et al. @ Tel Aviv University & Allen Institute for AI

    • Published @ ACL 2020

  • Deep Learning for Symbolic Mathematics

    • Written by Guillaume Lample and Francois Charton @ Facebook AI Research

    • Published @ ICLR 2020

https://speakerdeck.com/scatterlab/numerical-reasoning-with-nlp-models

Knowledge Distillation for BERT (정욱재)

  • DynaBERT: Dynamic BERT with Adaptive Width and Depth

    • Written by Lu Hou et al. @ Huawei Noah’s Ark Lab

    • arXiv preprint, 2020

  • FastBERT: a Self-distilling BERT with Adaptive Inference Time

    • Written by Weijie Liu et al. @ Peking University, Tencent Research & Beijing Normal University

    • Published @ ACL 2020

  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

    • Written by Victor Sanh et al. @ Hugging Face

    • Published @ 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019

  • Patient Knowledge Distillation for BERT Model Compression

    • Written by Siqi Sun et al. @ Microsoft Dynamics 365 AI Research

    • Published @ EMNLP 2019

https://speakerdeck.com/scatterlab/knowledge-distillation-for-bert
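
For readers who have not seen it written down, below is a minimal sketch of the soft-label distillation objective that DistilBERT and Patient-KD build on. The temperature, mixing weight, and toy tensors are illustrative assumptions rather than settings from the papers, and Patient-KD's additional matching of intermediate-layer representations is omitted.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-label knowledge distillation: blend the usual cross-entropy on
    hard labels with a KL term that pushes the student's temperature-softened
    distribution toward the teacher's."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradients keep a comparable magnitude across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random logits for a 3-class problem.
student = torch.randn(8, 3, requires_grad=True)
teacher = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```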

8-Bit Quantization of Transformer Model (정욱재)

  • Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model

    • Written by Aishwarya Bhandare et al. @ Artificial Intelligence Products Group, Intel Corp.

    • Published @ Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations - ICML 2019

https://speakerdeck.com/scatterlab/8-bit-quantization-of-transformer-model
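
The paper itself relies on Intel-specific INT8 GEMM kernels and KL-divergence calibration of activation ranges; as a rough illustration of the core idea only, the sketch below quantizes a single weight matrix to int8 with a symmetric per-tensor scale and checks how little a matmul output moves after dequantization. The sizes and tensors are toy assumptions.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: map [-max|w|, +max|w|]
    onto [-127, 127] with a single scale factor."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

# Quantize one weight matrix of a toy Linear layer and measure the error.
w = torch.randn(256, 256)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs weight error:", (w - w_hat).abs().max().item())

# A real int8 GEMM runs on the quantized operands and rescales the int32
# accumulator afterwards; here we only check that accuracy is preserved.
x = torch.randn(8, 256)
print("fp32 vs int8-dequant output diff:",
      (x @ w.t() - x @ w_hat.t()).abs().max().item())
```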

Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data (김준성)

  • Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

    • Written by Yen-Chang Hsu et al. @ Georgia Institute of Technology & Samsung Research America

    • Published @ CVPR 2020

https://speakerdeck.com/scatterlab/generalized-odin

SYNTHESIZER: Rethinking Self-Attention in Transformer Models (정다운)

  • SYNTHESIZER: Rethinking Self-Attention in Transformer Models

    • Written by Yi Tay et al. @ Google Research Mountain View

    • arXiv preprint, 2020

https://speakerdeck.com/scatterlab/synthesizer-rethinking-self-attention-in-transformer-models
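
Below is a minimal, single-head sketch of the paper's Dense Synthesizer variant: the attention matrix is predicted from each token on its own by a small MLP instead of from query-key dot products. Multi-head and factorized versions, the output projection, and masking are omitted, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class DenseSynthesizerAttention(nn.Module):
    """Dense Synthesizer: per-token MLP predicts a row of attention logits
    over (up to) max_len positions, replacing the QK^T interaction."""
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, max_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        logits = self.proj(x)[..., :seq_len]   # (batch, seq_len, seq_len)
        attn = logits.softmax(dim=-1)
        return attn @ self.value(x)            # (batch, seq_len, d_model)

x = torch.randn(2, 10, 64)
out = DenseSynthesizerAttention(d_model=64, max_len=32)(x)
print(out.shape)  # torch.Size([2, 10, 64])
```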

Byte-level BPE: Neural Machine Translation with Byte-Level Subwords (이주홍)

  • Neural Machine Translation with Byte-Level Subwords

    • Written by Changhan Wang et al. @ Facebook AI Research, New York University & CIFAR Global Scholar

    • arXiv preprint, 2019

https://speakerdeck.com/scatterlab/neural-machine-translation-with-byte-level-subwords
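
As a minimal illustration of the starting point of byte-level BPE: text is viewed as its UTF-8 byte sequence, so the base vocabulary never exceeds 256 symbols regardless of script, and the BPE merges are then learned over those byte sequences. The merge-learning step itself is not shown, and the sample string is only an example.

```python
# Byte-level tokenization starts from the UTF-8 byte sequence, so the base
# vocabulary is at most 256 symbols regardless of script.
text = "안녕, Transformer!"

byte_tokens = list(text.encode("utf-8"))
print(len(text), "characters ->", len(byte_tokens), "byte tokens")
print(byte_tokens[:8])  # the two Korean syllables become 6 bytes: 236, 149, 136, 235, ...

# A character-level vocabulary needs an entry per distinct Unicode character;
# the byte-level view never needs more than 256 base entries.
print("distinct characters:", len(set(text)))
print("distinct byte values:", len(set(byte_tokens)))
```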

Pruning Basics on Multi Head Attention-based Models (홍승환)

  • Are Sixteen Heads Really Better than One?

    • Written by Paul Michel et al. @ Carnegie Mellon University & Facebook AI Research

    • Published @ NeurIPS 2019

  • Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

    • Written by Elena Voita et al. @ Yandex, University of Amsterdam, University of Edinburgh, University of Zurich & Moscow Institute of Physics and Technology

    • Published @ ACL 2019

  • Reducing Transformer Depth on Demand with Structured Dropout

    • Written by Angela Fan et al. @ Facebook AI Research

    • Published @ ICLR 2020

https://speakerdeck.com/scatterlab/pruning-basics-on-multi-head-attention-based-models
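
All three papers start from the observation that individual attention heads (or whole layers) can be switched off with a mask. The sketch below shows only that masking step, with shapes and pruned head indices chosen purely for illustration; Michel et al. rank heads by the gradient of the loss with respect to such a mask, and Voita et al. learn a relaxed (L0) version of it.

```python
import torch

def masked_head_outputs(head_outputs: torch.Tensor, head_mask: torch.Tensor):
    """Zero out selected attention heads before the output projection.

    head_outputs: (batch, num_heads, seq_len, head_dim)
    head_mask:    (num_heads,) with entries in {0.0, 1.0}
    """
    return head_outputs * head_mask.view(1, -1, 1, 1)

# Toy example: 8 heads, prune heads 2 and 5.
batch, num_heads, seq_len, head_dim = 4, 8, 16, 32
head_outputs = torch.randn(batch, num_heads, seq_len, head_dim)
head_mask = torch.ones(num_heads)
head_mask[[2, 5]] = 0.0

pruned = masked_head_outputs(head_outputs, head_mask)
print(pruned[:, 2].abs().sum().item(), pruned[:, 5].abs().sum().item())  # both 0.0
```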

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (박채훈)

  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    • Written by Colin Raffel et al. @ Google

    • arXiv preprint, 2019

https://speakerdeck.com/scatterlab/exploring-the-limits-of-transfer-learning-with-unified-text-to-text-transformer

What can neural networks reason about? (구상준)

  • What can neural networks reason about?

    • Written by Keyulu Xu et al. @ Massachusetts Institute of Technology, University of Maryland, Institute for Advanced Study & National Institute of Informatics

    • Published @ ICLR 2020

https://speakerdeck.com/scatterlab/what-can-neural-networks-reason-about

Wrapping Up

We have shared the materials from the machine learning seminar we ran in the spring of 2020. Although each presentation covers a different topic, taken together they show both how powerful the Transformer is and what kind of effort it takes to bring that power out properly.

We continue to put everything we have into building AI that is more human than human, and we hope to keep sharing the fruits of that effort. Thank you.
