ykilcher

83 Followers

58:22

Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)

#nlp #sparsity #transformers This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models. Sparse Expert models have been hugely successful at distributing parts of models, mostly Transformers, across large array of machines and use a routing function to effectively route signals between them. This means that even though these models have a huge number of parameters, the computational load for a given signal does not increase because the model is only sparsely activated. Sparse expert models, such as Switch Transformers and GLAM can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths and weaknesses, up to the current state of the art of these models. OUTLINE: 0:00 - Intro 0:30 - What are sparse expert models? 4:25 - Start of Interview 5:55 - What do you mean by sparse experts? 8:10 - How does routing work in these models? 12:10 - What is the history of sparse experts? 14:45 - What does an individual expert learn? 19:25 - When are these models appropriate? 22:30 - How comparable are sparse to dense models? 26:30 - How does the pathways system connect to this? 28:45 - What improvements did GLAM make? 31:30 - The "designing sparse experts" paper 37:45 - Can experts be frozen during training? 41:20 - Can the routing function be improved? 47:15 - Can experts be distributed beyond data centers? 50:20 - Are there sparse experts for other domains than NLP? 52:15 - Are sparse and dense models in competition? 53:35 - Where do we go from here? 56:30 - How can people get started with this? Papers: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961) GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905) Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906) Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 8 views

43:03

Author Interview - Transformer Memory as a Differentiable Search Index

#neuralsearch #interview #google This is an interview with the authors Yi Tay and Don Metzler. Paper Review Video: https://youtu.be/qlB0TPBQ7YY Search engines work by building an index and then looking up things in it. Usually, that index is a separate data structure. In keyword search, we build and store reverse indices. In neural search, we build nearest-neighbor indices. This paper does something different: It directly trains a Transformer to return the ID of the most relevant document. No similarity search over embeddings or anything like this is performed, and no external data structure is needed, as the entire index is essentially captured by the model's weights. The paper experiments with various ways of representing documents and training the system, which works surprisingly well! OUTLINE: 0:00 - Intro 0:50 - Start of Interview 1:30 - How did this idea start? 4:30 - How does memorization play into this? 5:50 - Why did you not compare to cross-encoders? 7:50 - Instead of the ID, could one reproduce the document itself? 10:50 - Passages vs documents 12:00 - Where can this model be applied? 14:25 - Can we make this work on large collections? 19:20 - What's up with the NQ100K dataset? 23:55 - What is going on inside these models? 28:30 - What's the smallest scale to obtain meaningful results? 30:15 - Investigating the document identifiers 34:45 - What's the end goal? 38:40 - What are the hardest problems currently? 40:40 - Final comments & how to get started Paper: https://arxiv.org/abs/2202.06991 Abstract: In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup. Authors: Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 16 views

51:51

Transformer Memory as a Differentiable Search Index (Machine Learning Research Paper Explained)

#dsi #search #google Search engines work by building an index and then looking up things in it. Usually, that index is a separate data structure. In keyword search, we build and store reverse indices. In neural search, we build nearest-neighbor indices. This paper does something different: It directly trains a Transformer to return the ID of the most relevant document. No similarity search over embeddings or anything like this is performed, and no external data structure is needed, as the entire index is essentially captured by the model's weights. The paper experiments with various ways of representing documents and training the system, which works surprisingly well! Sponsor: Diffgram https://diffgram.com?ref=yannic OUTLINE: 0:00 - Intro 0:45 - Sponsor: Diffgram 1:35 - Paper overview 3:15 - The search problem, classic and neural 8:15 - Seq2seq for directly predicting document IDs 11:05 - Differentiable search index architecture 18:05 - Indexing 25:15 - Retrieval and document representation 33:25 - Training DSI 39:15 - Experimental results 49:25 - Comments & Conclusions Paper: https://arxiv.org/abs/2202.06991 Abstract: In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup. Authors: Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 5 views

14:20

[ML News] Google's 540B PaLM Language Model & OpenAI's DALL-E 2 Text-to-Image Revolution

#mlnews #palm #dalle2 Google releases PaLM and OpenAI releases DALL-E 2 (and more news). Sponsor: Weights & BIases Start here: https://wandb.me/yannic Thumbnail credit: DALL-E 2 via Sam Altman OUTLINE 0:00 - Street interview w/ random stranger 2:25 - Intro 2:50 - PaLM - Google's 540B Pathways Language Model 7:50 - Sponsor: Weights & Biases 9:10 - OpenAI releases DALL-E 2 12:05 - Open Source Datasets and Models 13:20 - Salesforce releases CodeGen My Live Reaction to DALL-E 2: https://youtu.be/gGPv_SYVDC8 My Video on GLIDE: https://youtu.be/gwI6g1pBD84 My Video on the Pathways System: https://youtu.be/vGFaiLeoLWw References: PaLM - Google's 540B Pathways Language Model https://ai.googleblog.com/2022/04/pat... https://storage.googleapis.com/pathwa... OpenAI releases DALL-E 2 https://openai.com/dall-e-2/ https://cdn.openai.com/papers/dall-e-... https://www.instagram.com/openaidalle/ https://twitter.com/sama/status/15117... https://twitter.com/sama/media https://twitter.com/BorisMPower/statu... https://twitter.com/ariskonstant/stat... Open Source Datasets and Models https://twitter.com/multimodalart/sta... https://laion.ai/laion-5b-a-new-era-o... https://github.com/mlfoundations/open... Salesforce releases CodeGen https://github.com/salesforce/CodeGen Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 7 views

49:25

Author Interview - Improving Intrinsic Exploration with Language Abstractions

#reinforcementlearning #ai #explained This is an interview with Jesse Mu, first author of the paper. Original Paper Review: https://youtu.be/NeGJAUSQEJI Exploration is one of the oldest challenges for Reinforcement Learning algorithms, with no clear solution to date. Especially in environments with sparse rewards, agents face significant challenges in deciding which parts of the environment to explore further. Providing intrinsic motivation in form of a pseudo-reward is sometimes used to overcome this challenge, but often relies on hand-crafted heuristics, and can lead to deceptive dead-ends. This paper proposes to use language descriptions of encountered states as a method of assessing novelty. In two procedurally generated environments, they demonstrate the usefulness of language, which is in itself highly concise and abstractive, which lends itself well for this task. OUTLINE: 0:00 - Intro 0:55 - Paper Overview 4:30 - Aren't you just adding extra data? 9:35 - Why are you splitting up the AMIGo teacher? 13:10 - How do you train the grounding network? 16:05 - What about causally structured environments? 17:30 - Highlights of the experimental results 20:40 - Why is there so much variance? 22:55 - How much does it matter that we are testing in a video game? 27:00 - How does novelty interface with the goal specification? 30:20 - The fundamental problems of exploration 32:15 - Are these algorithms subject to catastrophic forgetting? 34:45 - What current models could bring language to other environments? 40:30 - What does it take in terms of hardware? 43:00 - What problems did you encounter during the project? 46:40 - Where do we go from here? Paper: https://arxiv.org/abs/2202.08938 Abstract: Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites. Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 5 views

59:29

The Weird and Wonderful World of AI Art (w/ Author Jack Morris)

#aiart #deeplearning #clip Since the release of CLIP, the world of AI art has seen an unprecedented level of acceleration in what's possible to do. Whereas image generation had previously been mostly in the domain of scientists, now a community of professional artists, researchers, and amateurs are sending around colab notebooks and sharing their creations via social media. How did this happen? What is going on? And where do we go from here? Jack Morris and I attempt to answer some of these questions, following his blog post "The Weird and Wonderful World of AI Art" (linked below). OUTLINE: 0:00 - Intro 2:30 - How does one get into AI art? 5:00 - Deep Dream & Style Transfer: the early days of art in deep learning 10:50 - The advent of GANs, ArtBreeder and TikTok 19:50 - Lacking control: Pre-CLIP art 22:40 - CLIP & DALL-E 30:20 - The shift to shared colabs 34:20 - Guided diffusion models 37:20 - Prompt engineering for art models 43:30 - GLIDE 47:00 - Video production & Disco Diffusion 48:40 - Economics, money, and NFTs 54:15 - What does the future hold for AI art? Blog post: https://jxmo.notion.site/The-Weird-an... Jack's Blog: https://jxmo.io/ Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 2 views

42:25

Improving Intrinsic Exploration with Language Abstractions (Machine Learning Paper Explained)

#reinforcementlearning #ai #explained Exploration is one of the oldest challenges for Reinforcement Learning algorithms, with no clear solution to date. Especially in environments with sparse rewards, agents face significant challenges in deciding which parts of the environment to explore further. Providing intrinsic motivation in form of a pseudo-reward is sometimes used to overcome this challenge, but often relies on hand-crafted heuristics, and can lead to deceptive dead-ends. This paper proposes to use language descriptions of encountered states as a method of assessing novelty. In two procedurally generated environments, they demonstrate the usefulness of language, which is in itself highly concise and abstractive, which lends itself well for this task. OUTLINE: 0:00 - Intro 1:10 - Paper Overview: Language for exploration 5:40 - The MiniGrid & MiniHack environments 7:00 - Annotating states with language 9:05 - Baseline algorithm: AMIGo 12:20 - Adding language to AMIGo 22:55 - Baseline algorithm: NovelD and Random Network Distillation 29:45 - Adding language to NovelD 31:50 - Aren't we just using extra data? 34:55 - Investigating the experimental results 40:45 - Final comments Paper: https://arxiv.org/abs/2202.08938 Abstract: Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites. Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 10 views

18:02

[ML News] GPT-3 learns to edit | Google Pathways | Make-A-Scene | CLIP meets GamePhysics | DouBlind

#mlnews #gpt3 #pathways Your updates on the latest and greatest from the depths of Machine Learning! Sponsor: Weights & Biases https://wandb.me/yannic OUTLINE: 0:00 - Intro 0:15 - Weights & Biases Report about Reports 2:45 - GPT-3 learns to edit 6:30 - Make-A-Scene: Text-to-Image with Human Priors 8:00 - Pathways: Google's new High-Performance ML scheduler 10:45 - DouBlind: Open Peer-Review 12:45 - CLIP meets GamePhysics 14:40 - Residual Quantization pushes Image Generation SOTA 16:15 - Helpful Things References: Weights & Biases Report about Reports https://wandb.ai/wandb/wandb_example/... GPT-3 learns to edit https://openai.com/blog/gpt-3-edit-in... https://beta.openai.com/playground?mo... Make-A-Scene: Text-to-Image with Human Priors https://arxiv.org/pdf/2203.13131.pdf https://www.youtube.com/watch?v=QLTyq... Pathways: Google's new High-Performance ML scheduler https://arxiv.org/pdf/2203.12533.pdf DouBlind: Open Peer-Review https://doublind.com/#web-intro https://doublind.com/search?query=kil... CLIP meets GamePhysics https://arxiv.org/pdf/2203.11096.pdf https://www.reddit.com/r/GamePhysics/... https://asgaardlab.github.io/CLIPxGam... Residual Quantization pushes Image Generation SOTA https://arxiv.org/pdf/2203.01941.pdf https://github.com/kakaobrain/rq-vae-... Helpful Things https://github.com/TDAmeritrade/stumpy https://github.com/linkedin/fasttreeshap https://github.com/vopani/jaxton https://twitter.com/mark_riedl/status... https://github.com/eilab-gt/NovGrid https://developer.nvidia.com/isaac-gym https://github.com/NVIDIA-Omniverse/I... Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 6 views

40:37

Author Interview - Memory-assisted prompt editing to improve GPT-3 after deployment

#nlp #gpt3 #prompt This is an interview with the authors of this work, Aman Madaan and Niket Tandon. Large language models such as GPT-3 have enabled many breakthroughs and new applications recently, but they come with an important downside: Training them is very expensive, and even fine-tuning is often difficult. This paper presents an adaptive method to improve performance of such models after deployment, without ever changing the model itself. This is done by maintaining a memory of interactions and then dynamically adapting new prompts by augmenting them with memory content. This has many applications, from non-intrusive fine-tuning to personalization. OUTLINE: 0:00 - Intro 0:45 - Paper Overview 2:00 - What was your original motivation? 4:20 - There is an updated version of the paper! 9:00 - Have you studied this on real-world users? 12:10 - How does model size play into providing feedback? 14:10 - Can this be used for personalization? 16:30 - Discussing experimental results 17:45 - Can this be paired with recommender systems? 20:00 - What are obvious next steps to make the system more powerful? 23:15 - Clarifying the baseline methods 26:30 - Exploring cross-lingual customization 31:00 - Where did the idea for the clarification prompt come from? 33:05 - What did not work out during this project? 34:45 - What did you learn about interacting with large models? 37:30 - Final thoughts Paper: https://arxiv.org/abs/2201.06009 Code & Data: https://github.com/madaan/memprompt Abstract: Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans. For example, GPT-3 would mistakenly interpret "What word is similar to good?" to mean a homonym, while the user intended a synonym. Our goal is to effectively correct such errors via user interactions with the system but without retraining, which will be prohibitively costly. We pair GPT-3 with a growing memory of recorded cases where the model misunderstood the user's intents, along with user feedback for clarification. Such a memory allows our system to produce enhanced prompts for any new query based on the user feedback for error correction on similar cases in the past. On four tasks (two lexical tasks, two advanced ethical reasoning tasks), we show how a (simulated) user can interactively teach a deployed GPT-3, substantially increasing its accuracy over the queries with different kinds of misunderstandings by the GPT-3. Our approach is a step towards the low-cost utility enhancement for very large pre-trained LMs. All the code and data is available at this https URL. Authors: Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 22 views

36:41

Memory-assisted prompt editing to improve GPT-3 after deployment (Machine Learning Paper Explained)

#nlp #gpt3 #prompt Large language models such as GPT-3 have enabled many breakthroughs and new applications recently, but they come with an important downside: Training them is very expensive, and even fine-tuning is often difficult. This paper presents an adaptive method to improve performance of such models after deployment, without ever changing the model itself. This is done by maintaining a memory of interactions and then dynamically adapting new prompts by augmenting them with memory content. This has many applications, from non-intrusive fine-tuning to personalization. Sponsor: Introduction to Graph Neural Networks Course https://www.graphneuralnets.com/p/int... OUTLINE: 0:00 - Intro 0:40 - Sponsor: Introduction to GNNs Course (link in description) 1:30 - Paper Overview: Improve GPT-3 after deployment via user feedback 5:30 - Proposed memory-based architecture 13:00 - A detailed look at the components 15:00 - Example tasks 24:30 - My concerns with the example setup 26:20 - Baselines used for comparison 29:50 - Experimental Results 34:20 - Conclusion & Comments Paper: https://arxiv.org/abs/2201.06009 Code & Data: https://github.com/madaan/memprompt Abstract: Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans. For example, GPT-3 would mistakenly interpret "What word is similar to good?" to mean a homonym, while the user intended a synonym. Our goal is to effectively correct such errors via user interactions with the system but without retraining, which will be prohibitively costly. We pair GPT-3 with a growing memory of recorded cases where the model misunderstood the user's intents, along with user feedback for clarification. Such a memory allows our system to produce enhanced prompts for any new query based on the user feedback for error correction on similar cases in the past. On four tasks (two lexical tasks, two advanced ethical reasoning tasks), we show how a (simulated) user can interactively teach a deployed GPT-3, substantially increasing its accuracy over the queries with different kinds of misunderstandings by the GPT-3. Our approach is a step towards the low-cost utility enhancement for very large pre-trained LMs. All the code and data is available at this https URL. Authors: Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 13 views

48:55

Author Interview - Typical Decoding for Natural Language Generation

#deeplearning #nlp #sampling This is an interview with first author Clara Meister. Paper review video hereé https://youtu.be/_EDr3ryrT_Y Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting. Various workarounds have been proposed, such as top-k sampling and nucleus sampling, but while these manage to somewhat improve the generated samples, they are hacky and unfounded. This paper introduces typical sampling, a new decoding method that is principled, effective, and can be implemented efficiently. Typical sampling turns away from sampling purely based on likelihood and explicitly finds a trade-off between generating high-probability samples and generating high-information samples. The paper connects typical sampling to psycholinguistic theories on human speech generation, and shows experimentally that typical sampling achieves much more diverse and interesting results than any of the current methods. Sponsor: Introduction to Graph Neural Networks Course https://www.graphneuralnets.com/p/int... OUTLINE: 0:00 - Intro 0:35 - Sponsor: Introduction to GNNs Course (link in description) 1:30 - Why does sampling matter? 5:40 - What is a "typical" message? 8:35 - How do humans communicate? 10:25 - Why don't we just sample from the model's distribution? 15:30 - What happens if we condition on the information to transmit? 17:35 - Does typical sampling really represent human outputs? 20:55 - What do the plots mean? 31:00 - Diving into the experimental results 39:15 - Are our training objectives wrong? 41:30 - Comparing typical sampling to top-k and nucleus sampling 44:50 - Explaining arbitrary engineering choices 47:20 - How can people get started with this? Paper: https://arxiv.org/abs/2202.00666 Code: https://github.com/cimeister/typical-... Abstract: Despite achieving incredibly low perplexities on myriad natural language corpora, today's language models still often underperform when used to generate text. This dichotomy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language as a communication channel (à la Shannon, 1948) can provide new insights into the behaviors of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, and do so in a simultaneously efficient and error-minimizing manner; they choose each word in a string with this (perhaps subconscious) goal in mind. We propose that generation from probabilistic models should mimic this behavior. Rather than always choosing words from the high-probability region of the distribution--which have a low Shannon information content--we sample from the set of words with information content close to the conditional entropy of our model, i.e., close to the expected information content. This decision criterion can be realized through a simple and efficient implementation, which we call typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, typical sampling offers competitive performance in terms of quality while consistently reducing the number of degenerate repetitions. Authors: Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 19 views

48:55

Typical Decoding for Natural Language Generation (Get more human-like outputs from language models!)

#deeplearning #nlp #sampling Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting. Various workarounds have been proposed, such as top-k sampling and nucleus sampling, but while these manage to somewhat improve the generated samples, they are hacky and unfounded. This paper introduces typical sampling, a new decoding method that is principled, effective, and can be implemented efficiently. Typical sampling turns away from sampling purely based on likelihood and explicitly finds a trade-off between generating high-probability samples and generating high-information samples. The paper connects typical sampling to psycholinguistic theories on human speech generation, and shows experimentally that typical sampling achieves much more diverse and interesting results than any of the current methods. Sponsor: Fully Connected by Weights & Biases https://wandb.ai/fully-connected OUTLINE: 0:00 - Intro 1:50 - Sponsor: Fully Connected by Weights & Biases 4:10 - Paper Overview 7:40 - What's the problem with sampling? 11:45 - Beam Search: The good and the bad 14:10 - Top-k and Nucleus Sampling 16:20 - Why the most likely things might not be the best 21:30 - The expected information content of the next word 25:00 - How to trade off information and likelihood 31:25 - Connections to information theory and psycholinguistics 36:40 - Introducing Typical Sampling 43:00 - Experimental Evaluation 44:40 - My thoughts on this paper Paper: https://arxiv.org/abs/2202.00666 Code: https://github.com/cimeister/typical-... Abstract: Despite achieving incredibly low perplexities on myriad natural language corpora, today's language models still often underperform when used to generate text. This dichotomy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language as a communication channel (à la Shannon, 1948) can provide new insights into the behaviors of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, and do so in a simultaneously efficient and error-minimizing manner; they choose each word in a string with this (perhaps subconscious) goal in mind. We propose that generation from probabilistic models should mimic this behavior. Rather than always choosing words from the high-probability region of the distribution--which have a low Shannon information content--we sample from the set of words with information content close to the conditional entropy of our model, i.e., close to the expected information content. This decision criterion can be realized through a simple and efficient implementation, which we call typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, typical sampling offers competitive performance in terms of quality while consistently reducing the number of degenerate repetitions. Authors: Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 5 views

48:33

One Model For All The Tasks - BLIP (Author Interview)

#blip #interview #salesforce Paper Review Video: https://youtu.be/X2k7n4FuI7c Sponsor: Assembly AI https://www.assemblyai.com/?utm_sourc... This is an interview with Junnan Li and Dongxu Li, authors of BLIP and members of Salesforce research. Cross-modal pre-training has been all the rage lately in deep learning, especially training vision and language models together. However, there are a number of issues, such as low quality datasets that limit the performance of any model trained on it, and also the fact that pure contrastive pre-training cannot be easily fine-tuned for most downstream tasks. BLIP unifies different tasks and objectives in a single pre-training run and achieves a much more versatile model, which the paper immediately uses to create, filter, clean and thus bootstrap its own dataset to improve performance even more! OUTLINE: 0:00 - Intro 0:40 - Sponsor: Assembly AI 1:30 - Start of Interview 2:30 - What's the pitch? 4:40 - How did data bootstrapping come into the project? 7:10 - How big of a problem is data quality? 11:10 - Are the captioning & filtering models biased towards COCO data? 14:40 - Could the data bootstrapping be done multiple times? 16:20 - What was the evolution of the BLIP architecture? 21:15 - Are there additional benefits to adding language modelling? 23:50 - Can we imagine a modular future for pre-training? 29:45 - Diving into the experimental results 42:40 - What did and did not work out during the research? 45:00 - How is research life at Salesforce? 46:45 - Where do we go from here? Paper: https://arxiv.org/abs/2201.12086 Code: https://github.com/salesforce/BLIP Demo: https://huggingface.co/spaces/Salesfo... Abstract: Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at this https URL. Authors: Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 26 views

46:40

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding&Generation

#blip #review #ai Cross-modal pre-training has been all the rage lately in deep learning, especially training vision and language models together. However, there are a number of issues, such as low quality datasets that limit the performance of any model trained on it, and also the fact that pure contrastive pre-training cannot be easily fine-tuned for most downstream tasks. BLIP unifies different tasks and objectives in a single pre-training run and achieves a much more versatile model, which the paper immediately uses to create, filter, clean and thus bootstrap its own dataset to improve performance even more! Sponsor: Zeta Alpha https://zeta-alpha.com Use code YANNIC for 20% off! OUTLINE: 0:00 - Intro 0:50 - Sponsor: Zeta Alpha 3:40 - Paper Overview 6:40 - Vision-Language Pre-Training 11:15 - Contributions of the paper 14:30 - Model architecture: many parts for many tasks 19:50 - How data flows in the model 26:50 - Parameter sharing between the modules 29:45 - Captioning & Filtering bootstrapping 41:10 - Fine-tuning the model for downstream tasks Paper: https://arxiv.org/abs/2201.12086 Code: https://github.com/salesforce/BLIP Demo: https://huggingface.co/spaces/Salesfo... Abstract: Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at this https URL. Authors: Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 22 views

33:19

[ML News] AI Threatens Biological Arms Race

#mlnews #gtc22 #ithaca GTC Registration Link: https://ykilcher.com/gtc Your regular updates on what's going on in the ML world! OUTLINE: 0:00 - Intro 0:20 - Register to Nvidia GTC and win a 3090! 4:15 - DeepMind's Ithaca deciphers Lost Ancient Texts 6:45 - Drug discovery model turns toxic 10:00 - Gary Marcus: Deep Learning is hitting a wall 19:40 - GopherCite: Backing up answers with citations 22:40 - Yoshua Bengio appointed knight of the legion of honour 23:00 - Meta AI tags parody account of Yoshua Bengio 23:40 - Building games using just natural language 24:55 - YOU.com adds writing assistant 25:45 - Horace He: How to brrr 26:35 - Karpathy: Reproducing Yann LeCun's 1989 paper 27:50 - Pig grunt emotion classifier 28:20 - AI annotates protein domain functions 29:40 - Atwood & Carmack: 10k self-driving car bet 30:50 - Helpful Things References: Register to GTC and win a 3090! https://twitter.com/NVIDIAEU/status/1... https://www.nvidia.com/gtc/keynote/?n... https://www.nvidia.com/gtc/?ncid=ref-... https://www.nvidia.com/gtc/keynote/ https://www.nvidia.com/gtc/training/ https://developer.nvidia.com/nvidia-o... DeepMind deciphers Lost Ancient Texts https://deepmind.com/blog/article/Pre... https://www.nature.com/articles/s4158... https://github.com/deepmind/ithaca https://ithaca.deepmind.com/?job=eyJy... Drug discovery model turns toxic https://www.theverge.com/2022/3/17/22... https://www.nature.com/articles/s4225... Gary Marcus: Deep Learning is hitting a wall https://nautil.us/deep-learning-is-hi... https://www.youtube.com/watch?v=fVkXE... GopherCite: Backing up answers with citations https://deepmind.com/research/publica... Yoshua Bengio appointed knight of the legion of honour https://mila.quebec/en/professor-yosh... Meta AI tags parody account https://twitter.com/MetaAI/status/150... Building games using just natural language https://andrewmayneblog.wordpress.com... YOU.com adds writing assistant https://you.com/search?q=how%20to%20w... Horace He: How to brrr https://horace.io/brrr_intro.html Karpathy: Reproducing Yann LeCun's 1989 paper https://karpathy.github.io/2022/03/14... Pig grunt emotion classifier https://science.ku.dk/english/press/n... AI annotates protein domain functions https://ai.googleblog.com/2022/03/usi... https://google-research.github.io/pro... Atwood & Carmack: 10k self-driving car bet https://blog.codinghorror.com/the-203... Helpful Things https://github.com/recognai/rubrix https://twitter.com/taiyasaki/status/... https://github.com/mosaicml/composer?... https://mujoco.org/ https://mujoco.readthedocs.io/en/late... https://github.com/deepmind/mctx?utm_... https://padl.ai/ https://github.com/LaihoE/did-it-spill https://pytorch.org/blog/pytorch-1.11... Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 20 views

1:05:20

Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments (Review)

#multitasklearning #biology #neuralnetworks Catastrophic forgetting is a big problem in mutli-task and continual learning. Gradients of different objectives tend to conflict, and new tasks tend to override past knowledge. In biological neural networks, each neuron carries a complex network of dendrites that mitigate such forgetting by recognizing the context of an input signal. This paper introduces Active Dendrites, which carries over the principle of context-sensitive gating by dendrites into the deep learning world. Various experiments show the benefit in combatting catastrophic forgetting, while preserving sparsity and limited parameter counts. OUTLINE: 0:00 - Introduction 1:20 - Paper Overview 3:15 - Catastrophic forgetting in continuous and multi-task learning 9:30 - Dendrites in biological neurons 16:55 - Sparse representations in biology 18:35 - Active dendrites in deep learning 34:15 - Experiments on multi-task learning 39:00 - Experiments in continual learning and adaptive prototyping 49:20 - Analyzing the inner workings of the algorithm 53:30 - Is this the same as just training a larger network? 59:15 - How does this relate to attention mechanisms? 1:02:55 - Final thoughts and comments Paper: https://arxiv.org/abs/2201.00042 Blog: https://numenta.com/blog/2021/11/08/c... ERRATA: - I was made aware of this by https://twitter.com/ChainlessCoder: "That axon you showed of the pyramidal neuron, is actually the apical dendrite of the neuron". Sorry, my bad :) Abstract: A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state of the art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment where a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results on both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve. Authors: Abhiram Iyer, Karan Grewal, Akash Velu, Lucas Oliveira Souza, Jeremy Forest, Subutai Ahmad Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 23 views

56:32

Active Dendrites avoid catastrophic forgetting - Interview with the Authors

#multitasklearning #biology #neuralnetworks This is an interview with the paper's authors: Abhiram Iyer, Karan Grewal, and Akash Velu! Paper Review Video: https://youtu.be/O_dJ31T01i8 Check out Zak's course on Graph Neural Networks (discount with this link): https://www.graphneuralnets.com/p/int... Catastrophic forgetting is a big problem in mutli-task and continual learning. Gradients of different objectives tend to conflict, and new tasks tend to override past knowledge. In biological neural networks, each neuron carries a complex network of dendrites that mitigate such forgetting by recognizing the context of an input signal. This paper introduces Active Dendrites, which carries over the principle of context-sensitive gating by dendrites into the deep learning world. Various experiments show the benefit in combatting catastrophic forgetting, while preserving sparsity and limited parameter counts. OUTLINE: 0:00 - Intro 0:55 - Sponsor: GNN Course 2:30 - How did the idea come to be? 7:05 - What roles do the different parts of the method play? 8:50 - What was missing in the paper review? 10:35 - Are biological concepts viable if we still have backprop? 11:50 - How many dendrites are necessary? 14:10 - Why is there a plateau in the sparsity plot? 20:50 - How does task difficulty play into the algorithm? 24:10 - Why are there different setups in the experiments? 30:00 - Is there a place for unsupervised pre-training? 32:50 - How can we apply the online prototyping to more difficult tasks? 37:00 - What did not work out during the project? 41:30 - How do you debug a project like this? 47:10 - How is this related to other architectures? 51:10 - What other things from neuroscience are to be included? 55:50 - Don't miss the awesome ending :) Paper: https://arxiv.org/abs/2201.00042 Blog: https://numenta.com/blog/2021/11/08/c... Link to the GNN course (with discount): https://www.graphneuralnets.com/p/int... Abstract: A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state of the art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment where a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results on both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve. Authors: Abhiram Iyer, Karan Grewal, Akash Velu, Lucas Oliveira Souza, Jeremy Forest, Subutai Ahmad Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 34 views

35:58

Author Interview - VOS: Learning What You Don't Know by Virtual Outlier Synthesis

#deeplearning #objectdetection #outliers An interview with the authors of "Virtual Outlier Synthesis". Watch the paper review video here: https://youtu.be/i-J4T3uLC9M Outliers are data points that are highly unlikely to be seen in the training distribution, and therefore deep neural networks have troubles when dealing with them. Many approaches to detecting outliers at inference time have been proposed, but most of them show limited success. This paper presents Virtual Outlier Synthesis, which is a method that pairs synthetic outliers, forged in the latent space, with an energy-based regularization of the network at training time. The result is a deep network that can reliably detect outlier datapoints during inference with minimal overhead. OUTLINE: 0:00 - Intro 2:20 - What was the motivation behind this paper? 5:30 - Why object detection? 11:05 - What's the connection to energy-based models? 12:15 - Is a Gaussian mixture model appropriate for high-dimensional data? 16:15 - What are the most important components of the method? 18:30 - What are the downstream effects of the regularizer? 22:00 - Are there severe trade-offs to outlier detection? 23:55 - Main experimental takeaways? 26:10 - Why do outlier detection in the last layer? 30:20 - What does it take to finish a research projects successfully? Paper: https://arxiv.org/abs/2202.01197 Code: https://github.com/deeplearning-wisc/vos Abstract: Out-of-distribution (OOD) detection has received much attention lately due to its importance in the safe deployment of neural networks. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Previous approaches rely on real outlier datasets for model regularization, which can be costly and sometimes infeasible to obtain in practice. In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training. Specifically, VOS samples virtual outliers from the low-likelihood region of the class-conditional distribution estimated in the feature space. Alongside, we introduce a novel unknown-aware training objective, which contrastively shapes the uncertainty space between the ID data and synthesized outlier data. VOS achieves state-of-the-art performance on both object detection and image classification models, reducing the FPR95 by up to 7.87% compared to the previous best method. Code is available at this https URL. Authors: Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 23 views

35:57

VOS: Learning What You Don't Know by Virtual Outlier Synthesis (Paper Explained)

#vos #outliers #deeplearning Sponsor: Assembly AI Check them out here: https://www.assemblyai.com/?utm_sourc... Outliers are data points that are highly unlikely to be seen in the training distribution, and therefore deep neural networks have troubles when dealing with them. Many approaches to detecting outliers at inference time have been proposed, but most of them show limited success. This paper presents Virtual Outlier Synthesis, which is a method that pairs synthetic outliers, forged in the latent space, with an energy-based regularization of the network at training time. The result is a deep network that can reliably detect outlier datapoints during inference with minimal overhead. OUTLINE: 0:00 - Intro 2:00 - Sponsor: Assembly AI (Link below) 4:05 - Paper Overview 6:45 - Where do traditional classifiers fail? 11:00 - How object detectors work 17:00 - What are virtual outliers and how are they created? 24:00 - Is this really an appropriate model for outliers? 26:30 - How virtual outliers are used during training 34:00 - Plugging it all together to detect outliers Paper: https://arxiv.org/abs/2202.01197 Code: https://github.com/deeplearning-wisc/vos Abstract: Out-of-distribution (OOD) detection has received much attention lately due to its importance in the safe deployment of neural networks. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Previous approaches rely on real outlier datasets for model regularization, which can be costly and sometimes infeasible to obtain in practice. In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training. Specifically, VOS samples virtual outliers from the low-likelihood region of the class-conditional distribution estimated in the feature space. Alongside, we introduce a novel unknown-aware training objective, which contrastively shapes the uncertainty space between the ID data and synthesized outlier data. VOS achieves state-of-the-art performance on both object detection and image classification models, reducing the FPR95 by up to 7.87% compared to the previous best method. Code is available at this https URL. Authors: Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 25 views

1:36:39

Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents

#deepmind #rl #society This is an in-depth paper review, followed by an interview with the papers' authors! Society is ruled by norms, and most of these norms are very useful, such as washing your hands before cooking. However, there also exist plenty of social norms which are essentially arbitrary, such as what hairstyles are acceptable, or what words are rude. These are called "silly rules". This paper uses multi-agent reinforcement learning to investigate why such silly rules exist. Their results indicate a plausible mechanism, by which the existence of silly rules drastically speeds up the agents' acquisition of the skill of enforcing rules, which generalizes well, and therefore a society that has silly rules will be better at enforcing rules in general, leading to faster adaptation in the face of genuinely useful norms. OUTLINE: 0:00 - Intro 3:00 - Paper Overview 5:20 - Why are some social norms arbitrary? 11:50 - Reinforcement learning environment setup 20:00 - What happens if we introduce a "silly" rule? 25:00 - Experimental Results: how silly rules help society 30:10 - Isolated probing experiments 34:30 - Discussion of the results 37:30 - Start of Interview 39:30 - Where does the research idea come from? 44:00 - What is the purpose behind this research? 49:20 - Short recap of the mechanics of the environment 53:00 - How much does such a closed system tell us about the real world? 56:00 - What do the results tell us about silly rules? 1:01:00 - What are these agents really learning? 1:08:00 - How many silly rules are optimal? 1:11:30 - Why do you have separate weights for each agent? 1:13:45 - What features could be added next? 1:16:00 - How sensitive is the system to hyperparameters? 1:17:20 - How to avoid confirmation bias? 1:23:15 - How does this play into progress towards AGI? 1:29:30 - Can we make real-world recommendations based on this? 1:32:50 - Where do we go from here? Paper: https://www.pnas.org/doi/10.1073/pnas... Blog: https://deepmind.com/research/publica... Abstract: The fact that humans enforce and comply with norms is an important reason why humans enjoy higher levels of cooperation and welfare than other animals. Some norms are relatively easy to explain; they may prohibit obviously harmful or uncooperative actions. But many norms are not easy to explain. For example, most cultures prohibit eating certain kinds of foods and almost all societies have rules about what constitutes appropriate clothing, language, and gestures. Using a computational model focused on learning shows that apparently pointless rules can have an indirect effect on welfare. They can help agents learn how to enforce and comply with norms in general, improving the group’s ability to enforce norms that have a direct effect on welfare. Authors: Raphael Köster, Dylan Hadfield-Menell, Richard Everett, Laura Weidinger, Gillian K. Hadfield, Joel Z. Leibo Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 13 views

58:07

First Author Interview: AI & formal math (Formal Mathematics Statement Curriculum Learning)

#openai #math #imo This is an interview with Stanislas Polu, research engineer at OpenAI and first author of the paper "Formal Mathematics Statement Curriculum Learning". Watch the paper review here: https://youtu.be/lvYVuOmUVs8 OUTLINE: 0:00 - Intro 2:00 - How do you explain the big public reaction? 4:00 - What's the history behind the paper? 6:15 - How does algorithmic formal math work? 13:10 - How does expert iteration replace self-play? 22:30 - How is the language model trained and used? 30:50 - Why is every model fine-tuned on the initial state? 33:05 - What if we want to prove something we don't know already? 40:35 - How can machines and humans work together? 43:40 - Aren't most produced statements useless? 46:20 - A deeper look at the experimental results 50:10 - What were the high and low points during the research? 54:25 - Where do we go from here? Paper: https://arxiv.org/abs/2202.01344 miniF2F benchmark: https://github.com/openai/miniF2F Follow Stan here: https://twitter.com/spolu Abstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads. Authors: Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 9 views

50:40

OpenAI tackles Math - Formal Mathematics Statement Curriculum Learning (Paper Explained)

#openai #math #imo Formal mathematics is a challenging area for both humans and machines. For humans, formal proofs require very tedious and meticulous specifications of every last detail and results in very long, overly cumbersome and verbose outputs. For machines, the discreteness and sparse reward nature of the problem presents a significant problem, which is classically tackled by brute force search, guided by a couple of heuristics. Previously, language models have been employed to better guide these proof searches and delivered significant improvements, but automated systems are still far from usable. This paper introduces another concept: An expert iteration procedure is employed to iteratively produce more and more challenging, but solvable problems for the machine to train on, which results in an automated curriculum, and a final algorithm that performs well above the previous models. OpenAI used this method to even solve two problems of the international math olympiad, which was previously infeasible for AI systems. OUTLINE: 0:00 - Intro 2:35 - Paper Overview 5:50 - How do formal proofs work? 9:35 - How expert iteration creates a curriculum 16:50 - Model, data, and training procedure 25:30 - Predicting proof lengths for guiding search 29:10 - Bootstrapping expert iteration 34:10 - Experimental evaluation & scaling properties 40:10 - Results on synthetic data 44:15 - Solving real math problems 47:15 - Discussion & comments Paper: https://arxiv.org/abs/2202.01344 miniF2F benchmark: https://github.com/openai/miniF2F Abstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads. Authors: Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 22 views

28:20

[ML News] DeepMind controls fusion | Yann LeCun's JEPA architecture | US: AI can't copyright its art

#mlnews #deepmind #fusion Updates on what's going on in the ML world! Check out w&b's alerts feature: https://wandb.me/yannic OUTLINE: 0:00 - Intro 0:20 - Sponsor: Weights & Biases 2:35 - DeepMind uses Reinforcement Learning to control nuclear fusion 4:35 - Google responds to carbon emission estimates 8:40 - Yann LeCun proposes new architecture for world models 11:05 - Fruit fly neurons may perform multiplication 12:00 - Emojisearch App 12:30 - Ar5iv officially in arXiv labs 12:55 - Language Model Consciousness & Media Hype 16:45 - Vision models are more fair when trained on uncurated data 18:30 - CLIPasso 19:15 - NLP with Transformers Book 20:15 - Helpful Things 26:00 - US Office: AI can't copyright its art Sponsor: Weights & Biases https://wandb.me/yannic References: https://wandb.me/yannic DeepMind uses RL to control nuclear fusion https://deepmind.com/blog/article/Acc... https://www.nature.com/articles/s4158... https://www.nature.com/articles/s4158... https://www.alexirpan.com/2018/02/14/... Google responds to carbon emission estimates https://ai.googleblog.com/2022/02/goo... Yann LeCun proposes new architecture for world models https://ai.facebook.com/blog/yann-lec... Fruit fly neurons may perform multiplication https://www.nature.com/articles/s4158... Emojisearch App https://twitter.com/lilianweng/status... https://www.emojisearch.app/ https://github.com/lilianweng/emoji-s... Ar5iv officially in arXiv labs https://blog.arxiv.org/2022/02/21/arx... Tech media may be only slightly conscious https://twitter.com/ilyasut/status/14... https://futurism.com/the-byte/openai-... https://interestingengineering.com/ai... https://futurism.com/mit-researcher-c... https://www.dailymail.co.uk/sciencete... https://futurism.com/conscious-ai-bac... https://www.dailystar.co.uk/tech/news... Vision models are more fair when trained on uncurated data https://arxiv.org/pdf/2202.08360.pdf CLIPasso https://clipasso.github.io/clipasso/ NLP with Transformers Book https://www.amazon.de/dp/1098103246?l... Helpful Things https://github.com/j3soon/tbparse https://github.com/openvinotoolkit/an... https://liuliu66.github.io/articulati... https://github.com/RobertTLange/evosax https://github.com/google/evojax https://github.com/google/evojax/pull/9 https://github.com/facebookresearch/t... https://standard-ai.github.io/Standar... https://twitter.com/PatrickPlaten/sta... https://aimagelab.ing.unimore.it/imag... https://github.com/yashbhalgat/HashNe... https://github.com/patrick-kidger/dif... https://github.com/AI4Finance-Foundat... https://huggingface.co/AI-Nordics/ber... https://huggingface.co/AI-Nordics/gpt... https://paperswithcode.com/dataset/muld https://github.com/JonasGeiping/breac... https://github.com/Weixin-Liang/MetaS... US Office: AI can't copyright its art https://www.theverge.com/2022/2/21/22... https://www.urbasm.com/2016/05/artifi... Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

2 years ago 42 views

53:45

AlphaCode - with the authors!

An interview with the creators of AlphaCode! Paper review video here: https://youtu.be/s9UAOmyah1A OUTLINE: 0:00 - Intro 1:10 - Media Reception 5:10 - How did the project go from start to finish? 9:15 - Does the model understand its own code? 14:45 - Are there plans to reduce the number of samples? 16:15 - Could one do smarter filtering of samples? 18:55 - How crucial are the public test cases? 21:55 - Could we imagine an adversarial method? 24:45 - How are coding problems even made? 27:40 - Does AlphaCode evaluate a solution's asymptotic complexity? 33:15 - Are our sampling procedures inappropriate for diversity? 36:30 - Are all generated solutions as instructive as the example? 41:30 - How are synthetic examples created during training? 42:30 - What were high and low points during this research? 45:25 - What was the most valid criticism after publication? 47:40 - What are applications in the real world? 51:00 - Where do we go from here? Paper: https://storage.googleapis.com/deepmi... Code: https://github.com/deepmind/code_cont... Abstract: Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. Evaluated on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in programming competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions. Authors: Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d’Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu and Oriol Vinyals Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

2 years ago 15 views

26:06

[ML News] Uber: Deep Learning for ETA | MuZero Video Compression | Block-NeRF | EfficientNet-X

#mlnews #muzero #nerf Your regularly irregular updates on everything new in the ML world! Merch: store.ykilcher.com OUTLINE: 0:00 - Intro 0:15 - Sponsor: Weights & Biases 2:15 - Uber switches from XGBoost to Deep Learning for ETA prediction 5:45 - MuZero advances video compression 10:10 - Learned Soft Prompts can steer large language models 12:45 - Block-NeRF captures entire city blocks 14:15 - Neural Architecture Search considers underlying hardware 16:50 - Mega-Blog on Self-Organizing Agents 18:40 - Know Your Data (for Tensorflow Datasets) 20:30 - Helpful Things Sponsor: Weights & Biases https://wandb.me/yannic References: https://docs.wandb.ai/guides/integrat... https://colab.research.google.com/git... https://wandb.ai/borisd13/GPT-3/repor... Uber switches from XGBoost to Deep Learning for ETA prediction https://eng.uber.com/deepeta-how-uber... MuZero advances video compression https://deepmind.com/blog/article/MuZ... https://storage.googleapis.com/deepmi... Learned Soft Prompts can steer large language models https://ai.googleblog.com/2022/02/gui... https://aclanthology.org/2021.emnlp-m... Block-NeRF captures entire city blocks https://arxiv.org/abs/2202.05263 https://arxiv.org/pdf/2202.05263.pdf https://waymo.com/intl/zh-cn/research... Neural Architecture Search considers underlying hardware https://ai.googleblog.com/2022/02/unl... https://openaccess.thecvf.com/content... Mega-Blog on Self-Organizing Agents https://developmentalsystems.org/sens... https://flowers.inria.fr/ Know Your Data (for Tensorflow Datasets) https://knowyourdata-tfds.withgoogle.... https://knowyourdata.withgoogle.com/ Helpful Things https://twitter.com/casualganpapers/s... https://www.reddit.com/r/MachineLearn... https://arxiv.org/abs/2202.02435 https://github.com/vicariousinc/PGMax https://www.vicarious.com/posts/pgmax... https://diambra.ai/tournaments https://github.com/diambra/diambraArena https://www.youtube.com/watch?v=dw72P... https://gitlab.com/deepcypher/python-... https://python-fhez.readthedocs.io/en... https://joss.theoj.org/papers/10.2110... https://github.com/PyTorchLightning/m... https://torchmetrics.readthedocs.io/e... https://twitter.com/alanyttian/status... https://github.com/google/evojax https://arxiv.org/abs/2202.05008 https://www.reddit.com/r/MachineLearn... https://www.gymlibrary.ml/pages/api/#... Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

2 years ago 27 views