Learning Factored Representations in a Deep Mixture of Experts
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Mixtures of Experts:
  - Combine the outputs of several expert networks, each specializing in a different part of the input space
  - Rely on a trained "gating" network that maps each input to a distribution over the experts
  - Show promise for building larger networks that remain cheap to compute at test time
  - Are more parallelizable at training time
- Deep Mixture of Experts:
  - Stacks multiple sets of gating networks and experts into a single model
  - Exponentially increases the number of effective experts by associating each input with a combination of experts at each layer
  - Maintains a modest model size
- Experimental Findings:
  - On a randomly translated version of the MNIST dataset, the model automatically develops location-dependent ("where") experts at the first layer and class-specific ("what") experts at the second layer
  - On a dataset of speech monophones, the different expert combinations are all in use
- Overall Result:
  - Both experiments demonstrate effective use of all expert combinations
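The stacked gating-and-experts structure described in the key points can be illustrated with a small forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear experts with ReLU, the softmax gate, the layer sizes, and the names `MoELayer` and `deep_moe_forward` are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rng, rows, cols):
    """Random weights, standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class MoELayer:
    """One gating network plus a set of experts (assumed linear + ReLU)."""

    def __init__(self, rng, n_experts, in_dim, out_dim):
        self.gate = random_matrix(rng, n_experts, in_dim)
        self.experts = [random_matrix(rng, out_dim, in_dim)
                        for _ in range(n_experts)]

    def forward(self, x):
        # Gating network: map the input to a distribution over the experts.
        g = softmax(linear(self.gate, x))
        # Every expert produces its own output for the same input.
        outs = [[max(0.0, h) for h in linear(w, x)] for w in self.experts]
        # Mixture output: gate-weighted sum of the expert outputs.
        z = [sum(g[i] * outs[i][d] for i in range(len(outs)))
             for d in range(len(outs[0]))]
        return z, g


def deep_moe_forward(layers, x):
    """Stack layers: each mixture output feeds the next gate and experts."""
    gates = []
    for layer in layers:
        x, g = layer.forward(x)
        gates.append(g)
    return x, gates


rng = random.Random(0)
layers = [MoELayer(rng, n_experts=4, in_dim=8, out_dim=6),
          MoELayer(rng, n_experts=4, in_dim=6, out_dim=3)]
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
output, gates = deep_moe_forward(layers, x)
```

Here `gates` holds one distribution over experts per layer, so each input is associated with a pair of gate distributions. Note that this dense version evaluates every expert; it does not show the test-time savings, which would require evaluating only the experts the gate favors.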
Authors: David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever
Abstract: Mixtures of Experts combine the outputs of several "expert" networks, each of which specializes in a different part of the input space. This is achieved by training a "gating" network that maps each input to a distribution over the experts. Such models show promise for building larger networks that are still cheap to compute at test time, and more parallelizable at training time. In this work, we extend the Mixture of Experts to a stacked model, the Deep Mixture of Experts, with multiple sets of gating and experts. This exponentially increases the number of effective experts by associating each input with a combination of experts at each layer, yet maintains a modest model size. On a randomly translated version of the MNIST dataset, we find that the Deep Mixture of Experts automatically learns to develop location-dependent ("where") experts at the first layer, and class-specific ("what") experts at the second layer. In addition, we see that the different combinations are in use when the model is applied to a dataset of speech monophones. These demonstrate effective use of all expert combinations.
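The abstract's claim that stacking exponentially increases the number of effective experts while keeping the model modest can be checked with simple arithmetic. The sketch below assumes, as a simplification, that a "combination" means picking one expert per layer (the gating described above is a soft distribution, not a hard choice); the layer configurations and function names are hypothetical.

```python
from itertools import product


def effective_experts(experts_per_layer):
    """Count distinct combinations when one expert is picked per layer."""
    total = 1
    for n in experts_per_layer:
        total *= n
    return total


def expert_networks(experts_per_layer):
    """Count the expert networks that actually have to be stored."""
    return sum(experts_per_layer)


config = [4, 4]  # hypothetical: two layers with four experts each

# Enumerate the combinations explicitly as (layer-1 expert, layer-2 expert).
combos = list(product(*(range(n) for n in config)))

print(effective_experts(config))  # 16 combinations
print(expert_networks(config))    # 8 expert networks
```

With `n` experts in each of `k` layers, the number of combinations grows as `n ** k` while the number of stored experts grows only as `n * k`.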