Learning Factored Representations in a Deep Mixture of Experts

AI-generated keywords: Deep Mixture of Experts, Factored Representations, Gating Network, Stacked Models, Parallelized Training

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Mixtures of Experts:
    • Combine the outputs of several expert networks, each specializing in a different part of the input space
    • A "gating" network is trained to map each input to a distribution over the experts
    • Promising for building larger networks that are still cheap to compute at test time
    • More parallelizable at training time
  • Deep Mixture of Experts:
    • Stacked model with multiple sets of gating networks and experts
    • Exponentially increases the number of effective experts by associating each input with a combination of experts at each layer
    • Keeps the model size modest
  • Experimental findings:
    • On a randomly translated version of MNIST, the model automatically learns location-dependent ("where") experts at the first layer and class-specific ("what") experts at the second layer
    • On a dataset of speech monophones, different combinations of experts are in use
  • Versatility and adaptability:
    • Shows that stacked gating can learn factored, input-dependent representations
    • Indicates that all expert combinations can be used effectively within a deep architecture
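A minimal sketch of a single Mixture of Experts layer as described in the key points: a gating network produces a distribution over the experts, and the layer output is the gate-weighted sum of the expert outputs. It is written in pure Python to stay self-contained. The expert form (one linear layer plus ReLU), the linear-plus-softmax gate, the random initialisation, and the names `MoELayer`, `gate_weights`, and `random_linear` are assumptions made for illustration, not the paper's architecture.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, vi) for vi in v]


def random_linear(n_out, n_in, rng):
    """Hypothetical initialisation: Gaussian weights, zero bias."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MoELayer:
    """One soft mixture-of-experts layer (illustrative and untrained)."""

    def __init__(self, n_in, n_out, n_experts, rng):
        # Assumed expert form: one linear layer followed by ReLU.
        self.experts = [random_linear(n_out, n_in, rng) for _ in range(n_experts)]
        # Assumed gating form: a linear map to one score per expert.
        self.gate = random_linear(n_experts, n_in, rng)

    def gate_weights(self, x):
        # Softmax turns the scores into a distribution over the experts.
        w, b = self.gate
        return softmax(linear(w, b, x))

    def __call__(self, x):
        g = self.gate_weights(x)
        outs = [relu(linear(w, b, x)) for w, b in self.experts]
        # z = sum_i g_i(x) * f_i(x), computed per output coordinate.
        return [sum(g_i * out[k] for g_i, out in zip(g, outs)) for k in range(len(outs[0]))]


rng = random.Random(0)
layer = MoELayer(n_in=4, n_out=3, n_experts=5, rng=rng)
x = [0.2, -1.0, 0.5, 0.3]
g = layer.gate_weights(x)  # five non-negative weights summing to 1
z = layer(x)               # a 3-dimensional mixed output
print([round(v, 3) for v in g], [round(v, 3) for v in z])
```

The weights here are random, so the printed values are arbitrary. Training the gate jointly with the experts is what lets each expert specialize in part of the input space; this sketch omits training entirely.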

Authors: David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever

Abstract: Mixtures of Experts combine the outputs of several "expert" networks, each of which specializes in a different part of the input space. This is achieved by training a "gating" network that maps each input to a distribution over the experts. Such models show promise for building larger networks that are still cheap to compute at test time, and more parallelizable at training time. In this work, we extend the Mixture of Experts to a stacked model, the Deep Mixture of Experts, with multiple sets of gating and experts. This exponentially increases the number of effective experts by associating each input with a combination of experts at each layer, yet maintains a modest model size. On a randomly translated version of the MNIST dataset, we find that the Deep Mixture of Experts automatically learns to develop location-dependent ("where") experts at the first layer, and class-specific ("what") experts at the second layer. In addition, we see that the different combinations are in use when the model is applied to a dataset of speech monophones. These demonstrate effective use of all expert combinations.
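To make "exponentially increases the number of effective experts" concrete, the sketch below stacks two gated layers and, purely for illustration, routes each input to a single expert per layer (top-1 routing). This hard routing is a simplification introduced here; the abstract describes gating networks that output a distribution over experts. Layer sizes, the ReLU experts, and the names `HardGatedLayer` and `deep_moe` are hypothetical.

```python
import itertools
import random


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, vi) for vi in v]


def random_linear(n_out, n_in, rng):
    """Hypothetical initialisation: Gaussian weights, zero bias."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class HardGatedLayer:
    """Gating network plus experts, with top-1 routing (a simplification of soft gating)."""

    def __init__(self, n_in, n_out, n_experts, rng):
        self.experts = [random_linear(n_out, n_in, rng) for _ in range(n_experts)]
        self.gate = random_linear(n_experts, n_in, rng)

    def __call__(self, x):
        w, b = self.gate
        scores = linear(w, b, x)
        # argmax of the scores equals argmax of their softmax, so softmax is skipped.
        i = max(range(len(scores)), key=scores.__getitem__)
        ew, eb = self.experts[i]
        return i, relu(linear(ew, eb, x))


rng = random.Random(1)
N1, N2 = 4, 4
layer1 = HardGatedLayer(n_in=4, n_out=6, n_experts=N1, rng=rng)
layer2 = HardGatedLayer(n_in=6, n_out=3, n_experts=N2, rng=rng)


def deep_moe(x):
    """Route x through one expert per layer; return the path (i, j) and the output."""
    i, h = layer1(x)
    j, y = layer2(h)
    return (i, j), y


# N1 * N2 possible paths, but only N1 + N2 expert networks are stored.
possible_paths = set(itertools.product(range(N1), range(N2)))
n_expert_networks = len(layer1.experts) + len(layer2.experts)

inputs = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(200)]
used_paths = {deep_moe(x)[0] for x in inputs}
print(len(possible_paths), n_expert_networks, len(used_paths))
```

With 4 experts in each of 2 layers there are 4 × 4 = 16 possible paths from only 8 expert networks; with L layers of N experts the count becomes N^L paths from N·L networks. How many paths this untrained random model happens to use is arbitrary; the abstract reports effective use of all expert combinations in the trained model on the monophone data.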

Submitted to arXiv on 16 Dec. 2013

Results of the summarizing process for the arXiv paper: 1312.4314v1

In their study "Learning Factored Representations in a Deep Mixture of Experts," David Eigen, Marc'Aurelio Ranzato, and Ilya Sutskever examine Mixtures of Experts, which combine the outputs of several expert networks, each specializing in a different part of the input space. The key component is a "gating" network, trained to map each input to a distribution over the experts. Such models show promise for building larger networks that are still cheap to compute at test time and more parallelizable at training time.

Building on this foundation, the authors introduce the Deep Mixture of Experts, a stacked model with multiple sets of gating networks and experts. Because each input is associated with a combination of experts at each layer, the number of effective experts grows exponentially, while the model size stays modest.

On a randomly translated version of the MNIST dataset, the Deep Mixture of Experts automatically learns location-dependent ("where") experts at the first layer and class-specific ("what") experts at the second layer. When applied to a dataset of speech monophones, the model uses different combinations of experts, which the authors take as evidence of effective use of all expert combinations. Overall, the work shows that stacking gating networks and experts can yield factored representations, such as the "where"/"what" split observed on MNIST.
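The stacked model summarized above can be written compactly as follows. The notation (z^l for a layer's output, g^l_i for the gate weights, f^l_i for the experts, N_l for the number of experts in layer l) is chosen here for illustration and may not match the paper's. Each layer's gate produces non-negative weights that sum to one, and the second layer operates on the first layer's mixed output.

```latex
z^{1}(x) = \sum_{i=1}^{N_1} g^{1}_{i}(x)\, f^{1}_{i}(x),
\qquad
z^{2}(x) = \sum_{j=1}^{N_2} g^{2}_{j}\!\big(z^{1}(x)\big)\, f^{2}_{j}\!\big(z^{1}(x)\big),
\qquad
\sum_{i} g^{l}_{i}(\cdot) = 1,\; g^{l}_{i}(\cdot) \ge 0 .

% Special case: both gates one-hot, at experts i^* and j^*
z^{2}(x) = f^{2}_{j^*}\!\big(f^{1}_{i^*}(x)\big),
\quad \text{one of } N_1 N_2 \text{ compositions built from } N_1 + N_2 \text{ experts.}
```

With L layers of N experts each, the same argument gives N^L compositions from L·N expert networks. This is the sense in which the number of effective experts grows exponentially while the model size stays modest.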
Created on 06 Feb. 2025
