Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
AI-generated Key Points
- Dataset distillation is a popular method in deep learning for reducing data requirements.
- Existing distance metrics in distribution matching may not accurately capture distributional differences, leading to unreliable measures of discrepancy.
- A new approach reframes dataset distillation as a minmax optimization problem and introduces Neural Characteristic Function Discrepancy (NCFD), a comprehensive and theoretically grounded metric for measuring distributional differences.
- NCFD leverages the Characteristic Function (CF) to encapsulate complete distributional information and utilizes a neural network to optimize the sampling strategy for the CF's frequency arguments.
- The proposed method, Neural Characteristic Function Matching (NCFM), aligns the phase and amplitude of neural features in the complex plane for both real and synthetic data to achieve a balance between realism and diversity in synthetic samples.
- Experiments show significant performance gains over state-of-the-art methods on both low- and high-resolution datasets, including a 20.5% accuracy boost on the ImageSquawk dataset.
- The method reduces GPU memory usage by over 300 times and runs 20 times faster than state-of-the-art methods.
- To the best of the authors' knowledge, this is the first work to achieve lossless compression of CIFAR-100 on a single NVIDIA 2080 Ti GPU, using only 2.3 GB of memory.
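The key points above can be made concrete with a short formal sketch. The characteristic function definition and the modulus identity are standard. The minmax objective is only one plausible reading of the summary, and the symbols $f$ (feature extractor), $\tilde{x}$ (synthetic sample), $\tilde{\mathcal{D}}$ (synthetic dataset), $\psi$ (parameters of the sampling network) and $F(t;\psi)$ (the learned distribution over frequency arguments) are notation assumed here, not taken from the paper, whose exact weighting and normalization may differ.

```latex
% Characteristic function of a random vector x at frequency argument t
\Phi_{x}(t) = \mathbb{E}_{x}\!\left[ e^{\,i \langle t, x \rangle} \right]

% Assumed minmax form: the sampling network (psi) maximizes the discrepancy,
% while the synthetic data minimizes it
\min_{\tilde{\mathcal{D}}} \; \max_{\psi} \;
  \int \left| \Phi_{f(x)}(t) - \Phi_{f(\tilde{x})}(t) \right|^{2} \, dF(t;\psi)

% Writing \Phi = |\Phi| e^{i a} and \tilde{\Phi} = |\tilde{\Phi}| e^{i \tilde{a}},
% the squared modulus splits into an amplitude term and a phase term
\left| \Phi - \tilde{\Phi} \right|^{2}
  = \left( |\Phi| - |\tilde{\Phi}| \right)^{2}
  + 2\,|\Phi|\,|\tilde{\Phi}| \left( 1 - \cos(a - \tilde{a}) \right)
```

The last identity shows why matching characteristic functions aligns both amplitude and phase: the discrepancy vanishes only when both terms do.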
Authors: Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, Linfeng Zhang
Abstract: Dataset distillation has emerged as a powerful approach for reducing data requirements in deep learning. Among various methods, distribution matching-based approaches stand out for their balance of computational efficiency and strong performance. However, existing distance metrics used in distribution matching often fail to accurately capture distributional differences, leading to unreliable measures of discrepancy. In this paper, we reformulate dataset distillation as a minmax optimization problem and introduce Neural Characteristic Function Discrepancy (NCFD), a comprehensive and theoretically grounded metric for measuring distributional differences. NCFD leverages the Characteristic Function (CF) to encapsulate full distributional information, employing a neural network to optimize the sampling strategy for the CF's frequency arguments, thereby maximizing the discrepancy to enhance distance estimation. Simultaneously, we minimize the difference between real and synthetic data under this optimized NCFD measure. Our approach, termed Neural Characteristic Function Matching (NCFM), inherently aligns the phase and amplitude of neural features in the complex plane for both real and synthetic data, achieving a balance between realism and diversity in synthetic samples. Experiments demonstrate that our method achieves significant performance gains over state-of-the-art methods on both low- and high-resolution datasets. Notably, we achieve a 20.5% accuracy boost on ImageSquawk. Our method also reduces GPU memory usage by over 300× and achieves 20× faster processing speeds compared to state-of-the-art methods. To the best of our knowledge, this is the first work to achieve lossless compression of CIFAR-100 on a single NVIDIA 2080 Ti GPU using only 2.3 GB of memory.
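To illustrate the idea of comparing two sample sets through their characteristic functions, here is a minimal NumPy sketch. It is not the authors' implementation: the frequency arguments are drawn from a fixed Gaussian instead of being produced by a learned sampling network, raw vectors stand in for neural features, and all function names (`empirical_cf`, `cf_discrepancy`, `amplitude_phase_terms`) are hypothetical.

```python
import numpy as np

def empirical_cf(x, t):
    """Empirical characteristic function of samples x (n, d) at frequencies t (m, d)."""
    proj = x @ t.T                          # (n, m) inner products <t_j, x_i>
    return np.exp(1j * proj).mean(axis=0)   # (m,) complex values

def cf_discrepancy(x, y, t):
    """Mean squared modulus of the CF difference over the sampled frequencies."""
    return float(np.mean(np.abs(empirical_cf(x, t) - empirical_cf(y, t)) ** 2))

def amplitude_phase_terms(x, y, t):
    """Split the same discrepancy into an amplitude term and a phase term."""
    phi_x, phi_y = empirical_cf(x, t), empirical_cf(y, t)
    amp = (np.abs(phi_x) - np.abs(phi_y)) ** 2
    phase = 2 * np.abs(phi_x) * np.abs(phi_y) * (
        1 - np.cos(np.angle(phi_x) - np.angle(phi_y))
    )
    return float(amp.mean()), float(phase.mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 8))      # stand-in for real features
close = rng.normal(0.0, 1.0, size=(2000, 8))     # same distribution, new draw
shifted = rng.normal(1.0, 1.0, size=(2000, 8))   # mean-shifted distribution

# Assumption: fixed Gaussian frequencies; the paper learns this sampling instead.
freqs = rng.normal(0.0, 0.5, size=(256, 8))

d_close = cf_discrepancy(real, close, freqs)
d_far = cf_discrepancy(real, shifted, freqs)
amp_term, phase_term = amplitude_phase_terms(real, shifted, freqs)

print(f"same distribution:    {d_close:.5f}")
print(f"shifted distribution: {d_far:.5f}")
print(f"amplitude + phase:    {amp_term + phase_term:.5f}")
```

In this toy setting a pure mean shift leaves the CF amplitude unchanged in expectation, so almost all of the discrepancy comes from the phase term. In the minmax setting the abstract describes, the frequency sampling is optimized to maximize the discrepancy, whereas this sketch keeps it fixed.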