Speech enhancement method based on WDGAN-div

Affiliation:

1. Communication Sergeants College, PLA Army Engineering University, Chongqing 400035, China; 2. Hefei iFlytek Digital Technology Co., Ltd., Hefei, Anhui 230088, China

CLC number:

TP391.4

Fund project:

Supported by the Military Internal Scientific Research Project (LJ20191C070659)

    Abstract:

    To address the poor adaptability and unsatisfactory enhancement performance of traditional speech enhancement methods in low signal-to-noise ratio (SNR) environments, this paper proposes a speech enhancement method based on a Wasserstein divergence deep generative adversarial network (WDGAN-div). The deep generative adversarial network (DGAN) consists of five generators and one discriminator. The five generators enhance the noisy speech signal five times, which improves the enhancement performance of the adversarial network at low SNR. Wasserstein divergence is used to optimize network training, which mitigates problems such as the training instability of traditional GANs and makes DGAN training more stable. Compared with traditional speech enhancement methods, the proposed method shows clearly better noise adaptability and enhancement performance in low-SNR environments. Experimental results show that, relative to the original noisy speech, the segmental SNR (SegSNR) of the enhanced speech improves by 6.1 dB on average, while the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores improve by 28.9% and 10.6% on average, respectively.
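The abstract reports its main gain as segmental SNR (SegSNR). As a rough illustration of what that metric measures, the sketch below computes a per-frame SNR in dB between a clean reference and an enhanced signal and averages it over frames. The function name `segmental_snr`, the frame length of 256 samples, and the clamping range of -10 dB to 35 dB are assumptions chosen for illustration (the clamp is a common convention); the paper's actual evaluation settings are not stated on this page.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    frame_len, lo_db and hi_db are illustrative assumptions, not values
    taken from the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-10  # guards against division by zero and log10(0)
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        noise_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    # Synthetic example: a sine wave plus a small alternating disturbance.
    clean = [math.sin(2 * math.pi * k / 32) for k in range(1024)]
    noisy = [x + (0.1 if k % 2 == 0 else -0.1) for k, x in enumerate(clean)]
    print(f"SegSNR: {segmental_snr(clean, noisy):.2f} dB")
```

A SegSNR improvement such as the 6.1 dB quoted above would then be the difference between this value for the enhanced speech and for the unprocessed noisy speech, each measured against the same clean reference.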

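Cite this article

HAN Xinyi, ZHANG Hongde, LIU Lin, LIU Yang. Speech enhancement method based on WDGAN-div[J]. Electronic Measurement Technology, 2021, 44(21): 64-70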

History
  • Published online: 2024-07-08