Abstract: To address the outliers that arise during data collection, a data cleaning method based on an improved Gaussian mixture model (GMM) algorithm is proposed. First, edge computing is introduced to relieve excessive load. Second, because the expectation-maximization (EM) algorithm can become trapped in a local optimum when estimating the GMM parameters, the parameter optimization of the GMM algorithm is improved to mitigate this weakness. Experimental results show that, for a given amount of data, the improved GMM algorithm outperforms the GMM-EM algorithm in recall and F value. The improved algorithm therefore improves the cleaning of abnormal data to a certain extent and helps ensure the reliability of the data.
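
For orientation, the baseline that the abstract compares against (a GMM fitted with plain EM and used to flag abnormal readings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method proposed here: the abstract does not specify the parameter optimization or the edge-computing setup, so neither is reproduced. The one-dimensional data, the quantile-based initialization, fitting on a reference window assumed to be clean, the function names (`fit_gmm_em`, `flag_outliers`), and the cutoff `log_threshold` are all hypothetical choices made for the example.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def component_log_terms(x, model):
    """log(weight_j) + log N(x | mean_j, var_j) for every component j."""
    weights, means, variances = model
    return [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_density(x, model):
    """Log of the mixture density at x, computed with log-sum-exp."""
    terms = component_log_terms(x, model)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_gmm_em(data, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D GMM with plain EM (the baseline, not the improved variant)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles, shared overall variance.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    variances = [max(min_var, sum((x - centre) ** 2 for x in data) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            terms = component_log_terms(x, (weights, means, variances))
            top = max(terms)
            scaled = [math.exp(t - top) for t in terms]
            total = sum(scaled)
            resp.append([s / total for s in scaled])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, spread)  # floor avoids a collapsed component
    return weights, means, variances


def flag_outliers(readings, model, log_threshold=-10.0):
    """Mark readings whose mixture log-density falls below the (hypothetical) cutoff."""
    return [log_density(x, model) < log_threshold for x in readings]


# Reference window assumed clean: two groups of readings near 0 and near 10.
reference = [i / 10 for i in range(-10, 11)] + [10 + i / 10 for i in range(-10, 11)]
model = fit_gmm_em(reference, k=2)
flags = flag_outliers([0.2, 10.3, 5.0, 30.0], model)
print(flags)
```

Because plain EM only climbs to the nearest local optimum of the likelihood, the result of `fit_gmm_em` depends on where the parameters start; this sensitivity is the weakness the improved algorithm targets.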