
Electrical Engineering and Automation (Speech Recognition Using the LBG Algorithm): Translated Foreign Literature


Document information:

Title: Speech Recognition Using Vector Quantization through Modified K-meansLBG Algorithm

Authors: Balwant A. Sonkamble, Dharmpal Doye

Source: Computer Engineering and Intelligent Systems, 2012, 7(3). Word count: 2,389 English words (13,087 characters); 3,968 Chinese characters in the translation.

Original text:

Speech Recognition Using Vector Quantization through

Modified K-meansLBG Algorithm

Abstract In Vector Quantization, the main task is to generate a good codebook. The distortion measure between the original pattern and the reconstructed pattern should be minimal. In this paper, an algorithm called the Modified K-meansLBG algorithm is proposed to obtain a good codebook. The system has shown good performance on limited-vocabulary tasks.

Keywords: K-means algorithm, LBG algorithm, Vector Quantization, Speech Recognition

1.Introduction

The natural way of communication among human beings is through speech. Many people exchange information through mobile phones and other communication tools in real time [L. R. Rabiner et al., 1993]. Vector Quantization (VQ) is a fundamental and highly successful technique used in speech coding, image coding, speech recognition, speech synthesis and speaker recognition [S. Furui, 1986]. These techniques are first applied in the analysis of speech, where a large vector space is mapped into a finite number of regions of that space. VQ techniques are commonly applied to develop discrete or semi-continuous HMM-based speech recognition systems.

In VQ, an ordered set of signal samples or parameters can be efficiently coded by matching the input vector to a similar pattern or codevector (codeword) in a predefined codebook [Tzu-Chuen Lu et al., 2010].

VQ techniques are also known as data clustering methods in various disciplines. Data clustering is an unsupervised learning procedure widely used in many applications. Data clustering methods are classified as hard or soft clustering methods. These are centroid-based parametric clustering techniques based on a large class of distortion functions known as Bregman divergences [Arindam Banerjee et al., 2005].

In hard clustering, each data point belongs to exactly one of the partitions, yielding a disjoint partitioning of the data, whereas in soft clustering each data point has a certain probability of belonging to each of the partitions. Parametric clustering algorithms are very popular due to their simplicity and scalability. The hard clustering algorithms are based on iterative relocation schemes. The classical K-means algorithm is based on the Euclidean distance, and the Linde-Buzo-Gray (LBG) algorithm is based on the Itakura-Saito distance. The performance of vector quantization techniques depends on the existence of a good codebook of representative vectors.
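The hard/soft distinction can be illustrated with a small NumPy sketch (the function names and the softmax-style weighting are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def hard_assign(points, centroids):
    """Hard clustering: each point belongs to exactly one partition,
    the centroid with the smallest squared Euclidean distance."""
    # d has shape (n_points, n_centroids)
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def soft_assign(points, centroids, beta=1.0):
    """Soft clustering: each point gets a probability of belonging
    to every partition (here a softmax over negative distances)."""
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-beta * d)
    return w / w.sum(axis=1, keepdims=True)

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
print(hard_assign(points, centroids))   # -> [0 0 1]
print(soft_assign(points, centroids))   # each row sums to 1
```

The hard assignment step is exactly the reassignment used by K-means and LBG in the iterative relocation schemes mentioned above.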

In this paper, an efficient VQ codebook design algorithm is proposed, known as the Modified K-meansLBG algorithm. This algorithm provides superior performance compared to the classical K-means algorithm and the LBG algorithm. Section 2 describes the theoretical details of VQ. Section 3 elaborates the LBG algorithm. Section 4 explains the classical K-means algorithm. Section 5 presents the proposed Modified K-meansLBG algorithm. The experimental work and results are discussed in Section 6, and concluding remarks are made at the end of the paper.

2.Vector Quantization

The main objective of data compression is to reduce the bit rate for transmission or data storage while maintaining the necessary fidelity of the data. The feature vector may represent a number of different possible speech coding parameters, including linear predictive coding (LPC) coefficients and cepstrum coefficients. VQ can be considered a generalization of scalar quantization to the quantization of a vector. The VQ encoder encodes a given set of k-dimensional data vectors with a much smaller subset. The subset C is called a codebook and its elements Ci are called codewords, codevectors, reproducing vectors, prototypes or design samples. Only the index i is transmitted to the decoder. The decoder has the same codebook as the encoder, and decoding is performed by a table look-up procedure.
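The encode/decode round trip described above can be sketched in a few lines of NumPy (a minimal sketch assuming a squared-Euclidean full search; the function names are illustrative):

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Encode each k-dimensional vector as the index i of its
    nearest codeword Ci (squared-Euclidean full search)."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def vq_decode(indices, codebook):
    """Decoding is a table look-up: the decoder holds the same
    codebook and replaces each index with its codeword."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
data = np.array([[0.2, -0.1], [3.9, 4.2]])
idx = vq_encode(data, codebook)    # only these indices are transmitted
recon = vq_decode(idx, codebook)   # reconstructed (quantized) vectors
```

Note that only the integer indices cross the channel, which is where the bit-rate reduction comes from.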

The commonly used vector quantizers are based on the nearest-neighbour rule and are called Voronoi or nearest-neighbour vector quantizers. Both the classical K-means algorithm and the LBG algorithm belong to the class of nearest-neighbour quantizers.

A key component of pattern matching is the measurement of dissimilarity between two feature vectors. The measurement of dissimilarity satisfies three metric properties: positive definiteness, symmetry and the triangle inequality. Each metric has three main characteristics: computational complexity, analytical tractability and feature evaluation reliability. The metrics used in speech processing are derived from the Minkowski metric [J. S. Pan et al., 1996]. The Minkowski metric can be expressed as

D_p(X, Y) = ( Σ_{i=1}^{k} |x_i − y_i|^p )^{1/p}

where X = {x_1, x_2, ..., x_k} and Y = {y_1, y_2, ..., y_k} are vectors and p is the order of the metric.

The city-block (Manhattan) metric and the Euclidean metric are special cases of the Minkowski metric, obtained with p = 1 and p = 2 respectively. These metrics are essential in the distortion measure computation functions.
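The Minkowski metric and its special cases translate directly into Python (a sketch; `minkowski` is an illustrative helper, not a library call):

```python
def minkowski(x, y, p):
    """Minkowski metric of order p: the p-th root of the sum of
    |x_i - y_i|**p.  p=1 gives the city-block (Manhattan) metric,
    p=2 the Euclidean metric."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski(x, y, 1))   # city block: 3 + 4 + 0 = 7
print(minkowski(x, y, 2))   # Euclidean: sqrt(9 + 16) = 5
```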

The distortion measure is one which satisfies only the positive definiteness property of the measurement of dissimilarity. There are many kinds of distortion measures, including the Euclidean distance, the Itakura distortion measure and the likelihood distortion measure.

The Euclidean metric [Tzu-Chuen Lu et al., 2010] is commonly used because it fits the physical meaning of distance or distortion. In many applications the square root is not required: to avoid computing it, the squared Euclidean metric is employed instead of the Euclidean metric in pattern matching.
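Because the square root is monotonic, ranking codewords by squared Euclidean distance selects the same nearest codeword as the plain Euclidean metric; a small sketch (illustrative names):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    # same ranking as euclidean(), but no square root
    return sum((a - b) ** 2 for a, b in zip(x, y))

codebook = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
x = [0.9, 1.2]
best_e = min(range(len(codebook)), key=lambda i: euclidean(x, codebook[i]))
best_s = min(range(len(codebook)), key=lambda i: squared_euclidean(x, codebook[i]))
assert best_e == best_s  # both metrics pick the same codeword
```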

The quadratic metric [Marcel R. Ackermann et al., 2010] is an important generalization of the Euclidean metric. The weighted cepstral distortion measure is a kind of quadratic metric. Its key feature is that it equalizes the importance of each dimension of the cepstrum coefficients. In speech recognition, the weighted cepstral distortion can be used to equalize the performance of the recognizer across different talkers. The Itakura-Saito distortion measure [Arindam Banerjee et al., 2005] computes a distortion between two input vectors by using their spectral densities.

The performance of the vector quantizer can be evaluated by a distortion measure D, which is a non-negative cost D(X_j, X̂_j) associated with quantizing any input vector X_j with a reproduction vector X̂_j. Usually, the Euclidean distortion measure is used. The performance of a quantizer is always qualified by the average distortion D_v = E[D(X_j, X̂_j)] between the input vectors and the final reproduction vectors, where E represents the expectation operator. Normally, the performance of the quantizer will be good if the average distortion is small.
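The average distortion D_v can be estimated over a data set as a sample mean (a NumPy sketch under the squared-Euclidean assumption; names are illustrative):

```python
import numpy as np

def average_distortion(vectors, codebook):
    """Estimate Dv = E[D(Xj, X-hat_j)]: the mean squared-Euclidean
    distortion between each input vector and its nearest
    reproduction vector in the codebook."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

data = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]])
codebook = np.array([[0.5, 0.0], [4.0, 4.0]])
print(average_distortion(data, codebook))  # (0.25 + 0.25 + 0.0) / 3
```

A smaller value indicates a better codebook for the given data, which is the criterion codebook design algorithms try to minimize.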

Another important factor in VQ is the codeword search problem. As the vector dimension increases, the search complexity increases exponentially; this is a major limitation of VQ codeword search, and it limits the fidelity of coding for real-time transmission. A full search algorithm is applied in VQ encoding and recognition; it is a time-consuming process when the codebook size is large.

In the codeword search problem, assigning a codeword to the test vector means finding the codeword with the smallest distortion to the test vector among all codewords. Given a codeword C_t and the test vector X in the k-dimensional space, the distortion of the squared Euclidean metric can be expressed as follows:

D(X, C_t) = Σ_{i=1}^{k} (x_i − c_ti)²

where C_t = {c_t1, c_t2, ..., c_tk} and X = {x_1, x_2, ..., x_k}.
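The formula above translates directly into a full-search routine (plain Python for clarity; the function names are illustrative):

```python
def squared_distortion(x, c):
    """D(X, Ct) = sum over i = 1..k of (x_i - c_ti)^2,
    transcribed directly from the formula above."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def nearest_codeword(x, codebook):
    """Full search: evaluate D(X, Ct) against every codeword and
    return the index t with the smallest distortion."""
    return min(range(len(codebook)),
               key=lambda t: squared_distortion(x, codebook[t]))

codebook = [[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]]
print(nearest_codeword([1.8, 2.1], codebook))  # -> 1
```

The cost of this search grows linearly with the codebook size and with the dimension k, which is exactly the complexity concern raised above.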

There are three ways of generating and designing a good codebook, namely the random method, pair-wise nearest neighbour clustering and the splitting method. A wide variety of distortion functions, such as the squared Euclidean distance, the Mahalanobis distance, the Itakura-Saito distance and relative entropy, have been used for clustering. There are three major procedures in VQ, namely codebook generation, the encoding procedure and the decoding procedure. The LBG algorithm is an efficient VQ clustering algorithm. This algorithm is based either on a known probabilistic model or on a long training sequence of data.

3.Linde–Buzo–Gray (LBG) algorithm

The LBG algorithm is also known as the Generalised Lloyd Algorithm (GLA). It is an easy and rapid algorithm, used as an iterative nonvariational technique for quantizer design. It is a vector quantization algorithm that derives a good codebook by finding the centroids of the partitioned sets and the minimum-distortion partitions. In LBG, the initial centroids are generated from all of the training data by applying the splitting procedure. All the training vectors are incorporated into the training procedure at each iteration. The GLA algorithm is applied to generate the centroids, and the centroids cannot change with time. The GLA algorithm starts from one cluster and then separates this cluster into two clusters, four clusters, and so on until N clusters are generated, where N is the desired number of clusters or the codebook size. Therefore, the GLA algorithm is a divisive clustering approach. The classification at each stage uses the full-search algorithm to find the nearest centroid to each vector. LBG is a local optimization procedure, solved through various approaches such as directed-search binary splitting, mean-distance-ordered partial codebook search [Linde et al., 1980; Modha et al., 2003], enhanced LBG, GA-based algorithm
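The splitting-and-refinement loop described above can be sketched in NumPy (the multiplicative perturbation factor eps, the iteration count and the empty-cell handling are my assumptions; the paper does not specify them):

```python
import numpy as np

def lbg(data, n_codewords, eps=0.01, n_iter=20):
    """LBG/GLA sketch: start from one centroid (the global mean),
    split every centroid into two perturbed copies (1 -> 2 -> 4 ...),
    then refine by nearest-neighbour reassignment until the desired
    codebook size N is reached."""
    codebook = data.mean(axis=0, keepdims=True)   # one cluster
    while len(codebook) < n_codewords:
        # splitting step: double the number of centroids
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):                   # Lloyd-style refinement
            d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)             # full-search classification
            for j in range(len(codebook)):
                members = data[labels == j]
                if len(members):                  # keep empty cells unchanged
                    codebook[j] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(m, 0.1, size=(50, 2)) for m in (0.0, 1.0, 2.0, 3.0)])
cb = lbg(data, 4)
print(cb.shape)  # (4, 2)
```

Because each split only doubles the current codebook, this sketch assumes N is a power of two, matching the 1, 2, 4, ... growth described above.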
