680-4 Kawazu, Iizuka-shi, Fukuoka, 820-8502, JAPAN
Abstract: - This paper describes the automatic detection of actomyosin complex particles. We propose a new approach that combines a cascade of classifiers, which uses the AdaBoost algorithm to select features, with an SVM classifier to detect actomyosin complex particles automatically. Experimental results show a detection rate of 94% with a false positive rate of 2.14%, so that 96.57% of all examples were classified correctly, and the area under the ROC curve (AUC) is 0.9705.
Key-Words: - Actomyosin complex, Cryo-EM image, AdaBoost algorithm, Cascade of classifier, Support vector machine
Single particle analysis has been widely used for 3D reconstruction of large molecular complexes from cryo-EM images. Owing to the low signal-to-noise ratio of cryo-EM images, hundreds of thousands or even millions of high-resolution particles are required, which makes manual particle picking impractical.
Myosin is the best-studied molecular motor. It is known to exist in both muscle and non-muscle tissue. About 50% of the protein in a muscle cell is myosin, and about 30% of the myosin is bound to actin. To understand how myosin produces force, it is necessary to visualize the structure of myosin during a power stroke, as it goes through the cycle of splitting ATP and binding to actin. Information on myosin bound to actin can be obtained using cryo-EM. EOS (Extensible and Object-oriented System) is a group of small tools, including tools for three-dimensional reconstruction of macromolecules.
In single particle analysis, particle selection is critical and has become a bottleneck in high-resolution structure determination of macromolecules using cryo-EM. It remains an unresolved and challenging problem that demands the development of fast and accurate detection algorithms. Several approaches have been proposed to enhance the detection rate, speed up the detection procedure, and reduce the false positive rate: Yongyi Yang et al. proposed an SVM approach for the detection of microcalcifications, Zeyun Yun et al. proposed feature extraction from the edge map, Roseman, A.M. proposed particle finding using a fast local correlation algorithm, and Zhu, Y. et al. proposed fast detection of generic biological particles. These algorithms (for which most particles are spherical or rectangular) can achieve a detection rate of over 90%, with false positive rates ranging from 15% to 30%. The lowest false positive rate is 4.5%, with a false negative rate of 23.2%.
This paper focuses mainly on the automatic selection of asymmetric macromolecule particles in low-contrast cryo-EM images. Because the shape of the actomyosin complex is complicated, its feature extraction is very difficult. We propose a new approach that combines a cascade of classifiers, which uses the AdaBoost algorithm to select features, with an SVM (support vector machine) classifier to detect actomyosin complex particles automatically.
Feature extraction for actomyosin particles is the key problem. In our system, features are computed using Haar-like rectangle features and the integral image, which allows them to be evaluated rapidly. We also adopt a cascade of classifiers to achieve high detection performance while radically reducing computation time. The learning goal for the cascade is to construct an efficient set of classifiers that rejects a large majority of negative sub-windows while detecting almost all positive examples. An SVM classifier is used as the final classifier to improve overall performance. The decision function of the SVM classifier is computed from the support vectors (SVs), which represent all the information about the classification contained in the training examples. The number of SVs is quite small compared with the total number of training examples: our experimental results show that the SVs make up approximately 13.14% of the training examples, so the training time for the SVM classifier is reduced. In addition, the cascaded classifier based on the AdaBoost algorithm selects features out of a huge dataset, so the detection process is sped up. Therefore, the training speed of the system is faster than that of an ANN (artificial neural network) or other methods.
The paper is organized as follows. Section 2 describes the architecture of Ada-SVM, feature selection based on the AdaBoost algorithm, and the SVM classifier. Section 3 presents the implementation of the automatic detection system. Section 4 presents the experimental results. The last section gives conclusions and an outlook on future research.
We use a combined approach in which actomyosin particle features are selected by the AdaBoost algorithm and used as a reduced representation for training an SVM (abbreviated as Ada-SVM). The main idea behind Ada-SVM is that actomyosin particle features are computed very rapidly using Haar-like rectangle features and the integral image, and that the cascade of classifiers is combined with an SVM classifier to speed up the detection process and improve classifier performance. The architecture of Ada-SVM is shown in Fig. 1.
Fig. 1 The architecture of Ada-SVM
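As a rough illustration of the flow in Fig. 1, the following Python sketch passes a feature vector through a cascade of boosted stages and hands only the survivors to a final decision function. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `stage_passes`, `ada_svm_detect`, `toy_cascade`, and `toy_decision` are hypothetical, the weak classifiers are assumed to be single-feature threshold tests, and `toy_decision` is a stand-in for a trained SVM decision function.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical weak classifier: (feature_index, threshold, polarity, alpha).
WeakClassifier = Tuple[int, float, int, float]
# Hypothetical cascade stage: (weak classifiers, stage threshold).
Stage = Tuple[List[WeakClassifier], float]


def stage_passes(features: Sequence[float], stage: Stage) -> bool:
    """Strong classifier of one stage: weighted vote of its weak classifiers."""
    weak_classifiers, stage_threshold = stage
    score = 0.0
    for index, threshold, polarity, alpha in weak_classifiers:
        # A weak classifier votes "particle" when the feature value lies on
        # the side of the threshold selected by its polarity (+1 or -1).
        if polarity * features[index] < polarity * threshold:
            score += alpha
    return score >= stage_threshold


def ada_svm_detect(
    features: Sequence[float],
    cascade: List[Stage],
    final_decision: Callable[[Sequence[float]], float],
) -> bool:
    """Return True only if every stage accepts and the final classifier agrees."""
    for stage in cascade:
        if not stage_passes(features, stage):
            return False  # rejected early, the final classifier is never run
    return final_decision(features) > 0.0


# Toy three-stage cascade with made-up parameters (assumption, not trained).
toy_cascade: List[Stage] = [
    ([(0, 0.5, 1, 1.0)], 0.5),                      # passes if features[0] < 0.5
    ([(1, 0.2, -1, 1.0)], 0.5),                     # passes if features[1] > 0.2
    ([(0, 0.4, 1, 0.6), (2, 0.0, -1, 0.6)], 0.6),   # passes if either test votes
]


def toy_decision(features: Sequence[float]) -> float:
    """Stand-in for an SVM decision function; positive means 'particle'."""
    return features[2] - 0.1


if __name__ == "__main__":
    print(ada_svm_detect([0.3, 0.5, 0.4], toy_cascade, toy_decision))
    print(ada_svm_detect([0.9, 0.5, 0.4], toy_cascade, toy_decision))
```

The early `return False` is what makes a cascade cheap: most negative sub-windows are discarded by the first stages, so the more expensive final classifier only sees the few candidates that survive.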
The automatic detection system consists of two major parts: a cascade and an SVM. The first part is a three-stage cascade of classifiers. Within the cascade, multiple weak classifiers are combined to form a strong classifier. Features of the actomyosin particle are extracted using weak classifiers over T rounds of boosting, and the AdaBoost algorithm is used to select a small number of important features from the huge feature space, given the training set, in order to speed up the detection process. The second part is an SVM classifier, which serves as the final classifier and performs binary classification.
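The feature selection step can be illustrated with a small sketch of discrete AdaBoost over single-feature threshold classifiers (decision stumps), in the style commonly used with Haar-like features. This is an assumption about the general technique rather than the authors' code: the function names, the 0/1 label convention, the weight update with `beta`, and the toy data are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def stump_predict(value: float, threshold: float, polarity: int) -> int:
    """Weak classifier: 1 if the value is on the 'positive' side of the threshold."""
    return 1 if polarity * value < polarity * threshold else 0


def best_stump(
    X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]
) -> Tuple[float, Tuple[int, float, int]]:
    """Exhaustively find the (feature, threshold, polarity) with lowest weighted error."""
    best_error, best = float("inf"), (0, 0.0, 1)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    wi
                    for row, label, wi in zip(X, y, w)
                    if stump_predict(row[j], threshold, polarity) != label
                )
                if error < best_error:
                    best_error, best = error, (j, threshold, polarity)
    return best_error, best


def adaboost_select(
    X: Sequence[Sequence[float]], y: Sequence[int], rounds: int
) -> List[Tuple[int, float, int, float]]:
    """Run `rounds` boosting rounds; each round selects one feature (one stump)."""
    w = [1.0 / len(X)] * len(X)
    selected = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]                 # normalize the weights
        error, (j, threshold, polarity) = best_stump(X, y, w)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        beta = error / (1.0 - error)
        selected.append((j, threshold, polarity, math.log(1.0 / beta)))
        # Down-weight correctly classified examples so that the next round
        # concentrates on the examples that are still misclassified.
        w = [
            wi * beta if stump_predict(row[j], threshold, polarity) == label else wi
            for row, label, wi in zip(X, y, w)
        ]
    return selected


# Toy data (made up): feature 1 separates the classes, feature 0 does not.
X_toy = [[0.5, 1.0], [0.1, 2.0], [0.4, 5.0], [0.2, 6.0]]
y_toy = [1, 1, 0, 0]

if __name__ == "__main__":
    for feature_index, threshold, polarity, alpha in adaboost_select(X_toy, y_toy, 2):
        print(feature_index, threshold, polarity, round(alpha, 2))
```

Each returned tuple has the same `(feature_index, threshold, polarity, alpha)` shape as a weak classifier in a boosted stage, so the output of T rounds is both a strong classifier and a list of the T selected features.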
2.2 Haar-like Feature Types and Integral Image
Actomyosin particles are described by over-complete Haar-like features. These are very simple features that can be computed very rapidly using the integral image. Four Haar-like feature types are shown in Fig. 2.
The Haar-like feature types are described as follows. The sum of the pixels that lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles. Because the two-rectangle features defined above involve adjacent rectangular sums, they can be computed with six array references; the three-rectangle features can be computed with eight array references, and the special diagonal line features with nine array references.
Fig. 2 Four types of simple Haar-like feature
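To make the definition concrete, the sketch below evaluates a two-rectangle and a three-rectangle feature by direct pixel summation, i.e. the sum over the grey region minus the sum over the white region. The layout (which rectangle is grey, and the horizontal orientation) is an assumption made for illustration, since the exact arrangement in Fig. 2 is not reproduced here; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # image[row][column], row-major


def rect_sum_direct(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Assumed layout: white left half, grey right half. Returns grey minus white."""
    half = w // 2
    white = rect_sum_direct(img, x, y, half, h)
    grey = rect_sum_direct(img, x + half, y, half, h)
    return grey - white


def three_rect_feature(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Assumed layout: grey centre strip between two white strips."""
    third = w // 3
    white = rect_sum_direct(img, x, y, third, h) + rect_sum_direct(
        img, x + 2 * third, y, third, h
    )
    grey = rect_sum_direct(img, x + third, y, third, h)
    return grey - white


toy_image: Image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

if __name__ == "__main__":
    print(two_rect_feature(toy_image, 0, 0, 4, 2))    # (3+4+7+8) - (1+2+5+6) = 8
    print(three_rect_feature(toy_image, 0, 0, 3, 2))  # (2+6) - (1+5+3+7) = -8
```

Direct summation costs time proportional to the rectangle area for every feature, which is the cost that the integral image removes.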
Rectangle features can be computed very rapidly using the integral image. An integral image ii over an image i is defined as follows:
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
The sum of the pixels within any rectangle can then be computed as shown in Fig. 3.
Fig. 3 Computing the sum of the pixels within rectangle D
The sum of the pixels within rectangle D is computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A, the value at location 2 is A + B, the value at location 3 is A + C, and the value at location 4 is A + B + C + D. Therefore, the sum within D is computed as (4 + 1) - (2 + 3), since (A + B + C + D) + A - (A + B) - (A + C) = D.
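The four-reference rule can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the integral image is stored with an extra zero row and column (so that `ii[y][x]` holds the sum of all pixels strictly above and to the left of `(x, y)`), which is an implementation convenience that avoids special cases at the image border.

```python
from typing import List

Image = List[List[int]]  # image[row][column], row-major


def integral_image(img: Image) -> Image:
    """Padded integral image: ii[y][x] = sum of img[r][c] for r < y and c < x."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0  # cumulative sum along the current row
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum inside a rectangle using four array references, as in Fig. 3."""
    p1 = ii[y][x]          # location 1: A
    p2 = ii[y][x + w]      # location 2: A + B
    p3 = ii[y + h][x]      # location 3: A + C
    p4 = ii[y + h][x + w]  # location 4: A + B + C + D
    return (p4 + p1) - (p2 + p3)


def two_rect_feature_fast(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two adjacent rectangles share two corners, so six references suffice.

    Assumed layout: white left half, grey right half; returns grey minus white.
    """
    half = w // 2
    a, b, c = ii[y][x], ii[y][x + half], ii[y][x + w]
    d, e, f = ii[y + h][x], ii[y + h][x + half], ii[y + h][x + w]
    white = (e + a) - (b + d)
    grey = (f + b) - (c + e)
    return grey - white


toy_image: Image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
toy_ii = integral_image(toy_image)

if __name__ == "__main__":
    print(rect_sum(toy_ii, 1, 1, 2, 2))               # 6 + 7 + 10 + 11 = 34
    print(two_rect_feature_fast(toy_ii, 0, 0, 4, 2))  # 22 - 14 = 8
```

Once the integral image has been built in a single pass over the image, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is why the Haar-like features can be evaluated so quickly at any scale.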