基于LFDA和SVM的微阵列基因检索系统(IJISA-V10-N1-2)

I.J. Intelligent Systems and Applications, 2018, 1, 9-15

Published Online January 2018 in MECS (https://www.360docs.net/doc/4e10199656.html,/)

DOI: 10.5815/ijisa.2018.01.02

Microarray Gene Retrieval System Based on

LFDA and SVM

Lt. Thomas Scaria1

1Department of Computer Science, St. Pius X College, Kerala, India

E-mail: thomasscaria.pu@https://www.360docs.net/doc/4e10199656.html,

Dr. T. Christopher2

2Asst Professor, PG and Research Department of Information Technology, Government

Arts College, Coimbatore, India

Received: 10 May 2017; Accepted: 06 July 2017; Published: 08 January 2018

Abstract—The DNA microarray technology enables the biologists to observe the expressions of multiple thousands of genes in parallel fashion. However, processing and gaining knowledge from the voluminous microarray gene data is serious issue. It is necessary for the biologists to retrieve the required data in a reasonable time. In order to address this issue, this work presents a gene retrieval system, which is based on feature dimensionality minimization and classification of the microarray gene data. The feature dimensionality minimization is achieved by Local Fisher Discriminant Analysis (LFDA), which inherits the merits of both Fisher Discriminant Analysis (FDA) and Locality Preserving Projection (LPP). Support Vector Machine (SVM) is employed as the classifier to classify between the genes. The LFDA is chosen for reducing the dimensionality of the features, owing to its better performance on multimodal data. The SVM is trained with the feature dimensionality reduced microarray gene data, which improves the efficiency and overthrows the computational complexity. The performance of the proposed approach is compared with the LPP and FDA. Additionally, the performance of SVM is compared with the k-Nearest Neighbour (k-NN) classifier. The combination of LFDA and SVM serves better in terms of accuracy, sensitivity and specificity.

Index Terms—DNA microarray technology, feature reduction, SVM classification, LFDA.

I.I NTRODUCTION

Data mining is the process of examining a vast amount of data to provide well-defined information. Hence, the process of transformation from raw data to clear-cut information is called as knowledge discovery process. Recognising the superiority of data mining, several applications utilize the technology. Some of the noteworthy applications are business platforms, finance, biomedicine, networks, education and so on [1]. Microarray technology enables the life scientists to study and analyse the genomic expression of a living organism [2, 3]. DNA microarray technology makes it possible to observe the expression level of several thousands of genes in a concurrent fashion [4,5]. Numerous research activities exploit the microarrays for definitive solution. For instance, toxicology, medical diagnosis, gene regulative studies depend on the microarrays [6-8].

Data mining techniques intend to gain knowledge from the microarray data by means of data clustering, classification and association mining. The overall process of microarray data analysis consists of data normalization, data processing and post-processing. Based on the operative nature of data mining techniques, the learning algorithms are classified into supervised and unsupervised learning.

Supervised learning is comprised of two phases and they are training and testing. In the training phase, the classifier is fed with knowledge of the subject samples of several categories. The testing stage aims to find the category of the test sample being passed. Unsupervised learning techniques do not require any prior knowledge about the processing data.

Irrespective of the presence of several data mining techniques, it is still difficult to gain knowledge from the voluminous microarray data [9]. It would be beneficial for the scientists to have a system that can fetch the data being searched and is popularly called as information retrieval system. The microarray information retrieval system strongly depends on the gene classification process. However the classification process is not simple, as the microarray data is quite voluminous and high dimensional [10, 11].

Understanding the necessity of the information retrieval from microarray gene data, this article proposes a microarray gene information retrieval system. The entire work is decomposed into two different phases, which are microarray feature reduction and classification. The initial phase aims to reduce the dimensionality of the microarray data and the next phase engages itself in gene classification. This work employs Local Fisher Discriminant Analysis (LFDA) to reduce the dimensions of the features. The reasons for the employment of LFDA

are it preserves the local structure of the data and clearly separates different classes [12].

The Support Vector Machine (SVM) is trained with the dimensionality reduced feature set. SVM is employed owing to its margin optimization and exclusion of local maxima [13]. The key points of the proposed work are listed below.

?Incorporation of LFDA to minimize the dimensionality of the features, which makes the

entire process simpler.

?The classifier SVM is trained with the dimensionality reduced set of features, which

enhances the speed of learning.

?The information retrieval is faster, as LFDA reduces the feature dimension to train the SVM.

?The computational complexity is overthrown, as the process of classification is done after feature

dimensionality minimization.

The rest of the paper is organized as follows. Section 2 presents the related review of literature with respect to the microarray gene data classification. The proposed information retrieval system for microarray gene data is presented in section 3. The performance of the proposed approach is analysed in section 4. The conclusions are drawn in section 5.

II.R ELATED W ORKS

This section reviews the related literature with respect to the classification of microarray gene data.

In living organisms, all the cells possess nucleus which in turn contains DNA. Every DNA contains the coding and decoding segments. These coding segments are called as genes and the genes describe the structure of proteins. The formation of proteins is achieved in two steps. Initially, the gene is converted to mRNA and mRNA is converted to proteins.

DNA microarray is the advanced technology that renders a provision to have a global insight of the cell and this paves way for measuring the expression level of several thousands of genes concurrently [14]. However, the serious challenge associated with the microarray gene data is the data dimensionality and this issue leads to misclassification [15, 16].

Effective gene selection is the remedy to this issue and the gene selection is the process of detecting and eliminating unwanted features during the training phase. As the knowledge is gained by the system without unwanted features, the knowledge discovery process is more effective [17]. Feature selection is the most important step, as it decides the efficiency of the work.

A perfect feature selection technique reduces the time consumption and computational complexity involved in the gene classification. However it is not an easy task, as it is difficult to determine which are relevant features and irrelevant features. On successful gene selection, the researchers can analyse the data to classify between the normal and the gene expression related to some diseases. Understanding the significance of this issue, several researchers engage themselves in this research. Globally, the feature selection techniques are classified into filter, wrapper and embedded techniques. Filter based approaches depend on the overall features of the training data for feature selection. Wrapper based approaches involve in the optimization process of the predictor for feature selection. Finally, the embedded techniques employ machine learning algorithms to extract the relevant set of features. However, it is reported that the wrapper and embedded approaches show computational burden [18].

The filter based approaches measure the relevancy of the genes by examining the fundamental features of the data, where a single gene or a subgroup of genes is analysed against the class labels. The advantage of filter based approach is its minimal time consumption and the drawback is that the classification results are not up to the mark.

Wrapper approaches failed to grab the attention of the researchers, as the computational overhead is high. Embedded techniques utilize machine learning for feature selection and the mostly used classifier is the Support Vector Machine (SVM). Besides these traditional feature selection techniques, there are several ensemble and hybrid approaches.

Fisher Discriminant Analysis (FDA) classifies the entities by organizing the entities in a single dimensional space and then the classification process is done by fixing threshold. The fisher criterion is utilized to reduce the dimensionality. However, the results produced by FCA are not convincing, when the samples in a class form multiple clusters [19, 20].

Local Preserving Projection (LPP) conserves the local structure of the data, while minimizing the data dimensionality [21]. The local structure is preserved, as the embedding transformation takes the neighbouring pair of data in the embedding space. However, the LPP doesn‘t take the label in formation into account such that it does not work well in the supervised environment.

In [22], an associative classification algorithm is proposed for microarray gene classification. This work clubs both the association rule and classification mining. This work is based on four different phases, which are statistical gene filtering, discretization, class association rules and prediction. The initial phase is meant for distinguishing between the genes and to choose the important genes in the gene expression. The discretization phase is for converting the continuous values to discrete values. The next phase is responsible for producing a group of association rules by utilizing closed frequent itemset. Finally, the classification process is carried out by employing a scoring function. The performance of the proposed approach is tested against Linear Discriminant Analysis (LDA), SVM and decision tree.

In [23], a technique for gene selection and classification is proposed. This work is based on Random Forest Ranking (RFB) and Binary Balck Hole Algorithm (BBHA). The experimental results of this work are

compared with several classification techniques and the proposed work is proved to be better.

Motivated by these works, this article intends to present an information retrieval system for microarray data with two stages, which are data dimensionality reduction and classification. This work produces expected results, as the dimensionality of the features are minimized prior to the classification stage, which is achieved by LFDA.

The dimensionality minimization stage reduces the computational complexity and time consumption. The SVM is trained with the obtained feature reduced microarray data, which improves the learning speed and performance. The following section presents the proposed approach in a detailed fashion.

III.P ROPOSED G ENE R ETRIEVAL S YSTEM WITH

L FDA AND S VM

This section elaborates the phases involved in the proposed approach along with the overall working principle.

A.Overall Idea

The microarray gene expression data is complex to manage, owing to the nature and the high dimensionality of the data [24]. As the microarray gene data contains numerous expressions, it is difficult to manage the data. For this sake, this article proposes a gene retrieval system by utilizing the LFDA and SVM for the biomedical applications.

Fig.1. Overall flow of the proposed approach

To achieve the goal, the research is carried out in two steps, which are feature dimensionality minimization and classification. The initial step intends to reduce the data by means of LFDA. The SVM is then trained by the data with minimized dimensionality. The overall flow of the proposed work is presented in figure 1.

The LFDA is chosen by this work, as it clearly increases the class partitions and conserves the local structure of the data. Thus, LFDA works well both between-class and within-class scenarios. This dimensionality minimized data is utilized for the purpose of training the SVM. SVM is selected as the classifier, as the learning ability is better with few samples. The gene retrieval system yields better results, as the feature reduction and classification phases are employed together. The following subsections explain the phases involved in the proposed gene retrieval technique.

B.Feature Dimensionality Minimization by LFDA

The major goal of dimensionality minimization is to acquire a low dimensional depiction of high dimensional data along with the preservation of the local structure of the original data [25]. A perfect dimensionality minimization paves way for better classification. Recognizing the importance of feature dimensionality minimization, this work utilizes LFDA which inherits the ideas of both FDA and LPP. FDA is the traditional technique for dimensionality reduction but it cannot perform well with multimodal data.

LPP addresses this issue by minimizing the dimension of multimodal data while it preserves the local structure of the data also. Basically, LPP is an unsupervised dimensionality reduction technique and so it does not take the class labels into consideration. This implies that this technique does not suit supervised dimensionality reduction.

LFDA combines the idea of both FDA and LPP [26], such that it works well for multimodal data. LFDA can address the dimensionality minimization issue by solving the generalized eigenvalue issue. Let the microarray gene data possess labelled entities and is represented as *()+, where * + which is the class label related to the entity and is the count of classes. Consider as the count of labelled entities which are in class * + and this can be represented as

∑ (1) The local between-class and local within-class matrices are denoted by and respectively and are represented as follows.

∑( )( ) (2)

∑( )( )

(3)

Here, and are the matrices and represented in the following equation.

{()

(4) { (5)

Where is the affinity value between and , subject to the local scaling heuristic. The local scaling is performed on class by class basis, such that the local structure of the microarray gene data is maintained. In eqns 4 and 5, () is negative. On the other hand, are positive. This means that the entities present in the same class are closer to each other and the entities of different class are farther. This local scaling operation reduces the computational cost as well. The transformation matrix for microarray gene data is formed as

, ( ()-

(6) The LFDA forms the transformation matrix of the genes, which maximizes the local between-class scatter ( ) and minimizes the local within-class scatter ()in the embedding space. The

generalized eigen value issue is solved by

( )( ) (7)

Suppose if the value of , then the ( ) and ( ) are minimized to ( ) and ( ) respectively. Thus, the scatter matrices are formed for the microarray gene data and the dimensionality reduced data is obtained.

C.Gene Classification by SVM

As soon as the feature dimensionality reduction is achieved on microarray gene data, the SVM is trained with that data. The goal of SVM is to separate the entities by considering the hyperplane.

Let the training entities for SVM is denoted as , with which the SVM is trained. The SVM gains knowledge from the training entities and classify the entities into either class A or class B, by means of hyperplane. The hyperplane is formed by solving the below given equation.

()∑() (8) Where is the lagrange multipler that aims to segregate the hyperplane (). is the threshold that determines the classification policy by the hyperplane. With respect to the threshold value, the entities are classified to be either in class A or B and this can be represented as

{

()

(9)

By this way, the SVM classifies between the entities present in the microarray gene data. The next section analyses the performance of the proposed gene retrieval system.

IV.E XPERIMENTAL A NALYSIS

This section evaluates the performance of the proposed approach. Initially, a short description about the dataset is presented, followed by which the performance of the proposed work is analysed.

A. Dataset Descripton

The experiments are carried out on the Leukemia dataset, which is publicly available [27]. This dataset is formed by collecting the microarray data from Affymetrix chip, which contains 6817 genes. The genes are filtered after which the gene count is reduced to 3051. The training data contains 38 cases, which constitutes 27 Acute Lymphoblastic Leukemia (ALL) and 11 Acute Myeloid Leukemia (AML).

B. Results and Discussion

The proposed approach is implemented in Matlab environment (version 8.2). The feature dimensionality of the microarray gene data is minimized by LFDA. The obtained feature dimensionality minimized data is passed for SVM classification. The performance of the proposed technique is analysed by varying the feature reduction techniques and classification techniques in terms of accuracy, sensitivity, specificity and misclassification rates.

Accuracy is the most important measure of any classification technique, and is the ratio of the correctly classified samples to the total number of samples being involved in classification. Sensitivity rate is computed by the ratio of correctly classified samples to be in the correct class to the sum of correctly classified samples in the correct class and the samples which are misclassified to be the members of the wrong class.

Specificity is the rate of samples which are correctly classified as non-native samples of the class to the sum of samples that are wrongly classified as the member of a specific class and the samples that are correctly classified as non-native samples of the class. All the above stated performance measures are represented as follows.

(10)

(11)

(12)

(13) In the above equations,

are the true positive, true negative, false positive and false negative rates. Initially, the feature reduction techniques are varied and the performance of the proposed approach is analysed. The proposed work is compared by implementing LPP [21] and FDA [19] individually against LFDA. The second round of performance analysis deals with the classifiers. This work compares the performance of SVM against k-NN classifier. The experimental results are presented below. The experimental results present the accuracy, sensitivity, specificity and misclassification rates with respect to both ALL and AML.

Fig.2. Comparative analysis w.r.t feature reduction techniques for ALL

Fig.3. Comparative analysis w.r.t feature reduction techniques for AML The above presented graphs show the experimental results of both ALL and AML cases with respect to the feature reduction techniques. The attained results prove that the accuracy, sensitivity and specificity rates of the LFDA are better than the FDA and LPP. LFDA achieves better results, as it works well for both between and within class relationships of entities.

Besides this, the LFDA addresses the dimensionality minimization issue by solving the generalized eigen value effectively. All these factors help the LFDA to perform better than the analogous techniques. FDA does not serve well for multimodal data and this issue is addressed by LPP.

However, LPP cannot perform well for supervised environment and thus, LFDA inherits the merits of both the techniques to render better results. The following graphs (fig 4 and 5) present the accuracy, sensitivity and specificity rates by varying the classifiers for ALL and AML.

Fig.4. Comparative analysis w.r.t classification techniques for ALL

Fig.5. Comparative analysis w.r.t classification techniques for AML The performance of the classification technique is needed to be justified. In order to achieve this, the feature dimensionality reduced microarray gene data is fed to k-NN and SVM for checking the performance. The learning ability of the k-NN classifier is not upto the mark, when compared to SVM.

Besides this, the learning speed of SVM is greater than k-NN. For these reasons, the

choice of SVM is justified

in the above presented experimental results. SVM shows the greatest accuracy rates in both ALL and AML cases. The maximum accuracy rate being shown by SVM is 98.9. The highest sensitivity and specificity rates being shown by SVM are 98.6 and 96.3 respectively. This shows the efficacy of the SVM over k-NN classifier. The following graphs (fig 6 and 7) present the misclassification rates by taking the feature reduction and classification techniques into account.

Fig.6. Misclassification rate analysis w.r.t feature reduction techniques

Fig.7. Misclassification rate analysis w.r.t classification techniques Misclassification rate is indirectly proportional to the accuracy rate. Hence, it is obvious that the misclassification rate of LFDA is comparatively lower than the analogous techniques. LFDA shows the least misclassification rates for both ALL and AML, which are 1.1 and 3.7 respectively. This proves the efficacy of the LFDA over FDA and LPP.

On varying the classifiers, SVM shows the least misclassification rates, when compared to k-NN. From the experimental results, it is evident that the proposed approach serves better with LFDA for feature dimensionality reduction followed by SVM classification. This results in better retrieval rates for both ALL and AML cases.

V.C ONCLUSION

This article presents an effective gene retrieval system, which is based on LFDA and SVM. LFDA is utilized for reducing the feature dimensionality of the microarray gene data. This activity of feature reduction helps in removing the unwanted data and thereby saves memory and reduces computational complexity of the forthcoming phase. The learning speed improves, as the SVM gains knowledge from the feature dimensionality reduced microarray data. The performance of the proposed approach is analysed by varying the feature reduction and classification techniques.

LFDA outperforms the LPP and FDA, as it inherits the benefits of both feature extraction techniques. The SVM proves its efficiency over k-NN in terms of classification accuracy. The experimental results prove the efficacy of the proposed approach in terms of accuracy, sensitivity and specificity rates. In future, we plan to continue the research by analysing several feature reduction and classification techniques for microarray gene data.

R EFERENCES

[1]H. Chen, "Homeland Security Data Mining Using Social

Network Analysis". In: Ortiz-Arroyo D., Larsen H.L., Zeng D.D., Hicks D., Wagner G. (eds) Intelligence and Security Informatics. Lecture Notes in Computer Science, Vol. 5376, pp. 4-4, 2008.

[2]M.M. Babu, "Introduction to microarray data analysis".

Computational genomics: Theory and application. Vol.

17, No.6, pp.225-49, 2004.

[3]V. Filkov, S. Skiena and J. Zhi, ?Analysis Techniques for

Microarray Time Series Data‘, Journal of Computational Biology, Vol. 9, No. 2, pp. 317-330, 2002.

[4]M. Schena, D. Shalon, R.W. Davis, and P. O. Brown,

―Quantitative monitoring of gene expression patterns with

a complementary DNA microarray,‖ Science, Vol. 270,

No. 5235, pp. 467–470, 1995.

[5]J. L.DeRisi, V.R. Iyer, and P.O.Brown, ―Exploring

themetabolic and genetic control of gene expression on a genomic scale,‖ Science, vol. 278, no. 5338, pp. 680–686, 1997.

[6]―The chipping forecast I,‖ Supplement to Nature Genetics,

vol. 21, no. 1, 1999.

[7]―The chipping forecast II,‖ Supplement to Nature Genetics,

vol. 32, 2002.

[8] D.P. Berrar, W. Dubitzky, M. Granzow, editors. A

practical approach to microarray data analysis. Boston, Mass, USA: Kluwer academic publishers; 2003.

[9]I. Slavkov, S. D?eroski, J. Struyf, S. Loskovska.

"Constrained clustering of gene expression profiles". In : Proc. of the Conference on Data Mining and Data Warehouses at the Seventh International Multi-

Conference on Information Society, pp. 212-215, 2005. [10]Y.Y. Leung, Y.S. Hung, "An integrated approach to

feature selection and classification for microarray data with outlier detection, In: Proc. of 8th Annual International Conference on Computational Systems Bioinformatics, Aug. 10-12, Stanford, CA, USA, pp. 1-4, 2009.

[11] A. Osareh and B. Shadgar, "Classification and Diagnostic

Prediction of Cancers using Gene Microarray Data Analysis", Journal of Applied Sciences, Vol. 9, No. 3, 459-468, 2009.

[12]M. Sugiyama, "Dimensionality reduction of multimodal

labeled data by local Fisher discriminant analysis".

Journal of Machine Learning Research,Vol.8, pp.1027–1061, 2007.

[13]J. Shawe-Taylor, and Cristianini N. "Kernel methods for

pattern analysis". Cambridge university press, 2004. [14]G. Piatetsky-Shapiro, P. Tamayo, Microarray data mining:

facing the challenges, ACM SIGKDD Explor. Newsl. Vol.

5, No. 2,pp. 1-5, 2003.

[15]T. Golub, D. Slonim, P. Tamayo, C. Huard, M.

Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, et al, "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring", Science, Vol. 286, No.5439, 531–537, 1999.

[16] A. Jain, D. Zongker, Feature selection: evaluation,

application, and small sample performance, IEEE Trans.

Pattern Anal. Mach. Intell. Vol.19, No. 2, pp.153–158, 1997.

[17]I. Guyon, S. Gunn, M. Nikravesh, L.A. Zadeh, editors.

"Feature extraction: foundations and applications".

Springer, Vol. 207, 2008.

[18]V. Bolón-Canedo, N. Sánchez-Maro?o, A. Alonso-

Betanzos, J.M. Benítez, F. Herrera, "A review of microarray datasets and applied feature selection methods", Information Sciences, vol.282, pp.111-135, 2014.

[19]R. A. Fisher. "The use of multiple measurements in

taxonomic problems". Annals of Eugenics, Vol. 7, No. 2, pp. 179–188, 1936.

[20]K. Fukunaga. "Introduction to Statistical Pattern

Recognition". Academic Press, Inc., Boston, second edition, 1990.

[21]X. He and P. Niyogi. "Locality preserving projections". In

S. Thrun, L. Saul, and B. Sch ¨olkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA,2004.

[22]S Alagukumar, R Lawrance, "Classification of microarray

gene expression data using associative classification", International Conference on Computing Technologies and Intelligent Data Engineering, 7-9 Jan, Kovilpatti, 2016. [23]Elnaz Pashaei, Mustafa Ozen, Nizamettin Aydin, "Gene

selection and classification approach for microarray data based on Random Forest Ranking and BBHA", IEEE-EMBS International Conference on Biomedical and Health Informatics, 24-27 Feb, Las Vegas, USA, 2016. [24]S. Dehuri and S. Cho, ―Multiobjective Classification Rule

mining Using Gene Expression Programming‖, In : Proc of the International Conference on convergence and Hybrid Information Technology, 11-13 November, Busan, Vol. 2, pp. 754-760, 2008.

[25]G.E. Hinton and R.R. Salakhutdinov, ―Reducing the

dimensionality of data with neural networks‖, Science, Vol. 313, pp. 504–507, 2006.

[26]M. Sugiyama, M. Krauledat and K.R. Muller. ―Covariate

shift adaptation by im portance weighted cross validation‖.

Journal of Machine Learning Research, Vol. 8, pp. 985–1005, 2007.

[27]https://www.360docs.net/doc/4e10199656.html,/rdiaz/Papers/rfVS/. Authors’ Profiles

Lt. Thomas Scaria is presently working as

Assistant professor of Computer Science at

Department Of computer Science, St.Pius

College, Rajapuram, Kerala, India and an

active Research Scholar at periyar University

Salem. His Area of Interest Includes Data

Mining in Bio Sciences and Bio Informatics. He Received his MCA degree from Periyar University in the Year 2003 and M.Phil in Computer Science from Annamalai University in the Year 2008. He Did his M.Sc Applied Psychology from Annamalai University in the Year 2012. He has to Credit 13 Years of teaching and 6 years of research experience.

Dr. T. Christopher is presently working as

Assistant Professor of Computer Science, at

Post Graduate and Research Department of

Information Technology, Government Arts

College (Autonomous), Coimbatore,

Tamilnadu, India. He has published 40

research papers in International/National Journals; His area of interest includes knowledge mining and Network security. He has to credit 26 years of teaching and research experience.

How to cite this paper:Lt. Thomas Scaria, T. Christopher, "Microarray Gene Retrieval System Based on LFDA and SVM", International Journal of Intelligent Systems and Applications(IJISA), Vol.10, No.1, pp.9-15, 2018. DOI: 10.5815/ijisa.2018.01.02

线性代数知识点总结

线性代数知识点总结第一章行列式 (一)要点 1、二阶、三阶行列式 2、全排列和逆序数，奇偶排列（可以不介绍对换及有关定理），n 阶行列式的定义 3、行列式的性质 4、n 阶行列式ij a D =，元素ij a 的余子式和代数余子式，行列式按行（列）展开定理 5、克莱姆法则（二）基本要求 1、理解n 阶行列式的定义 2、掌握n 阶行列式的性质 3、会用定义判定行列式中项的符号 4、理解和掌握行列式按行（列）展开的计算方法，即 5、会用行列式的性质简化行列式的计算，并掌握几个基本方法：归化为上三角或下三角行列式，各行（列）元素之和等于同一个常数的行列式，利用展开式计算 6、掌握应用克莱姆法则的条件及结论会用克莱姆法则解低阶的线性方程组 7、了解n 个方程n 个未知量的齐次线性方程组有非零解的充要条件第二章矩阵（一）要点 1、矩阵的概念 n m ?矩阵n m ij a A ?=)(是一个矩阵表。当n m =时，称A 为n 阶矩阵，此时由A 的元素按原来排列的形式构成的n 阶行列式，称为矩阵A 的行列式，记为A . 注：矩阵和行列式是两个完全不同的两个概念。 2、几种特殊的矩阵：对角阵；数量阵；单位阵；三角形矩阵；对称矩阵 3、矩阵的运算；矩阵的加减法；数与矩阵的乘法；矩阵的转置；矩阵的乘法（1）矩阵的乘法不满足交换律和消去律，两个非零矩阵相乘可能是零矩阵。如果两矩阵A 与B 相乘，有BA AB =,则称矩阵A 与B 可换。注：矩阵乘积不一定符合交换（2）方阵的幂：对于n 阶矩阵A 及自然数k ，规定I A =0 ,其中I 为单位阵 .

(3) 设多项式函数k k k k a a a a ++++=--λλλλ?1110)( ，A 为方阵，矩阵A 的多项式I a A a A a A a A k k k k ++++=--1110)( ?,其中I 为单位阵。（4）n 阶矩阵A 和B ，则B A AB =. （5）n 阶矩阵A ，则A A n λλ= 4、分块矩阵及其运算 5、逆矩阵：可逆矩阵（若矩阵A 可逆，则其逆矩阵是唯一的）；矩阵A 的伴随矩阵记为*A ，矩阵可逆的充要条件；逆矩阵的性质。 6、矩阵的初等变换：初等变换与初等矩阵；初等变换和初等矩阵的关系；矩阵在等价意义下的标准形；矩阵A 可逆的又一充分必要条件：A 可以表示成一些初等矩阵的乘积；用初等变换求逆矩阵。 7、矩阵的秩：矩阵的k 阶子式；矩阵秩的概念；用初等变换求矩阵的秩 8、矩阵的等价（二）要求 1、理解矩阵的概念；矩阵的元素；矩阵的相等；矩阵的记号等 2、了解几种特殊的矩阵及其性质 3、掌握矩阵的乘法；数与矩阵的乘法；矩阵的加减法；矩阵的转置等运算及性质 4、理解和掌握逆矩阵的概念；矩阵可逆的充分条件；伴随矩阵和逆矩阵的关系；当A 可逆时,会用伴随矩阵求逆矩阵 5、了解分块矩阵及其运算的方法（1）在对矩阵的分法符合分块矩阵运算规则的条件下，其分块矩阵的运算在形式上与不分块矩阵的运算是一致的。（2）特殊分法的分块矩阵的乘法，例如n m A ?，l n B ?，将矩阵B 分块为 ) (21l b b b B =，其中j b （l j 2, ,1=）是矩阵B 的第j 列，则又如将n 阶矩阵P 分块为) (21n p p p P =，其中j p （n j 2, ,1=）是矩阵P 的第j 列. （3）设对角分块矩阵

基因微阵列数据中的聚类技术研究

基因微阵列数据中的聚类技术研究第l6卷第2期 2006年2月计算机技术与发展 C()NIPUTERTE({N0l)【YANI)I)EVE1』)ljMEN7, V(,l16NO.2 Feb.2006 基因微阵列数据中的聚类技术研究马煜,陈莉,方鹤鹤 (西北大学计算机科学系,陕西西安710069) 摘要:微阵列技术是后基因时代功能基因组研究的主要工具.由于采用了高效的并行杂交技术,每次实验可以得到大量丰富的数据,因此其结果分析成为一项很有挑战性而且具有重要意义的工作.聚类分析是微阵列数据分析中使用最为广泛的一类方法.微阵列实验得到的大量数据通过聚类分析,可以得到很多有用的信息,其成功应用已广泛涉及到基因功能研究和生物医学研究中的各个领域.文中介绍了基因微阵列数据的聚类分析方法及其重要应用. 关键词:微阵列;基因表达谱;聚类分析中图分类号:TP391文献标识码:A文章编号:1005—3751(2006)02一Ol17—03 ClusteringAnalysisofMicroarrayGeneExpressionData MAYu,CHENLi,FANGHe-he (DepartmentofComputerScience,NorthwestUniversity,Xi'an710069,China) Abstract:Microarraytechnologyisthechieftoolforfunctionalgenomeresearch.Asadoptin gthehighefficientandparallelDNAhy? bridizaitontechnology,canachieveadundantdatafromeachexperiment,sothedataanalysis ofmicroarraysdatabecomesamorechalleng?

2014年染色体微阵列分析技术在产前诊断中的应用专家共识

染色体微阵列分析技术在产前诊断中的应用专家共识染色体微阵列分析技术在产前诊断中的应用协作组目前，G 显带染色体核型分析技术仍然是细胞遗传学产前诊断的“金标准”，但该技术具有细胞培养耗时长、分辨率低以及耗费人力的局限性。包括荧光原位杂交(fluorescence in situ hybridization，FISH) 技术在内的快速产前诊断技术的引入虽然具有快速及特异性高的优点，但还不能做到对染色体组的全局分析。染色体微阵列分析(chromosomal mlcroarray analysis，CMA) 技术又被称为“分子核型分析”，能够在全基因组水平进行扫描，可检测染色体不平衡的拷贝数变异(copy number variant,CNV)，尤其是对于检测染色体组微小缺失、重复等不平衡性重排具有突出优势。根据芯片设计与检测原理的不同，CMA 技术可分为两大类：基于微阵列的比较基因组杂交(array- based comparative genomic hybridization ，aCGH) 技术和单核苷酸多态性微阵列(single nucleotide polymorphism array，SNP array) 技术。前者需要将待测样本DNA 与正常对照样本DNA 分别标记、进行竞争性杂交后获得定量的拷贝数检测结果，而后者则只需将待测样本DNA 与一整套正常基因组对照资料进行对比即可获得诊断结果。通过aCGH 技术能够很好地检出CNV，而SNP array 除了能够检出CNV 外，还能够检测出大多数的单亲二倍体(uniparental disomy，UPD) 和三倍体，并且可以检测到一定水平的嵌合体。而设计涵盖CNV+SNP 检测探针的芯片，可同时具有CNV 和SNP 芯片的特点。 2010 年，国际细胞基因组芯片标准协作组(lntemational Standards for Cytogenomic Arrays Consortium，ISCA Consortium) 在研究了21698 例具有异常临床表征，包括智力低下、发育迟缓、多种体征畸形以及自闭症的先证者的基础上，发现aCGH 技术对致病性CNV 的检出率为 12.2%，比传统 G 显带核型分析技术的检出率提高了10%。因此，ISCA Consortium 推荐将aCGH 作为对原因不明的发育迟缓、智力低下、多种体征畸形以及自闭症患者的首选临床一线检测方法。近年来，CMA 技术在产前诊断领域中的应用越来越广泛，很多研究也证明了该技术具有传统胎儿染色体核型分析方法所无法比拟的优势。 CMA 对非整倍体和不平衡性染色体重排的检出效率与传统核型分析方法相同，并具有更高的分辨率和敏感性，且CMA 还能发现额外的、有临床意义的基因组CNV，尤其是对于产前超声检查发现胎儿结构异常者，CMA 是目前最有效的遗传学诊断方法。基于上述研究结果，不少学者认为，CMA 技术有可能取代传统的核型分析方法，成为产前遗传学诊断的一线技术。但到目前为止，尚缺乏基于人群的大规模应用研究结果。目前，在国内CMA 只有少数具有技术条件和资质的医疗机构进行了小规模的探索，大致有以下几类临床应用情况： 1．儿童复杂、罕见遗传病，如：智力障碍、生长发育迟缓、多发畸形、孤独症样临床表现，排除染色体病、代谢病和脆性X 综合征之后的全基因组CNV 检测。 2．对自然流产、胎死宫内、新生儿死亡等妊娠产物(product of concept，POC) 的遗传学检测。 3．对产前诊断中核型分析结果异常，但无法确认异常片段的来源和性质者进行DNA 水平的更精细分析。 4．对产前超声检查异常而染色体核型分析结果正常的胎儿进一步行遗传学检测。在产前诊断领域中，CMA 的应用主要在后两种情况中。虽然目前应用研究的范围不广，积累的例数也不多，但却显现出一些问题的存在，主要表现在： 1．在部分开展应用的医疗机构，对CMA 检测前和检测后的产前咨询能力存在不足。

高中数学必修和选修知识点归纳总结

高中数学必修+选修知识点归纳引言 1.课程内容：必修课程由5个模块组成：必修1：集合、函数概念与基本初等函数（指、对、幂函数）必修2：立体几何初步、平面解析几何初步。必修3：算法初步、统计、概率。必修4：基本初等函数（三角函数）、平面向量、三角恒等变换。必修5：解三角形、数列、不等式。以上是每一个高中学生所必须学习的。上述内容覆盖了高中阶段传统的数学基础知识和基本技能的主要部分，其中包括集合、函数、数列、不等式、解三角形、立体几何初步、平面解析几何初步等。不同的是在保证打好基础的同时，进一步强调了这些知识的发生、发展过程和实际应用，而不在技巧与难度上做过高的要求。此外，基础内容还增加了向量、算法、概率、统计等内容。选修课程有4个系列：系列1：由2个模块组成。选修1—1：常用逻辑用语、圆锥曲线与方程、导数及其应用。选修1—2：统计案例、推理与证明、数系的扩充与复数、框图系列2：由3个模块组成。选修2—1：常用逻辑用语、圆锥曲线与方程、空间向量与立体几何。选修2—2：导数及其应用，推理与证明、数系的扩充与复数选修2—3：计数原理、随机变量及其分布列，统计案例。系列3：由6个专题组成。选修3—1：数学史选讲。选修3—2：信息安全与密码。选修3—3：球面上的几何。选修3—4：对称与群。选修3—5：欧拉公式与闭曲面分类。选修3—6：三等分角与数域扩充。系列4：由10个专题组成。选修4—1：几何证明选讲。选修4—2：矩阵与变换。选修4—3：数列与差分。选修4—4：坐标系与参数方程。选修4—5：不等式选讲。选修4—6：初等数论初步。选修4—7：优选法与试验设计初步。选修4—8：统筹法与图论初步。选修4—9：风险与决策。选修4—10：开关电路与布尔代数。 2．重难点及考点：重点：函数，数列，三角函数，平面向量，圆锥曲线，立体几何，导数难点：函数、圆锥曲线高考相关考点： ⑴集合与简易逻辑:集合的概念与运算、简易逻辑、充要条件 ⑵函数：映射与函数、函数解析式与定义域、值域与最值、反函数、三大性质、函数图象、指数与指数函数、对数与对数函数、函数的应用 ⑶数列：数列的有关概念、等差数列、等比数列、数列求和、数列的应用

基因芯片数据功能分析

生物信息学在基因芯片数据功能分析中的应用 2009-4-29 随着人类基因组计划(Human Genome Project)即全部核苷酸测序的即将完成，人类基因组研究的重心逐渐进入后基因组时代(Postgenome Era)，向基因的功能及基因的多样性倾斜。通过对个体在不同生长发育阶段或不同生理状态下大量基因表达的平行分析，研究相应基因在生物体内的功能，阐明不同层次多基因协同作用的机理，进而在人类重大疾病如癌症、心血管疾病的发病机理、诊断治疗、药物开发等方面的研究发挥巨大的作用。它将大大推动人类结构基因组及功能基因组的各项基因组研究计划。生物信息学在基因组学中发挥着重大的作用, 而另一项崭新的技术——基因芯片已经成为大规模探索和提取生物分子信息的强有力手段，将在后基因组研究中发挥突出的作用。基因芯片与生物信息学是相辅相成的，基因芯片技术本身是为了解决如何快速获得庞大遗传信息而发展起来的，可以为生物信息学研究提供必需的数据库，同时基因芯片的数据分析也极大地依赖于生物信息学，因此两者的结合给分子生物学研究提供了一条快捷通道。本文介绍了几种常用的基因功能分析方法和工具：一、GO基因本体论分类法最先出现的芯片数据基因功能分析法是GO分类法。Gene Ontology（GO，即基因本体论）数据库是一个较大的公开的生物分类学网络资源的一部分，它包含38675 个Entrez Gene注释基因中的17348个，并把它们的功能分为三类：分子功能，生物学过程和细胞组分。在每一个分类中，都提供一个描述功能信息的分级结构。这样，GO中每一个分类术语都以一种被称为定向非循环图表（DAGs）的结构组织起来。研究者可以通过GO分类号和各种GO数据库相关分析工具将分类与具体基因联系起来，从而对这个基因的功能进行描述。在芯片的数据分析中，研究者可以找出哪些变化基因属于一个共同的GO功能分支，并用统计学方法检定结果是否具有统计学意义，从而得出变化基因主要参与了哪些生物功能。 EASE（Expressing Analysis Systematic Explorer）是比较早的用于芯片功能分析的网络平台。由美国国立卫生研究院（NIH）的研究人员开发。研究者可以用多种不同的格式将芯片中得到的基因导入EASE 进行分析，EASE会找出这一系列的基因都存在于哪些GO分类中。其最主要特点是提供了一些统计学选项以判断得到的GO分类是否符合统计学标准。EASE 能进行的统计学检验主要包括Fisher 精确概率检验，或是对Fisher精确概率检验进行了修饰的EASE 得分（EASE score）。由于进行统计学检验的GO分类的数量很多，所以EASE采取了一系列方法对“多重检验”的结果进行校正。这些方法包括弗朗尼校正法（Bonferroni），本杰明假阳性率法（Benjamini falsediscovery rate）和靴带法（bootstraping）。同年出现的基于GO分类的芯片基因功能分析平台还有底特律韦恩大学开发的Onto-Express。2002年，挪威大学和乌普萨拉大学联合推出的Rosetta 系统将GO分类与基因表达数据相联系，引入了“最小决定法则”（minimal decision rules）的概念。它的基本思想是在对多张芯片结果进行聚类分析之后，与表达模式

基于微阵列的比较基因组分析

微阵列芯片（Microarray）以高密度阵列为特征。其基础研究始于20世纪80年代末，本质上是一种生物技术，主要是在生物遗传学领域发展起来的。微阵列分为cDNA微阵列和寡聚核苷酸微阵列.微阵列上"印"有大量已知部分序列的DNA探针,微阵列技术就是利用分子杂交原理,使同时被比较的标本(用同位素或荧光素标记)与微阵列杂交,通过检测杂交信号强度及数据处理,把他们转化成不同标本中特异基因的丰度,从而全面比较不同标本的基因表达水平的差异.微阵列技术是一种探索基因组功能的有力手段. 其发展契机主要来自于现代遗传学的一些重要发现，并直接收益于该领域的某些重要研究成果，即在载体上固定寡核苷酸的基础上以杂交法测序的技术。因此发展早期，微阵列芯片有时被通俗的称为“生物芯片（Biochip）”，目前媒体和科普读物中仍然常用该名称。微阵列芯片经过近十年的主要发展期，国内外学术界渐渐采用名称Microarray（微阵列芯片），而Biochip（生物芯片）由于这名称容易混淆微阵列芯片和微流控芯片，渐渐该领域用的越来越少了。比较基因组杂交技术比较基因组杂交（comparative genomic hybridization,CGH）是自1992年后发展起来的一种分子细胞遗传学技术，它通过单一的一次杂交可对某一肿瘤整个基因组的染色体拷贝数量的变化进行检查。其基本原理是用不同的荧光染料通过缺口平移法分别标记肿瘤组织和正常细胞或组织的DNA制成探针，并与正常人的间期染色体进行共杂交，以在染色体上显示的肿瘤与正常对照的荧光强度的不同来反映整个肿瘤基因组DNA表达状况的变化，再借助于图像分析技术可对染色体拷贝数量的变化进行定量研究。 CGH技术的优点：1.实验所需DNA样本量较少，做单一的一次杂交即可检查肿瘤整个基因组的染色体拷贝数量的变化。2.此法不仅适用于外周血、培养细胞和新鲜组织样本的研究，还可用于对存档组织的研究，也可用于因DNA量过少而经PCR扩增的样本的研究。CGH技术的局限性：CGH技术所能检测到的最小的DNA扩增或丢失是在3-5Mb,故对于低水平的DNA扩增和小片段的丢失会漏检。此外在相差染色体的拷贝数量无变化时，CGH技术不能检测出平等染色体的易位。

阵列集成微型LED芯片及其制作方法与设计方案

本技术公开了一种阵列集成微型LED芯片及其制作方法，所述芯片包括基板、导电连接层、热固性缓冲层和n个发光结构，所述导电连接层包括第一金属连接层和第二金属连接层，所述第一金属连接层将同一列的第一发光微结构形成导电连接，所述第二金属连接层将同一行的相邻两个第二发光微结构形成导电连接。本技术通过全共阴加双共阳的方式形成微型LED芯片集成式阵列，提高芯片的转移效率和转移良率。权利要求书 1.一种阵列集成微型LED芯片，其特征在于，包括基板、导电连接层、热固性缓冲层和n个发光结构，所述n个发光结构分成x行和y列，n＞3，x＞1，y＞1；每个发光结构均包括第一发光微结构、第二发光微结构和第一隔离槽，所述第一隔离槽设置在第一发光微结构和第二发光微结构之间；在同一行中，相邻两个发光结构之间设有第二隔离槽；所述导电连接层包括第一金属连接层和第二金属连接层，所述第一金属连接层将同一列的第一发光微结构形成导电连接，所述第二金属连接层将同一行的相邻两个第二发光微结构形成导电连接；所述热固性缓冲层填充在基板和发光结构之间，以将发光结构固定在基板上，所述第一金属连接层和第二金属连接层贯穿所述热固性缓冲层和基板，并延伸到基板外。

2.如权利要求1所述的阵列集成微型LED芯片，其特征在于，所述第一发光微结构和第二发光微结构均包括依次设置的第一半导体层、有源层、第二半导体层和反射层，所述第一隔离槽和第二隔离槽均从反射层刻蚀至第一半导体层。 3.如权利要求2所述的阵列集成微型LED芯片，其特征在于，所述第一金属连接层设置在第一发光微结构和第一隔离槽上，以将同一列的第一发光微结构形成导电连接；所述第二金属连接层设置在第二发光微结构和第二隔离槽上，以将同一行的相邻两个第二发光微结构形成导电连接。 4.如权利要求3所述的阵列集成微型LED芯片，其特征在于，所述第二发光微结构还包括钝化层，所述钝化层设置在第二发光微结构的侧壁和第二隔离槽的表面，所述第二金属连接层设置在钝化层上并延伸到第二发光微结构的反射层上将所述钝化层覆盖。 5.如权利要求1所述的阵列集成微型LED芯片，其特征在于，所述热固性缓冲层由热固性材料和有机硅胶制成，或者由热固性材料和环氧树脂制成；所述热固性材料包括酚醛塑料、环氧塑料、氨基塑料、不饱和聚酯和醇酸塑料中的一种或几种。 6.如权利要求1所述的阵列集成微型LED芯片，其特征在于，所述发光结构的出光面设有量子点层和光学隔绝层，所述光学隔绝层设置在两个发光结构之间，以吸收或反射发光结构的侧向光线。 7.如权利要求6所述的阵列集成微型LED芯片，其特征在于，所述光学隔绝层由添加了吸光材料或反光材料的有机硅胶或环氧树脂制成。 8.一种阵列集成微型LED芯片的制作方法，其特征在于，包括：在衬底上形成外延层和反射层，所述外延层包括依次设于衬底上的第一半导体层、有源层和第二半导体层，所述反射层设置在第二半导体层上；

基因芯片数据处理流程与分析介绍

基因芯片数据处理流程与分析介绍关键词：基因芯片数据处理当人类基因体定序计划的重要里程碑完成之后，生命科学正式迈入了一个后基因体时代，基因芯片(microarray) 的出现让研究人员得以宏观的视野来探讨分子机转。不过分析是相当复杂的学问，正因为基因芯片成千上万的信息使得分析数据量庞大，更需要应用到生物统计与生物信息相关软件的协助。要取得一完整的数据结果，除了前端的实验设计与操作的无暇外，如何以精确的分析取得可信数据，运筹帷幄于方寸之间，更是画龙点睛的关键。基因芯片的应用基因芯片可以同时针对生物体内数以千计的基因进行表现量分析，对于科学研究者而言，不论是细胞的生命周期、生化调控路径、蛋白质交互作用关系等等研究，或是药物研发中对于药物作用目标基因的筛选，到临床的疾病诊断预测，都为基因芯片可以发挥功用的范畴。基因表现图谱抓取了时间点当下所有的动态基因表现情形，将所有的探针所代表的基因与荧光强度转换成基本数据(raw data) 后，仿如尚未解密前的达文西密码，隐藏的奥秘由丝丝的线索串联绵延，有待专家抽丝剥茧，如剥洋葱般从外而内层层解析出数千数万数据下的隐晦含义。要获得有意义的分析结果，恐怕不能如泼墨画般洒脱随兴所致。从raw data 取得后，需要一连贯的分析流程(图一)，经过许多统计方法，才能条清理明的将raw data 整理出一初步的分析数据，当处理到取得实验组除以对照组的对数值后(log2 ratio)，大约完成初步的统计工作，可进展到下一步的进阶分析阶段。

图一、整体分析流程。基本上raw data 取得后，将经过从最上到下的一连串分析流程。(1) Rosetta 软件会透过统计的model，给予不同的权重来评估数据的可信度，譬如一些实验操作的误差或是样品制备与处理上的瑕疵等，可已经过Rosetta error model 的修正而提高数据的可信值；(2) 移除重复出现的探针数据；(3) 移除flagged 数据，并以中位数对荧光强度的数据进行标准化(Normalized) 的校正；(4) Pearson correlation coefficient (得到R 值) 目的在比较技术性重复下的相似性，R 值越高表示两芯片结果越近似。当R 值超过0.975，我们才将此次的实验结果视为可信，才继续后面的分析流程；(5) 将技术性重复芯片间的数据进行平均，取得一平均之后的数据；(6) 将实验组除以对照组的荧光表现强度差异数据，取对数值(log2 ratio) 进行计算。找寻差异表现基因实验组与对照组比较后的数据，最重要的就是要找出显著的差异表现基因，因为这些正是条件改变后而受到调控的目标基因，透过差异表现基因的加以分析，背后所隐藏的生物意义才能如拨云见日般的被发掘出来。一般根据以下两种条件来筛选出差异表现基因：(i) 荧光表现强度差异达2 倍变化(fold change 增加2 倍或减少2倍) 的基因。而我们通常会取对数(log2) 来做fold change 数值的转换，所以看的是log2 ≧1 或≦-1 的差异表现基因；(ii) 显著值低于0.05 (p 值< 0.05) 的基因。当这两种条件都符合的情况下所交集出来的基因群，才是显著性高且稳定的差异表现基因。

矩阵理论知识点整理资料

三、矩阵的若方标准型及分解 λ-矩阵及其标准型定理1 λ-矩阵()λ A可逆的充分必要条件是行列式()λ A是非零常数引理2 λ-矩阵()λ A=() () n m ij? λ a的左上角元素()λ 11 a不为0，并且()λ A中至少有一个元素不能被它整除，那么一定可以找到一个与()λ A等价的()() () n m ij? =λ λb B使得()0 b 11 ≠ λ且 ()λ 11 b的次数小于()λ 11 a的次数。引理3 任何非零的λ-矩阵()λ A=() () n m ij? λ a等价于对角阵 () () () ? ? ? ? ? ? ? ? ? ? ? ? ... ..... d 2 1 λ λ λ r d d ()()()λ λ λ r 2 1 d ,.... d, d是首项系数为1的多项式，且 ()()1 ...... 3,2,,1 , / d 1 - = + r i d i i λ λ 引理4 等价的λ-矩阵有相同的秩和相同的各阶行列式因子推论5 λ-矩阵的施密斯标准型是唯一的由施密斯标准型可以得到行列式因子推论6 两个λ-矩阵等价，当且仅当它们有相同的行列式因子，或者相同的不变因子推论7 λ-矩阵()λ A可逆，当且仅当它可以表示为初等矩阵的乘积推论8 两个()()λ λ λB A m与矩阵的- ?n等价当且仅当存在一个m阶的可逆λ-矩阵()λ P和一个n阶的λ-矩阵()λ Q使得()()()()λ λ λ λQ A P = B 推论9 两个λ-矩阵等价，当且仅当它们有相同的初等因子和相同的秩

定理10 设λ-矩阵()λA 等价于对角型λ-矩阵()() ()()?????? ?? ? ???????? ?=λλλλn h h . . . ..21h B ，若将()λB 的次数大于1的对角线元素分解为不同的一次因式的方幂的乘积，则所有这些一次因式的方幂（相同的按照重复的次数计算）就是()λA 的全部初等因子。行列式因子不变因子初等因子初等因子被不变因子唯一确定但，只要λ-矩阵()λA 化为对角阵，再将次数大于等于1的对角线元素分解为不同的一次方幂的乘积，则所有这些一次因式的方幂（相同的必须重复计算）就为()λA 的全部初等因子，即不必事先知道不变因子，可以直接求得初等因子。矩阵的若当标准型定理1 两个n ?m 阶数字矩阵A 和B 相似，当且仅当它们的特征矩阵B -E A -E λλ与等价 N 阶数字矩阵的特征矩阵A -E λ的秩一定是n 因此它的不变因子有n 个，且乘积是A 的特征多项式推论3 两个同阶矩阵相似，当且仅当它们有相同的行列式因子，或相同的不变因子，或相同的初等因子。定理4 每个n 阶复矩阵A 都与一个若当标准型矩阵相似，这个若当标准型矩阵除去其中若当块的排列次序外是被矩阵A 唯一确定的。求解若当标准型及可逆矩阵Ｐ：根据数字矩阵写出特征矩阵，化为对角阵后，得出初等因子，根据初等因子，写出若当标准型Ｊ，设Ｐ（Ｘ１Ｘ２Ｘ３），然后根据 J X X X X X X A PJ AP J AP P 321321-1），，（），，（，即得到===得到 P （X1X2X3）方阵矩阵的最小多项式定理1 矩阵A 的最小多项式整除A 的任何零化多项式，且最小多项式唯一。 N 阶数字矩阵可以相似对角化，当且仅当最小多项式无重根。定理2 矩阵A 的最小多项式的根一定是A 的特征值，反之，矩阵Ａ的特征值一定是最小多项式的根。求最小多项式：根据数字矩阵写出特征多项式()A E f -=λλ，根据特征多项式得到最小多

基于 DNA 微阵列的基因表达数据管理和分析

基于DNA微阵列的基因表达数据管理和分析 029129 谢建明 2002年10月摘要：DNA微阵列是生命科学研究的重要工具，在疾病诊断、药物开发等领域得到了广泛应用。在应用过程中，产生了大量的数据，这些数据的存储、分发和数据挖掘成为DNA微阵列能被推广应用的关键技术。本论文简单介绍了这两方面的研究现状。关键词：DNA微阵列数据挖掘数据仓库标准基因表达分析一、引言 DNA微阵列（DNA microarray）,也叫基因芯片，是近几年发展起来的一种能快速、高效检测DNA片段序列、基因型及其多态性或基因表达水平的新技术。它将几十个到上百万个不等的称之为探针的核苷酸序列固定在微小的（约1cm2）玻璃或硅片等固体基片或膜上，该固定有探阵的基片就称之为DNA微阵列。它利用核苷酸分子在形成双链时遵循碱基互补原则，可以检测出样本中与探阵阵列中互补的核苷酸片段，从而得到样本中关于基因结构和表达的信息。它的技术来源追溯到一个多世纪之前，Ed Southern发现被标记的核酸分子能够与另一被固化的核酸分子配对杂交。因此，Southern blot可被看做是最早的基因芯片。在八十年代，Bains W.等人就将短的DNA片断固定到支持物上，借助杂交方式进行序列测定。1995年，斯坦福大学开发出第一片cDNA芯片并用于生命科学研究，1998年美国Affymetrix公司将第一片带有13.5万个基因探阵的寡聚核苷酸芯片推向市场，标志着DNA微阵列的产业化，从此基因芯片或DNA微阵列的研究和应用得到了广泛的重视，可以说在生命科学研究界和产业界掀起了基因芯片热潮，1999年Nature出专刊介绍这门基因芯片及其应用。基因芯片可用于DNA序列的再测序、基因SNP或多态性检测和基因表达分析。由于基因芯片技术是一种高通量检测技术，它可是并行的同时检测成百上千，甚至成千上万个基因的活动情况或DNA片段，改变了传统的每次只能检测一个基因的情况，因此能大大提高检测效率，降低检测成本，并保证了检测质量。基因芯片技术可广泛应用于疾病诊断和治疗、药物筛选、农作物的优育优选、司法鉴定、食品卫生监督、环境检测、国防、航天等许多领域。它将为人类认识生命的起源、遗传、发育与进化、为人类疾病的诊断、治疗和防治开辟全新的途径，为生物大分子的全新设计和药物开发中先导化合物的快速筛选和药物基因组学研究提供技术支撑平台。通过基因表达谱的研究可以进行进一步的理论研究或应用研究。 1、理论研究。根据基因组基因表达谱可以进一步分析共表达基因是否存在共同的顺式调控元件，发现新的调控元件。此外，可以研究基因的调控规律，构建调控网络。 2、应用研究包括疾病诊断和药物开发。根据不同疾病状态下的差异表达谱的研究可以确定疾病的类型和进展。研究药物作用后基因表达谱的改变可以确定药物的毒性、预后和疗效，从而指导药物开发和临床合理用药。在基于DNA微阵列的基因表达分析研究中，数据的分析和管理是一个关键性的问题，它直接影响了实验结果的准确型和实验的可靠性。

矩阵秩重要知识点总结_考研必看

一．矩阵等价行等价：矩阵A 经若干次初等行变换变为矩阵B 列等价：矩阵A 经若干次初等列变换变为矩阵B 矩阵等价:矩阵A 经若干次初等行变换可以变为矩阵B ，矩阵B 经若干次初等行变换可以变成矩阵A ，则成矩阵A 和B 等价矩阵等价的充要条件 1. 存在可逆矩阵P 和Q,PAQ=B 2. R(A)=R(B) 二．向量的线性表示 Case1：向量b r 能由向量组A 线性表示: 充要条件: 1.线性方程组A x r =b 有解 (A)=R(A,b) Case2：向量组B 能由向量组A 线性表示充要条件： R(A)=R(A,B) 推论 ∵R(A)=R(A,B),R(B) ≤R(A,B) ∴R(B) ≤R(A) Case3：向量组A 能由向量组B 线性表示充要条件： R(B)=R(B,A) 推论 ∵R(B)=R(A,B),R(A) ≤R(A,B) ∴R(A) ≤R(B) Case4：向量组A 和B 能相互表示，即向量组A 和向量组B 等价充要条件: R(A)=R(B)=R(A,B)=R(B,A) Case5：n 维单位坐标向量组能由矩阵A 的列向量组线性表示充要条件是: R(A)=R(A,E)

n=R(E)<=R(A),又R(A)>=n ，所以R(A)=n=R(A,E) 三．线性方程组的解 1. 非齐次线性方程组（1） R(A)=R(A,B),方程有解. （2） R(A)=R(A,B)=n ，解唯一. （3） R(A)=R(A,B)

线性代数知识点总结

线性代数知识点总结第一章行列式（一）要点 1、二阶、三阶行列式 2、全排列和逆序数，奇偶排列（可以不介绍对换及有关定理），n 阶行列式的定义 3、行列式的性质 4、 n 阶行列式 ^a i j ，元素a j 的余子式和代数余子式，行列式按行（列）展开定理 5、克莱姆法则（二）基本要求 1 、理解n 阶行列式的定义 2、掌握n 阶行列式的性质 3 、会用定义判定行列式中项的符号 4、理解和掌握行列式按行（列）展开的计算方法，即 a 1i A Ij ' a 2i A 2 j ' a ni A nj ^ 5、会用行列式的性质简化行列式的计算，并掌握几个基本方法：归化为上三角或下三角行列式，各行（列）元素之和等于同一个常数的行列式，利用展开式计算 6、掌握应用克莱姆法则的条件及结论会用克莱姆法则解低阶的线性方程组 7、了解n 个方程n 个未知量的齐次线性方程组有非零解的充要条件第二章矩阵（一）要点 1、矩阵的概念 m n 矩阵A =（a j ）mn 是一个矩阵表。当 m =n 时，称A 为n 阶矩阵，此时由 A 的元素按原来排列的形式构成的 n 阶行列式，称为矩阵 A 的行列式，记为 A . 注：矩阵和行列式是两个完全不同的两个概念。 2、几种特殊的矩阵：对角阵；数量阵；单位阵；三角形矩阵；对称矩阵 a i 1A j 1 ■ a i2A j 2 ? a in A jn = 〔 D '

3、矩阵的运算；矩阵的加减法；数与矩阵的乘法；矩阵的转置；矩阵的乘法 (1矩阵的乘法不满足交换律和消去律，两个非零矩阵相乘可能是零矩阵。如果两矩阵A与B相乘，有AB = BA ,则称矩阵A与B可换。注：矩阵乘积不一定符合交换 (2)方阵的幕：对于n阶矩阵A及自然数k， A k=A A A , 1 k个规定A° = I ,其中I为单位阵. (3) 设多项式函数(J^a^ k?a1?k^l Z-心律??a k，A为方阵，矩阵A的多项式(A) = a0A k?a1A k' …-?-a k jA ■ a k I ,其中I 为单位阵。 (4)n阶矩阵A和B ,贝U AB=IAB . (5)n 阶矩阵A ,则∣∕Λ =λn A 4、分块矩阵及其运算 5、逆矩阵：可逆矩阵(若矩阵A可逆，则其逆矩阵是唯一的)；矩阵A的伴随矩阵记 * 为A , AA* = A*A = AE 矩阵可逆的充要条件；逆矩阵的性质。 6、矩阵的初等变换：初等变换与初等矩阵；初等变换和初等矩阵的关系；矩阵在等价意义下的标准形；矩阵A可逆的又一充分必要条件：A可以表示成一些初等矩阵的乘积；用初等变换求逆矩阵。 7、矩阵的秩：矩阵的k阶子式；矩阵秩的概念；用初等变换求矩阵的秩 8、矩阵的等价 (二)要求 1、理解矩阵的概念；矩阵的元素；矩阵的相等；矩阵的记号等 2、了解几种特殊的矩阵及其性质 3、掌握矩阵的乘法；数与矩阵的乘法；矩阵的加减法；矩阵的转置等运算及性质 4、理解和掌握逆矩阵的概念；矩阵可逆的充分条件；伴随矩阵和逆矩阵的关系；当A 可逆时，会用伴随矩阵求逆矩阵 5、了解分块矩阵及其运算的方法 (1)在对矩阵的分法符合分块矩阵运算规则的条件下，其分块矩阵的运算在形式上与不分块矩阵的运算是一致的。 (2)特殊分法的分块矩阵的乘法，例如A m n, B nl,将矩

个体化医学检测微阵列基因芯片技术规范

微阵列基因芯片是基于DNA分子杂交技术原理研制，通过探针结合碱基互补序列的单链核酸，从而确定其相应序列来识别基因或其产物。能够同时快速检测多个基因及其多个位点，在多态性分析、突变分析、基因表达谱测定及杂交测序等多领域具有广泛应用价值。临床诊断技术使用的微阵列基因芯片，可快速鉴定病原体、检测遗传突变及基因表达，更早更方便的检测肿瘤基因标志，检测药物反应和代谢相关基因多态性来指导临床个体化治疗。本规范旨在对个体化医学检测中采用微阵列基因芯片检测核酸序列以及基因表达进行一般性技术指导，不包括行政审批要求。本规范由全国生物芯片标准化技术委员会（SAC/TC 421）提出。本规范起草单位：全国生物芯片标准化技术委员会、清华大学医学院、生物芯片北京国家工程研究中心、北京博奥医学检验所。本规范起草人：项光新、李元源、王辉、邓涛、孙义民、张治位、张川、邢婉丽、程京。

1.适用范围 (1) 2.声明/警告 (1) 3.术语和定义 (1) 4.样本处理 (2) 4.1样本类型 (2) 4.2样本采集、运输与保存 (3) 4.3样本质量保证 (3) 4.4样本信息保存 (3) 5.检测各步骤分述 (4) 5.1核酸分离 (4) 5.2核酸定量（如适用） (4) 5.3核酸扩增和标记 (4) 5.4芯片杂交 (5) 5.5信号采集和数据分析 (5) 6.结果报告 (5) 7.质量控制 (5) 8.注意事项 (6) 9.参考文献 (6)

1.适用范围本规范适用于医疗机构开展微阵列基因芯片个体化医学检测服务。检测服务需遵循国家卫生主管部门或各专业协会发布的疾病诊疗指南或国家卫生计生委医政医管局个体化医学检测技术专家委员会发布的个体化医学检测指南。 2.声明/警告本规范所称微阵列基因芯片诊断技术是指从医疗机构获得的临床样本中，提取核酸（DNA或RNA），进行必要的扩增和标记，标记后的靶标与基因芯片进行分子杂交，通过基因芯片扫描仪器获得基因芯片杂交的图像与数据，经计算机程序分析，并给出检测报告的全过程。 3.术语和定义（1）聚合酶链反应polymerase chain reaction（PCR）聚合酶链反应或多聚酶链反应是一种对特定的DNA或RNA片段在体外进行快速扩增的方法。（2）杂交hybridization 具有一定同源序列的两条核酸单链（DNA或RNA）可以通过氢键的方式，按碱基互补配对原则相结合。（3）突变mutation 是细胞中DNA核苷酸序列发生了稳定的可遗传的改变。（4）点重复spot replicates 每种探针在芯片上每个阵列中的重复次数。（5）探针probe

线性代数知识点归纳,超详细

线性代数复习要点第一部分行列式 1. 排列的逆序数 2. 行列式按行（列）展开法则 3. 行列式的性质及行列式的计算行列式的定义 1.行列式的计算： ①(定义法) ②（降阶法）行列式按行（列）展开定理：行列式等于它的任一行（列）的各元素与其对应的代数余子式的乘积之和. 推论：行列式某一行（列）的元素与另一行（列）的对应元素的代数余子式乘积之和等于零.

③(化为三角型行列式)上三角、下三角、主对角行列式等于主对角线上元素的乘积. ④若都是方阵（不必同阶）,则 ⑤关于副对角线： ⑥范德蒙德行列式：证明用从第n行开始，自下而上依次的由下一行减去它上一行的倍，按第一列展开，重复上述操作即可。 ⑦型公式： ⑧(升阶法)在原行列式中增加一行一列，保持原行列式不变的方法. ⑨(递推公式法) 对阶行列式找出与或,之间的一种关系——称为递推公式，其中 ,,等结构相同，再由递推公式求出的方法称为递推公式法. (拆分法) 把某一行（或列）的元素写成两数和的形式，再利用行列式的性质将原行列式写成两行列式之和，使问题简化以例计算. ⑩(数学归纳法) 2. 对于阶行列式，恒有：，其中为阶主子式；

3. 证明的方法： ①、； ②、反证法； ③、构造齐次方程组，证明其有非零解； ④、利用秩，证明； ⑤、证明0是其特征值. 4. 代数余子式和余子式的关系：第二部分矩阵 1.矩阵的运算性质 2.矩阵求逆 3.矩阵的秩的性质 4.矩阵方程的求解 1.矩阵的定义由个数排成的行列的表称为矩阵. 记作：或 ①同型矩阵：两个矩阵的行数相等、列数也相等. ②矩阵相等: 两个矩阵同型，且对应元素相等. ③矩阵运算 a. 矩阵加（减）法：两个同型矩阵，对应元素相加（减）. b. 数与矩阵相乘：数与矩阵的乘积记作或，规定为. c. 矩阵与矩阵相乘：设, ,则，其中注：矩阵乘法不满足：交换律、消去律, 即公式不成立.

微阵列资料分析(Microarray Data Analysis)

微陣列資料分析(Microarray Data Analysis) 蔡政安副教授前言在人類基因體定序計劃的重要里程碑陸續完成之後，生命科學邁入了一個前所未有的新時代，在人類染色體總長度約三十億個鹼基對中，約含有四萬個基因，這是生物學家首次以這麼宏觀的視野來檢視生命現象，而醫藥上的研究方針亦從此改觀，科學研究從此正式進入後基因體時代。微陣列實驗(Microarray) 及其它高產能檢測(high-throughput screen) 技術的興起，無疑將成為本世紀的主流；微陣列實驗主要的優勢再於能同時大量地、全面性地偵測上萬個基因表現量，透過基因晶片，可在短時間內找出可能受疾病影響基因，作為早期診斷的生物指標(biomarker)。然而，由於這一類技術的高度自動化、規模化及微型化的特性，使得他們所生成的資料量非常龐大且資料型態比一般實驗數據更加複雜，因此，傳統統計分析方法已經不敷使用。在此同時，統計學家並未在此重要時刻缺席，提出非常多新的統計理論和方法來分析微陣列實驗資料，也廣受生物學家所使用。由於微陣列資料分析所牽涉的統計問題層面相當廣且深入，本文僅針對整個實驗中所衍生的統計問題加以介紹，並介紹其中一些新的圖形工具用以呈現分析結果。基因晶片的原理微陣列晶片即一般所謂的基因晶片，也是基因體計畫完成後衍生出來的產品，花費成本雖高，但效用無限，是目前所有生物晶片中應用最廣的，由於近年來不斷改進，也是最有成效的生物技術。一般而言，基因晶片是利用微處理技術，先把人類所有的基因分別固著在一小範圍的玻璃片(glass slide)、薄膜(membrane)或者矽晶片上；然後，可以平行地、大量地、全面性地偵測基因體中mRNA的量，也就是偵測基因的調控及相互作用表現。目前微陣列晶片大致分為以下兩種平台(如圖一) : cDNA 晶片及高密度寡核甘酸晶片(high-density oligonucleotide)，兩種系統無論在晶片的製程及樣本處理上皆有相當的差異，因此在分析上也略有不同，以下便就晶片的特性約略介紹。 1.cDNA 晶片: 基本上晶片上的探針(probes)及準備進行雜合反應(hybridization) 的樣本(Targets)皆來自於cDNA。正常及癌組織中萃取的mRNA經反轉錄後，分別標上綠色(Cy3)和紅色(Cy5)螢光標記，並同時和晶片進行雜合反應，反