Data Mining and Knowledge Discovery in Proton Nuclear Magnetic Resonance (1H-NMR) Spectra using Frequency to Information Transformation (FIT)

Rapid recent progress in structural genomics and bioinformatics has underscored the need for effective methods of data mining and knowledge extraction from complex, convoluted signals.

In this paper we introduce frequency to information transformation (FIT), a novel method for extracting the information content of complex signals, and compare it with established methods used in automated conditioning and knowledge discovery in proton nuclear magnetic resonance (1H-NMR) spectra. Because FIT uses a priori knowledge and is a comparative technique, it is well suited to data mining and knowledge discovery from complex data. FIT was applied to a collection of 80 one-dimensional (1D) 1H-NMR spectra of 23 N-linked oligosaccharides.

Three classification methods, namely cluster analysis, Bayesian analysis, and artificial neural networks (ANN), were used to demonstrate the advantages of FIT in information and knowledge extraction over classical methods such as frequency-based filtering, nonlinear and piecewise linear curve fitting, and correlation coefficient analysis.