Gene prediction typically refers to the area of computational biology that is concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes, but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
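As a toy illustration of what "algorithmically identifying" a protein-coding stretch can mean, the sketch below scans the forward strand for open reading frames (ORFs): runs from an ATG start codon to the first in-frame stop codon. It is a naive heuristic, not a real gene predictor. The standard-code start and stop codons, the forward-strand-only scan, the `min_codons` cutoff, and the function name `find_orfs` are all assumptions made for illustration.

```python
"""Naive open-reading-frame (ORF) scan: a toy illustration of gene finding.

Assumptions (not from the text): standard genetic code start/stop codons,
forward strand only, and an arbitrary minimum ORF length in codons.
"""

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return sorted (start, end) half-open index pairs of forward-strand ORFs.

    Each ORF runs from the first ATG after the previous in-frame stop
    up to and including the next in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                # Keep only ORFs that reach the (hypothetical) length cutoff.
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTGGGTAACC", min_codons=2)` returns `[(2, 17)]`: an ORF that starts at the ATG at index 2 and ends after the in-frame TAA stop.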
Genomic signal processing (GSP) is the engineering discipline that studies the processing of genomic signals. Owing to the major role played in genomics by transcriptional signaling and the related pathway modeling, it is only natural that the theory of signal processing should be applied to both structural and functional understanding of the genome. The aim of GSP is to integrate the theory and methods of signal processing with the global understanding of functional genomics, with special emphasis on genomic regulation. Hence, GSP encompasses various methodologies concerning expression profiles: detection, prediction, classification, control, and statistical and dynamical modeling of gene networks. GSP is a fundamental discipline that brings to genomics the structural model-based analysis and synthesis that form the basis of mathematically rigorous engineering.
Application is generally directed towards tissue classification and the discovery of signaling pathways, both based on the expressed macromolecule phenotype of the cell. Accomplishing these aims requires a host of signal processing approaches. These include signal representation relevant to transcription, such as wavelet decomposition and more general decompositions of stochastic time series, and system modeling using nonlinear dynamical systems. The kind of correlation-based analysis commonly used for understanding pairwise relations between genes or cellular effects cannot capture the complex network of nonlinear information processing based upon multivariate inputs from inside and outside the genome. Regulatory models require the kind of nonlinear dynamics studied in signal processing and control, and in particular the use of stochastic dataflow networks common to distributed computer systems with stochastic inputs. This is not to say that existing model systems suffice. Genomics requires its own model systems, not simply straightforward adaptations of currently formulated models. New systems must capture the specific biological mechanisms of operation and distributed regulation at work within the genome. It is necessary to develop appropriate mathematical theory, including optimization, for the kinds of external controls required for therapeutic intervention. Approximation theory is also needed to arrive at nonlinear dynamical models that are complex enough to represent genomic regulation adequately for diagnosis and therapy, yet not so complex that they outstrip the amounts of data that can feasibly be gathered experimentally or the computational limits of existing computer hardware.
A cell relies on its protein components for a wide variety of its functions, including energy production, biosynthesis of component macromolecules, maintenance of cellular architecture, and the ability to act upon intra- and extracellular stimuli. Each cell in an organism contains the information necessary to produce the entire repertoire of proteins the organism can specify. Since a cell's specific functionality is largely determined by the genes it is expressing, it is logical that transcription, the first step in the process of converting the genetic information stored in an organism's genome into protein, would be highly regulated by the control network that coordinates and directs cellular activity. A primary means for regulating cellular activity is the control of protein production via the amounts of mRNA expressed by individual genes. Building an understanding of the genomic regulation of expression will therefore require characterizing these expression levels. Microarray technology, both cDNA and oligonucleotide, provides a powerful analytic tool for genetic research. Since our concern in this paper is to articulate the salient issues for GSP, and not to delve deeply into microarray technology, we confine our brief discussion to cDNA microarrays.
Complementary DNA microarray technology combines robotic spotting of small amounts of individual, pure nucleic acid species on a glass surface, hybridization to this array with multiple fluorescently labeled nucleic acids, and detection and quantitation of the resulting fluor-tagged hybrids by a scanning confocal microscope. A basic application is quantitative analysis of fluorescence signals representing the relative abundance of mRNA from distinct tissue samples.
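A minimal sketch of turning two fluorescence readings into a relative-abundance measure, assuming a two-channel scan in which each spot yields one "red" and one "green" intensity (one per labeled sample) and a single constant background level per channel. The function `log_ratios` and its `floor` parameter are hypothetical; practical analyses usually refine both the background estimate and the normalization, which are omitted here. Taking log2 of the ratio makes up- and down-regulation symmetric: a 2-fold increase gives +1 and a 2-fold decrease gives -1.

```python
import math


def log_ratios(red, green, red_bg=0.0, green_bg=0.0, floor=1.0):
    """Per-spot log2(red / green) after subtracting a constant background.

    Assumptions: one constant background value per channel, and
    background-corrected intensities floored at `floor` so the ratio
    and its logarithm stay defined for dim spots.
    """
    out = []
    for r, g in zip(red, green):
        r_corr = max(r - red_bg, floor)    # background-corrected red channel
        g_corr = max(g - green_bg, floor)  # background-corrected green channel
        out.append(math.log2(r_corr / g_corr))
    return out
```

For example, `log_ratios([2000, 500, 1000], [500, 2000, 1000])` returns `[2.0, -2.0, 0.0]`: a 4-fold excess in the red sample, a 4-fold excess in the green sample, and equal abundance.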
Complementary DNA microarrays are prepared by printing thousands of cDNAs in an array format on glass microscope slides, which provide gene-specific hybridization targets. Distinct mRNA samples can be labeled with different fluors and then co-hybridized onto each arrayed gene. Ratios (or sometimes the direct intensity measurements) of gene expression levels between the samples can be used to detect whether a given gene's expression differs meaningfully between them. Given an experimental design with multiple tissue samples, microarray data can be used to cluster genes based on expression profiles, to characterize and classify disease based on the expression levels of gene sets, and to perform other signal processing tasks.
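Clustering genes by expression profile can be sketched with Lloyd's k-means, which is only one of many possible clustering methods; the text does not prescribe one. Each gene is represented as a row of expression values (for example, log ratios) across several tissue samples. Euclidean distance, initialization from the first `k` profiles, the function name `kmeans`, and the example profiles are all assumptions made for brevity; correlation-based distances and randomized or repeated initialization are common alternatives.

```python
import math


def kmeans(profiles, k, iters=100):
    """Lloyd's k-means on expression profiles (one list of values per gene).

    Assumptions: Euclidean distance, the first k profiles as initial
    centroids, and k <= number of genes. A cluster that becomes empty
    simply keeps its previous centroid.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log-ratio profiles for four genes across four samples.
profiles = [
    [2.0, 1.8, 2.1, 1.9],
    [-1.5, -1.7, -1.4, -1.6],
    [1.9, 2.2, 2.0, 1.7],
    [-1.6, -1.4, -1.8, -1.5],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

On these made-up profiles the consistently induced genes (0 and 2) land in one cluster and the consistently repressed genes (1 and 3) in the other, so the script prints `[0, 1, 0, 1]`.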