Genomic Signal Processing


Genomic signal processing (GSP) is the engineering discipline that studies the processing of genomic signals. The theory of signal processing is utilized in both structural and functional understanding. The aim of GSP is to integrate the theory and methods of signal processing with the global understanding of functional genomics, with special emphasis on genomic regulation.

Gene prediction typically refers to the area of computational biology that is concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes, but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
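
As a rough illustration of what "algorithmically identifying" a candidate protein-coding stretch can mean, the sketch below scans the forward strand of a DNA string for open reading frames (an ATG start codon followed in-frame by a stop codon). This is a deliberately naive stand-in, not a method described in this text: the function name `find_orfs`, the `min_codons` cutoff, and the toy sequence are all assumptions made for the example, and practical gene finders rely on much richer statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here is an ATG followed in the same reading frame by the
    first stop codon; `end` is exclusive and includes the stop codon.
    `min_codons` (an illustrative cutoff) counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Toy sequence (made up for illustration): one short ORF in frame 2.
toy = "CCATGAAATTTGGGTAACC"
orfs = find_orfs(toy)
```

For the toy sequence, the single reported interval covers `ATGAAATTTGGGTAA`; the reverse strand is ignored to keep the sketch short.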

Genomic signal processing (GSP) is the engineering discipline that studies the processing of genomic signals. Owing to the major role played in genomics by transcriptional signaling and the related pathway modeling, it is only natural that the theory of signal processing should be utilized in both structural and functional understanding. The aim of GSP is to integrate the theory and methods of signal processing with the global understanding of functional genomics, with special emphasis on genomic regulation. Hence, GSP encompasses various methodologies concerning expression profiles: detection, prediction, classification, control, and statistical and dynamical modeling of gene networks. GSP is a fundamental discipline that brings to genomics the structural model-based analysis and synthesis that form the basis of mathematically rigorous engineering.

Application is generally directed towards tissue classification and the discovery of signaling pathways, both based on the expressed macromolecule phenotype of the cell. Accomplishment of these aims requires a host of signal processing approaches. These include signal representation relevant to transcription, such as wavelet decomposition and more general decompositions of stochastic time series, and system modeling using nonlinear dynamical systems. The kind of correlation-based analysis commonly used for understanding pairwise relations between genes or cellular effects cannot capture the complex network of nonlinear information processing based upon multivariate inputs from inside and outside the genome. Regulatory models require the kind of nonlinear dynamics studied in signal processing and control, and in particular the use of stochastic dataflow networks common to distributed computer systems with stochastic inputs. This is not to say that existing model systems suffice. Genomics requires its own model systems, not simply straightforward adaptations of currently formulated models. New systems must capture the specific biological mechanisms of operation and distributed regulation at work within the genome. It is necessary to develop appropriate mathematical theory, including optimization, for the kinds of external controls required for therapeutic intervention as well as approximation theory to arrive at nonlinear dynamical models that are sufficiently complex to adequately represent genomic regulation for diagnosis and therapy while not being overly complex for the amounts of data experimentally feasible or for the computational limits of existing computer hardware.
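
The limitation of pairwise correlation noted above can be made concrete with a small, entirely hypothetical example. Assume two binary regulator genes `g1` and `g2` and a target gene that is expressed exactly when one regulator, but not both, is on (an XOR rule chosen only for illustration). The sketch below shows that the Pearson correlation between the target and each regulator taken alone is zero, even though the two regulators jointly determine the target without error.

```python
from itertools import product
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical data: every on/off combination of two regulators, repeated.
states = list(product([0, 1], repeat=2)) * 5
g1 = [a for a, _ in states]
g2 = [b for _, b in states]
target = [a ^ b for a, b in states]  # assumed XOR regulation rule

# Pairwise view: each regulator alone looks unrelated to the target.
r1 = pearson(g1, target)
r2 = pearson(g2, target)

# Multivariate view: the pair (g1, g2) fixes the target value exactly.
lookup = {}
jointly_determined = all(
    lookup.setdefault((a, b), t) == t for a, b, t in zip(g1, g2, target)
)
```

Here `r1` and `r2` are both zero while `jointly_determined` is true, which is the kind of multivariate, nonlinear dependence that a purely pairwise analysis misses.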

A cell relies on its protein components for a wide variety of its functions, including energy production, biosynthesis of component macromolecules, maintenance of cellular architecture, and the ability to act upon intra- and extra-cellular stimuli. Each cell in an organism contains the information necessary to produce the entire repertoire of proteins the organism can specify. Since a cell’s specific functionality is largely determined by the genes it is expressing, it is logical that transcription, the first step in the process of converting the genetic information stored in an organism’s genome into protein, would be highly regulated by the control network that coordinates and directs cellular activity. A primary means for regulating cellular activity is the control of protein production via the amounts of mRNA expressed by individual genes. The tools to build an understanding of genomic regulation of expression will involve the characterization of these expression levels. Microarray technology, both cDNA and oligonucleotide, provides a powerful analytic tool for genetic research. Since our concern in this paper is to articulate the salient issues for GSP, and not to delve deeply into microarray technology, we confine our brief discussion to cDNA microarrays.

Complementary DNA microarray technology combines robotic spotting of small amounts of individual, pure nucleic acid species on a glass surface, hybridization to this array with multiple fluorescently labeled nucleic acids, and detection and quantitation of the resulting fluor-tagged hybrids by a scanning confocal microscope. A basic application is quantitative analysis of fluorescence signals representing the relative abundance of mRNA from distinct tissue samples. Complementary DNA microarrays are prepared by printing thousands of cDNAs in an array format on glass microscope slides, which provide gene-specific hybridization targets. Distinct mRNA samples can be labeled with different fluors and then co-hybridized onto each arrayed gene. Ratios (or sometimes the direct intensity measurements) of gene expression levels between the samples can be used to detect meaningfully different expression levels between the samples for a given gene. Given an experimental design with multiple tissue samples, microarray data can be used to cluster genes based on expression profiles, to characterize and classify disease based on the expression levels of gene sets, and for other signal processing tasks.
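
The ratio-based comparison described above can be sketched in a few lines. The example assumes that background-corrected intensities from the two fluor channels are already available per gene, takes the base-2 logarithm of their ratio, and flags genes whose expression differs by at least two-fold. The gene names, intensity values, function names, and the two-fold threshold are all illustrative assumptions; real analyses also need normalization and a statistical treatment of measurement noise, which this sketch omits.

```python
import math

def log_ratios(red, green):
    """Per-gene log2(red / green) for two dicts of channel intensities."""
    return {gene: math.log2(red[gene] / green[gene]) for gene in red}

def flag_differential(ratios, threshold=1.0):
    """Genes with |log2 ratio| >= threshold (1.0 means two-fold)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

# Made-up intensities for three hypothetical genes in two samples.
red = {"geneA": 800.0, "geneB": 210.0, "geneC": 100.0}
green = {"geneA": 200.0, "geneB": 200.0, "geneC": 400.0}

ratios = log_ratios(red, green)
flagged = flag_differential(ratios)
```

With these toy values, `geneA` is four-fold higher in the red channel, `geneC` is four-fold lower, and `geneB` is essentially unchanged, so only `geneA` and `geneC` are flagged. Working with log ratios makes up- and down-regulation of equal magnitude symmetric around zero.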