Bayesian Networks

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

Bayesian Networks are becoming an increasingly important area of research and application across the field of Artificial Intelligence. This paper explores the nature and implications of Bayesian Networks, beginning with an overview and comparison of inferential statistics and Bayes' Theorem. The nature, relevance and applicability of Bayesian Network theory for issues of advanced computability form the core of the current discussion. A number of current applications using Bayesian networks are examined. The paper concludes with a brief discussion of the appropriateness and limitations of Bayesian Networks for human-computer interaction and automated learning.

Inferential statistics is a branch of statistics that attempts to make valid predictions based on only a sample of all possible observations [1]. For example, imagine a bag of 10,000 marbles. Some are black and some are white, but the exact proportion of these colours is unknown. It is unnecessary to count all the marbles in order to make some statement about this proportion. A randomly acquired sample of 1,000 marbles may be sufficient to make an inference about the proportion of black and white marbles in the entire population. If 40% of our sample are white, then we may infer that about 40% of the population are also white.

To the layperson, this process seems rather straightforward. In fact, it might seem that there is no need to even acquire a sample of 1,000 marbles. A sample of 100 or even 10 marbles might do.

This assumption is not necessarily correct. As the sample size becomes smaller, the potential for error grows. For this reason, inferential statistics has developed numerous techniques for stating the level of confidence that can be placed on these inferences.
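To make the effect of sample size concrete, the sketch below applies a standard normal-approximation confidence interval for a sample proportion. The 40% figure and the sample sizes are taken from the marble example above; the snippet is illustrative and not part of the original discussion.

    import math

    def proportion_ci(p_hat, n, z=1.96):
        """Normal-approximation 95% confidence interval for a sample proportion."""
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
        return p_hat - z * se, p_hat + z * se

    # Suppose each sample happens to contain 40% white marbles.
    for n in (10, 100, 1000):
        low, high = proportion_ci(0.40, n)
        print(f"n={n:5d}: 95% CI for the white proportion = ({low:.3f}, {high:.3f})")

With 1,000 marbles the interval is roughly 37% to 43%, but with only 10 marbles it stretches from about 10% to 70%, which is why the smaller samples inspire far less confidence.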

Classical inferential models do not permit the introduction of prior knowledge into the calculations. Under the rigours of the scientific method, this is an appropriate constraint, since it prevents the introduction of extraneous information that might skew the experimental results. However, there are times when the use of prior knowledge would be a useful contribution to the evaluation process.

Assume a situation where an investor is considering purchasing an exclusive franchise for a given geographic territory. Her business plan suggests that she must achieve 25% market saturation for the enterprise to be profitable. Using some of her investment funds, she hires a polling company to conduct a randomized survey. The results show that, in a random sample of 20 consumers, 25% would indeed be prepared to purchase her services. Is this sufficient evidence to proceed with the investment?

If this is all the investor has to go on, she could find herself at her break-even point and could just as easily turn a loss instead of a profit. She may not have enough confidence in this survey or her plan to proceed.
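This is exactly the kind of situation where a Bayesian treatment can fold prior knowledge into the survey result. The sketch below is one illustrative way to do so, using a conjugate Beta-Binomial model; the prior parameters are invented for the example (say, drawn from comparable territories) and are not part of the scenario above.

    from scipy.stats import beta

    # Survey result: 5 of 20 respondents said they would purchase (25% of the sample).
    yes, no = 5, 15

    # Two possible priors on the true market share:
    #   Beta(1, 1)  - flat, "no prior knowledge"
    #   Beta(6, 14) - an assumed prior expectation of roughly 30%, e.g. from
    #                 comparable territories (illustrative numbers only)
    for label, a, b in [("flat prior", 1, 1), ("informative prior", 6, 14)]:
        posterior = beta(a + yes, b + no)  # conjugate Beta-Binomial update
        print(f"{label}: P(true market share >= 25%) = {posterior.sf(0.25):.2f}")

Under the flat prior, the posterior probability of reaching the 25% break-even share is only a little better than a coin flip, while a favourable prior drawn from experience elsewhere nudges it upward; either way, the survey alone leaves considerable uncertainty, which is the investor's dilemma.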

Limitations of Bayesian Networks

In spite of their remarkable power and potential to address inferential processes, there are some inherent limitations and liabilities to Bayesian networks.

In reviewing the Lumiere project, one potential problem that is seldom recognized is the possibility, however remote, that a system's user might behave in ways that violate the distribution of probabilities upon which the system is built. While an automated help desk system that is unable to embrace unusual or unanticipated requests is merely frustrating, an automated navigation system that is unable to respond to some previously unforeseen event might put an aircraft and its occupants in mortal peril. While these systems can update their goals and objectives based on prior distributions of goals and objectives among sample groups, the possibility that a user will make a novel request for information in a previously unanticipated way must also be accommodated.

Two other problems are more serious. The first is the computational difficulty of exploring a previously unknown network. To calculate the probability of any one branch of the network, all branches must be considered. While a network, once discovered, can be described in linear time, the process of network discovery itself (learning the structure from data) is an NP-hard task which might either be too costly to perform, or intractable given the number and combinations of variables.
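The scale of this discovery problem can be seen just by counting candidate structures. The snippet below uses Robinson's recurrence for the number of labelled directed acyclic graphs on n nodes; it is an illustration of how fast the search space grows, not an algorithm from the systems discussed here.

    from math import comb

    def num_dags(n, _cache={0: 1}):
        """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
        if n not in _cache:
            _cache[n] = sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
                            for k in range(1, n + 1))
        return _cache[n]

    for n in range(1, 8):
        print(f"{n} variables: {num_dags(n):,} candidate network structures")

With only seven variables there are already more than a billion candidate structures, so exhaustive search over networks quickly becomes infeasible.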

The second problem centers on the quality and extent of the prior beliefs used in Bayesian inference processing. A Bayesian network is only as useful as this prior knowledge is reliable. Either an excessively optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results. Related to this concern is the selection of the statistical distribution used in modelling the data. Selecting the proper distribution model to describe the data has a notable effect on the quality of the resulting network.

A Bayesian network (or a belief network) is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Formally, Bayesian networks are directed acyclic graphs whose nodes represent variables, and whose arcs encode direct probabilistic dependencies between the variables (the absence of an arc encodes a conditional independence). Nodes can represent any kind of variable, be it a measured parameter, a latent variable or a hypothesis. They are not restricted to representing random variables, which is another "Bayesian" aspect of a Bayesian network. Efficient algorithms exist that perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (such as speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
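As a concrete sketch of the disease-and-symptoms example mentioned above, the snippet below builds a minimal network of the form Flu causes Fever and Cough, and computes P(flu | fever, cough) by exact enumeration over the joint distribution that the network factorises. All of the probabilities are invented for illustration.

    # A tiny Bayesian network: Flu -> Fever, Flu -> Cough (all probabilities invented).
    P_flu = {True: 0.05, False: 0.95}              # prior probability of the disease
    P_fever_given_flu = {True: 0.90, False: 0.10}  # P(fever | flu), P(fever | no flu)
    P_cough_given_flu = {True: 0.80, False: 0.15}  # P(cough | flu), P(cough | no flu)

    def joint(flu, fever, cough):
        """Joint probability under the factorisation P(flu) * P(fever | flu) * P(cough | flu)."""
        p_fever = P_fever_given_flu[flu] if fever else 1 - P_fever_given_flu[flu]
        p_cough = P_cough_given_flu[flu] if cough else 1 - P_cough_given_flu[flu]
        return P_flu[flu] * p_fever * p_cough

    # Exact inference by enumeration: P(flu | fever=True, cough=True).
    numerator = joint(True, True, True)
    evidence = sum(joint(flu, True, True) for flu in (True, False))
    print(f"P(flu | fever, cough) = {numerator / evidence:.3f}")

Under these invented numbers, observing both symptoms raises the probability of flu from the 5% prior to roughly 72%; real networks simply extend the same factorisation and summation to many more variables.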

Bayesian networks are used for modelling knowledge in bioinformatics (gene regulatory networks, protein structure), medicine, document classification, image processing, data fusion, decision support systems, engineering, and law.