Probabilistic Methods for Intelligent Software Systems

Computer scientists and programmers increasingly develop complex software solutions to intelligent tasks such as expert systems, speech and natural language understanding, vision, knowledge discovery, automatic text indexing, image matching and indexing, image clustering and classification, robotics and agents, systems monitoring, health management and diagnosis, scientific instrumentation, applied physics, molecular biology, networking and communications, and so forth. These tasks involve a high degree of uncertainty for the following reasons:

Complexity:

Intractable search problems (like aircraft scheduling), very large data problems (like image analysis of terabyte earth science data), and complex physical models (atmospheric modeling or lighting models in graphics) have computational complexity that defies exact or optimal solution. One can only approximate a solution, and is thus uncertain as to how good the approximation is. In addition, at each choice point during the computation, uncertainty exists as to the future consequences of that choice for the computation.

Incompleteness:

The inputs to the problem may be inadequate to yield a unique solution: for instance, in learning, only a small number of examples may be given, and in diagnosis there may be many possible explanations of the phenomena. In image interpretation, one is given only a coarse pixel representation of a complex object. One is uncertain as to which of the possible solutions is best.
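
As a rough illustration of learning from only a few examples, the sketch below computes a Beta posterior over an unknown success rate. The function name `beta_posterior`, the uniform Beta(1, 1) prior, and the example counts are assumptions made here for illustration; they are not taken from any particular system.

```python
from math import sqrt


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return (mean, standard deviation) of the Beta posterior over a rate.

    The default uniform Beta(1, 1) prior is an assumption of this sketch.
    """
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)


# Same observed proportion (2/3), very different amounts of evidence.
small_mean, small_sd = beta_posterior(2, 1)
large_mean, large_sd = beta_posterior(200, 100)
print(f"3 examples:   mean={small_mean:.3f}, sd={small_sd:.3f}")
print(f"300 examples: mean={large_mean:.3f}, sd={large_sd:.3f}")
```

With three examples the posterior standard deviation is far larger than with three hundred, even though the observed proportion is the same: the spread is a direct measure of how incomplete the input is.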

Intrinsic uncertainty:

"Noise" in its pure form crops up in many problems, for instance, due to sampling, truncation, instrument drift, and human error. This is uncertainty in its statistical sense.

Approximation:

How good is an approximation, and under which conditions does it work? How can the approximation be improved, or its parameters tweaked for different conditions? Uncertainty exists as to the quality of the approximation, and in setting the parameters to tune the approximation.

Language:

Spoken and written input is finite and therefore vague in some aspects, especially when human frailties intervene: for instance, a victim's description of an attacker, or a physician's summary of the knowledge gleaned from 50 case histories. Uncertainty exists in interpreting the precise meaning of language, and in incorporating this with other information.

Information fusion:

Intelligent systems increasingly acquire information from disparate sources: for instance, speech recognition systems use information about the speaker, about the context of the utterance, and about grammar. Geographic information systems combine information about hydrology, soil type, climate, and satellite data from several different instruments. Uncertainty exists in determining how to assign relative importance to disparate information sources.
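
One simple way to combine sources within the probability calculus is to multiply their likelihoods and renormalize, as in the sketch below. It assumes the sources are conditionally independent given the hypothesis (a naive Bayes assumption), and the function name `fuse`, the two candidate words, and all the numbers are hypothetical values chosen for illustration.

```python
def fuse(prior, likelihoods_per_source):
    """Combine a prior over hypotheses with one likelihood table per source.

    Assumes the sources are conditionally independent given the hypothesis
    (a naive Bayes assumption made for this sketch).
    """
    posterior = dict(prior)
    for likelihood in likelihoods_per_source:
        for hypothesis in posterior:
            posterior[hypothesis] *= likelihood[hypothesis]
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("every hypothesis has zero probability")
    return {h: p / total for h, p in posterior.items()}


# Hypothetical speech example: two homophones that sound alike.
prior = {"two": 0.5, "too": 0.5}
acoustic = {"two": 0.5, "too": 0.5}   # the sound alone cannot separate them
grammar = {"two": 0.8, "too": 0.2}    # the surrounding words favor "two"
posterior = fuse(prior, [acoustic, grammar])
print(posterior)
```

In this scheme a source's relative importance is carried by how sharply its likelihoods distinguish the hypotheses: the uninformative acoustic source leaves the answer unchanged, while the grammar source decides it.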

Uncertainty is fundamental in intelligent software systems.

While many models exist for addressing uncertainty, analysis using the probability calculus is perhaps the most general. Most well-known frameworks for analysis can be modeled within the probability calculus, often leading to significant insight. Uncertainty models in this category include fuzzy logic, classical frequentist statistics, minimum complexity methods such as description length, and maximum entropy methods. Probability calculus now sees widespread use in neural networks, vision, graphics, natural language, and text processing, as well as in its original stronghold of statistical analysis. In many cases, these areas make only partial use of the full power of the probability calculus because they employ a classical frequentist interpretation, which implies a sample space. Unfortunately, many problems in intelligent systems are one-off, so this interpretation is not possible. The Artificial Intelligence community originally saw logic as a powerful calculus that could be the theoretical basis for intelligence. While logic has fundamental contributions to make in representation and programming languages, uncertainty invariably arises and other analytic tools are required, for instance the probability calculus.

Returning now to the design of intelligent systems, the probability calculus has computational variants in much the same way that logic has its computational variants. The understanding of probability, its use within a computation, and its efficient implementation within some broader application are issues of general concern in the design of intelligent systems.

Last change: Fri, Nov 8th, 10:38am, 1996

wray@ic.eecs.berkeley.edu