Partial List of Abstracts

J28
Y. N. Lien, Y. W. Ma, and B. W. Wah, ``Transformation of the Generalized Traveling-Salesman Problem into the Standard Traveling-Salesman Problem,'' Information Sciences, Elsevier Science Pub. Co., Inc., New York, NY, vol. 74, nos. 1 & 2, Oct. 15, 1993, pp. 177-189.

In the generalized traveling-salesman problem (GTSP), we are given a set of cities that are grouped into possibly intersecting clusters. The objective is to find a closed path of minimum cost that visits at least one city in each cluster. Given an instance G of the GTSP, we first transform G into another instance G' of the GTSP in which all the clusters are nonintersecting, and then transform G' into an instance G" of the standard traveling-salesman problem (TSP). We show that any feasible solution of the TSP instance G" can be transformed into a feasible solution of the GTSP instance G of no greater cost, and that any optimal solution of the TSP instance G" can be transformed into an optimal solution of the GTSP instance G.

J38
K. M. Baumgartner and B. W. Wah ``Computer Scheduling Algorithms: Past, Present, and Future,'' Information Sciences, Elsevier Science Pub. Co., vol. 57 & 58, Sept.-Dec. 1991, pp. 319-345.

Efficient scheduling techniques of computing resources are essential for achieving satisfactory performance for users as computer systems and their applications become more complex. In this paper, we survey research on scheduling algorithms, review previous classifications of scheduling problems, and present a broader classification scheme. Using a uniform terminology for scheduling strategies and the new classification scheme, previous work on scheduling strategies is reviewed, and trends in scheduling research are identified. Finally, a methodology for developing scheduling strategies is presented.

J41
A. Ieumwananonthachai, A. N. Aizawa, S. R. Schwartz, B. W. Wah, and J. C. Yan, ``Intelligent Process Mapping through Systematic Improvement of Heuristics,'' J. of Parallel and Distributed Computing, Academic Press, vol. 15, June 1992, pp. 118-142.

In this paper, we present the design of a system for automatically learning and evaluating new heuristic methods that can be used to map a set of communicating processes on a network of computers. Our learning system is based on testing a population of competing heuristic methods within a fixed time constraint. We develop and analyze various resource scheduling strategies based on a statistical model that trades between the number of new heuristic methods considered and the amount of testing performed on each. We implement a prototype learning system (Teacher 4.1) for learning new heuristic methods used in Post-Game Analysis, a system that iteratively generates and refines mappings of a set of communicating processes on a network of computers. Our performance results show that a significant improvement can be obtained by a systematic exploration of the space of possible heuristic methods.

J42
B. W. Wah, A. Ieumwananonthachai, L.-C. Chu, and A. N. Aizawa, ``Genetics-Based Learning of New Heuristics: Rational Scheduling of Experiments and Generalization,'' IEEE Trans. on Knowledge and Data Engineering, vol. 7, no. 5 (Oct. 1995), pp. 763-785.

In this paper we present new methods for the automated learning of heuristics in knowledge-lean applications and for finding heuristics that can be generalized to unlearned domains. These applications lack domain knowledge for credit assignment; hence, operators for composing new heuristics are generally model-free, domain independent, and syntactic in nature. The operators we have used are genetics-based; examples of which include mutation and cross-over. Learning is based on a generate-and-test paradigm that maintains a pool of competing heuristics, tests them to a limited extent, creates new ones from those that perform well in the past, and prunes poor ones from the pool. We have studied three important issues in learning better heuristics: (a) anomalies in performance evaluation, (b) rational scheduling of limited computational resources in testing candidate heuristics in single-objective as well as multi-objective learning, and (c) finding heuristics that can be generalized to unlearned domains. We show experimental results in learning better heuristics for (a) process placement for distributed-memory multicomputers, (b) node decomposition in a branch-and-bound search, (c) generation of test patterns in VLSI circuit testing, and (d) VLSI cell placement and routing.

J43
B. W. Wah and L.-C. Chu, ``Combinatorial Search Algorithms with Meta-Control: Modeling and Implementations,'' Int'l J. of Artificial Intelligence Tools, World Scientific Publishers, vol. 1, no. 3 (Sept. 1992), pp. 369-397.

In this paper, we model search algorithms with meta-control, allowing resource constraints, approximation, and parallel processing to be incorporated easily in the search process. The basic building block of the model is a hierarchical search process (HSP) consisting of context-free and context-sensitive grammars classified according to problem-independent and problem-dependent parts. The context-sensitive components are used mainly for evaluating decision parameters and in ordering production rules in the context-free grammar. The execution of the grammars for given initial conditions may invoke other HSPs already defined in the system. We describe ISE (acronym for Integrated Search Environment), a tool that implements hierarchical searches with meta-control. By separating the problem-dependent and problem-independent components in ISE, new search methods based on a combination of existing methods can be developed easily by coding a single master control program. Further, new applications solved by searches can be developed by coding the problem-dependent parts and reusing the problem-independent parts already developed. We describe the organization of ISE and present experiments carried out on the system.

J44
B. W. Wah, ``Population-Based Learning: A New Method for Learning from Examples under Resource Constraints,'' IEEE Trans. on Knowledge and Data Engineering, vol. 4, no. 5 (Oct. 1992), pp. 454-474.

In this paper, we study a new learning model for designing heuristics automatically under resource constraints. We focus on improving performance-related heuristic methods (HMs) in knowledge-lean application domains. We assume that learning is episodic, that the performance measures of an episode are dependent only on the final state reached in evaluating the corresponding test case and are independent of the intermediate decisions and internal states, and that the aggregate performance measures of the HMs involved are independent of the order of evaluation of test cases. The learning model is based on testing a population of competing HMs for an application problem and switches from one to another dynamically, depending on the outcome of previous tests. Its goal is to find a good HM within the resource constraints, with proper tradeoff between cost and quality. It complements existing point-based machine learning models that maintain only one incumbent HM, and that tests the HM extensively before switching to alternative ones. It extends existing work on classifier systems by addressing issues related to delays in feedback, scheduling of tests of HMs under limited resources, anomalies in performance evaluation, and scalability of HMs. Finally, we describe our experience in applying the learning method on five application problems.

J46
B. W. Wah, T. S. Huang, A. K. Joshi, D. Moldovan, J. Aloimonos, R. K. Bajcsy, D. Ballard, D. DeGroot, K. DeJong, C. R. Dyer, S. E. Fahlman, R. Grishman, L. Hirschman, R. E. Korf, S. E. Levinson, D. P. Miranker, N. H. Morgan, S. Nirenburg, T. Poggio, E. M. Riseman, C. Stanfill, S. J. Stolfo, and S. L. Tanimoto, ``Report on Workshop on High Performance Computing and Communications for Grand Challenge Applications: Computer Vision, Speech and Natural Language Processing, and Artificial Intelligence,'' IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 1; also in Tech. Rep. CRHC-92-26, Center for Reliable and High Performance Computing, Coordinated Science Laboratory, Univ. of Illinois, Urbana, IL, February 1993, pp. 138-154.

This article reports the findings of the Workshop on High Performance Computing and Communications (HPCC) for Grand Challenge Applications: Computer Vision, Speech and Natural Language Processing (SNLP), and Artificial Intelligence (AI). The goals of the workshop are to identify applications, research problems, and designs of HPCC systems for supporting applications in these areas.

In computer vision, we have identified the main scientific issues as machine learning, surface reconstruction, inverse optics and integration, model acquisition, and perception and action. Since vision algorithms operate in different levels of granularity, computers for supporting these algorithms need to be heterogeneous and modular. Advances in technology, new architectural concepts, and software design methods are essential for this area.

In SNLP, we have identified issues in statistical analysis in corpus-based speech and language understanding, search strategies for language analysis, auditory and vocal-tract modeling, integration of multiple levels of speech and language analyses, and connectionist systems. Similar to algorithms in computer vision, algorithms in SNLP require high computational power, ranging from general purpose supercomputing to special pur- pose VLSI systems. As processing has various requirements, a hybrid heterogeneous computer system is the most desirable.

In AI, important issues that need immediate attention include the development of efficient machine learning and heuristic search methods that can adapt to different architectural configurations, and the design and construction of scalable and verifiable knowledge bases, active memories, and artificial neural networks. A computer system for supporting AI applications is heterogeneous, requiring research in high speed computer networks, mass storage and efficient retrieval methods, computer languages, and hardware and software design tools.

Research in these areas is inherently multidisciplinary and will require active participation of researchers in device and networking technologies, signal processing, computer architecture, software engineering, and knowledge engineering. Besides extending current frontiers in research, an important aspect to be emphasized is the integration of existing components and results into working systems.

J47
Akiko N. Aizawa and Benjamin W. Wah, ``A Sequential Sampling Procedure for Genetic Algorithms: Computers and Mathematics with Applications, Pergamon Press, Ltd., Tarrytown, NY, 1994, vol. 27, no. 9/10, pp. 77-82. Also presented at the 5th Int'l Workshop of the Bellman Continum Waikoloa, Hawaii, Jan. 1993.

In this paper, we develop new methods for adjusting configuration parameters of genetic algorithms operating in a noisy environment. Such methods are related to the scheduling of resources for tests performed in genetic algorithms. Assuming that the population size is given, we address two problems related to the design of efficient scheduling algorithms specifically important in noisy environments. First, we study the duration-scheduling problem that is related to setting dynamically the duration of each generation. Second, we study the sample-allocation problem that entails the adaptive determination of the number of evaluations taken from each candidate in a generation. In our approach, we model the search process as a statistical selection process and derive equations useful for these problems. Our results show that our adaptive procedures improve the performance of genetic algorithms over that of commonly used static ones.

J48
Kumar Ganapathy and Benjamin Wah, ``Optimal Synthesis of Processor Arrays with Pipelined Arithmetic Units from Uniform Recurrence Equations,'' Parallel Processing Letters, World Scientific Pub. Co., 1994, vol. 4, no. 3, pp. 339-350.

Two-level pipelining in processor arrays (PAs) involves pipelining of operations across processing elements (PEs) and pipelining of operations in functional units in each PE. Although it is an attractive method for improving the throughput of PAs, existing methods for generating PAs with two-level pipelining are restricted and cannot systematically explore the entire space of feasible designs. In this paper, we extend a systematic design method, called General Parameter Method (GPM), we have developed earlier to find optimal designs of PAs with two-level pipelines. The basic idea is to add new constraints on periods of data flows to include the effect of internal functional pipelines in the PEs. As an illustration, we present pipelined PA designs for computing matrix products. For $n$-dimensional meshes and other symmetric problems, we provide an efficient scheme to obtain a pipelined PA from a non-pipelined PA using a reindexing transformation. This scheme is used in GPM as a pruning condition to arrive at optimal pipelined PAs efficiently. For pipelines with minimum initiation interval (MII) greater than unity, we show additional constraints that ensure correctness of the synthesized PAs.

J49
Akiko Aizawa and Benjamin Wah, ``Scheduling of Genetic Algorithms in a Noisy Environment,'' Evolutionary Computation, Academic Press, vol. 2, no. 2, 1994, pp. 97-122.

In this paper, we have studied the scheduling of computational resources for genetic algorithms operating in a noisy environment. Assuming that the number of candidates in each generation is fixed, we have studied two related scheduling problems: determining a suitable duration for a generation, and finding the appropriate number of samples to be drawn from each candidate. Our results show that dynamic scheduling policies perform better that existing static ones. Although we have used simple test functions in our experiments, the advantage of using dynamic scheduling is universal and is independent of the reproduction mechanism of individual genetic algorithms.

J50
Kumar N. Ganapathy and Benjamin W. Wah, ``Optimal Synthesis of Algorithm-Specific Lower-Dimensional Processor Arrays,'' IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 3 (March 1996), pp. 274-287.

Processor arrays are frequently used to deliver high performance in many applications with computationally intensive operations. This paper presents the General Parameter Method (GPM), a systematic parameter-based approach for synthesizing such algorithm-specific architectures. GPM can synthesize processor arrays of any lower dimension from a uniform-recurrence description of the algorithm. The design objective is a general non-linear and non-monotonic user-specified function, and depends on attributes such as computation time of the recurrence on the processor array, completion time, load time, and drain time. In addition, bounds on some or all of these attributes can be specified. GPM performs an efficient search of polynomial complexity to find the optimal design satisfying the user-specified design constraints. As an illustration, we show how GPM can be used to find optimal linear processor arrays for computing transitive closures. We consider design objectives that minimize computation time, or processor count, or completion time (including load and drain times), and user-specified constraints on number of processing elements and/or computation/completion times. We show that GPM can be used to obtain optimal designs that trade between number of processing elements and completion time, thereby allowing the designer to choose a design that best meets the specified design objectives. We also show the equivalence between the model assumed in GPM and that in the popular dependence-based methods. Consequently, GPM can be used to find optimal designs for both models.

J51
M. Aboelaze and B. W. Wah, ``Processor Array with Bounded I/O Ports for computing Transitive Closures,`` Journal of Parallel and Distributed Computing, Academic Press, vol. 29, no. 1 (August 1995), pp. 84-90.

In this paper, we present the design of a processor array (PA) with bounded number of input/output (I/O) ports (L-by-N PEs with O(L) I/O ports, where 1 leq L leq N) for computing transitive closures. Previous designs of PAs to solve this problem use either N-by-N PAs with O(N) I/O ports or linear PAs with O(1) I/O ports. These designs are restricted by their I/O bandwidth because designs using O(N) I/O ports are limited by the number of PEs that can be supported by these ports, whereas designs with O(1) ports are limited by the I/O time to load and drain data. In contrast, our proposed design uses O(L) I/O ports, where L is a technology-dependent design parameter. In our proposed architecture, we have used more powerful processing elements (PEs), each with a microprogram-driven ALU, a limited number of I/O buffers and O(N/L) memory words. Our technique for mapping transitive-closure computations consists of four steps. (1) Starting from a mapping of a transitive-closure problem on an N-by-N PA, we map the computations into an L-by-N PA. (2) We then schedule the computations in the L-by-N PA in order to satisfy the dependencies. (3) Based on the new mapping, we derive speeds and directions of data flows and the number of buffers required in each PE. (4) Finally, we derive the microprogram in each PE for correct execution. We show that our design specializes to efficient designs in the boundary cases when L=1 (linear PA) and L=N (square mesh).

J52
Benjamin W. Wah and Yi Shang, ``A Comparative Study of IDA*-Style Searches,'' Int'l Journal of Tools with Artificial Intelligence, World Scientific, vol. 3, no. 4 (Oct. 1995), pp. 493-523.

In this paper, we study the performance of various IDA*-style searches and investigate methods to improve their performance by predicting in each stage the threshold to be used for pruning. Without loss of generality, we consider minimization problems in this paper. We first present three models to approximate the distribution of the number of search nodes by lower bounds: exponential, geometric, and linear, and illustrate these distributions based on some well-known combinatorial search problems. Based on these distributions, we show the performance of an ideal IDA* algorithm and identify reasons why existing IDA*-style algorithms perform well. In practice, we will be able to know from experience the type of distribution for a given problem instance, but will not be able to know the parameters of this distribution until the instance is solved. Hence, we develop RIDA*, a method that estimates dynamically the parameters of the distribution, and predicts the best threshold to be used in each stage. Finally, we compare the performance of several IDA*-style algorithms --- Korf's IDA* and RBFS, RIDA*, IDA*_CR and DFS* --- on several application problems, and identify conditions under which each of these algorithms will perform well.

J53
Pankaj Mehra and Benjamin W. Wah, ``Synthetic Workload Generation for Load-balancing Experiments,'' IEEE Parallel and Distributed Technology, vol. 3, no. 3 (Fall 1995), pp. 4-19.

This paper describes Dynamic Workload Generator (DWG), a facility for generating realistic and reproducible synthetic workloads for use in load-balancing experiments. For such experiments, the generated workload must not only mimic the highly dynamic resource-utilization patterns found on today's distributed systems but also behave as a real workload does when test jobs are run concurrently with it. The latter requirement is important in testing alternative load-balancing strategies, a process that requires running the same job multiple times, each time at a different site but under an identical network-wide workload. Parts of DWG are implemented inside the operating-system kernel and have complete control over the utilization levels of four key resources: CPU, memory, disk, and network. Besides accurately replaying network-wide load patterns recorded earlier, DWG gives up a fraction of its resources each time a new job arrives and reclaims these resources upon job completion. The latter operation is controlled by pattern-doctoring rules implemented in DWG. We present DWG's architecture, its doctoring rules, systematic methods for adjusting and evaluating doctoring rules, and experimental results on a network of Sun workstations.

J54
Benjamin W. Wah and Yao-Jen Chang, ``Trace-Based Methods for Solving Nonlinear Global Optimization and Satisfiability Problems,'' J. of Global Optimization, vol. 10, no. 2 (March 1997), pp. 107-141.

In this paper we present a method called Novel (Nonlinear Optimization via External Lead) for solving continuous and discrete global optimization problems. Novel addresses the balance between global search and local search, using a trace to aid in identifying promising regions before committing to local searches. We discuss Novel for solving continuous constrained optimization problems and show how it can be extended to solve constrained satisfaction and discrete satisfiability problems. We first transform the problem using Lagrange multipliers into an unconstrained version. Since a stable solution in a Lagrangian formulation only guarantees a local optimum satisfying the constraints, we propose a global search phase in which an aperiodic and bounded trace function is added to the search to first identify promising regions for local search. The trace generates an information-bearing trajectory from which good starting points are identified for further local searches. Taking only a small portion of the total search time, this elegant approach significantly reduces unnecessary local searches in regions leading to the same local optimum. We demonstrate the effectiveness of Novel on a collection of continuous optimization benchmark problems, finding the same or better solutions while satisfying the constraints. We extend Novel to discrete constraint satisfaction problems (CSPs) by showing an efficient transformation method for CSPs and the associated representation in finite-difference equations in Novel. We apply Novel to solve Boolean satisfiability instances in circuit fault detection and circuit synthesis applications, and show comparable performance when compared to the best existing method.

J55
Yi Shang and Benjamin W. Wah, ``Global Optimization for Neural Network Training,'' IEEE Computer Magazine, vol. 29, no. 3 (March 1996), pp. 45-54.

In this paper, we have studied various supervised learning methods for training feed-forward neural networks. In general, such learning can be considered as a nonlinear global optimization problem in which the goal is to minimize a nonlinear error function that spans the space of weights, using heuristic strategies that look for global optima (in contrast to local optima). We have surveyed various global optimization methods suitable for neural-network learning, and have proposed the Novel method (Nonlinear global Optimization method Via External Lead) for nonlinear optimization and neural network learning. By combining global and local searches, we show how Novel can be used to find a good local minimum in the error space. Our key idea is to use a user-defined trace that pulls a search out of a local minimum without having to restart it from a new starting point. Using five benchmark problems, we have compared Novel against some of the best global optimization algorithms and have demonstrated its superior improvement in performance.

The key points studied and summarized in this paper are as follows: (a) Global optimization involves strategies to escape from local minima after reaching there, (b) A good global search strategyy can find better local minima, (c) The search space of small neural networks is more rugged, requiring more powerful global search methods to get convergence, (d) A better local minimum in the error space does not always lead to neural networks that generalize better, (e) Smaller networks tend to generalize better.

Sun Sparc object code training software and results

J56
Benjamin W. Wah and Lon-Chan Chu, ``TCGD: A Time-Constrained Approximate Guided Depth-First Search Algorithm,'' International Journal on Artificial Intelligence Tools, World Scientific Publishing Co., Pte., vol. 6, no. 2 (1997), pp. 255-271.

In this paper, we develop TCGD, a problem-independent, time-constrained, approximate guided depth-first search (GDFS) algorithm. The algorithm is designed to achieve the best ascertained approximation degree under a fixed time constraint. We consider only searches with finite search space and admissible heuristic functions. We study NP-hard combinatorial optimization problems with polynomial-time computable feasible solutions. For the problems studied, we observe that the execution time increases exponentially as approximation degree decreases, although anomalies may happen. The algorithms we study are evaluated by simulations using the symmetric traveling-salesperson problem.

J57
Chin-Chi Teng and Benjamin W. Wah, ``Automated Learning for Reducing the Configuration of a Feed-Forward Neural Network,'' IEEE Transactions on Neural Networks, vol. 7, no. 5 (Sept. 1996), pp. 1072-1085.

In this paper, we present two learning mechanisms for artificial neural networks (ANNs) that can be applied to solve classification problems with binary outputs. These mechanisms are used to reduce the number of hidden units of an ANN when trained by the cascade-correlation learning algorithm (CAS). Since CAS adds hidden units incrementally as learning proceeds, it is difficult to predict the number of hidden units required when convergence is reached. Further, learning must be restarted when the number of hidden units is larger than expected. Our key idea in this paper is to provide alternatives in the learning process and to select the best alternative dynamically based on run-time information obtained. Mixed-mode learning (MM), our first algorithm, provides alternative output matrices so that learning is extended to find one of the many one-to-many mappings instead of finding a unique one-to-one mapping. Since the objective of learning is relaxed by this transformation, the number of learning epochs can be reduced. This in turn leads to a smaller number of hidden units required for convergence. Population-based Learning for ANNs (PLAN), our second algorithm, maintains alternative network configurations and to select at run time promising networks to train based on error information obtained and time remaining. This dynamic scheduling avoids training possibly unpromising ANNs to completion before exploring new ones. We show the performance of these two mechanisms by applying them to solve the two-spiral problem, a two-region classification problem, and the Pima Indian Diabetes Diagnosis problem.

J58
Arthur Ieumwananonthachai and Benjamin W. Wah, ``Statistical Generalization of Performance-Related Heuristics for Knowledge-Lean Applications,'' Int'l Journal of Tools with Artificial Intelligence, World Scientific, vol. 5, nos. 1 & 2 (June 1996), pp. 61-79.
In this paper, we present new results on the automated generalization of performance-related heuristics learned for knowledge-lean applications. By first applying genetics-based learning to learn new heuristics for some small subsets of test cases in a problem space, we study methods to generalize these heuristics to unlearned subdomains of test cases. Our method uses a new statistical metric called probability of win. By assessing the performance of heuristics in a range-independent and distribution-independent manner, we can compare heuristics across problem subdomains in a consistent manner. To illustrate our approach, we show experimental results on generalizing heuristics learned for sequential circuit testing, VLSI cell placement and routing, branch-and-bound search, and blind equalization. We show that generalization can lead to new and robust heuristics that perform better than the original heuristics across test cases of different characteristics.

J59
Pankaj Mehra and Benjamin W. Wah ``Strategy Learning: A Survey of Problems, Methods, and Architectures Int'l Journal of Tools with Artificial Intelligence, World Scientific, vol. 7, no. 4, Dec. 1998, pp. 487-550.
Problem solvers employ strategies in searching for solutions to given problem instances. Strategies have traditionally been designed by experts using prior knowledge and refined manually using trial and error. Recent attempts to automate these processes have produced strategy-learning systems. This paper shows how various issues in strategy learning are affected by the nature of performance tasks, problem solvers, and learning environments. Surveyed learning systems are grouped by the commonality of their approaches into four general architectures.

J60
Yi Shang and Benjamin W. Wah, ``A Discrete Lagrangian-Based Global-Search Method for Solving Satisfiability Problems,'' Journal of Global Optimization, Kluwer Academic Publishers, vol. 12, no. 1, Jan. 1998, pp. 61-99.

Satisfiability is a class of NP-complete problems that model a wide range of real-world applications. These problems are difficult to solve because they have many local minima in their search space, often trapping greedy search methods that utilize some form of descent. In this paper, we propose a new discrete Lagrange-multiplier-based global-search method for solving satisfiability problems. We derive new approaches for applying Lagrangian methods in discrete space, show that equilibrium is reached when a feasible assignment to the original problem is found, and present heuristic algorithms to look for equilibrium points. Our method and analysis provides a theoretical foundation and generalization of local search schemes that optimize the objective alone and clause-weight schemes that optimize the constraints alone. In contrast to local search methods that restart from a new starting point when a search reaches a local trap, the Lagrange multipliers in our method provide a force to lead the search out of a local minimum and move it in the direction provided by the Lagrange multipliers. In contrast to clause-weight schemes that rely only on the weights of violated constraints to escape from local minima, our method also uses the value of an objective function (in this case the number of violated constraints) to provide further guidance. The dynamic shift in emphasis between the objective and the constraints, depending on their relative values, is the key of Lagrangian methods. One of the major advantages of our method is that it has very few algorithmic parameters to be tuned by users, and the search procedure can be made deterministic and the results, reproducible. We demonstrate our method by applying it to solve an extensive set of benchmark problems archived in DIMACS of Rutgers University. Our method often performs better than the best existing methods and can achieve an order-of-magnitude speedup for some problems.

J61
Pankaj Mehra and Benjamin W. Wah, ``Automated Learning of Load-Balancing Strategies in Multiprogrammed Distributed Systems,'' International Journal of System Sciences, vol. 28, no. 11, pp. 1077-1100, November 1997.

Dynamic load-balancing strategies for distributed systems seek to improve average completion time of independent tasks by migrating each incoming task to the site where it is expected to finish the fastest: usually the site having the smallest load index. SMALL is an off-line learning system for developing configuration-specific load-balancing strategies; it learns new load indices as well as tunes the parameters of given migration policies. Using a dynamic workload generator, a number of typical systemwide load patterns are first recorded; the completion times of several benchmark jobs are then measured at each site, under each of the recorded load patterns. These measurements are used to simultaneously train comparator neural networks, one per site. The comparators collectively model a set of perfect load indices in that they seek to rank, at arrival time, the possible destinations for an incoming task by their (not yet known) respective completion times. The numerous parameters of the decentralized dynamic load-balancing policy are then tuned using a genetic algorithm. We present experimental results for a mix of scientific and interactive workloads on Sun workstations connected by Ethernet. The policies tuned by SMALL are shown to intelligently and effectively exploit idle resources.

J62
Kumar N. Ganapathy, Benjamin W. Wah, and Chien-Wei Li, ```Designing a Scalable Processor Array for Recurrent Computations,'' IEEE Trans. on Parallel and Distributed Systems, vol. 8, no. 8, August 1997, pp. 840-856.

In this paper, we study the design of a coprocessor (CoP) to execute efficiently recursive algorithms with uniform dependencies. Our design is based on two objectives: (a) fixed bandwidth to main memory (MM) and (b) scalability to higher performance without increasing MM bandwidth. Our CoP has an access unit (AU) organized as multiple queues, a processor array (PA) with regularly connected processing elements (PEs), and input/output networks for data routing. Our design is unique because it addresses input/output bottleneck and scalability, two of the most important issues in integrating processor arrays in current systems. To allow processor arrays to be widely usable, they must be scalable to high performance with little or no impact on the supporting memory system. The use of multiple queues in AU also eliminates the use of explicit data addresses, thereby simplifying the design of the control program. We present a mapping algorithm that partitions a data dependence graph (DG) of an application into regular blocks, sequences the blocks through AU, and schedules the execution of the blocks, one at a time, on PA. We show that our mapping procedure minimizes the amount of communication between blocks in the partitioned DG, and sequences the blocks through AU to reduce the communication between AU and MM. Using the matrix-product and transitive-closure applications, we study design trade-offs involving (a) division of a fixed chip area between PA and AU, and (b) improvements in speedup with respect to increases in chip area. Our results show, for a fixed chip area, (a) that there is little degradation in throughput in using a linear PA as compared to a PA organized as a square mesh, and (b) that the design is not sensitive to the division of chip area between PA and AU. We further show that, for a fixed throughput, there is an inverse square root relationship between speedup and total chip area. Our study demonstrates the feasibility of a low-cost, memory bandwidth-limited, and scalable coprocessor system for evaluating recurrent algorithms with uniform dependencies.

J63
Tao Wang and Benjamin W. Wah, ``Handling Inequality Constraints in Continuous Nonlinear Global Optimization,'' Journal of Integrated Design and Process Science, Society for Design and Process Science, vol. 2, no. 3, 1998, pp. 1-10.

In this paper, we present a new method to handle inequality constraints and apply it in Novel (Nonlinear Optimization via External Lead), a system we have developed for solving constrained continuous nonlinear optimization problems. In general, in applying Lagrange-multiplier methods to solve these problems, inequality constraints are first converted into equivalent equality constraints. One such conversion method adds a slack variable to each inequality constraint in order to convert it into an equality constraint. The disadvantage of this conversion is that when the search is inside a feasible region, some satisfied constraints may still pose a nonzero weight in the Lagrangian function, leading to possible oscillations and divergence when a local optimum lies on the boundary of a feasible region. We propose a new conversion method called the MaxQ method such that all satisfied constraints in a feasible region always carry zero weight in the Lagrange function; hence, minimizing the Lagrange function in a feasible region always leads to local minima of the objective function. We demonstrate that oscillations do not happen in our method. We also propose methods to speed up convergence when a local optimum lies on the boundary of a feasible region. Finally, we show improved experimental results in applying our proposed method in Novel on some existing benchmark problems and compare them to those obtained by applying the method based on slack variables.

J64
Benjamin W. Wah and Tao Wang, ``Efficient and Adaptive Lagrange-Multiplier Methods for Nonlinear Continuous Global Optimization,'' Journal of Global Optimization, Kluwer Academic Press, vol. 14, no. 1, Jan. 1999, pp. 1-25.

Lagrangian methods are popular in solving continuous constrained optimization problems. In this paper, we address three important issues in applying Lagrangian methods to solve optimization problems with inequality constraints. First, we study methods to transform inequality constraints into equality constraints. An existing method, called the slack-variable method, adds a slack variable to each inequality constraint in order to transform it into an equality constraint. Its disadvantage is that when the search trajectory is inside a feasible region, some satisfied constraints may still pose some effect on the Lagrangian function, leading to possible oscillations and divergence when a local minimum lies on the boundary of the feasible region. To overcome this problem, we propose the MaxQ method that carries no effect on satisfied constraints. Hence, minimizing the Lagrangian function in a feasible region always leads to a local minimum of the objective function. We also study some strategies to speed up its convergence. Second, we study methods to improve the convergence speed of Lagrangian methods without affecting the solution quality. This is done by an adaptive-control strategy that dynamically adjusts the relative weights between the objective and the Lagrangian part, leading to better balance between the two and faster convergence. Third, we study a trace-based method to pull the search trajectory from one saddle point to another in a continuous fashion without restarts. This overcomes one of the problems in existing Lagrangian methods that converges only to one saddle points and requires random restarts to look for new saddle points, often missing good saddle points in the vicinity of saddle points already found. Finally, we describe a prototype Novel (Nonlinear Optimization via External Lead) that implements our proposed strategies and present improved solutions in solving a collection of benchmarks.

J65
Benjamin W. Wah ``Generalization and Generalizability Measures,'' IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 1, Jan-Feb 1999, pp. 175-186.

In this paper, we define the generalization problem, summarize various approaches in generalization, identify the credit assignment problem, and present the problem and some solutions in measuring generalizability. We discuss anomalies in the ordering of hypotheses in a subdomain when performance is normalized and averaged, and show conditions under which anomalies can be eliminated. To generalize performance across subdomains, we present a measure called probability of win that measures the probability whether a hypothesis is better than another. Finally, we discuss some limitations in using probabilities of win and illustrate their application in finding new parameter values for TimberWolf, a package for VLSI cell placement and routing.

J66
Benjamin W. Wah, Y. Shang, and Zhe Wu ``Discrete Lagrangian Methods for Optimizing the Design of Multiplierless QMF Banks,'' IEEE Trans. on Circuits and Systems, Part II, vol. 46, no. 9, Sept. 1999, pp. 1179-1191.

In this paper, we present a new discrete Lagrangian method for designing multiplierless QMF (quadrature mirror filter) banks. The filter coefficients in these filter banks are in powers-of-two (PO2), where numbers are represented as sums or differences of powers of two (also called Canonical Signed Digit--CSD--representation), and multiplications are carried out as additions, subtractions and shifts. We formulate the design problem as a nonlinear discrete constrained optimization problem, using reconstruction error as the objective, and stopband and passband energies, stopband and passband ripples and transition bandwidth as constraints. Using the performance of the best existing designs as constraints, we search for designs that improve over the best existing designs with respect to all the performance metrics. We propose a new discrete Lagrangian method for finding good designs, and study methods to improve the convergence speed of Lagrangian methods without affecting their solution quality. This is done by adjusting dynamically the relative weights between the objective and the Lagrangian part. We show that our method can find designs that improve over Johnston's benchmark designs using a maximum of three to six ONE bits in each filter coefficient instead of using floating-point representations. Our approach is general and is applicable to the design of other types of multiplierless filter banks.

J67
Benjamin W. Wah and Zhe Wu ``Discrete Lagrangian Methods for Designing Multiplierless Two-Channel PR-LP Filter Banks,'' Journal of VLSI Signal Processing, Kluwer Academic Press, volume 21, no. 2, June 1999, pp. 131-150

In this paper, we present a new search method based on the theory of discrete Lagrange multipliers for designing multiplierless PR (perfect reconstruction) LP (linear phase) filter banks. To satisfy the PR constraints, we choose a lattice structure that, under certain conditions, can guarantee the resulting two filters to be a PR pair. Unlike the design of multiplierless QMF filter banks that represents filter coefficients directly using PO2 (powers-of-two) form (also called Canonical Signed Digit or CSD representation), we use PO2 forms to represent the parameters associated with the lattice structure. By representing these parameters as sums or differences of powers of two, multiplications can be carried out as additions, subtractions, and shifts. Using the lattice representation, we decompose the design problem into a sequence of four subproblems. The first two subproblems find a good starting point with continuous parameters using a single-objective, multi-constraint formulation. The last two subproblems first transform the continuous solution found by the second subproblem into a PO2 form, then search for a design in a mixed-integer space. We propose a new search method based on the theory of discrete Lagrange multipliers for finding good designs, and study methods to improve its convergence speed by adjusting dynamically the relative weights between the objective and the Lagrangian part. We show that our method can find good designs using at most four terms in PO2 form in each lattice parameter. Our approach is unique because our results are the first successful designs of multiplierless PR-LP filter banks. It is general because it is applicable to the design of other types of multiplierless filter banks.

J68
Benjamin W. Wah and Dong Lin, ``Transformation-Based Reconstruction For Real-Time Voice Transmissions Over the Internet,'' IEEE Trans. on Multimedia, IEEE, vol. 1, no. 4, Dec. 1999, pp. 342-351.

In this paper, we explore the loss behavior encountered in transmitting real-time voice over the Internet and propose a new loss-concealment scheme to improve its received quality. One known technique to conceal loss is to send interleaved streams of voice samples and reconstruct missing or late samples by interpolation at the receiver. Based on this method, we propose a new transformation-based reconstruction algorithm. Its basic idea is for the sender to transform an input voice stream, according to the interpolation method used at the receiver and the predicted loss behavior, before interleaving the stream. The transformation is derived by minimizing reconstruction error in case of loss. We show that our method is computationally efficient and can be extended to various interleaving factors and interpolation- based reconstruction methods. Finally, we show performance improvements of our method by testing it over the Internet.

J69
Benjamin W. Wah, Tao Wang, Yi Shang, and Zhe Wu, ``Improving the Performance of Weighted Lagrange-Multiplier Methods for Nonlinear Constrained Optimization,'' Information Sciences, Elsevier Science Pub. Co., vol. 124, no. 1-4, May 2000, pp. 241-272.

Nonlinear constrained optimization problems in discrete and continuous spaces are an important class of problems studied extensively in operations research and artificial intelligence. These problems can be solved by a Lagrange-multiplier method in continuous space and by an extended discrete Lagrange-multiplier method in discrete space. When constraints are satisfied, these methods rely on gradient descents in the objective space to find high-quality solutions. On the other hand, when constraints are violated, these methods rely on gradient ascents in the Lagrange-multiplier space in order to increase the penalties on unsatisfied constraints and to force these constraints into satisfaction. The balance between gradient descents and gradient ascents depends on the relative weights between the objective function and the constraints, which indirectly control the convergence speed and solution quality of the Lagrangian method. To improve convergence speed without degrading solution quality, we propose an algorithm to dynamically control the relative weights between the objective and the constraints. Starting from an initial weight, the algorithm automatically adjusts the weights based on the behavior of the search progress. With this strategy, we are able to eliminate divergence, reduce oscillations, and speed up convergence. We show improved convergence behavior of our proposed algorithm on both nonlinear continuous and discrete problems.

J70
Benjamin W. Wah and Tao Wang, ``Tuning Strategies of Constrained Simulated Annealing for Nonlinear Global Optimization,'' Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd., vol. 9, no. 1, March 2000, pp. 3-25

This paper studies various strategies in constrained simulated annealing (CSA), a global optimization algorithm that achieves asymptotic convergence to constrained global minima (CGM) with probability one for solving discrete constrained nonlinear programming problems (NLPs). The algorithm is based on the necessary and sufficient condition for discrete constrained local minima (CLM) in the theory of discrete Lagrange multipliers and its extensions to continuous and mixed-integer constrained NLPs. The strategies studied include adaptive neighborhoods, distributions to control sampling, acceptance probabilities, and cooling schedules. We report much better solutions than the best-known solutions in the literature on two sets of continuous benchmarks and their discretized versions.

J71
Benjamin W. Wah and Zhe Wu, ``Penalty Formulations and Trap-Avoidance Strategies for Solving Hard Satisfiability Problems,'' J. of Computer Science and Technology, Springer-Verlag, vol. 20, no. 1, Jan. 2005, pp. 3-17

In this paper we study the solution of SAT problems formulated as discrete decision and discrete constrained optimization problems. Constrained formulations are better than traditional unconstrained formulations because violated constraints may provide additional forces to lead a search towards a satisfiable assignment. We summarize the theory of extended saddle points in penalty formulations for solving discrete constrained optimization problems and the associated discrete penalty method (DPM). We then examine various formulations of the objective function, choices of neighborhood in DPM, strategies for updating penalties, and heuristics for avoiding traps. Experimental evaluations on hard benchmark instances pinpoint that traps contribute significantly to the inefficiency of DPM and force a trajectory to repeatedly visit the same set of or nearby points in the original variable space. To address this issue, we propose and study two trap-avoidance strategies. The first strategy adds extra penalties on unsatisfied clauses inside a trap, leading to very large penalties for unsatisfied clauses that are trapped more often and making these clauses more likely to be satisfied in the future. The second strategy stores information on points visited before, whether inside traps or not, and avoids visiting points that are close to points visited before. It can be implemented by modifying the penalty function in such a way that, if a trajectory gets close to points visited before, an extra penalty will take effect and force the trajectory to a new region. It specializes to the first strategy because traps are special cases of points visited before. Finally, we show experimental results on evaluating benchmarks in the DIMACS and SATLIB archives and compare our results with existing results on GSAT, WalkSAT, LSDL, and Grasp. The results demonstrate that DPM with trap avoidance is robust as well as effective for solving hard SAT problems.

J72
Xiao Su and Benjamin W. Wah, ``Multi-Description Video Streaming with Optimized Reconstruction-Based DCT and Neural-Network Compensations,'' IEEE Trans. on Multimedia, vol. 3, no. 1, March 2001, pp. 123-131.

Packet and compression losses are two sources of quality losses when streaming compressed video over unreliable IP networks, such as the Internet. In this paper, we propose two new approaches for concealing such losses. First, we present a joint sender-receiver approach for designing transforms in multi-description coding (MDC). In the receiver, we use a simple interpolation-based reconstruction algorithm, as sophisticated concealment techniques cannot be employed in real time. In the sender, we design an optimized reconstruction-based discrete cosine transform (ORB-DCT) with an objective of minimizing the mean squared error, assuming that some of the descriptions are lost and that the missing information is reconstructed by simple averaging at the destination. Second, we propose an artificial neural network to compensate for compression losses introduced in MDC. Experimental results show that our proposed algorithms perform well in real Internet tests.

J73
Benjamin W. Wah and Minglun Qian, ``Violation-Guided Neural-Network Learning for Constrained Formulations in Time-Series Predictions,'' Int'l Journal on Computational Intelligence and Applications, World Scientific, vol. 1, no. 4, Dec. 2001, pp. 383-398.

Time-series predictions by artificial neural networks (ANNs) are traditionally formulated as unconstrained optimization problems. As an unconstrained formulation provides little guidance on search directions when a search gets stuck in a poor local minimum, we have proposed to use a constrained formulation in order to use constraint violations to provide additional guidance. In this paper, we formulate ANN learning with cross-validations for time-series predictions as a non-differentiable nonlinear constrained optimization problem. Based on our theory of Lagrange multipliers for discrete constrained optimization, we propose an efficient learning algorithm, called violation guided back-propagation (VGBP), that computes an approximate gradient using back-propagation (BP), that introduces annealing to avoid blind acceptance of trial points, and that applies a relax-and-tighten (R&T) strategy to achieve faster convergence. Extensive experimental results on well-known benchmarks, when compared to previous work, show one to two orders-of-magnitude improvement in prediction quality, while using less weights.

J74
Yi Shang, Longzhuang Li, and Benjamin W. Wah, ``Optimization Design of Biorthogonal Filter Banks for Image Compression,'' Information Sciences, Elsevier Science Pub., vol. 132, no. 1, Feb. 2001, pp. 23-51.

In this paper, we present a new approach for designing filter banks for image compression. This approach has two major components: optimization and generalization. In the optimization phase, we formulate the design problem as a nonlinear optimization problem whose objective consists of both the performance metrics of the image coder, such as the peak signal-to-noise ratio (PSNR), and those of individual filters. Filter banks are optimized in the optimization phase based on a set of training images. In the generalization phase, the filter bank that can be generalized to other images is selected from the candidates obtained in the optimization phase to be the final result. The filter bank selected should perform well not only on the training examples used in the design process but also on test cases not seen. In contrast to existing methods that design filter banks independently from the other operations in an image compression algorithm, our approach allows us to find filter banks that work best in a specific image compression algorithm for a certain class of images. In system prototype development, we adopt the agent-based approach to achieve better modularity, portability, and scalability. Agents in the multi-agent system are specialized in performing problem formulation, image compression, optimization, and generalization. In the experiments, we show that on a set of benchmark images our approach has found filter banks that perform better than the existing filter banks in different image compression algorithms and at different compression ratios.

J75
Xiao Su and Benjamin W. Wah, ``Reconstruction-Based Subband Image Coding for UDP Transmissions over the Internet,'' Journal of VLSI Signal Processing, Kluwer Academic Press, vol. 34, no. 1-2, May-June 2003, pp. 29-48.

Quality-delay trade-offs can be made in transmitting subband-coded images in the Internet by using either the TCP or the UDP protocol. Delivery by TCP gives superior decoding quality but with very long delays when the network is unreliable, whereas delivery by UDP has negligible delays but with degraded quality when packets are lost. Although images are delivered primarily by TCP today, we study in this paper the use of UDP to deliver multi-description reconstruction-based subband-coded images and the reconstruction of missing information at the receiver based on information received. We first determine empirically the interleaving factors that should be used in order to keep the probability of unrecoverable packet losses sufficiently small. Next, we propose a joint sender-receiver approach for designing transforms in multi-description subband coding. In the receiver, we use a simple interpolation-based reconstruction algorithm, as sophisticated concealment techniques cannot be employed in practice. In the sender, we design an optimized reconstruction-based subband transform (ORB-ST), with an objective of minimizing the mean squared error, assuming that some of the descriptions are lost and that the missing information is reconstructed by simple averaging at the destination. Experimental results show that our proposed ORB-ST performs well in real Internet tests, and UDP delivery of MDC images is an attractive alternative to TCP delivery.

J76
Benjamin W. Wah and Dong Lin, ``LSP-Based Multiple-Description Coding for Real-Time Low Bit-Rate Voice over IP,'' IEEE Trans. on Multimedia, Vol. 7, No. 1, Feb. 2005, pp. 167-178.

A fundamental issue in real-time interactive voice transmissions over unreliable IP networks is the loss or late arrival of packets for playback. This problem is especially serious when transmitting low bit rate-coded speech with pervasive dependencies introduced. In this case, the loss or late arrival of a single packet will lead to the loss of subsequent dependent frames. In this paper, we study end-to-end loss-concealment schemes for ensuring high quality in playback. We propose a novel multiple description-coding method for concealing packet losses in transmitting low bit rate-coded speech. Based on high correlations observed in linear predictor parameters--in the form of Line Spectral Paris (LSPs)--of adjacent frames, we generate multiple descriptions in senders by interleaving LSPs, and reconstruct lost LSPs in receivers by linear interpolations. As excitation codewords have low correlations, we further enlarge the segment size for excitation generation and replicate excitation codewords in all the descriptions in order to maintain the same transmission bandwidth. Our proposed scheme can be extended easily to more than two descriptions and can adapt its number of descriptions dynamically to network-loss conditions. Experimental results on FS-1016 CELP, ITU G.723.1, and FS MELP coders show good performance of our scheme.

J77
Xindong Wu, Philip S. Yu, Gregory Piatetsky-Shapiro, Nick Cercone, T. Y. Lin, Ramamohanarao Kotagiri, and Benjamin W. Wah,
``Data Mining: How Research Meets Practical Development,''
Knowledge and Information Systems: An International Journal, Springer-Verlag, vol. 5, no. 2, April 2003, pp. 248-261.
At the 2001 IEEE International Conference on Data Mining in San Jose, California on November 29 - December 2, 2001, there was a panel discussion on how data mining research meets practical development. One of the motivations for organizing the panel discussion was to provide useful advice for industrial people to explore their directions in data mining development. Based on the panel discussion, this paper presents the views and arguments from the panel members, the Conference Chair and the Program Committee Co-Chairs. These people as a group have both academic and industrial experiences in different data mining related areas such as databases, machine learning, and neural networks. We will answer such as (1) how far is data mining from practical development, (2) how data mining research differs from practical development, and (3) what are the most promising areas in data mining for practical development?

J78
Benjamin W. Wah and Yixin Chen,
``Hybrid Evolutionary and Annealing Algorithms for Nonlinear Discrete Constrained Optimization,''
Int'l Journal of Computational Intelligence and Applications, Imperial College Press, UK, vol. 3, no. 4, Dec. 2003, pp. 331-355.

This paper presents a procedural framework that unifies various mechanisms to look for discrete-neighborhood saddle points in solving discrete constrained optimization problems (DCOPs). Our approach is based on the necessary and sufficient condition on local optimality in discrete space, which shows the one-to-one correspondence between the discrete-space constrained local minima of a problem and the saddle points of the corresponding Lagrangian function. To look for such saddle points, we study various mechanisms for performing ascents of the Lagrangian function in the original-variable subspace and descents in the Lagrange-multiplier subspace. Our results show that CSAEA, a combined constrained simulated annealing and evolutionary algorithm, performs well when using mutations and crossovers to generate trial points and accepting them based on the Metropolis probability. We apply iterative deepening to determine the optimal number of generations in CSAEA and show that its performance is robust with respect to changes in population size. To test the performance of the procedures developed, we apply them to solve some continuous and mixed-integer nonlinear programming (NLP) benchmarks and show that they obtain better results than those of existing algorithms.

J79
Benjamin W. Wah and Yixin Chen,
``Constraint Partitioning in Penalty Formulations for Solving Temporal Planning Problems,''
Artificial Intelligence, Elsevier, vol. 170, no. 3, pp. 187-231, 2006.

In this paper, we study the partitioning of constraints in temporal plan- ning problems formulated as mixed-integer nonlinear programming (MINLP) problems. Constraint partitioning is attractive because it leads to much easier subproblems, where each is a significant relaxation of the original problem. Moreover, each subproblem is very similar to the original problem and can be solved by any existing solver with little or no modification. Constraint partitioning, however, introduces global constraints that may be violated when subproblems are evaluated independently. To reduce the over- head in resolving such global constraints, we develop in this paper new conditions and algorithms for limiting the search space to be backtracked in each subproblem. Using a penalty formulation of a MINLP where the con- straint functions of the MINLP are transformed into non-negative functions, we present a necessary and sufficient extended saddle-point condition (ES- PC) for constrained local minimization. When the penalties are larger than some thresholds, our theory shows a one-to-one correspondence between a constrained local minimum of the MINLP and an extended saddle point of the penalty function. Hence, one way to find a constrained local minimum is to increase gradually the penalties of those violated constraints and to look for a local minimum of the penalty function using any existing algorithm until a solution to the constrained model is found. Next, we extend the ESPC to constraint-partitioned MINLPs and propose a partition-and-resolve strategy for resolving violated global constraints across subproblems. Us- ing the discrete-space ASPEN and the mixed-space MIPS planners to solve subproblems, we show significant improvements on some planning benchmarks, both in terms of the quality of the plans generated and the execution times to find them.

J80
Benjamin W. Wah and Yixin Chen,
``Subgoal Partitioning and Global Search for Solving Temporal Planning Problems in Mixed Space,''
Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd., vol. 13, no. 4, Dec. 2004, pp. 767-790.

We study in this paper the partitioning of the constraints of a temporal planning problem by subgoals, their sequential evaluation before parallelizing the actions, and the resolution of inconsistent global constraints across subgoals. Using a Lagrangian formulation and the theory of extended saddle points, we propose a global-search strategy that looks for local minima in the original-variable space of the Lagrangian function and for local maxima in the Lagrange-multiplier space. Our approach improves over a previous scheme that partitions constraints along the temporal horizon. The previous scheme leads to many global constraints that relate states in adjacent stages, which means that an incorrect assignment of states in an earlier stage of the horizon may violate a global constraint in a later stage of the horizon. To resolve the violated global constraint in this case, state changes will need to propagate sequentially through multiple stages, often leading to a search that gets stuck in an infeasible point for an extended period of time. In this paper, we propose to partition all the constraints by subgoals and to add new global constraints in order to ensure that state assignments of a subgoal are consistent with those in other subgoals. Such an approach allows the information on incorrect state assignments in one subgoal to propagate quickly to other subgoals. Using MIPS as the basic planner in a partitioned implementation, we demonstrate significant improvements in time and quality in solving some PDDL2.1 benchmark problems.

J81
Xiao Su and Benjamin W. Wah,
``Loss Aware Rate Allocations in H.263 Coded Video Transmissions,''
Journal of Circuits, Systems, and Computers, World Scientific Publishing Co. Pte. Ltd., vol. 14, no. 6, Dec. 2005. pp. 1157-1171.

For packet video, information loss and bandwidth limitation are two factors that affect video playback quality. Traditional rate allocation approaches have focused on optimizing video quality under bandwidth constraint alone. However, in the best-effort Internet, packets carrying video data are susceptible to losses, which need to be reconstructed at the receiver side. In this paper, we propose loss aware rate allocations in both group-of-block (GOB) level and macroblock level, given that certain packets are lost during transmissions and reconstructed using simple interpolation methods at the receiver side. Experimental results show that our proposed algorithms can produce videos of higher quality when sent over lossy Internet.

J82
J. W. Han, Y. X. Chen, G. Dong, J. Pei, B. W. Wah, J Wang, and Y. D. Cai,
``The Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams,''
Distributed and Parallel Databases: Special Issue on Data Warehousing and Data Streams, Springer-Verlag, Sept. 2005.

Real-time surveillance systems, telecommunication systems, and other dynamic environments often generate tremendous (potentially infinite) volume of stream data: the volume is too huge to be scanned multiple times. Much of such data resides at rather lowlevel of abstraction, whereas most analysts are interested in relatively high-level dynamic changes (such as trends and outliers). To discover such high-level characteristics, one may need to perform on-line multi-level, multi-dimensional analytical processing of stream data. In this paper, we propose an architecture, called stream cube, to facilitate on-line, multi-dimensional, multi-level analysis of stream data. For fast online multi-dimensional analysis of stream data, three important techniques are proposed for efficient and effective computation of stream cubes. First, a tilted time frame model is proposed as a multi-resolution model to register time-related data: the more recent data are registered at finer resolution, whereas the more distant data are registered at coarser resolution. This design reduces the overall storage of time-related data and adapts nicely to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, we propose to maintain a small number of critical layers. Flexible analysis can be efficiently performed based on the concept of observation layer and minimal interesting layer. Third, an efficient stream data cubing algorithm is developed which computes only the layers (cuboids) along a popular path and leaves the other cuboids for query-driven, on-line computation. Based on this design methodology, stream data cube can be constructed and maintained incrementally with a reasonable amount of memory, computation cost, and query response time. This is verified by our substantial performance study. Stream data cube architecture facilitates online analytical processing of stream data. It also forms 36 a preliminary data structure for online stream data mining. The impact of the design and implementation of stream data cube in the context of stream data mining is also discussed in the paper.

J83
Y. X. Chen, C. W. Hsu, and B. W. Wah,
``Temporal Planning using Subgoal Partitioning and Resolution in SGPlan,''
J. of Artificial Intelligence Research, AAAI Press, vol. 26, Aug. 2006, pp.323-369.

In this paper, we present the partitioning of mutual-exclusion (mutex) constraints in temporal planning problems and its implementation in the SGPlan planner. Based on the strong locality of mutex constraints observed in many benchmarks of the Fourth International Planning Competition (IPC4), we propose to partition the constraints of a planning problem into groups based on their subgoals. Constraint partitioning leads to significantly easier subproblems that are similar to the original problem and that can be efficiently solved by the same planner with some modifications to its objective function. We present a partition-and-resolve strategy that looks for locally optimal subplans in constraint-partitioned temporal planning subproblems and that resolves those inconsistent global constraints across the subproblems. We also discuss some implementation details of SGPlan, which include the resolution of violated global constraints, techniques for handling producible resources, landmark analysis, path finding and optimization, search-space reduction, and modifications of Metric-FF when used as a basic planner in SGPlan. Last, we show results on the sensitivity of each of these techniques in quality-time trade-offs and experimentally demonstrate that SGPlan is effective for solving the IPC3 and IPC4 benchmarks.

J84
B. W. Wah and D. Xin,
``Optimization of Bounds in Temporal Flexible Plans with Dynamic Controllability,''
Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd., Feb. 2007, vol. 16, no. 1, pp. 17-44.

A temporal flexible planning problem that involves contingent and requirement events can be formulated as a simple temporal network with uncertainty (STNU). An STNU is controllable when there is a strategy for executing the requirement events (or actions) in such a way that all the conditions involving contingent events can be satisfied in all situations. The most interesting and useful controllability property is dynamic controllability in which the remaining actions in an STNU can always be scheduled under all possible feasible durations of future contingent events when all the past contingent events are known. In this paper, we propose and study a novel problem of assigning bounds on the duration of each requirement link in order for the resulting STNU to be dynamically controllable and to minimize the total cost over the allowed durations of all requirement links. We first prove the NP hardness of the problem with a linear cost function. We then formulate the dynamic controllability of an STNU as the constraints in a nonlinear optimization problem. Finally, we present methods for reducing the number of constraints in order to make the problem tractable and to demonstrate the computational performance of our methods.

J85
Y. X. Chen, G. Dong, J. W. Han, J. Pei, B. W. Wah, and J. Y. Wang,
``Regression Cubes with Lossless Compression and Aggregation,''
IEEE Trans. on Knowledge and Data Engineering, vol. 18, no. 12, Dec. 2006, pp. 1585-1599.

As OLAP engines being widely used nowadays to support multidimensional data analysis and intelligent decision making, it is desirable to support in data cubes advanced statistical measures, such as regression and filtering measures, in addition to the traditional simple measures such as count and average. Such advanced measures will allow users to model, smooth, and predict the trends and patterns of data. Although raw data is generally collected at the lowest level, an analyst is often more interested in generating higher and multiple levels of abstraction without accessing the raw data. Efficient computation of aggregated statistical measures in multidimensional spaces is a challenging problem as existing aggregation algorithms for simple distributive and algebraic measures are inadequate for such computation.

In this paper, we propose the concept of regression cubes in which regression models can be obtained for any data cuboid efficiently without accessing the raw data. The traditional classification of measures into distributive, algebraic, and holistic measures cannot accommodate advanced statistical models such as regression and filters. In this paper, we propose a fundamentally new class of measures, compressible measures, in order to support efficient computation of the statistical models. For compressible measures, we can efficiently compute the statistical models at an arbitrary data cell, while each data cell is compressed into a small auxiliary matrix with a fixed size independent of the number of tuples. We can then compute the statistical measures for any data cell from the compressed data of the lower-level cells without accessing the raw data. Time- and space-efficient lossless aggregation formulae are derived for regression and filtering measures. Our analytical and experimental studies show that the proposed approach substantially reduces the memory usage and the overall response time for statistical analysis of multidimensional data.

J86
C. W. Hsu, Y. X. Chen, and B. W. Wah,
``Subgoal Ordering and Granularity Control for Incremental Planning,''
Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd., vol. 16, no. 4, Aug. 2007, pp. 707-723.

In this paper, we study strategies in incremental planning for ordering and grouping subproblems partitioned by the subgoals of a planning problem. To generate a rich set of partial orders for ordering subproblems, we propose an algorithm based on a relaxed plan that ignores the delete lists. The new algorithm considers both the initial and the goal states and can effectively order subgoals in such a way that greatly reduces the number of invalidations during incremental planning. We have also considered trade-offs between the granularity of the subgoal sets and the complexity of solving the overall planning problem. We propose an efficient strategy for dynamically adjusting the grain size in partitioning in order to minimize the total complexity. We further evaluate a redundant-ordering scheme that uses two different subgoal orders to improve the solution quality, without greatly sacrificing run-time efficiency. Experimental results on using Metric-FF, YAHSP, and LPG-TD-speed as the embedded planners in incremental planning show that our strategies are general for improving the time and quality of these planners across various benchmarks. Finally, we compare the performance of the three planners, the incremental versions using these planners as embedded planners, and SGPlan4.1.

J87
D. Xin, J. W. Han, X Li, Z. Shao, and B. W. Wah ``Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach,''
IEEE Trans. on Knowledge and Data Engineering, vol. 19, no. 1, Jan. 2007, pp. 111-126.

Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The former, represented by the MultiWay Array Cube (called the MultiWay) algorithm, aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC, computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms the previous methods.

J88
B. W. Wah and M. L. Qian,
``Constrained Formulations and Algorithms for Predicting Stock Prices by Recurrent FIR Neural Networks,''
Int'l J. of Information Technology and Decision Making, World Scientific Pub. Co. vol. 5, no. 4, Dec. 2006, pp. 638-658.

In this paper, we develop a new constrained artificial-neural-network (ANN) formulation and the associated learning algorithm for predicting stock prices, a difficult time-series prediction problem. We characterize daily stock prices as a noisy non-stationary time series and identify its predictable low-frequency components. Using a recurrent FIR (finite-impulse-response) ANN, we formulate the learning problem as a constrained optimization problem, develop constraints for incorporating cross validations, and solve the learning problem using algorithms based on the theory of extended saddle points for nonlinear constrained optimization. Finally, we illustrate our prediction results on ten stock-price time series. Our main contributions in this paper are the channel-specific low-pass filtering of noisy time series obtained by wavelet decomposition, the transformation of the low-pass signals to improve their stationarity, and the incorporation of constraints on cross validation that can improve the accuracy of predictions. Our experimental results demonstrate good prediction accuracy and annual returns.

J89
B. W. Wah, Y. X. Chen, and T. Wang,
``Simulated Annealing with Asymptotic Convergence for Nonlinear Constrained Optimization,''
Journal of Global Optimization, Springer-Verlag, vol. 39, pp. 1-37, 2007.

In this paper, we present constrained simulated annealing (CSA), an algorithm that extends conventional simulated annealing to look for constrained local minima of nonlinear constrained optimization problems. The algorithm is based on the theory of extended saddle points (ESPs) that shows the one-to-one correspondence between a constrained local minimum and an ESP of the corresponding penalty function. CSA finds ESPs by systematically controlling probabilistic descents in the problem-variable subspace of the penalty function and probabilistic ascents in the penalty subspace. Based on the decomposition of the necessary and sufficient ESP condition into multiple necessary conditions, we present constraint-partitioned simulated annealing (CPSA) that exploits the locality of constraints in nonlinear optimization problems. CPSA leads to much lower complexity as compared to that of CSA by partitioning the constraints of a problem into significantly simpler subproblems, solving each independently, and resolving those violated global constraints across the subproblems. We prove that both CSA and CPSA asymptotically converge to a constrained global minimum with probability one in discrete optimization problems. The result extends conventional simulated annealing (SA), which guarantees asymptotic convergence in discrete unconstrained optimization, to that in discrete constrained optimization. Moreover, it establishes the condition under which optimal solutions can be found in constraint-partitioned nonlinear optimization problems. Finally, we evaluate CSA and CPSA by applying them to solve some continuous constrained optimization benchmarks and compare their performance to that of other penalty methods.

CO71
Lon-Chan Chu and Benjamin W. Wah, ``Optimization in Real Time,'' Proc. IEEE Real Time Systems Symposium, Nov. 1991, pp. 150-159.

In this paper, we present a search algorithm called real-time search for solving combinatorial optimization problems under real-time constraints. The problems we study are NP-hard and have good heuristic algorithms for generating feasible solutions. The algorithm we develop aims at finding the best possible solution in a given deadline. Since this objective is generally not achievable without first solving the problem, we use an alternative heuristic objective that looks for the solution with the best ascertained approximation degree. Our algorithm schedules a sequence of guided depth-first searches, each searching for a more accurate solution (based on the approximation degree set), or solutions deeper in the search tree (based on the threshold set), or a combination of both. Five versions of the RTS algorithm for setting approximation degrees and/or thresholds are formulated, evaluated, and analyzed. We develop an exponential model for characterizing the relationship between the approximation degrees achieved and the time taken. Our results show that our RTS algorithm finds solutions better than a guided depth-first search, and solutions found are very close to the best possible solution that can be found in the time allowed. We evaluate our results using the symmetric traveling-salesman, the knapsack, and the production-planning problems.

CO76
Kumar Ganapathy and Benjamin Wah, ``Optimal Design of Processor Arrays for Uniform Recurrences,'' Proc. Int'l Conf. on Application-Specific Array Processors, IEEE Computer Society, Aug. 1992, pp. 636-648.

In this paper we present a parameter-based approach for synthesizing systolic architectures from uniform recurrence equations. The scheme presented is a generalization of the Parameter Method proposed by Li and Wah. The approach synthesizes optimal arrays of any lower dimension from a general uniform recurrence description of the problem. In other previous attempts for mapping uniform recurrences into lower-dimensional arrays, optimality of the resulting designs is not guaranteed. As an illustration of the technique, optimal linear arrays for matrix multiplication are given. Due to space limitation, proofs to theorems in this paper are not shown. In a companion paper, we present details of the proof of Theorems 1 and 3 and a different approach for defining the constraints shown in Theorem 2. A detailed design for solving path-finding problems is also presented there.

CO77
Arthur Ieumwananonthachai and Benjamin W. Wah, ``Parallel Statistical Selection in Multiprocessors,'' Proc. Int'l Conf. on Parallel Processing, Pennsylvania State University Press, Aug. 1992, vol. III, pp. 190-194.

In this paper, we develop parallel multistage selection strategies for guiding the concurrent evaluation of candidates from a large pool of candidates, with the objective of finding the candidate with the maximum (resp. minimum) value in a limited amount of time. We present a statistical model for finding the appropriate number of candidates to be sampled and when each should be sampled. We verify the analytical results by simulations and evaluate the performance of strategies that are analytically intractable. Our performance results show that there is linear speedup in parallel processing over selection done sequentially. Results obtained from application of our selection method to learn heuristics for a real-world application demonstrate that heuristics with improved performance can be found by a systematic search of the space of feasible heuristics in limited time. Due to space limitation, we are unable to present the details of the experiments and the exact heuristics found.

CO78
Kumar Ganapathy and Benjamin Wah, ``Synthesizing Optimal Lower Dimensional Processor Arrays,'' Proc. Int'l Conf. on Parallel Processing, CRC Press, vol. 3 (Aug. 1992), pp. 96-103.

Most existing methods for synthesizing systolic architectures can only map n-dimensional recurrences to n-1-dimensional arrays. In this paper, we generalize the parameter-based approach of Li and Wah to map n-dimensional uniform recurrences to any k-dimensional processor arrays, where k
CO80
Lon-Chan Chu and Benjamin W. Wah, ``Band Search: An Efficient Alternative to Guided Depth-first Search,'' Proc. 4th IEEE Int'l Conf. on Tools with Artificial Intelligence (Nov. 1992), pp. 154-161.

In this paper, we propose a novel search algorithm called band search that generalizes guided depth-first and best-first searches. The search allocates a band of at most W nodes in each level of the search tree for storing active nodes in that level, D priority lists, one for each level, for storing overflow nodes, and D counters, one for each level, for keeping track of and to limit the degree of backtracking allowed. The algorithm has three major features: a) it selects for expansion in a best-first fashion from all nodes in the bands, b) it moves nodes from an overflow list into the corresponding band in a depth-first fashion, and c) it restricts backtracking so that at most W nodes in a level are fully searched before allowing new nodes in this level into the band. The algorithm specializes to be a guided depth-first search (GDFS) when W is one, and a best-first search (BFS) when W is unlimited. By choosing W appropriately, we show empirically that the algorithm can often outperform GDFS and has performance close to that of BFS. Moreover, the algorithm can be used instead of GDFS in iterative searches, such as IDA*, MIDA*, and DFS*. Finally, we identify conditions upon which the algorithm behaves like BFS, and the anomalous behavior when the algorithm performs worse than GDFS.

CO81
Steven R. Schwartz and Benjamin W. Wah, ``Automated Parameter Tuning in Stereo Vision under Time Constraints,'' Proc. 4th Int'l Conf. on Tools with Artificial Intelligence (Nov. 1992), pp. 162-169.

This paper presents a method for tuning parameters under a fixed time constraint for a general binocular stereo-vision algorithm. A major difficulty in stereo vision, as well as in other vision algorithms, lies in adjusting the large variety of parameters for maximizing performance. This effort is usually performed by human experts with a minimum of formal guidelines. To automate this process, we develop Teacher 4.2, a generate-and-test system that systematically generates new parameter values by analyzing the results of previous tests, and that performs limited and controlled tests on the candidates generated using high-speed computers. The system is modeled as a statistical selection problem operating under a given time constraint. It divides the time allowed into stages, where promising parameter-value sets found in one stage are passed to the next stage for further testing, and selects the parameter-value set deemed best by the final stage as the result. We show experimentally that our system can find new parameter-value sets which in some cases are better than the ones originally found by extensive hand-tuning and commonly used heuristics. Our experiments further show that different parameter values may be required under different objectives and performance constraints. Our system provides an automated tool for generating new parameters that can be part of an adaptive stereo-vision system, capable of adapting to changing algorithm specifications as well as changing goals and target domains.

CO85
M. Aboelaze, D. Lee, and B. W. Wah, ``Two-Dimensional Digital Filtering Using Constant-I/O Systolic Arrays,'' Proc. IEEE Int'l Symposium on Circuits and Systems (March 1993), Chicago, IL, pp. 255-258.

We present in this paper systolic arrays with constant number of input/output (I/O) ports for two-dimensional (2-D) FIR and IIR filtering. Our design has an array of LxN processing elements (PE's), where L (less than or equal to N) is a technology-dependent parameter related to the number of I/O ports. Each PE in our design has a microprogrammed arithmetic logic unit (ALU), a control unit, a fixed number of I/O buffers, and O(N/L) memory. Our design specializes to a square mesh when L=N, and a linear array when L=1. It can implement both FIR and IIR filtering in O(N2M/L) time which is asymptotically optimal.

CO86
Pankaj Mehra and Benjamin Wah, ``Automated Learning of Workload Measures for Load Balancing on a Distributed System,'' Proc. Int'l Conference on Parallel Processing, CRC Press, Aug. 1993, pp. III-263-III-270.

Load-balancing systems use workload indices to dynamically schedule jobs. We present a novel method of automatically learning such indices. Our approach uses comparator neural networks, one per site, which learn to predict the relative speedup of an incoming job using only the resource-utilization patterns observed prior to the job's arrival. Our load indices combine information from the key resources of contention: CPU, disk, network, and memory. Our learning algorithm overcomes the lack of job-specific information by learning to compare the relative speedups of different sites with respect to the same job, rather than attempting to predict absolute speedups. We present conditions under which such learning is viable. Our results show that indices learnt using comparator networks correctly pick the best destination in most cases when incoming jobs are short; accuracy degrades as execution time increases.

CO87
Wei-jia Shang and Benjamin W. Wah, ``Dependence Analysis and Architecture Design for Bit-Level Algorithms,'' Proc. Int'l Conference on Parallel Processing, CRC Press, Aug. 1993, pp. I-30-I-38.

In designing application-specific bit-level architectures and in programming existing bit-level processor arrays, it is necessary to expand a word-level algorithm into its bit-level form before dependence analysis can be performed. In this paper, we consider dependence structures of bit-level algorithms as functions of three components -- dependence structures of word-level algorithms, dependence structures of the arithmetic algorithms implementing word-wise operations, and algorithm expansions. Based on these components, we can derive dependence structures of bit-level algorithms without using time consuming general dependence analysis methods. To illustrate our approach, we derive two dependence structures for bit-level matrix multiplication and apply a method developed earlier to design two bit-level architectures. One of these architectures is O(p) times faster than the best word-level architecture, where p is the word length. The speedup we found here is true in general because a bit in a bit-level architecture goes to the next processor for processing as soon as it is available.

CO88
A. Aizawa and B. W. Wah, ``Scheduling of Genetic Algorithms in a Noisy Environment,'' Proc. Int'l Conf. on Genetic Algorithms (July 1993), Int'l Soc. for Genetic Algorithms, pp. 48-55.

In this paper, we present efficient algorithms for adjusting configuration parameters of genetic algorithms that operate in a noisy environment. Assuming that the population size is given, we address two problems specifically important in a noisy environment. First, we study the duration-sizing problem that determines dynamically the duration of each generation. Next, we study the sample-allocation (sizing) problem that determines adaptively the number of evaluations taken from each population in a generation. For these two problems, we model the search process as a statistical selection process and derive equations useful for controlling the duration and the sample sizes. Our result shows that these adaptive procedures improve the performance of genetic algorithms over those of commonly used static ones.

CO89
Vijay Karamcheti and Benjamin W. Wah, ``Scheduling of Dynamic Divide-and-Conquer Computations on Multicomputers,'' Proc. Computers and Software Applications Conference (Nov. 1993), IEEE Computer Society Press, pp. 352-359.

The scheduling of tasks for applications with dynamic behavior traditionally rely on externally observable metrics such as the number of active processes. This paper presents a new scheduling strategy based on the observation that it may be possible to capture the near-term resource requirements of active tasks by expressions involving task parameters. Run-time evaluation of these expressions yields estimates of task behavior that are valid over a short, future interval of time. The heuristics proposed, which when used in conjunction with information supplied by profiling, can be used to annotate the source program with such expressions. Preliminary simulation results show that the use of near-future estimates in a dynamic scheduling strategy for divide-and-conquer algorithms consistently improves over traditional dynamic strategies. The performance of this strategy approaches that of the best-known deterministic strategy while incurring an overhead of the same order as other dynamic strategies.

CO90
Chin-Chi Teng and Benjamin W. Wah, ``Improvement of Supervised Learning by Linear Mapping,'' Proc. IEEE Int'l Joint Conf. on Neural Networks (October 1993), pp. 1697-1700.

In this paper, we propose a new supervised learning method to improve the learning speed for feed-forward neural networks. Our method is different from traditional supervised learning methods (such as back-propagation) that find a one-to-one mapping between the given input pattern matrix and the desired output pattern matrix. Instead, it finds one of the one-to-many mappings between the input matrix and an intermediate output matrix, and transforms the intermediate output matrix to the desired output matrix in one step using linear mapping techniques. Learning is faster with our method because there exist many intermediate output matrices, and learning can stop whenever one such matrix is found. Our extensive experimental results show that our learning algorithm converges to within a reasonable range of error after a few training epochs, making it suitable for dynamic real-time applications in which the network may need to be re-trained periodically.

CO91
Pankaj Mehra and Benjamin Wah, ``Population-Based Learning of Load Balancing Policies for a Distributed Computer System,'' Proc. of Computing in Aerospace 9 Conference (Oct. 19-21, 1993), AIAA, pp. 1120-1130.

Effective load-balancing policies use dynamic resource information to schedule tasks in a distributed computer system. In this paper, we present a novel method for automatically learning such policies. At each site in our system, we use a comparator neural network to predict the relative speedup of an incoming task using only the resource-utilization patterns obtained prior to the task's arrival. Outputs of these comparator networks are broadcast periodically over the distributed system, and the resource schedulers at each site use these values to determine the best site for executing an incoming task. The delays incurred in propagating workload information and tasks from one site to another, as well as the dynamic and unpredictable nature of workloads in multiprogrammed multiprocessors, may cause the workload pattern at the time of execution to differ from patterns prevailing at the times of load-index computation and decision making. Our load-balancing policy accommodates this uncertainty by using certain tunable parameters. We present a population-based machine-learning algorithm that adjusts these parameters in order to achieve high average speedups with respect to local execution. Our learning algorithm overcomes the lack of task-specific information by learning to compare the relative speedups of different sites with respect to the same site, rather than attempting to predict absolute speedups. We present experimental results based on the execution of benchmark programs on a network of Sun workstations connected by a local area network. Our results show that our load-balancing policy, when combined with the comparator neural network for workload characterization, is effective in exploiting idle resources in a distributed computer system.

CO92
Kumar Ganapathy and Benjamin Wah, ``Designing a Coprocessor for Recurrent Computations,'' Proc. 5th IEEE Symposium on Parallel and Distributed Processing (December 1-4, 1993), pp. 806-813.

In this paper, we present the design of an application-specific coprocessor for algorithms that can be modeled as uniform recurrences or ``uniformized'' affine recurrences. The coprocessor has a regular array of processors connected to an access-unit for intermediate storage of data. The distinguishing feature of our approach is that we assume the coprocessor to be interfaced to a standard, slow (single-ported) memory with low bandwidth. Hence, good performance is achieved by effectively exploiting data locality in the applications by the compiler, and the final architecture is chosen by a tradeoff analysis driven by the mapping process. Preliminary results indicate that the coprocessor has significantly lower clock rates or higher performance than that of existing RISC processors and is cost-effective for executing loop computations.

CO93
H. Kitan, W. V. Hahn, L. Hunter, R. Oka, B. W. Wah, and T. Yokoi, ``Grand Challenge AI Applications,'' Proc. 13th Int'l Joint Conf. on Artificial Intelligence (San Mateo, CA, Aug. 1993), Morgan Kaufman Pub., pp. 1677-1683.

This paper presents the position statements of the panelists in a panel discussion at the 13th International Joint Conference on Artificial Intelligence.

CO94
Chin-Chi Teng and Benjamin Wah, ``Mixed-Mode Learning: A Method for Reducing the Number of Hidden Units in Cascade Correlation,'' Proc. Int'l Symposium on Artificial Neural Networks (National Chiao Tung University, Hsinchu, Taiwan, ROC, Dec. 1-4, 1993), pp. I-101--I-107.

In this paper, we present a new learning mechanism called mixed-mode learning. This learning mechanism transforms an existing supervising learning algorithm from one that finds a unique one-to-one mapping into an algorithm that finds one of the many one-to-many mappings. Since the objective of learning is relaxed by this transformation, the number of learning epochs can be reduced significantly. We show in this paper that mixed-mode learning can be applied in the well-known cascade correlation learning algorithm to reduce the number of hidden units required for convergence when applied to classification problems whose desired outputs are binary. Our experimental results confirm this reduction, although they show that more learning time is required than that of cascade correlation. In general, reducing the number of hidden units at the expense of additional learning time is usually desirable.

CO95
Kumar N. Ganapathy and Benjamin Wah, ``Optimizing General Design Objectives in Processor-Array Design,'' Proc. International Parallel Processing Symposium (April 1994), IEEE Computer Society, pp. 295-302.

In this paper we present an improved search procedure for the General Parameter Method (GPM) [GanWah92a, GanWah92b]. Our procedure maps uniform dependence algorithms to application-specific processor arrays (PAs). It can optimize general design objectives with certain non-monotonicity properties, i.e., those that do not increase monotonically with the parameters. An example of such an objective is the minimization of the total completion time, including load and drain times. In contrast, earlier design methods can only deal with monotonic objectives. We present results for the matrix-product problem using this search technique; application of the technique on the transitive-closure problem is presented elsewhere [ForWahShaGan93]. We also show that the parameters in GPM can be expressed in terms of the schedule vector Pi and allocation matrix S in the popular dependence-based method (DM), thereby allowing GPM to be used in DM for finding optimal designs for uniform dependence algorithms.

CO96
Chin-Chi Teng and Benjamin W. Wah, ``An Automated Design System for finding the Minimal Configuration of a Feed-Forward Neural Network,'' Proc. Int'l Conf. on Neural Networks (June 1994), IEEE, vol. 3, pp. 1295-1300.

In this paper, we present a method for finding the minimal configuration of a feed-forward artificial neural network (ANN) for solving a given application problem. We assume that the cascade-correlation (CAS) training algorithm is used to train the weights of the ANNs concerned. Under a given time constraint that is long enough to train tens of ANNs completely, we divide the time into quanta, and present a method for scheduling dynamically the ANN to be trained in each quantum from a pool of partially trained ANNs. Our goal is to find an ANN configuration with smaller number of hidden units as compared to the alternative of applying the CAS algorithm repeatedly to train each ANN to completion before exploring new ANNs. Our system is a population-based generate-and-test method that maintains a population of candidate ANNs, and that selectively train those that are predicted to require smaller configurations. Since it is difficult to predict the exact number of hidden units required when the CAS algorithm terminates, our system compares two partially trained ANNs and predicts which one will converge with a smaller number of hidden units relative to the other. Our prediction mechanism is based on a comparator neural network (CANN) that takes as inputs the TSSE-versus-time behavior of training performed already on two ANNs, and that predicts which one will require a smaller number of hidden units when convergence is reached. We show that our CANN can predict correctly most of the time, and present experimental results on better configurations found in a given time limit for a classification problem and the two-spiral problem.

CO97
Benjamin W. Wah, ``Issues in Strategy Learning (Plenary Address),'' Proc. 1994 Symposium on Intelligent Systems in Communications and Power (Feb. 21-23, 1994), pp. 238-244.

In this paper, we discuss twelve application-specific issues in strategy learning. These issues are important when designing strategy learning systems as well as in learning new application-specific strategies. In each issue we discuss the problems involved and identified some effective solutions.

CO98
Yao-Jen Chang and Benjamin W. Wah, ``Polynomial Programming Using Groebner Bases,'' Proc. IEEE Int'l Conference on Computer Software and Applications (Nov. 1994), pp. 236-241.

Finding the global optimal solution for a general nonlinear program is a difficult task except for very small problems. In this paper we identify a class of nonlinear programming problems called polynomial programming problems (PP). A polynomial program is an optimization problem with a scalar polynomial objective function and a set of polynomial constraints. By using Groebner Bases, we can determine the global minimum of a polynomial program in a reasonable amount of time and memory.
CO99
Benjamin W. Wah and Yi Shang, ``A Comparative Study of IDA*-Style Searches,'' Proc. 6th IEEE Int'l Conference on Tools with Artificial Intelligence (Nov. 1994), pp. 290-296.

In this paper, we study the performance of various IDA*-style searches and investigate methods to improve their performance by predicting in each stage the threshold to use for pruning. We first present three models to approximate the distribution of number of search nodes by lower bounds: exponential, geometric, and linear, and illustrate these distributions based on some well-known combinatorial search problems. We then show the performance of an ideal IDA* algorithm and identify reasons why existing IDA*-style algorithms perform well. In practice, we will be able to know from previous experience the distribution for a given problem instance but will not be able to determine the parameters of the distribution. Hence, we develop RIDA*, a method that estimates dynamically the parameters of the distribution, and predicts the best threshold to use. Finally, we compare the performance of several IDA*-style algorithms --- Korf's IDA*, RIDA*, IDA*-CR and DFS* --- on several application problems, and identify conditions under which each of these algorithms will perform well.
C100
Benjamin W. Wah, Pankaj Mehra, and Chin-Chi Teng, ``Comparator Neural Network for Dynamic Prediction,'' Proc. 2nd Int'l Symposium on Neural Networks (Cheng Kung University, Tainan, Taiwan, Dec. 1994), pp. 571-580.

In this paper we present the design of a comparator neural network (CANN) and its associated learning algorithms. This network is useful for comparing two time series in order to predict which one will have a better future value based on a user-specified objective function. We illustrate its application in distributed load balancing and in an automated design system for neural networks. In the first application, we use a CANN as a workload predictor that takes as input a time window of measurable values of CPU occupancy, network traffic, disk traffic, and memory occupancy from two computers, and predicts which one of the two computers will have a shorter turnaround time for executing an incoming job. In the second application, we use a CANN to predict which one of two partially trained neural networks will have a shorter convergence time, given current TSSE traces of these networks. In each application, experimental results show that prediction is improved using the proposed CANNs.

C101
Yao-Jen Chang and Benjamin W. Wah, ``Lagrangian Techniques for Solving a Class of Zero-One Integer Linear Programs,'' Proc. IEEE Int'l Conference on Computer Software and Applications (Aug. 1995), pp. 156-161.

We consider a class of zero-one integer programming feasibility problems (0-1 ILPF problems) in which the coefficients of variables can be integers, and the objective is to find an assignment of binary variables so that all constraints are satisfied. We propose a Lagrangian formulation in the continuous space and develop a gradient search in this space. By using two counteracting forces, one performing gradient search in the primal space (of the original variables) and the other in the dual space (of the Lagrangian variables), we show that our search algorithm does not get trapped in local minima and reaches equilibrium only when a feasible assignment to the original problem is found. We present experimental results comparing our method with backtracking and local search (based on random restarts). Our results show that 0-1 ILPF problems of reasonable sizes can be solved by an order of magnitude faster than existing methods.

C102
Arthur Ieumwananonthachai and Benjamin W. Wah, ``Statistical Generalization Of Performance-Related Heuristics for Knowledge-Lean Applications,'' Proc. IEEE Int'l Conf. on Tools with Artificial Intelligence (Nov. 1995), pp. 174-181.

In this paper, we present new results on the automated generalization of performance-related heuristics learned for knowledge-lean applications. We study methods to statistically generalize new heuristics learned for some small subsets of a problem space (using methods such as genetics-based learning) to unlearned problem subdomains. Our method uses a new statistical metric called probability of win. By assessing the performance of heuristics in a range-independent and distribution-independent manner, we can compare heuristics across problem subdomains in a consistent manner. To illustrate our approach, we show experimental results on generalizing heuristics learned for sequential circuit testing, VLSI cell placement and routing, and branch-and-bound search. We show that generalization can lead to new and robust heuristics that perform better than the original heuristics across problem instances of different characteristics.

C103
Benjamin W. Wah, A. Ieumwananonthachai, Shu Yao, and Ting Yu, ``Statistical Generalization: Theory and Applications,'' Proc. Int'l Conf. on Computer Design (Oct. 1995), pp. 4-10.

In this paper, we discuss a new approach to generalize heuristic methods (HMs) to new test cases of an application, and conditions under which such generalization is possible. Generalization is difficult when performance values of HMs are characterized by multiple statistical distributions across subsets of test cases of an application. We define a new measure called probability of win and propose three methods to evaluate it: interval analysis, maximum likelihood estimate, and Bayesian analysis. We show experimental results on new HMs found for blind equalization and branch-and-bound search.

C104
Yi Shang and Benjamin W. Wah, ``A Global Optimization Method for Neural Network Training,'' Proc. 1996 IEEE Int'l Conf. on Neural Networks (June 1996), Plenary, Panel and Special Sessions, pp. 7-11.

In this paper, we present a new supervised learning method called Novel (Nonlinear Optimization Via External Lead) for training feed-forward neural networks. In general, such learning can be considered as a nonlinear global optimization problem in which the goal is to minimize the nonlinear error function that spans the space of weights. Novel is a trajectory-based nonlinear optimization method that combines global and local searches to find good local minima. It relies on an external force to pull a search out of a local minimum in its global search and employs local descents to locate local minima in its local search. The result is an efficient search method that identifies good basins without spending a lot of time in them. We have shown improved training and testing results for five neural-network benchmark problems as compared to those of existing minimization and neural-network learning algorithms. For the two-spiral problem, Novel has found a design using only four hidden units and 25 weights. (The best known design requires nine hidden units and 75 weights.) In short, Novel represents a significant advance in the state-of-the-art in supervised learning of neural networks and general optimization of continuous non- linear functions.

C105
Benjamin W. Wah and Yi Shang, ``A Discrete Lagrangian-Based Global-Search Method for Solving Satisfiability Problems,'' Satisfiability Problem: Theory and Applications, Ding-Zhu Du, Jun Gu, and Panos Pardalos (eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1997, pp. 365-392.

Satisfiability is a class of NP-complete problems that model a wide range of real-world applications. These problems are difficult to solve because they have many local minima in their search space, often trapping greedy search methods that utilize some form of descent. Further, gradient changes in the search space are in integers, leading to possibly many directions of the same gradient. In this paper, we propose a new discrete Lagrange-multiplier-based global-search method for solving satisfiability problems. We derive new approaches for applying Lagrangian methods in discrete space, show that equilibrium is reached when a feasible assignment to the original problem is found, and present heuristic algorithms to look for equilibrium points. Instead of restarting from a new starting point when a search reaches a local trap, the Lagrangian variables in our method provide a force to lead the search out of a local minimum and move it in the direction provided by the Lagrangian variables. One of the major advantages of our method is that it has very few algorithmic parameters to be tuned by users, and the search procedure can be made deterministic and the results, reproducible. We demonstrate our method by applying it to solve an extensive set of benchmark problems archived in DIMACS of Rutgers University. Our method generally performs better than the best existing methods and can achieve an order-of-magnitude speedup for some problems. Moreover, our method can solve some new benchmark problems that cannot be solved by other local-search methods.

C106
Chien-Wei Li and Benjamin W. Wah, ``Optimal Bit-Level Processor Arrays for Matrix Multiplication,'' Proc. 11th Int'l Conf. on Systems Engineering (July 1996), pp. 596-601.

Uniform recurrent algorithms belong to a fundamental class of operations used in almost every scientific and engineering computation. An important example of these algorithms is matrix multiplication. In this paper, we present optimal designs of bit-level processor arrays for matrix multiplication, and analyze trade-offs between bit-level and word-level designs in terms of execution time and chip area. We formulate the design as an optimization problem and search for optimal designs using the General Parameter Method developed by Ganapathy and Wah. We show that previous bit-level processor arrays for matrix multiplication are two special cases of our more general designs, and prove that one of the previous designs is optimal in execution time. Finally, we show that the fastest bit-level design is only O(log p) faster than than the fastest word-level design, instead of O(p) as claimed in the previous work of Shang and Wah, where p is the number of bits per word.

C107
Jun Gu, Paul W. Purdom, John Franco, and Benjamin W. Wah, ``Algorithms for the Satisfiability (SAT) Problem: A Survey,'' Satisfiability Problem: Theory and Applications, Ding-Zhu Du, Jun Gu, and Panos Pardalos (eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1997, pp. 19-152.

The satisfiability (SAT) problem is a core problem in mathematical logic and computing theory. In practice, SAT is fundamental in solving many problems in automated reasoning, computer-aided design, computer-aided manufacturing, machine vision, database, robotics, integrated circuit design, computer architecture design, and computer network design. Traditional methods treat SAT as a discrete, constrained decision problem. In recent years, many optimization methods, parallel algorithms, and practical techniques have been developed for solving SAT. In this survey, we present a general framework (an algorithm space) that integrates existing SAT algorithms into a unified perspective. We describe sequential and parallel SAT algorithms including variable splitting, resolution, local search, global optimization, mathematical programming, and practical SAT algorithms. We give performance evaluation of some existing SAT algorithms. Finally, we provide a set of practical applications of the satisfiability problems.

C108
Benjamin W. Wah, Yi Shang, Tao Wang, and Ting Yu, ``Global Optimization Design of QMF Filter Banks,'' Proc. IEEE Midwest Symposium on Circuits and Systems (Aug. 1996), Iowa City, Iowa., vol. 2, pp. 640-643.

In this paper, we present a new global optimization method for designing QMF (quadrature mirror filters) filter banks. We formulate the design problem as a nonlinear constrained optimization problem, using the reconstruction error as the objective, and the stopband ripple, stopband energy, passband ripple, passband energy and transition bandwidth as constraints. This formulation allows us to search for solutions that improves with respect to the objective, and that performs better than or equal to the best existing designs with respect to the constrained measures. We present Novel, a global optimization method we have developed for solving nonlinear continuous constrained optimization problems, and apply it to find improved designs. We also show that relaxing the constraints on transition bandwidth and stopband energy will lead to significant improvements in the other performance measures.

C109
Tao Wang and Benjamin W. Wah, ``Handling Inequality Constraints in Continuous Nonlinear Global Optimization,'' Proc. Society for Design and Process Science Conference (Austin, Texas), December 1996, pp. 267-274.

In this paper, we present a new method to handle inequality constraints and apply it in Novel (Nonlinear Optimization via External Lead), a system we have developed for solving constrained continuous nonlinear optimization problems. In general, in applying Lagrange-multiplier methods to solve these problems, inequality constraints are first converted into equivalent equality constraints. One such conversion method adds a slack variable to each inequality constraint in order to convert it into an equality constraint. The disadvantage of this conversion is that when the search is inside a feasible region, some satisfied constraints may still pose a nonzero weight in the Lagrangian function, leading to possible oscillations and divergence when a local optimum lies on the boundary of a feasible region. We propose a new conversion method called the MaxQ method such that all satisfied constraints in a feasible region always carry zero weight in the Lagrange function; hence, minimizing the Lagrange function in a feasible region always leads to local minima of the objective function. We demonstrate that oscillations do not happen in our method. We also propose methods to speed up convergence when a local optimum lies on the boundary of a feasible region. Finally, we show improved experimental results in applying our proposed method in Novel on some existing benchmark problems and compare them to those obtained by applying the method based on slack variables.

C110
Benjamin W. Wah, Yi Shang, Tao Wang, and Ting Yu, ``QMF Filter Bank Design by a New Global Optimization Method,'' Proc. IEEE Int'l Conf. on Accoustics, Speech and Signal Processing, vol. 3, pp. 2081-2084, April 1997.

In this paper, we study various global optimization methods for designing QMF (quadrature mirror filter) filter banks. We formulate the design problem as a nonlinear constrained optimization problem, using the reconstruction error as the objective, and other performance metrics as constraints. This formulation allows us to search for designs that improve over the best existing designs. We present Novel, a global optimization method for solving nonlinear continuous constrained optimization problems. We show that Novel finds better designs with respect to simulated annealing and genetic algorithms in solving QMF benchmark design problems. We also show that relaxing the constraints on transition bandwidth and stopband energy leads to significant improvements in the other performance measures.

C111
Benjamin W. Wah and Yi Shang, ``Discrete Lagrangian-Based Search for Solving MAX-SAT Problems,'' Proc. 15th Int'l Joint Conf. on Artificial Intelligence (IJCAI), August 1997, pp. 378-383.

Weighted maximum satisfiability problems (MAX-SAT) are difficult to solve due to the large number of local minima in their search space. In this paper we propose a new discrete Lagrangian based search method (DLM) for solving these problems. Instead of restarting from a new point when the search reaches a local minimum, the Lagrange multipliers in DLM provide a force to lead the search out of the local minimum and move it in a direction provided by the multipliers. Since DLM has very few parameters to be tuned, it can be made deterministic and the results, reproducible. We compare DLM with GRASP in solving a large set of test problems, and show that it finds better solutions and is substantially faster. DLM has a solid theoretical foundation that can be used as a systematic approach for solving general discrete optimization problems.

C112
Benjamin W. Wah, Yi Shang, and Zhe Wu, ``Discrete Lagrangian Method for Optimizing the Design of Multiplierless QMF Filter Banks,'' Proc. 1997 IEEE International Conference on Application-Specific Systems, Architectures and Processors (July 1997), pp. 529-538.

In this paper, we present a new discrete Lagrangian optimization method for designing multiplierless QMF (quadrature mirror filter) filter banks. In multiplierless QMF filter banks, filter coefficients are powers-of-two (PO2) where numbers are represented as sums or differences of powers of two (also called Canonical Signed Digit--CSD--representation), and multiplications can be carried out as additions, subtractions and shifting. We formulate the design problem as a nonlinear discrete constrained optimization problem, using the reconstruction error as the objective, and other performance metrics as constraints. One of the major advantages of this formulation is that it allows us to search for designs that improve over the best existing designs with respect to all performance metrics, rather than finding designs that trade one performance metric for another. We show that our design method can find designs that improve over Johnston's benchmark designs using a maximum of three to six ONE bits in each filter coefficient.

C113
Benjamin W. Wah, Tao Wang, Yi Shang, and Zhe Wu, ``Improving the Performance of Weighted Lagrange-Multiplier Methods for Nonlinear Constrained Optimization,'' Proc. IEEE International Conference on Tools with Artificial Intelligence (November 1997), pp. 224-231.

Nonlinear constrained optimization problems can be solved by a Lagrange-multiplier method in a c