[to DCS Home Page]

[10 Kings College Road, SF 3302; Toronto, Ontario, Canada]

[DCS Homepage] [Grad Homepage] [Search] [..Next] [Prev..]

[Department] [Faculty] [Applications] [Grad Program] [Financial Assist.] [Grad Courses] [Theses] [Administration]

M.Sc. Theses: [1998-99] [1999-2000] [2000-01] [2001-02]
Ph.D. Theses: [1998-99] [ 1999-2000 ] [2000-01] [2001-02]

Ph.D. Theses 1999-2000

This page lists theses' titles and abstracts of recent Ph.D. graduates of the department, starting from July 1, 1999. Stay tuned for some wonderful graduates!


Name: Luk, Chi-Keung
Supervisor(s): Mowry, Todd
Degree: Ph.D.
Date: January 20, 2000
Title: Optimizing the Cache Performance of Non-Numeric Applications
Abstract: The latency of accessing instructions and data from the memory subsystem is an increasingly crucial performance bottleneck in modern computer systems. While cache hierarchies are an important first step, they alone cannot solve the problem. Further, though a variety of latency-hiding techniques have been proposed, their success has been largely limited to regular, numeric applications. Few promising latency-hiding techniques that can handle irregular, non-numeric codes have been proposed, in spite of the popularity of such codes in computer applications.
This dissertation investigates hardware and software techniques for coping with the instruction-access latency and data-access latency in non-numeric applications. To deal with instruction-access latency, we propose cooperative instruction prefetching, a novel technique which significantly outperforms state-of-the-art instruction prefetching schemes by being able to prefetch more aggressively and much further ahead of time while at the same time substantially reducing the amount of useless prefetches.
To cope with data-access latency, we investigate three complementary techniques. First, we study how to use compiler-inserted data prefetching to tolerate the latency of accessing pointer-based data structures. To schedule prefetches early enough, we design three prefetching schemes to overcome the pointer-chasing problem associated with these data structures, and we automate them in an optimizing research compiler. Second, we study how to safely perform an important class of locality optimizations, namely dynamic data layout optimizations, in non-numeric codes. Specifically, we propose the use of an architectural mechanism called memory forwarding which can guarantee the safety of data relocation, thereby enabling many aggressive data layout optimizations (which also facilitate prefetching) that cannot be safely performed using current hardware or compiler technology. Finally, in an effort to minimize the overheads of latency tolerance techniques, we propose new cache miss prediction techniques based on correlation profiling. By correlating cache miss behaviors with dynamic execution contexts, these techniques can accurately isolate dynamic miss instances and so pay the latency tolerance overhead only when there would have been cache misses.
Detailed design considerations and experimental evaluations are provided for our proposed techniques, confirming them as viable solutions for coping with memory latency in non-numeric applications.
Key words and phrases: Cache performance, non-numeric applications, tolerating latency, instruction prefetching, data prefetching, locality optimizations, cache miss prediction.


Name: Ruppert, Eric
Supervisor(s): Fich, Faith
Degree: Ph.D.
Date: December, 1999
Title: The Consensus Power of Shared-Memory Distributed Systems
Abstract: In many asynchronous distributed systems, processes communicate by accessing objects in a shared memory. The ability of systems to solve problems in a fault-tolerant manner depends on the types of objects provided. Here, the wait-free model of fault-tolerance is used: non-faulty processes must run correctly even if other processes experience halting failures. The consensus problem, where processes begin with private inputs and must agree on one of them, has played a central role in analysing the power of distributed systems. This thesis studies the ability of different types of objects to solve consensus.

An object type has consensus number n if it can be used (with read/write registers) to solve consensus among n processes but not among n+1 processes. Conditions are given that are necessary and sufficient for an object type to have consensus number n. This characterization applies to two large classes of objects: readable objects and read-modify-write (RMW) objects. An object is readable if processes can read its state without changing the state. For a RMW object, all operations update the state and then return the previous state of the object. When the type is of bounded size, the characterization may be used to decide the question ``Does the type T have consensus number n?'', which is undecidable for arbitrary types. The characterization is also used to show that different readable and RMW types with consensus number n cannot be used in combination to solve consensus for n+1 processes.

Ordinarily, processes may access only one object in shared memory at a time. This thesis also studies how much the consensus number of a type increases in the multi-object and transactional models, where processes can perform operations on up to m of the objects in a single atomic action. These models are much more convenient for programmers to use, since they guarantee that certain blocks of operations will be executed without interruptions from other processes. This thesis establishes bounds on the consensus numbers of multi-objects and transactional objects as a function of m and the consensus numbers of the corresponding single-access types.


Name: Dissett, Luis
Supervisor(s): Mathon, Rudolf
Degree: Ph.D.
Date: November, 1999
Title: Combinatorial and computational aspects of finite geometries
Abstract: This thesis presents a methodology for the systematic generation of strongly regular graphs by fusing classes in large association schemes; this methodology includes:

Using the methodology presented here, we have settled the existence problem for strongly regular graphs with the following parameters:
        625,156,29,42;
        729,308,127,132;
        729,336,153,156;
        1024,231,38,56;
        1024,297,76,90; 
        1024,330,98,110;
        1024,363,122,132;
        1024,396,148,156;
        1024,429,176,182;
        1024,462,206,210.
We have also found new graphs for many parameter sets for which some graphs were already known, as well as graphs which were themselves known.
Many of the graphs found are pseudo-geometric, i.e., their parameters correspond to those of the point graph of some hypothetical finite geometry. Some of the graphs are actually geometric, and it is possible to recover the geometry from the maximum cliques of the graph.
The graphs presented here are all of cyclotomic type; therefore they all can be expressed as two-intersection sets in suitable projective geometries, and they all determine two-weight codes. However, the methodology is not restricted to this kind of graphs.
We have found six interesting non-isomorphic strongly regular graphs with parameters (625, 156, 29, 42). If we consider the corresponding two-intersection sets in PG(3, 5), the affine lines determined by each of these sets form a geometric structure that resembles a semipartial geometry: given any antiflag (p, L), the number of points in L which are collinear with p is either 0, 1 or 2 (i.e., this number takes one of three values, while in the case of partial geometries it takes only one value and in the case of semipartial geometries it takes one of two values).
Another interesting property of these graphs is that it is possible to factorize the complete graph K_{625} into copies of some of these graphs; we exhibit three such factorizations.


Name: Lu, Chien-Ping Paul
Supervisor(s): Zhou, Songnian
Degree: Ph.D.
Date: November 1999
Title: Scoped Behaviour for Optimized Distributed Data Sharing
Abstract: We introduce the novel scoped behaviour abstraction and examine how it is used to optimize distributed data-sharing patterns within the Aurora parallel programming system. Scoped behaviour is an application programmer's interface to a set of system-provided optimizations; it is also an implementation framework for the optimizations. Aurora is a distributed shared data system where shared-data objects are implemented in C++ as abstract data types. Aurora has been prototyped on a network of workstations connected by an ATM network.

The design, implementation, and evaluation of Aurora and scoped behaviour contributes to the field of parallel and distributed systems by demonstrating that:

1. Scoped behaviour can provide per-object and per-context (i.e., specific portion of the source code) flexibility when applying data-sharing optimizations. In contrast to some other systems, Aurora programs can be incrementally tuned and only a minimum number of error-prone changes to the source code are required in order to experiment with different optimization strategies.

2. Scoped behaviour can be implemented without language extensions and without special compiler support. Scoped behaviour's novel implementation framework can exploit both compile-time and run-time information about the parallel program.

3. A parallel programming system based on a high-level shared-data abstraction can achieve high performance. In a performance evaluation of four applications implemented using three different types of parallel programming systems, Aurora usually matches or outperforms, sometimes by a wide margin, TreadMarks (a distributed shared memory system) and a message-passing system (either MPICH or PVM, depending on the application).


Name: Edmonds, Philip
Supervisor(s): Hirst, Graeme
Degree: Ph.D.
Date: October 1999
Title: Semantic Representations of Near-Synonyms for Automatic Lexical Choice
Abstract: We develop a new computational model for representing the fine-grained meanings of near-synonyms and the differences between them. We also develop a sophisticated lexical-choice process that can decide which of several near-synonyms is most appropriate in any particular context. This research has direct applications in machine translation and text generation, and also in intelligent electronic dictionaries and automated style-checking and document editing.
We first identify the problems of representing near-synonyms in a computational lexicon and show that no previous model adequately accounts for near-synonymy. We then propose a preliminary theory to account for near-synonymy in which the meaning of a word arises out of a context-dependent combination of a context-independent core meaning and a set of explicit differences to its near-synonyms. That is, near-synonyms cluster together.
After considering a statistical model and its weaknesses, we develop a clustered model of lexical knowledge, based on the conventional ontological model. The model cuts off the ontology at a coarse grain, thus avoiding an awkward proliferation of language-dependent concepts in the ontology, and groups near-synonyms into subconceptual clusters that are linked to the ontology. A cluster acts as a formal usage note that differentiates near-synonyms in terms of fine-grained aspects of denotation, implication, expressed attitude, and style. The model is general enough to account for other types of variation, for instance, in collocational behaviour.
We formalize various criteria for lexical choice as preferences to express certain concepts with varying indirectness, to express attitudes, and to establish certain styles. The lexical-choice process chooses the near-synonym that best satisfies the most preferences. The process uses an approximate-matching algorithm that determines how well the set of lexical distinctions of each near-synonym in a cluster matches a set of input preferences.
We implemented the lexical-choice process in a prototype sentence-planning system. We evaluate the system to show that it can make the appropriate word choices when given a set of preferences.


Name: Mac Donald, Craig
Supervisor(s): Enright, Wayne
Degree: Ph.D.
Date: October 28, 1999
Title: A New Approach for DAEs
Abstract: This thesis introduces a new 'least squares' implementation of a continuous extension of an underlying discrete Runge-Kutta (RK) formula that is particularly effective for solving DAEs. The resulting method provides a continuous approximation to the solution of the DAE and does not assume that the problem has a special form or that the index of the problem is known.
The methods we develop and implement include some that are based on underlying discrete formulas that were previously not considered suitable for DAEs because of severe order reduction or stability restrictions. Specifically we propose a particular implementation of a modified extension of these formulas and develop stability and convergence results which indicate that these methods can be effective when solving DAEs. Numerical results are included to verify this.
We also investigate and justify an error control and stepsize choosing strategy that is based on directly monitoring and bounding the continuous residual (or defect) of the numerical solution. We investigate the relationship between the size of the residual and the local error for index one and index two DAEs. We also derive the extra order conditions that arise for special classes of index one and index two problems.
Our fixed stepsize and variable stepsize numerical results confirm the improved order and stability that is achievable with our approach. These results also seem to indicate that our approach is less likely to run into convergence difficulties (with nonlinear equations that must be solved on each timestep) than is observed with traditional implementations.


If you have any thoughts, suggestions or queries, please feel free to contact us at www@cs.toronto.edu

Last updated March 2, 2000