At Rice University:

Thesis: "Programming Models and Runtimes for Heterogeneous Systems"

HadoopCL: This project uses OpenCL to accelerate the computation in Hadoop jobs (Mappers and Reducers). Running through the OpenCL runtime gives these jobs native execution on multicore and heterogeneous hardware, which yields significant performance benefits for compute-intensive applications.

DyGR GPU Runtime: This work extends NVIDIA's Compute Unified Device Architecture (CUDA) by implementing a work-stealing, load-balancing runtime on the GPU. We introduce a finish-async style API to GPU device programming, with the aim of executing irregular applications efficiently across the multiple streaming multiprocessors (SMs) in a GPU device without sacrificing the performance of regular data-parallel applications on the GPU.
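As a rough illustration of the finish-async idea combined with work stealing, the sketch below simulates it on the CPU in Python. It is not the DyGR API: the names `Finish`, `async_`, `wait`, and `make_tree_task` are hypothetical, and the "workers" are plain deques visited in round-robin order, standing in for SMs.

```python
from collections import deque


class Finish:
    """Hypothetical finish scope: wait() returns once every spawned task has run."""

    def __init__(self, num_workers):
        self.deques = [deque() for _ in range(num_workers)]  # one task deque per worker
        self.executed = [0] * num_workers                    # tasks run by each worker
        self.steals = 0                                      # number of successful steals

    def async_(self, task, worker=0):
        # Spawn: push the task onto the bottom of the spawning worker's deque.
        self.deques[worker].append(task)

    def _next_task(self, worker):
        own = self.deques[worker]
        if own:
            return own.pop()  # the owner takes its newest task (LIFO)
        # Own deque is empty: try to steal the oldest task from another worker.
        for offset in range(1, len(self.deques)):
            victim = self.deques[(worker + offset) % len(self.deques)]
            if victim:
                self.steals += 1
                return victim.popleft()  # the thief takes from the top (FIFO)
        return None

    def wait(self):
        # Workers take turns in round-robin order; this stands in for parallel execution.
        while any(self.deques):
            for worker in range(len(self.deques)):
                task = self._next_task(worker)
                if task is not None:
                    task(self, worker)
                    self.executed[worker] += 1


def make_tree_task(depth, results):
    """Irregular workload: an unbalanced tree in which only one child keeps branching."""
    def task(finish, worker):
        results.append(depth)
        if depth > 0:
            finish.async_(make_tree_task(depth - 1, results), worker)
            finish.async_(make_tree_task(0, results), worker)
    return task


# Usage: all tasks start on worker 0; the other workers obtain work only by stealing.
results = []
finish = Finish(num_workers=4)
finish.async_(make_tree_task(5, results))
finish.wait()
```

Because every task is initially spawned on worker 0, any work done by the other workers in this sketch arrives through stealing, which is the load-balancing behavior the paragraph above describes.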

Applying Machine Learning to GPUs: In this work, we investigated how machine learning algorithms (in this case, genetic algorithms) can be used to search for optimal configurations of certain system parameters for GPU execution.
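A minimal sketch of this kind of search, in Python, is shown here under stated assumptions. The two tuned parameters (`BLOCK_SIZES`, `UNROLL_FACTORS`) and the cost function `simulated_runtime` are hypothetical stand-ins; a real tuner would measure actual kernel runtimes, and the parameters studied in the project are not specified here.

```python
import random

# Hypothetical search space: candidate values for two GPU execution parameters.
BLOCK_SIZES = [32, 64, 128, 256, 512, 1024]
UNROLL_FACTORS = [1, 2, 4, 8]


def simulated_runtime(config):
    """Stand-in cost model (an assumption); a real tuner would time a kernel launch."""
    block, unroll = config
    return abs(block - 256) / 256 + abs(unroll - 4) / 4


def random_config(rng):
    return (rng.choice(BLOCK_SIZES), rng.choice(UNROLL_FACTORS))


def crossover(a, b, rng):
    # Each gene (parameter) is inherited from one of the two parents.
    return (rng.choice([a[0], b[0]]), rng.choice([a[1], b[1]]))


def mutate(config, rng, rate=0.2):
    # With probability `rate`, replace a gene with a random value from its range.
    block, unroll = config
    if rng.random() < rate:
        block = rng.choice(BLOCK_SIZES)
    if rng.random() < rate:
        unroll = rng.choice(UNROLL_FACTORS)
    return (block, unroll)


def tune(generations=30, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Elitist selection: keep the faster half, refill with mutated offspring.
        population.sort(key=simulated_runtime)
        survivors = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=simulated_runtime)


best = tune()
```

Keeping the best half of each generation unchanged means the best configuration found so far is never lost, at the cost of less diversity than other selection schemes would give.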

CnC-CUDA: This project extends past work on Intel's Concurrent Collections (CnC) programming model to address the heterogeneous programming challenge using a model called CnC-CUDA. CnC is a declarative and implicitly parallel coordination language that supports flexible combinations of task and data parallelism while retaining determinism. CnC computations are built from steps related by data and control dependence edges, which are represented by a CnC graph. The CnC-CUDA extensions in this work include the definition of multithreaded steps for execution on GPUs, and automatic generation of data and control flow between CPU steps and GPU steps.

JCUDA: In this work, we presented a programming interface called JCUDA that Java programmers can use to invoke CUDA kernels. Using this interface, programmers can write Java code that directly calls CUDA kernels, and delegate the responsibility of generating the Java-CUDA bridge code and host-device data transfer calls to the compiler. Max was responsible for testing JCUDA.

At Oracle Labs:

Work at Oracle has focused on building tools for programmers that automatically detect concurrency-related performance bugs in Java by aggregating data from JVM bytecode and JRockit Flight Recordings. Novel functionality was also implemented that automatically alters the analyzed Java bytecode to remove performance bottlenecks.

At Repsol USA:

Kirchhoff Migration: This project applies heterogeneous architectures to computationally challenging geophysical problems, using hybrid parallelism to map compatible code sections onto multi-core hardware across many nodes.

At NASA JPL:

KINETICS: This project works to prepare a dynamical and chemical simulation for larger data sets and more complex models by using MPI to enable execution on compute clusters.

Independently:

Yum Route: Yum Route is a website (also under development) that aims to help people find places to eat between two locations.