Active Messages: A Mechanism for Integrated Communication and Computation

T. von Eicken, D.E. Culler, S.C. Goldstein, and K.E. Schauser. Proceedings of the Nineteenth Annual International Symposium on Computer Architecture. May 1992, pp. 256-266.

The Problem: Communication latency is high, and there is not enough overlap between communication and computation.

The Solution: Embed the address of the receiving message handler in the message itself. On arrival, the handler runs immediately and either integrates the message into the ongoing computation (for example, the response to a matrix row GET) or replies right away (the handler for a matrix row GET request). This is efficient on ordinary hardware.
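A minimal sketch of the dispatch idea, assuming a single-process simulation. The Node class, the handler names, and the inbox deque (standing in for the network interface) are all hypothetical and not taken from the paper. A Python function reference stands in for the handler address carried in the message.

```python
from collections import deque

class Node:
    """A simulated processor (hypothetical model, not the paper's code)."""
    def __init__(self, node_id, network, rows=None):
        self.node_id = node_id
        self.network = network      # shared dict: node id -> Node
        self.rows = rows or {}      # matrix rows owned by this node
        self.received = {}          # rows fetched from other nodes
        self.inbox = deque()        # stands in for the network interface

    def send(self, dest, handler, *args):
        # The head of the message names the handler to run on arrival.
        self.network[dest].inbox.append((handler, args))

    def poll(self):
        # Dispatch each arrived message straight to the handler it names.
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)

def get_row_handler(node, requester, row_index):
    # Request handler: reply right away with the requested row.
    node.send(requester, put_row_handler, row_index, node.rows[row_index])

def put_row_handler(node, row_index, row):
    # Reply handler: integrate the data into the ongoing computation.
    node.received[row_index] = row

def demo():
    network = {}
    network[0] = Node(0, network)
    network[1] = Node(1, network, rows={3: [1.0, 2.0, 3.0]})
    network[0].send(1, get_row_handler, 0, 3)   # node 0 asks node 1 for row 3
    network[1].poll()                           # runs get_row_handler on node 1
    network[0].poll()                           # runs put_row_handler on node 0
    return network[0].received
```

Neither handler blocks or allocates a per-message buffer: each either sends a reply or stores the payload where the computation expects it.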

The details:

Eliminates the need for message buffering: messages are handled as soon as they arrive, so no memory is allocated per message.
Requires the same code image on all machines, so that a handler address means the same thing on every node.
Handlers are not allowed to block for "a long time".
Overlap can be achieved with compiler support (prefetching).
The approach is better even on hardware optimized for message passing (Monsoon, J-Machine).
But some hardware features can help: message registers, composing and receiving multiple messages simultaneously, and protection checks.
Architectural changes that would also help: polling instead of interrupts, user-level handlers, separate message threads, and a message-only CPU.
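The overlap point above can be sketched as a split-phase loop that requests the next row before computing on the current one. This is an illustrative model only: remote_rows, wire, and the function names are hypothetical, the remote node is modeled as answering instantly, and network latency is not simulated.

```python
from collections import deque

# Hypothetical single-process model: remote_rows lives on another node, and
# wire stands in for the network carrying (handler, args) messages.
remote_rows = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
wire = deque()
local_rows = {}

def reply_handler(index, row):
    # Runs when a reply arrives; deposits the row for later use.
    local_rows[index] = row

def request_row(index):
    # Split-phase GET: issue the request and return without waiting.
    # The remote side is modeled as replying immediately onto the wire.
    wire.append((reply_handler, (index, remote_rows[index])))

def poll():
    # Drain arrived messages, running the handler each one names.
    while wire:
        handler, args = wire.popleft()
        handler(*args)

def sum_all_rows(count):
    total = 0
    request_row(0)                     # prefetch the first row
    for i in range(count):
        poll()                         # pick up the reply for row i
        if i + 1 < count:
            request_row(i + 1)         # prefetch the next row ...
        total += sum(local_rows[i])    # ... while computing on this one
    return total
```

Here the request for row i+1 is in flight while row i is being summed, which is the kind of overlap a prefetching compiler would aim to generate.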

A question: large messages still require memory allocation. After the first (size-indicating) message arrives, is there enough time to allocate before the rest come in? Those messages can't be allowed to queue up.
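The scenario in the question can be sketched as follows. This does not answer the timing question; it only illustrates the protocol being asked about, assuming the header is always handled before any data packet. The Receiver class, its method names, and the packet size are hypothetical.

```python
class Receiver:
    """Hypothetical receiver for a multi-packet transfer."""
    def __init__(self):
        self.buffer = None
        self.remaining = 0

    def header_handler(self, total_words):
        # The first message announces the size, so storage is allocated once.
        self.buffer = [None] * total_words
        self.remaining = total_words

    def data_handler(self, offset, words):
        # Later messages carry their own offset and are stored on arrival.
        # Assumption: the header has already been handled at this point.
        self.buffer[offset:offset + len(words)] = words
        self.remaining -= len(words)

    def done(self):
        return self.buffer is not None and self.remaining == 0

def transfer(payload, packet_words=2):
    # Simulate the sender: one header message, then fixed-size data packets.
    rx = Receiver()
    rx.header_handler(len(payload))
    for offset in range(0, len(payload), packet_words):
        rx.data_handler(offset, payload[offset:offset + packet_words])
    return rx.buffer, rx.done()
```

If data packets could arrive before the allocation finishes, one option would be for the sender to wait for a reply to the header before sending any data, which avoids the race at the cost of an extra round trip.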

Umesh Shankar

Last modified: Tue Jul 3 17:18:20 PDT 2001