LCM¹ Type Definition Language

This document describes the usage and features of the LCM type definition language.

Introduction

In addition to providing a set of communication primitives, LCM includes facilities for generating marshalling and unmarshalling functions for platform-independent data types. It is similar to XDR, but aims for greater type safety and first-class support for languages such as C, Java, and Python. This document describes the data marshalling functionality; the communication functionality is described elsewhere. Note that LCM's data marshalling can be used independently of its communication facilities.

Design goals

The main design goals of the LCM marshalling functionality are:

  • Provide a simple mechanism for defining complex types that feels natural to users of C and Java
  • Provide first-class support for a variety of client languages
  • Abstract away platform-dependent details such as endianness
  • Maximize compile-time and run-time type safety
  • Detect type-incompatible messages, such as when two applications have different versions of the same data type
  • Produce space-efficient encoded messages
  • Minimize the computational cost of encoding and decoding

The current version of LCM makes only a few compromises in order to achieve these goals. In some cases, a small amount of convention is required to ensure that LCM's features are available on every platform.

Type definition

Type definitions are contained in files with a “.lcm” suffix. They are usually named using lowercase letters and underscores: for example, the type “wind_speed_t” is defined in the file “wind_speed_t.lcm”. The tool lcm-gen translates LCM type definitions into language-specific implementations.

Structure

LCM structs are composite types made up of other types. We start with a simple struct named “temperature_t” that contains a 64-bit integer named “utime” and a 64-bit floating point number named “degCelsius”. The example below also demonstrates both comment styles.

struct temperature_t
{
    int64_t utime; // Timestamp, in microseconds

    /* Temperature in degrees Celsius. A "float" would probably
     * be good enough, unless we're measuring temperatures during
     * the big bang. Note that the asterisk on the beginning of this
     * line is not syntactically necessary, it's just pretty.
     */
    double degCelsius;
}

These declarations must appear in a file called temperature_t.lcm.
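As a rough usage sketch (assuming the C bindings produced by lcm-gen -c; the exact function names and signatures may vary across LCM versions), encoding and decoding a temperature_t looks something like this:

#include <stdio.h>
#include "temperature_t.h"   /* generated by: lcm-gen -c temperature_t.lcm */

int main(void)
{
    temperature_t msg = { .utime = 1234567890, .degCelsius = 21.5 };

    /* Ask the generated code how many bytes the encoded message needs,
     * then marshal the struct into a buffer. */
    int size = temperature_t_encoded_size(&msg);
    unsigned char buf[64];
    temperature_t_encode(buf, 0, size, &msg);

    /* Unmarshal the same bytes back into a fresh struct. */
    temperature_t out;
    temperature_t_decode(buf, 0, size, &out);
    printf("utime=%lld degCelsius=%f\n", (long long) out.utime, out.degCelsius);
    return 0;
}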

LCM types do not contain pointers (but support arrays, see below): this eliminates the possibility of circular references.

Before going any further, let’s take a look at the various primitive types available.

Primitive types

LCM supports the following primitive types:

Type       Description
int8_t     8-bit signed integer
int16_t    16-bit signed integer
int32_t    32-bit signed integer
int64_t    64-bit signed integer
float      32-bit IEEE floating point value
double     64-bit IEEE floating point value
string     UTF-8 string
boolean    true/false logical value
byte       8-bit value

Integer types are all signed (necessary to ensure simple interaction with Java, which lacks unsigned types), and encoded in network byte order.

The type byte is represented as uint8_t in C/C++. Languages that have a native byte type use it instead (for example, byte in Java).

Floating point types are encoded using IEEE 32 and 64 bit formats. LCM implementations may not use any other encodings. 32 and 64 bit quantities are transmitted in network byte order.

The type boolean is encoded as a byte with a value of 0 or 1. An array of N booleans will require N bytes.

Type string encodes a NULL-terminated UTF-8 string. The string is sent as a 32-bit integer containing the total length of the string in bytes, including the terminating NULL character, followed immediately by the bytes of the string itself (again including the NULL character).
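As an illustration only (this is not the LCM library API, just a sketch of the wire format described above):

#include <stdint.h>
#include <string.h>

/* Sketch of the string wire format: a 32-bit length that counts the
 * terminating NULL, written in network byte order, followed by the
 * bytes of the string including the NULL. */
static size_t encode_string(uint8_t *buf, const char *s)
{
    uint32_t len = (uint32_t) strlen(s) + 1;   /* length includes the NULL */
    buf[0] = (uint8_t) (len >> 24);            /* big-endian length prefix */
    buf[1] = (uint8_t) (len >> 16);
    buf[2] = (uint8_t) (len >> 8);
    buf[3] = (uint8_t) len;
    memcpy(buf + 4, s, len);                   /* string bytes, including NULL */
    return 4 + len;
}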

Array

LCM supports multidimensional arrays whose elements are primitive types or structs, and whose dimensions are given by constants or by other fields. The number of dimensions of an array is fixed by the LCM type declaration: you cannot encode an LCM type whose number of dimensions varies. Variable-sized dimensions, in contrast, are fine. Consider the example below:

struct point2d_list_t
{
    int32_t npoints;
    double points[npoints][2];
}

This example shows a two-dimensional array declaration consisting of variable-length and fixed-length components. In a variable-length declaration, the variable containing the length must be declared before it is used as the length of the array. Note also that the length variable (npoints in the example above) must be of integer type and must always be greater than or equal to zero.

When an array is encoded or decoded, the size of each dimension is known: either it is a constant given by the LCM type declaration, or it is a variable that has already been encoded/decoded. Arrays are encoded by recursively encoding each element, with the innermost dimension varying fastest. In other words, the array above is encoded in the order points[0][0], points[0][1], points[1][0], points[1][1], points[2][0], points[2][1], and so on.
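A sketch of that element order in C (illustrative only, not the generated code; the npoints field itself is encoded separately as an ordinary int32_t field before the array):

#include <stdint.h>
#include <string.h>

static size_t encode_points(uint8_t *buf, int32_t npoints, const double points[][2])
{
    size_t off = 0;
    for (int32_t i = 0; i < npoints; i++) {
        for (int j = 0; j < 2; j++) {
            /* Each double goes out as its IEEE 754 bit pattern,
             * 8 bytes in network (big-endian) byte order. */
            uint64_t bits;
            memcpy(&bits, &points[i][j], sizeof bits);
            for (int k = 0; k < 8; k++)
                buf[off++] = (uint8_t) (bits >> (56 - 8 * k));
        }
    }
    return off;
}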

Constant

LCM provides an easy way to declare constants that can then be used to populate other data fields. Users are free to use these constants in any way they choose: as magic numbers, enums, or bitfields.

Constants can be declared by using the const keyword.

struct my_constants_t
{
    const int32_t YELLOW=1, GOLDENROD=2, CANARY=3;
    const double E=2.71828;
}

Note that the type must be declared for the constant. All integer and floating point types are supported. String constants are not supported.

Namespace

LCM allows types to be defined in namespaces, making it easier to use types from other organizations even when the type names collide. The namespace mechanism is very similar to Java's. In languages that support namespaces (such as Java and Python), LCM namespaces map onto the native mechanism. In languages like C, namespaces are approximated by prefixing the type name with the package name.

The example below illustrates namespaces. Note that the package keyword declares the namespace of the structs defined in the file, and that fully qualified type names are formed by joining the package name and the type name with a period.

package mycorp;

struct camera_image_t {
    int64_t utime;
    string camera_name;
    jpeg.image_t jpeg_image;
    mit.pose_t pose;
}

LCM users are encouraged to put their types into unique namespaces and fully qualify the types of all member fields.
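For example, in C (a sketch, assuming the bindings for mycorp and the referenced jpeg and mit packages have been generated with lcm-gen -c), the package name simply becomes a prefix on the type name:

#include <stdint.h>
#include "mycorp_camera_image_t.h"   /* generated from the definition above */

/* The fully qualified LCM type mycorp.camera_image_t maps to the
 * C identifier mycorp_camera_image_t. */
void stamp_image(mycorp_camera_image_t *img, int64_t now)
{
    img->utime = now;
    img->camera_name = "front_left";   /* points at a literal, for brevity */
}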

Performance considerations

The runtime cost of encoding and decoding with LCM is usually not a system bottleneck. The marshalling functions are much faster than typical XML implementations, but because each member must be processed individually (for example, to ensure correct endianness), LCM is more expensive than sending raw C structs. The first application of LCM handled data rates in excess of 40 MB/s.

Fingerprint calculation

Fingerprints ensure that the encoding and decoding methods agree on the format of the data type. A type's fingerprint is computed recursively and is a function of all the types it contains. This creates a potential problem when types are mutually recursive: we must avoid infinite recursion.

The basic idea is that each type has a “base” fingerprint, which we will denote “K_A” for type “A”. K_A is a constant derived from the LCM type description (it is stored as lcm_struct->hash). We wish to compute the actual fingerprint (or hash) A(), which is a function of all the types that A contains.

Furthermore, to be able to recognize recursion, the A() function takes a parameter that is the list of already-visited types. For example, C([A,B]) means that we wish to compute the hash of type C, given that C is a member of type B, which is a member of type A. If the list already contains C, we avoid recursion by defining C([list]) = 0.

Contributions of primitive types are handled through K_A; they have no recursion.

A small problem arises in the above definition: if types A, B and C are mutually recursive, we may have two types with the same hash value. This is obviously not desirable. We solve this by making the order of the recursion relevant: at each node in the tree, we rotate the value (bitwise) left by 1 bit. Contributions of types contained at recursion depth N are rotated by N bits.

It’s worth noting that for enums (they can’t contain other types), this mechanism is completely unnecessary; for enums, we just use the hash from lcmenum->hash.

pseudocode:

  v = compute_hash(type, parents)

      if type is a member of parents
          return 0

      v = K_type

      for each member m of type
          v += compute_hash(m, [parents, type])

      return rot_left(v)

When encoding/decoding type T, we will use compute_hash(T, []) as the hash function.
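A minimal C sketch of this recursion (assuming a 64-bit hash and a hypothetical in-memory type description; this is not the lcm-gen implementation, which is linked in the Implementation section below):

#include <stdint.h>

/* Hypothetical description of a parsed LCM struct, for this sketch only. */
typedef struct type_info type_info_t;
struct type_info {
    uint64_t      base_hash;     /* K_type, derived from the type definition */
    int           num_members;
    type_info_t **members;       /* non-primitive member types only */
};

/* Rotate a 64-bit value left by one bit. */
static uint64_t rot_left(uint64_t v)
{
    return (v << 1) | (v >> 63);
}

/* compute_hash(type, parents) from the pseudocode above. */
static uint64_t compute_hash(const type_info_t *type,
                             const type_info_t **parents, int num_parents)
{
    /* A type already on the path contributes 0: the recursion guard. */
    for (int i = 0; i < num_parents; i++)
        if (parents[i] == type)
            return 0;

    /* Extend the path of visited types with the current one. */
    const type_info_t *path[64];                 /* assumed nesting bound */
    for (int i = 0; i < num_parents; i++)
        path[i] = parents[i];
    path[num_parents] = type;

    uint64_t v = type->base_hash;
    for (int i = 0; i < type->num_members; i++)
        v += compute_hash(type->members[i], path, num_parents + 1);

    return rot_left(v);
}

With this sketch, the hash used for type T is compute_hash(&T, NULL, 0).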

example:

struct A
{
        B b;
        C c;
}

struct B
{
        A a;
}

struct C
{
        B b;
}

We can compute the hash of each type by expanding its children into a tree. We use lowercase letters for terminal leaves (a leaf occurs where a type is the same as one of its parents).

    A            B           C
   / \           |           |
  B   C          A           B
  |   |         / \          |
  a   B        b   C         A
      |            |        / \
      a            b       b   c

A() = R{K_A + R{K_B} + R{K_C + R{K_B}}}

B() = R{K_B + R{K_A + R{K_C}}}

C() = R{K_C + R{K_B + R{K_A}}}

Notably, without rotation, B() == C().
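Reusing the type_info_t and compute_hash sketch from above, one can check this concretely (the base hashes here are arbitrary placeholders):

#include <stdio.h>

int main(void)
{
    type_info_t A, B, C;
    type_info_t *A_members[] = { &B, &C };
    type_info_t *B_members[] = { &A };
    type_info_t *C_members[] = { &B };

    A = (type_info_t){ .base_hash = 0x1111, .num_members = 2, .members = A_members };
    B = (type_info_t){ .base_hash = 0x2222, .num_members = 1, .members = B_members };
    C = (type_info_t){ .base_hash = 0x3333, .num_members = 1, .members = C_members };

    /* With the rotation, B() and C() differ even though the same three
     * base hashes contribute to both. */
    printf("A() = %016llx\n", (unsigned long long) compute_hash(&A, NULL, 0));
    printf("B() = %016llx\n", (unsigned long long) compute_hash(&B, NULL, 0));
    printf("C() = %016llx\n", (unsigned long long) compute_hash(&C, NULL, 0));
    return 0;
}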

Implementation

The algorithm for calculating the fingerprint is simple:

  • Compute the base hash of the struct (a simplified sketch follows this list):
    • Field names are always included in the base hash.
    • If a field's type is primitive, its type name is also included in the base hash.
    • If a field's type is not primitive, its type name is not included in the base hash; it is accounted for by the recursive computation instead.
  • Recursively compute the fingerprint of the struct, as in the pseudocode above:
    • The recursion descends into non-primitive member types, combining their base hashes.
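A simplified sketch of the base-hash computation (the mixing function and the exact inputs shown here are illustrative assumptions; the authoritative version is in the lcmgen.c file linked below, which also folds in array-dimension information):

#include <stdint.h>
#include <string.h>

/* Illustrative mixing step; see the linked lcmgen.c for the real one. */
static uint64_t hash_update(uint64_t v, char c)
{
    return ((v << 8) ^ (v >> 55)) + (uint8_t) c;
}

static uint64_t hash_string_update(uint64_t v, const char *s)
{
    v = hash_update(v, (char) strlen(s));
    while (*s)
        v = hash_update(v, *s++);
    return v;
}

/* Hypothetical field description, for this sketch only. */
typedef struct {
    const char *name;
    const char *type_name;
    int         is_primitive;
} field_t;

static uint64_t struct_base_hash(const field_t *fields, int nfields)
{
    uint64_t v = 0x12345678;                     /* arbitrary seed for the sketch */
    for (int i = 0; i < nfields; i++) {
        v = hash_string_update(v, fields[i].name);            /* field name: always */
        if (fields[i].is_primitive)
            v = hash_string_update(v, fields[i].type_name);   /* primitive type name */
        /* non-primitive type names are omitted here; they enter via the
         * recursive fingerprint computation instead */
    }
    return v;
}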

For reference, here is the implementation in C: the first link is the parser-side code that computes each struct's local (non-recursive) base hash, and the second is the code emitted into the generated C bindings, where the computation is performed recursively:

  • https://github.com/lcm-proj/lcm/blob/v1.4.0/lcmgen/lcmgen.c#L233-L267
  • https://github.com/hoxovic/lcm/blob/v1.4.0/lcmgen/emit_c.c#L390-L439

The other language implementations must, of course, produce fingerprints consistent with these.

Related work

LCM is most similar to XDR, which is used by RPC and described in RFC 4506. Both use a C-like syntax (including C keywords like “struct”). LCM differs in that it is a smaller language: less commonly used features such as unions are not supported. LCM does not support pointers, which eliminates the pointer-tracking issues that can occur in XDR. LCM supports variable-length arrays in a more natural way, and includes a type “signature” in the encoded data, which allows runtime detection of type errors.

Encoded data representations are often compared to XML, but XML and LCM serve very different purposes. XML's verbosity and self-describing structure help loosely coupled agents make use of the information they understand while safely skipping attributes they do not recognize. LCM, in contrast, targets tightly coupled agents that simply do not share an address space; for such applications, strict type definitions and a space- and compute-efficient encoding are a better fit.

Development History

The LCM marshalling facility was created for MIT's DARPA Urban Challenge vehicle, with development beginning in the summer of 2006. Earlier versions supported a number of now-removed features: dropping rarely used features greatly simplifies the codebase, since most features affect every language backend (currently C, Java, and Python).


  1. LCM (Lightweight Communications and Marshalling) is a library for communication and data marshalling; marshalling refers to the process of converting a data structure into a byte stream.