PackStream
PackStream is a binary presentation format for the exchange of richly-typed data. It provides a syntax layer for the Bolt messaging protocol.
Version 1
PackStream is a general purpose data serialisation format, originally inspired by (but incompatible with) MessagePack.
The format provides a type system fully compatible with the types supported by Cypher; see the Cypher Manual → Values and types for more information.
PackStream offers a number of core data types, many supported by multiple binary representations, as well as a flexible extension mechanism.
The core data types are described in the table below.
Data type | Description |
---|---|
Null | missing or empty value |
Boolean | true or false |
Integer | signed 64-bit integer |
Float | 64-bit floating point number |
Bytes | byte array |
String | unicode text, UTF-8 |
List | ordered collection of values |
Dictionary | collection of key-value entries (no order guaranteed) |
Structure | composite value with a type signature |
Neither unsigned integers nor 32-bit floating point numbers are included. This is a deliberate design decision to allow broader compatibility across client languages.
General representation
Every serialised PackStream value begins with a marker byte.
The marker contains both data type information and direct or indirect size information for types that require it. How that size information is encoded varies by marker type.
Some values, such as Boolean true, can be encoded within a single marker byte. Many small integers (specifically between -16 and +127 inclusive) are also encoded within a single byte.
A number of marker bytes are reserved for future expansion of the format itself. These bytes should not be used, and encountering them in an incoming stream should be treated as an error.
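As an illustration of how a reader might dispatch on the marker byte, the following is a minimal sketch in Python. The function name `classify_marker` and the returned type names are hypothetical. The marker values are the ones listed in the data type sections of this document, with one assumption: `C0` is taken to be the Null marker, which is not listed in the sections shown here. Any byte that matches none of these is treated as reserved and rejected.

```python
# Fixed markers listed in this document. C0 for Null is an assumption:
# the Null marker is not shown in the data type sections of this text.
FIXED_MARKERS = {
    0xC0: "Null",
    0xC1: "Float",
    0xC2: "Boolean", 0xC3: "Boolean",
    0xC8: "Integer", 0xC9: "Integer", 0xCA: "Integer", 0xCB: "Integer",
    0xCC: "Bytes", 0xCD: "Bytes", 0xCE: "Bytes",
    0xD0: "String", 0xD1: "String", 0xD2: "String",
    0xD4: "List", 0xD5: "List", 0xD6: "List",
    0xD8: "Dictionary", 0xD9: "Dictionary", 0xDA: "Dictionary",
}

# Markers whose high-order nibble gives the type and whose
# low-order nibble carries the size.
HIGH_NIBBLE_MARKERS = {
    0x80: "String",
    0x90: "List",
    0xA0: "Dictionary",
    0xB0: "Structure",
}


def classify_marker(marker: int) -> str:
    """Return the data type name for a marker byte, or raise if reserved."""
    if not 0x00 <= marker <= 0xFF:
        raise ValueError("a marker must fit in a single byte")
    # 00..7F and F0..FF carry a small integer directly (TINY_INT).
    if marker <= 0x7F or marker >= 0xF0:
        return "Integer"
    if marker in FIXED_MARKERS:
        return FIXED_MARKERS[marker]
    high = marker & 0xF0
    if high in HIGH_NIBBLE_MARKERS:
        return HIGH_NIBBLE_MARKERS[high]
    # Everything else is treated as reserved and reported as an error.
    raise ValueError(f"reserved marker byte {marker:02X}")
```

For example, `classify_marker(0x85)` yields `"String"`, while `classify_marker(0xD3)` raises a `ValueError`.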
Sized values
Some representations are of a variable length and have their size explicitly encoded in the representation. Such values generally begin with a single marker byte, followed by a size, followed by the data content itself. In this context, the marker denotes both type and scale and therefore determines the number of bytes used to represent the size of the data. The size itself is encoded as either an 8-bit, 16-bit or 32-bit unsigned integer. Sizes longer than this are not supported.
The diagram below illustrates the general layout for a sized value, here with a 16-bit size:
[marker: 1 byte] [size: 2 bytes, big-endian] [data content]
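The marker-then-size layout can be sketched as a small helper that picks the narrowest size field. This is only an illustration: `pack_sized_header` is a hypothetical name, and the caller supplies the three marker bytes for the type in question (for example `CC`, `CD`, `CE` for Bytes). The upper bound of 2 147 483 647 follows the limit stated for the individual data types later in this document.

```python
import struct


def pack_sized_header(markers, size: int) -> bytes:
    """Build the marker and size prefix for a sized value.

    `markers` holds the marker bytes for the 8-bit, 16-bit and 32-bit
    size variants of one data type, in that order.
    """
    marker_8, marker_16, marker_32 = markers
    if size < 0:
        raise ValueError("size cannot be negative")
    if size <= 0xFF:
        return struct.pack(">BB", marker_8, size)    # 8-bit size
    if size <= 0xFFFF:
        return struct.pack(">BH", marker_16, size)   # 16-bit big-endian size
    if size <= 0x7FFFFFFF:
        return struct.pack(">BI", marker_32, size)   # 32-bit big-endian size
    raise ValueError("size exceeds 2 147 483 647")
```

With the Bytes markers, a size of 300 produces the three bytes `CD 01 2C`: one marker byte followed by a 16-bit size.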
Data types
Boolean
Marker, false: C2
Marker, true: C3
Boolean values are encoded within a single marker byte, using C3 to denote true and C2 to denote false.
Integer
Markers, TINY_INT:
Marker | Decimal number |
---|---|
F0 | -16 |
F1 | -15 |
F2 | -14 |
F3 | -13 |
F4 | -12 |
F5 | -11 |
F6 | -10 |
F7 | -9 |
F8 | -8 |
F9 | -7 |
FA | -6 |
FB | -5 |
FC | -4 |
FD | -3 |
FE | -2 |
FF | -1 |
00 | 0 |
01 | 1 |
02 | 2 |
… | … |
7E | 126 |
7F | 127 |
Marker, INT_8: C8
Marker, INT_16: C9
Marker, INT_32: CA
Marker, INT_64: CB
Integer values occupy either 1, 2, 3, 5, or 9 bytes depending on magnitude. The available representations are:
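Representation | Size (bytes) | Description |
---|---|---|
TINY_INT | 1 | marker byte only |
INT_8 | 2 | marker byte C8 followed by a signed 8-bit integer |
INT_16 | 3 | marker byte C9 followed by a signed 16-bit big-endian integer |
INT_32 | 5 | marker byte CA followed by a signed 32-bit big-endian integer |
INT_64 | 9 | marker byte CB followed by a signed 64-bit big-endian integer |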
The available encodings are illustrated below; each row shows a valid representation of the decimal value 42:
Representation | Size (bytes) | Encoding |
---|---|---|
TINY_INT | 1 | 2A |
INT_8 | 2 | C8 2A |
INT_16 | 3 | C9 00 2A |
INT_32 | 5 | CA 00 00 00 2A |
INT_64 | 9 | CB 00 00 00 00 00 00 00 2A |
Some marker bytes can be used to carry the value of a small integer as well as its type.
These markers can be identified by a zero high-order bit (for non-negative values) or by a high-order nibble containing only ones (for negative values).
Specifically, values between 00 and 7F inclusive can be directly translated to and from the non-negative integers with the same value.
Similarly, values between F0 and FF inclusive can do the same for negative numbers between -16 and -1.
While it is possible to encode small numbers in wider formats, it is generally recommended to use the most compact representation possible.
The following table shows the optimal representation for every possible integer in the signed 64-bit range:
Range minimum | Range maximum | Optimal representation |
---|---|---|
-9 223 372 036 854 775 808 | -2 147 483 649 | INT_64 |
-2 147 483 648 | -32 769 | INT_32 |
-32 768 | -129 | INT_16 |
-128 | -17 | INT_8 |
-16 | +127 | TINY_INT |
+128 | +32 767 | INT_16 |
+32 768 | +2 147 483 647 | INT_32 |
+2 147 483 648 | +9 223 372 036 854 775 807 | INT_64 |
The value -9223372036854775808 (the minimum) can be represented as:
CB 80 00 00 00 00 00 00 00
The value 9223372036854775807 (the maximum) can be represented as:
CB 7F FF FF FF FF FF FF FF
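The choice of the most compact integer representation can be sketched in Python as follows. The function name `pack_int` is hypothetical. The ranges and markers are the ones given in this section, and the multi-byte payloads are written big-endian, as the minimum and maximum encodings above show.

```python
import struct


def pack_int(value: int) -> bytes:
    """Encode an integer using the most compact PackStream representation."""
    if -0x10 <= value <= 0x7F:
        # TINY_INT: the marker byte itself carries the value.
        return struct.pack(">b", value)
    if -0x80 <= value <= 0x7F:
        # INT_8 (reached only for -128..-17).
        return b"\xC8" + struct.pack(">b", value)
    if -0x8000 <= value <= 0x7FFF:
        # INT_16
        return b"\xC9" + struct.pack(">h", value)
    if -0x80000000 <= value <= 0x7FFFFFFF:
        # INT_32
        return b"\xCA" + struct.pack(">i", value)
    if -0x8000000000000000 <= value <= 0x7FFFFFFFFFFFFFFF:
        # INT_64
        return b"\xCB" + struct.pack(">q", value)
    raise OverflowError("value does not fit in a signed 64-bit integer")
```

For instance, `pack_int(42)` returns the single byte `2A`, while `pack_int(128)` needs INT_16 and returns `C9 00 80`, because +128 does not fit in a signed 8-bit integer.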
Float
Marker: C1
Floats are double-precision floating-point values, generally used for representing fractions and decimals. They are encoded as a single C1 marker byte followed by 8 bytes which are formatted according to the IEEE 754 floating-point “double format” bit layout in big-endian order.
- Bit 63 (the bit that is selected by the mask 0x8000000000000000) represents the sign of the number.
- Bits 62-52 (the bits that are selected by the mask 0x7ff0000000000000) represent the exponent.
- Bits 51-0 (the bits that are selected by the mask 0x000fffffffffffff) represent the significand (sometimes called the mantissa) of the number.
The value 1.23 in decimal can be represented as:
C1 3F F3 AE 14 7A E1 47 AE
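A minimal Python sketch of the Float encoding, together with the three bit fields described above, might look like this. The names `pack_float` and `float_fields` are hypothetical. The `">d"` format of the standard `struct` module produces the IEEE 754 double layout in big-endian order.

```python
import struct

SIGN_MASK = 0x8000000000000000
EXPONENT_MASK = 0x7FF0000000000000
SIGNIFICAND_MASK = 0x000FFFFFFFFFFFFF


def pack_float(value: float) -> bytes:
    """Encode a float as the C1 marker plus 8 big-endian IEEE 754 bytes."""
    return b"\xC1" + struct.pack(">d", value)


def float_fields(packed: bytes):
    """Split an encoded Float into (sign, exponent, significand) bit fields."""
    if len(packed) != 9 or packed[0] != 0xC1:
        raise ValueError("expected a C1 marker followed by 8 bytes")
    (bits,) = struct.unpack(">Q", packed[1:])
    sign = (bits & SIGN_MASK) >> 63
    exponent = (bits & EXPONENT_MASK) >> 52
    significand = bits & SIGNIFICAND_MASK
    return sign, exponent, significand
```

Applied to 1.23, `pack_float` reproduces the nine bytes shown above, and `float_fields` reports a sign of 0 and a raw (biased) exponent of 0x3FF.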
Bytes
Bytes are arrays of byte values. These are used to transmit raw binary data and the size represents the number of bytes contained. Unlike other values, there is no separate encoding for byte arrays containing fewer than 16 bytes.
Marker | Size | Maximum size |
---|---|---|
CC | 8-bit big-endian unsigned integer | 255 bytes |
CD | 16-bit big-endian unsigned integer | 65 535 bytes |
CE | 32-bit big-endian unsigned integer | 2 147 483 647 bytes |
One of the markers CC, CD, or CE should be used, depending on scale.
This marker is followed by the size and then the bytes themselves.
N.B. While the unsigned 32-bit integer following CE could hold a bigger number, the maximum size of byte arrays is limited to 2 147 483 647 (maximum value of a signed 32-bit integer).
Empty byte array b[]
CC 00
Byte array containing three values 1, 2 and 3; b[1, 2, 3]
CC 03 01 02 03
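The two examples above can be reproduced with a short Python sketch. The function name `pack_bytes` is hypothetical. Since there is no compact form for short byte arrays, even an empty array uses the CC marker with an explicit size byte.

```python
import struct


def pack_bytes(data: bytes) -> bytes:
    """Encode a byte array with the CC, CD or CE marker, depending on size."""
    size = len(data)
    if size <= 0xFF:
        header = struct.pack(">BB", 0xCC, size)   # 8-bit size
    elif size <= 0xFFFF:
        header = struct.pack(">BH", 0xCD, size)   # 16-bit big-endian size
    elif size <= 0x7FFFFFFF:
        header = struct.pack(">BI", 0xCE, size)   # 32-bit big-endian size
    else:
        raise ValueError("byte array exceeds 2 147 483 647 bytes")
    return header + data
```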
String
Markers
For shorter strings:
Marker | Size (bytes) |
---|---|
80 | 0 |
81 | 1 |
82 | 2 |
83 | 3 |
84 | 4 |
85 | 5 |
86 | 6 |
87 | 7 |
88 | 8 |
89 | 9 |
8A | 10 |
8B | 11 |
8C | 12 |
8D | 13 |
8E | 14 |
8F | 15 |
For longer strings:
Marker | Size | Maximum number of bytes |
---|---|---|
D0 | 8-bit big-endian unsigned integer | 255 bytes |
D1 | 16-bit big-endian unsigned integer | 65 535 bytes |
D2 | 32-bit big-endian unsigned integer | 2 147 483 647 bytes |
Text data is represented as UTF-8 encoded bytes.
The sizes used in string representations are the byte counts of the UTF-8 encoded data, not the character count of the original text.
For encoded text containing fewer than 16 bytes, including empty strings, the marker byte should contain the high-order nibble 8 (binary 1000) followed by a low-order nibble containing the size.
The encoded data then immediately follows the marker.
For encoded text containing 16 bytes or more, the marker D0, D1 or D2 should be used, depending on scale.
This marker is followed by the size and the UTF-8 encoded data.
N.B. While the unsigned 32-bit integer following D2 could hold a bigger number, the maximum byte size of strings is limited to 2 147 483 647 (maximum value of a signed 32-bit integer).
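Value | Encoding |
---|---|
String("") | 80 |
String("A") | 81 41 |
String("ABCDEFGHIJKLMNOPQRSTUVWXYZ") | D0 1A 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A |
String("Größenmaßstäbe") | D0 12 47 72 C3 B6 C3 9F 65 6E 6D 61 C3 9F 73 74 C3 A4 62 65 |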
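The string rules can be sketched in Python as shown below. The function name `pack_string` is hypothetical. The sketch measures the size after UTF-8 encoding, so a string with non-ASCII characters may need a larger marker than its character count suggests.

```python
import struct


def pack_string(text: str) -> bytes:
    """Encode text as a PackStream String (marker, optional size, UTF-8 data)."""
    data = text.encode("utf-8")
    size = len(data)  # byte count of the UTF-8 data, not the character count
    if size < 0x10:
        header = bytes([0x80 | size])             # size in the low-order nibble
    elif size <= 0xFF:
        header = struct.pack(">BB", 0xD0, size)   # 8-bit size
    elif size <= 0xFFFF:
        header = struct.pack(">BH", 0xD1, size)   # 16-bit big-endian size
    elif size <= 0x7FFFFFFF:
        header = struct.pack(">BI", 0xD2, size)   # 32-bit big-endian size
    else:
        raise ValueError("string exceeds 2 147 483 647 bytes")
    return header + data
```

For example, "Größenmaßstäbe" has 14 characters but 18 UTF-8 bytes, so it is encoded with the D0 marker and a size byte of 12 (hexadecimal) rather than with a single-byte marker.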
List
Lists are heterogeneous sequences of values and therefore permit a mixture of types within the same list. The size of a list denotes the number of items within that list, rather than the total packed byte size.
Markers:
Marker | Size (items) | Maximum size |
---|---|---|
90 | the low-order nibble of marker | 0 items |
91 | the low-order nibble of marker | 1 item |
92 | the low-order nibble of marker | 2 items |
93 | the low-order nibble of marker | 3 items |
94 | the low-order nibble of marker | 4 items |
95 | the low-order nibble of marker | 5 items |
96 | the low-order nibble of marker | 6 items |
97 | the low-order nibble of marker | 7 items |
98 | the low-order nibble of marker | 8 items |
99 | the low-order nibble of marker | 9 items |
9A | the low-order nibble of marker | 10 items |
9B | the low-order nibble of marker | 11 items |
9C | the low-order nibble of marker | 12 items |
9D | the low-order nibble of marker | 13 items |
9E | the low-order nibble of marker | 14 items |
9F | the low-order nibble of marker | 15 items |
D4 | 8-bit big-endian unsigned integer | 255 items |
D5 | 16-bit big-endian unsigned integer | 65 535 items |
D6 | 32-bit big-endian unsigned integer | 2 147 483 647 items |
For lists containing fewer than 16 items, including empty lists, the marker byte should contain the high-order nibble 9 (binary 1001) followed by a low-order nibble containing the size.
The items within the list are then serialised in order immediately after the marker.
For lists containing 16 items or more, the marker D4, D5 or D6 should be used, depending on scale.
This marker is followed by the size and the list items, serialised in order.
N.B. While the unsigned 32-bit integer following D6 could hold a bigger number, the maximum size of lists is limited to 2 147 483 647 (maximum value of a signed 32-bit integer).
[]
90
[Integer(1), Integer(2), Integer(3)]
93 01 02 03
[ Integer(1), Float(2.0), String("three") ]
93 01 C1 40 00 00 00 00 00 00 00 85 74 68 72 65 65
[ Integer(1), Integer(2), ... Integer(40) ]
D4 28 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28
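The list examples above can be reproduced with the following Python sketch. All function names are hypothetical, and the item encoder is deliberately limited to a subset of types (Boolean, integers in the TINY_INT range, Float, strings under 16 bytes, and nested lists) to keep the example short. A complete encoder would cover every data type described in this document.

```python
import struct


def pack_list_header(size: int) -> bytes:
    """Build the marker (and size, if needed) for a list of `size` items."""
    if size < 0x10:
        return bytes([0x90 | size])              # size in the low-order nibble
    if size <= 0xFF:
        return struct.pack(">BB", 0xD4, size)    # 8-bit size
    if size <= 0xFFFF:
        return struct.pack(">BH", 0xD5, size)    # 16-bit big-endian size
    if size <= 0x7FFFFFFF:
        return struct.pack(">BI", 0xD6, size)    # 32-bit big-endian size
    raise ValueError("list exceeds 2 147 483 647 items")


def pack_item(value) -> bytes:
    """Encode one item; supports only the subset of types noted above."""
    if isinstance(value, bool):                  # check before int: bool is an int
        return b"\xC3" if value else b"\xC2"
    if isinstance(value, int) and -16 <= value <= 127:
        return struct.pack(">b", value)          # TINY_INT
    if isinstance(value, float):
        return b"\xC1" + struct.pack(">d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) < 16:
            return bytes([0x80 | len(data)]) + data
    if isinstance(value, list):
        return pack_list(value)
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def pack_list(items: list) -> bytes:
    """Encode a list: header with the item count, then each item in order."""
    return pack_list_header(len(items)) + b"".join(pack_item(i) for i in items)
```

Note that the header records the number of items, not the number of bytes: `pack_list([1, 2.0, "three"])` starts with 93 even though the items occupy 16 bytes.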
Dictionary
A Dictionary is a list containing key-value entries:
- keys must be String values
- the same key may appear more than once
- values may be a mixture of types
The size of a Dictionary denotes the number of key-value entries within that dictionary, not the total packed byte size.
Markers:
Marker | Size (key-value entries) | Maximum size |
---|---|---|
A0 | contained within low-order nibble of marker | 0 entries |
A1 | contained within low-order nibble of marker | 1 entry |
A2 | contained within low-order nibble of marker | 2 entries |
A3 | contained within low-order nibble of marker | 3 entries |
A4 | contained within low-order nibble of marker | 4 entries |
A5 | contained within low-order nibble of marker | 5 entries |
A6 | contained within low-order nibble of marker | 6 entries |
A7 | contained within low-order nibble of marker | 7 entries |
A8 | contained within low-order nibble of marker | 8 entries |
A9 | contained within low-order nibble of marker | 9 entries |
AA | contained within low-order nibble of marker | 10 entries |
AB | contained within low-order nibble of marker | 11 entries |
AC | contained within low-order nibble of marker | 12 entries |
AD | contained within low-order nibble of marker | 13 entries |
AE | contained within low-order nibble of marker | 14 entries |
AF | contained within low-order nibble of marker | 15 entries |
D8 | 8-bit big-endian unsigned integer | 255 entries |
D9 | 16-bit big-endian unsigned integer | 65 535 entries |
DA | 32-bit big-endian unsigned integer | 2 147 483 647 entries |
For a dictionary containing fewer than 16 key-value entries, including an empty dictionary, the marker byte should contain the high-order nibble A (binary 1010) followed by a low-order nibble containing the size.
The entries within the dictionary are then serialised in [key, value, key, value] order immediately after the marker.
Keys are always String values.
For a dictionary containing 16 key-value entries or more, the marker D8, D9 or DA should be used, depending on scale.
This marker is followed by the size and the key-value entries.
N.B. While the unsigned 32-bit integer following DA could hold a bigger number, the maximum size of dictionaries is limited to 2 147 483 647 (maximum value of a signed 32-bit integer).
{}
A0
{"one": "eins"}
A1 83 6F 6E 65 84 65 69 6E 73
{"A": 1, "B": 2 ... "Z": 26}
D8 1A 81 41 01 81 42 02 81 43 03 81 44 04 81 45 05 81 46 06 81 47 07 81 48 08 81 49 09 81 4A 0A 81 4B 0B 81 4C 0C 81 4D 0D 81 4E 0E 81 4F 0F 81 50 10 81 51 11 81 52 12 81 53 13 81 54 14 81 55 15 81 56 16 81 57 17 81 58 18 81 59 19 81 5A 1A
If there are multiple instances of the same key when unpacked, the last seen value for that key should be used.
[("key_1", 1), ("key_2", 2), ("key_1", 3)] -> {"key_1": 3, "key_2": 2}
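The "last seen value wins" rule falls out naturally when entries are unpacked into a hash map in stream order. The Python sketch below illustrates this. The function names are hypothetical, and the decoder handles only a small subset of markers (TINY_INT, strings under 16 bytes, and dictionaries with an in-marker or 8-bit size), which is enough to read the examples above.

```python
def unpack_value(data: bytes, pos: int = 0):
    """Decode one value starting at data[pos]; return (value, next_pos)."""
    marker = data[pos]
    pos += 1
    if marker <= 0x7F:                       # TINY_INT, non-negative
        return marker, pos
    if marker >= 0xF0:                       # TINY_INT, -16..-1
        return marker - 0x100, pos
    high, low = marker & 0xF0, marker & 0x0F
    if high == 0x80:                         # String, size in low nibble
        return data[pos:pos + low].decode("utf-8"), pos + low
    if high == 0xA0:                         # Dictionary, size in low nibble
        return unpack_entries(data, pos, low)
    if marker == 0xD8:                       # Dictionary, 8-bit size
        return unpack_entries(data, pos + 1, data[pos])
    raise ValueError(f"marker {marker:02X} is not handled by this sketch")


def unpack_entries(data: bytes, pos: int, count: int):
    """Read `count` key-value entries; a repeated key keeps its last value."""
    result = {}
    for _ in range(count):
        key, pos = unpack_value(data, pos)
        if not isinstance(key, str):
            raise ValueError("dictionary keys must be String values")
        value, pos = unpack_value(data, pos)
        result[key] = value                  # later entries overwrite earlier ones
    return result, pos
```

Decoding a three-entry dictionary in which `key_1` appears first with the value 1 and last with the value 3 therefore yields `{"key_1": 3, "key_2": 2}`.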
Structure
A structure is a composite value, composed of fields and a unique type code. Beyond the marker, a structure encoding consists of a single tag byte followed by a sequence of up to 15 fields, each an individual value. The size of a structure is measured as the number of fields, not the total byte size. This count does not include the tag.
Markers:
Marker | Size (fields) | Maximum size |
---|---|---|
B0 | contained within low-order nibble of marker | 0 fields |
B1 | contained within low-order nibble of marker | 1 field |
B2 | contained within low-order nibble of marker | 2 fields |
B3 | contained within low-order nibble of marker | 3 fields |
B4 | contained within low-order nibble of marker | 4 fields |
B5 | contained within low-order nibble of marker | 5 fields |
B6 | contained within low-order nibble of marker | 6 fields |
B7 | contained within low-order nibble of marker | 7 fields |
B8 | contained within low-order nibble of marker | 8 fields |
B9 | contained within low-order nibble of marker | 9 fields |
BA | contained within low-order nibble of marker | 10 fields |
BB | contained within low-order nibble of marker | 11 fields |
BC | contained within low-order nibble of marker | 12 fields |
BD | contained within low-order nibble of marker | 13 fields |
BE | contained within low-order nibble of marker | 14 fields |
BF | contained within low-order nibble of marker | 15 fields |
For structures containing fewer than 16 fields, the marker byte should contain the high-order nibble B (binary 1011) followed by a low-order nibble containing the size.
The marker is immediately followed by the tag byte and the field values in that order.
The tag byte is used to identify the type or class of the structure and may hold any value between 0 and +127.
PackStream itself does not define semantics for different structures. Instead, refer to the Structure Semantics for the Bolt version in question.
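The structure layout (marker, tag byte, then fields) can be sketched in Python as follows. The function name `pack_structure` is hypothetical, and the tag values used in the usage note are arbitrary placeholders rather than tags defined by any Bolt version. The sketch expects each field to be already encoded as a PackStream value.

```python
def pack_structure(tag: int, packed_fields: list) -> bytes:
    """Encode a structure from a tag byte and already-encoded field values."""
    if not 0 <= tag <= 0x7F:
        raise ValueError("tag byte must be between 0 and 127")
    count = len(packed_fields)
    if count > 15:
        raise ValueError("a structure can hold at most 15 fields")
    # Marker: high-order nibble B, low-order nibble = number of fields.
    # The tag byte is not counted as a field.
    return bytes([0xB0 | count, tag]) + b"".join(packed_fields)
```

With a placeholder tag of 01 and no fields, the result is `B0 01`. With a placeholder tag of 7F and two fields, the integer 1 (`01`) and the string "A" (`81 41`), the result is `B2 7F 01 81 41`.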