Milvus

Here is a list of all available Milvus procedures:

name description

apoc.vectordb.milvus.createCollection(hostOrKey, collection, similarity, size, $config)

Creates a collection, with the name specified in the 2nd parameter, and with the specified similarity and size. The default endpoint is <hostOrKey param>/v2/vectordb/collections/create.

apoc.vectordb.milvus.deleteCollection(hostOrKey, collection, $config)

Deletes a collection with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/v2/vectordb/collections/drop.

apoc.vectordb.milvus.upsert(hostOrKey, collection, vectors, $config)

Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is <hostOrKey param>/v2/vectordb/entities/upsert.

apoc.vectordb.milvus.delete(hostOrKey, collection, ids, $config)

Deletes the vectors with the specified ids. The default endpoint is <hostOrKey param>/v2/vectordb/entities/delete.

apoc.vectordb.milvus.get(hostOrKey, collection, ids, $config)

Gets the vectors with the specified ids. The default endpoint is <hostOrKey param>/v2/vectordb/entities/get.

apoc.vectordb.milvus.query(hostOrKey, collection, vector, filter, limit, $config)

Retrieves the vectors closest to the defined vector, up to the given limit of results, in the collection with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/v2/vectordb/entities/search.

apoc.vectordb.milvus.getAndUpdate(hostOrKey, collection, ids, $config)

Gets the vectors with the specified ids, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/v2/vectordb/entities/get.

apoc.vectordb.milvus.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config)

Retrieves the vectors closest to the defined vector, up to the given limit of results, in the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/v2/vectordb/entities/search.

In all of these procedures, the 1st parameter can be either a host URL or a key defined in the APOC config as apoc.milvus.<key>.host=myHost. With hostOrKey=null, the default host is 'http://localhost:19530'.
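
As a minimal sketch of the key-based form, assume the APOC config contains the line apoc.milvus.myKey.host=http://localhost:19531, where myKey is a hypothetical key name chosen only for illustration, and that a collection named test_collection already exists. Under those assumptions, the key can be passed in place of the host URL:

```cypher
// Assumption: the APOC config defines apoc.milvus.myKey.host=http://localhost:19531
// 'myKey' is a hypothetical key name, not something defined by APOC itself.
CALL apoc.vectordb.milvus.get('myKey', 'test_collection', [1, 2], {})
YIELD metadata
RETURN metadata;

// With a null hostOrKey, the procedure falls back to the default host,
// 'http://localhost:19530'.
CALL apoc.vectordb.milvus.get(null, 'test_collection', [1, 2], {})
YIELD metadata
RETURN metadata;
```

Using a key keeps the host out of the queries themselves, so the same Cypher can be reused across environments by changing only the config.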

Examples

Here is a list of examples using a local installation listening on port 19531.

Create a collection (it leverages this API)
CALL apoc.vectordb.milvus.createCollection('http://localhost:19531', 'test_collection', 'COSINE', 4, {<optional config>})
Delete a collection (it leverages this API)
CALL apoc.vectordb.milvus.deleteCollection('http://localhost:19531', 'test_collection', {<optional config>})
Upsert vectors (it leverages this API)
CALL apoc.vectordb.milvus.upsert('http://localhost:19531', 'test_collection',
    [
        {id: 1, vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
        {id: 2, vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
    ],
    {<optional config>})
Get vectors (it leverages this API)
CALL apoc.vectordb.milvus.get('http://localhost:19531', 'test_collection', [1,2], {<optional config>})
Table 1. Example results
score | metadata | id | vector | text | entity | errors
null | {city: "Berlin", foo: "one"} | null | null | null | null | null
null | {city: "London", foo: "two"} | null | null | null | null | null

In case of errors, e.g. due to apoc.vectordb.milvus.get being called with ids of the wrong type as the 3rd parameter, the errors field will be populated, for example:

Table 2. Example results
score | metadata | id | vector | text | errors
null | null | null | null | null | ..please check the primary key and its' type can only in [int, string], error: unable to cast "wrong" of type string to int64..

Get vectors with {allResults: true}
CALL apoc.vectordb.milvus.get('http://localhost:19531', 'test_collection', [1,2], {allResults: true, <optional config>})
Table 3. Example results
score | metadata | id | vector | text | entity | errors
null | {city: "Berlin", foo: "one"} | 1 | […] | null | null | null
null | {city: "London", foo: "two"} | 2 | […] | null | null | null

Query vectors (it leverages this API)
CALL apoc.vectordb.milvus.query('http://localhost:19531',
    'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    { must:
        [ { key: "city", match: { value: "London" } } ]
    },
    5,
    {allResults: true, <optional config>})
Table 4. Example results
score | metadata | id | vector | text | entity | errors
1 | {city: "Berlin", foo: "one"} | 1 | […] | null | null | null
0.1 | {city: "London", foo: "two"} | 2 | […] | null | null | null

In case of errors, e.g. due to apoc.vectordb.milvus.query being called with a vector of the wrong size as the 3rd parameter, the errors field will be populated, for example:

Table 5. Example results
score | metadata | id | vector | text | errors
null | null | null | null | null | ..can only accept json format request, error: dimension: 4, but length of []float: 3: invalid parameter[expected=FloatVector][actual=[0.2,0.1,0.9]]..

We can define a mapping, to auto-create one/multiple nodes and relationships, by leveraging the vector metadata.

For example, if we have created 2 vectors with the above upsert procedure, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):

CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

which populates the two nodes as: (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which will be returned in the entity column result.

We can also set the mapping configuration mode to CREATE_IF_MISSING (which creates nodes if they do not exist), READ_ONLY (to search for nodes/rels without making updates) or UPDATE_EXISTING (the default behavior):

CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            mode: "CREATE_IF_MISSING",
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

which creates 2 new nodes as above.

Or, we can populate an existing relationship (i.e. (:Start)-[:TEST {myId: 'one'}]→(:End) and (:Start)-[:TEST {myId: 'two'}]→(:End)):

CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

which populates the two relationships as: ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which will be returned in the entity column result.

We can also use mapping with the apoc.vectordb.milvus.query procedure, to search for nodes/rels matching the label/type and metadataKey without making updates (i.e. equivalent to the *.queryAndUpdate procedure with a mapping config having mode: "READ_ONLY").

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the column rel:

CALL apoc.vectordb.milvus.query('http://localhost:19531', 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

We can use mapping with the apoc.vectordb.milvus.get* procedures as well.
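
As a hedged sketch, the mapping from the relationship and node examples above could be passed to apoc.vectordb.milvus.getAndUpdate in the same way. This assumes the vectors with ids 1 and 2 from the upsert example exist, that the (:Test) nodes from the earlier example are present, and that the mapping keys behave as they do for queryAndUpdate:

```cypher
// Fetch the vectors with ids 1 and 2 and populate the matching (:Test) nodes.
// The mapping keys mirror the queryAndUpdate example; their behaving identically
// for getAndUpdate is an assumption of this sketch.
CALL apoc.vectordb.milvus.getAndUpdate('http://localhost:19531', 'test_collection',
    [1, 2],
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
```

The only difference from the query-based calls is that ids take the place of the vector, filter and limit parameters.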

To optimize performance, we can choose what to YIELD with the apoc.vectordb.milvus.query* and the apoc.vectordb.milvus.get* procedures.

For example, by executing CALL apoc.vectordb.milvus.query(…) YIELD metadata, score, id, the REST API request will include {"with_payload": false, "with_vectors": false}, so that values we do not need are not returned.
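
Written out in full, such a call might look like the following sketch. It reuses the host, collection, vector, empty filter and allResults setting from the earlier query examples; the RETURN clause is added here only to make the statement complete:

```cypher
// Yield only the columns we need; vector, text and the other columns are left out.
// Arguments are copied from the earlier query examples in this section.
CALL apoc.vectordb.milvus.query('http://localhost:19531', 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    {allResults: true})
YIELD metadata, score, id
RETURN metadata, score, id
```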

It is possible to execute vector db procedures together with apoc.ml.rag as follows:

CALL apoc.vectordb.milvus.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
Delete vectors (it leverages this API)
CALL apoc.vectordb.milvus.delete('http://localhost:19531', 'test_collection', [1,2], {<optional config>})

Performance

The table below shows the time spent on each operation on a sample of 16,384 records, tested on a MacBook Pro M3 Pro with 18GB RAM, using Docker with 8 CPUs, a 10GB memory limit and 1.5GB of swap.

Table 6. Performance results
Operation | Time (ms)
apoc.vectordb.milvus.createCollection | 69
apoc.vectordb.milvus.upsert | 567
apoc.vectordb.milvus.get | 3508
apoc.vectordb.milvus.query | 459
apoc.vectordb.milvus.delete | 411
apoc.vectordb.milvus.deleteCollection | 62