Weaviate

The following table lists all available Weaviate procedures. The list and the procedure signatures are consistent with those of the other vector databases, such as Qdrant:

| name | description |
| --- | --- |
| apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) | Creates a collection with the name specified in the 2nd parameter, and with the specified similarity and size. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.deleteCollection(hostOrKey, collection, $config) | Deletes the collection with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/schema/<collection param>. |
| apoc.vectordb.weaviate.upsert(hostOrKey, collection, vectors, $config) | Upserts the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}] into the collection with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/objects. |
| apoc.vectordb.weaviate.delete(hostOrKey, collection, ids, $config) | Deletes the vectors with the specified ids. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.get(hostOrKey, collection, ids, $config) | Gets the vectors with the specified ids. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.query(hostOrKey, collection, vector, filter, limit, $config) | Retrieves the vectors closest to the specified vector, up to limit results, from the collection with the name specified in the 2nd parameter. Besides the common config parameters, this procedure requires a fields: [listOfProperty] config, which defines the properties to retrieve through the GraphQL query that runs under the hood. The default endpoint is <hostOrKey param>/graphql. |
| apoc.vectordb.weaviate.getAndUpdate(hostOrKey, collection, ids, $config) | Gets the vectors with the specified ids, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) | Retrieves the vectors closest to the specified vector, up to limit results, from the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. Besides the common config parameters, this procedure requires a fields: [listOfProperty] config, which defines the properties to retrieve through the GraphQL query that runs under the hood. The default endpoint is <hostOrKey param>/graphql. |

In all of these procedures, the 1st parameter can be either a host or a key defined by the APOC config apoc.weaviate.<key>.host=myHost. With hostOrKey=null, the default host is 'http://localhost:8080/v1'.
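
The key-based lookup can be sketched as follows. The key name myKey is hypothetical and chosen only for illustration; the sketch assumes the APOC configuration already contains the entry shown in the comment.

```cypher
// Assumes the APOC configuration contains the (hypothetical) entry:
//   apoc.weaviate.myKey.host=myHost
// The 1st argument is then the key 'myKey' instead of a host.
CALL apoc.vectordb.weaviate.createCollection('myKey', 'TestCollection', 'cosine', 4, {});

// Passing null as the 1st argument falls back to 'http://localhost:8080/v1'.
CALL apoc.vectordb.weaviate.createCollection(null, 'OtherCollection', 'cosine', 4, {});
```

Using a key keeps the host out of the queries themselves, so the same Cypher can run unchanged against different environments.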

Examples

Create a collection (it leverages this API)
CALL apoc.vectordb.weaviate.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
Create a collection against a remote connection using an API key (see here)
CALL apoc.vectordb.weaviate.createCollection("https://<weaviateInstanceId>.weaviate.network",
    'TestCollection',
    'cosine',
    4,
    {headers: {Authorization: 'Bearer <apiKey>'}})
Delete a collection (it leverages this API)
CALL apoc.vectordb.weaviate.deleteCollection($host, 'test_collection', {<optional config>})
Upsert vectors (it leverages this API)
CALL apoc.vectordb.weaviate.upsert($host, 'test_collection',
    [
        {id: "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
        {id: "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
    ],
    {<optional config>})
Get vectors (it leverages this API)
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {<optional config>})
Table 1. Example results
| score | metadata | id | vector | text | entity | errors |
| --- | --- | --- | --- | --- | --- | --- |
| null | {city: "Berlin", foo: "one"} | null | null | null | null | null |
| null | {city: "Berlin", foo: "two"} | null | null | null | null | null |

Get vectors with {allResults: true}
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {allResults: true, <optional config>})
Table 2. Example results
| score | metadata | id | vector | text | entity | errors |
| --- | --- | --- | --- | --- | --- | --- |
| null | {city: "Berlin", foo: "one"} | 1 | […] | null | null | null |
| null | {city: "Berlin", foo: "two"} | 2 | […] | null | null | null |

Query vectors (it leverages this API)
CALL apoc.vectordb.weaviate.query($host,
    'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    '{operator: Equal, valueString: "London", path: ["city"]}',
    5,
    {fields: ["city", "foo"], allResults: true, <other optional config>})
Table 3. Example results
| score | metadata | id | vector | text | errors |
| --- | --- | --- | --- | --- | --- |
| 1 | {city: "Berlin", foo: "one"} | 1 | […] | null | null |
| 0.1 | {city: "Berlin", foo: "two"} | 2 | […] | null | null |

If an error occurs, for example when apoc.vectordb.weaviate.query is called with a vector of the wrong size as the 3rd parameter, the errors field is populated:

Table 4. Example results
| score | metadata | id | vector | text | errors |
| --- | --- | --- | --- | --- | --- |
| null | null | null | null | null | ..vector search: knn search: distance between entrypoint and query node: vector lengths don’t match: 4 vs 3.. |
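
A query that surfaces only such failures might look like the sketch below. It assumes the result column is named errors, as in the table header, and it deliberately passes a 3-element vector to a collection created with size 4.

```cypher
// The 3rd parameter has 3 elements, while 'test_collection' was created with size 4.
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9],
    {},
    5,
    {fields: ["city", "foo"]})
YIELD errors
// Keep only the rows in which a problem was reported.
WHERE errors IS NOT NULL
RETURN errors
```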

We can define a mapping, to fetch the associated nodes and relationships and optionally create them, by leveraging the vector metadata.

For example, if we have created the 2 vectors with the upsert procedure above, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which populates the two nodes as: (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which will be returned in the entity column result.

We can also set the mapping configuration mode to CREATE_IF_MISSING (which creates the nodes if they do not exist), READ_ONLY (which searches for nodes/rels without making updates) or UPDATE_EXISTING (the default behavior):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        mode: "CREATE_IF_MISSING",
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which creates 2 new nodes as above.

Or, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which populates the two relationships as: ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which will be returned in the entity column result.

We can also use mapping with the apoc.vectordb.weaviate.query procedure, to search for nodes/rels matching the label/type and metadataKey without making updates (i.e. equivalent to the *.queryAndUpdate procedure with a mapping config having mode: "READ_ONLY").

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the column rel:

CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

We can use mapping with the apoc.vectordb.weaviate.get* procedures as well.
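
As a minimal sketch, the node mapping from the earlier examples can be passed to apoc.vectordb.weaviate.getAndUpdate. The ids are the ones used in the upsert example; the sketch assumes the nodes (:Test {myId: 'one'}) and (:Test {myId: 'two'}) already exist.

```cypher
// Fetch two vectors by id and copy their data onto the matching :Test nodes.
CALL apoc.vectordb.weaviate.getAndUpdate($host, 'test_collection',
    ["8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308"],
    { mapping: {
        embeddingKey: "vect",   // node property that receives the vector
        nodeLabel: "Test",      // label of the nodes to match
        entityKey: "myId",      // node property compared against the metadata value
        metadataKey: "foo"      // metadata field holding the matching value
      }
    })
```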

To optimize performance, we can choose what to YIELD with the apoc.vectordb.weaviate.query and the apoc.vectordb.weaviate.get procedures.

For example, when executing CALL apoc.vectordb.weaviate.query(…) YIELD metadata, score, id, the REST API request will include {"with_payload": false, "with_vectors": false}, so that the values we do not need are not returned.
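
A complete call of this kind might look like the following sketch, which reuses the collection and query vector from the earlier examples. Only metadata, score and id are yielded, so the vector and text columns are never requested.

```cypher
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    {fields: ["city", "foo"]})
// Yield only the columns that are actually needed downstream.
YIELD metadata, score, id
RETURN id, score, metadata
```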

It is possible to execute the vector db procedures together with apoc.ml.rag, as follows:

CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
Delete vectors (it leverages this API)
CALL apoc.vectordb.weaviate.delete($host, 'test_collection', [1,2], {<optional config>})

Performance

The table below shows the time spent on each operation for a sample of 100,000 records, tested on a MacBook Pro M3 Pro with 18 GB of RAM, using Docker with 8 CPUs, a 10 GB memory limit and 1.5 GB of swap.

Table 5. Performance results
| Operation | Time (ms) |
| --- | --- |
| apoc.vectordb.weaviate.createCollection | 59 |
| apoc.vectordb.weaviate.upsert | 319766 |
| apoc.vectordb.weaviate.get | 41431 |
| apoc.vectordb.weaviate.query | 1887 |
| apoc.vectordb.weaviate.delete | 53218 |
| apoc.vectordb.weaviate.deleteCollection | 201 |