Pinecone

Here is a list of all available Pinecone procedures:

name description

apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config)

Creates an index, with the name specified in the 2nd parameter, and with the specified similarity and size. The default endpoint is <hostOrKey param>/indexes.

apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $config)

Deletes an index with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/indexes/<collection param>.

apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $config)

Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is <hostOrKey param>/vectors/upsert.

apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $config)

Deletes the vectors with the specified ids. The default endpoint is <hostOrKey param>/indexes/<collection param>.

apoc.vectordb.pinecone.get(hostOrKey, index, ids, $config)

Gets the vectors with the specified ids. The default endpoint is <hostOrKey param>/vectors/fetch.

apoc.vectordb.pinecone.getAndUpdate(hostOrKey, index, ids, $config)

Gets the vectors with the specified ids, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/vectors/fetch.

apoc.vectordb.pinecone.query(hostOrKey, index, vector, filter, limit, $config)

Retrieves the vectors closest to the defined vector, up to the given limit of results, in the index with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/query.

apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, index, vector, filter, limit, $config)

Retrieves the vectors closest to the defined vector, up to the given limit of results, in the index with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/query.

In all of these procedures, the 1st parameter can be either a host or a key defined by the APOC config apoc.pinecone.<key>.host=myHost.
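As a minimal sketch of the key-based form, assuming a hypothetical key named mykey has been added to the APOC configuration and that the index is called test-index (both names are illustrative, and the host is the sample one used in the examples further down), the key can be passed where the host would otherwise go. Any authentication or other settings the call needs would go in the config map, which is left with only allResults here.

```cypher
// Assumed APOC config entry ('mykey' is a hypothetical key name):
//   apoc.pinecone.mykey.host=https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io

// Pass the key, instead of the full host URL, as the 1st parameter
CALL apoc.vectordb.pinecone.get('mykey', 'test-index', ['1', '2'], {allResults: true})
YIELD id, metadata
RETURN id, metadata
```

Keeping the host in the configuration means the queries themselves do not need to repeat the index URL.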

The procedures create, drop and handle an index, rather than a collection as the other vectordb procedures do, since in Pinecone a collection is a static and non-queryable copy of an index.

However, the procedures that create and delete an index are still named .createCollection and .deleteCollection, for consistency with the other vectordb procedures.

The default hostOrKey is "https://api.pinecone.io". It can therefore generally be null for the createCollection and deleteCollection procedures, while for the other procedures it should be the host name of the index, that is, the one shown in the Pinecone dashboard.

Examples

The following examples assume we want to create and manage an index called test-index.

Create an index (it leverages this API)
CALL apoc.vectordb.pinecone.createCollection(null, 'test-index', 'cosine', 4, {<optional config>})
Delete an index (it leverages this API)
CALL apoc.vectordb.pinecone.deleteCollection(null, 'test-index', {<optional config>})
Upsert vectors (it leverages this API)
CALL apoc.vectordb.pinecone.upsert('https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io',
  'test-index',
  [
    {id: '1', vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
    {id: '2', vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
  ],
  {<optional config>})
Get vectors (it leverages this API)
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {<optional config>})
Table 1. Example results

| score | metadata | id | vector | text | entity | errors |
|---|---|---|---|---|---|---|
| null | {city: "Berlin", foo: "one"} | null | null | null | null | null |
| null | {city: "London", foo: "two"} | null | null | null | null | null |

Get vectors with {allResults: true}
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {allResults: true, <optional config>})
Table 2. Example results

| score | metadata | id | vector | text | entity | errors |
|---|---|---|---|---|---|---|
| null | {city: "Berlin", foo: "one"} | 1 | […] | null | null | null |
| null | {city: "London", foo: "two"} | 2 | […] | null | null | null |

Query vectors (it leverages this API)
CALL apoc.vectordb.pinecone.query($host,
    'test-index',
    [0.2, 0.1, 0.9, 0.7],
    { city: { `$eq`: "London" } },
    5,
    {allResults: true, <optional config>})
Table 3. Example results

| score | metadata | id | vector | text | entity | errors |
|---|---|---|---|---|---|---|
| 1 | {city: "Berlin", foo: "one"} | 1 | […] | null | null | null |
| 0.1 | {city: "London", foo: "two"} | 2 | […] | null | null | null |

We can define a mapping, to auto-create one/multiple nodes and relationships, by leveraging the vector metadata.

For example, if we have created 2 vectors with the upsert procedure above, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):

CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

which populates the two nodes as: (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which will be returned in the entity column result.
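To try the node mapping end to end, the two existing nodes can be created before the call and inspected afterwards. The following is only a sketch of that setup and check, using the label and property names from the mapping above; the statements are separated by semicolons, as they would be in a client that runs several statements in sequence.

```cypher
// Before the queryAndUpdate call: create the nodes the mapping matches on myId
CREATE (:Test {myId: 'one'}), (:Test {myId: 'two'});

// After the call: inspect the properties written through the mapping
MATCH (n:Test)
RETURN n.myId AS myId, n.city AS city, n.vect AS vect
ORDER BY myId;
```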

We can also set the mapping configuration mode to CREATE_IF_MISSING (which creates the nodes if they do not exist), READ_ONLY (to search for nodes/rels without making updates) or UPDATE_EXISTING (the default behavior):

CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            mode: "CREATE_IF_MISSING",
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

which creates 2 new nodes as above.

Or, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):

CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })

which populates the two relationships as: ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which will be returned in the entity column result.
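The relationship variant can be set up and checked in the same way. This sketch assumes the :Start and :End labels and the TEST relationship type used in the example above; it creates the two relationships beforehand and reads back the properties that the mapping is described as writing.

```cypher
// Before the queryAndUpdate call: create the relationships matched on myId
CREATE (:Start)-[:TEST {myId: 'one'}]->(:End),
       (:Start)-[:TEST {myId: 'two'}]->(:End);

// After the call: inspect the properties written through the mapping
MATCH (:Start)-[r:TEST]->(:End)
RETURN r.myId AS myId, r.city AS city, r.vect AS vect
ORDER BY myId;
```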

We can also use mapping with the apoc.vectordb.pinecone.query procedure, to search for nodes/rels matching the label/type and metadataKey, without making updates (i.e. equivalent to the *.queryAndUpdate procedure with a mapping config having mode: "READ_ONLY").

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the column rel:

CALL apoc.vectordb.pinecone.query($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
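As a sketch of how the matched relationships might be consumed, the same call can be followed by a YIELD clause that selects the rel column mentioned above. The metadata column is yielded as well, on the assumption that the mapping relies on the metadata being fetched.

```cypher
CALL apoc.vectordb.pinecone.query($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
YIELD score, metadata, rel
// Return the matched relationship key next to its similarity score
RETURN rel.myId AS myId, score, metadata
ORDER BY score DESC
```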

We can use mapping with the apoc.vectordb.pinecone.get* procedures as well.
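A minimal sketch of a get call with a mapping, assuming the same (:Test) nodes and mapping keys as in the earlier queryAndUpdate example, and assuming the matched nodes come back in a node column, as in the apoc.ml.rag example further down:

```cypher
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1', '2'],
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
YIELD metadata, node
// Return each matched node key together with the vector metadata
RETURN node.myId AS myId, metadata
```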

To optimize performance, we can choose what to YIELD with the apoc.vectordb.pinecone.query* and the apoc.vectordb.pinecone.get* procedures.

For example, by executing CALL apoc.vectordb.pinecone.query(…) YIELD metadata, score, id, the REST API request will include {"with_payload": false, "with_vectors": false}, so that the values we do not need are not returned.
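Written out in full, such a call could look like the following sketch. The query vector and limit are reused from the earlier examples, and the empty map stands in for the optional config.

```cypher
CALL apoc.vectordb.pinecone.query($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    {})
// Only the columns listed here are yielded; the vector column is left out
YIELD metadata, score, id
RETURN id, score, metadata
```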

It is possible to execute the vector db procedures together with apoc.ml.rag, as follows:

CALL apoc.vectordb.pinecone.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
Delete vectors (it leverages this API)
CALL apoc.vectordb.pinecone.delete($host, 'test-index', ['1','2'], {<optional config>})