Documentation for local deployments

Prerequisites

You need a Neo4j database, version 5.18 or later, with APOC installed to use this Knowledge Graph Builder. You can use any Neo4j Aura database (including the free tier). Neo4j Aura automatically includes APOC and runs on the latest Neo4j version, making it a great choice to get started quickly.
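To verify that APOC is available on your database, you can run a quick check with cypher-shell (the connection URI and credentials below are placeholders for your own instance):

cypher-shell -a neo4j://localhost:7687 -u neo4j -p <password> "RETURN apoc.version();"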

You can also use the free trial in Neo4j Sandbox, which also includes Graph Data Science. If you want to use Neo4j Desktop instead, set NEO4J_URI=bolt://host.docker.internal so the Docker container can reach the database running on your host machine, as sketched below.
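With Neo4j Desktop, the connection settings in your .env might look like this (this assumes the default Bolt port 7687; the password is a placeholder for your own credentials):

NEO4J_URI="bolt://host.docker.internal:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="<your-password>"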

Docker Compose

By default, only OpenAI and Diffbot are enabled, since Gemini requires extra GCP configuration.

In your root folder, create a .env file with your OpenAI and Diffbot keys (if you want to use both):

OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"

If you only want OpenAI:

LLM_MODELS="gpt-3.5,gpt-4o"
OPENAI_API_KEY="your-openai-key"

If you only want Diffbot:

LLM_MODELS="diffbot"
DIFFBOT_API_KEY="your-diffbot-key"

You can then run Docker Compose to build and start all components:

docker-compose up --build
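If you prefer to run the stack in the background, the standard Docker Compose commands apply: -d starts it detached, and docker-compose down stops and removes the containers.

docker-compose up --build -d
docker-compose down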

Configuring LLM Models

The following LLM models can be configured; those marked (default) are supported out of the box:

  • OpenAI GPT 3.5 and 4o (default)

  • VertexAI (Gemini 1.0) (default)

  • VertexAI (Gemini 1.5)

  • Diffbot

  • Bedrock models

  • Anthropic

  • OpenAI API compatible models like Ollama, Groq, Fireworks

To achieve that, you need to set a number of environment variables.

In your .env file, add the following lines. You can also add other model configurations from these providers, or from any other OpenAI API compatible provider.

LLM_MODEL_CONFIG_azure_ai_gpt_35="gpt-35,https://<deployment>.openai.azure.com/,<api-key>,<version>"
LLM_MODEL_CONFIG_anthropic_claude_35_sonnet="claude-3-5-sonnet-20240620,<api-key>"
LLM_MODEL_CONFIG_fireworks_llama_v3_70b="accounts/fireworks/models/llama-v3-70b-instruct,<api-key>"
LLM_MODEL_CONFIG_bedrock_claude_35_sonnet="anthropic.claude-3-sonnet-20240229-v1:0,<api-key>,<region>"
LLM_MODEL_CONFIG_ollama_llama3="llama3,http://host.docker.internal:11434"
LLM_MODEL_CONFIG_fireworks_qwen_72b="accounts/fireworks/models/qwen2-72b-instruct,<api-key>"

# Optional Frontend config
LLM_MODELS="diffbot,gpt-3.5,gpt-4o,azure_ai_gpt_35,azure_ai_gpt_4o,groq_llama3_70b,anthropic_claude_35_sonnet,fireworks_llama_v3_70b,bedrock_claude_35_sonnet,ollama_llama3,fireworks_qwen_72b"

In your docker-compose.yml, you need to pass the variables through:

- LLM_MODEL_CONFIG_anthropic_claude_35_sonnet=${LLM_MODEL_CONFIG_anthropic_claude_35_sonnet-}
- LLM_MODEL_CONFIG_fireworks_llama_v3_70b=${LLM_MODEL_CONFIG_fireworks_llama_v3_70b-}
- LLM_MODEL_CONFIG_azure_ai_gpt_4o=${LLM_MODEL_CONFIG_azure_ai_gpt_4o-}
- LLM_MODEL_CONFIG_azure_ai_gpt_35=${LLM_MODEL_CONFIG_azure_ai_gpt_35-}
- LLM_MODEL_CONFIG_groq_llama3_70b=${LLM_MODEL_CONFIG_groq_llama3_70b-}
- LLM_MODEL_CONFIG_bedrock_claude_35_sonnet=${LLM_MODEL_CONFIG_bedrock_claude_35_sonnet-}
- LLM_MODEL_CONFIG_fireworks_qwen_72b=${LLM_MODEL_CONFIG_fireworks_qwen_72b-}
- LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-}
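These entries belong under the environment section of the backend service in docker-compose.yml. As a minimal sketch (the service name and the rest of the file depend on your setup, so treat this layout as an assumption):

services:
  backend:
    environment:
      - LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-}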

Additional configs

By default, the available input sources are: local files, YouTube, Wikipedia, and AWS S3. This is the default config, applied if you do not override it in your .env file:

REACT_APP_SOURCES="local,youtube,wiki,s3"

If however you want the Google GCS integration, add gcs and your Google client ID:

REACT_APP_SOURCES="local,youtube,wiki,s3,gcs"
GOOGLE_CLIENT_ID="xxxx"

REACT_APP_SOURCES should be a comma-separated list of the sources you want to enable. You can combine all of them (local, youtube, wiki, s3, and gcs) or remove any you don’t want or need.

Development (Separate Frontend and Backend)

Alternatively, you can run the backend and frontend separately:

  • For the frontend:

    1. Create the frontend/.env file by copying frontend/example.env.

    2. Change values as needed.

    3. Run:

cd frontend
yarn
yarn run dev

  • For the backend:

    1. Create the backend/.env file by copying backend/example.env.

    2. Change values as needed.

    3. Run:

cd backend
python -m venv envName
source envName/bin/activate
pip install -r requirements.txt
uvicorn score:app --reload
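uvicorn listens on 127.0.0.1:8000 by default, which matches the frontend’s default BACKEND_API_URL. If the backend must be reachable from another machine or container, bind the host explicitly:

uvicorn score:app --reload --host 0.0.0.0 --port 8000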

ENV

Processing Configuration

Env Variable Name | Mandatory/Optional | Default Value | Description
IS_EMBEDDING | Optional | true | Flag to enable text embedding for chunks
ENTITY_EMBEDDING | Optional | false | Flag to enable entity embedding (id and description)
KNN_MIN_SCORE | Optional | 0.94 | Minimum score for the KNN algorithm when connecting similar chunks
NUMBER_OF_CHUNKS_TO_COMBINE | Optional | 6 | Number of chunks to combine when extracting entities
UPDATE_GRAPH_CHUNKS_PROCESSED | Optional | 20 | Number of chunks processed before writing to the database and updating progress
ENV | Optional | DEV | Environment variable for the app
TIME_PER_CHUNK | Optional | 4 | Time per chunk for processing
CHUNK_SIZE | Optional | 5242880 | Size of each chunk for processing
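Any of these can be overridden in the same .env file described above. For example (the values here are illustrative, not tuned recommendations):

NUMBER_OF_CHUNKS_TO_COMBINE="8"
UPDATE_GRAPH_CHUNKS_PROCESSED="50"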

Front-End Configuration

Env Variable Name | Mandatory/Optional | Default Value | Description
BACKEND_API_URL | Optional | http://localhost:8000 | URL for the backend API
REACT_APP_SOURCES | Optional | local,youtube,wiki,s3 | List of input sources that will be available
BLOOM_URL | Optional | https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph | URL for the Bloom visualization

GCP Cloud Integration

Env Variable Name | Mandatory/Optional | Default Value | Description
GEMINI_ENABLED | Optional | False | Flag to enable Gemini
GCP_LOG_METRICS_ENABLED | Optional | False | Flag to enable Google Cloud logs
GOOGLE_CLIENT_ID | Optional | - | Client ID for Google authentication for GCS upload
GCS_FILE_CACHE | Optional | False | If set to True, saves the files to process in GCS; if set to False, saves them locally
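Taken together, a .env enabling the GCP integrations might look like this (the client ID is a placeholder, and Gemini additionally needs the extra GCP configuration mentioned earlier):

GEMINI_ENABLED="True"
GCP_LOG_METRICS_ENABLED="True"
GCS_FILE_CACHE="True"
GOOGLE_CLIENT_ID="xxxx"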

LLM Model Configuration

Env Variable Name | Mandatory/Optional | Default Value | Description
LLM_MODELS | Optional | diffbot,gpt-3.5,gpt-4o | Models available for selection on the frontend, used for entity extraction and the Q&A chatbot (other models: gemini-1.0-pro, gemini-1.5-pro)
OPENAI_API_KEY | Optional | sk-… | API key for OpenAI (if enabled)
DIFFBOT_API_KEY | Optional | - | API key for Diffbot (if enabled)
EMBEDDING_MODEL | Optional | all-MiniLM-L6-v2 | Model for generating the text embedding (all-MiniLM-L6-v2, openai, vertexai)
GROQ_API_KEY | Optional | - | API key for Groq
LLM_MODEL_CONFIG_<model_id> | Optional | - | Configuration for additional LLM models, in the format "<model>,<base-url>,<api-key>,<version>"

LangChain and Neo4j Configuration

Env Variable Name | Mandatory/Optional | Default Value | Description
NEO4J_URI | Optional | neo4j://database:7687 | URI of the Neo4j database for the backend to connect to
NEO4J_USERNAME | Optional | neo4j | Username for the Neo4j database
NEO4J_PASSWORD | Optional | password | Password for the Neo4j database
LANGCHAIN_API_KEY | Optional | - | API key for LangSmith
LANGCHAIN_PROJECT | Optional | - | Project name for LangSmith
LANGCHAIN_TRACING_V2 | Optional | false | Flag to enable LangSmith tracing
LANGCHAIN_ENDPOINT | Optional | https://api.smith.langchain.com | Endpoint for the LangSmith API
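For example, to turn on LangSmith tracing you could add the following to your .env (the API key and project name are placeholders):

LANGCHAIN_TRACING_V2="true"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY="<your-langsmith-api-key>"
LANGCHAIN_PROJECT="<your-project-name>"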