Features documentation

Sources

Local file upload

You can drag & drop files into the first input zone on the left. The application will store the uploaded sources as Document nodes in the graph using LangChain Loaders (PDFLoader and Unstructured Loader).
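A minimal sketch of this loading step, assuming LangChain's Python community loaders (the application's actual loader selection and chunking settings may differ):

    # Sketch only: load a PDF and split it into the chunks that become
    # Document/Chunk nodes in the graph. Path and parameters are placeholders.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    documents = PyPDFLoader("report.pdf").load()   # one Document per page

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)
    print(f"{len(documents)} pages -> {len(chunks)} chunks")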

File Type          Supported Extensions
Microsoft Office   .docx, .pptx, .xls
PDF                .pdf
Images             .jpeg, .jpg, .png, .svg
Text               .html, .txt, .md

The second input zone handles web links.

  • YouTube transcripts

  • Wikipedia pages

  • Web Pages

The application will parse and store the transcripts of the provided YouTube videos as Document nodes in the graph using YouTube parsers.

For Wikipedia links we use the Wikipedia Loader. For example, you can provide https://en.wikipedia.org/wiki/Neo4j and it will load the Neo4j Wikipedia page.

For web pages, we use the Unstructured Loader. For example, you can provide articles from https://theguardian.com/ and it will load the article content.
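As a rough illustration, all three link types map to LangChain community loaders; the sketch below is an assumption about equivalent usage, not the application's exact code:

    # Illustrative loader usage for the three supported link types.
    from langchain_community.document_loaders import (
        UnstructuredURLLoader,
        WikipediaLoader,
        YoutubeLoader,
    )

    # YouTube: fetches the video transcript (URL is a placeholder)
    yt_docs = YoutubeLoader.from_youtube_url(
        "https://www.youtube.com/watch?v=VIDEO_ID"
    ).load()

    # Wikipedia: loads the Neo4j Wikipedia page
    wiki_docs = WikipediaLoader(query="Neo4j", load_max_docs=1).load()

    # Generic web page: parsed with Unstructured
    web_docs = UnstructuredURLLoader(urls=["https://theguardian.com/"]).load()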

Cloud Storage

AWS S3

This AWS S3 integration allows you to connect to an S3 bucket and load the files from there. You will need to provide your AWS credentials and the bucket name.
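A sketch of what that can look like with LangChain's S3DirectoryLoader; the bucket name and folder are placeholders, and the credential keyword arguments are an assumption (credentials can also come from the standard AWS environment variables):

    # Sketch: load all files under a prefix from an S3 bucket.
    import os

    from langchain_community.document_loaders import S3DirectoryLoader

    loader = S3DirectoryLoader(
        "my-source-bucket",                                   # placeholder bucket
        prefix="reports/",                                    # optional folder
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    docs = loader.load()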

Google Cloud Storage

This Google Cloud Storage integration allows you to connect to a GCS bucket and load the files from there. You will have to provide your GCS bucket name and optionally a folder, and follow an auth flow to give the application access to the bucket.
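A comparable sketch for GCS, assuming LangChain's GCSDirectoryLoader; project, bucket, and folder are placeholders, and authentication is taken from the active Google credentials in the environment:

    # Sketch: load all files under a folder from a GCS bucket.
    from langchain_community.document_loaders import GCSDirectoryLoader

    loader = GCSDirectoryLoader(
        project_name="my-gcp-project",   # placeholder project
        bucket="my-source-bucket",       # placeholder bucket
        prefix="reports/",               # optional folder
    )
    docs = loader.load()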

LLM Models

The application uses large language models (LLMs) to transform PDFs, web pages, and YouTube video transcripts into a knowledge graph of entities and their relationships. Environment variables can be set to configure, enable, or disable specific models; an illustrative configuration sketch follows the model list below.

The following models are configured (but only the first three are available in the publicly hosted version):

  • OpenAI GPT 3.5 and 4o

  • VertexAI (Gemini 1.0)

  • Diffbot

  • Bedrock

  • Anthropic

  • OpenAI API compatible models like Ollama, Groq, Fireworks

The selected LLM model will be used both for processing the newly uploaded files and for powering the chatbot. Please note that the models have different capabilities, so they will not work equally well, especially for extraction.
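As an illustration of env-based model configuration, the sketch below uses hypothetical variable names; consult the project's example environment file for the actual keys:

    # Hypothetical sketch: the variable names below are illustrative
    # assumptions, not the application's real configuration keys.
    import os

    # Comma-separated list of models offered in the model dropdown
    os.environ["LLM_MODELS"] = "openai_gpt_3.5,openai_gpt_4o,gemini_1.0_pro"

    # Provider credentials are also read from the environment
    os.environ["OPENAI_API_KEY"] = "<your-openai-key>"

    print(os.environ["LLM_MODELS"].split(","))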

Graph Enhancements

Graph Schema

[Image: LLM Graph Builder taxonomy]

If you want to use a pre-defined or your own graph schema, you can do so in the Graph Enhancements popup. This popup is also shown the first time you construct a graph, and the state of the model configuration is listed below the connection information.

You can either:

  • select a pre-defined schema from the dropdown on top,

  • use your own by entering the node labels and relationships,

  • fetch the existing schema from an existing Neo4j database (Use Existing Schema),

  • or copy/paste a text or schema description (this also works with RDF ontologies or Cypher/GQL schemas) and ask the LLM to analyze it and come up with a suggested schema (Get Schema From Text).
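The underlying idea of schema-constrained extraction can be sketched with LangChain's LLMGraphTransformer; the labels and relationship types below are examples, and the application's extraction code may differ:

    # Sketch: constrain entity extraction to a given graph schema.
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI

    transformer = LLMGraphTransformer(
        llm=ChatOpenAI(model="gpt-4o", temperature=0),
        allowed_nodes=["Person", "Company", "Technology"],
        allowed_relationships=["WORKS_AT", "USES", "PARTNERS_WITH"],
    )
    docs = [Document(page_content="Emil Eifrem is the CEO of Neo4j.")]
    graph_docs = transformer.convert_to_graph_documents(docs)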

Delete Disconnected Nodes

When extracting entities, it can happen that a number of nodes end up connected only to text chunks and not to other entities, which results in disconnected entities in the entity graph.

While such nodes can hold relevant information for question answering, they might affect your downstream usage. In this view you can therefore select which of the entities that are only connected to text chunks should be deleted.
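Such nodes can also be inspected directly with Cypher. The sketch below assumes extracted entities carry an __Entity__ label and are linked from chunks via HAS_ENTITY; verify these names against your own graph before relying on them:

    # Sketch: list entities whose only connections are chunk links.
    # Label, relationship, and property names are schema assumptions.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("neo4j://localhost:7687",
                                  auth=("neo4j", "password"))  # placeholders
    query = """
    MATCH (e:`__Entity__`)
    WHERE NOT (e)--(:`__Entity__`)   // no links to other entities
    RETURN e.id AS entity, labels(e) AS labels
    """
    with driver.session() as session:
        for record in session.run(query):
            print(record["entity"], record["labels"])
    driver.close()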

Visualization

You can visualize the lexical graph, the entity graph, or the full knowledge graph of your extracted documents.

Graph Visualization

You have two options: either per document, with the magnifying glass icon at the end of the table row, or for all selected documents, with the "Preview Graph" button.

The graph visualization for the relevant files will be shown in a pop-up, and you can filter for the type of graph you want to see:

  • Lexical Graph - the document and chunk nodes and their relationships

  • Entity Graph - the entity nodes and their relationships

  • Full Graph - all nodes and relationships

Explore in Neo4j Bloom

With the button "Explore in Neo4j Bloom" you can open the constructed knowledge graph in Neo4j Workspace for further visual exploration, querying and analysis.

In Bloom/Explore you can run low-code, pattern-based queries (or use the co-pilot) to fetch data from the graph and successively expand it. If you are running against a GDS-enabled instance, you can also run graph algorithms and visualize the results. You can also interactively edit and add to the graph.

In Neo4j Data Importer you can additionally import structured data from CSV files and connect it to your extracted knowledge graph.

In Neo4j Query you can write Cypher queries (or use the co-pilot) to pull tabular and graph data from your database.

Chatbot

How it works

When the user asks a question, we use the configured RAG mode to answer it with the data from the graph of extracted documents. Depending on the mode, the question may be turned into an embedding, into a graph query, or handled by a more advanced RAG approach.

We also summarize the chat history and use it as an element to enrich the context.

Features

  • Select RAG Mode: Lets you select vector-only or GraphRAG (vector+graph) mode.

  • Chat with selected documents: Will use only the selected documents for RAG, using pre-filtering to achieve that.

  • Details: Will open a Retrieval information pop-up showing details on how the RAG agent collected and used sources (documents), chunks, and entities. Also provides information on the model used and the token consumption.

  • Clear chat: Will delete the current session’s chat history.

  • Expand view: Will open the chatbot interface in a fullscreen mode.

  • Copy: Will copy the content of the response to the clipboard.

  • Text-To-Speech: Will read out loud the content of the response.

GraphRAG

For GraphRAG we use the Neo4j Vector Index (and a fulltext index for hybrid search) with a Retrieval Query to find the most relevant chunks and the entities connected to them, and then follow the entity relationships up to a depth of 2 hops.
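The retrieval can be sketched with LangChain's Neo4jVector; the index names, schema names, and the (simplified, one-hop) retrieval query below are assumptions, and connection details come from the NEO4J_URI/NEO4J_USERNAME/NEO4J_PASSWORD environment variables:

    # Sketch: hybrid search over chunks plus expansion to connected entities.
    from langchain_community.vectorstores import Neo4jVector
    from langchain_community.vectorstores.neo4j_vector import SearchType
    from langchain_openai import OpenAIEmbeddings

    # Simplified to one hop; the real query follows entities up to 2 hops.
    retrieval_query = """
    OPTIONAL MATCH (node)-[:HAS_ENTITY]->(e)
    RETURN node.text AS text, score,
           {entities: collect(DISTINCT e.id)} AS metadata
    """

    store = Neo4jVector.from_existing_index(
        OpenAIEmbeddings(),
        index_name="vector",             # assumed chunk embedding index name
        keyword_index_name="keyword",    # assumed fulltext index name
        search_type=SearchType.HYBRID,
        retrieval_query=retrieval_query,
    )
    docs = store.similarity_search("Who founded Neo4j?", k=4)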

Vector-Only RAG

For vector-only RAG we use only the vector and fulltext index (hybrid) search results and don’t include additional information from the entity graph.
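In code terms, this is the same sketch as above minus the retrieval query (index names again assumed):

    # Sketch: plain hybrid (vector + fulltext) search over chunks.
    from langchain_community.vectorstores import Neo4jVector
    from langchain_community.vectorstores.neo4j_vector import SearchType
    from langchain_openai import OpenAIEmbeddings

    store = Neo4jVector.from_existing_index(
        OpenAIEmbeddings(),
        index_name="vector",
        keyword_index_name="keyword",
        search_type=SearchType.HYBRID,
    )
    docs = store.similarity_search("Who founded Neo4j?", k=4)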

Answer Generation

The various inputs and determined sources (the question, vector results, entities (name + description), relationship pairs, chat history) are all sent to the selected LLM model as context information in a custom prompt, asking to provide and format a response to the question asked based on the elements and context provided.
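As a rough sketch of this assembly step (the function and formatting are hypothetical; the real prompt is more elaborate):

    # Hypothetical sketch of assembling retrieved context into one prompt.
    def build_prompt(question, chunks, entities, relationships, chat_summary):
        context = "\n\n".join([
            "Chat summary:\n" + chat_summary,
            "Relevant chunks:\n" + "\n".join(chunks),
            "Entities:\n" + "\n".join(f"{n}: {d}" for n, d in entities),
            "Relationships:\n" + "\n".join(" -> ".join(p) for p in relationships),
        ])
        return (
            "Answer the question using only the context below. "
            "Cite your sources and say if the answer is not in the context.\n\n"
            f"{context}\n\nQuestion: {question}"
        )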

Of course, there is more magic to the prompt such as formatting, asking to cite sources, not speculating if the answer is not known, etc. The full prompt and instructions can be found in the GitHub repository.