Text2Cypher - natural language queries

Use natural language to generate Cypher® queries in NeoDash. Connect to an LLM through an API, and let NeoDash use your database schema and the report types to generate queries automatically.

How it works

This extension lets you describe the data you want in natural language and have NeoDash translate it into a Cypher query against your Neo4j graph database. It uses a Large Language Model (LLM) to interpret your input and generate the Cypher query based on the provided schema definition.
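
As a rough illustration of the idea, the sketch below shows how a schema, a report type, and a question could be combined into a single prompt for an LLM. This is not NeoDash's actual prompt or internal code: the schema layout, the `describe_schema` and `build_prompt` helpers, and the prompt wording are all assumptions made for the example.

```python
# Hypothetical schema description; NeoDash's internal format may differ.
schema = {
    "nodes": {"Customer": ["name", "email"], "Product": ["title", "price"]},
    "relationships": [("Customer", "PURCHASED", "Product")],
}


def describe_schema(schema: dict) -> str:
    """Render the schema as plain text that can be embedded in a prompt."""
    lines = ["Node labels and properties:"]
    for label, props in schema["nodes"].items():
        lines.append(f"  (:{label}) {{{', '.join(props)}}}")
    lines.append("Relationship types:")
    for start, rel_type, end in schema["relationships"]:
        lines.append(f"  (:{start})-[:{rel_type}]->(:{end})")
    return "\n".join(lines)


def build_prompt(schema: dict, report_type: str, question: str) -> str:
    """Combine schema, report type, and question into one prompt string."""
    return (
        "You translate English questions into Cypher.\n"
        f"{describe_schema(schema)}\n"
        f"The result will be shown in a {report_type} report.\n"
        f"Question: {question}\n"
        "Cypher:"
    )


prompt = build_prompt(schema, "bar chart", "Which customers bought the most products?")
print(prompt)
```

Including the schema in the prompt is what allows the model to stick to labels, relationship types, and properties that actually exist in the database.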

Configuration

To enable natural language queries in NeoDash, follow these configuration steps:

  1. Open NeoDash and navigate to the extensions section in the left sidebar.

  2. Locate the "Text2Cypher" extension and click it to activate it.

  3. Once activated, a new button with a red exclamation mark (⚠️) appears at the top of the screen. Click this button.

  4. In the configuration window, provide the information needed to connect to the LLM: enter the model provider, the API key, and the deployment URL (if the model provider requires one), then select the model to use.

  5. After providing the required information, click the Start Querying button to finalize the configuration.

Configuration settings for the Natural Language Queries extension

Usage

Once the extension is configured, you can start using it in your NeoDash reports:

  1. Open the report settings for the desired report.

  2. In the report settings, you can find a toggle located above the editor. It toggles between Cypher and the English language.

  3. Since you have enabled the extension and authenticated by providing your API key, you can switch to English.

  4. Start formulating your queries in plain English, using natural language expressions to describe the data you want to retrieve.

  5. After composing your query, you have two options for further actions:

    • Translate: Click the Translate button to translate your query to Cypher using the LLM. The translated Cypher query is displayed in the editor when you toggle to the Cypher view, so you can review and modify it before execution.

    • Run: To execute the query directly and view the results, click the Run button in the top right corner. How the query is executed depends on the selected report type, and the results are displayed accordingly.

Example of the English editor in NeoDash

Improving accuracy with custom prompting

To boost the accuracy of the language model, you can provide your own example queries to be fed into the prompt. Specifying queries specific to your data model and use cases can significantly improve the quality of Text2Cypher translations.

To access the model examples screen, open the extension settings. After specifying the provider and model, click the Tweak Prompts button at the bottom left of the window. This opens the example interface:

Custom Examples for your prompt

In this interface, you can specify one or more examples that are sent to the language model. An example consists of a Cypher query and its natural language equivalent. You can create as many examples as you want, but keeping them close to the queries your users actually ask yields the best results.
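
To make the idea of example pairs concrete, here is a minimal sketch of how such pairs could be formatted into a few-shot section of a prompt. The `PromptExample` class, the `format_examples` helper, the sample data model, and the text layout are hypothetical and only illustrate the technique; NeoDash's own prompt format is not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptExample:
    """One custom example: an English question and its Cypher equivalent."""
    question: str
    cypher: str


def format_examples(examples: List[PromptExample]) -> str:
    """Render the examples as a few-shot block to prepend to a prompt."""
    blocks = []
    for number, example in enumerate(examples, start=1):
        blocks.append(
            f"Example {number}\n"
            f"Question: {example.question}\n"
            f"Cypher: {example.cypher}"
        )
    return "\n\n".join(blocks)


# Hypothetical examples for an assumed Customer/Product data model.
examples = [
    PromptExample(
        question="How many customers are there?",
        cypher="MATCH (c:Customer) RETURN count(c) AS customers",
    ),
    PromptExample(
        question="Which products did Alice buy?",
        cypher=(
            "MATCH (:Customer {name: 'Alice'})-[:PURCHASED]->(p:Product) "
            "RETURN p.title"
        ),
    ),
]

few_shot_block = format_examples(examples)
print(few_shot_block)
```

Examples that use your real labels, relationship types, and typical question phrasing give the model patterns it can imitate directly.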

Underlying functionality

  • Retrieve the schema: At the beginning of the interaction, the system retrieves the database schema. This ensures that the generated queries adhere to the schema and its available relationship types and properties.

  • Prompting in English: Once the schema is retrieved, you can start writing your queries in plain English. NeoDash, powered by the LLM, interprets your English query and generates the corresponding Cypher query based on the provided schema.

  • Automatic query generation: NeoDash automatically generates the Cypher queries for you, taking into account the report type you specified. Whether it’s a table, graph, bar chart, line chart, or any other supported report type, the generated queries retrieve the necessary data based on the report requirements.

  • Retry logic: To make query generation more reliable, NeoDash includes retry logic. If an error occurs during query generation, the system retries up to three times in an attempt to produce a valid query.
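
The retry behavior described above can be sketched as a bounded loop. This is an illustration under stated assumptions, not NeoDash's implementation: `generate_with_retry`, the stand-in generator, and the simple validity check are hypothetical, and the sketch assumes that "three times" means three attempts in total.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumption: three attempts in total


def generate_with_retry(
    question: str,
    generate: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Ask for a query until one passes validation or attempts run out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(question, feedback)
        if is_valid(candidate):
            return candidate
        # Pass the failed attempt back so the next try can correct it.
        feedback = f"Attempt {attempt} produced an invalid query: {candidate}"
    return None  # no valid query within the attempt budget


# Stand-in for an LLM call: returns canned answers, the third one valid.
canned = iter([
    "SELECT * FROM customers",
    "MATCH (c:Customer)",
    "MATCH (c:Customer) RETURN c.name",
])


def fake_generate(question: str, feedback: str) -> str:
    return next(canned)


def looks_like_cypher(query: str) -> bool:
    """Very rough check; a real validator would ask the database."""
    upper = query.upper()
    return upper.startswith("MATCH") and "RETURN" in upper


result = generate_with_retry("List customer names", fake_generate, looks_like_cypher)
print(result)
```

Note that a bounded loop can still end without a valid query, which is one reason to review generated queries before relying on them.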

Prompting tips

When using natural language queries in NeoDash, keep the following in mind to enhance your experience:

  1. Be clear and specific in your queries. Provide detailed descriptions of the data you want to retrieve, including node labels, relationship types, and property values.

  2. Use keywords and phrases. Incorporate relevant keywords and phrases that are commonly used in the context of your data to improve query accuracy.

  3. Ask precise questions. Frame your queries as questions to obtain specific information. For example, instead of "Show me all customers," try "Which customers have made a purchase in the last month?"

  4. Experiment with different phrasings. If you’re not getting the desired results, try rephrasing your query using synonyms or alternative expressions.

  5. Avoid ambiguous queries. Ambiguous or vague queries may yield unexpected results. Make sure to provide sufficient context and clarify any ambiguities.

  6. Validate and review generated queries. Always review the generated Cypher queries to ensure they accurately represent your intent and produce the expected results.

Important considerations

When using natural language queries with LLMs, it’s important to be aware of the following:

  1. Multiple model providers. Depending on your configuration, your queries may be processed by different model providers. Keep in mind that this means your data is sent to whichever providers you configure.

  2. Non-deterministic nature. LLMs can produce non-deterministic outputs. The generated queries may vary between different runs, even with the same input prompt. Validate the generated queries and perform thorough testing to ensure correctness.

  3. Potential hallucination. LLMs can generate outputs that may not align with the specific schema or data constraints. Exercise caution and verify the results to prevent potential inaccuracies or hallucinations.
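
One simple way to catch some hallucinations is to compare the node labels used in a generated query against the labels known to exist in the schema. The sketch below is a naive, regex-based illustration: `find_unknown_labels` is a hypothetical helper, it only sees the first label of each node pattern, and it can be fooled by string literals, so it complements rather than replaces manual review.

```python
import re
from typing import Set


def find_unknown_labels(cypher: str, known_labels: Set[str]) -> Set[str]:
    """Return node labels used in the query that are not in the schema."""
    # Matches the first label in node patterns such as (c:Customer) or (:Product).
    used = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    return used - known_labels


# Hypothetical schema labels and a generated query that invents an Order label.
known = {"Customer", "Product"}
generated = "MATCH (c:Customer)-[:PURCHASED]->(o:Order) RETURN c.name, o.total"

unknown = find_unknown_labels(generated, known)
print(unknown)
```

A non-empty result suggests the model referenced something outside your schema, so the query should be rephrased or corrected before it is run.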