Text Functions

See the Cypher Manual for built-in Cypher String Functions and Operators.

Comparing strings using the Levenshtein distance

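Compare two STRING values by their Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. apoc.text.distance(text1, text2) uses the StringUtils.distance(text1, text2) method.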

RETURN apoc.text.distance("Levenshtein", "Levenstein") // 1
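For reference, here is a minimal Python sketch of the distance that apoc.text.distance reports. It is the textbook dynamic-programming algorithm, not APOC's own code, and the function name `levenshtein` is hypothetical. Like the example above, it is case-sensitive.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # At the start of row i, previous[j] is the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Levenshtein", "Levenstein"))  # 1
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.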

Comparing strings using the Sørensen–Dice coefficient

Compute the similarity assuming Locale.ENGLISH
RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") // 0.5
Compute the similarity with an explicit locale
RETURN apoc.text.sorensenDiceSimilarity("halım", "halim", "tr-TR") // 0.5
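The coefficient compares the sets of adjacent character pairs (bigrams) of the two strings: 2 × |common bigrams| / (|bigrams of text1| + |bigrams of text2|). The Python sketch below is an illustration under assumptions, not APOC's implementation. It uses bigram sets rather than multisets, it lowercases with Python's locale-independent `str.lower()` (so the Turkish dotted/dotless i rules that a "tr-TR" locale enables are not modelled), and the return value for strings too short to have bigrams is a convention chosen for this sketch. The names `bigrams` and `sorensen_dice` are hypothetical.

```python
def bigrams(text: str) -> set:
    """Set of adjacent character pairs, e.g. 'belly' -> {'be', 'el', 'll', 'ly'}."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def sorensen_dice(a: str, b: str) -> float:
    """2 * |common bigrams| / (|bigrams(a)| + |bigrams(b)|), after lowercasing."""
    x, y = bigrams(a.lower()), bigrams(b.lower())
    if not x and not y:
        # Neither string has a bigram: fall back to plain equality (sketch convention).
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


print(sorensen_dice("belly", "jolly"))  # 0.5  ('ll' and 'ly' are shared)
print(sorensen_dice("halım", "halim"))  # 0.5  ('ha' and 'al' are shared)
```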

Checking whether two words match fuzzily with fuzzyMatch

fuzzyMatch compares two STRING values by Levenshtein distance and allows more edits for longer input. The number of edits allowed depends on the length of the first STRING: 0 if it is shorter than 3 characters, 1 if it is shorter than 5 characters, and 2 otherwise.

RETURN apoc.text.fuzzyMatch("The", "the") // true
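The length-dependent threshold can be sketched in Python as follows. This assumes, as described above, that the limit is taken from the length of the first string and compared against a plain, case-sensitive Levenshtein distance. The names `max_edits` and `fuzzy_match` are hypothetical, and the code is an illustration rather than APOC's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def max_edits(term: str) -> int:
    """Edits allowed: 0 below 3 characters, 1 below 5, otherwise 2."""
    if len(term) < 3:
        return 0
    if len(term) < 5:
        return 1
    return 2


def fuzzy_match(text1: str, text2: str) -> bool:
    return levenshtein(text1, text2) <= max_edits(text1)


print(fuzzy_match("The", "the"))  # True: one edit, and 3 characters allow one
print(fuzzy_match("ab", "Ab"))    # False: one edit, but 2 characters allow none
```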

Phonetic Comparison Functions

The phonetic text (soundex) functions compute the soundex encoding of a given string. There is also a procedure that compares how similar two strings sound under the soundex algorithm. All soundex functions and procedures assume US English by default.

apoc.text.phonetic(text STRING)

Returns the US_ENGLISH phonetic soundex encoding of all words of the STRING.

apoc.text.doubleMetaphone(value STRING)

Returns the double metaphone phonetic encoding of all words in the given STRING value.

apoc.text.clean(text STRING)

Strips the given STRING of everything except alphanumeric characters and converts it to lower case.

apoc.text.compareCleaned(text1 STRING, text2 STRING)

Compares two given STRING values after stripping everything except alphanumeric characters and converting them to lower case.

Table 1. Procedure

apoc.text.phoneticDelta(text1 STRING, text2 STRING)

Returns the US_ENGLISH soundex character difference between the two given STRING values.

// will return 'H436'
RETURN apoc.text.phonetic('Hello, dear User!')
// the delta column will be 4 (very similar)
CALL apoc.text.phoneticDelta('Hello Mr Rabbit', 'Hello Mr Ribbit')
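To show how these codes arise, below is a Python sketch of the American Soundex rules. It keeps the first letter, maps the remaining consonants to digits, merges adjacent letters with the same digit even when H or W sits between them, and pads or truncates to four characters. The delta is read here as the number of positions (0 to 4) at which the two codes agree. That is the convention of Apache Commons Codec's `Soundex.difference`, but treat its correspondence to APOC as an assumption. When the whole string is encoded with non-letters ignored, this sketch reproduces both results shown above. The names `soundex` and `soundex_difference` are hypothetical.

```python
# Soundex digit for each consonant; vowels, Y, H and W have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(text: str) -> str:
    """American Soundex over all ASCII letters in text, ignoring everything else."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
            if len(result) == 4:
                break
        previous = code  # a vowel (no digit) resets, so a repeated digit counts again
    return result.ljust(4, "0")


def soundex_difference(text1: str, text2: str) -> int:
    """Number of positions (0-4) at which the two Soundex codes agree."""
    return sum(a == b for a, b in zip(soundex(text1), soundex(text2)))


print(soundex("Hello, dear User!"))                              # H436
print(soundex_difference("Hello Mr Rabbit", "Hello Mr Ribbit"))  # 4
```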

Formatting Text

Format the given STRING with the given parameters and an optional language parameter.

Without a language parameter ('en' is the default)
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd', 42, 3.14, true]) AS value // abcd 42 3.1 true
With a language parameter
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd', 42, 3.14, true],'it') AS value // abcd 42 3,1 true

The indexOf function returns the index of the first occurrence of the given lookup string within the text, or -1 if it is not found. It optionally takes from (inclusive) and to (exclusive) parameters.

RETURN apoc.text.indexOf('Hello World!', 'World') // 6

The indexesOf function returns the indexes of all occurrences of the given lookup string within the text, or an empty list if there are none. It optionally takes from (inclusive) and to (exclusive) parameters.

RETURN apoc.text.indexesOf('Hello World!', 'o',2,9) // [4,7]
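The from/to window can be mirrored with Python's `str.find`, which accepts the same inclusive start and exclusive end. The sketch below is illustrative only. It counts a match only if it lies entirely inside the window, and `indexes_of` also reports overlapping matches. Both are assumptions about APOC's exact behaviour, and the names `index_of` and `indexes_of` are hypothetical.

```python
def index_of(text, lookup, start=0, end=None):
    """First index of lookup inside text[start:end], or -1 if absent."""
    end = len(text) if end is None else end
    return text.find(lookup, start, end)


def indexes_of(text, lookup, start=0, end=None):
    """All indexes of lookup inside text[start:end]; empty list if absent."""
    end = len(text) if end is None else end
    found = []
    i = text.find(lookup, start, end)
    while i != -1:
        found.append(i)
        i = text.find(lookup, i + 1, end)  # resume one past the previous hit
    return found


print(index_of("Hello World!", "World"))          # 6
print(indexes_of("Hello World!", "o", 2, 9))      # [4, 7]
```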

To get a substring starting from the index match:

Returns 'World!'
WITH 'Hello World!' as text
WITH text, size(text) as len, apoc.text.indexOf(text, 'World', 3) as index
RETURN substring(text, case index when -1 then len-1 else index end, len);

Regular Expressions

Returns 'HelloWorld'
RETURN apoc.text.replace('Hello World!', '[^a-zA-Z]', '')
Returns matches found by the given regex pattern
RETURN apoc.text.regexGroups('abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>','<link (\\w+)>(\\w+)</link>') AS result

// [["<link xxx1>yyy1</link>", "xxx1", "yyy1"], ["<link xxx2>yyy2</link>", "xxx2", "yyy2"]]
Returns the matches found by the given regex pattern, keyed by their group names
RETURN apoc.text.regexGroupsByName(
  'abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>',
  '<link (?<firstPart>\\w+)>(?<secondPart>\\w+)</link>'
) AS output;

// [{"group": "<link xxx1>yyy1</link>", "matches": {"firstPart": "xxx1", "secondPart": "yyy1"}}, {"group": "<link xxx2>yyy2</link>", "matches": {"firstPart": "xxx2", "secondPart": "yyy2"}}]
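For comparison, the same result shapes can be produced with Python's `re` module. This is only an analogy to show what each function returns per match, not APOC code, and `regex_groups` and `regex_groups_by_name` are hypothetical names. Note two differences from the Cypher strings above. A Python raw string needs `\w` where the Cypher literal needs `\\w`, and Python spells named groups `(?P<name>...)` rather than Java's `(?<name>...)`.

```python
import re

TEXT = "abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>"


def regex_groups(text, pattern):
    """One list per match: the full match followed by every capture group."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, text)]


def regex_groups_by_name(text, pattern):
    """One map per match: the full match plus a dict of the named groups."""
    return [{"group": m.group(0), "matches": m.groupdict()}
            for m in re.finditer(pattern, text)]


print(regex_groups(TEXT, r"<link (\w+)>(\w+)</link>"))
print(regex_groups_by_name(TEXT, r"<link (?P<firstPart>\w+)>(?P<secondPart>\w+)</link>"))
```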

Split and Join

Splits with the given regular expression; returns ['Hello', 'World']
RETURN apoc.text.split('Hello   World', ' +')
Returns 'Hello World'
RETURN apoc.text.join(['Hello', 'World'], ' ')

Data Cleaning

Returns 'helloworld'
RETURN apoc.text.clean('Hello World!')
Returns true
RETURN apoc.text.compareCleaned('Hello World!', '_hello-world_')
Returns only 'Hello World!'
UNWIND ['Hello World!', 'hello worlds'] as text
WITH text WHERE apoc.text.filterCleanMatches(text, 'hello_world')
RETURN text

The clean functionality can be useful for cleaning up slightly dirty text data with inconsistent formatting for non-exact comparisons.

Cleaning will strip the string of all non-alphanumeric characters (including spaces) and convert it to lower case.
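The cleaning rule stated above can be sketched in a few lines of Python. This follows only the description in this section. Python's `str.isalnum()` also keeps non-ASCII letters and digits such as "é", and how APOC treats accents and other Unicode characters is not covered here. The names `clean` and `compare_cleaned` are hypothetical.

```python
def clean(text: str) -> str:
    """Keep only alphanumeric characters, then lower-case the result."""
    return "".join(ch for ch in text if ch.isalnum()).lower()


def compare_cleaned(text1: str, text2: str) -> bool:
    """True when both strings are equal after cleaning."""
    return clean(text1) == clean(text2)


texts = ["Hello World!", "hello worlds"]
print(clean("Hello World!"))                                # helloworld
print([t for t in texts if compare_cleaned(t, "hello_world")])  # ['Hello World!']
```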

Case Change Functions

Capitalize the first letter of the string with capitalize
RETURN apoc.text.capitalize("neo4j") // "Neo4j"
Capitalize the first letter of every word in the text with capitalizeAll
RETURN apoc.text.capitalizeAll("graph database") // "Graph Database"
Decapitalize the first letter of the string with decapitalize
RETURN apoc.text.decapitalize("Graph Database") // "graph Database"
Decapitalize the first letter of all words with decapitalizeAll
RETURN apoc.text.decapitalizeAll("Graph Databases") // "graph databases"
Swap the case of a string with swapCase
RETURN apoc.text.swapCase("Neo4j") // nEO4J
Convert a string to lower camelCase with camelCase
RETURN apoc.text.camelCase("FOO_BAR");    // "fooBar"
RETURN apoc.text.camelCase("Foo bar");    // "fooBar"
RETURN apoc.text.camelCase("Foo22 bar");  // "foo22Bar"
RETURN apoc.text.camelCase("foo-bar");    // "fooBar"
RETURN apoc.text.camelCase("Foobar");     // "foobar"
RETURN apoc.text.camelCase("Foo$$Bar");   // "fooBar"
Convert a string to UpperCamelCase with upperCamelCase
RETURN apoc.text.upperCamelCase("FOO_BAR");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foo bar");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foo22 bar"); // "Foo22Bar"
RETURN apoc.text.upperCamelCase("foo-bar");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foobar");    // "Foobar"
RETURN apoc.text.upperCamelCase("Foo$$Bar");  // "FooBar"
Convert a string to snake-case (lower-case words joined by hyphens) with snakeCase
RETURN apoc.text.snakeCase("test Snake Case"); // "test-snake-case"
RETURN apoc.text.snakeCase("FOO_BAR");         // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar");         // "foo-bar"
RETURN apoc.text.snakeCase("fooBar");          // "foo-bar"
RETURN apoc.text.snakeCase("foo-bar");         // "foo-bar"
RETURN apoc.text.snakeCase("Foo  bar");        // "foo-bar"
Convert a string to UPPER_CASE with toUpperCase
RETURN apoc.text.toUpperCase("test upper case"); // "TEST_UPPER_CASE"
RETURN apoc.text.toUpperCase("FooBar");          // "FOO_BAR"
RETURN apoc.text.toUpperCase("fooBar");          // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo-bar");         // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo--bar");        // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo$$bar");        // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo 22 bar");      // "FOO_22_BAR"
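All four conversions above can be read as "split into words, then re-join". The Python sketch below makes that concrete under one assumption: words are separated by runs of non-alphanumeric characters and by a lower-case letter followed by an upper-case one, while letter-to-digit changes do not split (so "Foo22" stays one word). This tokenizer was chosen to reproduce the listed examples and is not APOC's implementation. All function names are hypothetical.

```python
import re


def words(text: str) -> list:
    """Split on runs of non-alphanumerics and on lower-to-upper case changes."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [w for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def camel_case(text: str) -> str:
    parts = words(text)
    if not parts:
        return ""
    return parts[0].lower() + "".join(p.capitalize() for p in parts[1:])


def upper_camel_case(text: str) -> str:
    return "".join(p.capitalize() for p in words(text))


def snake_case(text: str) -> str:
    # Hyphen-separated, matching the apoc.text.snakeCase examples above.
    return "-".join(p.lower() for p in words(text))


def upper_case(text: str) -> str:
    return "_".join(p.upper() for p in words(text))


print(camel_case("Foo22 bar"))   # foo22Bar
print(snake_case("fooBar"))      # foo-bar
print(upper_case("foo 22 bar"))  # FOO_22_BAR
```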

Base64 De- and Encoding

Encode or decode a string in base64 or base64Url

Encode base 64
RETURN apoc.text.base64Encode("neo4j") // bmVvNGo=
Decode base 64
RETURN apoc.text.base64Decode("bmVvNGo=") // neo4j
Encode base 64 URL
RETURN apoc.text.base64UrlEncode("http://neo4j.com/?test=test") // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
Decode base 64 URL
RETURN apoc.text.base64UrlDecode("aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0") // http://neo4j.com/?test=test
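The two variants differ only in their alphabet. The URL-safe form uses "-" and "_" where standard Base64 uses "+" and "/", which is why the "/?" in the URL above encodes to "...S8_..." rather than "...S8/...". The Python `base64` module reproduces all four example values, assuming the text is encoded as UTF-8. Python's URL-safe encoder keeps "=" padding. The example input needs none, so whether APOC strips padding is not shown here. The helper names are hypothetical.

```python
import base64


def b64_encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def b64_decode(data: str) -> str:
    return base64.b64decode(data).decode("utf-8")


def b64url_encode(text: str) -> str:
    # URL-safe alphabet: '-' and '_' replace '+' and '/'.
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def b64url_decode(data: str) -> str:
    return base64.urlsafe_b64decode(data).decode("utf-8")


print(b64_encode("neo4j"))                           # bmVvNGo=
print(b64url_encode("http://neo4j.com/?test=test"))  # aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
```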

Random String

You can generate a random string of a specified length by calling apoc.text.random with a length parameter and an optional string of valid characters.

The valid parameter accepts the following regex-style ranges. Alternatively, you can provide a string of the allowed letters and/or characters.

Pattern

Description

A-Z

A-Z in uppercase

a-z

a-z in lowercase

0-9

Numbers 0-9 inclusive

The following call returns a random string made up of uppercase letters, digits, and the . and $ characters.
RETURN apoc.text.random(10, "A-Z0-9.$")
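As an illustration of how a valid argument such as "A-Z0-9.$" turns into an alphabet, here is a hypothetical Python sketch. It expands the three range tokens from the table, keeps every other character literally, and draws characters uniformly with `random.choice`. The default alphabet and the token handling are assumptions, not APOC's documented behaviour. The sketch is also not suitable for secrets; use the `secrets` module for that.

```python
import random
import string

# Range tokens from the table; every other character in `valid` is taken literally.
RANGES = {"A-Z": string.ascii_uppercase, "a-z": string.ascii_lowercase, "0-9": string.digits}


def expand_valid(valid: str) -> str:
    """Replace each range token with its characters, dropping duplicates."""
    alphabet = valid
    for token, chars in RANGES.items():
        alphabet = alphabet.replace(token, chars)
    return "".join(dict.fromkeys(alphabet))  # de-duplicate, keep first-seen order


def random_string(length: int, valid: str = "A-Za-z0-9") -> str:
    # The default alphabet above is an assumption made for this sketch.
    alphabet = expand_valid(valid)
    return "".join(random.choice(alphabet) for _ in range(length))


print(random_string(10, "A-Z0-9.$"))  # output varies on every call
```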

Hashing Functions

apoc.util.sha1([values])

Computes the SHA-1 hash of the concatenation of all STRING values in the list.

apoc.util.md5([values])

Computes the MD5 hash of the concatenation of all STRING values in the list.
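Because both functions hash the concatenation, ['a', 'bc'] and ['ab', 'c'] produce the same digest as 'abc'. The Python sketch below shows this with `hashlib`. It assumes the values are already strings, that they are encoded as UTF-8, and that the result is a lower-case hex string. These are assumptions about apoc.util.sha1/md5, and `sha1_of`/`md5_of` are hypothetical names.

```python
import hashlib


def sha1_of(values) -> str:
    """SHA-1 hex digest of the concatenated strings."""
    return hashlib.sha1("".join(values).encode("utf-8")).hexdigest()


def md5_of(values) -> str:
    """MD5 hex digest of the concatenated strings."""
    return hashlib.md5("".join(values).encode("utf-8")).hexdigest()


print(sha1_of(["a", "bc"]))  # a9993e364706816aba3e25717850c26c9cd0d89d, same as sha1("abc")
print(md5_of(["ab", "c"]))   # 900150983cd24fb0d6963f7d28e17f72, same as md5("abc")
```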