Text Functions
See the Cypher Manual for built-in Cypher String Functions and Operators.
Comparing strings using the Levenshtein distance
Compare the given STRING values with the StringUtils.distance(text1, text2) method (Levenshtein).
RETURN apoc.text.distance("Levenshtein", "Levenstein") // 1
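As a further illustration, the sketch below ranks a few candidate spellings by their edit distance to a target. The candidate list is made up for this example; only apoc.text.distance comes from the text above.

```cypher
// Classic textbook pair: substitute k->s, substitute e->i, append g.
RETURN apoc.text.distance("kitten", "sitting") AS distance; // 3

// Rank hypothetical candidate spellings by closeness to a target string.
WITH "Levenshtein" AS target
UNWIND ["Levenstein", "Levenshtien", "Lowenstein"] AS candidate
RETURN candidate, apoc.text.distance(target, candidate) AS distance
ORDER BY distance;
```

A lower distance means fewer single-character insertions, deletions, or substitutions are needed to turn one string into the other.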
Comparing the given strings using the Sørensen–Dice coefficient formula.
RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") // 0.5
RETURN apoc.text.sorensenDiceSimilarity("halım", "halim", "tr-TR") // 0.5
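The 0.5 result for "belly" and "jolly" can be checked by hand. The comments below assume the usual formulation of the coefficient over adjacent character pairs (bigrams), 2 * |shared| / (|X| + |Y|); that is an assumption about how the function tokenizes its input, not something stated on this page.

```cypher
// Bigrams of "belly": be, el, ll, ly   (4 bigrams)
// Bigrams of "jolly": jo, ol, ll, ly   (4 bigrams)
// Shared bigrams:     ll, ly           (2 bigrams)
// Coefficient:        2 * 2 / (4 + 4) = 0.5
RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") AS similarity;
```

Identical strings score 1.0 under this formula, and strings with no bigram in common score 0.0.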
Check if 2 words can be matched in a fuzzy way with fuzzyMatch
Depending on the length of the given STRING, fuzzyMatch allows a larger number of edits (Levenshtein distance) when matching it against the second STRING: no edits for a length below 3, one edit for a length below 5, and two edits otherwise.
RETURN apoc.text.fuzzyMatch("The", "the") // true
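To see the length thresholds in action, the following sketch runs three illustrative pairs (chosen for this example) through both apoc.text.distance and apoc.text.fuzzyMatch. Each pair uses strings of equal length, so it does not matter which argument's length the rule is applied to.

```cypher
// Allowed vs. needed edits under the rule described above:
//   "hi" / "ho"        length 2 -> 0 edits allowed, 1 needed
//   "The" / "the"      length 3 -> 1 edit allowed,  1 needed
//   "graph" / "grape"  length 5 -> 2 edits allowed, 1 needed
UNWIND [["hi", "ho"], ["The", "the"], ["graph", "grape"]] AS pair
RETURN pair[0] AS text1,
       pair[1] AS text2,
       apoc.text.distance(pair[0], pair[1]) AS distance,
       apoc.text.fuzzyMatch(pair[0], pair[1]) AS matches;
```

If the rule is applied as described, the first pair should not match and the other two should.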
Phonetic Comparison Functions
The phonetic text (soundex) functions allow you to compute the soundex encoding of a given string. There is also a procedure to compare how similar two strings sound under the soundex algorithm. All soundex procedures by default assume the used language is US English.
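Function | Description
apoc.text.phonetic | Returns the US_ENGLISH phonetic soundex encoding of all words of the STRING.
apoc.text.doubleMetaphone | Returns the double metaphone phonetic encoding of all words in the given STRING.
apoc.text.clean | Strips the given STRING of all non-alphanumeric characters and converts it to lower case.
apoc.text.compareCleaned | Compares two given STRING values after cleaning them.
apoc.text.phoneticDelta | Returns the US_ENGLISH soundex character difference between the two given STRING values.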
// will return 'H436'
RETURN apoc.text.phonetic('Hello, dear User!')
// will return '4' (very similar)
RETURN apoc.text.phoneticDelta('Hello Mr Rabbit', 'Hello Mr Ribbit')
Formatting Text
Format the given STRING with the given parameters and an optional language parameter.
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd', 42, 3.14, true]) AS value // abcd 42 3.1 true
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd', 42, 3.14, true],'it') AS value // abcd 42 3,1 true
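The format specifiers in the examples above look like Java's String.format syntax, so width, padding, and precision flags should work the same way. The sketch below assumes that correspondence; the values and the 'de' language code are illustrative.

```cypher
// %05d  zero-pads the integer to width 5
// %-8s  left-aligns the string in a field of width 8
// %.2f  rounds the float to two decimal places
RETURN apoc.text.format('%05d|%-8s|%.2f', [42, 'neo4j', 3.14159]) AS value; // 00042|neo4j   |3.14

// Same pattern with a German locale, which uses a decimal comma.
RETURN apoc.text.format('%05d|%-8s|%.2f', [42, 'neo4j', 3.14159], 'de') AS value; // 00042|neo4j   |3,14
```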
String Search
The indexOf function returns the index of the first occurrence of the given lookup string within the text, or -1 if it is not found.
It can optionally take from (inclusive) and to (exclusive) parameters.
RETURN apoc.text.indexOf('Hello World!', 'World') // 6
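A short sketch of how the optional from and to bounds change the result, following the inclusive/exclusive semantics described above. In 'Hello World!' the letter 'o' sits at indexes 4 and 7.

```cypher
RETURN apoc.text.indexOf('Hello World!', 'o')       AS firstMatch,   // 4
       apoc.text.indexOf('Hello World!', 'o', 5)    AS fromIndex5,   // 7
       apoc.text.indexOf('Hello World!', 'o', 0, 4) AS beforeIndex4; // -1
```

The last call finds nothing because the to bound is exclusive, so index 4 is outside the searched range.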
The indexesOf function returns the indexes of all occurrences of the given lookup string within the text, or an empty list if it is not found.
It can optionally take from (inclusive) and to (exclusive) parameters.
RETURN apoc.text.indexesOf('Hello World!', 'o',2,9) // [4,7]
To get a substring starting from the index match:
WITH 'Hello World!' AS text
WITH text, size(text) AS len, apoc.text.indexOf(text, 'World', 3) AS index
// Fall back to the last character when there is no match (index = -1).
RETURN substring(text, CASE index WHEN -1 THEN len - 1 ELSE index END, len); // World!
Regular Expressions
RETURN apoc.text.replace('Hello World!', '[^a-zA-Z]', '')
RETURN apoc.text.regexGroups('abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>','<link (\\w+)>(\\w+)</link>') AS result
// [["<link xxx1>yyy1</link>", "xxx1", "yyy1"], ["<link xxx2>yyy2</link>", "xxx2", "yyy2"]]
RETURN apoc.text.regexGroupsByName(
'abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>',
'<link (?<firstPart>\\w+)>(?<secondPart>\\w+)</link>'
) AS output;
// [{"group": "<link xxx1>yyy1</link>", "matches": {"firstPart": "xxx1", "secondPart": "yyy1"}}, {"group": "<link xxx2>yyy2</link>", "matches": {"firstPart": "xxx2", "secondPart": "yyy2"}}]
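Because regexGroups returns one list per match, with the full match first and the capture groups after it, the result can be unwound into rows. The sketch below uses a made-up sentence containing two dates to show the pattern.

```cypher
// Each match is a list: [fullMatch, group1, group2, group3].
WITH apoc.text.regexGroups(
       'released 2023-01-15, patched 2024-12-31',
       '(\\d{4})-(\\d{2})-(\\d{2})'
     ) AS matches
UNWIND matches AS m
RETURN m[0]            AS fullMatch,
       toInteger(m[1]) AS year,
       toInteger(m[2]) AS month,
       toInteger(m[3]) AS day;
```

This should yield one row per date, with the captured parts converted to integers.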
Split and Join
RETURN apoc.text.split('Hello World', ' +')
RETURN apoc.text.join(['Hello', 'World'], ' ')
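Since the second argument of split is a regular expression, the two functions can be combined to normalize runs of whitespace. A minimal sketch with an illustrative input:

```cypher
// Split on one or more spaces, then rejoin with a single space.
WITH 'Hello   World  again' AS messy
RETURN apoc.text.join(apoc.text.split(messy, ' +'), ' ') AS normalized; // "Hello World again"
```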
Data Cleaning
RETURN apoc.text.clean('Hello World!')
RETURN apoc.text.compareCleaned('Hello World!', '_hello-world_') // true
UNWIND ['Hello World!', 'hello worlds'] as text
RETURN apoc.text.filterCleanMatches(text, 'hello_world') as text
The clean functionality can be useful for cleaning up slightly dirty text data with inconsistent formatting for non-exact comparisons.
Cleaning will strip the string of all non-alphanumeric characters (including spaces) and convert it to lower case.
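One way to use the cleaned form is as a grouping key for spotting near-duplicate values. The sketch below follows the cleaning rule just described (strip non-alphanumerics, lower-case); the input list is invented for illustration.

```cypher
// 'Neo4j', 'neo-4j' and 'NEO 4J' all clean to 'neo4j'; 'Neo5j' cleans to 'neo5j'.
UNWIND ['Neo4j', 'neo-4j', 'NEO 4J', 'Neo5j'] AS raw
RETURN apoc.text.clean(raw) AS cleaned, collect(raw) AS variants;
```

Values that land in the same group differ only in case, spacing, or punctuation.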
Case Change Functions
capitalize
RETURN apoc.text.capitalize("neo4j") // "Neo4j"
capitalizeAll
RETURN apoc.text.capitalizeAll("graph database") // "Graph Database"
decapitalize
RETURN apoc.text.decapitalize("Graph Database") // "graph Database"
decapitalizeAll
RETURN apoc.text.decapitalizeAll("Graph Databases") // "graph databases"
swapCase
RETURN apoc.text.swapCase("Neo4j") // nEO4J
camelCase
RETURN apoc.text.camelCase("FOO_BAR"); // "fooBar"
RETURN apoc.text.camelCase("Foo bar"); // "fooBar"
RETURN apoc.text.camelCase("Foo22 bar"); // "foo22Bar"
RETURN apoc.text.camelCase("foo-bar"); // "fooBar"
RETURN apoc.text.camelCase("Foobar"); // "foobar"
RETURN apoc.text.camelCase("Foo$$Bar"); // "fooBar"
upperCamelCase
RETURN apoc.text.upperCamelCase("FOO_BAR"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foo bar"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foo22 bar"); // "Foo22Bar"
RETURN apoc.text.upperCamelCase("foo-bar"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foobar"); // "Foobar"
RETURN apoc.text.upperCamelCase("Foo$$Bar"); // "FooBar"
snakeCase
RETURN apoc.text.snakeCase("test Snake Case"); // "test-snake-case"
RETURN apoc.text.snakeCase("FOO_BAR"); // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar"); // "foo-bar"
RETURN apoc.text.snakeCase("fooBar"); // "foo-bar"
RETURN apoc.text.snakeCase("foo-bar"); // "foo-bar"
toUpperCase
RETURN apoc.text.toUpperCase("test upper case"); // "TEST_UPPER_CASE"
RETURN apoc.text.toUpperCase("FooBar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("fooBar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo-bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo--bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo$$bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo 22 bar"); // "FOO_22_BAR"
Base64 De- and Encoding
Encode or decode a string in base64 or base64Url.
RETURN apoc.text.base64Encode("neo4j") // bmVvNGo=
RETURN apoc.text.base64Decode("bmVvNGo=") // neo4j
RETURN apoc.text.base64UrlEncode("http://neo4j.com/?test=test") // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
RETURN apoc.text.base64UrlDecode("aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0") // http://neo4j.com/?test=test
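The two encodings differ only in the characters used for the last two alphabet positions: standard base64 uses + and /, while the URL-safe variant uses - and _. Encoding the same URL both ways makes the difference visible.

```cypher
WITH "http://neo4j.com/?test=test" AS url
RETURN apoc.text.base64Encode(url)    AS standard, // aHR0cDovL25lbzRqLmNvbS8/dGVzdD10ZXN0
       apoc.text.base64UrlEncode(url) AS urlSafe;  // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
```

The URL-safe form is the better choice when the encoded value ends up in a path or query string, where / would otherwise need escaping.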
Random String
You can generate a random string of a specified length by calling apoc.text.random with a length parameter and an optional string of valid characters.
The valid parameter accepts the following regex patterns; alternatively, you can provide a string of letters and/or characters.
Pattern | Description
A-Z | A-Z in uppercase
a-z | A-Z in lowercase
0-9 | Numbers 0-9 inclusive
The following call returns a random string made up of uppercase letters, numbers, and the . and $ characters.
RETURN apoc.text.random(10, "A-Z0-9.$")
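Because the output is random, it is easier to check its shape than its value. The sketch below generates a token from a restricted alphabet and verifies the length and character set; the alphabet and length are illustrative choices.

```cypher
// Generate a 12-character token from lowercase letters and digits,
// then confirm its length and that every character is in the allowed set.
WITH apoc.text.random(12, "a-z0-9") AS token
RETURN token,
       size(token) AS length,
       token =~ '[a-z0-9]{12}' AS onlyAllowedChars;
```

Each run should return a different token, with length 12 and onlyAllowedChars true.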