Cypher has some basic functions and operators to work with text, such as:
split(string, delim)
toLower and toUpper
+ (concatenation)
CONTAINS, STARTS WITH, ENDS WITH
regular expression matches via =~
A lot of useful functions for string manipulation, comparison, filtering are missing though. APOC tries to add many of them.
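As a quick illustration of the built-in functions and operators listed above, the following sketch applies each of them to a literal string. The alias names are arbitrary and chosen only for this example.

```cypher
// Built-in Cypher text handling only; no APOC required.
WITH 'Hello World' AS text
RETURN split(text, ' ')           AS parts,           // ["Hello", "World"]
       toLower(text)              AS lowered,         // "hello world"
       toUpper(text)              AS uppered,         // "HELLO WORLD"
       text + '!'                 AS concatenated,    // "Hello World!"
       text CONTAINS 'World'      AS hasWorld,        // true
       text STARTS WITH 'Hello'   AS startsWithHello, // true
       text ENDS WITH 'World'     AS endsWithWorld,   // true
       text =~ 'Hello.*'          AS matchesRegex     // true (=~ must match the whole string)
```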
|
find the first occurrence of the lookup string in the text, from inclusive, to exclusive; returns -1 if not found, null if text is null. |
|
find all occurrences of the lookup string in the text and return them as a list, from inclusive, to exclusive; returns an empty list if not found, null if text is null. |
|
replace each substring of the given string that matches the given regular expression with the given replacement. |
|
returns an array containing a nested array for each match. The inner array contains all match groups. |
|
join the given strings with the given delimiter. |
|
format the string sprintf-style with the given params and an optional language param (default value is 'en'). |
|
left pad the string to the given width |
|
right pad the string to the given width |
|
returns a random string of the specified length |
|
capitalise the first letter of the word |
|
capitalise the first letter of every word in the text |
|
decapitalize the first letter of the word |
|
decapitalize the first letter of all words |
|
Swap the case of a string |
|
Convert a string to camelCase |
|
Convert a string to UpperCamelCase |
|
Convert a string to snake-case |
|
Convert a string to UPPER_CASE |
|
Returns the decimal value of the character at the given index |
|
Returns the unicode character of the given codepoint |
|
Returns the hex value string of the character at the given index |
|
Returns the hex value string of the given value |
|
return size of text in bytes |
|
apoc.text.toCypher(value, {skipKeys,keepKeys,skipValues,keepValues,skipNull,node,relationship,start,end}) |
|
apoc.text.base64Encode(text) - Encode a string with Base64 |
|
apoc.text.base64UrlEncode(url) - Encode a url with Base64 |
The replace, split and regexGroups functions work with regular expressions.
|
turn URL into map structure |
|
extract the personal name, user and domain as a map (needs javax.mail jar) |
|
deprecated: returns the domain part of the value |
|
compare the given strings with the Levenshtein distance algorithm |
|
compare the given strings with the Levenshtein distance algorithm |
|
calculate the similarity (a value between 0 and 1) between two texts based on Levenshtein distance. |
|
compare the given strings with the Hamming distance algorithm |
|
compare the given strings with the Jaro-Winkler distance algorithm |
|
compare the given strings with the Sørensen–Dice coefficient formula, assuming an English locale |
|
compare the given strings with the Sørensen–Dice coefficient formula, with the provided IETF language tag |
|
check if two words can be matched in a fuzzy way. The longer the string, the more characters may need to be edited for it to still match the second string. |
|
Compute the US_ENGLISH phonetic soundex encoding of all words of the text value which can be a single string or a list of strings |
|
Compute the Double Metaphone phonetic encoding of all words of the text value which can be a single string or a list of strings |
|
strip the given string of everything except alpha numeric characters and convert it to lower case. |
|
compare the given strings stripped of everything except alpha numeric characters converted to lower case. |
|
Compute the US_ENGLISH soundex character difference between two given strings |
Format the string with the given params and an optional language param.
without language param (defaults to 'en').
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd',42,3.14,true]) AS value // abcd 42 3.1 true
with language param.
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd',42,3.14,true],'it') AS value // abcd 42 3,1 true
The indexOf function returns the first occurrence of the given lookup string within the text, or -1 if not found.
It can optionally take from (inclusive) and to (exclusive) parameters.
RETURN apoc.text.indexOf('Hello World!', 'World') // 6
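To make the optional parameters concrete, the sketch below reuses the same text and searches for 'o' with and without bounds. The expected values in the comments follow from the inclusive from / exclusive to semantics described above; they have not been checked against a particular APOC release, and the alias names are made up for this example.

```cypher
// 'Hello World!' has an 'o' at index 4 and another at index 7.
RETURN apoc.text.indexOf('Hello World!', 'o')       AS firstMatch,    // 4
       apoc.text.indexOf('Hello World!', 'o', 5)    AS fromIndex5,    // 7  (search starts at index 5)
       apoc.text.indexOf('Hello World!', 'o', 5, 7) AS boundedSearch  // -1 (index 7 is excluded)
```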
The indexesOf function returns all occurrences of the given lookup string within the text, or an empty list if not found.
It can optionally take from (inclusive) and to (exclusive) parameters.
RETURN apoc.text.indexesOf('Hello World!', 'o',2,9) // [4,7]
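If you want to get a substring starting from your index match, you can use the following query, which returns 'World!'.
WITH 'Hello World!' AS text
// size() gives the string length; text must be bound before it can be referenced
WITH text, size(text) AS len, apoc.text.indexOf(text, 'World', 3) AS index
// fall back to the last character when the lookup string is not found
RETURN substring(text, CASE index WHEN -1 THEN len - 1 ELSE index END, len);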
will return 'HelloWorld'.
RETURN apoc.text.replace('Hello World!', '[^a-zA-Z]', '')
RETURN apoc.text.regexGroups('abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>','<link (\\w+)>(\\w+)</link>') AS result
// [["<link xxx1>yyy1</link>", "xxx1", "yyy1"], ["<link xxx2>yyy2</link>", "xxx2", "yyy2"]]
will split with the given regular expression and return ['Hello', 'World'].
RETURN apoc.text.split('Hello World', ' +')
will return 'Hello World'.
RETURN apoc.text.join(['Hello', 'World'], ' ')
will return 'helloworld'.
RETURN apoc.text.clean('Hello World!')
will return true.
RETURN apoc.text.compareCleaned('Hello World!', '_hello-world_')
will return only 'Hello World!'.
UNWIND ['Hello World!', 'hello worlds'] as text
RETURN apoc.text.filterCleanMatches(text, 'hello_world') as text
The clean functionality can be useful for cleaning up slightly dirty text data with inconsistent formatting for non-exact comparisons.
Cleaning will strip the string of all non-alphanumeric characters (including spaces) and convert it to lower case.
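The following sketch shows how cleaning can drive such a non-exact comparison. It only assumes the behaviour described above (non-alphanumeric characters removed, text lower-cased); the candidate strings are made up for illustration.

```cypher
// Keep only the candidates whose cleaned form equals the cleaned reference text.
UNWIND ['Hello World!', 'hello-world', 'HELLO_WORLD', 'hello worlds'] AS candidate
WITH candidate, apoc.text.clean(candidate) AS cleaned
WHERE cleaned = apoc.text.clean('Hello World')
RETURN candidate, cleaned
```

Under that description the first three candidates all clean to 'helloworld' and are kept, while 'hello worlds' cleans to 'helloworlds' and is filtered out.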
RETURN apoc.text.capitalize("neo4j") // "Neo4j"
RETURN apoc.text.capitalizeAll("graph database") // "Graph Database"
RETURN apoc.text.decapitalize("Graph Database") // "graph Database"
RETURN apoc.text.decapitalizeAll("Graph Databases") // "graph databases"
RETURN apoc.text.swapCase("Neo4j") // nEO4J
RETURN apoc.text.camelCase("FOO_BAR"); // "fooBar"
RETURN apoc.text.camelCase("Foo bar"); // "fooBar"
RETURN apoc.text.camelCase("Foo22 bar"); // "foo22Bar"
RETURN apoc.text.camelCase("foo-bar"); // "fooBar"
RETURN apoc.text.camelCase("Foobar"); // "foobar"
RETURN apoc.text.camelCase("Foo$$Bar"); // "fooBar"
RETURN apoc.text.upperCamelCase("FOO_BAR"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foo bar"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foo22 bar"); // "Foo22Bar"
RETURN apoc.text.upperCamelCase("foo-bar"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foobar"); // "Foobar"
RETURN apoc.text.upperCamelCase("Foo$$Bar"); // "FooBar"
RETURN apoc.text.snakeCase("test Snake Case"); // "test-snake-case"
RETURN apoc.text.snakeCase("FOO_BAR"); // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar"); // "foo-bar"
RETURN apoc.text.snakeCase("fooBar"); // "foo-bar"
RETURN apoc.text.snakeCase("foo-bar"); // "foo-bar"
RETURN apoc.text.toUpperCase("test upper case"); // "TEST_UPPER_CASE"
RETURN apoc.text.toUpperCase("FooBar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("fooBar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo-bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo--bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo$$bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo 22 bar"); // "FOO_22_BAR"
Encode or decode a string in base64 or base64Url
EncodeBase64.
RETURN apoc.text.base64Encode("neo4j") // bmVvNGo=
DecodeBase64.
RETURN apoc.text.base64Decode("bmVvNGo=") // neo4j
EncodeBase64Url.
RETURN apoc.text.base64UrlEncode("http://neo4j.com/?test=test") // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
DecodeBase64Url.
RETURN apoc.text.base64UrlDecode("aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0") // http://neo4j.com/?test=test
You can generate a random string of a specified length by calling apoc.text.random with a length parameter and an optional string of valid characters.
The valid parameter accepts the following regex patterns; alternatively, you can provide a string of letters and/or characters.
Pattern | Description |
A-Z | Letters A-Z in uppercase |
a-z | Letters a-z in lowercase |
0-9 | Numbers 0-9 inclusive |
The following call will return a random string made up of uppercase letters, numbers, and the . and $ characters.
RETURN apoc.text.random(10, "A-Z0-9.$")
Compare the given strings with the StringUtils.distance(text1, text2) method (Levenshtein).
RETURN apoc.text.distance("Levenshtein", "Levenstein") // 1
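One way such a distance might be used is to rank candidate spellings against a reference string. The sketch below is only an illustration: the candidate list is made up, and the edit counts are left for the query to report rather than asserted here.

```cypher
// Rank candidates by edit distance to the reference spelling (smallest first).
UNWIND ['Levenstein', 'Levenshtein', 'Einstein'] AS candidate
RETURN candidate,
       apoc.text.distance('Levenshtein', candidate) AS edits
ORDER BY edits ASC
```

An identical string has distance 0, so 'Levenshtein' sorts first, followed by 'Levenstein' with the single edit shown in the example above.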
computes the similarity assuming Locale.ENGLISH.
RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") // 0.5
computes the similarity with an explicit locale.
RETURN apoc.text.sorensenDiceSimilarityWithLanguage("halım", "halim", "tr-TR") // 0.5
The longer the string, the more characters may need to be edited for it to still match the second string.
RETURN apoc.text.fuzzyMatch("The", "the") // true