10.4. Text Functions

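Cypher has some basic functions for working with text, such as split, toLower and toUpper, string concatenation with +, the predicates CONTAINS, STARTS WITH and ENDS WITH, and regular-expression matching with =~.

Many useful functions for string manipulation, comparison and filtering are missing, though. APOC adds many of them.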

10.4.1. Overview Text Functions

apoc.text.indexOf(text, lookup, offset=0, to=-1==len)

find the first occurrence of the lookup string in the text, from inclusive, to exclusive; returns -1 if not found, null if text is null.

apoc.text.indexesOf(text, lookup, from=0, to=-1==len)

find all occurrences of the lookup string in the text and return them as a list, from inclusive, to exclusive; returns an empty list if not found, null if text is null.

apoc.text.replace(text, regex, replacement)

replace each substring of the given string that matches the given regular expression with the given replacement.

apoc.text.regexGroups(text, regex)

returns an array containing a nested array for each match. The inner array contains all match groups.

apoc.text.join(['text1','text2',…​], delimiter)

join the given strings with the given delimiter.

apoc.text.format(text,[params],language)

sprintf format the string with the params given, and optional param language (default value is 'en').

apoc.text.lpad(text,count,delim)

left pad the string to the given width

apoc.text.rpad(text,count,delim)

right pad the string to the given width

apoc.text.random(length, [valid])

returns a random string of the specified length

apoc.text.capitalize(text)

capitalise the first letter of the word

apoc.text.capitalizeAll(text)

capitalise the first letter of every word in the text

apoc.text.decapitalize(text)

decapitalize the first letter of the word

apoc.text.decapitalizeAll(text)

decapitalize the first letter of all words

apoc.text.swapCase(text)

Swap the case of a string

apoc.text.camelCase(text)

Convert a string to camelCase

apoc.text.upperCamelCase(text)

Convert a string to UpperCamelCase

apoc.text.snakeCase(text)

Convert a string to snake-case

apoc.text.toUpperCase(text)

Convert a string to UPPER_CASE

apoc.text.charAt(text, index)

Returns the decimal value of the character at the given index

apoc.text.code(codepoint)

Returns the unicode character of the given codepoint

apoc.text.hexCharAt(text, index)

Returns the hex value string of the character at the given index

apoc.text.hexValue(value)

Returns the hex value string of the given value

apoc.text.byteCount(text,[charset])

return size of text in bytes

apoc.text.bytes(text,[charset])

return bytes of the text

apoc.text.toCypher(value, {skipKeys,keepKeys,skipValues,keepValues,skipNull,node,relationship,start,end})

tries its best to convert the value to a Cypher property string

apoc.text.base64Encode(text)

Encode a string with Base64

apoc.text.base64Decode(text)

Decode a Base64 encoded string

apoc.text.base64UrlEncode(url)

Encode a URL with Base64

The replace, split and regexGroups functions work with regular expressions.
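Several of the functions listed above have no example elsewhere in this section. The calls below are a minimal sketch of the search, padding and character functions. The input strings are illustrative, and the results in the comments follow from the descriptions above rather than from a run against a particular APOC release.

```cypher
// Position of the first match; -1 would mean "not found"
RETURN apoc.text.indexOf('Hello World!', 'World') AS position;   // 6

// Positions of all matches, returned as a list
RETURN apoc.text.indexesOf('Hello World!', 'o') AS positions;    // [4, 7]

// Pad on the left or right up to a total width of 8 characters
RETURN apoc.text.lpad('Neo4j', 8, '-') AS padded;                // "---Neo4j"
RETURN apoc.text.rpad('Neo4j', 8, '-') AS padded;                // "Neo4j---"

// Decimal value of the character at index 4 ('j'), and back again
RETURN apoc.text.charAt('Neo4j', 4) AS value;                    // 106
RETURN apoc.text.code(106) AS character;                         // "j"
```

Indexes are zero-based, which is why the 'W' of 'World' is reported at position 6.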

10.4.2. Data Extraction

apoc.data.url('url') as {protocol,user,host,port,path,query,file,anchor}

turn URL into map structure

apoc.data.email('email_address') as {personal,user,domain}

extract the personal name, user and domain as a map (needs javax.mail jar)

apoc.data.domain(email_or_url)

Deprecated. Returns the domain part of the value.
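As a rough illustration of apoc.data.url, the query below splits a URL into a map and reads individual fields from it. The URL is a made-up example, and the field names are taken from the signature listed above.

```cypher
// Split a URL into its parts and pick individual fields from the resulting map
WITH apoc.data.url('http://user@example.com:8080/docs/index.html?lang=en#top') AS url
RETURN url.protocol, url.user, url.host, url.port, url.path, url.query, url.anchor
```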

10.4.3. Text Similarity Functions

apoc.text.distance(text1, text2)

compare the given strings with the Levenshtein distance algorithm

apoc.text.levenshteinDistance(text1, text2)

compare the given strings with the Levenshtein distance algorithm

apoc.text.levenshteinSimilarity(text1, text2)

calculate the similarity (a value between 0 and 1) between two texts based on Levenshtein distance.

apoc.text.hammingDistance(text1, text2)

compare the given strings with the Hamming distance algorithm

apoc.text.jaroWinklerDistance(text1, text2)

compare the given strings with the Jaro-Winkler distance algorithm

apoc.text.sorensenDiceSimilarity(text1, text2)

compare the given strings with the Sørensen–Dice coefficient formula, assuming an English locale

apoc.text.sorensenDiceSimilarityWithLanguage(text1, text2, languageTag)

compare the given strings with the Sørensen–Dice coefficient formula, with the provided IETF language tag

apoc.text.fuzzyMatch(text1, text2)

check whether two words can be matched in a fuzzy way. The longer the string, the more character edits are allowed for it to still match the second string.

10.4.4. Phonetic Comparison Functions

apoc.text.phonetic(value)

Compute the US_ENGLISH phonetic soundex encoding of all words of the text value which can be a single string or a list of strings

apoc.text.doubleMetaphone(value)

Compute the Double Metaphone phonetic encoding of all words of the text value which can be a single string or a list of strings

apoc.text.clean(text)

strip the given string of everything except alphanumeric characters and convert it to lower case.

apoc.text.compareCleaned(text1, text2)

compare the given strings after stripping everything except alphanumeric characters and converting them to lower case.

Table 10.1. Procedure

apoc.text.phoneticDelta(text1, text2) yield phonetic1, phonetic2, delta

Compute the US_ENGLISH soundex character difference between two given strings
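The phonetic helpers can be tried out as sketched below. The input strings are illustrative only. Note that, according to the listing above, phonetic is a function while phoneticDelta is a procedure, so the latter is invoked with CALL and YIELD.

```cypher
// Soundex encoding of the words in a string (function)
RETURN apoc.text.phonetic('Hello, dear User!') AS encoded;

// Soundex character difference between two strings (procedure)
CALL apoc.text.phoneticDelta('Hello Mr Rabbit', 'Hello Mr Ribbit')
YIELD phonetic1, phonetic2, delta
RETURN phonetic1, phonetic2, delta;
```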

10.4.5. Formatting Text

Format the string with the params given, and optional param language.

without language param ('en' default). 

RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd',42,3.14,true]) AS value // abcd 42 3.1 true

with language param. 

RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd',42,3.14,true],'it') AS value // abcd 42 3,1 true

10.4.7. Regular Expressions

will return 'HelloWorld'. 

RETURN apoc.text.replace('Hello World!', '[^a-zA-Z]', '')

RETURN apoc.text.regexGroups('abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>','<link (\\w+)>(\\w+)</link>') AS result

// [["<link xxx1>yyy1</link>", "xxx1", "yyy1"], ["<link xxx2>yyy2</link>", "xxx2", "yyy2"]]

10.4.8. Split and Join

will split the text with the given regular expression and return ['Hello', 'World'].

RETURN apoc.text.split('Hello   World', ' +')

will return 'Hello World'. 

RETURN apoc.text.join(['Hello', 'World'], ' ')

10.4.9. Data Cleaning

will return 'helloworld'. 

RETURN apoc.text.clean('Hello World!')

will return true

RETURN apoc.text.compareCleaned('Hello World!', '_hello-world_')

will return only 'Hello World!'. 

UNWIND ['Hello World!', 'hello worlds'] as text
RETURN apoc.text.filterCleanMatches(text, 'hello_world') as text

The clean functionality can be useful for cleaning up slightly dirty text data with inconsistent formatting for non-exact comparisons.

Cleaning will strip the string of all non-alphanumeric characters (including spaces) and convert it to lower case.
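One possible use of cleaned comparison is spotting likely duplicates in existing data. The sketch below assumes a hypothetical data model of Person nodes with a name property; neither is part of APOC.

```cypher
// Hypothetical data model: (:Person {name}) nodes that may contain near-duplicates
MATCH (a:Person), (b:Person)
WHERE id(a) < id(b)                              // visit each pair only once
  AND apoc.text.compareCleaned(a.name, b.name)   // names are equal after cleaning
RETURN a.name AS name1, b.name AS name2
```

The pattern compares every pair of nodes, so on larger graphs it is worth narrowing the candidates first.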

10.4.10. Case Change Functions

10.4.10.1. Capitalise the first letter of the word with capitalize

RETURN apoc.text.capitalize("neo4j") // "Neo4j"

10.4.10.2. Capitalise the first letter of every word in the text with capitalizeAll

RETURN apoc.text.capitalizeAll("graph database") // "Graph Database"

10.4.10.3. Decapitalize the first letter of the string with decapitalize

RETURN apoc.text.decapitalize("Graph Database") // "graph Database"

10.4.10.4. Decapitalize the first letter of all words with decapitalizeAll

RETURN apoc.text.decapitalizeAll("Graph Databases") // "graph databases"

10.4.10.5. Swap the case of a string with swapCase

RETURN apoc.text.swapCase("Neo4j") // nEO4J

10.4.10.6. Convert a string to lower camelCase with camelCase

RETURN apoc.text.camelCase("FOO_BAR");    // "fooBar"
RETURN apoc.text.camelCase("Foo bar");    // "fooBar"
RETURN apoc.text.camelCase("Foo22 bar");  // "foo22Bar"
RETURN apoc.text.camelCase("foo-bar");    // "fooBar"
RETURN apoc.text.camelCase("Foobar");     // "foobar"
RETURN apoc.text.camelCase("Foo$$Bar");   // "fooBar"

10.4.10.7. Convert a string to UpperCamelCase with upperCamelCase

RETURN apoc.text.upperCamelCase("FOO_BAR");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foo bar");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foo22 bar"); // "Foo22Bar"
RETURN apoc.text.upperCamelCase("foo-bar");   // "FooBar"
RETURN apoc.text.upperCamelCase("Foobar");    // "Foobar"
RETURN apoc.text.upperCamelCase("Foo$$Bar");  // "FooBar"

10.4.10.8. Convert a string to snake-case with snakeCase

RETURN apoc.text.snakeCase("test Snake Case"); // "test-snake-case"
RETURN apoc.text.snakeCase("FOO_BAR");         // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar");         // "foo-bar"
RETURN apoc.text.snakeCase("fooBar");          // "foo-bar"
RETURN apoc.text.snakeCase("foo-bar");         // "foo-bar"
RETURN apoc.text.snakeCase("Foo  bar");        // "foo-bar"

10.4.10.9. Convert a string to UPPER_CASE with toUpperCase

RETURN apoc.text.toUpperCase("test upper case"); // "TEST_UPPER_CASE"
RETURN apoc.text.toUpperCase("FooBar");          // "FOO_BAR"
RETURN apoc.text.toUpperCase("fooBar");          // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo-bar");         // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo--bar");        // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo$$bar");        // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo 22 bar");      // "FOO_22_BAR"

10.4.11. Base64 De- and Encoding

Encode or decode a string in base64 or base64Url

EncodeBase64. 

RETURN apoc.text.base64Encode("neo4j") // bmVvNGo=

DecodeBase64. 

RETURN apoc.text.base64Decode("bmVvNGo=") // neo4j

EncodeBase64Url. 

RETURN apoc.text.base64UrlEncode("http://neo4j.com/?test=test") // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0

DecodeBase64Url. 

RETURN apoc.text.base64UrlDecode("aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0") // http://neo4j.com/?test=test

10.4.12. Random String

You can generate a random string of a specified length by calling apoc.text.random with a length parameter and an optional string of valid characters.

The valid parameter accepts the following regex patterns. Alternatively, you can provide a string of letters and/or other characters.

Pattern    Description
A-Z        A-Z in uppercase
a-z        a-z in lowercase
0-9        Numbers 0-9 inclusive

The following call will return a random string made up of uppercase letters, digits, and the characters . and $.

RETURN apoc.text.random(10, "A-Z0-9.$")
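The two ways of specifying valid characters can be sketched as follows. The lengths and character sets are arbitrary examples, and the results differ on every call, so no output is shown.

```cypher
// 16 characters drawn from lowercase letters and digits (pattern form)
RETURN apoc.text.random(16, 'a-z0-9') AS token;

// 12 characters drawn from an explicit list of allowed characters
RETURN apoc.text.random(12, 'ACGT') AS sequence;
```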

10.4.13. Text Similarity Functions

10.4.13.1. Compare the strings with the Levenshtein distance

Compare the given strings with the StringUtils.distance(text1, text2) method (Levenshtein).

RETURN apoc.text.distance("Levenshtein", "Levenstein") // 1

10.4.13.2. Compare the given strings with the Sørensen–Dice coefficient formula

computes the similarity assuming Locale.ENGLISH. 

RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") // 0.5

computes the similarity with an explicit locale. 

RETURN apoc.text.sorensenDiceSimilarityWithLanguage("halım", "halim", "tr-TR") // 0.5

10.4.13.3. Check if 2 words can be matched in a fuzzy way with fuzzyMatch

The longer the string, the more character edits are allowed for it to still match the second string.

RETURN apoc.text.fuzzyMatch("The", "the") // true
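The remaining similarity functions from the overview are called in the same way. The following is a minimal sketch with illustrative inputs. Only the Hamming result is given, since it follows directly from the definition (the two strings differ in three positions).

```cypher
// Levenshtein-based similarity, a value between 0 and 1
RETURN apoc.text.levenshteinSimilarity('Levenshtein', 'Levenstein') AS similarity;

// Hamming distance: number of positions at which two equal-length strings differ
RETURN apoc.text.hammingDistance('karolin', 'kathrin') AS distance;   // 3

// Jaro-Winkler distance
RETURN apoc.text.jaroWinklerDistance('Neo4j', 'Neo4J') AS distance;
```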