17.4. Deprecated: Full Text Search

These Fulltext-Index procedures are deprecated and will be removed from APOC soon. From version 3.5 Neo4j provides built-in, case-insensitive, configurable fulltext indices.

Indexes are used for finding nodes in the graph that further operations can then continue from. Just like in a book where you look at the index to find a section that interest you, and then start reading from there. A full text index allows you to find occurrences of individual words or phrases across all attributes.

In order to use the full text search feature, we have to first index our data by specifying all the attributes we want to index. Here we create a full text index called “locations” (we will use this name when searching in the index) with our data.

by default these fulltext indexes do not automatically track changes you perform in your graph. See …. for how to enabled automatic index tracking.

CALL apoc.index.addAllNodes('locations',{
  Company: ["name", "description"],
  Person:  ["name","address"],
  Address: ["address"]})

Creating the index will take a little while since the procedure has to read through the entire database to create the index.

We can now use this index to search for nodes in the database. The most simple case would be to search across all data for a particular word.

It does not matter which property that word exists in, any node that has that word in any of its indexed properties will be found.

If you use a name in the call, all occurrences will be found (but limited to 100 results).

CALL apoc.index.search("locations", 'name')

17.4.1. Advanced Search

We can further restrict our search to only searching in a particular attribute. In order to search for a Person with an address in France, we use the following.

CALL apoc.index.search("locations", "Person.address:France")

Now we can search for nodes with a specific property value, and then explore their neighbourhoods visually.

But integrating it with an graph query is so much more powerful.

17.4.2. Fulltext and Graph Search

We could for instance search for addresses in the database that contain the word "Paris", and then find all companies registered at those addresses:

CALL apoc.index.search("locations", "Address.address:Paris~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50

The tilde (~) instructs the index search procedure to do a fuzzy match, allowing us to find "Paris" even if the spelling is slightly off.

We might notice that there are addresses that contain the word “Paris” that are not in Paris, France. For example there might be a Paris Street somewhere.

We can further specify that we want the text to contain both the word Paris, and the word France:

CALL apoc.index.search("locations", "+Address.address:Paris~ +France~")
YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50

17.4.3. Complex Searches

Things start to get interesting when we look at how the different entities in Paris are connected to one another. We can do that by finding all the entities with addresses in Paris, then creating all pairs of such entities and finding the shortest path between each such pair:

CALL apoc.index.search("locations", "+Address.address:Paris~ +France~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
WITH collect(company) AS companies

// create unique pairs
UNWIND companies AS x UNWIND companies AS y
WITH x, y WHERE ID(x) < ID(y)

MATCH path = shortestPath((x)-[*..10]-(y))
RETURN path

For more details on the query syntax used in the second parameter of the search procedure, please see this Lucene query tutorial

17.4.3.1. Index Configuration

apoc.index.addAllNodes(<name>, <labelPropsMap>, <option>) allows to fine tune your indexes using the options parameter defaulting to an empty map. All standard options for Neo4j manual indexes are allowed plus apoc specific options:

name	value	description
`type`	`fulltext/exact`	type of the index
`to_lower_case`	`false/true`	if terms should be converted to lower case before indexing
`analyzer`	`classname`	classname of lucene analyzer to be used for this index
`similarity`	`classname`	classname for lucene similarity to be used for this index
`autoUpdate`	`true/false`	if this index should be tracked for graph updates

An index configuration cannot be changed once the index is created. However subsequent invocations of apoc.index.addAllNodes will delete the index if existing and create it afterwards.

17.4.4. Automatic Index Tracking for Manual Indexes

As mentioned above, apoc.index.addAllNodes() populates an fulltext index. But it does not track changes being made to the graph and reflect these changes to the index. You would have to rebuild that index regularly yourself.

Or alternatively use the automatic index tracking, that keeps the index in sync with your graph changes. To enable this feature a two step configuration approach is required.

Please note that there is a performance impact if you enable automatic index tracking.

in neo4j.conf set.

apoc.autoIndex.enabled=true

This global setting will initialize a transaction event handler to take care of reflecting changes of any added nodes, deleted nodes, changed properties to the indexes.

In addition to enable index tracking globally using apoc.autoIndex.enabled each individual index must be configured as "trackable" by setting autoUpdate:true in the options when initially creating an index:

CALL apoc.index.addAllNodes('locations',{
  Company: ["name", "description"],
  Person:  ["name","address"],
  Address: ["address"]}, {autoUpdate:true})

By default index tracking is done synchronously. That means updates to fulltext indexes are part of same transaction as the originating change (e.g. changing a node property). While this guarantees instant consistency it has an impact on performance.

Alternatively, you can decide to perform index updates asynchronously in a separate thread by setting this flag in neo4j.conf

apoc.autoIndex.async=true

With this setting enabled, index updates are fed to a buffer queue that is consumed asynchronously using transaction batches. The batching can be further configured using

apoc.autoIndex.queue_capacity=100000
apoc.autoIndex.async_rollover_opscount=50000
apoc.autoIndex.async_rollover_millis=5000
apoc.autoIndex.tx_handler_stopwatch=false

The values above are the default setting. In this example the index updates are consumed in transactions of maximum 50000 operations or 5000 milliseconds - whichever triggers first will cause the index update transaction to be committed and rolled over.

If apoc.autoIndex.tx_handler_stopwatch is enabled, the time spent in beforeCommit and afterCommit is traced to debug.log. Use this setting only for diagnosis.

17.4.4.1. A Worked Example on Fulltext Index Tracking

This section provides a small but still usable example to understand automatic index updates.

Make sure apoc.autoIndex.enabled=true is set. First we create some nodes - note there’s no index yet.

UNWIND ["Johnny Walker", "Jim Beam", "Jack Daniels"] as name CREATE (:Person{name:name})

Now we index them:

CALL apoc.index.addAllNodes('people', { Person:["name"]}, {autoUpdate:true})

Check if we can find "Johnny" - we expect one result.

CALL apoc.index.search("people", "Johnny") YIELD node, weight
RETURN node.name, weight

Adding some more people - note, we have another "Johnny":

UNWIND ["Johnny Rotten", "Axel Rose"] as name CREATE (:Person{name:name})

Again we’re search for "Johnny", expecting now two of them:

CALL apoc.index.search("people", "Johnny") YIELD node, weight
RETURN node.name, weight

17.4.4.2. Fulltext index count

Accompanying UserFunctions that just return counts for nodes and relationships manual index

apoc.index.nodes.count('Label','prop:value*') YIELD value

apoc.index.relationships.count('TYPE','prop:value*') YIELD value

apoc.index.between.count(node1,'TYPE',node2,'prop:value*') YIELD value

apoc.index.out.count(node,'TYPE','prop:value*') YIELD value

apoc.index.in.count(node,'TYPE','prop:value*') YIELD value

Some example with the userFunctions describe above

First we create this set of data:

CREATE (joe:Person:Hipster {name:'Joe',age:42})-[checkin:CHECKIN {on:'2015-12-01'}]->(philz:Place {name:'Philz'})
MATCH (p:Person) CALL apoc.index.addNode(p, ["name"]) RETURN count(*)
MATCH (p:Place) CALL apoc.index.addNode(p, ["name"]) RETURN count(*)
MATCH (person:Person)-[check:CHECKIN]->(place:Place) CALL apoc.index.addRelationship(check, ["on"]) RETURN count(*)

We call the apoc.index.nodes.count function as follow:

RETURN apoc.index.nodes.count('Person', 'name:Jo*') AS value

The result is:

We call the apoc.index.relationships.count function as follow:

RETURN apoc.index.relationships.count('CHECKIN', 'on:2015-*') as value

The result is:

We call the apoc.index.between.count function as follow:

MATCH (joe:Person:Hipster {name:'Joe',age:42}),(philz:Place {name:'Philz'}) WITH joe,philz RETURN apoc.index.between.count(joe, 'CHECKIN', philz, 'on:2015-*') as value

The result is:

We call the apoc.index.in.count function as follow:

MATCH (philz:Place {name:'Philz'}) WITH philz RETURN apoc.index.in.count(philz, 'CHECKIN','on:2015-*') as value

The result is:

We call the apoc.index.out.count function as follow:

MATCH (joe:Person:Hipster {name:'Joe',age:42}) WITH joe RETURN apoc.index.out.count(joe, 'CHECKIN','on:2015-*') as value

The result is: