These Fulltext-Index procedures are deprecated and will be removed from APOC soon. From version 3.5, Neo4j provides built-in, case-insensitive, configurable fulltext indexes.
Indexes are used to find nodes in the graph from which further operations can continue, just as you look at the index of a book to find a section that interests you and then start reading from there. A full text index allows you to find occurrences of individual words or phrases across all indexed attributes.
To use the full text search feature, we first have to index our data by specifying all the attributes we want to index.
Here we create a full text index called “locations” (we will use this name when searching in the index) with our data.
By default, these fulltext indexes do not automatically track changes you perform in your graph. See …. for how to enable automatic index tracking.
CALL apoc.index.addAllNodes('locations',{
Company: ["name", "description"],
Person: ["name","address"],
Address: ["address"]})
Creating the index will take a little while since the procedure has to read through the entire database to create the index.
We can now use this index to search for nodes in the database. The simplest case is to search across all data for a particular word.
It does not matter which property the word occurs in: any node that has the word in any of its indexed properties will be found.
If you use a name in the call, all occurrences will be found (but limited to 100 results).
CALL apoc.index.search("locations", 'name')
We can further restrict our search to a particular attribute.
To search for a Person with an address in France, we use the following:
CALL apoc.index.search("locations", "Person.address:France")
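To make the difference between an unrestricted and a field-restricted query concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not how APOC or Lucene is implemented: `ToyIndex`, its methods, and the sample nodes are hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption that real analyzers will not match exactly.

```python
from collections import defaultdict


class ToyIndex:
    """Toy fulltext index keyed by 'Label.property' fields (illustration only)."""

    def __init__(self):
        # field name -> lowercased term -> set of node ids
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, node_id, label, props, indexed):
        # Index only the properties configured for this label,
        # one term per whitespace-separated word.
        for prop in indexed.get(label, []):
            value = props.get(prop)
            if value is None:
                continue
            for term in str(value).lower().split():
                self._postings[f"{label}.{prop}"][term].add(node_id)

    def search(self, term, field=None):
        # Without a field, look in every indexed field;
        # with a field, look only in that one.
        term = term.lower()
        fields = [field] if field else list(self._postings)
        hits = set()
        for name in fields:
            hits |= self._postings.get(name, {}).get(term, set())
        return sorted(hits)


# Same label-to-properties map as in the addAllNodes call above.
indexed = {
    "Company": ["name", "description"],
    "Person": ["name", "address"],
    "Address": ["address"],
}

idx = ToyIndex()
idx.add(1, "Person", {"name": "Marie", "address": "12 Rue Cler Paris France"}, indexed)
idx.add(2, "Company", {"name": "France Telecom", "description": "telecom"}, indexed)
idx.add(3, "Address", {"address": "Paris Street London"}, indexed)

any_field = idx.search("France")                            # matches in any indexed property
person_only = idx.search("France", field="Person.address")  # matches in one property only
```

In this model the unrestricted search finds both the person and the company, while the restricted search finds only the person, which mirrors the two `apoc.index.search` calls above.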
Now we can search for nodes with a specific property value, and then explore their neighbourhoods visually.
But integrating it with a graph query is so much more powerful.
We could for instance search for addresses in the database that contain the word "Paris", and then find all companies registered at those addresses:
CALL apoc.index.search("locations", "Address.address:Paris~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50
The tilde (~) instructs the index search procedure to do a fuzzy match, allowing us to find "Paris" even if the spelling is slightly off.
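As a rough illustration of what "slightly off" means, fuzzy matching is usually defined in terms of edit distance. The Python sketch below implements plain Levenshtein distance and a hypothetical `fuzzy_match` helper with a tolerance of two edits. The tolerance of two and the exact distance measure are assumptions for illustration; Lucene's own behaviour depends on its version and should be checked in the Lucene documentation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, candidate: str, max_edits: int = 2) -> bool:
    """Hypothetical helper: case-insensitive match within max_edits edits."""
    return edit_distance(term.lower(), candidate.lower()) <= max_edits


close_enough = fuzzy_match("Paris", "Pariss")  # one extra letter
too_far = fuzzy_match("Paris", "London")       # unrelated word
```

With this definition a misspelling such as "Pariss" still matches "Paris", while an unrelated word does not.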
We might notice that there are addresses that contain the word “Paris” that are not in Paris, France. For example there might be a Paris Street somewhere.
We can further specify that we want the text to contain both the word Paris, and the word France:
CALL apoc.index.search("locations", "+Address.address:Paris~ +France~")
YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50
Things start to get interesting when we look at how the different entities in Paris are connected to one another. We can do that by finding all the entities with addresses in Paris, then creating all pairs of such entities and finding the shortest path between each such pair:
CALL apoc.index.search("locations", "+Address.address:Paris~ +France~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
WITH collect(company) AS companies
// create unique pairs
UNWIND companies AS x UNWIND companies AS y
WITH x, y WHERE ID(x) < ID(y)
MATCH path = shortestPath((x)-[*..10]-(y))
RETURN path
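The `ID(x) < ID(y)` filter in the query above keeps exactly one ordering of each pair and drops pairs of a node with itself. A small Python sketch of the same idea, using made-up node ids and a hypothetical `unique_pairs` function, shows that n companies produce n(n-1)/2 pairs:

```python
from itertools import combinations


def unique_pairs(node_ids):
    """Mirror the Cypher pattern: full cross product, then keep only x < y."""
    return [(x, y) for x in node_ids for y in node_ids if x < y]


company_ids = [17, 4, 42, 8]  # made-up internal node ids
pairs = unique_pairs(company_ids)

# The result is the same set as choosing 2 out of n ids.
expected = set(combinations(sorted(company_ids), 2))
same_as_combinations = set(pairs) == expected
```

Because the number of pairs grows quadratically with the number of matched companies, and a shortest-path search runs for every pair, it is worth keeping the index query selective.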
For more details on the query syntax used in the second parameter of the search procedure, please see this Lucene query tutorial.
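When the search term comes from user input, it can help to build the query string programmatically. The Python sketch below composes query strings like the ones used in the examples above. `escape_term` and `clause` are hypothetical helpers, and the list of escaped characters is an assumption based on the special characters of Lucene's classic query syntax, so check it against the Lucene version you run.

```python
# Characters assumed to have special meaning in Lucene's classic query syntax.
_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(text: str) -> str:
    """Backslash-escape characters that the query parser would interpret."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in text)


def clause(term, field=None, required=False, fuzzy=False):
    """Build one query clause for a single word, e.g. '+Address.address:Paris~'."""
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("clause() expects a single non-empty word")
    out = escape_term(term)
    if fuzzy:
        out += "~"              # fuzzy match
    if field:
        out = f"{field}:{out}"  # restrict to one indexed property
    if required:
        out = "+" + out         # the clause must match
    return out


query = " ".join([
    clause("Paris", field="Address.address", required=True, fuzzy=True),
    clause("France", required=True, fuzzy=True),
])
```

The resulting `query` string is the same one passed as the second parameter to `apoc.index.search` in the Paris and France example.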
apoc.index.addAllNodes(<name>, <labelPropsMap>, <option>)
lets you fine-tune your indexes using the options parameter, which defaults to an empty map.
All standard options for Neo4j manual indexes are allowed, plus APOC-specific options:
name | value | description |
---|---|---|
 | | type of the index |
 | | if terms should be converted to lower case before indexing |
 | | classname of lucene analyzer to be used for this index |
 | | classname for lucene similarity to be used for this index |
autoUpdate | | if this index should be tracked for graph updates |
An index configuration cannot be changed once the index is created.
However, subsequent invocations of …
As mentioned above, apoc.index.addAllNodes() populates a fulltext index, but it does not track changes made to the graph or reflect them in the index.
You would have to rebuild that index regularly yourself.
Alternatively, use automatic index tracking, which keeps the index in sync with your graph changes. Enabling this feature requires a two-step configuration.
Please note that there is a performance impact if you enable automatic index tracking.
In neo4j.conf set:
apoc.autoIndex.enabled=true
This global setting initializes a transaction event handler that reflects added nodes, deleted nodes, and changed properties in the indexes.
In addition to enabling index tracking globally using apoc.autoIndex.enabled, each individual index must be configured as "trackable" by setting autoUpdate:true in the options when initially creating the index:
CALL apoc.index.addAllNodes('locations',{
Company: ["name", "description"],
Person: ["name","address"],
Address: ["address"]}, {autoUpdate:true})
By default, index tracking is done synchronously. That means updates to fulltext indexes are part of the same transaction as the originating change (e.g. changing a node property). While this guarantees instant consistency, it has an impact on performance.
Alternatively, you can perform index updates asynchronously in a separate thread by setting this flag in neo4j.conf:
apoc.autoIndex.async=true
With this setting enabled, index updates are fed to a buffer queue that is consumed asynchronously in transaction batches. The batching can be further configured using:
apoc.autoIndex.queue_capacity=100000
apoc.autoIndex.async_rollover_opscount=50000
apoc.autoIndex.async_rollover_millis=5000
apoc.autoIndex.tx_handler_stopwatch=false
The values above are the default settings. In this example, the index updates are consumed in transactions of at most 50000 operations or 5000 milliseconds. Whichever limit is reached first causes the index update transaction to be committed and rolled over.
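The "whichever triggers first" rule can be sketched in a few lines of Python. This is a simplified model for illustration, not APOC's implementation: `RolloverBatcher`, `tick`, and the injected clock are hypothetical, the model is single-threaded, and it ignores the queue capacity setting.

```python
import time


class RolloverBatcher:
    """Commit a batch once it holds max_ops operations or is older than max_millis."""

    def __init__(self, commit, max_ops=50000, max_millis=5000, clock=time.monotonic):
        self._commit = commit          # callback receiving the list of operations
        self._max_ops = max_ops
        self._max_millis = max_millis
        self._clock = clock            # returns seconds; injectable for testing
        self._batch = []
        self._opened_at = None

    def add(self, op):
        if not self._batch:
            self._opened_at = self._clock()  # the batch "opens" with its first op
        self._batch.append(op)
        self._maybe_rollover()

    def tick(self):
        # Call periodically so a partially filled batch is still committed by age.
        self._maybe_rollover()

    def _maybe_rollover(self):
        if not self._batch:
            return
        age_ms = (self._clock() - self._opened_at) * 1000
        if len(self._batch) >= self._max_ops or age_ms >= self._max_millis:
            self._commit(self._batch)
            self._batch = []                 # start a fresh batch


# Usage with a fake clock and a tiny operation limit.
now = [0.0]
committed = []
batcher = RolloverBatcher(committed.append, max_ops=3, max_millis=5000,
                          clock=lambda: now[0])

for op in ["a", "b", "c"]:
    batcher.add(op)   # the third operation reaches max_ops and commits

batcher.add("d")      # opens a new batch at t = 0 s
now[0] += 6.0         # six seconds pass
batcher.tick()        # the batch is older than 5000 ms and commits
```

A larger operation count or time window means fewer, bigger index transactions, at the cost of the index lagging further behind the graph.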
If apoc.autoIndex.tx_handler_stopwatch is enabled, the time spent in beforeCommit and afterCommit is traced to debug.log.
Use this setting only for diagnosis.
This section provides a small but still usable example to understand automatic index updates.
Make sure apoc.autoIndex.enabled=true is set.
First we create some nodes - note there’s no index yet.
UNWIND ["Johnny Walker", "Jim Beam", "Jack Daniels"] as name CREATE (:Person{name:name})
Now we index them:
CALL apoc.index.addAllNodes('people', { Person:["name"]}, {autoUpdate:true})
Check if we can find "Johnny" - we expect one result.
CALL apoc.index.search("people", "Johnny") YIELD node, weight
RETURN node.name, weight
Adding some more people - note, we have another "Johnny":
UNWIND ["Johnny Rotten", "Axel Rose"] as name CREATE (:Person{name:name})
Again we search for "Johnny", now expecting two results:
CALL apoc.index.search("people", "Johnny") YIELD node, weight
RETURN node.name, weight
The accompanying user functions simply return counts of matching nodes and relationships in a manual index:
apoc.index.nodes.count('Label','prop:value*') YIELD value
apoc.index.relationships.count('TYPE','prop:value*') YIELD value
apoc.index.between.count(node1,'TYPE',node2,'prop:value*') YIELD value
apoc.index.out.count(node,'TYPE','prop:value*') YIELD value
apoc.index.in.count(node,'TYPE','prop:value*') YIELD value
First we create this set of data:
CREATE (joe:Person:Hipster {name:'Joe',age:42})-[checkin:CHECKIN {on:'2015-12-01'}]->(philz:Place {name:'Philz'})
MATCH (p:Person) CALL apoc.index.addNode(p, ["name"]) RETURN count(*)
MATCH (p:Place) CALL apoc.index.addNode(p, ["name"]) RETURN count(*)
MATCH (person:Person)-[check:CHECKIN]->(place:Place) CALL apoc.index.addRelationship(check, ["on"]) RETURN count(*)
RETURN apoc.index.nodes.count('Person', 'name:Jo*') AS value
The result is:
RETURN apoc.index.relationships.count('CHECKIN', 'on:2015-*') as value
The result is:
MATCH (joe:Person:Hipster {name:'Joe',age:42}),(philz:Place {name:'Philz'}) WITH joe,philz RETURN apoc.index.between.count(joe, 'CHECKIN', philz, 'on:2015-*') as value
The result is:
MATCH (philz:Place {name:'Philz'}) WITH philz RETURN apoc.index.in.count(philz, 'CHECKIN','on:2015-*') as value
The result is:
MATCH (joe:Person:Hipster {name:'Joe',age:42}) WITH joe RETURN apoc.index.out.count(joe, 'CHECKIN','on:2015-*') as value
The result is: