Hot questions about using Neo4j indexing

Question:

As per the new indexing rules, auto-indexing will go away in the future and indexes are expected to be created using Cypher instead. Under this new approach, to index a node property you MUST provide a node label.

I have a 'nodeId' property present on nodes of all labels - User, Employee, Bank, Car, etc. I used to auto-index this property so that I could retrieve any type of node when its nodeId is known. Note that this was only possible because auto-indexing did not require a node label.

ReadableIndex<Node> readableIndex = this.graphDatabaseService.index().getNodeAutoIndexer().getAutoIndex();
readableIndex.get("nodeId", "0").getSingle();

But with the new style, I have to create an index on the nodeId property for each and every node label. So I have to do this:

create index on :User(nodeId)
create index on :Employee(nodeId)
...

Moreover, my method getByNodeId(String nodeId) is now useless, because IMHO this Cypher query will no longer be able to use an index, since I am not passing any node label:

match (node) where node.nodeId = {nodeId} return node;

Since the whole point of my getByNodeId() method was to be generic across all nodes, I cannot give this Cypher query a node label. So what should I do here? My 2 questions are:

  • How do I tell Neo4j via Cypher to index across all node labels?
  • How do I write a Cypher query that uses an index based on a node property rather than a node label?

Note:

  • It is essential for me to use Cypher because I am using neo4j-jdbc, and it has no method to create an auto-index or access the auto-indexer (at least not that I know of).

  • Some might suggest changing neo4j.properties to enable auto-indexing there, but I don't like changing configuration files; I want to do it in my program. In any case, that would only have solved the first issue - the second issue would still remain.


Answer:

A node can have multiple labels.

Thus, if you make all your nodes share a common label, say Base (in addition to whatever labels they currently have), you can just have a single index that covers all your nodes:

CREATE INDEX ON :Base(nodeId)
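
For example, a minimal sketch of the generic lookup, assuming every node is also given the shared Base label when it is created and assuming the Neo4j 2.2+ embedded API (with neo4j-jdbc the same parameterized query can be sent as a statement):

// Generic lookup through the single :Base(nodeId) index.
// The "Base" label and the surrounding transaction handling are illustrative.
try (Transaction tx = graphDatabaseService.beginTx()) {
    Map<String, Object> params = new HashMap<>();
    params.put("nodeId", nodeId);
    Result result = graphDatabaseService.execute(
            "MATCH (node:Base) WHERE node.nodeId = {nodeId} RETURN node", params);
    while (result.hasNext()) {
        Node node = (Node) result.next().get("node");
        // ... use node ...
    }
    tx.success();
}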

Question:

I want to auto-index a property of a particular node. According to this documentation, we have to provide the property name and it will get indexed: http://neo4j.com/docs/stable/auto-indexing.html

I have two nodes, node1 and node2, and both have the same property, name. I want to index the name property only on node1, not on node2. How do I do this at runtime, without using Cypher?

Any help would be appreciated.


Answer:

I found the Neo4j documentation for creating a schema index on a property of a particular node using Java code.

Adding code here. This might help someone like me :)

IndexDefinition indexDefinition;
try ( Transaction tx = graphDb.beginTx() )
{
    Schema schema = graphDb.schema();
    indexDefinition = schema.indexFor( DynamicLabel.label( "node1" ) )
            .on( "name" )
            .create();
    tx.success();
}

http://neo4j.com/docs/stable/tutorials-java-embedded-new-index.html
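
A short usage sketch (assuming the index created above has finished populating; the label "node1", property "name", and lookup value are illustrative):

// In a new transaction: wait for the index to come online, then look nodes up by property.
try ( Transaction tx = graphDb.beginTx() )
{
    graphDb.schema().awaitIndexOnline( indexDefinition, 10, TimeUnit.SECONDS );

    for ( Node n : graphDb.findNodesByLabelAndProperty(
            DynamicLabel.label( "node1" ), "name", "someValue" ) )
    {
        System.out.println( n );
    }
    tx.success();
}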

Question:

I am experiencing a problem with Neo4j, where the directory graph.db/index/ increases dramatically in size due to many large lucene.log.vXXX files being produced. This happens for a computation which does not use indexing at all, but simply adds numerical properties to some nodes in the network.

The problem is reproducible for versions 2.1.3, 2.1.7, and 2.2.0 on two different 64-bit computers running Ubuntu Linux (14.04.1 and 14.04.2).

My database:

  • 16’636’351 nodes with 4 properties: id (string), name (string), country code (string), and type (string).
  • 14’724’489 weighted links.

This results in a graph.db directory of 11 GB. The directory graph.db/index/ is 2.4 GB large.

I use Neo4j embedded in Java and always instantiate as follows:

        String i1 = "id";
        String i2 = "name";
        String i3 = "country";
        String i4 = "type";
        String myIndeables = i1 + "," + i2 + "," + i3 + "," + i4;
        GraphDatabaseService gdbs = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(cfg.dbPath).
                setConfig(GraphDatabaseSettings.node_keys_indexable, myIndeables).
                setConfig(GraphDatabaseSettings.node_auto_indexing, "true").
                setConfig(GraphDatabaseSettings.relationshipstore_mapped_memory_size, "12G").
                ...
                newGraphDatabase();

This way was also used to create (i.e., import) the original 11 GB database.

So far so good.

Now I perform a computation on the database. Ignoring the details, an algorithm calculates a kind of centrality measure for all the nodes in the largest connected component of the network (6’118’740 nodes).

The problem: Simply adding these newly computed numbers as a property to the 6’118’740 nodes (out of the total of 16’636’351) results in the database exploding to 249 GB, with a 243 GB graph.db/index/ directory (due to the lucene.log.vXXX files)!!!

However, if I instantiate as follows without indexing...

        gdbs = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(cfg.dbPath).
                setConfig(GraphDatabaseSettings.relationshipstore_mapped_memory_size, "12G").
                ...
                newGraphDatabase();

...the result is a database size of 6.9 GB (recall the original was 11 GB!), of which now only 2.2 GB are used for graph.db/index/!!!

What is happening here?

PS Additional information:

  • Java versions: Java(TM) SE Runtime Environment (build 1.7.0_76-b13) and OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1)
  • The jar file was exported from Eclipse.
  • The logs don't give any clues when going from the 11 GB database to the 249 GB version.

Answer:

By default, Neo4j keeps the logical logs for 7 days (older versions have a different default). Since you have auto-indexing enabled, any update to a node might cause an index update - which might be empty if you only change non-indexed properties.

To prevent this, shut down the database, make a backup copy, and delete the lucene.log.vXXX files. Then add keep_logical_logs=false as a config option in your startup code.
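
For example, a sketch of the startup configuration, using the same builder style as in the question (the other settings stay as they were):

// Disable retention of logical logs so the lucene.log.vXXX files are pruned.
GraphDatabaseService gdbs = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(cfg.dbPath).
        setConfig(GraphDatabaseSettings.keep_logical_logs, "false").
        ...
        newGraphDatabase();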

Question:

For certain use cases, e.g. with NGram or EdgeNGram tokenizers, it should be possible to define two different analyzers for a manual index: one for indexing and one for searching/querying.


Answer:

This is currently not supported in Neo4j (as of 3.5.11). The documentation states:

... Supported settings are 'analyzer', for specifying what analyzer to use when indexing and querying. ...

Neo4j doesn't provide a way to configure analyzers in detail (apart from specifying which analyzer to use, or deploying a completely custom analyzer), so there is seldom a reason to define different analyzers for indexing and searching.
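
For reference, a sketch of how that single analyzer is chosen at index creation time in 3.5 (index name, label, property, and analyzer here are illustrative):

// Create a full-text node index with a specific analyzer; the same analyzer
// is then used for both indexing and querying.
graphDb.execute(
    "CALL db.index.fulltext.createNodeIndex(" +
    "'titlesIndex', ['Book'], ['title'], {analyzer: 'english'})");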

In contrast, full-text search engines such as Elasticsearch allow you to define the individual steps of an analyzer. There it makes sense to allow the index analyzer and the search analyzer to be defined differently (although I would argue this is rare, and when it happens they are usually very similar anyway).

Question:

I load my data using the following code:

public void createGraph() {
    Map<String, String> config = new HashMap<String, String>();
    config.put("cache_type", "none");
    config.put("use_memory_mapped_buffers", "true");
    config.put("neostore.nodestore.db.mapped_memory", "200M");
    config.put("neostore.relationshipstore.db.mapped_memory", "1000M");
    config.put("neostore.propertystore.db.mapped_memory", "250M");
    config.put("neostore.propertystore.db.strings.mapped_memory", "250M");
    inserter = BatchInserters.inserter("./data/neo4j", config);

    long start = System.currentTimeMillis();

    try {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream("./data/enronEdges.txt")));
        String line;
        int lineCounter = 1;
        long srcNode, dstNode;
        while ((line = reader.readLine()) != null) {
            if (lineCounter > 4) {
                String[] parts = line.split("\t");

                srcNode = getOrCreate(parts[0]);
                dstNode = getOrCreate(parts[1]);

                inserter.createRelationship(srcNode, dstNode, RelTypes.SIMILAR, null);
            }
            lineCounter++;
        }
        reader.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }

    long time = System.currentTimeMillis() - start;
    inserter.createDeferredSchemaIndex(NODE_LABEL).on("nodeId").create();
    System.out.println("Loading time: " + time / 1000.0);

    inserter.shutdown();
}

private long getOrCreate(String value) {
    Long id = cache.get(Long.valueOf(value));
    if (id == null) {
        Map<String, Object> properties = MapUtil.map("nodeId", value);
        id = inserter.createNode(properties, NODE_LABEL);
        cache.put(Long.valueOf(value), id);
    }
    return id;
}

Then I am trying to retrieve the nodes with this code:

GraphDatabaseService gdb = new GraphDatabaseFactory().newEmbeddedDatabase(dbPath);
try (Transaction tx = gdb.beginTx()) {
    Node n = gdb.findNodesByLabelAndProperty(DynamicLabel.label("Node"), "nodeId", 1)
            .iterator().next();
    System.out.println(n);
}

But I am getting the following error:

Exception in thread "main" java.util.NoSuchElementException: No more elements in org.neo4j.collection.primitive.PrimitiveLongCollections$5@6b3ab760
    at org.neo4j.collection.primitive.PrimitiveLongCollections$PrimitiveLongBaseIterator.next(PrimitiveLongCollections.java:60)
    at org.neo4j.collection.primitive.PrimitiveLongCollections$13.next(PrimitiveLongCollections.java:712)
    at org.neo4j.helpers.collection.ResourceClosingIterator.next(ResourceClosingIterator.java:76)
    at test.Neo4jBatchInsert.visitAllNodes(Neo4jBatchInsert.java:56)
    at test.Neo4jBatchInsert.main(Neo4jBatchInsert.java:49)

Isn't this the proper way to get an indexed node? FYI, I use neo4j 2.1.3 embedded.


Answer:

The index is only populated once gdb has been initialized, so it may take some time before the index is brought online.

You might call gdb.schema().awaitIndexesOnline(10, TimeUnit.MINUTES) prior to findNodesByLabelAndProperty to make sure the indexes are in place.
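
Putting it together, a rough sketch (the 10-minute timeout is illustrative; also note that the batch inserter stored nodeId as a String, so the lookup value should be a String as well):

GraphDatabaseService gdb = new GraphDatabaseFactory().newEmbeddedDatabase(dbPath);
try (Transaction tx = gdb.beginTx()) {
    // Block until the deferred schema index has been brought online.
    gdb.schema().awaitIndexesOnline(10, TimeUnit.MINUTES);

    Node n = gdb.findNodesByLabelAndProperty(DynamicLabel.label("Node"), "nodeId", "1")
            .iterator().next();
    System.out.println(n);
    tx.success();
}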

Question:

I want to count relationships by type in Neo4j using the native Java API (not by executing a Cypher statement).

I created a full-text index on relationships by calling the procedure:

CALL db.index.fulltext.createRelationshipIndex(
  "dependsTypeRelationshipIndex",
  ["DEPENDS"], ["isoptional"], 
  { analyzer: "standard", eventually_consistent: "true" })

The index has been created successfully, and the corresponding relationships exist.

However, when I use the Neo4j native API, it does not work.

Is there any config I need to set, or a method to count the relationships by type without using Cypher?


Answer:

In Neo4j 3.5.*, there are two types of relationship index: explicit indexes and full-text indexes. However, the Neo4j native Java API only exposes an index manager for explicit indexes, which is why I can't get hold of my full-text index. So I can only query the full-text index via a Cypher statement, which conflicts with my requirement.

I hope a future version of the Neo4j native Java API will provide an index manager for full-text indexes.
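
As a workaround, relationships of a given type can be counted through the core Java API without touching Cypher or any index at all. A sketch, assuming an embedded GraphDatabaseService named graphDb (the type name matches the DEPENDS type from the question; the cost scales with the total number of relationships):

// Count relationships of one type by iterating them via the core API (Neo4j 3.5 embedded).
long count = 0;
try (Transaction tx = graphDb.beginTx()) {
    for (Relationship rel : graphDb.getAllRelationships()) {
        if (rel.isType(RelationshipType.withName("DEPENDS"))) {
            count++;
        }
    }
    tx.success();
}
System.out.println("DEPENDS relationships: " + count);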