Hot questions for Using Neo4j in Lucene

Question:

I'm still evaluating Neo4j vs. OrientDB. Most importantly, I need Lucene as a full-text index engine, so I created the same schema with the same data (300 million lines) in both databases. I'm also experienced with querying different things in both systems. I used the StandardAnalyzer on both sides. The OrientDB test query results are all fine and really good in terms of reliability and speed. The speed of Neo4j is also OK, but its results are rather poor in most cases. So let's come to the issues I have with Neo4j Lucene indexing. For each one I show what the query looks like in OrientDB and which result set you should get from it.

In these examples there are Applns that have titles. Titles are indexed with Lucene in both databases. Applns also have an ID, used to demonstrate the ordering. After each query I list some questions; it would be great to get some feedback or even answers.

Query #0: One word query with no order

Well, this query is very simple. It tests how the database behaves when the query is just a single word and nothing else. As you can see, the Neo4j results are much longer than the ones from OrientDB. OrientDB uses TFIDF to keep the results short and more relevant to the actual search. The first result in OrientDB is a title consisting only of SOLAR, which is missing entirely from the Neo4j results.

In Neo4j: START n=node:titles('title:solar') RETURN n.title,n.ID LIMIT 10

  1. SOLAR RADIATION SHIELDING PARTICULATE AND SOLAR RADIATION SHIELDING RESIN MATERIAL DISPERSED WITH ... 38321319

  2. Solar module for cooling solar cells on the underside of a solar panel has air inlet and outlet openings ... 12944121

  3. Solar construction component for solar thermal assemblies, solar thermal assembly, method for operating a solar... 324146113

  4. ...

In OrientDB: SELECT title,ID FROM Appln WHERE title LUCENE "solar" LIMIT 10

  1. SOLAR 24900187

  2. Solar unit and solar apparatus 1876343

  3. Solar module with solar concentrator 13496706

  4. ...

Questions:

  1. Why is Neo4j not using TFIDF, and what does it use instead?
  2. Can Neo4j order the results by how well they match the keyword?
  3. Is it possible to change TFIDF to something else in OrientDB?
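To make the TFIDF point concrete, here is a simplified sketch of classic TF-IDF scoring for a one-word query, written in Java purely to illustrate the scoring (not as a replacement for the Cypher queries). It assumes Lucene's classic formula, which for a single term works out to roughly tf × idf × lengthNorm, with tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / √(number of terms). The class name, the tokenizer and the numDocs/docFreq values are assumptions of this sketch; stop words, boosts and Lucene's lossy norm encoding are ignored, so the values differ from what Lucene would actually report.

```java
import java.util.Locale;

// Simplified classic TF-IDF for a single-term query (illustration only).
// Ignores stop words, boosts and norm encoding, so only the relative
// ordering of the scores is meaningful, not the absolute numbers.
class TfIdfSketch {

    /** Lowercases and splits on anything that is not a letter or digit. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /**
     * Score of one title for one query term.
     * numDocs: total number of indexed titles (assumed known).
     * docFreq: number of titles containing the term (assumed known).
     */
    static double score(String title, String term, int numDocs, int docFreq) {
        String[] tokens = tokenize(title);
        int freq = 0;
        for (String t : tokens) {
            if (t.equals(term)) {
                freq++;
            }
        }
        double tf = Math.sqrt(freq);                                   // repeats help, sub-linearly
        double idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1)); // rare terms weigh more
        double lengthNorm = 1.0 / Math.sqrt(tokens.length);            // short titles weigh more
        return tf * idf * lengthNorm;
    }
}
```

With the same numDocs and docFreq for every title, "SOLAR" scores higher than "Solar unit and solar apparatus", which in turn scores higher than "Stackable flat-roof/floor frame for solar panels": the length norm favours short titles, which is why a bare "SOLAR" can come first in a TFIDF-ranked list.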

Query #1: One word query with order by ID

Neo4j orders by ID before applying TFIDF. As known from Query #0, Neo4j does not use TFIDF, so it basically just takes the first results of the Lucene query. OrientDB, on the other hand, still searches by the best TFIDF matches first and then orders them.

In Neo4j: START n=node:titles('title:solar') RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

  1. Stackable flat-roof/floor frame for solar panels 318

  2. Method for producing contact for solar cells 636

  3. Solar cell and fabrication method thereof 1217

  4. ...

In OrientDB: SELECT title,ID FROM Appln WHERE title LUCENE "solar" ORDER BY ID ASC LIMIT 10

  1. Solar unit and solar apparatus 1876343

  2. Solar module with solar concentrator 13496706

  3. SOLAR TRACKER FOR SOLAR COLLECTOR 16543688

  4. ...

Questions:

  1. What would a search in OrientDB look like that is ordered by ID but still returns the best TFIDF matches?
  2. Is there a way in Neo4j to rank the Lucene matches first and then order them by ID?
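The two orderings in these questions can be sketched in plain Java: keep the N best-scoring matches first and only then sort those N by ID, versus sorting all matches by ID and cutting after N (which ignores relevance entirely). The `Hit` type and its scores are hypothetical stand-ins for whatever the full-text index returns.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TopThenSort {

    /** Hypothetical full-text match: an ID plus a relevance score. */
    static final class Hit {
        private final long id;
        private final double score;

        Hit(long id, double score) {
            this.id = id;
            this.score = score;
        }

        long id() { return id; }
        double score() { return score; }
    }

    /** Relevance first: keep the n best matches, then order only those by ID. */
    static List<Hit> topByScoreThenById(List<Hit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingDouble(Hit::score).reversed()) // best first
                .limit(n)                                                 // relevance cut-off
                .sorted(Comparator.comparingLong(Hit::id))               // then by ID
                .collect(Collectors.toList());
    }

    /** ID first: sort every match by ID and cut after n; relevance is lost. */
    static List<Hit> byIdThenLimit(List<Hit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingLong(Hit::id))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Expressed as a query, the first variant means limiting by relevance in an inner step and ordering by ID in an outer one, provided the database lets you rank by score in that inner step.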

Query #2: One word with a star search

Star search had no influence on the Neo4j results. The OrientDB results changed for the better.

In Neo4j: START n=node:titles('title:solar*') RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

  1. Stackable flat-roof/floor frame for solar panels 318

  2. Method for producing contact for solar cells 636

  3. Solar cell and fabrication method thereof 1217

  4. ...

In OrientDB: SELECT title,ID FROM Appln WHERE title LUCENE "solar*" ORDER BY ID ASC LIMIT 10

  1. High performance solar methane generator 8354701

  2. All-plastic honeycomb solar water-heater 8355379

  3. Plate type solar energy heat collector plate core and its manufacturing method 8356173

  4. ...

Questions:

  1. Does Neo4j ignore star searches?

Query #3: Searching for two words separated by a space

The strange thing here is that you need to change 'title:solar panel' into the query below; otherwise you just get errors. OrientDB seems good so far.

In Neo4j: START n=node:titles(title="solar panel") RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

  1. Returned 0 rows in 817 ms

In OrientDB: SELECT title,ID FROM Appln WHERE title LUCENE "solar panel" ORDER BY ID ASC LIMIT 10

  1. SOLAR PANEL 1584567

  2. SOLAR PANEL 1616547

  3. SOLAR PANEL 2078382

  4. SOLAR PANEL 2078383

  5. Solar panel 2178466

  6. ...

Questions:

  1. Why does Neo4j need a special query here just to avoid throwing an error?
  2. Why does the query fail and return nothing? I know that Neo4j is searching for lowercase letters here, so it is case-sensitive. But why is it like this? I use the default analyzer, and the Neo4j Lucene documentation says to_lower_case is true by default.

Query #4: Searching for the same query in capital letters

Same issue as in #3: Neo4j only returns the titles in which the words are written in capital letters. The OrientDB results look fine again.

In Neo4j: START n=node:titles(title="SOLAR PANEL") RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

  1. SOLAR PANEL 348800

  2. SOLAR PANEL 420683

  3. SOLAR PANEL 1393804

  4. SOLAR PANEL 1584567

  5. SOLAR PANEL 1616547

  6. ...

In OrientDB: SELECT title,ID FROM Appln WHERE title LUCENE "SOLAR PANEL" ORDER BY ID ASC LIMIT 10

  1. SOLAR PANEL 1584567

  2. SOLAR PANEL 1616547

  3. SOLAR PANEL 2078382

  4. SOLAR PANEL 2078383

  5. Solar panel 2178466

  6. ...

Questions:

  1. Same question as in #3: how can I search with to_lower_case?

Query #5: Combining two words with a star search

Here I want to combine a multi-word search with a star search. But with the equals syntax I can't find any matches, because the star is treated as a literal character in the title. And I can't write 'title:SOLAR PANEL*' either, because that is rejected too. In OrientDB everything is fine.

In Neo4j: START n=node:titles(title="SOLAR PANEL*") RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

  1. Returned 0 rows in 895 ms

In OrientDB: SELECT title,ID FROM Appln WHERE title LUCENE "SOLAR PANEL*" ORDER BY ID ASC LIMIT 10

  1. SOLAR PANELS 1405717

  2. SOLAR PANEL 1584567

  3. SOLAR PANEL 1616547

  4. SOLAR PANEL 2705081

  5. Solar Panel 2766555

  6. ...

Questions:

  1. How can you combine some words with the star search in Neo4j?

Query #6: Counting query results

The last thing I really need is a fast lookup of how many results there are overall. Here Neo4j finds a result much faster but always finds fewer matches than OrientDB. The counts for Solar are fairly close to each other, but another test was not that close.

In Neo4j: START n=node:titles("title:Solar") RETURN count(*)

143211 in 220 sec

In OrientDB: SELECT count(*) title FROM Appln WHERE title LUCENE "Solar" LIMIT -1

148029 in 50 sec

Questions:

  1. How can the lookup times be improved in both systems?
  2. Why do the two systems find a different number of matches? This also happens with other keywords. Is a different indexing engine being used?

Well, that is everything for now. If you need any other query, just tell me and I'll deliver it. I think it's very important to compare the Lucene implementations, because with millions of nodes Lucene has so many advantages. Thanks for any small tip.

Btw: please don't suggest using Java code for the queries instead. I want to use Cypher because the requests should be made from the browser, as in OrientDB. I know that everything here could easily be done with Java code. Thank you.


Answer:

Well, I want to share what I have found out about my issues so far:

Info about Query #0, #1 and #2:
  1. It is not possible to change the TFIDF scoring in Neo4j. It uses its own implementation, which cannot be changed.
  2. In OrientDB, ordering before searching is currently slow.

    SELECT FROM (
      SELECT title,ID FROM Appln WHERE title LUCENE "solar*" ORDER BY ID ASC
    )  LIMIT 1
    
    Query executed in 11.531 sec. Returned 1 record(s)
    
    
    SELECT FROM (
      SELECT title,ID FROM Appln WHERE title LUCENE "solar*" ORDER BY ID ASC
    )  LIMIT 10
    
    Query executed in 225.176 sec. Returned 10 record(s)
    

    It is that slow because the ordering by ID is not handled by Lucene.

Fixing Query #3, #4 and #5:

The query is not correct. The equals syntax is an exact match, not a full-text (fuzzy) one. So

START n=node:titles(title="solar panel") RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

needs to be replaced by

START n=node:titles('title:solar\\ panel') RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

Having to escape characters inside the Cypher string is a really bad approach. Here the order of the two words matters. But there is another way to write it:

START n=node:titles('title:SoLar AND title:Panel') RETURN n.title,n.ID ORDER BY n.ID ASC LIMIT 10

But this is also really bad: if you imagine you have a string and just want Neo4j to return results for it, you need a parser. Here, however, the order of the words does not matter.
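Such a parser can be quite small. A minimal Java sketch, assuming whitespace-separated input and an index that lowercases at indexing time: it escapes Lucene's special query characters, joins the words with AND (so word order does not matter) and can optionally append `*` to the last word, which is one way to combine several words with a star search as asked in Query #5. `LuceneQueryBuilder` and `toAndQuery` are hypothetical names, and quoted phrases are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: "solar panel" -> "title:solar AND title:panel".
class LuceneQueryBuilder {

    // Characters with a special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    /** Backslash-escapes special characters so user input is matched literally. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds "field:w1 AND field:w2 ..." from whitespace-separated input.
     * Words are lowercased (assumes a lowercasing index); if prefixLastTerm
     * is true, the last word gets a trailing '*' for a prefix search.
     */
    static String toAndQuery(String field, String input, boolean prefixLastTerm) {
        String[] words = input.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) {
                continue; // blank input yields no clauses
            }
            String clause = field + ":" + escape(words[i]);
            if (prefixLastTerm && i == words.length - 1) {
                clause += "*";
            }
            clauses.add(clause);
        }
        return String.join(" AND ", clauses);
    }
}
```

For the Query #5 input "SOLAR PANEL" with the prefix option, this produces `title:solar AND title:panel*`. If the result is pasted into a Cypher string literal, any backslashes added by the escaping have to be doubled, as in the `solar\\ panel` example above.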

Fixing Query #6:

OrientDB is currently working on making counting faster (milliseconds). This is planned for the 2.0 release, due in a few days.

Neo4j has no plans for this.

Question:

In my project I'm using Neo4j version 2.2.5, and it conflicts with the Lucene dependency in pom.xml: when I run the same code without the Lucene dependency, it works fine. How can I use Lucene and Neo4j in the same project without a conflict?

ERROR:

java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, E:\neo4j
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:335)
    at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:59)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:108)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:95)
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:176)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:67)
    at neo4j.graphdbtest.IndexSearchExample.initDB(IndexSearchExample.java:42)
    at com.sessa.col.spr.act.process_flow.Flow.startProcess(Flow.java:56)
    at com.sessa.col.spr.act.process_flow.FlowHandler.main(FlowHandler.java:17)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.extension.KernelExtensions@17973d6f' failed to initialize. Please see attached cause exception.
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:489)
    at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:72)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:106)
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:331)
    ... 8 more
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/document/Fieldable
    at org.neo4j.kernel.api.impl.index.NodeRangeDocumentLabelScanStorageStrategy.(NodeRangeDocumentLabelScanStorageStrategy.java:71)
    at org.neo4j.kernel.api.impl.index.LuceneLabelScanStoreExtension.newKernelExtension(LuceneLabelScanStoreExtension.java:73)
    at org.neo4j.kernel.api.impl.index.LuceneLabelScanStoreExtension.newKernelExtension(LuceneLabelScanStoreExtension.java:39)
    at org.neo4j.kernel.extension.KernelExtensions.init(KernelExtensions.java:66)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:483)
    ... 11 more
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.document.Fieldable
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 16 more

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>com.sessa.col.spr.act</groupId>
<artifactId>Color-Spreading-Activation</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>Color-Spreading-Activation</name>
<url>http://maven.apache.org</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.4</version>
    </dependency>

    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j</artifactId>
        <version>2.2.5</version>
    </dependency>

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.5.2</version>
    </dependency>

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-parser</artifactId>
        <version>3.5.2</version>
    </dependency>

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.5.2</version>
        <classifier>models</classifier>
    </dependency>

    <dependency>
        <groupId>com.sparsity</groupId>
        <artifactId>sparkseejava</artifactId>
        <version>5.1.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.jena</groupId>
        <artifactId>jena-tdb</artifactId>
        <version>1.1.2</version>
    </dependency>

    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.5.3</version>
    </dependency>

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.3.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.3.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>5.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queries</artifactId>
        <version>5.3.0</version>
    </dependency>

</dependencies>
</project>


Answer:

It is not possible to use Neo4j and the latest version of Lucene in one Maven project, because Neo4j (2.2.x) uses Lucene version 3.6.

You have two options:

  1. Write your own Class Loader

    • Java Class Loaders
  2. Use the Maven Shade plugin (to relocate the packages of one of the two Lucene versions)
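A minimal sketch of option 1, assuming the newer Lucene jars are available as separate files (their paths are placeholders you supply): load them through a `URLClassLoader` whose parent is the bootstrap loader (`null`), so that it cannot see the Lucene 3.6 classes Neo4j brings onto the application classpath. `IsolatedLucene` and its methods are hypothetical names.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Sketch: a class loader that only sees the given jars plus the JDK itself,
// not the application classpath (where Neo4j's Lucene 3.6 lives).
class IsolatedLucene {

    /** Creates a loader over the given jars; parent null means bootstrap loader only. */
    static URLClassLoader isolatedLoader(Path... jars) {
        URL[] urls = new URL[jars.length];
        try {
            for (int i = 0; i < jars.length; i++) {
                urls[i] = jars[i].toUri().toURL();
            }
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
        return new URLClassLoader(urls, null);
    }

    /** True if the loader can resolve the class (without initializing it). */
    static boolean isVisible(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Classes loaded this way are different types from anything compiled against the application classpath, so code using them has to go through reflection; a direct cast to a Lucene type would fail with a `ClassCastException`.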

Question:

Because of different class versions in different .jar files I got this exception:

java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, E:\neo4j
    at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:333)
    at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:63)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:92)
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:198)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:69)
    at neo4j_lucene.conflict_solver.ConfilctSolver.createDb(ConfilctSolver.java:55)
    at neo4j_lucene.conflict_solver.ConfilctSolver.main(ConfilctSolver.java:35)

I'm using a custom ClassLoader to solve this problem, but I still get the same exception. Here is my code:

try {
    CustomClassLoader ccl = new CustomClassLoader();
    Object object;
    Class clas;
    clas = ccl
            .loadClass("org.neo4j.graphdb.factory.GraphDatabaseFactory");

    object = clas.newInstance();

    graphDb = ((GraphDatabaseFactory) object)
            .newEmbeddedDatabase(DB_PATH);

} catch (Exception e) {
    e.printStackTrace();
}

Custom class loader code:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.Hashtable;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class CustomClassLoader extends ClassLoader {
    private String jarFile = "C:/Users/RaufA/Desktop/test.jar"; // path to the jar file
    private Hashtable classes = new Hashtable(); // cache of already defined classes

    public CustomClassLoader() {
        super(CustomClassLoader.class.getClassLoader()); // calls the parent class loader's constructor
    }

    public Class loadClass(String className) throws ClassNotFoundException {
        return findClass(className);
    }

    public Class findClass(String className) {
        byte classByte[];
        Class result = null;

        result = (Class) classes.get(className); // checks in cached classes
        if (result != null) {
            return result;
        }

        // Asks the system class loader first, so any class that is also on the
        // application classpath is returned from there and never read from jarFile.
        try {
            return findSystemClass(className);
        } catch (Exception e) {
        }

        try {
            JarFile jar = new JarFile(jarFile);
            // Jar entries are stored under '/'-separated paths (org/neo4j/...),
            // so looking up the dotted class name finds no entry.
            JarEntry entry = jar.getJarEntry(className + ".class");
            System.out.println(className + ".class");
            InputStream is = jar.getInputStream(entry);
            ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
            int nextValue = is.read();
            while (-1 != nextValue) {
                byteStream.write(nextValue);
                nextValue = is.read();
            }

            classByte = byteStream.toByteArray();
            result = defineClass(className, classByte, 0, classByte.length,
                    null);
            classes.put(className, result);
            System.out.println(">>>>result: " + result);
            return result;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }
}

What else should I do?


Answer:

You are trying to have Neo4j and a newer Lucene version together in one jar, right?

The problem is that Neo4j uses an old Lucene version.

Alessandro Negro from GraphAware solved that problem; you can find his solution here: https://github.com/graphaware/neo4j-elasticsearch-tests

Question:

I have a Maven project in Eclipse that relies on jars which include Lucene.

This is my pom:

    <dependency>
        <groupId>org.dbpedia.spotlight</groupId>
        <artifactId>core</artifactId>
        <version>0.7</version>
    </dependency>
    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j</artifactId>
        <version>3.0.3</version>
    </dependency>
    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j-bolt</artifactId>
        <version>3.0.3</version>
    </dependency>
    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j-kernel</artifactId>
        <version>3.0.3</version>
    </dependency>
    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j-cypher</artifactId>
        <version>3.0.3</version>
    </dependency>

The issue is that I cannot create or access my Neo4j database if the dbpedia-spotlight jar is included.

The Code

graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder( dir )
        .setConfig( GraphDatabaseSettings.read_only, "true" )
        .newGraphDatabase();

gives the error message

Exception in thread "main" java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, F:\DLs\DB
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:144)
    at org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:40)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:100)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.lambda$createDatabaseCreator$203(GraphDatabaseFactory.java:89)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$$Lambda$1/440434003.newDatabase(Unknown Source)
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:183)
    at neo4j.Neo4j.startServer(Neo4j.java:26)
    at countAnnotator.Main.main(Main.java:35)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.extension.KernelExtensions@78d4b13a' failed to initialize. Please see attached cause exception.
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:416)
    at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:62)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:98)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:140)
    ... 8 more
Caused by: java.lang.VerifyError: Cannot inherit from final class
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.neo4j.kernel.api.impl.labelscan.storestrategy.BitmapDocumentFormat.<clinit>(BitmapDocumentFormat.java:40)
    at org.neo4j.kernel.api.impl.labelscan.LuceneLabelScanIndexBuilder.<init>(LuceneLabelScanIndexBuilder.java:34)
    at org.neo4j.kernel.api.impl.labelscan.LuceneLabelScanIndexBuilder.create(LuceneLabelScanIndexBuilder.java:49)
    at org.neo4j.kernel.api.impl.labelscan.LuceneLabelScanStoreExtension.getLuceneIndex(LuceneLabelScanStoreExtension.java:90)
    at org.neo4j.kernel.api.impl.labelscan.LuceneLabelScanStoreExtension.newInstance(LuceneLabelScanStoreExtension.java:79)
    at org.neo4j.kernel.api.impl.labelscan.LuceneLabelScanStoreExtension.newInstance(LuceneLabelScanStoreExtension.java:40)
    at org.neo4j.kernel.extension.KernelExtensions.init(KernelExtensions.java:69)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:406)
    ... 11 more

if I have dbpedia-spotlight as a dependency.

If I don't have it as a dependency, it runs just fine. I tried manually adding the jar as an external archive and moving the Maven dependencies to the top of the build path order, but to no avail.

I'm fairly inexperienced with Maven, so I wonder how one would go about resolving such an issue. I need both jars to run my project (the first step is wikification using dbpedia-spotlight, the second is calculating shortest paths using Neo4j; each works as long as the other isn't included).

Putting the Neo4j part in a different project and adding that project to the build path of the original project didn't help either.

Thanks in advance!


So I did find a solution to my problem, although I really don't understand why it works; perhaps someone can explain it to me.

I still have the original project with both the dbpedia and the Neo4j jars. I also created a separate empty project that only has the Neo4j dependencies, and added this project to the first project's build path.

Now I can call both the wikification and Neo4j from the first project, and they work. My assumption is that this gives Lucene 5.5 priority over Lucene 3.6, and that dbpedia works with the newer version while Neo4j, which needs 5.5, does not work with the older one.

That might also be completely wrong, since I know very little about how build paths work in terms of priority etc., and I stumbled upon this solution by luck.

Anyway, it works, and that's what's important to me.

Edit 2: The proper solution was as simple as swapping the positions of dbpedia-spotlight and neo4j in the POM.


Answer:

Regarding https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/pom.xml

org.dbpedia.spotlight in version 0.7 requires lucene version 3.6.0.

The only Neo4j component with a dependency on Lucene that I found by googling is neo4j-lucene-index. You did not list it in your pom, but you may have it as a transitive (indirect) dependency. It works with Lucene version 5.5; see neo4j-lucene-index/3.0.3.

So what you have now simply does not fit together; you have to find versions of dbpedia and Neo4j that use the same version of Lucene.

Both links above list the different versions of dbpedia and neo4j with their dependencies. Go through the versions until you find ones that use the same Lucene version.

You can also run

mvn dependency:tree

on your project to get information about the versions used. Version conflicts should also be shown there.
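Besides `mvn dependency:tree`, it can help to check at runtime which jar a class was actually loaded from. A small hypothetical helper (`ClassOrigin`) using only the JDK: it asks the class for the code source of its protection domain, which for classes loaded from jars is the jar's location.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports which jar (or directory) a class came from,
// e.g. to see which Lucene jar "won" on the runtime classpath.
class ClassOrigin {

    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return className + " <- bootstrap/JDK"; // JDK classes have no code source
            }
            return className + " <- " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " <- not on classpath";
        }
    }

    public static void main(String[] args) {
        // Example class names; Fieldable exists in Lucene 3.x but not in 4.x and later.
        System.out.println(describe("org.apache.lucene.index.IndexWriter"));
        System.out.println(describe("org.apache.lucene.document.Fieldable"));
    }
}
```

If `org.apache.lucene.document.Fieldable` is reported as not on the classpath, a Lucene 4+ jar has won, and code compiled against Lucene 3.x will fail.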

Is it possible to use different versions?

Short answer: No

Maven uses the dependencies only to download the required jars and put them on the classpath of your project. It always puts only one version of each dependency's jar file on the classpath.

If that jar contains classes that one of the other jars (dbpedia or neo4j) is not compatible with, you will get problems (exceptions).

That is not a Maven-specific problem; it's just how Java works.

At runtime, neo4j and dbpedia use the same loaded Lucene classes, so they must share the same version of Lucene.

However, if you find a Lucene version that's close to what both dbpedia and neo4j use, you can give it a try; sometimes it works...
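Which single version ends up on the classpath is decided by Maven's dependency mediation: the declaration nearest to your project wins, and among equally near ones the first declaration in the POM wins. If both dbpedia-spotlight and neo4j pull in Lucene at the same depth, that rule would also explain why swapping them in the POM changed the outcome. A toy Java model of just that rule (not Maven's actual code; `MediationModel` and `Candidate` are hypothetical):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy model of Maven's mediation for a single artifact such as lucene-core:
// nearest declaration wins; at equal depth, the earlier declaration wins.
class MediationModel {

    /** One path through which the artifact reaches the project. */
    static final class Candidate {
        final String version;
        final int depth; // 1 = declared directly in your own pom.xml
        final int order; // position of the top-level dependency it arrives through

        Candidate(String version, int depth, int order) {
            this.version = version;
            this.depth = depth;
            this.order = order;
        }
    }

    /** Returns the version that would be put on the classpath, if any. */
    static Optional<String> resolve(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.<Candidate>comparingInt(c -> c.depth)
                        .thenComparingInt(c -> c.order))
                .map(c -> c.version);
    }
}
```

By the same rule, declaring lucene-core directly in your own POM (depth 1) overrides both transitive versions, with the compatibility caveats above.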

Question:

I have indexed the title property of a book node in Neo4j using the full-text capabilities of Lucene. When I want to search for a particular term (like wars) in a title across all nodes, I can do the following:

     IndexHits<Node> nodes = graphDb.index().forNodes("node_auto_index").query("title:wars");
     for (Node n : nodes) {
        //do something
     }
     nodes.close();

This returns the nodes sorted in order of some frequency score maintained by Lucene.

I would like to know the actual score associated with each of the nodes in the result. For example, if the index internally looks as follows:

wars -> id8:4, id3:3, id1:2

I would like to return the corresponding scores 4, 3 and 2 instead of just the IDs.

Thanks.


Answer:

You can use the following:

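IndexHits<Node> hits = graphDb.index().forNodes("node_auto_index").query("title:wars");
try {
  while (hits.hasNext()) {
    Node node = hits.next();
    float weight = hits.currentScore(); // Lucene score of the node just returned by next()
    // do something with node and weight
  }
} finally {
  hits.close(); // release the underlying index resources
}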