Hot questions on using Neo4j with MongoDB

Question:

I am using mongo-connector and the Neo4j doc manager to stream data into my Neo4j instance. The data being inserted into the Mongo database comes from a Java application that uses Morphia to serialize its objects.

The objects in my Java application are tied together with references to each other, and Morphia correctly translates that into the Mongo database. Here is an example of two documents that link to one another:

{
    "_id" : ObjectId("58fe606a43d7e22b34f65a16"),
    "name" : "client",
    "part" : 1
}

The Mongo document that points to the related one:

{
    "_id" : ObjectId("58fe606d43d7e22b34f65a1a"),
    "correlatedObject" : ObjectId("58fe606a43d7e22b34f65a16"),
    "name" : "guest",
    "part" : 2
}

So you can see how the first example is a regular document with no correlatedObject field, while the second document points to the first. Now, it's my understanding that the Neo4j doc manager should detect this relationship and build a query based on it. But as far as I can see in Neo4j, this relationship is never made and the two entities are never tied together in the graph.

So my question is: how do I define relationships - either in the doc manager configuration or in a format that the doc manager will understand - so that the two entities can be seen as related items in the Neo4j graph?


Answer:

Good question! According to their docs, this is how you do it:

Creating relationships by _id reference
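In case the doc manager does not pick up the reference automatically, you can also create the relationship yourself once both documents have landed in Neo4j. Below is a minimal sketch using the Neo4j Java driver (4.x API assumed); the connection details, the Document label, and the CORRELATED_TO relationship type are my own assumptions, not something the doc manager prescribes:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class LinkByIdReference {
    public static void main(String[] args) {
        // Assumed connection details; adjust them for your instance.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // Match pairs where one node's correlatedObject field holds the
            // other node's Mongo _id, and tie them together explicitly.
            session.run(
                "MATCH (a:Document), (b:Document) " +
                "WHERE b.correlatedObject = a._id " +
                "MERGE (b)-[:CORRELATED_TO]->(a)").consume();
        }
    }
}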

Question:

I want to move my data from MongoDB to Neo4j, so I exported my MongoDB documents as .csv. As you can read here, I have a problem with the uniform array, so I wrote a Java program to fix it. Here is the .csv exported from MongoDB (note the difference in the uniform array):

_id,official_name,common_name,country,started_by.day,started_by.month,started_by.year,championship,stadium.name,stadium.capacity,palmares.first_prize,palmares.second_prize,palmares.third_prize,palmares.fourth_prize,average_age,squad_value,foreigners,uniform
0,yaDIXxLAOV,WWYWLqPcYM,QsVwiNmeGl,7,9,1479,oYKGgstIMv,qskcxizCkd,8560,10,25,9,29,16,58,6,"[""first_colour"",""second_colour"",""third_colour""]"

Here is how it must look in order to be imported into Neo4j:

_id,official_name,common_name,country,started_by.day,started_by.month,started_by.year,championship,stadium.name,stadium.capacity,palmares.first_prize,palmares.second_prize,palmares.third_prize,palmares.fourth_prize,average_age,squad_value,foreigners,uniform.0,uniform.1,uniform.2
0,yaDIXxLAOV,WWYWLqPcYM,QsVwiNmeGl,7,9,1479,oYKGgstIMv,qskcxizCkd,8560,10,25,9,29,16,58,6,first_colour,second_colour,third_colour

My code works, but I have to convert 500k lines of the .csv file and the program is far too slow (it's still running after 20 minutes :/):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class ConvertireCSV {

    public static void main(String[] args) throws IOException {

        FileReader f;
        f=new FileReader("output.csv");

        BufferedReader b;
        b=new BufferedReader(f);

        String firstLine= b.readLine();
        int uniform = firstLine.indexOf("uniform");
        firstLine=firstLine.substring(0, uniform);
        firstLine = firstLine + "uniform.0,uniform.1,uniform.2\n";

        String line="";
        String csv="";

        while(true) {
            line=b.readLine();
            if(line==null)
                break;
            int u = line.indexOf("\"[");
            line=line.substring(0, u);
            line=line + "first_colour,second_colour,third_colour \n";
            csv=csv+line;                   
        }

        File file = new File("outputForNeo4j.csv");

        if(file.createNewFile()) {
            PrintWriter pw = new PrintWriter(file); 
            pw.println(firstLine + csv);
            System.out.println("New file \"outputForNeo4j.csv\" created.");
            pw.flush();
            pw.close();
        }
    }
}

How can I make it faster?


Answer:

Okay, some basic ways to improve your code:

  1. Make sure your variables have the minimal scope they require. If you don't need line outside your loop, don't declare it outside the loop.
  2. Concatenating plain strings in a loop is slow, because each + copies the whole accumulated string. Use a StringBuilder to speed things up there.
  3. Why are you buffering the whole result in a string at all? That is a waste of memory. Just open the output stream to your target file and write each line as you process it.

Examples:

I don't think you need an example for the first point. For the second, things could look like this:

...
StringBuilder csv = new StringBuilder();
while(true) {
    ...
    csv.append(line);
}
...
if(file.createNewFile()) {
    ...
    pw.println(firstLine + csv.toString());
    ...
}

For the third point the rewriting would be a little more extensive:

public static void main(String[] args) throws IOException {
    FileReader f;
    f=new FileReader("output.csv");

    BufferedReader b;
    b=new BufferedReader(f);

    String firstLine= b.readLine();
    int uniform = firstLine.indexOf("uniform");
    firstLine=firstLine.substring(0, uniform);
    firstLine = firstLine + "uniform.0,uniform.1,uniform.2\n";

    File file = new File("outputForNeo4j.csv");
    if(!file.createNewFile()) {
        // all work would be for nothing! Bailing out.
        return;
    }

    PrintWriter pw = new PrintWriter(file); 
    pw.print(firstLine);

    while(true) {
        String line=b.readLine();
        if(line==null)
            break;
        int u = line.indexOf("\"[");
        line=line.substring(0, u);
        line=line + "first_colour,second_colour,third_colour \n";
        pw.print(line);                   
    }

    System.out.println("New file \"outputForNeo4j.csv\" created.");
    pw.flush();
    pw.close();
    b.close();
}
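
As a further tweak (my own sketch, not part of the original answer): since Java 7 you can let try-with-resources close both streams for you, which also covers the error paths the version above ignores. Note that, unlike the createNewFile() check, this variant simply overwrites an existing output file:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class ConvertireCSVStreaming {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes reader and writer automatically,
        // even if an exception is thrown mid-conversion.
        try (BufferedReader b = new BufferedReader(new FileReader("output.csv"));
             PrintWriter pw = new PrintWriter("outputForNeo4j.csv")) {
            String firstLine = b.readLine();
            int uniform = firstLine.indexOf("uniform");
            pw.println(firstLine.substring(0, uniform) + "uniform.0,uniform.1,uniform.2");

            String line;
            while ((line = b.readLine()) != null) {
                // Assumes every data line contains the quoted uniform array,
                // just like the original code does.
                int u = line.indexOf("\"[");
                pw.println(line.substring(0, u) + "first_colour,second_colour,third_colour");
            }
        }
        System.out.println("New file \"outputForNeo4j.csv\" created.");
    }
}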

Question:

Limiting the number of returned records in MongoDB is as simple as db.collection.find().limit(n). However, I'd like to issue the equivalent query from Neo4j.

Given that a find query is issued from Neo4j as apoc.mongodb.find(host, db, collection, query, project, sort), I find it difficult to see how to tell the MongoDB instance to limit the returned results before they are streamed to Neo4j.

I am aware of Cypher's LIMIT clause; however, that feels like bad practice, considering the amount of redundant data that would be streamed from Mongo first.

Is there a way to apply a limit to the query results before they are streamed?


Answer:

At the moment this is not available out of the box, but you can add the functionality yourself.

In the APOC source code, make the following changes:

neo4j-apoc-procedures/src/main/java/apoc/mongodb/MongoDB.java:


@Procedure
@Description("apoc.mongodb.find(host-or-port,db-or-null,collection-or-null,query-or-null,projection-or-null,sort-or-null,[compatibleValues=true|false],pagination-or-null) yield value - perform a find,project,sort operation on mongodb collection")
public Stream<MapResult> find(@Name("host") String hostOrKey, @Name("db") String db, @Name("collection") String collection, @Name("query") Map<String, Object> query, @Name("project") Map<String, Object> project, @Name("sort") Map<String, Object> sort, @Name(value = "compatibleValues", defaultValue = "false") boolean compatibleValues, @Name(value = "pagination", defaultValue = "{}") Map<String, Object> pagination) {
    // pagination is the newly added, optional parameter; it is passed
    // through to the extended Coll.find(...) below.
    return getMongoColl(hostOrKey, db, collection, compatibleValues).find(query, project, sort, pagination).map(MapResult::new);
}

interface Coll extends Closeable {

...

    Stream<Map<String, Object>> find(Map<String, Object> query, Map<String, Object> project, Map<String, Object> sort, Map<String, Object> pagination);

neo4j-apoc-procedures/src/main/java/apoc/mongodb/MongoDBColl.java:


@Override
public Stream<Map<String, Object>> find(Map<String, Object> query, Map<String, Object> project, Map<String, Object> sort, Map<String, Object> pagination) {
    FindIterable<Document> documents = query == null ? collection.find() : collection.find(new Document(query));
    if (project != null) documents = documents.projection(new Document(project));
    if (sort != null) documents = documents.sort(new Document(sort));
    if (pagination != null) {
        Object skip = pagination.get("skip");
        Object limit = pagination.get("limit");
        if (skip != null) documents = documents.skip(Integer.parseInt(String.valueOf(skip)));
        if (limit != null) documents = documents.limit(Integer.parseInt(String.valueOf(limit)));
    }
    return asStream(documents);
}
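
Once APOC is rebuilt and redeployed with these changes, the skip/limit is applied by MongoDB itself, before anything is streamed to Neo4j. A hypothetical call through the Neo4j Java driver (4.x API assumed) could then look like this; the URIs, credentials, database, and collection names are all placeholders:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class MongoFindWithLimit {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // The pagination map is the new, last argument: MongoDB applies
            // skip/limit server-side, so only 100 documents reach Neo4j.
            session.run(
                "CALL apoc.mongodb.find('mongodb://localhost:27017', 'mydb', 'mycoll', " +
                "{}, null, null, false, {skip: 0, limit: 100}) " +
                "YIELD value RETURN value")
                .forEachRemaining(record -> System.out.println(record.get("value")));
        }
    }
}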