Hot questions for Using Neo4j in traversal

Question:

I am trying to map the result of TraversalDescription.traverse() to a list of custom Node and Relationship Object.

If I use Cypher with sdn, I can do the following:

@Query("WITH {0} AS ns, {1} AS ne " +
    "MATCH p=(n1{name:ns})-[*]-(n), (n2{name:ne}) " +
    "WHERE n=n2 " +
    "AND all(a IN nodes(p) WHERE single(x IN nodes(p) WHERE x=a)) " +
    "RETURN nodes(p) as persons, rels(p) as connections " +
    "ORDER BY size(nodes(p))")
List<GraphPath> getAllPaths(String startNode, String endNode);

then map to the GraphPath object that contains custom node and relationship Object:

GraphPath.java

@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
@QueryResult
public class GraphPath {    
    private List<Person> persons;
    private List<Connection> connections;    
}

Person.java

@Getter
@Setter
@NoArgsConstructor
@NodeEntity(label = "Person")
public class Person extends Entity{

    @Property(name = "name")
    private String fullName;

    @Property(name = "status")
    private String status;

    @Relationship(type = "CONNECTS", direction = Relationship.UNDIRECTED)
    private Set<Connection> personConnections = new HashSet<>();
}

Connection.java

@Getter
@Setter
@NoArgsConstructor
@RelationshipEntity(type = "CONNECTS")
public class Connection extends Entity{

    @Property(name="connection_id")
    private String connectId;

    @Property(name = "status")
    private String status;

    @StartNode
    private Person personSource;

    @EndNode
    private Person personTarget;
}

and Entity.java is just POJO with field id and overrided equals() and hashCode()

This is working fine in simple graph, but when the graph gets more complicated, the time to get the result will increase much more. I am aiming to find all possible paths between start and end node, and there is no repeated node or relationship in each path. I am hoping to use traversal API to eliminate some of the unwanted path (path that contains repeated node or relationship) on the run in order to reduce process time.

Here is the code I use, and graphDb is just GraphDatabaseService:

final Node startNode = graphDb.findNode(Labels.Person, "name", startNodeName);
final Node endNode = graphDb.findNode(Labels.Person, "name", endNodeName);
TraversalDescription td = graphDb.traversalDescription()
        .depthFirst()
        .evaluator(Evaluators.excludeStartPosition())
        .evaluator(Evaluators.includeWhereEndNodeIs(endNode))
        .relationships(Types.CONNECTS)
                .uniqueness(Uniqueness.NODE_PATH)
                .uniqueness(Uniqueness.RELATIONSHIP_PATH);
Traverser t = td.traverse(startNode)  

Now the question is, how can I map the result to the custom object I mentioned above? Doing it manually will get to a point where I have to deal with recursive object mapping (the Set of Connection in Person, and the target and source Person in Connection).


Answer:

As I wrote in the comments, I think I would just do the mapping manually, since the Path returned by the Traverser already contain the nodes and relationships and only the properties now need to be read.

By iterating on the Path, the GraphPath and its Persons and Connections can be constructed and completed sequentially. This code can obviously be refactored by extracting methods.

for (Path path : t) {
    GraphPath gp = new GraphPath();
    Person person = null;
    Connection connection = null;

    for (PropertyContainer pc : path) {
        if (pc instanceof Node) {
            Node node = (Node) pc;

            person = new Person();
            person.setId(node.getId());
            person.setFullName(node.getProperty("name", null));
            person.setStatus(node.getProperty("status", null));
            gp.getPersons().add(person);

            // No connection exists for the first node in the path
            if (connection != null) {
                // Find out which end has already been connected
                if (connection.getPersonSource() == null) {
                    connection.setPersonSource(person);
                } else {
                    connection.setPersonTarget(person);
                }
                person.getPersonConnections().add(connection);
            }
        } else {
            Relationship rel = (Relationship) pc;

            connection = new Connection();
            connection.setId(rel.getId());
            connection.setConnectId(rel.getProperty("connection_id", null));
            connection.setStatus(rel.getProperty("status", null));
            gp.getConnections().add(connection);

            // Find out which end has already been mapped
            if (rel.getStartNode().getId() == person.getId().longValue()) {
                connection.setPersonSource(person);
            } else {
                connection.setPersonTarget(person);
            }
            person.getPersonConnections().add(connection);
        }
    }
}

If you want a single Person (resp. Connection) instance for a given node, you can change the Person (resp. Connection) creation to look up a Map first, where you'll index the entities by id; you'll also have to change the wiring of the Person and Connection together to only set unset ends on the Connection, and not rely on the fact that one end or another is still null.

Question:

I have a schema, where nodes are connected by 2 types of relationship - r:A and r:B. I'm trying to write a pattern, which will find every path from node N to node M. This can be simply done by following cypher query:

match path = (n)-[:A|:B*]->(m) return path;

Unfortunately this is not what I need exactly. I need to find every path from (n) to (m) where depth via relation r:A can be infinite, but along the way only limited number of r:B relations can be use. In happy day scenario the cypher query would look like this:

match path = (n)-[:A*|:B*0..3]->(m) return path;

However cypher does not allow this syntax. I can't solve this problem even with usage of another "helping" node on the way:

match path = (n)-[:A*]->()-[:B*0..3]->(m) return path;

This does not match my need also, because the nodes can be interconnected in any possible way. For example:

(n)-[r:A]-()-[r:A]-()-[r:A]-(m)
(n)-[r:A]-(m)
(n)-[r:A]-()-[r:B]-()-[r:A]-()-[r:B]-()-[r:A]-()-[r:A]-(m)

Is there a way how this can be achieved? If not in cypher, can it be done in gremlin / neo4j traversal api / embedded functions of spring data neo4j project?

Thank's for the answers.


Answer:

Try this:

MATCH path = (n)-[:A|:B*]->(m) WITH path, relationships(path) AS r, filter(rel in relationships(path) WHERE type(rel) = 'B') AS Brels WITH path, reduce(Bcount = 0, rel IN Brels | Bcount + 1) AS Bcount WHERE Bcount <= 3 RETURN path

I don't know if I understand the question completely clear. Just let me know.

EDIT: I added the second query after comments. This solution is ugly but it is good workaround.

MATCH path = (n)-[:A|:B*]-(m) WITH path, filter(rel in relationships(path) WHERE type(rel) = 'B') AS Brels WITH path, reduce(Bcount = 0, rel IN Brels | Bcount + 1) AS Bcount WHERE Bcount <= 3 WITH path, relationships(path) AS rels WITH path, rels, reduce(count = 0, rel IN rels | count + 1) AS count WITH path, rels, range(0,count-1) as counter WITH path, reduce(x = 0, c IN counter | CASE WHEN (type(rels[c])='B' AND type(rels[c+1])='B') THEN x+200000 ELSE x+1 END) AS countX WHERE countX<200000 RETURN path, countX

Question:

I am writing some code for traversing neo4j database (Traversal API). I am using the following dependency:

    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>server-api</artifactId>
        <version>2.2.0-M02</version>
    </dependency>

IDE shows me that the following methods are deprecated:

   GraphDatabaseService.findNodesByLabelAndProperty()  
   GlobalGraphOperations.getAllNodesWithLabel()

I couldn't find any information on their replacement. Any suggestions?


Answer:

Although the javadocs aren't posted in an easy place online to google, you can find a jar of the javadocs on maven central. Inside of that, just unzip the JAR and you get the full javadocs, which get to your answer.

GraphDatabaseService.findNodesByLabelAndProperty() is deprecated in favor of GraphDatabaseService.findNodes(Label, String, Object)

GlobalGraphOperations.getAllNodesWithLabel() is deprecated in favor of GraphDatabaseService.findNodes(Label)

Hope this helps. For other libraries in other places, it's a trick worth remembering that a lot of library javadocs get packaged as separate JARs you can find by maven central, so if you need the javadocs for one very specific version or release (as was the case here) that's usually a go-to trick of mine, if google is frustrating me and only giving me javadocs for a different or incompatible version.

Question:

I am performing a custom traversal in Neo4J, using my own evaluator. In the traversal are two nodes, connected by two different relationships. What I'm seeing is that only one of the two relationships will be walked during the traversal.

However, my custom evaluation changes its behavior based on whether both relationships are present.

It seems like during a traversal, Neo4J maintains a set of visited nodes, and if a candidate path ends at a node that has already been visited, then that path is never sent to my evaluator. Is there a way around this? How can I have a custom evaluator examine every possible path to the nodes?

Here's a quick example:

Say that the graph looks like this:

             E----D----A====B----C

The traversal begins at A. A has two different relationships tying it to B (of two different types). All of the remaining nodes are connected by only 1 relationship. The goal of the evaluator is to return A-D, A-B, and B-C, but not D-E. The determination that B-C is valid comes from the fact that there are two relationships between A and B.

How can this be solved?


Answer:

You may need to think your use case through a bit more carefully.

One suggestion is that when you use the traversal framework in java, basically you can build a TraversalDescription and then iterate through what comes back from it by relationships, rather than by Paths or by Nodes. If your primary complaint is that each node is visited only once, you can change the TraversalDescription to specify RELATIONSHIP_GLOBAL guaranteeing that all relationships will be followed, whether or not that causes you to hit a node more than once.

More broadly, traversers don't tend to go over the same material more than once because if they did, you'd need to be ultra careful about specifying a termination condition. If hitting certain nodes or relationships more than once is OK, when do you know that you're done?

Question:

I have a cypher query which sum up the amount of every child and sets this sum to the parent node.

MATCH (n:product)-[:COSTS*0..]->(child) WITH n,  sum(child.amount) as sum SET n.costs=sum;
product1(amount:20) -[:COSTS]-> product2(amount:10)
product2 -[:COSTS]-> product3(amount:5)
product2 -[:COSTS]-> product4(amount:7)

So the results for my products are:

product1.costs = 42
product2.costs = 22

Can someone give me a hint how to do that with the neo4j traversal framework in java?

I am using neo4j version 2.2.5 and the version 2.2.5 of the core api for neo4j.


Answer:

Here is my approach for your questions

DATA

CREATE (p1:Product {id: 1, amount: 30}),
       (p2:Product {id: 2, amount: 20}),
       (p3:Product {id: 3, amount: 10}),
       (p4:Product {id: 4, amount: 40}),
       (p1)-[:COSTS]->(p2),
       (p2)-[:COSTS]->(p3),
       (p2)-[:COSTS]->(p4)

Code

try (Transaction tx = getDatabase().beginTx()) {
    GraphDatabaseService database = getDatabase();

    Node rootProduct = database.findNode(DynamicLabel.label("Product"), "id", 1);

    int sum = getChildrenSum(rootProduct);

    rootProduct.setProperty("costs", sum);

    tx.success();
}

public int getChildrenSum(Node product) {
    int sum = 0;

    final DynamicRelationshipType relationshipType = DynamicRelationshipType.withName("COSTS");
    final Direction direction = Direction.OUTGOING;

    if (product.hasRelationship(relationshipType, direction)) {
        for (Relationship costs : product.getRelationships(relationshipType, direction)) {
            final Node child = costs.getEndNode();
            final String propertyName = "amount";

            if (child.hasProperty(propertyName)) {
                sum += Integer.parseInt(child.getProperty(propertyName).toString());
            }
            childrenSum += getChildrenSum(child);

            sum += childrenSum;

            child.setProperty("costs", childrenSum);
        }
    }

    return sum;
}

Question:

I've coded a traversal in Java (using Eclipse IDE). The main() program accepts input arguments (classic args[]), it calls the traversal using one of them as the start node, the other args are used to perform some data filtering on traversal results. Thus, I can run this JAR on a command line with corresponding arguments.

Question: how can quickly test this JAR as a RESTful API? e.g. sending HTTP GET with its parameters as input arguments ? what is best practice ?

Thank you


Answer:

The best way is to create an unmanaged extension which will accept query parameters that will call your traversal

Example from the doc :

@Path( "/helloworld" )
public class HelloWorldResource
{
    private final GraphDatabaseService database;

    public HelloWorldResource( @Context GraphDatabaseService database )
    {
        this.database = database;
    }

    @GET
    @Produces( MediaType.TEXT_PLAIN )
    @Path( "/{nodeId}" )
    public Response hello( @PathParam( "nodeId" ) long nodeId )
    {
        // Do stuff with the database
        return Response.status( Status.OK ).entity(
                ("Hello World, nodeId=" + nodeId).getBytes( Charset.forName("UTF-8") ) ).build();
    }
}

You can find a test example here : https://github.com/neo4j/neo4j/blob/2.2.1/community/server-examples/src/test/java/org/neo4j/examples/server/unmanaged/UnmanagedExtensionsDocIT.java

Question:

I am playing around with Neo4j and so far I have a geographical graph where an AIRPORT is connect to a CITY, the CITY to a COUNTRY and the COUNTRY to a CONTINENT, as depicted in the picture

Labels on the arrows translate to org.neo4j.graphdb.RelationshipType into my code. So far, I can build the path between the start node MXP to the end node LTN using the following mono-directional traversal.

Traverser traverse = database.traversalDescription().depthFirst()
  .relationships(CITY, BOTH)
  .relationships(CONTINENT, BOTH)
  .relationships(COUNTRY, BOTH)
  .relationships(REGION, BOTH)
  .evaluator(Evaluators.includeWhereEndNodeIs(endNode)).traverse(startNode);

With this, I get a single path MXP -> Milan -> Italy -> Europe <- England <- London <- LTN, which is correct given the graph description, the traversal description and of course my understanding my understanding of such description.

I am trying to change this code to perform a bidirectional traversal, meaning I want to start from both MXP and LTN and stop at the collision point. I tried with the following snippet, where comments mean my understanding so it may easier to point out the problem.

TraversalDescription startSide = database.traversalDescription().depthFirst() //Depth first algorithm
  .relationships(CITY, OUTGOING) //consider CITY relationship, only outgoing
  .relationships(REGION, OUTGOING) //consider REGION relationship, only outgoing
  .relationships(COUNTRY, OUTGOING) //consider COUNTRY relationship, only outgoing
  .relationships(CONTINENT, OUTGOING) //consider CONTINENT relationship, only outgoing
  .evaluator(Evaluators.excludeStartPosition()); //do not consider the starting point. 
                                               //Here I tried also with all, with the same result
                                               //with includeWhereEndNodeIs(endNode), again with same result
                                               //and combining includeWhereEndNodeIs and excludeStartPosition, once more with same result.
                                               //All tries I mirrored for the endSide description, changing endNode to startNode where I feel it was needed

TraversalDescription endSide = database.traversalDescription().depthFirst()
  .relationships(CITY, OUTGOING)
  .relationships(REGION, OUTGOING)
  .relationships(COUNTRY, OUTGOING)
  .relationships(CONTINENT, OUTGOING)
  .evaluator(Evaluators.excludeStartPosition());

List<Node> asList = Arrays.asList(startNode, endNode);
Traverser traverse = database.bidirectionalTraversalDescription().endSide(endSide).startSide(startSide).traverse(asList, asList);

Here, instead of the path I am getting with the monodirectional traversal try, I get two paths, one with only MXP and one with only LTN.

At this point I seriously believe I am completely misunderstanding the bidirectional traversal and maybe even its purpose. Where is my mistake? Why I do not get the same output?


Answer:

I finally got a working solution. The problem in my code was related to the concept of uniqueness. Interesting points for my problem are

Sets the rules for how positions can be revisited during a traversal as stated in Uniqueness. Default if not set is NODE_GLOBAL.

NODE_GLOBAL uniqueness: No node in the entire graph may be visited more than once. This could potentially consume a lot of memory since it requires keeping an in-memory data structure remembering all the visited nodes.

NODE_PATH uniqueness: A node may not occur previously in the path reaching up to it.

These descriptions are somehow different from the official API so I played around trying different combination and ended up with the following code:

TraversalDescription bothSide = database.traversalDescription().depthFirst()
    .relationships(CITY, OUTGOING)
    .relationships(REGION, OUTGOING)
    .relationships(COUNTRY, OUTGOING)
    .relationships(CONTINENT, OUTGOING)
    .uniqueness(NODE_PATH);

Traverser traverser = database
    .bidirectionalTraversalDescription()
    .startSide(bothSide)
    .endSide(bothSide)
    .traverse(node, endNode);

Basically, I defined a common TraversalDescription for both the end and the start side, where I want to follow only OUTGOING relationships and I want to consider paths only where nodes are unique inside the path itself.

Then, I defined a bidirectional traverser which simply sets up the end and the start side and traverses the graph from the starting node node to the end node endNode (well, actually it traverses from start to end AND from end to start at the same time and stops when the two traversal collide, merging the resulting paths into a single path leading from start to end).

NOTE: I am not completely sure about the meaning of NODE_GLOBAL, since in my database each node represents a geographic entity, so each node in the path MXP -> Milan -> Italy -> Europe <- England <- London <- LTN should be visited only once and thus there should be no difference between NODE_GLOBAL and NODE_PATH in this context.

Question:

I assume I have an error in my code, but I can't find which one I have three nodes : a, b and c there are linked so (a)-[r1]->(b)<-[r2]-(c) (I'm 100% sure of this, I checked with neo4j Community)

But my Program can only find the first and the last one. The second one, b, is always ignored. Here is my method :

static List<Node> getNodes(Node startNode,final Node endNode,boolean uniqueResult,List<RelationshipType> relationshipTypes)
{
    List<Node> result=new ArrayList<>();
    try(Transaction tx=getInstance()._graph.beginTx())
    {
        TraversalDescription td = getInstance()._graph.traversalDescription().breadthFirst();
        for(RelationshipType relationshipType:relationshipTypes)
            td=td.relationships(relationshipType);
        td=td.evaluator(Evaluators.excludeStartPosition());
        if(!uniqueResult)
            td=td.uniqueness(Uniqueness.NODE_PATH);
        td=td.evaluator(new Evaluator() {
            @Override
            public Evaluation evaluate(Path path) {
                boolean isEndNode=path.endNode().equals(endNode);
                return Evaluation.of(isEndNode, !isEndNode);
            }
        });      
        Traverser tr=td.traverse( startNode );
        for(Path path:tr)   
            result.add(path.endNode());

        tx.success();
    }   
    return result;
}

startNode is a endNode is c uniqueResult is true (but I tried with false, no change) relationShipTypes contains r1 and r2.

I have no clue, why it doesn't work Thanks for your help


Answer:

A Path is a collection of node-relationship-node.

With

  • length() you can access the path length, i.e the number of rels
  • nodes() returns an Iterable of nodes in the path
  • relationship() returns an Iterable of relationships in the path
  • path itself is an Iterable of property containers
  • startNode() and endNode() return those respectively
  • lastRelationship() returns the last relationship in the path, as it is more frequently used

See also: http://docs.neo4j.org/chunked/stable/tutorial-traversal-java-api.html#_path

Question:

I'm trying to create an algorithm in Neo4j using the java API. The algorithm is called GRAIL (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.169.1656&rep=rep1&type=pdf) and it assigns labels to a graph for later answering reachability queries.

This algorithm uses postorder depth first search but with random traversal each time (each child of a node is visited randomly in each traversal). In the neo4j java api there is an algorithm doing this (https://github.com/neo4j/neo4j/blob/7f47df8b5d61b0437e39298cd15e840aa1bcfed8/community/kernel/src/main/java/org/neo4j/graphdb/traversal/PostorderDepthFirstSelector.java) without the randomness and i can't seem to find a way to do this.

My code has a traversal description in which i want to add a custom order (BranchOrderingPolicy) in order to achieve the before mentioned algorithm. like this:

 .order(**postorderDepthFirst()**)

Answer:

The answer to my question came rather easy but after a lot of thinking. I just had to alter the path expander (i created my own) which returns the relationhipss that the traversal takes as next and there a simple line of code to randomize the relationships. The code is :

public class customExpander implements PathExpander {

private final RelationshipType relationshipType;
private final Direction direction;
private final Integer times;

public customExpander (RelationshipType relationshipType, Direction direction ,Integer times)
{
    this.relationshipType = relationshipType;
    this.direction = direction;
    this.times=times;
}



@Override
public Iterable<Relationship> expand(Path path, BranchState state)
{
        List<Relationship> results = new ArrayList<Relationship>();
        for ( Relationship r : path.endNode().getRelationships( relationshipType, direction ) )
        {  
           results.add( r ); 
        }
        Collections.shuffle(results);
        }       
    return results;         
    }


@Override
public PathExpander<String> reverse()
{
    return null;
}

}