Hot questions for Using Neo4j in Gremlin

Question:

I spent a week in the Gremlin shell trying to compose one query to get all incoming and outgoing vertices, including their edges and directions. I tried everything.

g.V("name","testname").bothE.as('both').select().back('both').bothV.as('bothV').select(){it.map()}

The output I need is (just an example structure):

[v{'name':"testname"}]___[ine{edge_name:"nameofincomingedge"}]____[v{name:'nameofconnectedvertex'}]

[v{'name':"testname"}]___[oute{edge_name:"nameofoutgoingedge"}]____[v{name:'nameofconnectedvertex'}]

So I just want to get: 1) all vertices with an exact name, 2) each edge of each such vertex (including its direction, inE or outE), and 3) the connected vertex. Ideally, I would then get their map() so I have the complete object properties. I don't care about the output style; I just need all of the information present so I can manipulate it afterwards. I need this to practice my Gremlin, but Neo4j examples are welcome. Thanks!


Answer:

There's a variety of ways to approach this. Here are a few ideas that will hopefully inspire you to an answer:

gremlin> g = TinkerGraphFactory.createTinkerGraph()                                                                                                        
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V('name','marko').transform{[v:it,inE:it.inE().as('e').outV().as('v').select().toList(),outE:it.outE().as('e').inV().as('v').select().toList()]}
==>{v=v[1], inE=[], outE=[[e:e[9][1-created->3], v:v[3]], [e:e[7][1-knows->2], v:v[2]], [e:e[8][1-knows->4], v:v[4]]]}

The transform converts each incoming vertex to a Map and does an internal traversal over its in/out edges. You could also use path() as follows to get a similar output:

gremlin> g.V('name','marko').transform{[v:it,inE:it.inE().outV().path().toList(),outE:it.outE().inV().path().toList()]}
==>{v=v[1], inE=[], outE=[[v[1], e[9][1-created->3], v[3]], [v[1], e[7][1-knows->2], v[2]], [v[1], e[8][1-knows->4], v[4]]]}
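To make the shape of that result concrete, here is a plain-Python sketch (not Gremlin) that builds the same {v, inE, outE} map from an in-memory edge list. The Edge tuple and the incident() helper are hypothetical, and the edge list assumes the standard TinkerPop toy graph (edges 7 to 12).

```python
from collections import namedtuple

# Hypothetical in-memory edge: id, label, out-vertex (tail), in-vertex (head).
Edge = namedtuple("Edge", "id label out_v in_v")

# Assumed copy of the TinkerPop toy graph's edges.
EDGES = [
    Edge(7, "knows", 1, 2),
    Edge(8, "knows", 1, 4),
    Edge(9, "created", 1, 3),
    Edge(10, "created", 4, 5),
    Edge(11, "created", 4, 3),
    Edge(12, "created", 6, 3),
]

def incident(v):
    """Mimic the transform{} above: one map per vertex with inE/outE lists."""
    return {
        "v": v,
        # inE().as('e').outV().as('v'): each incoming edge plus its tail vertex
        "inE": [{"e": e, "v": e.out_v} for e in EDGES if e.in_v == v],
        # outE().as('e').inV().as('v'): each outgoing edge plus its head vertex
        "outE": [{"e": e, "v": e.in_v} for e in EDGES if e.out_v == v],
    }

result = incident(1)
print(result["inE"])                                   # []
print([(d["e"].id, d["v"]) for d in result["outE"]])  # [(7, 2), (8, 4), (9, 3)]
```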

I wrote these answers using TinkerPop 2.x, since judging from your syntax that is what you were using. TinkerPop 3.x is now available, and if you are just getting started you should take a look at what the latest version has to offer:

http://tinkerpop.incubator.apache.org/

Under 3.0 syntax you might do something like this:

gremlin> g.V().has('name','marko').as('a').bothE().bothV().where(neq('a')).path()
==>[v[1], e[9][1-created->3], v[3]]
==>[v[1], e[7][1-knows->2], v[2]]
==>[v[1], e[8][1-knows->4], v[4]]

I know you wanted the direction of each edge in the output, but that's easy enough to detect by analyzing the path: if the edge's out-vertex is the starting vertex, the edge is outgoing; otherwise it is incoming.
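For instance, each result of the query above is a [start, edge, other] path, and the direction follows from comparing the edge's out-vertex with the start vertex. A minimal plain-Python sketch of that check; the Edge tuple (whose out_v/in_v fields stand in for Gremlin's outV()/inV()) and the direction() helper are hypothetical:

```python
from collections import namedtuple

# Hypothetical stand-in for a Gremlin edge: it records its tail (out_v) and head (in_v).
Edge = namedtuple("Edge", "id label out_v in_v")

def direction(path):
    """Classify a [start, edge, other] path as 'OUT' or 'IN' relative to start."""
    start, edge, other = path
    if edge.out_v == start and edge.in_v == other:
        return "OUT"   # start -[edge]-> other
    if edge.in_v == start and edge.out_v == other:
        return "IN"    # start <-[edge]- other
    raise ValueError("edge does not connect the two path vertices")

paths = [
    [1, Edge(9, "created", 1, 3), 3],
    [1, Edge(7, "knows", 1, 2), 2],
    [3, Edge(9, "created", 1, 3), 1],  # the same edge seen from vertex 3's side
]
for p in paths:
    print(p[0], direction(p), p[1].label, p[2])
```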

UPDATE: Here's the above query written with Daniel's suggestion of otherV usage:

gremlin> g.V().has('name','marko').bothE().otherV().path()
==>[v[1], e[9][1-created->3], v[3]]
==>[v[1], e[7][1-knows->2], v[2]]
==>[v[1], e[8][1-knows->4], v[4]]

To see the data, you can use by() to pick apart each Path object. This extension of the above query applies valueMap(true) to each element of each Path:

gremlin> g.V().has('name','marko').bothE().otherV().path().by(__.valueMap(true)) 
==>[{label=person, name=[marko], id=1, age=[29]}, {label=created, weight=0.4, id=9}, {label=software, name=[lop], id=3, lang=[java]}]
==>[{label=person, name=[marko], id=1, age=[29]}, {label=knows, weight=0.5, id=7}, {label=person, name=[vadas], id=2, age=[27]}]
==>[{label=person, name=[marko], id=1, age=[29]}, {label=knows, weight=1.0, id=8}, {label=person, name=[josh], id=4, age=[32]}]

Question:

Consider the above graph. I would like a gremlin query that returns all nodes that have multiple edges between them as shown in the graph.

This graph was obtained using the Neo4j Cypher query: MATCH (d:dest)-[r]-(n:cust) WITH d,n, count(r) as popular RETURN d, n ORDER BY popular desc LIMIT 5

For example, between RITUPRAKA... and Asia there are 8 edges, hence the query returned those 2 nodes along with the edges; similarly for the other nodes.

Note: the graph has other nodes with only a single edge between them; these nodes should not be returned.

I would like the same thing in Gremlin.

I have used the query below:

g.V().as('out').out().as('in').select('out','in').groupCount().unfold().filter(select(values).is(gt(1))).select(keys)

It displays out:v[1234],in:v[3456] .....

but instead of displaying the IDs of the nodes, I want to display their values, like out:ICIC1234,in:HDFC234

I modified the query as follows:

g.V().values("name").as('out').out().as('in').values("name").select('out','in').groupCount().unfold().filter(select(values).is(gt(1))).select(keys)

but it shows a ClassCastException, along with a message that each vertex will be traversed and that indexes should be used for fast iteration.


Answer:

Your graph doesn't seem to indicate that bi-directional edges are possible, so I will answer with that assumption in mind. Here's a simple sample graph. Please consider including one in future questions: it makes it much easier than pictures and textual descriptions for readers to understand your question and to get started writing a Gremlin traversal to help you:

g.addV().property(id,'a').as('a').
  addV().property(id,'b').as('b').
  addV().property(id,'c').as('c').
  addE('knows').from('a').to('b').
  addE('knows').from('a').to('b').
  addE('knows').from('a').to('c').iterate()

You can see that vertex "a" has two outgoing edges to "b" and one outgoing edge to "c", so we should get the "a b" vertex pair. One way to get this is with:

gremlin> g.V().as('out').out().as('in').
......1>   select('out','in').
......2>   groupCount().
......3>   unfold().
......4>   filter(select(values).is(gt(1))).
......5>   select(keys)
==>[out:v[a],in:v[b]]

The above traversal uses groupCount() to count the number of times each pair of "out" and "in" labelled vertices shows up (i.e. the number of edges between them). It uses unfold() to iterate through the resulting Map of <Vertex Pair,Count> (or more literally <Map<String,Vertex>,Long>, since select('out','in') produces a Map) and keeps only the entries with a count greater than 1 (i.e. multiple edges). The final select(keys) drops the count, as it is no longer needed (i.e. we just need the keys, which hold the vertex pairs for the result).
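The same count-then-filter logic, sketched in plain Python over the sample edge list (a model of what the traversal computes, not Gremlin). It also covers the question's actual goal of reporting names rather than ids; the name values are hypothetical, two of them borrowed from the question. In Gremlin itself, one way to get names is to modulate the select step, as in select('out','in').by('name'). The modified query in the question fails because values("name") turns each traverser into a String, so the following out() has no vertex to step from.

```python
from collections import Counter

# Sample graph from above: a->b twice, a->c once, as (out vertex, in vertex) pairs.
EDGES = [("a", "b"), ("a", "b"), ("a", "c")]
# Hypothetical 'name' property values for each vertex.
NAMES = {"a": "ICIC1234", "b": "HDFC234", "c": "XYZ999"}

def multi_edge_pairs(edges, min_count=2):
    counts = Counter(edges)   # groupCount() over (out, in) pairs
    # unfold() + filter(select(values).is(gt(1))) + select(keys)
    return [pair for pair, n in counts.items() if n >= min_count]

pairs = multi_edge_pairs(EDGES)
# Report names instead of ids, analogous to select('out','in').by('name')
named = [{"out": NAMES[o], "in": NAMES[i]} for o, i in pairs]
print(pairs)   # [('a', 'b')]
print(named)   # [{'out': 'ICIC1234', 'in': 'HDFC234'}]
```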

Perhaps another way to go is with this method:

gremlin> g.V().filter(outE()).
......1>   project('out','in').
......2>     by().
......3>     by(out().
......4>        groupCount().
......5>        unfold().
......6>        filter(select(values).is(gt(1))).
......7>        select(keys)).
......8>   select(values)
==>[v[a],v[b]]

This approach with project() avoids the heavier memory requirement of one big groupCount() over the whole graph. Instead, it builds a smaller Map per individual vertex, which becomes eligible for garbage collection at the end of the by() (essentially once per initial vertex processed).
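A plain-Python sketch of this per-vertex variant, using the same sample edge list: each start vertex gets its own small Counter instead of one graph-wide map. One difference to keep in mind: the by() above keeps only the first qualifying neighbour for each start vertex, whereas this generator yields every qualifying pair.

```python
from collections import Counter, defaultdict

EDGES = [("a", "b"), ("a", "b"), ("a", "c")]  # (out vertex, in vertex)

# Adjacency list: out-vertex -> one entry per outgoing edge
out_adj = defaultdict(list)
for o, i in EDGES:
    out_adj[o].append(i)

def multi_edge_pairs_per_vertex(adj, min_count=2):
    """Yield (out, in) pairs one start vertex at a time, like project()."""
    # Iterating the adjacency list visits only vertices with outE, like filter(outE())
    for v, targets in adj.items():
        counts = Counter(targets)   # small per-vertex groupCount(); rebound each iteration
        for target, n in counts.items():
            if n >= min_count:
                yield (v, target)

print(list(multi_edge_pairs_per_vertex(out_adj)))   # [('a', 'b')]
```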

Question:

I have a Neo4j cypher query that looks like this:

MATCH (b:VertexType1)-[e1]-(a:VertexType2)-[e2]-(c:VertexType1)-[e3]-(d)

What this translates to in english (I think) is:

"Find me a chain of vertices 'b','a','c','d' of type 'VertexType1', 'VertexType2', 'VertexType1' and 'VertexTypeAny' in that order connected by any kind of edges 'e1','e2' and 'e3'"

What's the equivalent of this using OrientDB and Gremlin in Java?

It seems like I would want to start with:

for(Vertex a : orientGraph.getVerticesOfClass("VertexType2")){

}

and then start my Gremlin code with vertex 'a' followed by a 'both', so that I spread out from vertex 'a' until I confirm or deny that 'a' is connected in the way that I want.

In the end I want to have all the vertices and edges in Java so that I can add and remove edges and vertices, so I'd have:

OrientVertex a;
OrientVertex b;
OrientVertex c;
OrientVertex d;
OrientEdge e1;
OrientEdge e2;
OrientEdge e3;

Is this possible with gremlin in java?


Answer:

This is the Gremlin query you are looking for:

 g.V().has('@class', T.eq, 'VertexType1').as('b').bothE().as('e1').bothV().as('a').has('@class', T.eq, 'VertexType2').bothE().as('e2').bothV().as('c').has('@class', T.eq, 'VertexType1').bothE().as('e3').bothV().path
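One caveat worth checking: in this TinkerPop 2 style query, bothV() emits both endpoints of an edge, including the vertex the traversal just came from, and nothing prevents an edge from being reused, so some returned paths (for example ones where d is just c again) do not correspond to Cypher matches. The plain-Python sketch below models the Cypher pattern on a small hypothetical graph instead: it always steps to the other endpoint of each edge and never reuses an edge, matching Cypher's rule that one MATCH pattern binds each relationship at most once. The graph, ids and helper names are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical graph: vertex id -> class name, and directed edges.
# Cypher's -[e]- ignores direction, so edges are followed both ways.
Edge = namedtuple("Edge", "id out_v in_v")
CLASS = {"b1": "VertexType1", "a1": "VertexType2", "c1": "VertexType1", "d1": "Other"}
EDGES = [Edge("e1", "b1", "a1"), Edge("e2", "a1", "c1"), Edge("e3", "d1", "c1")]

def both_e(v):
    """Edges touching v in either direction, paired with the vertex at the other end."""
    for e in EDGES:
        if e.out_v == v:
            yield e, e.in_v
        elif e.in_v == v:
            yield e, e.out_v

def match_chain():
    """(b:VertexType1)-[e1]-(a:VertexType2)-[e2]-(c:VertexType1)-[e3]-(d)"""
    for b, cls in CLASS.items():
        if cls != "VertexType1":
            continue
        for e1, a in both_e(b):
            if CLASS[a] != "VertexType2":
                continue
            for e2, c in both_e(a):
                if CLASS[c] != "VertexType1" or e2 == e1:
                    continue
                for e3, d in both_e(c):
                    if e3 in (e1, e2):   # never reuse an edge within one pattern
                        continue
                    yield (b, e1.id, a, e2.id, c, e3.id, d)

for row in match_chain():
    print(row)
```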

Question:

Suppose we have a graph like this.

(User)-[:KNOWS]->(Friend)

I want to count all outgoing relationships from each User, group them by user, and then filter on some condition (like more than 10 KNOWS). This is what I did:

g.V().hasLabel("Friend").in("KNOWS").hasLabel("User").groupCount().next()

This returns a map, so I can apply the filter condition to the results. My question is: is there a more efficient alternative way to do this?


Answer:

I am not sure if I understood your question correctly, but it sounds like you just want to filter all users based on their number of outgoing edges with the label KNOWS. In that case you can start directly at the User vertices and filter them based on the number of their KNOWS edges instead of doing a groupCount:

g.V().hasLabel('User').where(outE('KNOWS').count().is(gt(10)))
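The where() step counts each User's KNOWS edges and keeps the user only when that count passes the threshold, so no graph-wide map is built. A plain-Python sketch of the same filter, with hypothetical vertex ids and edge data:

```python
# Hypothetical data: vertex id -> label, and KNOWS edges as (out, in) pairs.
LABELS = {"u1": "User", "u2": "User", "f1": "Friend"}
KNOWS = [("u1", "f1")] * 12 + [("u2", "f1")] * 3

def knows_count(v):
    # outE('KNOWS').count() for a single vertex
    return sum(1 for src, _dst in KNOWS if src == v)

def users_with_more_knows_than(threshold):
    # hasLabel('User').where(outE('KNOWS').count().is(gt(threshold)))
    return [v for v, label in LABELS.items()
            if label == "User" and knows_count(v) > threshold]

print(users_with_more_knows_than(10))   # ['u1']
```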

You can see the result of a slightly modified version of this traversal, adapted to the modern graph, in GremlinBin: http://gremlinbin.com/bin/view/58dceb63ad0f7

Until now I ignored any performance constraints. But, as Paul Jackson mentioned in his comment, it is not efficient to execute such a query in OLTP mode like this: Neo4j will probably iterate over all vertices, check whether they have the label User and then count their KNOWS edges.

You basically have two options to speed this up:

  1. As Paul Jackson suggested: Add the edge count as a property to the vertices, pre-compute it and then index this property or
  2. Use something like Spark-Gremlin if you really want to compute the edge count on the fly.
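Option 1 trades write cost for read speed: the count is maintained whenever an edge is added, so the query becomes an index range lookup instead of a scan over all vertices. A minimal plain-Python sketch of that idea; the class, its method names and the sorted-list "index" are hypothetical stand-ins, not Neo4j or TinkerPop APIs (in a real graph the property and its index would live in the database).

```python
import bisect

class KnowsCountIndex:
    """Hypothetical model of option 1: a precomputed per-user KNOWS count
    (the "property") kept in sync with a sorted list (the "index")."""

    def __init__(self):
        self.count = {}   # user -> current KNOWS edge count
        self.index = []   # sorted (count, user) pairs

    def add_knows_edge(self, user):
        old = self.count.get(user, 0)
        if old:
            # drop the stale (old, user) entry before re-inserting
            self.index.pop(bisect.bisect_left(self.index, (old, user)))
        self.count[user] = old + 1
        bisect.insort(self.index, (old + 1, user))

    def users_with_more_than(self, threshold):
        # Range lookup: (threshold + 1,) sorts before every (threshold + 1, user)
        start = bisect.bisect_left(self.index, (threshold + 1,))
        return [user for _count, user in self.index[start:]]

idx = KnowsCountIndex()
for _ in range(12):
    idx.add_knows_edge("u1")
for _ in range(3):
    idx.add_knows_edge("u2")
print(idx.users_with_more_than(10))   # ['u1']
```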

Question:

My application has filters in English and I need to translate these filters into Gremlin query. Each filter consists of three parts:

  1. Type of vertex
  2. Label of outgoing edge from vertex in #1
  3. Name of incoming vertex from edge in #2

Any of the parts can take the string "any", which signifies that any type, label or name can be included in the result. Using the Modern toy graph as an example, I have the following two filters:

  1. person -> created -> any
  2. person -> knows -> vadas

The result of the evaluation of the above two filters should be:

  1. marko -> created -> lop
  2. marko -> knows -> vadas

While the following two filters:

  1. person -> any -> josh
  2. person -> created -> lop

Should evaluate to the following edges:

  1. marko -> knows -> josh
  2. marko -> created -> lop

The query I came up with that gets closest to the desired results above is:

g.E().and(outV().outE().has(label, "created"), outV().outE().has(label, "knows").inV().has("name", "vadas"), outV().has(label, "person"))

The problem with the above query is that it returns all three edges going out from marko, not just two desired edges. How can I improve my query to return only the two edges as described above?


Answer:

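This solution takes the approach of separating the filters from the traversals that return the results. Your query returns all three of marko's edges because every condition inside its and() is evaluated from the edge's out-vertex (marko), so each of marko's edges passes. Below, and() keeps only the vertices that satisfy every filter, and union() then emits just the edges that match each filter.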

gremlin> Gremlin.version()
==>3.3.3
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().
......1>     and(
......2>         outE('created'),
......3>         out('knows').has('name', 'vadas')
......4>     ).
......5>     union(
......6>         outE('created').inV(),
......7>         outE('knows').inV().has('name', 'vadas')
......8>     ).
......9>     path().by('name').by(label)
==>[marko,created,lop]
==>[marko,knows,vadas]
gremlin> g.V().
......1>     and(
......2>         out().has('name', 'josh'),
......3>         out('created').has('name', 'lop')
......4>     ).
......5>     union(
......6>         outE().inV().has('name', 'josh'),
......7>         outE('created').inV().has('name', 'lop')
......8>     ).
......9>     path().by('name').by(label)
==>[marko,knows,josh]
==>[marko,created,lop]
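The same separation can be written generically, with "any" as a wildcard for each of the three filter parts. A plain-Python sketch over an in-memory copy of the modern graph; the function names are hypothetical, and the evaluation order (an and() over all filters per start vertex, then a union() of the edges matching each filter) mirrors the traversals above.

```python
# Modern toy graph: vertex id -> (label, name); edges as (out, label, in).
VERTICES = {1: ("person", "marko"), 2: ("person", "vadas"), 3: ("software", "lop"),
            4: ("person", "josh"), 5: ("software", "ripple"), 6: ("person", "peter")}
EDGES = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]

def matches(part, value):
    return part == "any" or part == value

def edge_matches(flt, edge):
    """flt is (vertex type, edge label, target name); any part may be 'any'."""
    vtype, elabel, target = flt
    out_v, label, in_v = edge
    return (matches(vtype, VERTICES[out_v][0]) and matches(elabel, label)
            and matches(target, VERTICES[in_v][1]))

def evaluate(filters):
    results = []
    for v in VERTICES:
        out_edges = [e for e in EDGES if e[0] == v]
        # and(): the start vertex must satisfy every filter
        if all(any(edge_matches(f, e) for e in out_edges) for f in filters):
            # union(): emit the edges matching each filter (an edge matching
            # two filters is emitted twice, as union() would do)
            for f in filters:
                results += [(VERTICES[e[0]][1], e[1], VERTICES[e[2]][1])
                            for e in out_edges if edge_matches(f, e)]
    return results

print(evaluate([("person", "created", "any"), ("person", "knows", "vadas")]))
print(evaluate([("person", "any", "josh"), ("person", "created", "lop")]))
```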