Hot questions for using Cassandra in Titan

Question:

I am brand new to Titan, and when I started researching it I got confused because it has a plethora of new things under the hood, such as Gremlin, TinkerPop, and Rexster.

What I want is an example in Java that uses Titan with Cassandra as the back end. I would like to create a graph, store it in Cassandra, retrieve it, and traverse it. Even a very simple example would be a lot of help.

I found a basic example in Java and ran it:

    BaseConfiguration baseConfiguration = new BaseConfiguration();
    baseConfiguration.setProperty("storage.backend", "cassandra");
    baseConfiguration.setProperty("storage.hostname", "192.168.3.82");

    TitanGraph titanGraph = TitanFactory.open(baseConfiguration);

    Vertex rash = titanGraph.addVertex(null);
    rash.setProperty("userId", 1);
    rash.setProperty("username", "rash");
    rash.setProperty("firstName", "Rahul");
    rash.setProperty("lastName", "Chaudhary");
    rash.setProperty("birthday", 101);

    Vertex honey = titanGraph.addVertex(null);
    honey.setProperty("userId", 2);
    honey.setProperty("username", "honey");
    honey.setProperty("firstName", "Honey");
    honey.setProperty("lastName", "Anant");
    honey.setProperty("birthday", 201);

    Edge frnd = titanGraph.addEdge(null, rash, honey, "FRIEND");
    frnd.setProperty("since", 2011);

    titanGraph.shutdown();

When I ran this, I watched the Cassandra logs and saw that it created a keyspace named titan and the following tables:

  • titan_ids
  • edgestore
  • graphindex
  • system_properties
  • systemlog
  • txlog
  • edgestore_lock_
  • graphindex_lock_
  • system_properties_lock_

I don't know what these tables are used for or how they store the data.

After running the program, which creates a graph of two vertices and an edge between them, I queried the tables and found hexadecimal values in each of them.

I have the following questions:

  1. How is the graph stored in Cassandra?

  2. Suppose this graph, say 'x', is stored in Cassandra, and I then create and store another graph 'y'. How will I be able to retrieve and traverse a particular graph? In a normal CQL query you know the table and the columns to query. How will I identify 'x' and 'y' separately?

  3. Could anyone help by posting example Java code that creates a graph from some sample CSV data, stores it in Cassandra, and runs some traversals on the same graph? It would be a lot of help, as I could not find an understandable example.


Answer:

You have a few questions in there so I will try to answer as much as I can.

Question 1:

If you are interested in how the data is persisted into the DB, take a look here; it describes the Titan data model in detail. I am not sure how well it translates to the commit logs and tables, but it's a start.

Question 2:

The reason you ended up with a keyspace called titan is that you did not provide your own. Usually, when creating different graphs that have nothing to do with each other, you would store them in different keyspaces. This is done as follows:

BaseConfiguration baseConfiguration = new BaseConfiguration();
baseConfiguration.setProperty("storage.backend", "cassandra");
baseConfiguration.setProperty("storage.hostname", "192.168.3.82");

//First Graph
baseConfiguration.setProperty("storage.cassandra.keyspace", "keyspace1");
TitanGraph titanGraph1 = TitanFactory.open(baseConfiguration);

//Second Graph
baseConfiguration.setProperty("storage.cassandra.keyspace", "keyspace2");
TitanGraph titanGraph2 = TitanFactory.open(baseConfiguration);

Of course, you can also create multiple disconnected graphs in the same keyspace, as outlined here.
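To show that the only setting that differs between two unrelated graphs is the keyspace, here is a minimal sketch using only java.util.Properties. The class name GraphConfigs and the method forKeyspace are hypothetical helpers, not part of Titan; the property keys and values are the ones used in the snippet above. The resulting properties could, for example, be written to a file and that file's path passed to TitanFactory.open.

```java
import java.util.Properties;

public class GraphConfigs {
    // Hypothetical helper: builds the settings for one graph.
    // Only the keyspace changes from graph to graph.
    public static Properties forKeyspace(String hostname, String keyspace) {
        Properties props = new Properties();
        props.setProperty("storage.backend", "cassandra");
        props.setProperty("storage.hostname", hostname);
        props.setProperty("storage.cassandra.keyspace", keyspace);
        return props;
    }

    public static void main(String[] args) {
        Properties x = forKeyspace("192.168.3.82", "keyspace1");
        Properties y = forKeyspace("192.168.3.82", "keyspace2");
        // Each graph is identified by its own keyspace
        System.out.println(x.getProperty("storage.cassandra.keyspace"));
        System.out.println(y.getProperty("storage.cassandra.keyspace"));
    }
}
```

Keeping one configuration object per graph, rather than mutating a shared one, avoids accidentally opening the second graph with the first graph's keyspace.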

Question 3:

That is a bit of a loaded question, asking for a sample CSV migration. I would say take a step back and ask yourself what you are trying to model.

Let's say you want to store a list of products and a list of people who bought those products. There are a multitude of ways you could model this, but for now let's just say that people and products are vertices and the edges between them represent a purchase:

//Initialise graph (this example uses the TinkerPop 3 API, as in Titan 1.0)
BaseConfiguration baseConfiguration = new BaseConfiguration();
baseConfiguration.setProperty("storage.backend", "cassandra");
baseConfiguration.setProperty("storage.hostname", "192.168.3.82");
baseConfiguration.setProperty("storage.cassandra.keyspace", "mycustomerdata");
TitanGraph graph = TitanFactory.open(baseConfiguration);

//---------------- Adding Data -------------------
//Create some customers
Vertex alice = graph.addVertex("customer");
alice.property("name", "Alice Mc Alice");
alice.property("birthday", "100000 BC");

Vertex bob = graph.addVertex("customer");
bob.property("name", "Bob Mc Bob");
bob.property("birthday", "1000 BC");

//Create some products
Vertex meat = graph.addVertex("product");
meat.property("name", "Meat");
meat.property("description", "Delicious Meat");

Vertex lettuce = graph.addVertex("product");
lettuce.property("name", "Lettuce");
lettuce.property("description", "Delicious Lettuce which is green");

//Alice bought some meat:
alice.addEdge("bought", meat);
//Bob bought some meat and lettuce (addEdge takes a single target vertex, so one call per edge):
bob.addEdge("bought", meat);
bob.addEdge("bought", lettuce);

//Commit the transaction so the changes are persisted
graph.tx().commit();

//---------------- Querying (aka traversing, which is what you do in graph DBs) Data -------------------
//Now who has bought meat? (the value must match the stored name "Meat" exactly)
graph.traversal().V().has("name", "Meat").in("bought").forEachRemaining(v -> System.out.println(v.<String>value("name")));

//Who are all our customers?
graph.traversal().V().hasLabel("customer").forEachRemaining(v -> System.out.println(v.<String>value("name")));

//What products do we have?
graph.traversal().V().hasLabel("product").forEachRemaining(v -> System.out.println(v.<String>value("name")));

The above example is a simple use of Titan. I would recommend working through the TinkerPop documentation so that you can familiarise yourself with it. At the end of the day, you interface with Titan via the TinkerPop API.
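To make the direction of the in("bought") step concrete, here is a plain-Java sketch with no Titan or TinkerPop dependency. It is only an illustration of the idea, not how Titan stores or traverses data; the class PurchaseGraphSketch and its methods are hypothetical names. Customers point to products through outgoing "bought" edges, and finding the buyers of a product means following those edges backwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PurchaseGraphSketch {
    // Outgoing "bought" edges: customer name -> names of products bought
    private final Map<String, List<String>> bought = new LinkedHashMap<>();

    public void addPurchase(String customer, String product) {
        bought.computeIfAbsent(customer, k -> new ArrayList<>()).add(product);
    }

    // Follows "bought" edges backwards, from a product to the customers who bought it
    public List<String> buyersOf(String product) {
        List<String> buyers = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : bought.entrySet()) {
            if (entry.getValue().contains(product)) {
                buyers.add(entry.getKey());
            }
        }
        return buyers;
    }

    public static void main(String[] args) {
        PurchaseGraphSketch graph = new PurchaseGraphSketch();
        graph.addPurchase("Alice Mc Alice", "Meat");
        graph.addPurchase("Bob Mc Bob", "Meat");
        graph.addPurchase("Bob Mc Bob", "Lettuce");
        System.out.println(graph.buyersOf("Meat"));    // prints [Alice Mc Alice, Bob Mc Bob]
        System.out.println(graph.buyersOf("Lettuce")); // prints [Bob Mc Bob]
    }
}
```

A graph database does the same kind of lookup without scanning every customer, because each vertex keeps track of its own incoming and outgoing edges.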

I hope this helps you somewhat

Question:

My previous question was about the syntax of the TitanFactory class. Now I wonder how to use it.

For example, I can construct a RexsterGraph object like the following, and it works like a charm:

Graph graph = new RexsterGraph("http://190.188.20.11:8183/graphs/graph");

Now I want to import a CSV file into Titan, so I need a TitanGraph object. I found the following post on how to do that:

How to import a CSV file into Titan graph database?

I wrote the following code, and it gives me this error:

Could not find implementation class: com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager

    TitanGraph titanGraph = null;
    try {
        titanGraph = TitanFactory
                .open("D:\\TEMP\\titan-cassandra.properties");
    } catch (Exception e) {
        System.err.println(e.getMessage());

        System.out.println("\n");

        // println(e.getStackTrace()) would only print the array reference
        e.printStackTrace();
    }

All I need is some code, like the RexsterGraph example, for getting an instance of a TitanGraph object. What should I do? By the way, I run the code on my local machine, but the graph is running on a remote Linux machine.
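An error of the form "Could not find implementation class" means the named class could not be loaded. One possible cause, which is an assumption and is not confirmed in this thread, is that the Titan module containing the Cassandra storage classes is missing from the application's classpath. A minimal way to check this from the same application is sketched below; ClasspathCheck and isOnClasspath are hypothetical names, and only the standard Class.forName call is used.

```java
public class ClasspathCheck {
    // Returns true if the class can be located by this application's class loader.
    // The class is not initialised, only looked up.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error message in the question
        String storeManager =
            "com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager";
        System.out.println(storeManager + " on classpath: " + isOnClasspath(storeManager));
    }
}
```

If the check reports false, the storage backend named in the properties file cannot be instantiated regardless of how TitanFactory.open is called.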


Answer:

Sample test.csv lines:

id:1,name:xxx,age:20,........

id:2,name:yyy,age:21,........

I don't know how large your CSV file is, but if it is small, you can import it like this:

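    // Needs: java.io.IOException, java.nio.charset.Charset, java.nio.file.Files,
    //        java.nio.file.Paths, java.util.List
    String path = "c:\\test.csv";
    Charset encoding = Charset.forName("ISO-8859-1");
    try {
        List<String> lines = Files.readAllLines(Paths.get(path), encoding);
        Graph graph = new RexsterGraph("http://190.188.20.11:8183/graphs/graph");

        for (String line : lines) {
            // One vertex per CSV line
            Vertex currentNode = graph.addVertex(null);
            String[] values = line.split(",");
            for (String value : values) {
                // Each value has the form key:value; skip anything else
                String[] property = value.split(":");
                if (property.length == 2) {
                    currentNode.setProperty(property[0], property[1]);
                }
            }
        }
    } catch (IOException e) {
        // Files.readAllLines throws IOException if the file cannot be read
        e.printStackTrace();
    }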
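The key:value line format used above can also be parsed into a map before any vertex is created, which makes the parsing step easy to test on its own. The sketch below uses only the standard library; KeyValueLineParser and parse are hypothetical names. It splits each token at the first colon only, so values that themselves contain a colon are kept whole, and it skips tokens that are not key:value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueLineParser {
    // Parses a line such as "id:1,name:xxx,age:20" into an ordered map of properties.
    public static Map<String, String> parse(String line) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (String token : line.split(",")) {
            // Split at the first colon only, so "url:http://x" keeps its full value
            String[] keyValue = token.split(":", 2);
            if (keyValue.length != 2 || keyValue[0].trim().isEmpty()) {
                continue; // not a key:value pair, skip it
            }
            properties.put(keyValue[0].trim(), keyValue[1].trim());
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(parse("id:1,name:xxx,age:20")); // prints {id=1, name=xxx, age=20}
    }
}
```

Each entry of the returned map can then be set as a property on the vertex created for that line.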

Question:

I am trying to use Titan version 1.0.0 in a multi-datacenter deployment, with Cassandra 2.1.9 as my backend.

My deployment topology is: C* is set up with 4 nodes, divided into 2 DCs, each containing 2 racks.

The current setting is:

    [?????@????? apps]$ /apps/cassandra/bin/nodetool status

    Datacenter: DC2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns  Host ID                               Rack
    UN  ???.???.125.92  58.51 KB   256     ?     d483a0b3-45f7-4a8f-a269-fca19eab08bd  RAC2
    UN  ???.???.125.91  76.41 KB   256     ?     b31751cd-03a1-489d-8482-c4d0f66b780f  RAC1

    Datacenter: DC1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns  Host ID                               Rack
    UN  ???.???.125.89  101.89 KB  256     ?     628e72c6-d068-4217-8205-91fe4bf7abf3  RAC1
    UN  ???.???.125.90  63.34 KB   256     ?     96b9d87b-e5d4-4bdb-9693-5f8f9889a83c  RAC2

I am using a Titan client that is part of my Java application. This is the Titan configuration I am using:

    storage.backend=cassandra
    storage.hostname=???.???.125.89,???.??.125.90
    storage.port=9160
    storage.username=cassandra
    storage.password=cassandra
    storage.cassandra.read-consistency-level=LOCAL_QUORUM
    storage.cassandra.write-consistency-level=LOCAL_QUORUM
    storage.cassandra.replication-strategy-class=org.apache.cassandra.locator.NetworkTopologyStrategy
    storage.cassandra.replication-strategy-options=DC1,2,DC2,2
    cache.db-cache=false
    cache.db-cache-clean-wait=20
    cache.db-cache-time=180000
    cache.db-cache-size=0.5
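As a worked example of what this configuration asks of Cassandra: the replication-strategy-options value is a flat list of datacenter names and replication factors, and LOCAL_QUORUM needs a majority of the replicas in the local datacenter, that is floor(RF / 2) + 1. The sketch below parses the option string and computes that number. ReplicationOptions, parse, and quorum are hypothetical names and nothing here calls Titan or Cassandra. Whether this is related to the locking failure reported below is not established in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicationOptions {
    // Parses "DC1,2,DC2,2" into {DC1=2, DC2=2}
    public static Map<String, Integer> parse(String options) {
        String[] parts = options.split(",");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name,factor pairs: " + options);
        }
        Map<String, Integer> factors = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i += 2) {
            factors.put(parts[i].trim(), Integer.parseInt(parts[i + 1].trim()));
        }
        return factors;
    }

    // Number of replicas that must respond for a quorum: floor(RF / 2) + 1
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> dc : parse("DC1,2,DC2,2").entrySet()) {
            System.out.println(dc.getKey() + ": RF=" + dc.getValue()
                + ", LOCAL_QUORUM needs " + quorum(dc.getValue()) + " replicas");
        }
    }
}
```

With a replication factor of 2 per datacenter, LOCAL_QUORUM requires both local replicas to respond, so a single slow or unavailable node in the local datacenter is enough for reads and writes at that level to fail.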

When I start my application, it fails while trying to create indexes. The application flow itself is correct, because the application works fine when I use a single Cassandra node.

The error I get in the application when I run with multiple DCs is:

    2016-03-10T16:46:15.473Z|||main||ASDC-BE||ERROR|||localhost||c.t.t.g.database.StandardTitanGraph|||ActivityType=, Desc=
    com.thinkaurelius.titan.diskstorage.locking.TemporaryLockingException: Temporary locking failure
        at com.thinkaurelius.titan.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:295) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.acquireLock(ExpectedValueCheckingStore.java:89) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KCVSProxy.acquireLock(KCVSProxy.java:40) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.acquireIndexLock(BackendTransaction.java:240) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.prepareCommit(StandardTitanGraph.java:554) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.commit(StandardTitanGraph.java:683) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1352) [titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.graphdb.database.management.ManagementSystem.commit(ManagementSystem.java:221) [titan-core-1.0.0.jar:na]
        at com.att.tlv.sdc.be.dao.titan.TitanGraphClient.createVertexIndixes(TitanGraphClient.java:322) [catalog-dao-1604.1.0-SNAPSHOT.jar:na]
        at com.att.tlv.sdc.be.dao.titan.TitanGraphClient.createIndexesAndDefaults(TitanGraphClient.java:276) [catalog-dao-1604.1.0-SNAPSHOT.jar:na]
        at com.att.tlv.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:244) [catalog-dao-1604.1.0-SNAPSHOT.jar:na]
        at com.att.tlv.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:225) [catalog-dao-1604.1.0-SNAPSHOT.jar:na]
        at com.att.tlv.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:180) [catalog-dao-1604.1.0-SNAPSHOT.jar:na]
        .....
        at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_66]
        at org.eclipse.jetty.start.Main.invokeMain(Main.java:214) [start.jar:9.3.6.v20151106]
        at org.eclipse.jetty.start.Main.start(Main.java:457) [start.jar:9.3.6.v20151106]
        at org.eclipse.jetty.start.Main.main(Main.java:75) [start.jar:9.3.6.v20151106]
    Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Lock write retry count exceeded
        at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLocker.writeSingleLock(ConsistentKeyLocker.java:325) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLocker.writeSingleLock(ConsistentKeyLocker.java:109) ~[titan-core-1.0.0.jar:na]
        at com.thinkaurelius.titan.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:290) ~[titan-core-1.0.0.jar:na]
        ... 108 common frames omitted

Does anyone have an idea what I am doing wrong?


Answer:

Sadly, I found no help here, so I continued the discussion in the Titan forum: https://groups.google.com/forum/#!topic/aureliusgraphs/fJYH1de5wBw

Question:

Revision:

I have a stand-alone version of Cassandra. I launch that using the following command:

./cassandra -f

I also have a Java application that has the Titan graph library installed. To obtain a TitanGraph object I use the following code:

BaseConfiguration configuration = new BaseConfiguration();
configuration.setProperty("storage.backend", "cassandra");
configuration.setProperty("storage.hostname", "127.0.0.1");
TitanGraph graph = TitanFactory.open(configuration);

After this I can add vertices and edges and query them as well. I did an additional check on the local Cassandra database and can verify that records are being generated and persisted:

cqlsh> select count(*) from titan.edgestore;

 count
--------
 185050

(1 rows)

The problem arises when I launch the rexster-server. I am launching this in stand-alone mode using the following command:

./rexster.sh -s -c ../config/rexster.xml

Then I launch the Rexster console and load the graph. The issue is that the graph seems to contain no data. I am really not sure what is going on here. There is only one instance of Cassandra running.

        (l_(l
(_______( 0 0
(        (-Y-) <woof>
l l-----l l
l l,,   l l,,
opening session [127.0.0.1:8184]
?h for help

rexster[groovy]> ?h
-= Console Specific =-
?<language-name>: jump to engine
?l: list of available languages on Rexster
?b: print available bindings in the session
?r: reset the rexster session
?e <file-name>: execute a script file
?q: quit
?h: displays this message

-= Rexster Context =-
rexster.getGraph(graphName) - gets a Graph instance
   :graphName - [String] - the name of a graph configured within Rexster
rexster.getGraphNames() - gets the set of graph names configured within Rexster
rexster.getVersion() - gets the version of Rexster server

rexster[groovy]> rexster.getGraphNames() 
==>kpdlp
rexster[groovy]> rexster.getGraph('graph')
==>titangraph[cassandrathrift:[127.0.0.1]]
rexster[groovy]> g = rexster.getGraph('graph')
==>titangraph[cassandrathrift:[127.0.0.1]]
rexster[groovy]> g.V.count()
==>0
rexster[groovy]> 

Below is the rexster.xml I am using:

<?xml version="1.0" encoding="UTF-8"?>
<rexster>
    <http>
        <server-port>8182</server-port>
        <server-host>0.0.0.0</server-host>
        <base-uri>http://localhost</base-uri>
        <web-root>public</web-root>
        <character-set>UTF-8</character-set>
        <enable-jmx>false</enable-jmx>
        <enable-doghouse>true</enable-doghouse>
        <max-post-size>2097152</max-post-size>
        <max-header-size>8192</max-header-size>
        <upload-timeout-millis>30000</upload-timeout-millis>
        <thread-pool>
            <worker>
                <core-size>8</core-size>
                <max-size>8</max-size>
            </worker>
            <kernal>
                <core-size>4</core-size>
                <max-size>4</max-size>
            </kernal>
        </thread-pool>
        <io-strategy>leader-follower</io-strategy>
    </http>
    <rexpro>
        <server-port>8184</server-port>
        <server-host>0.0.0.0</server-host>
        <session-max-idle>1790000</session-max-idle>
        <session-check-interval>3000000</session-check-interval>
        <read-buffer>65536</read-buffer>
        <enable-jmx>false</enable-jmx>
        <thread-pool>
            <worker>
                <core-size>8</core-size>
                <max-size>8</max-size>
            </worker>
            <kernal>
                <core-size>4</core-size>
                <max-size>4</max-size>
            </kernal>
        </thread-pool>
        <io-strategy>leader-follower</io-strategy>
    </rexpro>
    <shutdown-port>8183</shutdown-port>
    <shutdown-host>127.0.0.1</shutdown-host>
    <config-check-interval>10000</config-check-interval>
    <script-engines>
        <script-engine>
            <name>gremlin-groovy</name>
            <reset-threshold>-1</reset-threshold>
            <init-scripts>config/init.groovy</init-scripts>
            <imports>com.tinkerpop.rexster.client.*</imports>
            <static-imports>java.lang.Math.PI</static-imports>
        </script-engine>
    </script-engines>
    <security>
        <authentication>
            <type>none</type>
            <configuration>
                <users>
                    <user>
                        <username>rexster</username>
                        <password>rexster</password>
                    </user>
                </users>
            </configuration>
        </authentication>
    </security>
    <metrics>
        <reporter>
            <type>jmx</type>
        </reporter>
        <reporter>
            <type>http</type>
        </reporter>
        <reporter>
            <type>console</type>
            <properties>
                <rates-time-unit>SECONDS</rates-time-unit>
                <duration-time-unit>SECONDS</duration-time-unit>
                <report-period>10</report-period>
                <report-time-unit>MINUTES</report-time-unit>
                <includes>http.rest.*</includes>
                <excludes>http.rest.*.delete</excludes>
            </properties>
        </reporter>
    </metrics>
    <graphs>
        <graph>
            <graph-name>graph</graph-name>
            <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
            <graph-location></graph-location>
            <graph-read-only>false</graph-read-only>
            <properties>
                <storage.backend>cassandrathrift</storage.backend>
                <storage.hostname>127.0.0.1</storage.hostname>
            </properties>
            <extensions>
                <allows>
                    <allow>tp:gremlin</allow>
                </allows>
            </extensions>
        </graph>
    </graphs>
</rexster>

Answer:

Perhaps there is just some confusion about Rexster's role. Your question was:

My issue is that when I instantiate an TitanGraph using the TitanFactory as seen below there does not seem to be the option to specify the graph name?

Note that using TitanFactory opens a TitanGraph instance that connects directly to Cassandra. That has nothing to do with Rexster. If you want to connect to Rexster (which, given your configuration, holds a TitanGraph instance remotely), you must do so through REST or RexPro. REST is the simpler way to verify that things work, so try this curl command:

curl http://localhost:8182/graphs

That should return some JSON that contains the name of the TitanGraph instance you configured in the <graph-name> field in rexster.xml. The <graph-name> simply identifies the graph instance in Rexster so that you can uniquely identify it in requests when there are multiple instances hosted in there.
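The same check can be made from Java with only the standard library. This is a minimal sketch, assuming Rexster is listening on localhost:8182 as in the rexster.xml above; RexsterRestCheck and its methods are hypothetical names, and the /graphs/<graph-name> path follows the URL form used with RexsterGraph elsewhere on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RexsterRestCheck {
    // URL that lists the graphs hosted by Rexster
    public static String graphsUrl(String host, int port) {
        return "http://" + host + ":" + port + "/graphs";
    }

    // URL of one graph, identified by its <graph-name>
    public static String graphUrl(String host, int port, String graphName) {
        return graphsUrl(host, port) + "/" + graphName;
    }

    // Performs a plain HTTP GET and returns the response body as text
    public static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            connection.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Requires a running Rexster server on the assumed host and port
        System.out.println(get(graphsUrl("localhost", 8182)));
    }
}
```

Comparing the names returned by this request with the <graph-name> in rexster.xml shows which graph instance Rexster is actually serving.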