Hot questions for using Cassandra with MongoDB

Question:

I am using Apache Kafka, Storm and Cassandra for real-time processing. The problem I am facing is that I can't run aggregation queries on Cassandra directly (DataStax can be used, but it is a paid service). I also considered MongoDB, but it is not good for frequent, heavy writes. So I am thinking of doing all my calculations in Storm, storing the results in Cassandra, and moving them on an hourly basis (or so) to MongoDB to perform my further analytics.

Is this the right approach, or are there better options to achieve this?

Also, how can I export data directly from Cassandra to MongoDB, preferably using Java?

Thanks in advance!


Answer:

Without knowing your full requirements and the volume of inserts/updates, one cannot say whether this is a good or bad approach. Mongo is less preferable for heavy writes, but it can still support quite a good number of inserts. So the important thing is how many writes you have per unit of time, and based on that you can take the decision.

I have seen Mongo take 1000-2000 writes per second, with an average of 4-5 ms, on server-class machines. Of course Cassandra beats it by a margin, but if you need to perform any aggregation then Mongo has the better framework and capabilities.
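To give an idea of what that aggregation side can look like from Java, here is a minimal sketch using the same legacy driver classes (Mongo, DB, DBCollection) as the code later in this thread; the host, database and collection names are assumptions, and the "term" field is taken from the sample documents shown further down, so adjust it to your own schema:

import java.util.Arrays;

import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class AggregationSketch
{
    public static void main(String[] args) throws Exception
    {
        // Host, database and collection names are assumptions for this sketch.
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("mydb");
        DBCollection collection = db.getCollection("results");

        // $group: count how many documents exist per search term.
        DBObject group = new BasicDBObject("$group",
                new BasicDBObject("_id", "$term")
                        .append("total", new BasicDBObject("$sum", 1)));

        // $sort: highest counts first.
        DBObject sort = new BasicDBObject("$sort", new BasicDBObject("total", -1));

        AggregationOutput out = collection.aggregate(Arrays.asList(group, sort));
        for (DBObject doc : out.results())
        {
            System.out.println(doc);
        }

        mongo.close();
    }
}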

For export and import, flat CSV files can be used: Cassandra can export data to CSV and MongoDB can import data from CSV using their respective export/import tools.

Check mongoimport; for exporting from Cassandra, an example could be:

COPY employee (emp_id, dept, designation, emp_name, salary) TO 'employee.csv';
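On the MongoDB side, a mongoimport invocation along these lines should load that file (the database and collection names here are assumptions; since the COPY above writes no header row, the column names are supplied explicitly):

mongoimport --db mydb --collection employee --type csv \
    --fields emp_id,dept,designation,emp_name,salary \
    --file employee.csv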

Question:

I have a program which exports data directly from Cassandra to MongoDB. It works fine, but it doesn't copy the counter-type column from Cassandra to MongoDB; it leaves it blank.

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Iterator;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnDefinitions.Definition;
import com.datastax.driver.core.DataType;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class Export
{
    static String keyspace = "db1";
    static String table = "results";
    static String host = "localhost";

    @SuppressWarnings("deprecation")
    static Mongo mongo = new Mongo("localhost", 27017);
    @SuppressWarnings("deprecation")
    static DB db = mongo.getDB("mydb");

    static DBCollection collection = db.getCollection("results");

    public static void main(String[] args) throws IOException
    {
        Cluster.Builder clusterBuilder = Cluster.builder()
                .addContactPoints(host);
        Cluster cluster = clusterBuilder.build();
        Session session = cluster.connect(keyspace);

        Statement stmt = new SimpleStatement("SELECT * FROM " + table);
        stmt.setFetchSize(1);
        ResultSet rs = session.execute(stmt);
        Iterator<Row> iter = rs.iterator();

        // Read the column names from the first row.
        ArrayList<String> colName = new ArrayList<String>();
        Row row1 = iter.next();
        if (row1 != null)
        {
            for (Definition key1 : row1.getColumnDefinitions().asList())
            {
                String val = key1.getName();
                colName.add(val);
            }
        }

        while (!rs.isFullyFetched())
        {
            rs.fetchMoreResults();
            Row row = iter.next();
            if (row != null)
            {
                BasicDBObject document = new BasicDBObject();
                ArrayList<String> ele = new ArrayList<String>();
                for (Definition key : row.getColumnDefinitions().asList())
                {
                    String val = myGetValue(key, row);
                    ele.add(val);
                }

                for (int i = 0; i < ele.size() && i < colName.size(); i++)
                {
                    document.put(colName.get(i), ele.get(i));
                }
                collection.insert(document);
            }
        }

        session.close();
        cluster.close();
    }

    // Converts a single column of the row to its string representation.
    public static String myGetValue(Definition key, Row row)
    {
        String str = "";

        if (key != null)
        {
            String col = key.getName();

            try
            {
                if (key.getType() == DataType.cdouble())
                {
                    str = new Double(row.getDouble(col)).toString();
                }
                else if (key.getType() == DataType.cint())
                {
                    str = new Integer(row.getInt(col)).toString();
                }
                else if (key.getType() == DataType.uuid())
                {
                    str = row.getUUID(col).toString();
                }
                else if (key.getType() == DataType.cfloat())
                {
                    str = new Float(row.getFloat(col)).toString();
                }
                else if (key.getType() == DataType.counter())
                {
                    str = new Float(row.getFloat(col)).toString();
                }
                else if (key.getType() == DataType.timestamp())
                {
                    str = row.getDate(col).toString();

                    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
                    str = fmt.format(row.getDate(col));
                }
                else
                {
                    str = row.getString(col);
                }
            }
            catch (Exception e)
            {
                str = "";
            }
        }

        return str;
    }
}

And the data which gets exported to MongoDB is:

{
"_id" : ObjectId("558c0202209c02284d30df05"),
"term" : "gamma red eye tennis dampener",
"year" : "2015",
"month" : "05",
"day" : "29",
"hour" : "09",
"dayofyear" : "176",
"weekofyear" : "26",
"productcount" : "1",
"count" : ""
}
{
"_id" : ObjectId("558c0202209c02284d30df06"),
"term" : "headtie",
"year" : "2015",
"month" : "06",
"day" : "01",
"hour" : "05",
"dayofyear" : "176",
"weekofyear" : "26",
"productcount" : "29",
"count" : ""
}
{
"_id" : ObjectId("558c0202209c02284d30df07"),
"term" : "dryed roller court",
"year" : "2015",
"month" : "06",
"day" : "08",
"hour" : "15",
"dayofyear" : "176",
"weekofyear" : "26",
"productcount" : "362",
"count" : ""
 }

I know what the problem is. The code below is not inserting the value of the counter column into MongoDB, since val is defined as a String, which conflicts with Cassandra's counter data type.

while (!rs.isFullyFetched())
{
    rs.fetchMoreResults();
    Row row = iter.next();
    if (row != null)
    {
        BasicDBObject document = new BasicDBObject();
        ArrayList<String> ele = new ArrayList<String>();
        for (Definition key : row.getColumnDefinitions().asList())
        {
            String val = myGetValue(key, row);
            System.out.println(val);
            ele.add(val);
        }

        for (int i = 0; i < ele.size() && i < colName.size(); i++)
        {
            document.put(colName.get(i), ele.get(i));
        }
        collection.insert(document);
    }
}

Can anyone please suggest the change I need to make in the function?


Answer:

You should use getLong() instead of getFloat() to retrieve the counter value, i.e. change this code:

else if (key.getType() == DataType.counter())
{
     str = new Float(row.getFloat(col)).toString();
}

to

else if (key.getType() == DataType.counter())
{
     str = String.valueOf(row.getLong(col));
}

BTW, pay attention to the fact that you do not have to create an instance of the wrapper type Long (as you did in all your other cases). Use String.valueOf() to create the string representation of your value.
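Applied to the rest of the question's myGetValue(), that advice would make the numeric branches look roughly like this:

if (key.getType() == DataType.cdouble())
{
    str = String.valueOf(row.getDouble(col));
}
else if (key.getType() == DataType.cint())
{
    str = String.valueOf(row.getInt(col));
}
else if (key.getType() == DataType.cfloat())
{
    str = String.valueOf(row.getFloat(col));
}
else if (key.getType() == DataType.counter())
{
    // counter columns are 64-bit integers, hence getLong()
    str = String.valueOf(row.getLong(col));
}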

Question:

I am trying to compare and contrast MongoDB and Cassandra. Our project is Java based. Some of the differences I found are:

Cassandra is highly available and partition tolerant, whereas MongoDB is consistent and highly available (not so good with partitions).

MongoDB is document based. Cassandra gives more flexibility in terms of modelling the data, and also storing JSON-like structures directly into it.

But another difference someone told me about is that Cassandra is built on a Java stack, while MongoDB is JavaScript based.

From a third-party user's (developer's) perspective, how does the stack matter? Considering that I work on a Java project, would a product built on a Java stack provide me additional benefits? If so, what are they?


Answer:

This isn't really about Cassandra or MongoDB but rather about the maturity of the languages the systems are written in and the languages of the various supporting APIs.

Cassandra itself is written in Java, whilst Mongo is written in C++. When thinking about their ecosystems (or, as you call it, their stack) and the various languages that play a part, you just need to think back to what makes a particular programming language advantageous. Below is my highly minimalistic take on the matter, since there are many books and blogs covering this exact topic.

  • Popularity and community support. Hipster languages are really cool, until you don't understand what's going on and there's no one to help. Both C++ and Java are very mature languages with a big user base. Both systems have various APIs that are implemented in popular languages.
  • Efficiency. I'm not going to get into an argument about which language is faster or more feature rich, but we can safely say both databases are built with very efficient languages that are continuously getting better. Plus, at the end of the day, if the developer doesn't do their job properly, the language's performance might be the last of their concerns when it comes to optimisation.

At the end of the day, Cassandra is well known for its massive scale [1], and Mongo has been pointed out to have scaling-out issues, but this isn't because C++ has problems (look up the magic that Facebook does with C++ and PHP) or because Java is some kind of amazing big-scale language. It's just down to how the systems have been implemented.

[1] From https://cassandra.apache.org/ (There was also a presentation by Apple at the 2014 C* summit)

One of the largest production deployments is Apple's, with over 75,000 nodes storing over 10 PB of data. Other large Cassandra installations include Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).