Hot questions for using Cassandra with JDBC

Question:

I am using a map-type column in Cassandra 2.1.7. My requirement is to preserve the order of the values I insert into Cassandra for later retrieval. For that I have used a LinkedHashMap in Java, which serves its purpose.

Now, while debugging, I found that the order is preserved as long as Java is handling the key-value pairs, but on executing the "insert into" CQL command, Cassandra rearranges the map keys in ascending order without asking me, which I don't like :-)

I have searched for a way to provide sort options when defining map types in Cassandra but found nothing, as map types have limited flexibility in Cassandra.

Could you please suggest any workaround to meet the above expectation?


Answer:

You are correct that the Cassandra Map (and Set) types do not preserve insertion order; entries are stored sorted by key.

aploetz@cqlsh:stackoverflow2> CREATE TABLE maptest 
    (key text PRIMARY KEY, values map<text,text>);
aploetz@cqlsh:stackoverflow2> INSERT INTO maptest (key, values) 
    VALUES ('key1',{'4':'Four','1':'One','2':'Two'});
aploetz@cqlsh:stackoverflow2> SELECT * FROM maptest ;

 key  | values
------+---------------------------------------
 key1 | {'1': 'One', '2': 'Two', '4': 'Four'}

(1 rows)
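The reordering above can be reproduced in plain Java, with no driver involved: a TreeMap keeps its keys sorted the way Cassandra stores map entries, while a LinkedHashMap keeps insertion order (illustration only):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    // Collects a map's keys in the order its iterator yields them.
    static List<String> keyOrder(Map<String, String> m) {
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // Insertion order, as a LinkedHashMap preserves it
        Map<String, String> linked = new LinkedHashMap<>();
        linked.put("4", "Four");
        linked.put("1", "One");
        linked.put("2", "Two");

        // Cassandra keeps map keys sorted, like a TreeMap
        Map<String, String> sorted = new TreeMap<>(linked);

        System.out.println(keyOrder(linked)); // [4, 1, 2]
        System.out.println(keyOrder(sorted)); // [1, 2, 4]
    }
}
```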

The List type, however, handles this just fine:

aploetz@cqlsh:stackoverflow2> CREATE TABLE listtest 
    (key text PRIMARY KEY, values list<text>);
aploetz@cqlsh:stackoverflow2> INSERT INTO listtest (key, values) 
    VALUES ('key2',['4','1','2']);
aploetz@cqlsh:stackoverflow2> SELECT * FROM listtest ;

 key  | values
------+-----------------
 key2 | ['4', '1', '2']

(1 rows)

Perhaps (as a workaround) you could include a column of the List type alongside your Map. Store the keys in the List in the order of your choosing. When you read the row back out, iterate through the List (which will be in your chosen order) and use its values to determine which key/value pairs to pull from the Map.
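The read side of that workaround can be sketched in plain Java, assuming the row was stored with a map<text,text> column plus a companion list<text> of keys in insertion order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedMapWorkaround {
    // Rebuilds the original insertion order from the sorted map plus an
    // ordered key list (as stored in the companion list<text> column).
    static List<String> valuesInOrder(List<String> orderedKeys, Map<String, String> map) {
        List<String> out = new ArrayList<>();
        for (String key : orderedKeys) {
            out.add(map.get(key));
        }
        return out;
    }

    public static void main(String[] args) {
        // As read back from Cassandra: the map sorted by key,
        // the list still in the original insert order
        Map<String, String> map = new TreeMap<>(Map.of("1", "One", "2", "Two", "4", "Four"));
        List<String> keys = List.of("4", "1", "2");
        System.out.println(valuesInOrder(keys, map)); // [Four, One, Two]
    }
}
```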

Question:

I'm trying to set up Liquibase with Cassandra: https://github.com/liquibase/liquibase-cassandra

Executed code from terminal:

java -jar  ~/.m2/repository/org/liquibase/liquibase-core/3.4.1/liquibase-core-3.4.1.jar  \
  --driver=org.apache.cassandra.cql.jdbc.CassandraDriver  \
  --classpath="/.../liquibase-cassandra/target/liquibase-cassandra-1.0-SNAPSHOT.jar:/.../liquibase-cassandra/target/cassandra-jdbc-1.2.5.jar"  \
  --changeLogFile=/.../conf/liquibase.xml  \
  --url=jdbc:cassandra://host1--host2--host3:9160/keyspace1 \
  --username=cassandra \
  --password=cassandra \
  update

Error: Unexpected error running Liquibase: Could not initialize class org.apache.cassandra.cql.jdbc.CassandraDriver

I'm not sure what the right syntax is for the Liquibase Cassandra extension. I cloned the liquibase-cassandra project, built a jar with mvn, and assumed that the org.apache.cassandra.cql.jdbc.CassandraDriver mentioned in the liquibase.ext.cassandra.database.CassandraDatabase class is the JDBC driver I need. I don't have any experience with Liquibase; I'm still in college and my knowledge is still quite vague in some areas.


Answer:

As the error suggests, the problem was with the JDBC driver. I couldn't connect to the Cassandra database using SQLWorkbenchJ and that driver either, so I'm not sure whether the problem was my URL syntax.

I tried a different JDBC driver, com.github.cassandra.jdbc.CassandraDriver, from cassandra-jdbc-driver-0.6.4-shaded.jar (releases: https://github.com/zhicwu/cassandra-jdbc-driver/releases).

The URL syntax for this specific driver is url=jdbc:c*://ipaddress:port.

java -jar /.../.m2/repository/org/liquibase/liquibase-core/3.4.1/liquibase-core-3.4.1.jar \
  --driver=com.github.cassandra.jdbc.CassandraDriver \
  --classpath="/.../cassandra-jdbc-driver-0.6.4-shaded.jar:/.../liquibase-cassandra/target/liquibase-cassandra-1.0-SNAPSHOT.jar:/.../log4j-1.2.17.jar" \
  --changeLogFile=/.../conf/liquibase.xml \
  --url=jdbc:c*://127.0.0.1:9042 \
  --username=cassandra \
  --password=cassandra \
  update

This worked. :)

Question:

I am trying to read data from Cassandra using Spring Batch, where I have implemented an ItemReader, ItemProcessor, and ItemWriter. I am able to read the data, process it, and write it back to the same table. I am creating an xml file to execute the job:

xml:

    <job id="LoadStatusIndicator" job-repository="jobRepository" restartable="false">
        <step id="LoadStatus">
            <tasklet>
                <chunk reader="StatusReader" processor="ItemProcessor" writer="ItemWriter"
                    commit-interval="10" />
            </tasklet>
        </step>
    </job>

    <beans:bean id="ItemWriter" scope="step"
        class="com.batch.writer.ItemWriter">
    </beans:bean>

    <beans:bean id="ItemProcessor" scope="step"
        class="com.batch.processor.ItemProcessor">
    </beans:bean>

    <beans:bean id="StatusReader" scope="step"
        class="com.reader.ItemReader">
        <beans:property name="dataSource" ref="CassandraSource" />
    </beans:bean>

applicationcontext.xml:

<beans:bean id="CassandraSource" parent="DataSourceParent">
<beans:property name="url" value="jdbc:cassandra://${cassandra.hostName}:${cassandra.port}/${cassandra.keyspace}" />
<beans:property name="driverClassName" value="org.apache.cassandra.cql.jdbc.CassandraDriver" />
</beans:bean>

reader class:

    public static final String query = "SELECT * FROM test_1 allow filtering;";

    @Override
    public List<Item> read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        List<Item> results = new ArrayList<Item>();
        try {
            results = cassandraTemplate.select(query, Item.class);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return results;
    }

writer class:

@Override
public void write(List<? extends Item> item) throws Exception {
    try {
        cassandraTemplate.insert(item);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But the problem is that the whole job executes multiple times; in fact it does not stop at all, and I have to force-stop the execution. I have only 2 rows in the table. I thought it was because of the commit-interval defined in the xml, but even with commit-interval = 10 the job executes more than 10 times.

According to my understanding, when I run the xml file I am running the job only once: it calls the reader once, keeps the data in runtime memory (the job repository), calls the item processor once (I use a list), and the whole list is inserted at once.


Answer:

SOLVED

In reader class I wrote:

if (results.size() != 0) return results; else return null;

A Spring Batch ItemReader signals the end of input by returning null; a reader that never returns null keeps the step reading forever.
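A minimal sketch of that contract, with no Spring Batch types and fetchAll() as a hypothetical stand-in for the cassandraTemplate query:

```java
import java.util.Arrays;
import java.util.List;

// A reader that returns its results once and then null. In Spring Batch,
// null from read() ends the step; always returning a non-null list makes
// the step loop forever, as in the question.
public class ReadOnceReader {
    private boolean exhausted = false;

    public List<String> read() {
        if (exhausted) {
            return null; // end-of-input marker for the step
        }
        exhausted = true;
        List<String> results = fetchAll();
        return results.isEmpty() ? null : results;
    }

    // Hypothetical stand-in for cassandraTemplate.select(query, Item.class)
    private List<String> fetchAll() {
        return Arrays.asList("row1", "row2");
    }

    public static void main(String[] args) {
        ReadOnceReader reader = new ReadOnceReader();
        System.out.println(reader.read()); // [row1, row2]
        System.out.println(reader.read()); // null
    }
}
```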

Question:

The https://github.com/adejanovski/cassandra-jdbc-wrapper is great but more than a bit outdated. E.g., it targets Java 1.6, it uses version 3.0.0 of a key library (the DataStax driver) where the final version was 3.8.0 (and the current version is 4.4), etc. It can connect to Cassandra 3.x but is missing some of the latest features.

We've been able to fork the project and update the DataStax library to 3.8.0, but I'm stuck adding the 'duration' data type. There's a table that (seems to) map the wire protocol's protocolId for each data type to the wrapper class that implements it. I've searched both the DataStax code (which supports this data type) and the Cassandra code and been totally stumped finding the correct value, since everything uses the enum (yay!) instead of the much more opaque protocolId... and I haven't been able to find where the association is made on the Cassandra server's side.

So two questions. First, is there already a fork of this project that adds support for the 'duration' type? (I wish it were possible to do a code search over both a project and its forks.)

Second, what is the protocol id / where is it defined?

At this point I'm almost ready to just add some instrumentation so the "unknown codec" message adds more details. It would work but could be fragile.

Note on JDBC vs CQL: our current code was written several years ago by someone reusing our extensive JDBC infrastructure. It was "good enough" but requires maintained JDBC drivers. It's now clear that we should refactor the code to use CQL but that will take time to write and test and a customer wants a solution now. We found a commercial solution but it's incompatible with our system since we load our jars from a mongo store instead of the filesystem. (Don't ask.) Forking this driver gives us some breathing room.

If it helps this is the table in com.github.adejanovski.cassandra.jdbc:

public enum DataTypeEnum {
    ASCII     (1,  String.class, DataType.Name.ASCII),
    BIGINT    (2,  Long.class, DataType.Name.BIGINT),
    BLOB      (3,  ByteBuffer.class, DataType.Name.BLOB),
    BOOLEAN   (4,  Boolean.class, DataType.Name.BOOLEAN),
    COUNTER   (5,  Long.class, DataType.Name.COUNTER),
    DECIMAL   (6,  BigDecimal.class, DataType.Name.DECIMAL),
    DOUBLE    (7,  Double.class, DataType.Name.DOUBLE),
    FLOAT     (8,  Float.class, DataType.Name.FLOAT),
    INET      (16, InetAddress.class, DataType.Name.INET),
    INT       (9,  Integer.class, DataType.Name.INT),
    TEXT      (10, String.class, DataType.Name.TEXT),
    TIMESTAMP (11, Date.class, DataType.Name.TIMESTAMP),
    UUID      (12, UUID.class, DataType.Name.UUID),
    VARCHAR   (13, String.class, DataType.Name.VARCHAR),
    VARINT    (14, BigInteger.class, DataType.Name.VARINT),
    TIMEUUID  (15, UUID.class, DataType.Name.TIMEUUID),
    LIST      (32, List.class, DataType.Name.LIST),
    SET       (34, Set.class, DataType.Name.SET),
    MAP       (33, Map.class, DataType.Name.MAP),
    UDT       (48, UDTValue.class, DataType.Name.UDT),
    TUPLE     (49, TupleValue.class, DataType.Name.TUPLE),
    CUSTOM    (0,  ByteBuffer.class, DataType.Name.CUSTOM),       
    SMALLINT  (19, Integer.class, DataType.Name.SMALLINT),
    TINYINT   (20, Integer.class, DataType.Name.TINYINT),
    DATE      (17, Date.class, DataType.Name.DATE),
    TIME      (18, Date.class, DataType.Name.TIME),
    DURATION  (???, Duration.class, DataType.Name.DURATION);

    final int protocolId;
    final Class<?> javaType;
    final Name cqlType;

    private static final DataTypeEnum[] nameToIds;
    private static final Map<DataType.Name, DataTypeEnum> cqlDataTypeToDataType;

Answer:

Codes for types, together with their encoding information, are defined in the CQL binary protocol specification. The Duration type appears in version 5 of the protocol and has identifier 0x15 (21). You can also find all the codes in the source code of the Java driver.
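So the missing enum entry would plug in id 21. A hedged, cut-down sketch (the real entry also carries Duration.class and DataType.Name.DURATION from the DataStax driver, omitted here so the snippet is self-contained):

```java
// Cut-down version of the wrapper's DataTypeEnum showing only the new
// DURATION entry with protocol id 0x15 (21) from native-protocol v5.
public class DurationIdDemo {
    enum DataTypeEnum {
        // ... existing entries elided ...
        DURATION(0x15); // 0x15 == 21

        final int protocolId;

        DataTypeEnum(int protocolId) {
            this.protocolId = protocolId;
        }
    }

    public static void main(String[] args) {
        System.out.println(DataTypeEnum.DURATION.protocolId); // 21
    }
}
```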

Question:

I'm trying to get the OFBiz demo project off the ground with a Cassandra database. After quite some struggle I found this short guide on how to integrate the two.

Following that, and using this Cassandra jdbc driver (cassandrajdbc1.1.jar), I get a bunch of exceptions thrown.

When running the project using ant load-demo start I get a lot of: org.ofbiz.entity.GenericEntityException: Cached loader got a known bad class name: org.apache.cassandra.jdbc.EmbeddedDriver

What could the issue be? Am I using the wrong jdbc driver or is it an OFBiz thing?

My entityengine.xml contents can be found here.


Answer:

The error Cached loader got a known bad class name is thrown by OFBiz when the JDBC driver cannot be found in the classpath.

To make sure that OFBiz finds the JDBC driver, please copy the jar into the folder:

framework/entity/lib/jdbc/

Question:

I have records already in the Cassandra DB. Using a Java class I retrieve each row, update it with a TTL, and store it back to the Cassandra DB. After that, if I run a select query, it executes and shows the records. But once the TTL has elapsed, a select query should show zero records; instead it fails with a Cassandra Failure during read query at consistency ONE error. Select queries work properly for other tables, but not for the table whose rows I applied the TTL to.


Answer:

You are using common anti-patterns.

1) You are using batches to load data into two single tables, separately. I don't know if you already own a cluster or you're on your local machine, but this is not the way to load data into a C* cluster, and you are going to put a lot of stress on it. You should use batches only when you need to keep two or more tables in sync, not to load a bunch of records at a time.

2) You are using synchronous writes to insert your pretty independent records into your cluster. You should use asynchronous writes to speed up your data processing.

3) You are using the TTL feature in your tables, which per se is not that bad. However, an expired TTL is a tombstone, and that means that when you run your SELECT query, C* will have to read all those tombstones.

4) You bind your prepared statement multiple times:

BoundStatement bound = phonePrepared.bind(macAddress, ...

and that should be

BoundStatement bound = new BoundStatement(phonePrepared).bind(macAddress, ...

in order to use different bound statements. This is not an anti-pattern; this is a problem with your code.

Now, if you run your program multiple times, your tables accumulate a lot of tombstones due to the TTL feature, and that means C* will try hard to read through all of them in order to find what you wrote "the last time" you ran successfully, and it takes so long that the queries time out.

Just for fun, you can try increasing your timeout on the SELECT, say to 2 minutes, and take a coffee; in the meantime C* will get your records back.

I don't know what you are trying to achieve, but fast TTLs are your enemies. If you just wanted to refresh your records, keep the TTLs high enough that they don't hurt your performance. Or, probably a better solution, add a new column EXPIRED, written "manually" only when you need to delete a record instead. That depends on your requirements.

Question:

Is it possible to get the rows affected by an Update statement?

public void removeCountry(String id,int version) throws VersionException {
    log.debug("Update row"+id);
    try {
        Update update = QueryBuilder.update("country");
        update.setConsistencyLevel(ConsistencyLevel.ONE);
        update.with(QueryBuilder.set("destroyed", true));
        update.where(QueryBuilder.eq("id",id));
        update.where(QueryBuilder.eq("version",version));
        cassandraOperations.execute(update);
        //How to I know if the row was updated
        int rows = 0;//complete
        if (rows==0) throw new VersionException("Row has not been updated yet");
    } catch (Exception e) {
        log.error(e.getMessage(),e);
    }
}

I am a newbie with Cassandra, so I don't know if I can do that like a JDBC operation.


Answer:

As I suspected, given the distributed nature of Cassandra and its varying levels of consistency, the short answer is NO.

Others have posted similar questions on SO before, for example...

How to know affected rows in Cassandra(CQL)?
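One partial workaround, not from the original answer: if a conditional update fits your model, a lightweight transaction (adding IF EXISTS, or an IF column = ? condition) makes Cassandra return an [applied] column, which the DataStax Java driver exposes as ResultSet.wasApplied(). A hedged sketch using the question's table; only the CQL string is built here, since executing it needs a live cluster:

```java
// Builds a conditional UPDATE for the question's "country" table.
// The IF EXISTS clause turns it into a lightweight transaction, so the
// server reports whether the row existed and was updated. The id value
// "ES" below is purely illustrative.
public class ConditionalUpdateDemo {
    static String conditionalUpdate(String id, int version) {
        return "UPDATE country SET destroyed = true"
                + " WHERE id = '" + id + "' AND version = " + version
                + " IF EXISTS";
    }

    public static void main(String[] args) {
        System.out.println(conditionalUpdate("ES", 1));
        // With the driver, something like:
        //   boolean updated = session.execute(statement).wasApplied();
    }
}
```

Note that lightweight transactions use Paxos under the hood and are noticeably more expensive than plain writes, so this is worth it only where you genuinely need the applied/not-applied answer.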

Question:

I have the following cassandra table "users" defined in keyspace "chemdb":

CREATE TABLE users (
  userid text PRIMARY KEY,
  passwd text,
  fname text,
  lname text,
  creationdate timestamp,  
  isactive text  
);   

I created a java class, JDBConnec.java, to connect the jsp to cassandra using jdbc:

package dbclasses;

import java.sql.*;
import java.lang.*;
import java.io.*;

public class JDBConnec {

    public Statement stmt = null;
    public ResultSet rs = null;
    public Connection con = null;
    public PreparedStatement pstmt = null;

    public JDBConnec()
    {
        try
        {
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            con = DriverManager.getConnection("jdbc:cassandra://127.0.0.1:9160/chemdb", "okkkkk", "12345");
        }
        catch(Exception ex)
        {
            System.out.println(ex.getMessage());
        }
    }
}

The above class compiles successfully.

However, in my jsp file, when I use the following code to select a record from the database using prepared statements, I get the error mentioned below:

JDBConnec db = new JDBConnec();
String query = "select * from users where userid=?";
db.pstmt = db.con.prepareStatement(query); 
db.pstmt.setString(1, "henry");
db.rs = db.pstmt.executeQuery();

It gives the following error:

InvalidRequestException(why:Undefined name userid in where clause ('userid EQ ?'))

Does the cassandra jdbc driver support prepared statements in jsp? Any thoughts on why this error is occurring? Thanks in advance.

Cassandra version: 2.0.8. Java: 7. Apache Tomcat: latest, downloaded yesterday.


Answer:

I was able to figure out the problem. I just changed the query from

String query = "select * from users where userid=?";

to

String query = "select userid, passwd from chemdb.users where userid=?";    

and it worked fine.

Thanks!