Hot questions on using timestamps in Cassandra

Question:

I have recently started working with Cassandra and I have some issues dealing with the Cassandra timestamp type using CQL and Java.

Below is my sample Cassandra table schema.

CREATE TABLE emp (
    empid int,
    create_date timestamp,
    deptid int,
    PRIMARY KEY (empid, create_date)
)

Here are my questions below:

1) I actually need only the date (I am not worried about the time). In my table schema I have used the timestamp datatype; is there a datatype like date to store only the date instead of a timestamp?

2) If there is no alternative to the timestamp datatype, below is how I am inserting a record into the table manually:

Insert into emp(empId, deptId, create_date) values(1,2,'2016-03-15 00:00:00+0000');

When I try to retrieve the record using CQL as below

select * from emp where create_date='2016-03-15' and empid=1;

it does not return any record, so I have to use the query below:

 select * from emp where create_date='2016-03-15 00:00:00+0000' and empid=1;

The query above returns the record. So how can I query so that I only need to supply the date?

3) I am using the DataStax 2.1 Java driver to communicate with Cassandra. Below is how I retrieve the timestamp column from Java:

row.getDate("create_date")

The getDate method returns java.util.Date (later driver versions return com.datastax.driver.core.LocalDate instead).

When I get the date from code it comes back one day less. For example, if the record in my table is 2016-03-15 00:00:00+0000, my code shows something like 2016-03-14 EDT (one day less). Is this because of a timezone issue? If so, should I set the timezone in my Java code while retrieving?

4) If I want to query records from my Java code based on empId and create_date, how do I build the exact pattern like 2016-03-15 00:00:00+0000 in Java, since nothing is retrieved if I provide only the date in the where condition? Or is there a better way to retrieve this from Java code?


Answer:

Getting back to your example:

Insert into emp(empId, deptId, create_date) values(1,2,'2016-03-15 00:00:00+0000');

The statement above will create a timestamp value for 2016-03-15 00:00 UTC.

select * from emp where create_date='2016-03-15' and empid=1;

The select will match against a timestamp created from the given date in your local timezone. What you have to do here is to query specifically by UTC, using create_date='2016-03-15Z'.

As you're also having timezone-related issues in Java, the easiest option would be to simply use a string representation of the date.
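
If you prefer to work with java.util.Date on the Java side rather than strings, here is a minimal sketch for questions 3 and 4 (not the poster's code; it assumes the emp table above and an open com.datastax.driver.core.Session named session): parse the date explicitly in UTC so that '2016-03-15' always means midnight UTC, then bind the resulting Date, which the driver maps to the timestamp type.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;

// Parse the plain date in UTC so it means 2016-03-15 00:00:00+0000, not local midnight.
SimpleDateFormat utcDay = new SimpleDateFormat("yyyy-MM-dd");
utcDay.setTimeZone(TimeZone.getTimeZone("UTC"));
Date createDate = utcDay.parse("2016-03-15");

PreparedStatement ps = session.prepare(
    "SELECT * FROM emp WHERE empid = ? AND create_date = ?");
Row row = session.execute(ps.bind(1, createDate)).one();

The same UTC-parsed Date can be bound on insert, which keeps the stored values and the query values in the same timezone.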

Question:

I want to use a timestamp as one of the columns in Cassandra (which I decided to use as a clustering key). Which is the right way to store the column as a timestamp in Cassandra?

That is, is it fine to use milliseconds (for example, 1513078338560) directly, like below?

INSERT INTO testdata (nodeIp, totalCapacity, physicalUsage, readIOPS, readBW, writeIOPS, writeBW, writeLatency, flashMode, timestamp) VALUES('172.30.56.60',1, 1,1,1,1,1,1,'yes',1513078338560); 

Or should I use dateof(now()), as below?

INSERT INTO testdata (nodeIp, totalCapacity, physicalUsage, readIOPS, readBW, writeIOPS, writeBW, writeLatency, flashMode, timestamp) VALUES('172.30.56.60',1, 1,1,1,1,1,1,'yes',dateof(now()));

Which is faster, and which is the recommended way for timestamp-based queries in Cassandra?

NOTE: I know it is stored internally as milliseconds; I checked it using 'SELECT timestamp, blobAsBigint(timestampAsBlob(timestamp)) FROM'

Thanks, Harry


Answer:

dateof is deprecated in Cassandra >= 2.2; instead it's better to use the toTimestamp function, like this: toTimestamp(now()). When selecting, you can also use the toUnixTimestamp function if you want to get the timestamp as a long:

cqlsh:test> CREATE TABLE test_times (a int, b timestamp, PRIMARY KEY (a,b));
cqlsh:test> INSERT INTO test_times (a,b) VALUES (1, toTimestamp(now()));
cqlsh:test> SELECT toUnixTimestamp(b) FROM test_times;

 system.tounixtimestamp(b)
---------------------------
         1513086032267

(1 rows)

cqlsh:test> SELECT b FROM test_times;

 b
---------------------------------
 2017-12-12 13:40:32.267000+0000

(1 rows)

Regarding the performance - there are different considerations:

  1. If you already have the timestamp as a number, you can use it directly instead of calling a function.
  2. It's better to use prepared statements instead of "raw inserts" - in this case Cassandra won't need to transfer the full query, only the data, and it doesn't need to parse the statement every time.

The pseudo code will look like the following (Java-like):

// Prepare the statement once, then reuse it for every insert.
PreparedStatement prepared = session.prepare(
    "INSERT INTO your_table (field1, field2) VALUES (?, ?)");
for (Record record : records) {   // "Record"/"records" are placeholders for your data
    session.execute(prepared.bind(record.value1, record.value2));
}

Question:

I have the below table in the keyspace myks:

CREATE table IF NOT EXISTS myks.users (
    user_name text,
    email text,
    created_at timestamp,
    updated_at timestamp,
    PRIMARY KEY (user_name)
);

Below is the model class

@Table(value = "users")
public @Data class Users{
    @PrimaryKey
    @Column("user_name")
    @CassandraType(type = DataType.Name.TEXT)
    private String user_name;

    @Column("email")
    @CassandraType(type = DataType.Name.TEXT)
    private String email;

    @Column("created_at")
    @CassandraType(type = DataType.Name.TIMESTAMP)
    private Timestamp created_at;

    @Column("updated_at")
    @CassandraType(type = DataType.Name.TIMESTAMP)
    private Timestamp updated_at;
}

Repository interface

@Repository
public interface UsersRepository extends CrudRepository<Users, String> {
}

I inserted the below values into the table:

Users users = new Users();
LocalDateTime ldt_created = LocalDateTime.now();
LocalDateTime ldt_updated = ldt_created.plus(1000, ChronoUnit.MILLIS);
Timestamp ts_created = Timestamp.valueOf(ldt_created);
Timestamp ts_updated = Timestamp.valueOf(ldt_updated);
users.setUser_name("krishna");
users.setEmail("krishna@gmail.com");
users.setCreated_at(ts_created);
users.setUpdated_at(ts_updated);
usersRepository.save(users);

It gets saved in the table, but while retrieving the data it throws the below exception:

No converter found capable of converting from type [java.util.Date] to type [java.sql.Timestamp]

Answer:

CQL's timestamp type is mapped to the java.util.Date Java type, so you need to use that instead of java.sql.Timestamp. See the CQL-to-Java mapping table for this and other types.

You can also use the so-called optional codecs (an additional dependency) to map timestamp to other Java types, such as Instant. See the documentation for more information.
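
For the entity above, a minimal sketch of the fix (only the two timestamp fields change; the rest of the class stays as posted):

@Column("created_at")
@CassandraType(type = DataType.Name.TIMESTAMP)
private java.util.Date created_at;   // java.util.Date instead of java.sql.Timestamp

@Column("updated_at")
@CassandraType(type = DataType.Name.TIMESTAMP)
private java.util.Date updated_at;

Since java.sql.Timestamp extends java.util.Date, the existing setter calls that pass a Timestamp still compile; only the declared field and accessor types need to change.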

Question:

I am using the Spark Cassandra connector in Java to insert data. My data has a timeuuid and a timestamp field. I have the following table:

CREATE TABLE abc.log (
time_uuid timeuuid,
session_id text,
event text,
time timestamp,
sequence int,
PRIMARY KEY (customer)
);

I am using this code to insert:

JavaRDD<EventLog> rdd = sc.parallelize(eventLogs);
javaFunctions(rdd)
    .writerBuilder("dove", "event_log", mapToRow(EventLog.class))
    .saveToCassandra();

How do I insert the timeuuid and timestamp fields? Using a normal insert I would just use the now() function; how do I do that here?


Answer:

You may use com.datastax.driver.core.utils.UUIDs for this. The UUIDsTest utilizes the class like this to create a TimeUUID:

 UUID uuid = UUIDs.timeBased();

Note that UUID here is java.util.UUID. Not sure if you need it for your use case, but after that you can retrieve the timestamp of the UUID by calling UUIDs.unixTimestamp(uuid).

As for your timestamp, you pass an instance of java.util.Date, as proposed in the docs.
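
Putting both together, a minimal sketch of populating the entity before building the RDD (the EventLog setters shown here are hypothetical, named after the time_uuid and time columns):

import java.util.Date;

import com.datastax.driver.core.utils.UUIDs;

// Fill the fields in Java; the connector writes the UUID into the timeuuid
// column and the Date into the timestamp column.
EventLog log = new EventLog();
log.setTimeUuid(UUIDs.timeBased());   // hypothetical setter for the time_uuid column
log.setTime(new Date());              // hypothetical setter for the time column
eventLogs.add(log);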

Question:

I am trying to do a SELECT in Java against my Cassandra database. I am trying it with:

Statement statement = QueryBuilder.select()
       .all()
       .from(keySpaceName, tableName)                      
       .where((QueryBuilder.eq("asset", categoryPos)))
       .and(QueryBuilder.gte("date", "2006-06-08 00:00:00"))
       .limit(10)
       .allowFiltering()
       .enableTracing();

The CQL query (already working) is

SELECT * FROM pair_tick.price WHERE asset = 1 and date>='2006-06-08 15:30:00' LIMIT 10;

When I try to execute this query, I get this error:

The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long for date (10)
 at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
 at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
 at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:244)
 at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:55)
 at com.nexow.services.HistoricService.getHistoric(HistoricService.java:86)
 at com.nexow.HistoricController.getHistoric(HistoricController.java:38)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
 at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
 at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)

How can I avoid this error? Thank you in advance!


Answer:

I believe the issue here is the way the driver is converting your date value.

As stated in the DataStax driver 3.0 documentation (http://docs.datastax.com/en/developer/java-driver/3.0/java-driver/reference/javaClass2Cql3Datatypes.html?scroll=cql-java-types), there is a new class to handle the Cassandra date type: com.datastax.driver.core.LocalDate. You should use that class, as in the example below:

// imports needed: java.text.SimpleDateFormat, com.datastax.driver.core.LocalDate

// 1. convert your string date to LocalDate
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
LocalDate localDate = LocalDate.fromMillisSinceEpoch(sdf.parse("2006-06-08").getTime());

// 2. then pass it to your query
Statement statement = QueryBuilder.select()
    .all()
    .from(keySpaceName, tableName)
    .where(QueryBuilder.eq("asset", categoryPos))
    .and(QueryBuilder.gte("date", localDate))   // note the closing parenthesis here
    .limit(10)
    .allowFiltering()
    .enableTracing();

The driver encapsulates date values in LocalDate because it is cumbersome to work with raw date literals (especially since Java does not have unsigned integers). The LocalDate class aims to hide that complexity behind utility methods that convert LocalDate instances to and from integers representing the number of days since the epoch.

Question:

I'm working with Cassandra using its Java API. I have entity classes that my mapper object uses to perform CRUD operations. I need a custom query to retrieve all my Purchase objects from a specific timespan. However, when I run the query below, I never get anything back. Fun fact: after more extensive testing, the query does work on my colleague's Mac running Cassandra 3.11.2; my machine is running Windows and Cassandra 3.9.0.

String query = String.format("SELECT * FROM purchase WHERE timestamp >= %s AND timestamp <= %s ALLOW FILTERING;", startTimestamp, endTimestamp);
purchases = session.execute(query);

I have also tried using the IN operator; however, I can't find much information on what it actually does, and although it doesn't throw any exception, it doesn't find any purchases at all:

String query = String.format("SELECT * FROM purchase WHERE timestamp IN (%s , %s);", startTimestamp, endTimestamp);

Answer:

I finally managed to solve it. It turns out that if you store something with the Cassandra timestamp data type, you cannot select it with a bare long in the query string. You have to format the value as a date string. I solved it like this:

// The formatter declaration was not shown in the original post; any pattern that
// Cassandra accepts as a timestamp literal works, e.g. "yyyy-MM-dd HH:mm:ss".
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

startDate = simpleDateFormat.format(Long.valueOf(startTimestamp));
endDate = simpleDateFormat.format(Long.valueOf(endTimestamp));

query = String.format("SELECT * FROM purchase WHERE timestamp >= '%s' AND timestamp <= '%s' ALLOW FILTERING;", startDate, endDate);
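
A variant that avoids string formatting entirely is to use a prepared statement with java.util.Date parameters, since the driver maps Date to the timestamp type. This is an alternative sketch, not the poster's solution:

import java.util.Date;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;

// Bind java.util.Date values directly; no date-string formatting is needed.
PreparedStatement ps = session.prepare(
    "SELECT * FROM purchase WHERE timestamp >= ? AND timestamp <= ? ALLOW FILTERING");
ResultSet purchases = session.execute(
    ps.bind(new Date(Long.valueOf(startTimestamp)), new Date(Long.valueOf(endTimestamp))));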

Question:

I want to convert a string date to the Cassandra timestamp format.

Example: String inputDate = "20170525"


Answer:

You need to convert your string to a Date. The Java Date type maps to the Cassandra timestamp type:

DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");
Date date = dateFormat.parse("20170525");

Now that you have the date, you can insert or query with it in a prepared statement.

Note: you don't specify any timezone, so the default timezone will be used. If you want to specify the timezone, use the dateFormat.setTimeZone(TimeZone zone) method.
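
A minimal sketch of such an insert (the table and column names are placeholders, not from the question; an open Session named session is assumed):

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import com.datastax.driver.core.PreparedStatement;

DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");
Date date = dateFormat.parse("20170525");

// "events", "id" and "created_on" are placeholder names.
PreparedStatement ps = session.prepare(
    "INSERT INTO events (id, created_on) VALUES (?, ?)");
session.execute(ps.bind(1, date));   // the driver serializes the Date as a CQL timestamp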

Question:

I have a table like the one given below.

CREATE TABLE TEST (
 HOURLYTIME TIMESTAMP,
 FULLTIME TIMESTAMP,
 DATA TEXT,
 PRIMARY KEY (HOURLYTIME, FULLTIME)
)

I inserted the record (2014-12-12 00:00:00, 2014-12-12 00:00:01, 'Hello World').

I would like to search by a date-time range on the HOURLYTIME field, which holds hourly records. When I try with token(), like

select * from TEST where token(HOURLYTIME)=token('2014-12-12')

to get all the records for that date, it returns only the record for one hour, i.e. for

2014-12-12 00:00:00

If I add a date range,

select * from TEST where token(HOURLYTIME)>=token('2014-12-12') AND token(HOURLYTIME)<=token('2014-12-14');

it gives the error: More than one restriction was found for the start bound.

How can I resolve this issue?

I am able to scan using FULLTIME, but I need to provide ALLOW FILTERING, which scans all records and is inefficient.


Answer:

To make range queries, you need to have this column as a clustering column.

In this case it will be efficient, because clustering columns are stored sorted. If you want to search the data, you need to specify the partition key.

So as an example, where I use device_id as a partition key:

CREATE TABLE IF NOT EXISTS mykeyspace.device_data (
 DEVICE_ID text,
 HOURLYTIME TIMESTAMP,
 FULLTIME TIMESTAMP,
 DATA TEXT,
 PRIMARY KEY (DEVICE_ID, HOURLYTIME, FULLTIME)
);

INSERT INTO mykeyspace.device_data (device_id, hourlytime, fulltime, data)
values('Spam machine', '2014-12-12 00:01:00','2014-12-12 00:00:01','Hello World1');

INSERT INTO mykeyspace.device_data (device_id, hourlytime, fulltime, data)
values('Spam machine', '2014-12-12 00:02:00','2014-12-12 00:00:02','Hello World2');

INSERT INTO mykeyspace.device_data (device_id, hourlytime, fulltime, data)
values('Spam machine', '2014-12-12 00:03:00','2014-12-12 00:00:03','Hello World3');

-- Effective range query
SELECT * FROM mykeyspace.device_data
WHERE device_id = 'Spam machine'
    AND hourlytime > '2014-12-12 00:00:00'
    AND hourlytime < '2014-12-12 00:02:00';

Or another example, where I partition the data by day (which spreads the data across the cluster nicely) and perform range queries:

CREATE TABLE IF NOT EXISTS mykeyspace.day_data (
     DAYTIME timestamp,
     HOURLYTIME TIMESTAMP,
     FULLTIME TIMESTAMP,
     DATA TEXT,
     PRIMARY KEY (DAYTIME, HOURLYTIME, FULLTIME)
);

INSERT INTO mykeyspace.day_data (DAYTIME, hourlytime, fulltime, data)
values('2014-12-12', '2014-12-12 00:01:00','2014-12-12 00:00:01','Hello World1');

INSERT INTO mykeyspace.day_data (DAYTIME, hourlytime, fulltime, data)
values('2014-12-12', '2014-12-12 00:02:00','2014-12-12 00:00:02','Hello World2');

INSERT INTO mykeyspace.day_data (DAYTIME, hourlytime, fulltime, data)
values('2014-12-12', '2014-12-12 00:03:00','2014-12-12 00:00:03','Hello World3');

SELECT * FROM mykeyspace.day_data
WHERE daytime = '2014-12-12'
    AND hourlytime > '2014-12-12 00:00:00'
    AND hourlytime < '2014-12-12 00:02:00';

There is a very useful article about time-series data on PlanetCassandra.
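
If the same range query needs to be issued from the Java driver, a sketch with QueryBuilder might look like this (assuming the device_data table above; this is not taken from the original answer):

import java.text.SimpleDateFormat;

import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

// Restrict the partition key first, then apply the range on the clustering column.
Statement stmt = QueryBuilder.select().all()
    .from("mykeyspace", "device_data")
    .where(QueryBuilder.eq("device_id", "Spam machine"))
    .and(QueryBuilder.gt("hourlytime", sdf.parse("2014-12-12 00:00:00")))
    .and(QueryBuilder.lt("hourlytime", sdf.parse("2014-12-12 00:02:00")));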

Question:

In Cassandra, one of my columns has type timeuuid, whose Java equivalent, as per the documentation, is java.util.UUID.

I have my input time as java.sql.Timestamp, and in order to set it in the entity class I want to convert java.sql.Timestamp -> java.util.UUID.


Answer:

First of all, the documentation you mentioned refers to a very old version of the DataStax Java driver; you should upgrade to version 3.0.0.

That said, the DataStax Java driver comes with a utility class, com.datastax.driver.core.utils.UUIDs. It has three useful methods for dealing with time-based UUIDs:

  1. timeBased(): creates a new time-based UUID for the current timestamp; UUIDs generated by this method are suitable for insertion into Cassandra as timeuuid values.
  2. startOf(long) and endOf(long): these methods provide respectively the lowest and highest possible UUIDs for a given timestamp. They are suitable mainly for querying a range of timeuuid values.

In your case however, it looks like you are migrating data from an RDBMS with fixed timestamps. I suggest that you consistently use startOf(long) to migrate your data:

java.sql.Timestamp tp = ...;
UUID uuid = UUIDs.startOf(tp.getTime());

In any case, do not attempt to create such UUIDs manually, as the way Cassandra sorts timeuuid values is not compatible with UUID.compareTo().
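
As a complement, a sketch of how startOf(long) and endOf(long) are typically used to select a range of timeuuid values between two java.sql.Timestamp bounds (the table and column names are placeholders, and an open Session named session is assumed):

import java.sql.Timestamp;
import java.util.UUID;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.utils.UUIDs;

Timestamp from = Timestamp.valueOf("2016-03-15 00:00:00");
Timestamp to = Timestamp.valueOf("2016-03-16 00:00:00");

UUID lower = UUIDs.startOf(from.getTime());   // lowest possible timeuuid for that instant
UUID upper = UUIDs.endOf(to.getTime());       // highest possible timeuuid for that instant

// "events", "day" and "event_id" are placeholder names.
PreparedStatement ps = session.prepare(
    "SELECT * FROM events WHERE day = ? AND event_id >= ? AND event_id <= ?");
ResultSet rs = session.execute(ps.bind("2016-03-15", lower, upper));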

Question:

Let's say I have a simple table like this:

CREATE TABLE test (
id int,
time timestamp,
PRIMARY KEY(id)
);

Now I insert data as JSON:

INSERT INTO test JSON '{"id":1,"time":"2018-01-12T15:06:02.753Z"}'

The result would look like this:

id        time
1         2018-01-12 15:06:02.753+0000

Now I select the entry again as JSON:

SELECT JSON * from test where id=1;

The resulting JSON looks like this:

{  
   "id":1,
   "time":"2018-01-12 15:06:02.753+0000"
}

If I now want to parse the field "time" in the JSON above to an Instant, it fails because of the whitespace:

Exception in thread "main" java.time.format.DateTimeParseException: Text '2018-01-12 10:23:00.461Z' could not be parsed at index 10

I could replace the whitespace character with a 'T' delimiter, but is there a better way to parse an Instant?


Answer:

The DataStax Java driver supports table-to-object mapping out of the box. You don't have to write that code yourself.

Timestamps are automatically mapped to java.util.Date objects by the Java driver; see the driver's CQL-to-Java type mapping documentation for details.
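
If the value is read through the driver rather than via SELECT JSON, no string parsing is needed at all. A minimal sketch, assuming the test table above and an open Session named session:

import java.time.Instant;
import java.util.Date;

import com.datastax.driver.core.Row;

// The driver returns the timestamp column as java.util.Date; convert it to an Instant.
Row row = session.execute("SELECT time FROM test WHERE id = 1").one();
Date time = row.getTimestamp("time");
Instant instant = time.toInstant();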

Question:

I've loaded all rows from a table into a Dataset using a Spark session in Java. I want to get the count of rows in each month.

I tried to create a new month column using withColumn() so that I can later group by month and count(), but I am not able to get the month from the timestamp. How can I find the count for each month from the above Dataset?

My sample Dataset looks like this (sample screenshot omitted).


Answer:

Considering the way you have explained your problem ("I tried to create a new column of month by using withColumn() so that I can later use group_by month and count(), but I am not able to get month from timestamp"):

You can use the static month() function provided in the org.apache.spark.sql.functions package to find the month, as below:

myDataset.withColumn("month", month(col("date"))).groupBy(col("month")).count().show();

where col("date") holds the timestamp (in the case below, in "yyyy-MM-dd HH:mm:ss" format).
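
For reference, a self-contained sketch of the same call (assuming myDataset already holds a timestamp column named "date"; the static imports are the part most often missed):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.month;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Derive a "month" column from the timestamp, then count the rows per month.
Dataset<Row> countsPerMonth = myDataset
        .withColumn("month", month(col("date")))
        .groupBy(col("month"))
        .count();
countsPerMonth.show();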

Input used:

1,2019-04-07 07:24:14,0,8
2,2019-05-07 07:24:14,0,10
5,2019-06-07 07:24:14,0,6
3,2019-04-07 07:24:14,0,7

This will give you an output as below:

+-----+-----+
|month|count|
+-----+-----+
|    6|    1|
|    5|    1|
|    4|    2|
+-----+-----+

Hope this helps!!

Question:

I have a table in Cassandra that I am trying to iterate over and extract data from. I am using the DataStax Java driver 3.1.0.

CREATE TABLE test (
  client_id int,
  process_id text,
  last_modified_date timestamp,
  PRIMARY KEY ((client_id), process_id)
)

Below is the code:

private List<MetaHolder> getMetaHolder() {
    List<MetaHolder> metaHolder = new ArrayList<>();
    String sql = "select * from test where client_id=1";
    try {
      BoundStatement bs = Cache.getInstance().getStatement(sql);
      bs.setConsistencyLevel(ConsistencyLevel.QUORUM);

      ResultSet res = session.execute(bs);
      Iterator<Row> rows = res.iterator();
      while (rows.hasNext()) {
        Row row = rows.next();
        String processId = row.getString("PROCESS_ID");
        // this line throws exception
        UUID lastModifiedDate = row.getUUID("LAST_MODIFIED_DATE");
        MetaHolder metadata =
            new MetaHolder(processId, lastModifiedDate);
        metaHolder.add(metadata);
      }
    } catch (Exception ex) {
      LOGGER.logError("error= ", ExceptionUtils.getStackTrace(ex));
    }
    return metaHolder;
}

Below is the exception:

error= com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [timestamp <-> java.util.UUID]

Is there any way I can use Joda-Time, by creating a codec if possible, instead of using UUID here in my code?


Answer:

Change the below code

UUID lastModifiedDate = row.getUUID("LAST_MODIFIED_DATE");

to

Date lastModifiedDate = row.getTimestamp("LAST_MODIFIED_DATE");

You can get a Joda-Time DateTime from the Date:

DateTime lastModifiedDateTime = new DateTime(lastModifiedDate);

Question:

When you put an entry in the database, Cassandra creates a write timestamp for each column value. Is there a possibility to query the timestamp that is created with each column in Cassandra?


Answer:

writetime(col) is the CQL function you are looking for.
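
A minimal sketch of reading that write timestamp from the Java driver (reusing the users table from an earlier question purely for illustration; an open Session named session is assumed):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

// writetime() returns the write timestamp as a bigint in microseconds since the epoch.
ResultSet rs = session.execute(
    "SELECT email, writetime(email) AS email_writetime FROM users WHERE user_name = 'krishna'");
Row row = rs.one();
long writeTimeMicros = row.getLong("email_writetime");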