There are many flavors of time series data. Some can be windowed in the stream; others cannot, because they are queried not by time slice but by a specific year, month, day, and hour. Spark Streaming lets you do both. Cassandra in particular is well suited to time series data, whether you are working with raw data or using Spark transformations to aggregate it. In some cases, using Spark with Cassandra (and the right data model) reduces the number of Spark transformations your data needs, because Cassandra does that work for you in its cluster.
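The two access patterns above can be sketched in plain Scala. This is a minimal illustration only: ordinary collections stand in for Spark and Cassandra, and the `Reading` type, its field names, and the `TimeBuckets` helper are hypothetical, not the schema or code used by the reference app.

```scala
// Hypothetical record type for illustration; the field names are assumptions,
// not the columns created by create-timeseries.cql.
final case class Reading(
  stationId: String,
  year: Int,
  month: Int,
  day: Int,
  hour: Int,
  temperature: Double
)

object TimeBuckets {
  // Calendar-bucket pattern: group hourly readings by (station, year, month, day)
  // and average the temperature within each day.
  def dailyMeans(readings: Seq[Reading]): Map[(String, Int, Int, Int), Double] =
    readings
      .groupBy(r => (r.stationId, r.year, r.month, r.day))
      .map { case (key, rs) => key -> rs.map(_.temperature).sum / rs.size }

  // Windowed pattern: average each run of `size` consecutive values,
  // ignoring calendar boundaries. If the input is non-empty but shorter
  // than `size`, a single partial window is averaged.
  def slidingMeans(temps: Seq[Double], size: Int): Seq[Double] =
    temps.sliding(size).map(w => w.sum / w.size).toList
}
```

With a data model keyed by station and clustered by year, month, day, and hour, the grouping done by `dailyMeans` corresponds to work that the table layout can take over, which is the sense in which fewer transformations are needed.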
When using Apache Spark and Apache Cassandra together, it is best practice to co-locate Spark and Cassandra nodes. This improves data locality and cuts network calls, which reduces overall latency.
Download and install the latest Cassandra release.
Start Cassandra.
./apache-cassandra-{latest.version}/bin/cassandra -f
Note: If you get an error, you may need to prepend the command with sudo, or chown /var/lib/cassandra.
Run the setup CQL scripts to create the schema and populate the weather stations table. Go to the timeseries data folder and start a cqlsh shell there:
% cd /path/to/reference-apps/timeseries/scala/data
% /path/to/apache-cassandra-{latest.version}/bin/cqlsh
You should see:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh {latest.version} | Cassandra {latest.version} | CQL spec {latest.version} | Native protocol {latest.version}]
Use HELP for help.
cqlsh>
Then run the script:
cqlsh> source 'create-timeseries.cql';
cqlsh> quit;
See this GitHub repo to find out more about the weather stations table data.
% cd /path/to/reference-apps/timeseries/scala
% sbt weather/run
You should see:
Multiple main classes detected, select one to run:
[1] com.databricks.apps.WeatherApp
[2] com.databricks.apps.WeatherClientApp
Select option 1 to open the weather app.
See this GitHub repo to find out more about the Time Series Data Model.