This section shows how to analyze web server logs with Apache Spark. We load access log lines into a Resilient Distributed Dataset (RDD) and use Spark transformations and actions to compute statistics for web server monitoring. Along the way, we introduce the Spark SQL and Spark Streaming libraries.
The code snippets in this explanation are in Java 8. Sample code in Java 6, Scala, and Python is also included in this directory. Each of those folders contains a README with build and run instructions, along with the build files that declare the required dependencies.
This chapter covers the following topics: