Databricks Spark Reference Applications

Exporting Data out of Spark

This section contains methods for exporting data out of Spark into systems. First, you'll have to figure out if your output data is small (meaning can fit on memory on one machine) or large (too big to fit into memory on one machine). Consult these two sections based on your use case.

  • Small Datasets - If you have a small dataset, you can call an action on this dataset to retrieve objects in memory on the driver program, and then write those objects out any way you want.
  • Large Datasets - For a large dataset, it's important to remember that this dataset is too large to fit in memory on the driver program. In that case, you can either call Spark to write the data to files directly from the Spark workers or you can implement your own custom solution.