Databricks Spark Reference Applications

Exporting Large Datasets

If you are exporting a very large dataset, you can't call collect() or a similar action to read all the data from the RDD onto the single driver program, because that could trigger out-of-memory errors. Instead, you have to save a large RDD in a way that keeps the data distributed across the workers rather than funneling it through the driver. See these two sections for more information.
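
As a rough illustration of the difference, here is a minimal sketch assuming PySpark and an RDD that already exists. The function names export_large_rdd and preview_rdd and the output_dir parameter are hypothetical and chosen only for this example; saveAsTextFile() and take() are the standard RDD methods being demonstrated.

```python
def export_large_rdd(rdd, output_dir):
    """Write every element of the RDD under output_dir (hypothetical helper).

    saveAsTextFile() is executed by the workers: each partition is written
    out separately, so the full dataset is never gathered on the driver.
    """
    rdd.saveAsTextFile(output_dir)


def preview_rdd(rdd, n=10):
    """Return at most n elements to the driver for a quick sanity check.

    Unlike collect(), take(n) bounds how much data reaches the driver.
    """
    return rdd.take(n)


# Example usage (requires a running Spark installation; the path is made up):
#
#   from pyspark import SparkContext
#   sc = SparkContext(appName="export-example")
#   big_rdd = sc.parallelize(range(1000000))
#   print(preview_rdd(big_rdd, 5))
#   export_large_rdd(big_rdd, "/tmp/export-example-output")
#   sc.stop()
```

The output location is a directory holding one part file per partition, not a single file. The save is expected to fail if that directory already exists, so choose a fresh path for each export.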