An open-source tool for large-scale data processing.
Apache Spark is an open-source tool for large-scale data processing. Its unified analytics engine provides easy-to-use APIs in Java, Scala, Python, and R. It also includes a set of higher-level tools: Spark SQL for SQL and structured data processing, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing.