Paul Krill
Editor at Large

Microsoft brings .NET dev to Apache Spark

Oct 29, 2020 | 2 mins

.NET for Apache Spark 1.0 provides high-performance .NET APIs to Apache Spark including Spark SQL, Spark Streaming, and MLlib

Microsoft and the .NET Foundation have released version 1.0 of .NET for Apache Spark, an open source package that brings .NET development to the Spark analytics engine for large-scale data processing.

Announced October 27, .NET for Apache Spark 1.0 supports .NET applications targeting .NET Standard 2.0 or later. Users can access Spark DataFrame APIs, write Spark SQL, and create user-defined functions (UDFs).

The .NET for Apache Spark framework is available on the .NET Foundation’s GitHub page or from NuGet. Other capabilities of .NET for Apache Spark 1.0 include:

  • An API extension framework to add support for additional Spark libraries including Linux Foundation Delta Lake, Microsoft OSS Hyperspace, ML.NET, and Apache Spark MLlib functionality.
  • .NET for Apache Spark programs that do not use UDFs run at the same speed as Scala and PySpark-based non-UDF applications. Programs that do include UDFs are at least as fast as their PySpark counterparts, and may be faster.
  • .NET for Apache Spark is built into Azure Synapse and Azure HDInsight. It also can be used in other Apache Spark cloud offerings including Azure Databricks.

The first public version of the project was announced in April 2019. Development of .NET for Apache Spark was driven by increased demand for an easier way to build big data applications without having to learn Scala or Python. The project is operated under the .NET Foundation and has been filed as a Spark Project Improvement Proposal to be considered for inclusion in the Apache Spark project directly.

Looking ahead, Microsoft is addressing obstacles such as setting up prerequisites and dependencies and finding quality documentation. Efforts so far include community-contributed “ready-to-run” Docker images and updates to the .NET for Apache Spark documentation. Another priority is supporting deployment options, including integration with CI/CD devops pipelines and publishing jobs directly from Visual Studio.

Paul Krill

Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
