Eric Knorr
Contributing writer

Interview: Matei Zaharia on Spark and machine learning

feature
Sep 7, 2018
2 mins

Zaharia expounds on the reasons Spark has become the big data framework of choice and why he thinks his company’s melding of Spark and machine learning delivers unique value

True Technologist video series [teaser]
Credit: Infoworld / Getty Images

Sometimes things happen in the lab that you wouldn’t expect. Back in 2009, when Matei Zaharia was a grad student at UC Berkeley’s AMPLab, he started a project called Spark to serve as a pilot workload for Mesos, an open-source project to manage clusters. Since then, Mesos has faded, while Spark has become the widely adopted successor to the Hadoop distributed processing framework—faster, smarter, and, unlike its predecessor, a robust platform for streaming analytics and machine learning.

Today Zaharia is CTO of Databricks, a cloud-based provider of Spark and machine learning as a service, though he still keeps one foot in academia as an assistant professor of computer science at Stanford. One testament to his ingenuity: According to Databricks CEO Ali Ghodsi, Zaharia once informed him that he had an interest in biology and was taking a class. Not long after, a project emerged that he created in collaboration with AMPLab colleagues: the Scalable Nucleotide Alignment Program (SNAP), a sequence aligner that is three to 20 times faster than competing sequencing solutions.

In this interview with IDG’s Eric Knorr, Zaharia expounds on the reasons Spark has become the big data framework of choice and, among other topics, why he thinks his company’s melding of Spark and machine learning delivers unique value. Zaharia, who holds a PhD in computer science from UC Berkeley and an ACM Doctoral Dissertation Award for his research on large-scale computer systems, serves as vice president of the Apache Spark project and has worked on key Spark components, including MLlib, Spark Streaming, and Spark SQL. Few have contributed so much in so short a time to the advancement of big data analytics and machine learning.

Eric Knorr

Eric Knorr is a freelance writer, editor, and content strategist. Previously he was the Editor in Chief of Foundry’s enterprise websites: CIO, Computerworld, CSO, InfoWorld, and Network World. A technology journalist since the start of the PC era, he has developed content to serve the needs of IT professionals since the turn of the 21st century. He is the former Editor of PC World magazine, the creator of the best-selling The PC Bible, a founding editor of CNET, and the author of hundreds of articles to inform and support IT leaders and those who build, evaluate, and sustain technology for business. Eric has received Neal, ASBPE, and Computer Press Awards for journalistic excellence. He graduated from the University of Wisconsin, Madison with a BA in English.
