InfoWorld NEWS

Big Data | News, how-tos, features, reviews, and videos

Google tests BigQuery feature to generate SQL queries from English

The ability to write parts of SQL queries in natural language will help developers speed up their work, analysts say.

By Anirban Ghoshal

Jan 16, 2026 4 mins

Big Data, Natural Language Processing, SQL

6 incredibly hyped software trends that failed to deliver

By Bill Doerrfeld

Jan 5, 2026 13 mins

Big Data, Blockchain, Technology Industry

Why data contracts need Apache Kafka and Apache Flink

By Matthew OKeefe

Dec 2, 2025 11 mins

Analytics, Big Data, Data Management

Getting the enterprise data layer unstuck for AI

Nov 26, 2025 9 mins

Analytics, Big Data, Generative AI

Was data mesh just a fad?

By Nidhin Ponon

Nov 3, 2025 6 mins

Analytics, Big Data, Data Architecture

Why observability needs Apache Iceberg

By Jacob Leverich

Oct 2, 2025 6 mins

Big Data, Business Intelligence, Data Architecture

Using Cosmos DB in Microsoft Fabric

By Simon Bisson

Aug 28, 2025 7 mins

Big Data, Cloud-Native, Databases

Snowflake brings analytics workloads into its cloud with Snowpark Connect for Apache Spark

By Anirban Ghoshal

Jul 29, 2025 3 mins

Analytics, Data Management, Data Warehousing

Real-time analytics with StarTree Cloud and Apache Pinot

By Martin Heller

Jun 3, 2025 16 mins

Big Data, Databases, SaaS

Articles

Critical deserialization bug in Apache Parquet allows RCE

Successful exploitation could allow attackers to steal data, install malware, or take full control over affected big data systems.

By Shweta Sharma

Apr 4, 2025 1 min

Analytics, Big Data, Vulnerabilities

Understanding unstructured data in the context of AI

What’s the best way to store, search, and analyze content based not on its technical characteristics but on its meaning?

Dec 3, 2024 8 mins

Big Data, Databases, Generative AI

What is Apache Spark? The big data platform that crushed Hadoop

Fast, flexible, and developer-friendly, Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning.

Apr 3, 2024 11 mins

Machine Learning, Open Source, SQL

BI meets data science in Microsoft Fabric

Microsoft’s cloud-hosted data lake and lakehouse platform gains new data science tools and opens up Power BI datasets to Python, R, and SparkSQL.

By Simon Bisson

Oct 19, 2023 6 mins

Data Science, Microsoft Azure, Python

Building an analytics architecture for unstructured data and multimodal AI

Discover best practices that allow data pipelines to scale and support both structured and unstructured data.

By Ganesh Kumar Gella, Sr. Director, Engineering, Google BigQuery Generative AI Initiatives

Jun 11, 2025 5 mins

Artificial Intelligence, Big Data

A deep dive into caching in Presto

Understand the caching mechanisms for the popular distributed SQL engine and how to use them to improve query speed and efficiency.

By Beinan Wang and Hope Wang

Sep 19, 2023 9 mins

Analytics, Databases, SQL

AWS Glue upgrades Spark engines, backs Ray framework

Serverless data integration service in the Amazon cloud also adds support for built-in Pandas APIs and the Apache Hudi, Apache Iceberg, and Delta Lake formats.

Nov 29, 2022 2 mins

Amazon Web Services, Data Integration, Python

Databricks adds data governance, marketplace features

The data marketplace and other features are expected to accelerate data engineering tasks with an option for data monetization down the road, Databricks said.

By Anirban Ghoshal

Jun 28, 2022 5 mins

Analytics, Data Governance, Databases

Databricks open sources its Delta Lake data lakehouse

Databricks is open sourcing Delta Lake to counter criticism from rivals and take on Apache Iceberg as well as data warehouse products from Snowflake, Starburst, Dremio, Google Cloud, AWS, Oracle and HPE.

By Anirban Ghoshal

Jun 28, 2022 4 mins

Analytics, Data Warehousing, Databases

Microsoft brings .NET dev to Apache Spark

.NET for Apache Spark 1.0 provides high-performance .NET APIs to Apache Spark including Spark SQL, Spark Streaming, and MLlib

Oct 29, 2020 2 mins

Machine Learning, Microsoft .NET, Open Source

Azure Databricks previews parallelized Photon query engine

Microsoft and Databricks say the vectorized query engine written in C++ accelerates Apache Spark workloads by up to 20x

Sep 28, 2020 2 mins

Analytics, Cloud Computing, Microsoft Azure

Why you should use Presto for ad hoc analytics

A federated SQL query execution engine created at Facebook, Presto brings interactive querying to all of your data — no matter where it resides

By Ashish Tadose

Sep 16, 2020 9 mins

Analytics, Data Management, SQL

Rakuten frees itself of Hadoop investment in two years

The U.S. arm of the Japanese e-commerce giant has moved away from Hadoop in a bid to cut hardware costs and ease the management of its estate

Jun 23, 2020 5 mins

Analytics, Cloud Computing, Data Management