Serdar Yegulalp
Senior Writer

Google BigQuery provides insight into Stack Overflow discussion data

news analysis
Dec 16, 2016
3 mins

Snapshots of Stack Overflow's discussion database are now available on Google BigQuery, offering new ways to learn what developers really talk about -- and want


Software development discussion site Stack Overflow has started offering quarterly snapshots of its question-and-answer database through Google’s BigQuery.

Stack Exchange, the parent company of Stack Overflow and its sister sites, has previously made its data available to researchers through its online data explorer. But now researchers with a Google Cloud Platform account can plug directly into the data set using Google’s data exploration tools, which have fewer limitations than Stack Overflow’s.

If you have a Google Cloud account, you can log in and begin exploring the data directly from a SQL-style web interface. Results from queries can be exported to CSV or JSON, saved to other tables in Google BigQuery, or exported to Google Sheets. BigQuery also comes with a REST API, so it can be used with third-party visualization tools or software stacks.
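As a rough illustration of that workflow, the Python sketch below builds a query, runs it through Google's `google-cloud-bigquery` client library, and converts the results to CSV or JSON. The table name `bigquery-public-data.stackoverflow.posts_questions`, the column names, and the pipe-delimited `tags` format are assumptions not stated in this article, so check them in the BigQuery console first. The helper names are hypothetical, and `run_query` needs the client library installed and Google Cloud credentials configured.

```python
import csv
import io
import json

# Assumed public table name; confirm it in the BigQuery console before use.
TABLE = "bigquery-public-data.stackoverflow.posts_questions"


def build_tag_query(tag, limit=10):
    """Return (sql, params) for the most-viewed questions carrying one tag.

    Assumes columns id, title, view_count, and a pipe-delimited tags string.
    """
    sql = (
        "SELECT id, title, view_count "
        f"FROM `{TABLE}` "
        "WHERE @tag IN UNNEST(SPLIT(tags, '|')) "
        "ORDER BY view_count DESC "
        f"LIMIT {int(limit)}"
    )
    return sql, {"tag": tag}


def run_query(sql, params):
    """Run the query via google-cloud-bigquery and return a list of dicts."""
    # Third-party dependency, imported lazily so the helpers below work without it.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, "STRING", value)
            for name, value in params.items()
        ]
    )
    rows = client.query(sql, job_config=job_config).result()
    return [dict(row.items()) for row in rows]


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialize a list of dicts to pretty-printed JSON text."""
    return json.dumps(rows, indent=2)
```

With credentials in place, `rows_to_csv(run_query(*build_tag_query("python")))` would produce CSV text for the most-viewed questions tagged `python`, assuming the table layout described above.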

Stack Overflow’s question-and-answer format is popular with developers seeking quick solutions to common problems. Though it has a reputation for being insular and unwelcoming, it’s heavily trafficked, and many of its highest-voted answers circulate widely as great explainers. For example, a popular question about why processing a sorted array is faster than working with an unsorted one not only gives a detailed technical answer, but also serves as a clear introduction to the concept of branch prediction failure.

One possible application for Stack Overflow’s data, with or without BigQuery’s tool set, is sentiment analysis of the site’s topics and discussions. In other words, the data could offer broad hints about how developers feel about a technology.

If discussions about a language are paired with discussions about an IDE for that language, those threads could be parsed for details about what people are (or aren’t) doing most often with that pairing. Thus, you could figure out what developers might need but aren’t yet asking for.
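A first pass at that kind of pairing analysis could be as simple as counting which other tags show up on questions tagged with both a language and an IDE. The Python sketch below is a toy illustration, not a method described by Stack Overflow or Google. The function name is hypothetical, and it assumes each question's tags arrive as a single pipe-delimited string, such as `java|eclipse|maven`.

```python
from collections import Counter


def co_occurring_tags(tag_strings, language, ide, top=5):
    """Count tags that appear alongside both a language tag and an IDE tag.

    tag_strings: iterable of pipe-delimited tag strings, one per question
    (an assumed format). Returns up to `top` (tag, count) pairs, most common first.
    """
    counts = Counter()
    for raw in tag_strings:
        tags = set(raw.split("|"))
        if language in tags and ide in tags:
            # Sort so that tie order does not depend on set iteration order.
            counts.update(sorted(tags - {language, ide}))
    return counts.most_common(top)


# Made-up sample data, for illustration only.
sample = [
    "java|eclipse|maven",
    "java|eclipse|debugging",
    "java|eclipse|maven|junit",
    "python|pycharm",
]
pairs = co_occurring_tags(sample, "java", "eclipse")
```

Tags that rank high for a given pairing hint at what people do most with it. Tags that are common for the language overall but rare for the pairing may point to gaps worth a closer look.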

Stack Overflow’s yearly surveys of its developers provide a similar snapshot of its audience’s mindsets: what languages are popular or how developers classify themselves. But such surveys are self-conscious and self-reporting, and they’re limited to the categories devised for them. Discussions on the site could provide more open-ended, direct, and detailed data about what developers like, hate, look for, and struggle with.

Note that this data set comes from Stack Overflow, and not from any of the other IT-related Stack Exchange sites, such as Server Fault (for IT admins) or Super User (for “computer enthusiasts and power users”). If these data sets go online through Google BigQuery as well, they could open up possibilities for even larger and more sophisticated analyses across multiple IT disciplines.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
