Data Skeptic podcast

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.


#558

The Mystery Behind Large Graphs

Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Laboratory, who specializes in scalable graph algorithms and compression techniques for massive datasets. We learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties. David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: perhaps we only see datasets of sparse graphs because large dense graphs are so hard to analyze. The truth is out there… David encourages you to reach out to him if you have a large-scale graph application that you cannot handle with your current methods and hardware. He promises to "look for the hammer that might help you with your nail".

10 Jan 2025

47 MINS

47:47


#557

Customizing a Graph Solution

In this episode, Dave Bechberger, Principal Graph Architect at AWS and author of "Graph Databases in Action", shares deep insights into graph databases and their applications. Together we delve into specific scenarios in which graph databases provide unique solutions, such as fraud detection, and learn how to optimize a database for questions about connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?" The discussion sheds light on when organizations should consider adopting graph databases, particularly for scalable analysis of highly interconnected data, and offers practical guidance on using them to improve performance in tasks that traditional relational databases struggle with.

16 Dec 2024

38 MINS

38:07


#556

Graph Transformations

In this episode, we are joined by Adam Machowczyk, a PhD student at the University of Leicester who specializes in graph rewriting and its intersection with machine learning, particularly Graph Neural Networks. Adam explains how graph rewriting provides a formalized method to modify graphs using rule-based transformations, allowing for tasks like graph completion, attribute prediction, and structural evolution. By bridging the worlds of graph rewriting and machine learning, Adam's work aspires to open new possibilities for creating adaptive, scalable models capable of solving challenges that traditional methods struggle with, such as handling heterogeneous graphs or incorporating incremental updates efficiently. Real-life applications discussed include using graph transformations to improve recommender systems in social networks, support molecular research in chemistry, and enhance IoT network analysis.

09 Dec 2024

32 MINS

32:48


#555

Networks for AB Testing

In this episode, data scientist Wentao Su shares his experience in AB testing on social media platforms like LinkedIn and TikTok. We talk about how network science can enhance AB testing by accounting for complex social interactions, especially in environments where users are both viewers and content creators. These interactions can cause a "spillover effect", in which influence crosses between experimental groups and distorts results. To mitigate this effect, our guest presents the heuristics and algorithms his team developed ("one-degree label propagation"), which deliver good results on big data with minimal running time and so help optimize user experience and advertiser performance on social media platforms.

25 Nov 2024

36 MINS

36:42


#554

Lessons from eGamer Networks

Alex Bisberg, a PhD candidate at the University of Southern California, specializes in network science and game analytics, with a focus on understanding social and competitive success in multiplayer online games. In this episode, listeners can expect to learn about players' interactions and patterns of behavior from a network perspective. Through his research on games, Alex sheds light on how network analysis and statistical tests might explain positive contagious behaviors, such as generosity, and explores the dynamics of collaboration and competition in gaming environments. These insights offer valuable lessons not only for game developers looking to enhance player experience, engagement, and retention, but also for anyone interested in how virtual interactions shape social networks and behavior.

18 Nov 2024

37 MINS

37:52


#553

GitHub Collaboration Network

In this episode, we discuss the GitHub Collaboration Network with Behnaz Moradi-Jamei, assistant professor at James Madison University. As a network scientist, Behnaz created and analyzed a network of about 700,000 contributors to GitHub repositories. The network was built by treating developers as nodes and linking them with edges based on shared contributions: if two developers contributed to the same repository, an edge was formed between them to represent a collaborative relationship. The resulting network consists of 32 million such connections. Using community detection algorithms, Behnaz's analysis reveals how developer communities form, function, and evolve, insights that can guide OSS community managers.
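The construction described above (developers as nodes, an edge whenever two developers contributed to the same repository) can be illustrated with a minimal Python sketch. This is not the guest's actual pipeline: the function name `build_collaboration_edges`, the `(developer, repository)` input format, and the sample data are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_edges(contributions):
    """Return undirected developer pairs that share at least one repository.

    `contributions` is an iterable of (developer, repository) pairs.
    This input format is an assumption made for illustration.
    """
    # Group developers by the repository they contributed to.
    repo_to_devs = defaultdict(set)
    for developer, repo in contributions:
        repo_to_devs[repo].add(developer)

    # Every pair of developers within one repository becomes an edge.
    # Sorting gives each pair a canonical order, so (a, b) and (b, a)
    # are stored only once.
    edges = set()
    for devs in repo_to_devs.values():
        for a, b in combinations(sorted(devs), 2):
            edges.add((a, b))
    return edges


# Hypothetical sample data, not taken from the episode.
contributions = [
    ("alice", "repo-x"),
    ("bob", "repo-x"),
    ("bob", "repo-y"),
    ("carol", "repo-y"),
    ("dave", "repo-z"),
]
edges = build_collaboration_edges(contributions)
print(sorted(edges))  # [('alice', 'bob'), ('bob', 'carol')]
```

Note that a repository with k contributors produces k(k-1)/2 edges, so a network of this kind can have far more edges than nodes.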

11 Nov 2024

42 MINS

42:24


#552

Graphs and ML for Robotics

We are joined by Abhishek Paudel, a PhD student at George Mason University whose research focuses on robotics, machine learning, and planning under uncertainty, using graph-based methods to enhance robot behavior. He explains how graph-based approaches can model environments, capture spatial relationships, and provide a framework for integrating multiple levels of planning and decision-making.

04 Nov 2024

41 MINS

41:59


#551

Graphs for HPC and LLMs

We are joined by Maciej Besta, a senior researcher of sparse graph computations and large language models at the Scalable Parallel Computing Lab (SPCL). In this episode, we explore the intersection of graph theory with high-performance computing (HPC), Graph Neural Networks (GNNs), and LLMs.

29 Oct 2024

52 MINS

52:08


#550

Graph Databases and AI

In this episode, we sit down with Yuanyuan Tian, a principal scientist manager at Microsoft Gray Systems Lab, to discuss the evolving role of graph databases in areas such as fraud detection in finance and insurance, security, healthcare, and supply chain optimization.

21 Oct 2024

35 MINS

35:58


#549

Network Analysis in Practice

Our new season "Graphs and Networks" begins here! We are joined by new co-host Asaf Shapira, a network analysis consultant and host of NETfrix, the network science podcast. Kyle and Asaf discuss ideas to cover in the season and explore Asaf's work in the field.

14 Oct 2024

29 MINS

29:37


#548

Animal Intelligence Final Exam

Join us for our capstone episode of the Animal Intelligence season. We recap what we loved, what we learned, and what we wish we had spent more time on. This is a great episode to see how the podcast is produced. Now that the season is ending, our current co-host, Becky, is moving to emeritus status. In this last installment, we got to spend a little more time getting to know Becky and where her work will take her after this. Did Data Skeptic inspire her to learn more about machine learning? Tune in and find out.

07 Oct 2024

30 MINS

30:18


#547

Process Mining with LLMs

David Obembe, a recent University of Tartu graduate, discusses his master's thesis on integrating LLMs with process mining tools. He explains how process mining uses event logs to create maps that identify inefficiencies in business processes. David shares his research on LLMs' potential to enhance process mining, including experiments evaluating their performance and future improvements using Retrieval Augmented Generation (RAG).

24 Sep 2024

26 MINS

26:24


#546

Open Animal Tracks

Our guest today is Risa Shinoda, a PhD student at the Kyoto University Agricultural Systems Engineering Lab, where she applies computer vision techniques. She talks about the OpenAnimalTracks dataset, which helps researchers identify animals from their footprints. She also discusses how she built a model for classifying animal tracks, the algorithms used, the accuracy they achieved, and opportunities for further improvement.

17 Sep 2024

22 MINS

22:45


#545

Bird Distribution Modeling with Satbird

This episode features an interview with Mélisande Teng, a PhD candidate at Université de Montréal. Her research lies at the intersection of remote sensing and computer vision for biodiversity monitoring.

10 Sep 2024

39 MINS

39:31


#544

Ant Encounters

In this interview with author Deborah Gordon, Kyle asks questions about the mechanisms at work in an ant colony and what ants might teach us about how to build artificial intelligence. Ants are surprisingly adaptive creatures whose behavior emerges from their complex interactions. Aspects of network theory and the statistical nature of ant behavior are just some of the interesting details you'll get in this episode.

26 Aug 2024

31 MINS

31:26


#543

Computing Toolbox

This season it’s become clear that computing skills are vital for working in the natural sciences. In this episode, we were fortunate to speak with Madlen Wilmes, co-author of the book "Computing Skills for Biologists: A Toolbox". We discussed the book and why it’s a great resource for students and teachers. Madlen also shared her experience and advice on transitioning from academia to an industry career, and on how data analytic skills transfer to jobs that young professionals might not always consider. Join us and learn more about the book and careers using transferable skills.

19 Aug 2024

38 MINS

38:44


#542

Biodiversity Monitoring

In this episode, we talked shop with Hager Radi about her biodiversity monitoring work. While biodiversity monitoring may sound simple (count organisms and mark their locations), there is a lot more to it than that! Incomplete and biased data can make estimation hard, and many species have very few observations in the wild. Using machine learning and remote sensing data, scientists can build models that predict species distributions from limited data. Listen in and hear about Hager’s work tackling these challenges and the tools she has built.

14 Aug 2024

32 MINS

32:20


#541

Hacking the Colony

Today, Ashay Aswale and Tony Lopez share their work on swarm robotics and what they have learned from ants. Robotic swarms must solve the same problems that eusocial insects do. What if your pheromone trail goes cold? What if you’re getting bad information from a bad actor within the swarm? Answering these questions can help tackle serious robotic challenges. For example, a swarm of robots can lose a few members to accidents and malfunctions and keep working, but a single large robot cannot. Additionally, a swarm could host many castes, like an ant colony. Specialization with redundancy built in seems like a win-win! Tune in and hear more about this fascinating topic.

08 Aug 2024

41 MINS

41:03
