Coding Agents Meet Data Science
The Data Exchange with Ben Lorica Podcast Recap
Published:
Guests: Miki o' Braun, Chaitan Konhi
What Happened
Miki o' Braun, a senior principal applied scientist at Solando, has been using coding agents for data science since September of last year. He finds them effective for exploratory analysis, model training, and evaluation, but notes that they can be too quick to jump to conclusions. Miki uses Claude Code alongside tools like OpenCode and OpenRouter to work with unvetted data, and stresses that human guidance is needed to interpret the data properly.
AutoML tools work well with clean data, but coding agents are preferred for unvetted data, where they can infer what a field likely contains from its column name. However, these agents need context and domain knowledge to be truly effective. Miki also mentions using Kumo AI's foundation model from Stanford to build forecasting models more productively.
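To make the point about column names concrete, here is a minimal sketch of the kind of check a human reviewer might run on unvetted data: guess each column's role from its name, then confirm the guess against the actual values. The heuristics, function names, and sample rows are all hypothetical illustrations and are not taken from the episode or from any tool mentioned in it.

```python
from datetime import datetime


def guess_from_name(column: str) -> str:
    """Guess a column's role from its name alone (hypothetical heuristic)."""
    name = column.lower()
    if name == "id" or name.endswith("_id"):
        return "identifier"
    if "date" in name or name.endswith("_at"):
        return "date"
    if any(token in name for token in ("price", "amount", "qty", "count")):
        return "numeric"
    return "unknown"


def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def matches_guess(values: list[str], guess: str) -> bool:
    """Check whether the non-empty values are consistent with the guess."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return False
    if guess == "numeric":
        return all(_is_float(v) for v in non_empty)
    if guess == "date":
        return all(_is_iso_date(v) for v in non_empty)
    if guess == "identifier":
        # Identifiers are expected to be unique across rows.
        return len(set(non_empty)) == len(non_empty)
    return False  # "unknown" always needs a human look


def review_columns(rows: list[dict[str, str]]) -> dict[str, tuple[str, bool]]:
    """Map each column to (name-based guess, whether the data confirms it)."""
    if not rows:
        return {}
    report = {}
    for column in rows[0]:
        guess = guess_from_name(column)
        values = [row.get(column, "") for row in rows]
        report[column] = (guess, matches_guess(values, guess))
    return report


# Made-up sample rows: the names look clean, but two columns hold messy values.
sample_rows = [
    {"order_id": "1", "order_date": "2024-01-05", "price": "19.99"},
    {"order_id": "2", "order_date": "05/01/2024", "price": "n/a"},
]

for column, (guess, confirmed) in review_columns(sample_rows).items():
    status = "confirmed" if confirmed else "needs human review"
    print(f"{column}: guessed {guess}, {status}")
```

In this made-up sample, the names `order_date` and `price` suggest a date and a number, but the values do not bear that out, which is the kind of gap where context and domain knowledge matter.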
Chaitan Konhi is developing Rote, a context file system designed to capture operational knowledge and reduce token usage in AI processes. Coding agents are not built specifically for data science, and the productivity gains they bring call for process adjustments such as more code review and testing. Working with AI tools is now an essential skill, and evaluating AI output is crucial.
Junior programmers face challenges in gaining experience as AI-driven development landscapes evolve. Spec-driven development and iterative, collaborative programming are emerging as key paradigms in AI-enhanced coding. Coding agents can generate large amounts of code quickly, but human evaluation remains necessary for quality assurance.
The reduced cost of rewriting code allows for more experimentation and iteration. Miki discusses the development of OpenClaw, which uses agents to manage pull requests and respond to security feedback. AI can assist in building secure code, but security considerations must be integrated from the start.
Miki's side project, 'Talk with Wren', aims to develop conversational fluency in multiple languages, with a focus on Japanese. Unlike quiz-based apps like Duolingo, Talk with Wren uses AI models like Gemini or ChatGPT to facilitate conversational practice. It integrates vocabulary lists and flashcard training, and assumes users already have basic fluency.
Talk with Wren also offers role-playing scenarios to simulate real-life conversations, and Miki is working on improving engagement through better prompt design. The project may soon include an assessment feature to measure language improvement. It has been tested in languages such as French, with some prior knowledge required.
Miki has experimented with creating a text adventure game called 'Beyond the Bouncer' using an LLM, and has worked on projects in Python and Vue, with plans to use Go for future projects. He advises computer science students to understand software structure and build side projects, emphasizing the importance of domain knowledge for securing a first job.
Key Insights
- Miki o' Braun, who has used coding agents since September of last year, has found that they handle data science tasks effectively but often need human oversight to avoid hasty conclusions.
- AutoML is suitable for clean data, but coding agents are better equipped to handle unvetted data, provided they are given guidance to interpret the data properly and produce accurate outcomes.
- The development of tools like Rote by Chaitan Konhi aims to capture operational knowledge, thereby reducing token usage and optimizing AI processes in data science.
- Junior programmers face challenges in gaining experience due to the AI-driven development landscape, making skills in AI tool evaluation and domain knowledge more critical than ever.