Skill Piper

Send feedback

Exploring The Process And Practice Of Building Better Software Through Code Reviews

Exploring The Process And Practice Of Building Better Software Through Code Reviews

The Python Podcast.__init__ 5 September 2022

Episode Description

Summary

Writing code is only one piece of creating good software. Code reviews are an important step in the process of building applications that are maintainable and sustainable. In this episode On Freund shares his thoughts on the myriad purposes that code reviews serve, as well as exploring some of the patterns and anti-patterns that grow up around a seemingly simple process.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host as usual is Tobias Macey and today I’m interviewing On Freund about the intricacies and importance of code reviews

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by giving us your description of what a code review is?
    • What is the purpose of the code review?
  • At face value a code review appears to be a simple task. What are some of the subtleties that become evident with time and experience?
  • What are some of the ways that code reviews can go wrong?
  • What are some common anti-patterns that get applied to code reviews?
  • What are the elements of code review that are useful to automate?
    • What are some of the risks/bad habits that can result from overdoing automated checks/fixes or over-reliance on those tools in code reviews?
  • identifying who can/should do a review for a piece of code
  • how to use code reviews as a teaching tool for new/junior engineers
  • how to use code reviews for avoiding siloed experience/promoting cross-training
  • PR templates for capturing relevant context
  • What are the most interesting, innovative, or unexpected ways that you have seen code reviews used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while leading and supporting engineering teams?
  • What are some resources that you recommend for anyone who wants to learn more about code review strategies and how to use them to scale their teams?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

...see more

More Episodes


Catching Up With Pyre, A Fast Type Checker For Python

Catching Up With Pyre, A Fast Type Checker For Python

Static typing versus dynamic typing is one of the oldest debates in software development. In recent years a number of dynamic languages have worked toward a middle ground by adding support for type hints. Python's type annotations have given rise to an ecosystem of tools that use that type information to validate the correctness of programs and help identify potential bugs. At Instagram they created the Pyre project with a focus on speed to allow for scaling to huge Python projects. In this episode Shannon Zhu discusses how it is implemented, how to use it in your development process, and how it compares to other type checkers in the Python ecosystem.

19 September 2022


Standardizing On Python For All Software Projects At Ascend.io

Standardizing On Python For All Software Projects At Ascend.io

Every software project is subject to a series of decisions and tradeoffs. One of the first decisions to make is which programming language to use. For companies where their product is software, this is a decision that can have significant impact on their overall success. In this episode Sean Knapp discusses the languages that his team at Ascend use for building a service that powers complex and business critical data workflows. He also explains his motivation to standardize on Python for all layers of their system to improve developer productivity.

13 September 2022


Ship With Confidence By Automating Quality Assurance

Ship With Confidence By Automating Quality Assurance

Quality assurance in the software industry has become a shared responsibility in most organizations. Given the rapid pace of development and delivery it can be challenging to ensure that your application is still working the way it's supposed to with each release. In this episode Jonathon Wright discusses the role of quality assurance in modern software teams and how automation can help.

28 August 2022


Remove Roadblocks And Let Your Developers Ship Faster With Self-Serve Infrastructure

Remove Roadblocks And Let Your Developers Ship Faster With Self-Serve Infrastructure

The goal of every software team is to get their code into production without breaking anything. This requires establishing a repeatable process that doesn't introduce unnecessary roadblocks and friction. In this episode Ronak Rahman discusses the challenges that development teams encounter when trying to build and maintain velocity in their work, the role that access to infrastructure plays in that process, and how to build automation and guardrails for everyone to take part in the delivery process.

14 August 2022


The Benefits Of Python And Django For Going From Zero To MVP At Speed

The Benefits Of Python And Django For Going From Zero To MVP At Speed

Every startup begins with an idea, but that won't get you very far without testing the feasibility of that idea. A common practice is to build a Minimum Viable Product (MVP) that addresses the problem that you are trying to solve and working with early customers as they engage with that MVP. In this episode Tony Pavlovych shares his thoughts on Python's strengths when building and launching that MVP and some of the potential pitfalls that businesses can run into on that path.

31 July 2022


Powering The Next Generation Of Application Architectures With Web Assembly And The Fermyon Platform

Powering The Next Generation Of Application Architectures With Web Assembly And The Fermyon Platform

Application architectures have been in a constant state of evolution as new infrastructure capabilities are introduced. Virtualization, cloud, containers, mobile, and now web assembly have each introduced new options for how to build and deploy software. Recognizing the transformative potential of web assembly, Matt Butcher and his team at Fermyon are investing in tooling and services to improve the developer experience. In this episode he explains the opportunity that web assembly offers to all language communities, what they are building to power lightweight server-side microservices, and how Python developers can get started building and contributing to this nascent ecosystem.

25 July 2022


Gain A Deeper Understanding Of What Your Code Is Doing And Where It Spends Its Time With VizTracer

Gain A Deeper Understanding Of What Your Code Is Doing And Where It Spends Its Time With VizTracer

As your code scales beyond a trivial level of complexity and sophistication it becomes difficult or impossible to know everything that it is doing. The flow of logic and data through your software and which parts are taking the most time are impossible to understand without help from your tools. VizTracer is the tool that you will turn to when you need to know all of the execution paths that are being exercised and which of those paths are the most expensive. In this episode Tian Gao explains why he created VizTracer and how you can use it to gain a deeper familiarity with the code that you are responsible for maintaining.

17 July 2022


Stream Processing In Real Time And At Scale In Pure Python With Bytewax

Stream Processing In Real Time And At Scale In Pure Python With Bytewax

Analysis of streaming data in real time has long been the domain of big data frameworks, predominantly written in Java. In order to take advantage of those capabilities from Python requires using client libraries that suffer from impedance mis-matches that make the work harder than necessary. Bytewax is a new open source platform for writing stream processing applications in pure Python that don't have to be translated into foreign idioms. In this episode Bytewax founder Zander Matheson explains how the system works and how to get started with it today.

10 July 2022


Tetra: A Full Stack Web Framework That Doesn't Make You Write Everything Twice

Tetra: A Full Stack Web Framework That Doesn't Make You Write Everything Twice

Building a fully functional web application has been growing in complexity along with the growing popularity of javascript UI frameworks such as React, Vue, Angular, etc. Users have grown to expect interactive experiences with dynamic page updates, which leads to duplicated business logic and complex API contracts between the server-side application and the Javascript front-end. To reduce the friction involved in writing and maintaining a full application Sam Willis created Tetra, a framework built on top of Django that embeds the Javascript logic into the Python context where it is used. In this episode he explains his design goals for the project, how it has helped him build applications more rapidly, and how you can start using it to build your own projects today.

3 July 2022


Design Real-World Objects In Python With CadQuery

Design Real-World Objects In Python With CadQuery

Virtually everything that you interact with on a daily basis and many other things that make modern life possible were designed and modeled in software called CAD or Computer-Aided Design. These programs are advanced suites with graphical editing environments tailored to domain experts in areas such as mechanical engineering, electrical engineering, architecture, etc. While the UI-driven workflow is more accessible, it isn't scalable which opens the door to code-driven workflows. In this episode Jeremy Wright discusses the design, uses, and benefits of the CadQuery framework for building 3D CAD models entirely in Python.

27 June 2022


Intelligent Dependency Resolution For Optimal Compatibility And Security With Project Thoth

Intelligent Dependency Resolution For Optimal Compatibility And Security With Project Thoth

Building any software project is going to require relying on dependencies that you and your team didn't write or maintain, and many of those will have dependencies of their own. This has led to a wide variety of potential and actual issues ranging from developer ergonomics to application security. In order to provide a higher degree of confidence in the optimal combinations of direct and transitive dependencies a team at Red Hat started Project Thoth. In this episode Fridolín Pokorný explains how the Thoth resolver uses multiple signals to find the best combination of dependency versions to ensure compatibility and avoid known security issues.

15 June 2022


Take A Deep Dive On How Code Completion Works And How To Customize It

Take A Deep Dive On How Code Completion Works And How To Customize It

Most developers have encountered code completion systems and rely on them as part of their daily work. They allow you to stay in the flow of programming, but have you ever stopped to think about how they work? In this episode Meredydd Luff takes us behind the scenes to dig into the mechanics of code completion engines and how you can customize them to fit your particular use case.

30 May 2022


Hunting Black Swans With Bees: Catching Up With The Inimitable Russell Keith-Magee

Hunting Black Swans With Bees: Catching Up With The Inimitable Russell Keith-Magee

Russell Keith-Magee is an accomplished engineer and a fixture of the Python community. His work on the Beeware suite of projects is one of the most ambitious undertakings in the ecosystem and unfailingly forward-looking. With his recent transition to working for Anaconda he is now able to dedicate his full focus to the effort. In this episode he reflects on the journey that he has taken so far, how Beeware is helping to address some of the threats to Python's long term viability, and how he envisions its future in light of the recent release of PyScript, an in-browser runtime for Python.

24 May 2022


Take Control Of Your Digital Photos By Running Your Own Smart Library Manager With LibrePhotos

Take Control Of Your Digital Photos By Running Your Own Smart Library Manager With LibrePhotos

Digital cameras and the widespread availability of smartphones has allowed us all to generate massive libraries of personal photographs. Unfortunately, now we are all left to our own devices of how to manage them. While cloud services such as iPhotos and Google Photos are convenient, they aren't always affordable and they put your pictures under the control of large companies with their own agendas. LibrePhotos is an open source and self-hosted alternative to these services that puts you in control of your digital memories. In this episode the maintainer of LibrePhotos, Niaz Faridani-Rad, explains how he got involved with the project, the capabilities that it offers for managing your image library, and how to get your own instance set up to take back control of your pictures.

16 May 2022


Making Investment Data Easy To Access And Analyze With The OpenBB Terminal

Making Investment Data Easy To Access And Analyze With The OpenBB Terminal

Investing effectively is largely a game of information access and analysis. This can involve a substantial amount of research and time spent on finding, validating, and acquiring different information sources. In order to reduce the barrier to entry and provide a powerful framework for amateur and professional investors alike Didier Rodrigues Lopes created the OpenBB Terminal. In this episode he explains how a pandemic project that started as an experiment has led to him founding a new company and dedicating his time to growing and improving the project and its community.

10 May 2022


Accelerate Your Machine Learning Experimentation With Automatic Checkpoints Using FLOR

Accelerate Your Machine Learning Experimentation With Automatic Checkpoints Using FLOR

The experimentation phase of building a machine learning model requires a lot of trial and error. One of the limiting factors of how many experiments you can try is the length of time required to train the model which can be on the order of days or weeks. To reduce the time required to test different iterations Rolando Garcia Sanchez created FLOR which is a library that automatically checkpoints training epochs and instruments your code so that you can bypass early training cycles when you want to explore a different path in your algorithm. In this episode he explains how the tool works to speed up your experimentation phase and how to get started with it.

2 May 2022


Automatically Enforce Software Structures With Powerful Code Modifications Powered By LibCST

Automatically Enforce Software Structures With Powerful Code Modifications Powered By LibCST

Programmers love to automate tedious processes, including refactoring your code. In order to support the creation of code modifications for your Python projects Jimmy Lai created LibCST. It provides a richly typed and high level API for creating and manipulating concrete syntax trees of your source code. In this episode Jimmy Lai and Zsolt Dollenstein explain how it works, some of the linting and automatic code modification utilities that you can build with it and how to get started with using it to maintain your own Python projects.

25 April 2022


Cloud Native Networking For Developers With The Gloo Platform

Cloud Native Networking For Developers With The Gloo Platform

Communication is a fundamental requirement for any program or application. As the friction involved in deploying code has gone down, the motivation for architecting your system as microservices goes up. This shifts the communication patterns in your software from function calls to network calls. In this episode Idit Levine explains how the Gloo platform that she and her team at Solo have created makes it easier for you to configure and monitor the network topologies for your microservice environments. She also discusses what developers need to know about networking in cloud native environments and how a combination of API gateways and service mesh technologies allow you to more rapidly iterate on your systems.

19 April 2022


Accelerate And Simplify Cloud Native Development For Kubernetes Environments With Gefyra

Accelerate And Simplify Cloud Native Development For Kubernetes Environments With Gefyra

Cloud native architectures have been gaining prominence for the past few years due to the rising popularity of Kubernetes. This introduces new complications to development workflows due to the need to integrate with multiple services as you build new components for your production systems. In order to reduce the friction involved in developing applications for cloud native environments Michael Schilonka created Gefyra. In this episode he explains how it connects your local machine to a running Kubernetes environment so that you can rapidly iterate on your software in the context of the whole system. He also shares how the Django Hurricane plugin lets your applications work closely with the Kubernetes process model.

11 April 2022


Building A Community And Technology Stack For Scalable Big Data Geoscience At Pangeo

Building A Community And Technology Stack For Scalable Big Data Geoscience At Pangeo

Science is founded on the collection and analysis of data. For disciplines that rely on data about the earth the ability to simulate and generate that data has been growing faster than the tools for analysis of that data can keep up with. In order to help scale that capacity for everyone working in geosciences the Pangeo project compiled a reference stack that combines powerful tools into an out-of-the-box solution for researchers to be productive in short order. In this episode Ryan Abernathy and Joe Hamman explain what the Pangeo project really is, how they have integrated a combination of XArray, Dask, and Jupyter to power these analytical workflows, and how it has helped to accelerate research on multidimensional geospatial datasets.

28 March 2022


Automating Application Lifecycles For Developer Happiness At Wayfair

Automating Application Lifecycles For Developer Happiness At Wayfair

A common piece of advice when starting anything new is to "begin with the end in mind". In order to help the engineers at Wayfair manage the complete lifecycle of their applications Joshua Woodward runs a team that provides tooling and assistance along every step of the journey. In this episode he shares some of the lessons and tactics that they have developed while assisting other engineering teams with starting, deploying, and sunsetting projects. This is an interesting look at the inner workings of large organizations and how they invest in the scaffolding that supports their myriad efforts.

20 March 2022


Run Your Applications Reliably On Kubernetes Without Losing Sleep With Robusta

Run Your Applications Reliably On Kubernetes Without Losing Sleep With Robusta

Kubernetes is a framework that aims to simplify the work of running applications in production, but it forces you to adopt new patterns for debugging and resolving issues in your systems. Robusta is aimed at making that a more pleasant experience for developers and operators through pre-built automations, easy debugging, and a simple means of creating your own event-based workflows to find, fix, and alert on errors in production. In this episode Natan Yellin explains how the project got started, how it is architected and tested, and how you can start using it today to keep your Python projects running reliably.

14 March 2022


Accelerate The Development And Delivery Of Your Machine Learning Applications Using Ray And Deploy It At Anyscale

Accelerate The Development And Delivery Of Your Machine Learning Applications Using Ray And Deploy It At Anyscale

Building a machine learning application is inherently complex. Once it becomes necessary to scale the operation or training of the model, or introduce online re-training the process becomes even more challenging. In order to reduce the operational burden of AI developers Robert Nishihara helped to create the Ray framework that handles the distributed computing aspects of machine learning operations. To support the ongoing development and simplify adoption of Ray he co-founded Anyscale. In this episode he re-joins the show to share how the project, its community, and the ecosystem around it have grown and evolved over the intervening two years. He also explains how the techniques and adoption of machine learning have influenced the direction of the project.

6 March 2022


See The Structure Of Your Software At A Glance With Call Graphs From Code2Flow

See The Structure Of Your Software At A Glance With Call Graphs From Code2Flow

As software projects grow and change it can become difficult to keep track of all of the logical flows. By visualizing the interconnections of function definitions, classes, and their invocations you can speed up the time to comprehension for newcomers to a project, or help yourself remember what you worked on last month. In this episode Scott Rogowski shares his work on Code2Flow as a way to generate a call graph of your programs. He explains how it got started, how it works, and how you can start using it to understand your Python, Ruby, and PHP projects.

28 February 2022


Scaling Knowledge Management For Technical Teams With Knowledge Repo

Scaling Knowledge Management For Technical Teams With Knowledge Repo

One of the most persistent challenges faced by organizations of all sizes is the recording and distribution of institutional knowledge. In technical teams this is exacerbated by the need to incorporate technical review feedback and manage access to data before publishing. When faced with this problem as an early data scientist at AirBnB, Chetan Sharma helped create the Knowledge Repo project as a solution. In this episode he shares the story behind its creation and growth, how and why it was released as open source, and the features that make it a compelling option for your own team's knowledge management journey.

21 February 2022


Simplify And Scale Your Software Development Cycles By Putting On Pants (Build Tool)

Simplify And Scale Your Software Development Cycles By Putting On Pants (Build Tool)

Software development is a complex undertaking due to the number of options available and choices to be made in every stage of the lifecycle. In order to make it more scaleable it is necessary to establish common practices and patterns and introduce strong opinions. One area that can have a huge impact on the productivity of the engineers engaged with a project is the tooling used for building, validating, and deploying changes introduced to the software. In this episode maintainers of the Pants build tool Eric Arellano, Stu Hood, and Andreas Stenius discuss the recent updates that add support for more languages, efforts made to simplify its adoption, and the growth of the community that uses it. They also explore how using Pants as the single entry point for all of your routine tasks allows you to spend your time on the decisions that matter.

14 February 2022


Achieve Repeatable Builds Of Your Software On Any Machine With Earthly

Achieve Repeatable Builds Of Your Software On Any Machine With Earthly

It doesn't matter how amazing your application is if you are unable to deliver it to your users. Frustrated with the rampant complexity involved in building and deploying software Vlad A. Ionescu created the Earthly tool to reduce the toil involved in creating repeatable software builds. In this episode he explains the complexities that are inherent to building software projects and how he designed the syntax and structure of Earthly to make it easy to adopt for developers across all language environments. By adopting Earthly you can use the same techniques for building on your laptop and in your CI/CD pipelines.

6 February 2022


Building A Detailed View Of Your Software Delivery Process With The Eiffel Protocol

Building A Detailed View Of Your Software Delivery Process With The Eiffel Protocol

The process of getting software delivered to an environment where users can interact with it requires many steps along the way. In some cases the journey can require a large number of interdependent workflows that need to be orchestrated across technical and organizational boundaries, making it difficult to know what the current status is. Faced with such a complex delivery workflow the engineers at Ericsson created a message based protocol and accompanying tooling to let the various actors in the process provide information about the events that happened across the different stages. In this episode Daniel Ståhl and Magnus Bäck explain how the Eiffel protocol allows you to build a tooling agnostic visibility layer for your software delivery process, letting you answer all of your questions about what is happening between writing a line of code and your users executing it.

31 January 2022


Improve Your Productivity By Investing In Developer Experience Design For Your Projects

Improve Your Productivity By Investing In Developer Experience Design For Your Projects

When we are creating applications we spend a significant amount of effort on optimizing the experience of our end users to ensure that they are able to complete the tasks that the system is intended for. A similar effort that we should all consider is optimizing the developer experience for ourselves and other engineers who contribute to the projects that we work on. Adam Johnson recently wrote a book on how to improve the developer experience for Django projects and in this episode he shares some of the insights that he has gained through that project and his work with clients to help you improve the experience that you and your team have when collaborating on software development.

24 January 2022


An Exploration Of Effective Pandas Practices With Matt Harrison

An Exploration Of Effective Pandas Practices With Matt Harrison

Pandas has grown to be a ubiquitous tool for working with data at every stage. It has become so well known that many people learn Python solely for the purpose of using Pandas. With all of this activity and the long history of the project it can be easy to find misleading or outdated information about how to use it. In this episode Matt Harrison shares his work on the book "Effective Pandas" and some of the best practices and potential pitfalls that you should know for applying Pandas in your own work.

15 January 2022


Generate Your Text Files With Python Using Cog

Generate Your Text Files With Python Using Cog

Developers hate wasting effort on manual processes when we can write code to do it instead. Cog is a tool to manage the work of automating the creation of text inside another file by executing arbitrary Python code. In this episode Ned Batchelder shares the story of why he created Cog in the first place, some of the interesting ways that he uses it in his daily work, and the unique challenges of maintaining a project with a small audience and a well defined scope.

13 January 2022


A Friendly Approach To Regression Models For Programmers

A Friendly Approach To Regression Models For Programmers

Statistical regression models are a staple of predictive forecasts in a wide range of applications. In this episode Matthew Rudd explains the various types of regression models, when to use them, and his work on the book "Regression: A Friendly Guide" to help programmers add regression techniques to their toolbox.

2 January 2022


Fast, Flexible, and Incremental Task Automation With doit

Fast, Flexible, and Incremental Task Automation With doit

Every software project needs a tool for managing the repetitive tasks that are involved in building, running, and deploying the code. Frustrated with the limitations of tools like Make, Scons, and others Eduardo Schettino created doit to handle task automation in his own work and released it as open source. In this episode he shares the story behind the project, how it is implemented under the hood, and how you can start using it in your own projects to save you time and effort.

27 December 2021


The Technological, Business, and Sales Challenges Of Building The Ethical Ads Network

The Technological, Business, and Sales Challenges Of Building The Ethical Ads Network

Whether we like it or not, advertising is a common and effective way to make money on the internet. In order to support the work being done at Read The Docs they decided to include advertisements on the documentation sites they were hosting, but they didn't want to alienate their users or collect unnecessary information. In this episode David Fischer explains how they built the Ethical Ads network to solve their problem, the technical and business challenges that are involved, and the open source application that they built to power their network.

20 December 2021


Accidentally Building A Business With Python At Listen Notes

Accidentally Building A Business With Python At Listen Notes

Podcasts are one of the few mediums in the internet era that are still distributed through an open ecosystem. This has a number of benefits, but it also brings the challenge of making it difficult to find the content that you are looking for. Frustrated by the inability to pick and choose single episodes across various shows for his listening Wenbin Fang started the Listen Notes project to fulfill his own needs. He ended up turning that project into his full time business which has grown into the most full featured podcast search engine on the market. In this episode he explains how he build the Listen Notes application using Python and Django, his work to turn it into a sustainable business, and the various ways that you can build other applications and experiences on top of his API.

12 December 2021


Making Orbital Mechanics More Accessible With Poliastro

Making Orbital Mechanics More Accessible With Poliastro

Outer space holds a deep fascination for people of all ages, and the key principle in its exploration both near and far is orbital mechanics. Poliastro is a pure Python package for exploring and simulating orbit calculations. In this episode Juan Luis Cano Rodriguez shares the story behind the project, how you can use it to learn more about space travel, and some of the interesting projects that have used it for planning planetary and interplanetary missions.

27 November 2021


Declarative Deep Learning From Your Laptop To Production With Ludwig and Horovod

Declarative Deep Learning From Your Laptop To Production With Ludwig and Horovod

Deep learning frameworks encourage you to focus on the structure of your model ahead of the data that you are working with. Ludwig is a tool that uses a data oriented approach to building and training deep learning models so that you can experiment faster based on the information that you actually have, rather than spending all of our time manipulating features to make them match your inputs. In this episode Travis Addair explains how Ludwig is designed to improve the adoption of deep learning for more companies and a wider range of users. He also explains how the Horovod framework plugs in easily to allow for scaling your training workflow from your laptop out to a massive cluster of servers and GPUs. The combination of these tools allows for a declarative workflow that starts off easy but gives you full control over the end result.

22 November 2021


Build Better Analytics And Models With A Focus On The Data Experience

Build Better Analytics And Models With A Focus On The Data Experience

A lot of time and energy goes into data analysis and machine learning projects to address various goals. Most of the effort is focused on the technical aspects and validating the results, but how much time do you spend on considering the experience of the people who are using the outputs of these projects? In this episode Benn Stancil explores the impact that our technical focus has on the perceived value of our work, and how taking the time to consider what the desired experience will be can lead us to approach our work more holistically and increase the satisfaction of everyone involved.

22 November 2021


Building Conversational AI to Augment Sales Teams at Structurely

Building Conversational AI to Augment Sales Teams at Structurely

The true power of artificial intelligence is its ability to work collaboratively with humans. Nate Joens co-founded Structurely to create a conversational AI platform that augments human sales teams to help guide potential customers through the initial steps of the funnel. In this episode he discusses the technical and social considerations that need to be combined for a seamless conversational experience and how he and his team are tackling the problem.

6 November 2021


Build Composable And Reusable Feature Engineering Pipelines with Feature-Engine

Build Composable And Reusable Feature Engineering Pipelines with Feature-Engine

Every machine learning model has to start with feature engineering. This is the process of combining input variables into a more meaningful signal for the problem that you are trying to solve. Many times this process can lead to duplicating code from previous projects, or introducing technical debt in the form of poorly maintained feature pipelines. In order to make the practice more manageable Soledad Galli created the feature-engine library. In this episode she explains how it has helped her and others build reusable transformations that can be applied in a composable manner with your scikit-learn projects. She also discusses the importance of understanding the data that you are working with and the domain in which your model will be used to ensure that you are selecting the right features.

31 October 2021


Speed Up Your Python Data Applications By Parallelizing Them With Bodo

Speed Up Your Python Data Applications By Parallelizing Them With Bodo

The speed of Python is a subject of constant debate, but there is no denying that for compute heavy work it is not the optimal tool. Rather than rewriting your data oriented applications, or having to rearchitect them, the team at Bodo wrote a compiler that will do the optimization for you. In this episode Ehsan Totoni explains how they are able to translate pure Python into massively parallel processes that are optimized for high performance compute systems.

25 October 2021


An Exploration Of Financial Exchange Risk Management Strategies

An Exploration Of Financial Exchange Risk Management Strategies

The world of finance has driven the development of many sophisticated techniques for data analysis. In this episode Paul Stafford shares his experiences working in the realm of risk management for financial exchanges. He discusses the types of risk that are involved, the statistical methods that he has found most useful for identifying strategies to mitigate that risk, and the software libraries that have helped him most in his work.

16 October 2021


Build Better Machine Learning Models By Understanding Their Decisions With SHAP

Build Better Machine Learning Models By Understanding Their Decisions With SHAP

Machine learning and deep learning techniques are powerful tools for a large and growing number of applications. Unfortunately, it is difficult or impossible to understand the reasons for the answers that they give to the questions they are asked. In order to help shine some light on what information is being used to provide the outputs to your machine learning models Scott Lundberg created the SHAP project. In this episode he explains how it can be used to provide insight into which features are most impactful when generating an output, and how that insight can be applied to make more useful and informed design choices. This is a fascinating and important subject and this episode is an excellent exploration of how to start addressing the challenge of explainability.

9 October 2021


Accelerating Drug Discovery Using Machine Learning With TorchDrug

Accelerating Drug Discovery Using Machine Learning With TorchDrug

Finding new and effective treatments for disease is a complex and time consuming endeavor, requiring a high degree of domain knowledge and specialized equipment. Combining his expertise in machine learning and graph algorithms with is interest in drug discovery Jian Tang created the TorchDrug project to help reduce the amount of time needed to find new candidate molecules for testing. In this episode he explains how the project is being used by machine learning researchers and biochemists to collaborate on finding effective treatments for real-world diseases.

30 September 2021


An Exploration Of Automated Speech Recognition

An Exploration Of Automated Speech Recognition

The overwhelming growth of smartphones, smart speakers, and spoken word content has corresponded with increasingly sophisticated machine learning models for recognizing speech content in audio data. Dylan Fox founded Assembly to provide access to the most advanced automated speech recognition models for developers to incorporate into their own products. In this episode he gives an overview of the current state of the art for automated speech recognition, the varying requirements for accuracy and speed of models depending on the context in which they are used, and what is required to build a special purpose model for your own ASR applications.

26 September 2021


Experimenting With Reinforcement Learning Using MushroomRL

Experimenting With Reinforcement Learning Using MushroomRL

Reinforcement learning is a branch of machine learning and AI that has a lot of promise for applications that need to evolve with changes to their inputs. To support the research happening in the field, including applications for robotics, Carlo D'Eramo and Davide Tateo created MushroomRL. In this episode they share how they have designed the project to be easy to work with, so that students can use it in their study, as well as extensible so that it can be used by businesses and industry professionals. They also discuss the strengths of reinforcement learning, how to design problems that can leverage its capabilities, and how to get started with MushroomRL for your own work.

19 September 2021


Doing Dask Powered Data Science In The Saturn Cloud

Doing Dask Powered Data Science In The Saturn Cloud

A perennial problem of doing data science is that it works great on your laptop, until it doesn't. Another problem is being able to recreate your environment to collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy to use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signall, head of open source at Saturn Cloud, explains how she is working with the product team and PyData community to reduce the points of friction that data scientists encounter as they are getting their work done.

10 September 2021


Monitor The Health Of Your Machine Learning Products In Production With Evidently

Monitor The Health Of Your Machine Learning Products In Production With Evidently

You've got a machine learning model trained and running in production, but that's only half of the battle. Are you certain that it is still serving the predictions that you tested? Are the inputs within the range of tolerance that you designed? Monitoring machine learning products is an essential step of the story so that you know when it needs to be retrained against new data, or parameters need to be adjusted. In this episode Emeli Dral shares the work that she and her team at Evidently are doing to build an open source system for tracking and alerting on the health of your ML products in production. She discusses the ways that model drift can occur, the types of metrics that you need to track, and what to do when the health of your system is suffering. This is an important and complex aspect of the machine learning lifecycle, so give it a listen and then try out Evidently for your own projects.

3 September 2021


Making Automated Machine Learning More Accessible With EvalML

Making Automated Machine Learning More Accessible With EvalML

Building a machine learning model is a process that requires a lot of iteration and trial and error. For certain classes of problem a large portion of the searching and tuning can be automated. This allows data scientists to focus their time on more complex or valuable projects, as well as opening the door for non-specialists to experiment with machine learning. Frustrated with some of the awkward or difficult to use tools for AutoML, Angela Lin and Jeremy Shih helped to create the EvalML framework. In this episode they share the use cases for automated machine learning, how they have designed the EvalML project to be approachable, and how you can use it for building and training your own models.

25 August 2021


Growing And Supporting The Data Science Community At Anaconda

Growing And Supporting The Data Science Community At Anaconda

Data scientists are tasked with answering challenging questions using data that is often messy and incomplete. Anaconda is on a mission to make the lives of data professionals more manageable through creation and maintenance of high quality libraries and frameworks, the distribution of an easy to use Python distribution and package ecosystem, and high quality training material. In this episode Kevin Goldsmith, CTO of Anaconda, discusses the technical and social challenges faced by data scientists, the ways that the Python ecosystem has evolved to help address those difficulties, and how Anaconda is engaging with the community to provide high quality tools and education for this constantly changing practice.

19 August 2021


Network Analysis At The Speed Of C With The Power Of Python Using NetworKit

Network Analysis At The Speed Of C With The Power Of Python Using NetworKit

Analysing networks is a growing area of research in academia and industry. In order to be able to answer questions about large or complex relationships it is necessary to have fast and efficient algorithms that can process the data quickly. In this episode Eugenio Angriman discusses his contributions to the NetworKit library to provide an accessible interface for these algorithms. He shares how he is using NetworKit for his own research, the challenges of working with large and complex networks, and the kinds of questions that can be answered with data that fits on your laptop.

15 August 2021


Delivering Deep Learning Powered Speech Recognition As A Service For Developers At AssemblyAI

Delivering Deep Learning Powered Speech Recognition As A Service For Developers At AssemblyAI

Building a software-as-a-service (SaaS) business is a fairly well understood pattern at this point. When the core of the service is a set of machine learning products it introduces a whole new set of challenges. In this episode Dylan Fox shares his experience building Assembly AI as a reliable and affordable option for automatic speech recognition that caters to a developer audience. He discusses the machine learning development and deployment processes that his team relies on, the scalability and performance considerations that deep learning models introduce, and the user experience design that goes into building for a developer audience. This is a fascinating conversation about a unique cross-section of considerations and how Dylan and his team are building an impressive and useful service.

4 August 2021


Taking Aim At The Legacy Of SQL With The Preql Relational Language

Taking Aim At The Legacy Of SQL With The Preql Relational Language

SQL has gone through many cycles of popularity and disfavor. Despite its longevity it is objectively challenging to work with in a collaborative and composable manner. In order to address these shortcomings and build a new interface for your database oriented workloads Erez Shinan created Preql. It is based on the same relational algebra that inspired SQL, but brings in more robust computer science principles to make it more manageable as you scale in complexity. In this episode he shares his motivation for creating the Preql project, how he has used Python to develop a new language for interacting with database engines, and the challenges of taking on the legacy of SQL as an individual.

28 July 2021


Unleash The Power Of Dataframes At Any Scale With Modin

Unleash The Power Of Dataframes At Any Scale With Modin

When you start working on a data project there are always a variety of unknown factors that you have to explore. One of those is the volume of total data that you will eventually need to handle, and the speed and scale at which it will need to be processed. If you optimize for scale too early then it adds a high barrier to entry due to the complexities of distributed systems, but if you invest in a lot of engineering up front then it can be challenging to refactor for scale. Modin is a project that aims to remove that decision by letting you seamlessly replace your existing Pandas code and scale across CPU cores or across a cluster of machines. In this episode Devin Petersohn explains why he started working on solving this problem, how Modin is architected to allow for a smooth escalation from small to large volumes of data and compute, and how you can start using it today to accelerate your Pandas workflows.

22 July 2021


Exploring The SpeechBrain Toolkit For Speech Processing

Exploring The SpeechBrain Toolkit For Speech Processing

With the rising availability of computation in everyday devices, there has been a corresponding increase in the appetite for voice as the primary interface. To accomodate this desire it is necessary for us to have high quality libraries for being able to process and generate audio data that can make sense of human speech. To facilitate research and industry applications for speech data Mirco Ravanelli and Peter Plantinga are building SpeechBrain. In this episode they explain how it works under the hood, the projects that they are using it for, and how you can get started with it today.

14 July 2021


Fast And Educational Exploration And Analysis Of Graph Data Structures With graph-tool

Fast And Educational Exploration And Analysis Of Graph Data Structures With graph-tool

If you are interested in a library for working with graph structures that will also help you learn more about the research and theory behind the algorithms then look no further than graph-tool. In this episode Tiago Peixoto shares his work on graph algorithms and networked data and how he has built graph-tool to help in that research. He explains how it is implemented, how it evolved from a simple command line tool to a full-fledged library, and the benefits that he has found from building a personal project in the open.

7 July 2021


Lightening The Load For Deep Learning With Sparse Networks Using Neural Magic

Lightening The Load For Deep Learning With Sparse Networks Using Neural Magic

Deep learning has largely taken over the research and applications of artificial intelligence, with some truly impressive results. The challenge that it presents is that for reasonable speed and performance it requires specialized hardware, generally in the form of a dedicated GPU (Graphics Processing Unit). This raises the cost of the infrastructure, adds deployment complexity, and drastically increases the energy requirements for training and serving of models. To address these challenges Nir Shavit combined his experiences in multi-core computing and brain science to co-found Neural Magic where he is leading the efforts to build a set of tools that prune dense neural networks to allow them to execute on commodity CPU hardware. In this episode he explains how sparsification of deep learning models works, the potential that it unlocks for making machine learning and specialized AI more accessible, and how you can start using it today.

30 June 2021


Finding The Core Of Python For A Bright Future With Brett Cannon

Finding The Core Of Python For A Bright Future With Brett Cannon

Brett Cannon has been a long-time contributor to the Python language and community in many ways. In this episode he shares some of his work and thoughts on modernizing the ecosystem around the language. This includes standards for packaging, discovering the true core of the language, and how to make it possible to target mobile and web platforms.

23 June 2021


Traversing The Challenges And Promise Of Graph Machine Learning

Traversing The Challenges And Promise Of Graph Machine Learning

The foundation of every ML model is the data that it is trained on. In many cases you will be working with tabular or unstructured information, but there is a growing trend toward networked, or graph data sets. Benedek Rozemberczki has focused his research and career around graph machine learning applications. In this episode he discusses the common sources of networked data, the challenges of working with graph data in machine learning projects, and describes the libraries that he has created to help him in his work. If you are dealing with connected data then this interview will provide a wealth of context and resources to improve your projects.

16 June 2021


Keep Your Analytics Lint Free With SQLFluff

Keep Your Analytics Lint Free With SQLFluff

The growth of analytics has accelerated the use of SQL as a first class language. It has also grown the amount of collaboration involved in writing and maintaining SQL queries. With collaboration comes the inevitable variation in how queries are written, both structurally and stylistically which can lead to a significant amount of wasted time and energy during code review and employee onboarding. Alan Cruickshank was feeling the pain of this wasted effort first-hand which led him down the path of creating SQLFluff as a linter and formatter to enforce consistency and find bugs in the SQL code that he and his team were working with. In this episode he shares the story of how SQLFluff evolved from a simple hackathon project to an open source linter that is used across a range of companies and fosters a growing community of users and contributors. He explains how it has grown to support multiple dialects of SQL, as well as integrating with projects like DBT to handle templated queries. This is a great conversation about the long detours that are sometimes necessary to reach your original destination and the powerful impact that good tooling can have on team productivity.

9 June 2021


Exploring The Patterns And Practices For Deep Learning With Andrew Ferlitsch

Exploring The Patterns And Practices For Deep Learning With Andrew Ferlitsch

Deep learning is gaining an immense amount of popularity due to the incredible results that it is able to offer with comparatively little effort. Because of this there are a number of engineers who are trying their hand at building machine learning models with the wealth of frameworks that are available. Andrew Ferlitsch wrote a book to capture the useful patterns and best practices for building models with deep learning to make it more approachable for newcomers ot the field. In this episode he shares his deep expertise and extensive experience in building and teaching machine learning across many companies and industries. This is an entertaining and educational conversation about how to build maintainable models across a variety of applications.

2 June 2021


Automatically Generate Your Unit Tests From Scratch With Pynguin

Automatically Generate Your Unit Tests From Scratch With Pynguin

Unit tests are an important tool to ensure the proper functioning of your application, but writing them can be a chore. Stephan Lukasczyk wants to reduce the monotony of the process for Python developers. As part of his PhD research he created the Pynguin project to automate the creation of unit tests. In this episode he explains the complexity involved in generating useful tests for a dynamic language, how he has designed Pynguin to address the challenges, and how you can start using it today for your own work.

25 May 2021


Leveling Up Natural Language Processing with Transfer Learning

Leveling Up Natural Language Processing with Transfer Learning

Natural language processing is a powerful tool for extracting insights from large volumes of text. With the growth of the internet and social platforms, and the increasing number of people and communities conducting their professional and personal activities online, the opportunities for NLP to create amazing insights and experiences are endless. In order to work with such a large and growing corpus it has become necessary to move beyond purely statistical methods and embrace the capabilities of deep learning, and transfer learning in particular. In this episode Paul Azunre shares his journey into the application and implementation of transfer learning for natural language processing. This is a fascinating look at the possibilities of emerging machine learning techniques for transforming the ways that we interact with technology.

18 May 2021


Federated Learning For All With Flower

Federated Learning For All With Flower

Machine learning is a tool that has typically been performed on large volumes of data in one place. As more computing happens at the edge on mobile and low power devices, the learning is being federated which brings a new set of challenges. Daniel Beutel co-created the Flower framework to make federated learning more manageable. In this episode he shares his motivations for starting the project, how you can use it for your own work, and the unique challenges and benefits that this emerging model offers. This is a great exploration of the federated learning space and a framework that makes it more approachable.

11 May 2021


Data Exploration and Visualization Made Effortless with Lux

Data Exploration and Visualization Made Effortless with Lux

Data exploration is an important step in any analysis or machine learning project. Visualizing the data that you are working with makes that exploration faster and more effective, but having to remember and write all of the code to build a scatter plot or histogram is tedious and time consuming. In order to eliminate that friction Doris Lee helped create the Lux project, which wraps your Pandas data frame and automatically generates a set of visualizations without you having to lift a finger. In this episode she explains how Lux works under the hood, what inspired her to create it in the first place, and how it can help you create a better end result. The Lux project is a valuable addition to the toolbox of anyone who is doing data wrangling with Pandas.

4 May 2021


Extensible Open Source Authorization For Your Applications With Oso

Extensible Open Source Authorization For Your Applications With Oso

Any project that is used by more than one person will eventually need to handle permissions for each of those users. It is certainly possible to write that logic yourself, but you'll almost certainly do it wrong at least once. Rather than waste your time fighting with bugs in your authorization code it makes sense to use a well-maintained library that has already made and fixed all of the mistakes so that you don't have to. In this episode Sam Scott shares the Oso framework to give you a clean separation between your authorization policies and your application code. He explains how you can call a simple function to ask if something is allowed, and then manage the complex rules that match your particular needs as a separate concern. He describes the motivation for building a domain specific language based on logic programming for policy definitions, how it integrates with the host language (such as Python), and how you can start using it in your own applications today. This is a must listen even if you never use the project because it is a great exploration of all of the incidental complexity that is involved in permissions management.

27 April 2021


Teaching Geeks The Value And Skills Of Public Speaking

Teaching Geeks The Value And Skills Of Public Speaking

Being able to present your ideas is one of the most valuable and powerful skills to have as a professional, regardless of your industry. For software engineers it is especially important to be able to communicate clearly and effectively because of the detail-oriented nature of the work. Unfortunately, many people who work in software are more comfortable in front of the keyboard than a crowd. In this episode Neil Thompson shares his story of being an accidental public speaker and how he is helping other engineers start down the road of being effective presenters. He discusses the benefits for your career, how to build the skills, and how to find opportunities to practice them. Even if you never want to speak at a conference, it's still worth your while to listen to Neil's advice and find ways to level up your presentation and speaking skills.

20 April 2021


Let The Robots Do The Work Using Robotic Process Automation with Robocorp

Let The Robots Do The Work Using Robotic Process Automation with Robocorp

One of the great promises of computers is that they will make our work faster and easier, so why do we all spend so much time manually copying data from websites, or entering information into web forms, or any of the other tedious tasks that take up our time? As developers our first inclination is to "just write a script" to automate things, but how do you share that with your non-technical co-workers? In this episode Antti Karjalainen, CEO and co-founder of Robocorp, explains how Robotic Process Automation (RPA) can help us all cut down on time-wasting tasks and let the computers do what they're supposed to. He shares how he got involved in the RPA industry, his work with Robot Framework and RPA framework, how to build and distribute bots, and how to decide if a task is worth automating. If you're sick of spending your time on mind-numbing copy and paste then give this episode a listen and then let the robots do the work for you.

13 April 2021


Keep Your Code Clean And Maintainable Using Static Analysis With Flake8

Keep Your Code Clean And Maintainable Using Static Analysis With Flake8

When you are writing code it is all to easy to introduce subtle bugs or leave behind unused code. Unused variables, unused imports, overly complex logic, etc. If you are careful and diligent you can find these problems yourself, but isn't that what computers are supposed to help you with? Thankfully Python has a wealth of tools that will work with you to keep your code clean and maintainable. In this episode Anthony Sottile explores Flake8, one of the most popular options for identifying those problematic lines of code. He shares how he became involved in the project and took over as maintainer and explains the different categories of code quality tooling and how Flake8 compares to other static analyzers. He also discusses the ecosystem of plugins that have grown up around it, including some detailed examples of how you can write your own (and why you might want to).

6 April 2021


Make Your Code More Readable With The Magic Of Refactoring Using Sourcery

Make Your Code More Readable With The Magic Of Refactoring Using Sourcery

Writing code that is easy to read and understand will have a lasting impact on you and your teammates over the life of a project. Sometimes it can be difficult to identify opportunities for simplifying a block of code, especially if you are early in your journey as a developer. If you work with senior engineers they can help by pointing out ways to refactor your code to be more readable, but they aren't always available. Brendan Maginnis and Nick Thapen created Sourcery to act as a full time pair programmer sitting in your editor of choice, offering suggestions and automatically refactoring your Python code. In this episode they share their journey of building a tool to automatically find opportunities for refactoring in your code, including how it works under the hood, the types of refactoring that it supports currently, and how you can start using it in your own work today. It always pays to keep your tool box organized and your tools sharp and Sourcery is definitely worth adding to your repertoire.

30 March 2021


Be Data Driven At Any Scale With Superset

Be Data Driven At Any Scale With Superset

Becoming data driven is the stated goal of a large and growing number of organizations. In order to achieve that mission they need a reliable and scalable method of accessing and analyzing the data that they have. While business intelligence solutions have been around for ages, they don't all work well with the systems that we rely on today and a majority of them are not open source. Superset is a Python powered platform for exploring your data and building rich interactive dashboards that gets the information that your organization needs in front of the people that need it. In this episode Maxime Beauchemin, the creator of Superset, shares how the project got started and why it has become such a widely used and popular option for exploring and sharing data at companies of all sizes. He also explains how it functions, how you can customize it to fit your specific needs, and how to get it up and running in your own environment.

22 March 2021


Practical Advice On Using Python To Power A Business

Practical Advice On Using Python To Power A Business

Python is a language that is used in almost every imaginable context and by people from an amazing range of backgrounds. A lot of the people who use it wouldn't even call themselves programmers, because that is not the primary focus of their job. In this episode Chris Moffitt shares his experience writing Python as a business user. In order to share his insights and help others who have run up against the limits of Excel he maintains the site Practical Business Python where he publishes articles that help introduce newcomers to Python and explain how to perform tasks such as building reports, automating Excel files, and doing data analysis. This is a great conversation that illustrates how useful it is to learn Python even if you never intend to write software professionally.

16 March 2021


Analyzing The Ecosystem of Python Data Companies With Tony Liu

Analyzing The Ecosystem of Python Data Companies With Tony Liu

There are a large and growing number of businesses built by and for data science and machine learning teams that rely on Python. Tony Liu is a venture investor who is following that market closely and betting on its continued success. In this episode he shares his own journey into the role of an investor and discusses what he is most excited about in the industry. He also explains what he looks at when investing in a business and gives advice on what potential founders and early employees of startups should be thinking about when starting on that journey.

9 March 2021


Go From Notebook To Pipeline For Your Data Science Projects With Orchest

Go From Notebook To Pipeline For Your Data Science Projects With Orchest

Jupyter notebooks are a dominant tool for data scientists, but they lack a number of conveniences for building reusable and maintainable systems. For machine learning projects in particular there is a need for being able to pivot from exploring a particular dataset or problem to integrating that solution into a larger workflow. Rick Lamers and Yannick Perrenet were tired of struggling with one-off solutions when they created the Orchest platform. In this episode they explain how Orchest allows you to turn your notebooks into executable components that are integrated into a graph of execution for running end-to-end machine learning workflows.

2 March 2021


Write Your Python Scripts In A Flow Based Visual Editor With Ryven

Write Your Python Scripts In A Flow Based Visual Editor With Ryven

When you are writing a script it can become unwieldy to understand how the logic and data are flowing through the program. To make this easier to follow you can use a flow-based approach to building your programs. Leonn Thomm created the Ryven project as an environment for visually constructing a flow-based program. In this episode he shares his inspiration for creating the Ryven project, how it changes the way you think about program design, how Ryven is implemented, and how to get started with it for your own programs.

23 February 2021


CrossHair: Your Automatic Pair Programmer

CrossHair: Your Automatic Pair Programmer

One of the perennial challenges in software engineering is to reduce the opportunity for bugs to creep into the system. Some of the tools in our arsenal that help in this endeavor include rich type systems, static analysis, writing tests, well defined interfaces, and linting. Phillip Schanely created the CrossHair project in order to add another ally in the fight against broken code. It sits somewhere between type systems, automated test generation, and static analysis. In this episode he explains his motivation for creating it, how he uses it for his own projects, and how to start incorporating it into yours. He also discusses the utility of writing contracts for your functions, and the differences between property based testing and SMT solvers. This is an interesting and informative conversation about some of the more nuanced aspects of how to write well-behaved programs.

16 February 2021


Giving Your Data Science Projects And Teams A Home At DagsHub

Giving Your Data Science Projects And Teams A Home At DagsHub

Collaborating on software projects is largely a solved problem, with a variety of hosted or self-managed platforms to choose from. For data science projects, collaboration is still an open question. There are a number of projects that aim to bring collaboration to data science, but they are all solving a different aspect of the problem. Dean Pleban and Guy Smoilovsky created DagsHub to give individuals and teams a place to store and version their code, data, and models. In this episode they explain how DagsHub is designed to make it easier to create and track machine learning experiments, and serve as a way to promote collaboration on open source data science projects.

9 February 2021


Exploring Literate Programming For Python Projects With nbdev

Exploring Literate Programming For Python Projects With nbdev

Creating well designed software is largely a problem of context and understanding. The majority of programming environments rely on documentation, tests, and code being logically separated despite being contextually linked. In order to weave all of these concerns together there have been many efforts to create a literate programming environment. In this episode Jeremy Howard of fast.ai fame and Hamel Husain of GitHub share the work they have done on nbdev. The explain how it allows you to weave together documentation, code, and tests in the same context so that it is more natural to explore and build understanding when working on a project. It is built on top of the Jupyter environment, allowing you to take advantage of the other great elements of that ecosystem, and it provides a number of excellent out of the box features to reduce the friction in adopting good project hygiene, including continuous integration and well designed documentation sites. Regardless of whether you have been programming for 5 days, 5 years, or 5 decades you should take a look at nbdev to experience a different way of looking at your code.

2 February 2021


Making The Sans I/O Ideal A Reality For The Websockets Library

Making The Sans I/O Ideal A Reality For The Websockets Library

Working with network protocols is a common need for software projects, particularly in the current age of the internet. As a result, there are a multitude of libraries that provide interfaces to the various protocols. The problem is that implementing a network protocol properly and handling all of the edge cases is hard, and most of the available libraries are bound to a particular I/O paradigm which prevents them from being widely reused. To address this shortcoming there has been a movement towards "sans I/O" implementations that provide the business logic for a given protocol while remaining agnostic to whether you are using async I/O, Twisted, threads, etc. In this episode Aymeric Augustin shares his experience of refactoring his popular websockets library to be I/O agnostic, including the challenges involved in how to design the interfaces, the benefits it provides in simplifying the tests, and the work needed to add back support for async I/O and other runtimes. This is a great conversation about what is involved in making an ideal a reality.

26 January 2021


Driving Toward A Faster Python Interpreter With Pyston

Driving Toward A Faster Python Interpreter With Pyston

One of the common complaints about Python is that it is slow. There are languages and runtimes that can execute code faster, but they are not as easy to be productive with, so many people are willing to make that tradeoff. There are some use cases, however, that truly need the benefit of faster execution. To address this problem Kevin Modzelewski helped to create the Pyston intepreter that is focused on speeding up unmodified Python code. In this episode he shares the history of the project, discusses his current efforts to optimize a fork of the CPython interpreter, and his goals for building a business to support the ongoing work to make Python faster for everyone. This is an interesting look at the opportunities that exist in the Python ecosystem and the work being done to address some of them.

19 January 2021


Project Scaffolding That Evolves With Your Software Using Copier

Project Scaffolding That Evolves With Your Software Using Copier

Every software project has a certain amount of boilerplate to handle things like linting rules, test configuration, and packaging. Rather than recreate everything manually every time you start a new project you can use a utility to generate all of the necessary scaffolding from a template. This allows you to extract best practices and team standards into a reusable project that will save you time. The Copier project is one such utility that goes above and beyond the bare minimum by supporting project _evolution_, letting you bring in the changes to the source template after you already have a project that you have dedicated significant work on. In this episode Jairo Llopis explains how the Copier project works under the hood and the advanced capabilities that it provides, including managing the full lifecycle of a project, composing together multiple project templates, and how you can start using it for your own work today.

12 January 2021


How Python's Evolution Impacts Your Fluency With Luciano Ramalho

How Python's Evolution Impacts Your Fluency With Luciano Ramalho

On its surface Python is a simple language which is what has contributed to its rise in popularity. As you move to intermediate and advanced usage you will find a number of interesting and elegant design elements that will let you build scalable and maintainable systems and design friendly interfaces. Luciano Ramalho is best known as the author of Fluent Python which has quickly become a leading resource for Python developers to increase their facility with the language. In this episode he shares his journey with Python and his perspective on how the recent changes to the interpreter and ecosystem are influencing who is adopting it and how it is being used. Luciano has an interesting perspective on how the feedback loop between the community and the language is driving the curent and future priorities of the features that are added.

5 January 2021


Making Content Management A Smooth Experience With A Headless CMS

Making Content Management A Smooth Experience With A Headless CMS

Building a web application requires integrating a number of separate concerns into a single experience. One of the common requirements is a content management system to allow product owners and marketers to make the changes needed for them to do their jobs. Rather than spend the time and focus of your developers to build the end to end system a growing trend is to use a headless CMS. In this episode Jake Lumetta shares why he decided to spend his time and energy on building a headless CMS as a service, when and why you might want to use one, and how to integrate it into your applications so that you can focus on the rest of your application.

28 December 2020


Turning Notebooks Into Collaborative And Dynamic Data Applications With Hex

Turning Notebooks Into Collaborative And Dynamic Data Applications With Hex

Notebooks have been a useful tool for analytics, exploratory programming, and shareable data science for years, and their popularity is continuing to grow. Despite their widespread use, there are still a number of challenges that inhibit collaboration and use by non-technical stakeholders. Barry McCardel and his team at Hex have built a platform to make collaboration on Jupyter notebooks a first class experience, as well as allowing notebooks to be parameterized and exposing the logic through interactive web applications. In this episode Barry shares his perspective on the state of the notebook ecosystem, why it is such as powerful tool for computing and analytics, and how he has built a successful business around improving the end to end experience of working with notebooks. This was a great conversation about an important piece of the toolkit for every analyst and data scientist.

21 December 2020


Add Anomaly Detection To Your Time Series Data With Luminaire

Add Anomaly Detection To Your Time Series Data With Luminaire

When working with data it's important to understand when it is correct. If there is a time dimension, then it can be difficult to know when variation is normal. Anomaly detection is a useful tool to address these challenges, but a difficult one to do well. In this episode Smit Shah and Sayan Chakraborty share the work they have done on Luminaire to make anomaly detection easier to work with. They explain the complexities inherent to working with time series data, the strategies that they have incorporated into Luminaire, and how they are using it in their data pipelines to identify errors early. If you are working with any kind of time series then it's worth giving Luminaure a look.

15 December 2020


Building Big Data Pipelines For Audio With Klio

Building Big Data Pipelines For Audio With Klio

Technologies for building data pipelines have been around for decades, with many mature options for a variety of workloads. However, most of those tools are focused on processing of text based data, both structured and unstructured. For projects that need to manage large numbers of binary and audio files the list of options is much shorter. In this episode Lynn Root shares the work that she and her team at Spotify have done on the Klio project to make that list a bit longer. She discusses the problems that are specific to working with binary data, how the Klio project is architected to allow for scalable and efficient processing of massive numbers of audio files, why it was released as open source, and how you can start using it today for your own projects. If you are struggling with ad-hoc infrastructure and a medley of tools that have been cobbled together for analyzing large or numerous binary assets then this is definitely a tool worth testing out.

7 December 2020


Open Sourcing The Anvil Full Stack Python Web App Platform

Open Sourcing The Anvil Full Stack Python Web App Platform

Building a complete web application requires expertise in a wide range of disciplines. As a result it is often the work of a whole team of engineers to get a new project from idea to production. Meredydd Luff and his co-founder built the Anvil platform to make it possible to build full stack applications entirely in Python. In this episode he explains why they released the application server as open source, how you can use it to run your own projects for free, and why developer tooling is the sweet spot for an open source business model. He also shares his vision for how the end-to-end experience of building for the web should look, and some of the innovative projects and companies that were made possible by the reduced friction that the Anvil platform provides. Give it a listen today to gain some perspective on what it could be like to build a web app.

1 December 2020


Pants Has Got Your Python Monorepo Covered

Pants Has Got Your Python Monorepo Covered

In a software project writing code is just one step of the overall lifecycle. There are many repetitive steps such as linting, running tests, and packaging that need to be run for each project that you maintain. In order to reduce the overhead of these repeat tasks, and to simplify the process of integrating code across multiple systems the use of monorepos has been growing in popularity. The Pants build tool is purpose built for addressing all of the drudgery and for working with monorepos of all sizes. In this episode core maintainers Eric Arellano and Stu Hood explain how the Pants project works, the benefits of automatic dependency inference, and how you can start using it in your own projects today. They also share useful tips for how to organize your projects, and how the plugin oriented architecture adds flexibility for you to customize Pants to your specific needs.

23 November 2020


Scale Your Data Science Teams With Machine Learning Operations Principles

Scale Your Data Science Teams With Machine Learning Operations Principles

Building a machine learning model is a process that requires well curated and cleaned data and a lot of experimentation. Doing it repeatably and at scale with a team requires a way to share your discoveries with your teammates. This has led to a new set of operational ML platforms. In this episode Michael Del Balso shares the lessons that he learned from building the platform at Uber for putting machine learning into production. He also explains how the feature store is becoming the core abstraction for data teams to collaborate on building machine learning models. If you are struggling to get your models into production, or scale your data science throughput, then this interview is worth a listen.

17 November 2020


Making The Case For A (Semi) Formal Specification Of CPython

Making The Case For A (Semi) Formal Specification Of CPython

The CPython implementation has grown and evolved significantly over the past ~25 years. In that time there have been many other projects to create compatible runtimes for your Python code. One of the challenges for these other projects is the lack of a fully documented specification of how and why everything works the way that it does. In the most recent Python language summit Mark Shannon proposed implementing a formal specification for CPython, and in this episode he shares his reasoning for why that would be helpful and what is involved in making it a reality.

10 November 2020


Bringing Artificial Intelligence Projects From Idea To Production

Bringing Artificial Intelligence Projects From Idea To Production

Artificial intelligence applications can provide dramatic benefits to a business, but only if you can bring them from idea to production. Henrik Landgren was behind the original efforts at Spotify to leverage data for new product features, and in his current role he works on an AI system to evaluate new businesses to invest in. In this episode he shares advice on how to identify opportunities for leveraging AI to improve your business, the capabilities necessary to enable aa successful project, and some of the pitfalls to watch out for. If you are curious about how to get started with AI, or what to consider as you build a project, then this is definitely worth a listen.

3 November 2020


Power Up Your Java Using Python With JPype

Power Up Your Java Using Python With JPype

Python and Java are two of the most popular programming languages in the world, and have both been around for over 20 years. In that time there have been numerous attempts to provide interoperability between them, with varying methods and levels of success. One such project is JPype, which allows you to use Java classes in your Python code. In this episode the current lead developer, Karl Nelson, explains why he chose it as his preferred tool for combining these ecosystems, how he and his team are using it, and when and how you might want to use it for your own projects. He also discusses the work he has done to enable use of JPype on Android, and what is in store for the future of the project. If you have ever wanted to use a library or module from Java, but the rest of your project is already in Python, then this episode is definitely worth a listen.

26 October 2020


The Journey To Replace Python's Parser And What It Means For The Future

The Journey To Replace Python's Parser And What It Means For The Future

The release of Python 3.9 introduced a new parser that paves the way for brand new features. Every programming language has its own specific syntax for representing the logic that you are trying to express. The way that the rules of the language are defined and validated is with a grammar definition, which in turn is processed by a parser. The parser that the Python language has relied on for the past 25 years has begun to show its age through mounting technical debt and a lack of flexibility in defining new syntax. In this episode Pablo Galindo and Lysandros Nikolaou explain how, together with Python's creator Guido van Rossum, they replaced the original parser implementation with one that is more flexible and maintainable, why now was the time to make the change, and how it will influence the future evolution of the language.

19 October 2020


Cloud Native Application Delivery Using GitOps

Cloud Native Application Delivery Using GitOps

The way that applications are being built and delivered has changed dramatically in recent years with the growing trend toward cloud native software. As part of this movement toward the infrastructure and orchestration that powers your project being defined in software, a new approach to operations is gaining prominence. Commonly called GitOps, the main principle is that all of your automation code lives in version control and is executed automatically as changes are merged. In this episode Victor Farcic shares details on how that workflow brings together developers and operations engineers, the challenges that it poses, and how it influences the architecture of your software. This was an interesting look at an emerging pattern in the development and release cycle of modern applications.

12 October 2020


Threading The Needle Of Interesting And Informative While You Learn To Code

Threading The Needle Of Interesting And Informative While You Learn To Code

Learning to code is a neverending journey, which is why it's important to find a way to stay motivated. A common refrain is to just find a project that you're interested in building and use that goal to keep you on track. The problem with that advice is that as a new programmer, you don't have the knowledge required to know which projects are reasonable, which are difficult, and which are effectively impossible. Steven Lott has been sharing his programming expertise as a consultant, author, and trainer for years. In this episode he shares his insights on how to help readers, students, and colleagues interested enough to learn the fundamentals without losing sight of the long term gains. He also uses his own difficulties in learning to maintain, repair, and captain his sailboat as relatable examples of the learning process and how the lessons he has learned can be translated to the process of learning a new technology or skill. This was a great conversation about the various aspects of how to learn, how to stay motivated, and how to help newcomers bridge the gap between what they want to create and what is within their grasp.

6 October 2020


Solving Python Package Creation For End User Applications With PyOxidizer

Solving Python Package Creation For End User Applications With PyOxidizer

Python is a powerful and expressive programming language with a vast ecosystem of incredible applications. Unfortunately, it has always been challenging to share those applications with non-technical end users. Gregory Szorc set out to solve the problem of how to put your code on someone else's computer and have it run without having to rely on extra systems such as virtualenvs or Docker. In this episode he shares his work on PyOxidizer and how it allows you to build a self-contained Python runtime along with statically linked dependencies and the software that you want to run. He also digs into some of the edge cases in the Python language and its ecosystem that make this a challenging problem to solve, and some of the lessons that he has learned in the process. PyOxidizer is an exciting step forward in the evolution of packaging and distribution for the Python language and community.

29 September 2020


Flexible Network Security Detection And Response With Grapl

Flexible Network Security Detection And Response With Grapl

Servers and services that have any exposure to the public internet are under a constant barrage of attacks. Network security engineers are tasked with discovering and addressing any potential breaches to their systems, which is a never-ending task as attackers continually evolve their tactics. In order to gain better visibility into complex exploits Colin O'Brien built the Grapl platform, using graph database technology to more easily discover relationships between activities within and across servers. In this episode he shares his motivations for creating a new system to discover potential security breaches, how its design simplifies the work of identifying complex attacks without relying on brittle rules, and how you can start using it to monitor your own systems today.

22 September 2020


Simplified Data Extraction And Analysis For Current Events With Newspaper

Simplified Data Extraction And Analysis For Current Events With Newspaper

News media is an important source of information for understanding the context of the world. To make it easier to access and process the contents of news sites Lucas Ou-Yang built the Newspaper library that aids in automatic retrieval of articles and prepare it for analysis. In this episode he shares how the project got started, how it is implemented, and how you can get started with it today. He also discusses how recent improvements in the utility and ease of use of deep learning libraries open new possibilities for future iterations of the project.

15 September 2020


Digging Into Dagster: An Opinionated Open Source Framework For Data Orchestration

Digging Into Dagster: An Opinionated Open Source Framework For Data Orchestration

Data applications are complex and continually evolving, often requiring collaboration across multiple teams. In order to keep everyone on the same page a high level abstraction is needed to facilitate a cross-cutting view of the data orchestration across integration, transformation, analytics, and machine learning. Dagster is an innovative new framework that leans on the power and flexibility of Python to provide an extensible interface to the complete lifecycle of data projects. In this episode Nick Schrock explains how he designed the Dagster project to allow for integration with the entire data ecosystem while providing an opinionated structure for connecting the different stages of computation. He also discusses how he is working to grow an open ecosystem around the Dagster project, and his thoughts on building a sustainable business on top of it without compromising the integrity of the community. This was a great conversation about playing the long game when building a business while providing a valuable utility to a complex problem domain.

7 September 2020

Skill Piper
HomeBlogAboutContactNewsletter

© 2022 Skill Piper. All rights reserved

Twitter