To improve Data Literacy, organizations need high-quality data training programs that give their employees the most valuable and relevant data skills they need. Many companies fall into the trap of implementing training programs that are poorly designed or not relevant for the needs of their learners.
Sharon Castillo is the VP of Global Education at DataRobot, where she developed DataRobot University, a self-service education portal that offers both free and paid courses on AI and machine learning to the public. With over 30 years of experience, Sharon is a leading expert in data training and employee upskilling programs, from development through execution.
Sharon joins the show to talk about what makes an effective data training program, how to ensure employees retain the information, how to properly incentivize training participation, why organizations should prioritize training, and much more. This is essential listening for anyone developing a training program for their team or organization.
Data Analytics has played a major role in Chelsea’s journey to becoming the seventh most valuable football club in the world. Chelsea has won six league titles, eight FA Cups, five League Cups, and two Champions League titles. Today, we are going behind the scenes at Chelsea FC to see how they use data analytics to analyze matches, inform tactical decision-making, and drive matchday success in one of the world’s top football leagues, just in time for the 2022 FIFA World Cup in Qatar! Federico Bettuzzi is a Data Scientist at Chelsea FC. As a specialist in match analytics, Federico works with Chelsea’s first team to inform tactical decision-making during matches. Federico joins the show to break down how he gathers and synthesizes data, how they develop match analyses for tactical reviews, how managers prioritize data analytics differently, how to balance long-term and short-term projects, and much more.
21 November 2022 •
Becoming a data-driven organization takes a major shift in mindset and culture, investments in technology and infrastructure, skills transformation, and clear evangelism for using data to drive better decision-making. With all of these levers to scale, many organizations get stuck early in their data transformation journey, not knowing what to prioritize and how. In this episode, Ganes Kesari joins the show to share the frameworks and processes that organizations can follow to become data-driven, measure their data maturity, and win stakeholder support across the organization. Ganes is Co-Founder and Chief Decision Scientist at Gramener, which helps companies make data-driven decisions through powerful data stories and analytics. He is an expert in data, analytics, organizational strategy, and hands-on execution. Throughout his 20-year career, Ganes has become an internationally renowned speaker, has been published in Forbes and Entrepreneur, and has become a thought leader in Data Science. Throughout the episode, we talk about how organizations can scale their data maturity, how to build an effective data science roadmap, how to successfully navigate the skills and people components of data maturity, and much more.
14 November 2022 •
During Data Literacy Month, we shared how data journalists curate and distill data stories to the wider public. Since 2020, Data Journalism has risen both in significance and visibility. Throughout the COVID-19 pandemic, data journalists have been instrumental in keeping the public informed by investigating, challenging, interpreting, and explaining complex datasets. In this episode, Betsy Ladyzhets joins the show to talk about the state of Data Journalism today and to share from her experience as a data journalist. Betsy is an independent science, health, and data journalist focused on COVID-19 and Founder of the COVID-19 Data Dispatch, an independent publication providing updates and resources on public COVID-19 data. She is also currently working as a Senior Journalism Fellow with the Documenting COVID-19 project at the Brown Institute for Media Innovation and MuckRock. Her work has been featured in Science News, FiveThirtyEight, MIT Tech Review, and the Covid Tracking Project. Throughout the show, we discuss the importance of letting data shape a narrative, what characteristics of traditional journalism are needed for data journalists, the best practices for delivering effective data stories, how the rise of AI and data visualization is impacting data journalism, and much more. Links shared during the episode: Data Sonification, The COVID-19 Data Dispatch, The Data Visualization Society. Learning on DataCamp? Take part in this week’s XP-challenge: http://www.datacamp.com/promo/free-week-xp-challenge-2022
7 November 2022 •
Python has dominated data science programming for the last few years, but there’s another rising star programming language seeing increased adoption and popularity: Julia. With Julia ranked as the fourth most popular programming language, many data teams and practitioners are turning their attention toward understanding it and seeing how it could benefit individual careers, improve business operations, and drive increased value across organizations. Zacharias Voulgaris, PhD joins the show to talk about his experience with the Julia programming language and his perspective on the future of Julia’s widespread adoption. Zacharias is the author of Julia for Data Science. As a Data Science consultant and mentor with 10 years of international experience that includes the role of Chief Science Officer at three startups, Zacharias is an expert in data science, analytics, artificial intelligence, and information systems. In this episode, we discuss the strengths of Julia, how data scientists can get started using Julia, how team members and leaders alike can transition to Julia, why companies are secretive about adopting Julia, the interoperability of Julia with Python and other popular programming languages, and much more. Check out this month’s events: https://www.datacamp.com/data-driven-organizations-2022 Take the Introduction to Julia course for free! https://www.datacamp.com/courses/introduction-to-julia
31 October 2022 •
While securing the support of senior executives is a major hurdle in implementing a data transformation program, it’s often one of the earliest and easiest hurdles to overcome in comparison to the overall program itself. Leading a data transformation program requires thorough planning, organization-wide collaboration, careful execution, robust testing, and so much more. Vanessa Gonzalez is the Senior Director of Data and Analytics for ML & AI at Transamerica. An experienced senior data manager, Vanessa has worked in data transformation, leadership, and strategic direction for Data Science and Data Governance teams. Vanessa joins the show to share how she is helping to lead Transamerica’s Data Transformation program. In this episode, we discuss the biggest challenges Transamerica has faced throughout the process, the most important factors in making any large-scale transformation successful, how to collaborate with other departments, how Vanessa structures her team, the key skills data scientists need to be successful, and much more. Check out this month’s events: https://www.datacamp.com/data-driven-organizations-2022
24 October 2022 •
As data leaders continue to fill their talent gap, how should they approach sourcing, retaining, and upskilling their talent? What strategies should data leaders adopt in order to accomplish their talent goals and become data-driven? Kyle Winterbottom joins the show to talk about the key differentiators between data leaders who build talent-dense teams and those who do not. Kyle is the host of Driven by Data: The Podcast and the Founder & CEO of Orbition, a talent solutions provider for scaling Data, Analytics, & Artificial Intelligence teams across the UK, Europe, and the USA. As an accomplished expert and thought leader in talent acquisition, attraction, and retention, as well as scaling data teams, Kyle was named one of Data IQ’s 100 Most Influential People in Data for 2022. In this episode, we talk about how data teams can position themselves to attract top talent, how to properly articulate how data team members are adding value to the business, how organizations can accidentally set data leaders up to fail, how to approach upskilling, and how data leaders can create an employer branding narrative to attract top talent. Check out this month’s events: https://www.datacamp.com/data-driven-organizations-2022
17 October 2022 •
We have had many guests on the show to discuss how different industries leverage data science to transform the way they do business, but arguably one of the most important applications of data science is in space research and technology. Justin Fletcher joins the show to talk about how the US Space Force is using deep learning with telescope data to monitor satellites and potentially lethal space debris, and to identify and prevent catastrophic collisions. Justin is responsible for artificial intelligence and autonomy technology development within the Space Domain Awareness Delta of the United States Space Force Space Systems Command. With over a decade of experience spanning space domain awareness, high performance computing, and air combat effectiveness, Justin is a recognized leader in defense applications of artificial intelligence and autonomy. In this episode, we talk about how the US Space Force utilizes deep learning, how the US Space Force publishes its research and data to find high-quality peer review, the must-have skills aspiring practitioners need in order to pursue a career in Defense, and much more.
3 October 2022 •
Throughout Data Literacy Month, we’ve shined a light on the importance of data literacy skills and how they impact individuals and organizations. Equally important is how to actually approach transformational data literacy programs and ensure they are successful. In this final episode of Data Literacy Month, we are unpacking how CBRE is upskilling over 3,000 of its employees on data literacy skills through a relevant, high-value learning program. Emily Hayward is the Data and Digital Change Manager at CBRE, a global leader in commercial real estate services and investment. Emily is a transformational leader with a track record of leading successful high-profile technology, data, and cultural transformations across both the public and private sectors through an ardent belief that change cannot be achieved without first winning people over. Throughout the episode, we talk about Emily’s approach to building CBRE’s learning program, effective change management, why it’s critical to secure executive sponsorship, and much more. Looking to build a data literacy program of your own? Check out DataCamp for Business: https://bit.ly/3r7BgsF
26 September 2022 •
Understanding and interpreting data visualizations is one of the most important aspects of data literacy. When done well, data visualization ensures that stakeholders can quickly take away critical insights from data. Moreover, data visualization is often the best place to start when increasing organizational data literacy, as it’s often called the “gateway drug” to more advanced data skills. Andy Cotgreave, Senior Data Evangelist at Tableau Software and co-author of The Big Book of Dashboards, joins the show to break down data visualization and storytelling, drawing from his 15-year career in the data space. Andy has spoken at events like SXSW, Visualized, and Tableau’s conferences and has inspired thousands of people to develop their data skills. In this episode, we discuss why data visualization skills are so essential, how data visualization increases organizational data literacy, the best practices for visual storytelling, and much more. This episode of DataFramed is a part of DataCamp’s Data Literacy Month, where we raise awareness about Data Literacy throughout September through webinars, workshops, and resources featuring thought leaders and subject matter experts that can help you build your data literacy, as well as your organization’s. For more information, visit: https://www.datacamp.com/data-literacy-month/for-teams
19 September 2022 •
Data Literacy may be an important skill for everyone to have, but the level of need is always unique to each individual. Some may need advanced technical skills in machine learning algorithms, while others may just need to be able to understand the basics. Regardless of where anyone sits on the skills spectrum, the data community can help accelerate their careers. There’s no one who knows that better than Kate Strachnyi. Kate is the Founder and Community Manager at DATAcated, a company that is focused on bringing data professionals together and helping data companies reach their target audience through effective content strategies. Kate has created courses on data storytelling, dashboard and visualization best practices, and she is also the author of several books on data science, including a children’s book about data literacy. Through her professional accomplishments and her content efforts online, Kate has not only built a massive online following, she has also established herself as a leader in the data space. In this episode, we talk about best practices in data visualization, the importance of technical skills and soft skills for data professionals, how to build a personal brand and overcome Imposter Syndrome, how data literacy can make or break organizations, and much more. This episode of DataFramed is a part of DataCamp’s Data Literacy Month, where we raise awareness for Data Literacy throughout the month of September through webinars, workshops, and resources featuring thought leaders and subject matter experts that can help you build your data literacy, as well as your organization’s. For more information, visit: https://www.datacamp.com/data-literacy-month/for-teams
12 September 2022 •
Data Literacy is increasingly becoming a skill that every role needs to have, regardless of whether that role is data-oriented or not. No one knows this better than Jordan Morrow, who is known as the Godfather of Data Literacy. Jordan is the VP and Head of Data Analytics at Brainstorm, Inc., and is the author of Be Data Literate: The Skills Everyone Needs to Succeed. Jordan has been a fierce advocate for data literacy throughout his career, including helping the United Nations understand and utilize data literacy effectively. Throughout the episode, we define data literacy and discuss why organizations need data literacy in order to use data properly and drive business impact, how to increase organizational data literacy, and more. This episode of DataFramed is a part of DataCamp’s Data Literacy Month, where we raise awareness for Data Literacy throughout the month of September through webinars, workshops, and resources featuring thought leaders and subject matter experts that can help you build your data literacy, as well as your organization’s. For more information, visit: https://www.datacamp.com/data-literacy-month/for-teams
5 September 2022 •
Taking inspiration from International Literacy Day on September 8, DataCamp is dedicating the whole month of September to raising awareness about Data Literacy. Throughout the month, we are featuring thought leaders and subject matter experts to help you become data literate, and we can’t wait for you to hear the exceptional guests we have lined up for you right here on DataFramed. Check out the full lineup of events.
2 September 2022 •
Many times, data scientists can fall into the trap of resume-driven development: learning the shiniest, most advanced technique available to them in an attempt to solve a business problem. However, this is not what a learning mindset should look like for data teams. As it turns out, taking a step back and focusing on the fundamentals and step-by-step iteration can be the key to growing as a data scientist, because when data teams develop a strong understanding of the problems and solutions lying underneath the surface, they will be able to wield their tools with complete mastery. Ella Hilal joins the show to share why operating from an always-learning mindset will open up the path to true mastery and innovation for data teams. Ella is the VP of Data Science and Engineering for Commercial and Service Lines at Shopify, a global commerce leader that helps businesses of all sizes grow, market, and manage their retail operations. Recognized as a leading woman in Data Science, Internet of Things, and Machine Learning, Ella has over 15 years of experience spanning multiple countries, and is an advocate for responsible innovation, women in tech, and STEM. In this episode, we talk about the biggest mistakes data scientists make when solving business problems, how to create cohesion between data teams and the broader organization, how to be an effective data leader that prioritizes their team’s growth, and how developing an always-learning mindset based on iteration, experimentation, and deep understanding of the problems needing to be solved can accelerate the growth of data teams.
29 August 2022 •
Most companies experience the same pain point when working with data: it takes too long to get the right data to the right people. This creates a huge opportunity for data scientists to find innovative solutions to accelerate that process. One very effective method is to implement real-time data solutions that can increase business revenue and make it easier for anyone relying on the data to access the data they need, understand it, and make accurate decisions with it. George Trujillo joins the show to share how he believes real-time data has the potential to completely transform the way companies work with data. George is the Principal Data Strategist at DataStax, a tech company that helps businesses scale by mobilizing real-time data on a single, unified stack. With a career spanning 30 years and companies like Charles Schwab, Fidelity Investments, and Overstock.com, George is an expert in data-driven executive decision-making and tying data initiatives to tangible business value outcomes. In this episode, we talk about the real-world use cases of real-time analytics, why reducing data complexity is key to improving the customer experience, the common problems that slow data-driven decision-making, and how data practitioners can start implementing real-time data through small high-value analytical assets.
22 August 2022 •
Machine learning models are often thought to be mainly utilized by large tech companies that run large and powerful models to accomplish a wide array of tasks. However, machine learning models are finding an increasing presence in edge devices such as smart watches. ML engineers are learning how to compress models and fit them into smaller and smaller devices while retaining accuracy, effectiveness, and efficiency. The goal is to empower domain experts in any industry around the world to effectively use machine learning models without having to become experts in the field themselves. Daniel Situnayake is the Founding TinyML Engineer and Head of Machine Learning at Edge Impulse, a leading development platform for embedded machine learning used by over 3,000 enterprises across more than 85,000 ML projects globally. Dan has over 10 years of experience as a software engineer at companies like Google (where he worked on TensorFlow Lite) and Loopt, and he co-founded Tiny Farms, America’s first insect farming technology company. He wrote the book "TinyML" and the forthcoming "AI at the Edge". Daniel joins the show to talk about his work with EdgeML, the biggest challenges facing the field of embedded machine learning, the potential use cases of machine learning models in edge devices, and the best tips for aspiring machine learning engineers and data science practitioners to get started with embedded machine learning.
15 August 2022 •
Many machine learning practitioners dedicate most of their attention to creating and deploying models that solve business problems. However, what happens post-deployment? And how should data teams go about monitoring models in production? Hakim Elakhrass is the Co-Founder and CEO of NannyML, an open-source Python library that allows users to estimate post-deployment model performance, detect data drift, and link data drift alerts back to model performance changes. Originally, Hakim started a machine learning consultancy with his NannyML co-founders, and the need for monitoring quickly arose, leading to the development of NannyML. Hakim joins the show to discuss post-deployment data science, the real-world use cases for tools like NannyML, the potentially catastrophic effects of unmonitored models in production, the most important skills for modern data scientists to cultivate, and more.
8 August 2022 •
One of the biggest challenges facing the adoption of machine learning and AI in Data Science is understanding, interpreting, and explaining models and their outcomes to produce higher certainty, accountability, and fairness. Serg Masis is a Climate & Agronomic Data Scientist at Syngenta and the author of the book, Interpretable Machine Learning with Python. For the last two decades, Serg has been at the confluence of the internet, application development, and analytics. Serg is a true polymath. Before his current role, he co-founded a search engine startup incubated by Harvard Innovation Labs, was the proud owner of a Bubble Tea shop, and more. Throughout the episode, Serg spoke about the different challenges affecting model interpretability in machine learning, how bias can produce harmful outcomes in machine learning systems, the different types of technical and non-technical solutions to tackling bias, the future of machine learning interpretability, and much more.
1 August 2022 •
Anjali Samani, Director of Data Science & Data Intelligence at Salesforce, joins the show to discuss what it takes to become a mature data organization and how to build an impactful, diverse data team. As a data leader with over 15 years of experience, Anjali is an expert at assessing and deriving maximum value out of data and implementing long-term and short-term strategies that directly enable positive business outcomes, and she shares how you can do the same. You will learn the hallmarks of a mature data organization, how to measure ROI on data initiatives, how Salesforce implements its data science function, and how you can utilize strong relationships to develop trust with internal stakeholders and your data team.
25 July 2022 •
In 2020, OpenAI launched GPT-3, a large language AI model that is demonstrating the potential to radically change how we interact with software and open up a completely new paradigm for cognitive software applications. Today’s episode features Sandra Kublik and Shubham Saboo, authors of GPT-3: Building Innovative NLP Products Using Large Language Models. We discuss what makes GPT-3 unique, the transformative use cases it has ushered in, the technology powering GPT-3, its risks and limitations, whether scaling models is the path to “Artificial General Intelligence”, and more. [Announcement] For the next seven days, DataCamp Premium and DataCamp for Teams are free. Gain free access by going here.
18 July 2022 •
While leading a mature data science function is a challenge in its own right, building one from scratch at an organization can be just as, if not even more, difficult. As a data leader, you need to balance short-term goals with a long-term vision, translate technical expertise into business value, and develop strong communication skills and an internalized understanding of a business's values and goals in order to earn trust with key stakeholders and build the right team. Elettra Damaggio is no stranger to this process. Elettra is the Director for Global Data Science at StoneX, an institutional-grade financial services network that connects clients to the global markets ecosystem. Elettra has over 10 years of experience in machine learning, AI, and various roles within digital transformation and digital business growth. In this episode, she shares how data leaders can balance short-term wins with long-term goals, how to earn trust with stakeholders, major challenges when launching a data science function, and advice she has for new and aspiring data practitioners.
11 July 2022 •
In pharmaceuticals, wrong decisions can not only cost a company revenue, but they can also cost people their lives. With stakes so high, it’s vital that pharmaceutical companies have robust systems and processes in place to accurately gather, analyze, and interpret data and turn it into actionable steps toward solving health issues. Suman Giri is the Global Head of Data Science of the Human Health Division at Merck, a biopharmaceutical research company that works to develop innovative health solutions for both people and animals. Suman joins the show today to share how Merck is using data to improve organizational decision-making and medical research outcomes, and how data science is transforming the pharmaceutical industry at scale. He also shares some of the biggest challenges facing the industry right now and what new trends are on the horizon.
4 July 2022 •
Building data science functions has become table stakes for many organizations today. However, before data science functions were needed, the finance function acted as the insights layer for many organizations. This means that working in finance has become an effective entry point into the data science function for professionals across all spectrums. Brian Richardi is the Head of Finance Data Science and Analytics at Stryker, a medical equipment manufacturing company based in Michigan, US. Brian brings over 14 years of global experience to the table. At Stryker, Brian leads a team of data scientists that use business data and machine learning to make predictions for optimization and automation. In this episode, Brian talks about his experience as a data science leader transitioning from Finance, how he utilizes collaboration and effective communication to drive value, how he leads the finance data science function at Stryker, what the future of data science looks like in the finance space, and more.
27 June 2022 •
Democratizing data and developing a data culture in large enterprise organizations is an incredibly complex process that can seem overwhelming if you don’t know where to start. Today’s guest draws a clear path towards becoming data-driven. Meenal Iyer, Sr. Director for Data Science and Experimentation at Tailored Brands, Inc., has over 20 years of experience as a Data and Analytics strategist. She has built several data and analytics platforms and drives the enterprises she works with to be insights-driven. Meenal has also led data teams at various retail organizations, and has a wide variety of specialties in Data Science, including data literacy programs, data monetization, machine learning, enterprise data governance, and more. In this episode, Meenal shares her thorough, effective, and clear strategy for democratizing data successfully, explains how that helps create a successful data culture in large enterprises, and gives you the tools you need to do the same in your organization. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
20 June 2022 •
When many people talk about leading effective Data Science teams in large organizations, it’s easy for them to forget how much effort, intentionality, vision, and leadership are involved in the process. Glenn Hofmann, Chief Analytics Officer at New York Life Insurance, is no stranger to that work. With over 20 years of global leadership experience in data, analytics, and AI that spans the US, Germany, and South Africa, Glenn knows firsthand what it takes to build an effective data science function within a large organization. In this episode, we talk about how he built New York Life Insurance’s 50-person data science and AI function, how they utilize skillsets to offer different career paths for data scientists, how to build relationships across the organization, and so much more. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
13 June 2022 •
The healthcare industry presents a set of unique challenges for data science, including how to manage and work with sensitive patient information and accounting for the real-world impact of AI and machine learning on patient care and experience. Curren Katz, Senior Director for Data Science & Project Management at Johnson & Johnson, believes that despite challenges like these, there are massive opportunities for data science and machine learning to increase care quality, drive business objectives, diagnose diseases earlier, and ultimately save countless lives around the world. Curren has over 10 years of leadership experience across both the US and Europe and has led more than 20 successful data science product launches in the payer, provider, and pharmaceutical spaces. She also brings her background as a cognitive neuroscientist to data science, with research in neural networks, connectivity analysis, and more. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
6 June 2022 •
Today marks the last episode of our four-part DataFramed Careers Series on breaking into a data career. We’ve heard from Sadie St Lawrence, Nick Singh, and Khuyen Tran on best practices to adopt to help you land a data science interview. But what about the interview itself? Today’s guest, Jay Feng, joins the show to break down all the most important things you need to know about interviewing for data science roles. Jay is the co-founder of Interview Query, which helps data scientists, machine learning engineers, and other data professionals prepare for their dream jobs. Throughout the episode, we discuss: the anatomy of data science interviews; the biggest misconceptions and mistakes candidates make during interviews; the importance of showcasing communication ability, business acumen, and technical intuition in the interview; and how to negotiate for the best salary possible. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
2 June 2022 •
Today is the third episode of this four-part DataFramed Careers series being published every day this week on building a career in data. We’ve heard from Nick Singh on the importance of portfolio projects, as well as the distinction between content-based and coding-based portfolio projects. When looking to get started with content-based projects, how do you move forward with getting yourself out there and sharing the work despite being a relative beginner in the field? Today’s guest tackles exactly this subject. Khuyen Tran is a developer advocate at Prefect and a prolific data science writer. She is the author of the book “Efficient Python Tricks and Tools for Data Scientists” and has written hundreds of blog articles and tutorials on key data science topics, amassing thousands of followers across platforms. Her writing has been key to accelerating her data career opportunities. Throughout the episode, we discuss: how content creation accelerates the careers of aspiring practitioners; the content creation process; how to combat imposter syndrome; what makes content useful; and advice and feedback for aspiring data science writers. Resources mentioned in the episode: Analyze and Visualize URLs with Network Graph; Show Your Work by Austin Kleon; Mastery by Robert Greene; Deep Questions with Cal Newport Podcast. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
1 June 2022 •
Today marks the second episode in our DataFramed Careers Series. In this series, we will interview a diverse range of thought leaders and experts on the different aspects of landing a data role in 2022. In the first episode of the series, Sadie discussed at great length the importance of having a solid data science portfolio to land a role in data. But what makes a great data science portfolio? Nick Singh, co-author of Ace the Data Science Interview, joins the show to share everything you need to know to create high-quality, thorough portfolio projects. Throughout the episode, we discuss: how portfolio projects build experience; who should be focusing on portfolio projects; the different types of portfolio projects; the biggest pitfalls when creating portfolio projects; how to get noticed with your portfolio projects; and concrete examples of great portfolio projects. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
31 May 2022 •
Today is the start of a four-day careers series covering breaking into data science in 2022. With so much demand for data jobs today, we wanted to demystify the ins and outs of accelerating a career in data. In this series, we will interview a diverse range of thought leaders and experts on the different aspects of standing out from the crowd in the job hunt. Our first guest in the DataFramed Careers Series is Sadie St Lawrence, the Founder and CEO of Women in Data, the #1 Community for Women in AI and Tech. Women in Data is a community of over 20,000 individuals and has representation in 17 countries and 50 cities. She has trained over 350,000 people in data science and is the course developer for the Machine Learning Certification for UC Davis. In addition, she serves on multiple start-up boards and is the host of the Data Bytes podcast. Sadie joins the show to talk about her career journey in data science and shares the best lessons she has learned in launching data careers. Throughout the episode, we discuss: the different types of data career paths available; how to break into your data science career; how to build strong mentor/mentee relationships; best practices to stand out in a competitive industry; and building a strong resume and standing out from the crowd. [Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/
30 May 2022 •
Introducing the DataFramed Careers Series. Over the past year hosting the DataFramed podcast, we've had the incredible privilege of having biweekly conversations with data leaders at the forefront of the data revolution. This has led to fascinating conversations on the future of the modern data stack, the future of data skills, and how to build organizational data literacy. However, as the DataFramed podcast grows, we want to be able to provide the data science community, across the spectrum from practitioners to leaders, with distilled insights that will help them manoeuvre their careers effectively. And we want to do that more often. This is why we’re excited to announce the launch of a four-day DataFramed Careers Series. Throughout next week, we will interview four different thought leaders and experts about what it takes to break into data science in 2022, best practices to stand out from the crowd, building a brand in data science, and more. Moreover, this episode series will mark DataFramed’s transition from biweekly to weekly. Starting Monday the 30th of May, DataFramed will become a weekly podcast. For next week’s DataFramed Careers Series, we’ll be covering the ins and outs of building a career in data, and the different aspects of standing out from the crowd during the job hunt. We’ll be hearing from Sadie St Lawrence, CEO and Founder of Women in Data, on what it takes to launch a data career in 2022. Nick Singh, co-author of Ace the Data Science Interview and second-time guest of DataFramed, will join us to discuss what makes a great data science portfolio project. Khuyen Tran, Developer Advocate at Prefect, will outline how writing can accelerate a data career, and Jay Feng, CEO of Interview Query, will join us to provide tips and frameworks on acing the data science interview.
For future DataFramed episodes, we’ll definitely still cover the different aspects of building a data-driven organization, the latest advancements in data science, building data careers, and more. So expect more varied guests and topics, and more special series like this one in the future.
27 May 2022 •
Data literacy at any organization takes buy-in from all levels of the company, from C-suite leaders all the way to customer-facing team members. But how do you get that buy-in, build a team around data literacy, and transform the way your company works with data? Today’s guest, Megan Brown, Director of Data Literacy and Knowledge Management at Starbucks, discusses what they have done to forge data culture and data literacy at Starbucks. Throughout the episode, we discuss: how to increase data literacy in an organization; how to secure executive sponsorship for data initiatives; the importance of user experience research in building data literacy; balancing short-term business needs with long-term strategic upskilling; and humanizing machine learning and AI within the organization.
16 May 2022 •
Diversity in both skillset and experience is at the core of high-impact data teams, but how can you take your data team’s impact to the next level with subject matter expertise, attention to user experience, and mentorship? Today’s guest, Dan Kellet, Chief Data Officer at Capital One UK, joins us to discuss how he scaled Capital One’s data team. Throughout the episode, we discuss: the hallmarks of a high-impact data team; the importance of skills and background diversity when building great data teams; the importance of UX skills when developing data products; and the specific challenges of leading data teams in financial services.
2 May 2022 •
As data volumes grow and become ever more complex, the role of the data analyst has never been more important. At the disposal of the modern data analyst are tools that reduce time to insight and increase collaboration. However, as the tools of a data analyst evolve, so do the skills. Today’s guest, Peter Fishman, Co-Founder at Mozart Data, speaks to this exact notion. Join us as we discuss: defining a data-driven organization and its main challenges; breaking down the modern data stack and what it means; what makes a great data analyst; and how data analysts can develop deep subject matter expertise in the areas they serve. Find every episode of DataFramed on Apple, Spotify, and more. Find us on our website and join the conversation on LinkedIn. Listening on a desktop and can’t see the links? Just search for DataFramed in your favorite podcast player.
17 April 2022 •
When you hear the term digital-first, you might think about tech, platforms, and data. But digital transformation succeeds when you put people first. Gathering and analyzing data, then using it to provide customer value and an unparalleled experience, is vital for an organization’s success. Today’s guest, Bhavin Patel, Director of Analytics and Innovation at J&J, joins the show to share why people are the most important component of digital transformation. Join us as we discuss: why you need to put people first; the importance of customer value and experience; and why digital transformation is an ongoing process, not an end-state. Find every episode of DataFramed on Apple, Spotify, and more. Find us on our website and join the conversation on LinkedIn. Listening on a desktop and can’t see the links? Just search for DataFramed in your favorite podcast player.
3 April 2022 •
The data journey is a slow painstaking process. But knowing where to start and the areas to focus on can help any organization reach its goals faster. Today’s guest, Vijay Yadav, Director of Quantitative Sciences & Head of Data Science at the Center for Mathematical Sciences at Merck, explains the 6 key elements of data strategy, complete with advice on how to navigate each. Join us as we discuss: The different components of a data strategy Shifting mindset within the C-Suite Structuring the operating model Enabling people to work with data at scale Most effective tactics to kickstart a community around data science Find every episode of DataFramed on Apple, Spotify, and more. Find us on our website and join the conversation on LinkedIn. Listening on a desktop and can’t see the links? Just search for DataFramed in your favorite podcast player.
21 March 2022 •
It’s no secret that data science jobs are on the rise; but data skills across the board are rising — leading to what today’s guest calls “hybrid jobs.” This will require a paradigm shift in how we think about jobs and skills. Today’s guest, Matt Sigelman, President of The Burning Glass Institute & Chairman of Emsi Burning Glass, talks about the difficulties of connecting companies with top talent, the hybridization of many positions, and how to position yourself in the ever-changing market. Join us as we discuss: The methodology of using data science on the labor market The demand for data skills & how they’re evolving Blending skills to get ahead in the job market & the rise of subskills How educational institutions can prepare students for hybridization Advice to the audience on how to structure their approach to skill acquisition Find every episode of DataFramed on Apple, Spotify, and more. Find us on our website and join the conversation on LinkedIn. Listening on a desktop and can’t see the links? Just search for DataFramed in your favorite podcast player.
7 March 2022 •
Throughout the Middle East, efforts are underway to build smart cities from the ground up. But to create a modern, intelligently designed city, you first need to lay a solid foundation. And the strongest foundation you can build a smart city upon is data. In today’s episode, we speak with Kaveh Vessali, Digital, Data & AI Leader, PwC Middle East, about the intersection between data and public policy and the many exciting insights he’s gained from his role delivering smart cities and data transformation projects within the public sector in the Middle East. Join us as we discuss: the important role data plays in shaping public policy; what goes into designing a smart city; the change management skills vital for successful digital transformation; and data ethics and the importance of transparency. Find every episode of DataFramed on Apple, Spotify, and more. Find us on our website and join the conversation on LinkedIn. Listening on a desktop and can’t see the links? Just search for DataFramed in your favorite podcast player.
21 February 2022 •
When most people hear digital transformation, it’s almost always the technology that first springs to mind. That’s a mistake. You can have the most sophisticated tech stack in the world, but if you don't build your organization’s data culture, your digital transformation efforts will be for naught. Today’s guest, Mai AlOwaish, Chief Data Officer at Gulf Bank, knows this better than anyone. As the first female CDO in Kuwait, she’s on a mission to ensure everyone at Gulf Bank becomes an expert in the data they use every day. Join us as we discuss: Why data and people are more important than technology for digital transformation The pioneering Data Ambassador program Mai spearheaded at Gulf Bank The importance of diversity in data science and technology overall Find every episode of DataFramed on Apple, Spotify, and more. Find us on our website and join the conversation on LinkedIn. Listening on a desktop and can’t see the links? Just search for DataFramed in your favorite podcast player.
7 February 2022 •
As we enter the new year, it seems like we’re telescoping into the future of work. Companies embracing remote work and the Great Resignation putting pressure on teams to create more fulfilling roles signal an expanding opportunity for applicants to find their dream roles in data science, and for hiring managers to create awesome candidate experiences. Today’s guests, Nick Singh and Kevin Huo, authors of Ace the Data Science Interview, discuss how aspiring and practicing data scientists can stand out from the crowd, and what hiring managers need to change to win over talent today. Join us as we discuss: how to wow recruiters and hiring managers with your resume; the type of skills aspiring data scientists need to show on the job hunt; the value of direct email over job listings; and what recruiters and hiring managers need to change in an evolving job market. Relevant links from the interview: Ace the Data Science Interview Follow Nick Singh on LinkedIn Follow Kevin Huo on LinkedIn Noah Gift’s Appearance on DataFramed Sign up to gain early access to DataCamp Talent, DataCamp’s portal for data science jobs
24 January 2022 •
In this episode of DataFramed, we speak with Vishnu V Ram, VP of Data Science and Engineering at Credit Karma about how data science is being leveraged to increase financial inclusion. Throughout the episode, Vishnu discusses his background, Credit Karma’s mission, how data science is being used at Credit Karma to lower the barrier to entry for financial products, how he managed a data team through rapid growth, transitioning to Google Cloud, exciting trends in data science, and more. Relevant links from the interview: You can now learn data science with your team for free—try out DataCamp Professional with our 14-day free trial. Data roles at Credit Karma Credit Karma’s mission
29 November 2021 •
In this episode of DataFramed, we speak with Andy Cotgreave, Technical Evangelist at Tableau about the role of data storytelling when driving change with analytics, and the importance of the analyst role within a data-driven organization. Throughout the episode, Andy discusses his background, the skills every analyst should know to equip organizations with better data-driven decision making, his best practices for data storytelling, how he thinks about data literacy and ways to spread it within the organization, the importance of community when creating a data-driven organization, and more. Relevant links from the interview: We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey Check out our upcoming webinar with Andy Check out Andy's book Become a Tableau expert
15 November 2021 •
In this episode of DataFramed, we speak with Brian Campbell, Engineering Manager at Lucid Software about managing data science projects effectively and harnessing the power of collaboration. Throughout the episode, Brian discusses his background, how data leaders can become better collaborators, data science project management best practices, the type of collaborators data teams should seek out, the latest innovations in the data engineering tooling space, and more. Relevant links from the interview: We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey Lucid’s Tech Blog
1 November 2021 •
In this episode of DataFramed, we speak with Shameek Kundu, former group CDO at Standard Chartered Bank, and Chief Strategy Officer & Head of Financial Services at TruEra Inc about Scaling AI Adoption throughout financial services. Throughout the episode, Shameek discusses his background, the state of data transformation in financial services, the depth vs breadth of machine learning operationalization in financial services today, the challenges standing in the way of scalable AI adoption in the industry, the importance of data literacy, the trust and responsibility challenge of AI, the future of data science in financial services, and more. Relevant links from the interview: We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey Check out TruEra in action Bank of England Report: The impact of Covid on machine learning and data science in UK Banking MIT Tech Review — Hundreds of AI tools have been built to catch covid. None of them helped
18 October 2021 •
In this episode of DataFramed, we speak with Syafri Bahar, VP of Data Science at Gojek about building high-performing data teams, and how data science is central to Gojek’s success. Throughout the episode, Syafri discusses his background, the hallmarks of a high-performance data team, how he measures the ROI on data activities, the skills needed in every successful data team, what is the best organizational model for data mature organizations, how Covid-19 affected Gojek’s data teams, his thoughts on data literacy and governance, future trends in data science and AI, and why data scientists should sharpen their maths and machine learning skills in an age of increasing automation. Relevant links from the interview: We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey Gojek’s Data Blog
4 October 2021 •
In this episode of DataFramed, we speak with Noah Gift, founder of Pragmatic AI Labs and prolific author, about operationalizing machine learning in organizations and his new book Practical MLOps. Throughout the episode, Noah discusses his background, his philosophy around pragmatic AI, the differences between data science in academia and the real world, how data scientists can become more action-oriented by creating solutions that solve real-world problems, the importance of DevOps, his most recent book, a practical guide to MLOps, how data science can be compared to Brazilian jiu-jitsu, what data scientists should learn to scale the amount of value they deliver, his thoughts on AutoML and automation, and more. Relevant links from the interview: We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey Unsettled: What Climate Science Tells Us, What It Doesn't, and Why It Matters Check out Noah's books Check out Noah's course on DataCamp Connect with Noah on LinkedIn Gain access to DataCamp's full course library at a discount!
20 September 2021 •
In this episode of DataFramed, we speak with Rick Scavetta and Boyan Angelov about their new book, Python and R for the Modern Data Scientist: The Best of Both Worlds, and how it marks the dawn of a new bilingual data science community. Throughout the episode, Rick and Boyan discuss the history of Python and R, what led them to write the book, how Python and R can be interoperable, the advantages of each language and where to use it, how beginner data scientists should think about learning programming languages, how experienced data scientists can take it to the next level by learning a language they’re not necessarily comfortable with, and more. Relevant links from the interview: We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey Check out Rick and Boyan’s book Check out Rick’s courses on DataCamp Check out Boyan's other books Connect with Rick on LinkedIn Connect with Boyan on LinkedIn
6 September 2021 •
In this episode of DataFramed, we speak with Brent Dykes, Senior Director of Insights & Data Storytelling at Blast Analytics and author of Effective Data Storytelling: How to Turn Insights into Action on how data storytelling is shaping the analytics space. Throughout the episode, Brent talks about his background, what made him write a book on effective data storytelling, how data storytelling is often misinterpreted and misused, the psychology of storytelling and how humans are shaped to resonate with it, the role of empathy when creating data stories, the blueprint of a successful data story, what data scientists can do to become better data storytellers, the future of augmented analytics and data storytelling, and more. Relevant links from the interview: Connect with Brent on LinkedIn Register for Brent's Webinar on DataCamp Check out Brent's Book
23 August 2021 •
In this episode of DataFramed, Adel speaks with Maria Luciana Axente, Responsible AI and AI for Good Lead at PwC UK, on the state and future of responsible AI. Throughout the episode, Maria talks about her background, the differences & intersections between "AI ethics" and "Responsible AI", the state of responsible AI adoption within organizations, the link between responsible AI and organizational culture, what data scientists can do today to ensure they're part of their organization's responsible AI journey, and more. Relevant links from the interview: Connect with Maria on LinkedIn Kate Crawford's Atlas of AI 9 Ethical AI Principles for Organizations to Follow PwC's Responsible AI Toolkit Read our Data Literacy for Responsible AI White Paper
9 August 2021 •
In this episode of DataFramed, Adel speaks with Alessya Visnjic, CEO and co-founder of WhyLabs, an AI Observability company on a mission to build the interface between AI and human operators. Throughout the episode, Alessya talks about the unique challenges data teams face when operationalizing machine learning that spurred the need for MLOps, how MLOps intersects with and diverges from related terms such as DataOps, ModelOps, and AIOps, how and when organizations should get started on their MLOps journey, the most important components of a successful MLOps practice, and more. Relevant links from the interview: Connect with Alessya on LinkedIn Andrew Ng on the importance of being data-centric Joe Reis on the data culture and all things data whylogs: the standard for data logging. Please send your feedback, contribute, help us build integrations into your favorite data tools and extend the concept of logging to new data types. Join the effort of building a new open standard for data logging! Try the WhyLabs platform
26 July 2021 •
In this episode of DataFramed, Adel speaks with Sudaman Thoppan Mohanchandralal, Regional Chief Data and Analytics Officer at Allianz Benelux, on the importance of building data cultures and his experiences operationalizing data culture transformation programs. Throughout the episode, Sudaman talks about his background, the Chief Data Officer’s mandate and how it has evolved over the years, how organizations should prioritize building data cultures, the science behind culture change, the importance of executive data literacy when scaling value from data, and more. Relevant links from the interview: Connect with Sudaman on LinkedIn Check out Sudaman’s Webinar on DataCamp Why Data Culture Matters
12 July 2021 •
In this episode of DataFramed, Adel speaks with Elad Cohen, VP of Data Science and Research at Riskified, on how data science is being used to combat fraud in eCommerce. Throughout the episode, Elad talks about his background, the plethora of data science use-cases in eCommerce, how Riskified builds state-of-the-art fraud detection models, common pitfalls data teams face, his best practices for gaining organizational buy-in for data projects, how data scientists should focus on value, whether they should have engineering skills, and more. Relevant links from the interview: Connect with Elad on LinkedIn Register for our upcoming webinars How Riskified chooses what to research
28 June 2021 •
In this episode of DataFramed, Adel speaks with Barr Moses, CEO and co-founder of Monte Carlo, on the importance of data quality and how data observability creates trust in data throughout the organization. Throughout the episode, Barr talks about her background, the state of data-driven organizations and what it means to be data-driven, the data maturity of organizations, the importance of data quality, what data observability is, and why we’ll hear about it more often in the future. She also covers the state of data infrastructure, data meshes, and more. Relevant links from the interview: Connect with Barr on LinkedIn Learn more about data meshes Check out the Monte Carlo blog DataCamp's Guide to Organizational Data Maturity
14 June 2021 •
In this episode of DataFramed, Adel speaks with Sergey Fogelson, Vice President of Data Science and Modeling at Viacom on how data science has evolved over the past decade, and the remaining large-scale challenges facing data teams today. Throughout the episode, Sergey deep-dives into his background, the various projects he’s been involved with throughout his career, the most exciting advances he’s seen in the data science space, the largest challenges facing data teams today, best practices democratizing data, the importance of learning SQL, and more. Relevant links from the interview: Connect with Sergey on LinkedIn Check out Sergey’s course on DataCamp Learn more about Airflow Learn more about PySpark Learn more about SQL More resources from DataCamp Upskill your team with DataCamp Our Guide on Open Source Software in Data Science Your Organization’s Guide to Data Maturity
31 May 2021 •
In this episode of DataFramed, Adel speaks with Dan Becker, CEO of decision.ai and founder of Kaggle Learn, on the intersection of decision sciences and AI, and best practices when aligning machine learning to business value. Throughout the episode, Dan deep-dives into his background, how he reached the top of a Kaggle competition, the difference between machine learning in a Kaggle competition and the real world, the role of empathy when aligning machine learning to business value, the importance of decision sciences when maximizing the value of machine learning in production, and more. Links: Follow Dan on Twitter Follow Dan on LinkedIn What 70% of data science learners do wrong Check out Dan’s course on DataCamp decision.ai Dan’s climate dashboard
17 May 2021 •
In this episode of DataFramed, Adel speaks with Amen Ra Mashariki, principal scientist at Nvidia and the former Chief Analytics Officer of the City of New York on how data science is done in government agencies, and how it's driving smarter cities all around us. Throughout the episode, Amen deep-dives into the use-cases he worked on to make the city of New York smarter, how data science allows cities to become more reactive and proactive, the unique challenges of scaling data science in a government setting, the friction between providing value and data privacy and ethics, the state of data literacy in government, and more. Links from the interview: Follow Amen on LinkedIn Follow Amen on Twitter The New York City Business Atlas Hurricane Sandy FEMA After-Action Report Data Drills
3 May 2021 •
We are super excited to be relaunching the DataFramed podcast. In this iteration of DataFramed, Adel Nehme, a data science educator at DataCamp, will uncover the latest thinking on all things data and how it’s impacting organizations through biweekly (once every two weeks) interviews and conversations with data experts from across the world. Check out this snippet for a preview of what’s to come and for a short chat with DataCamp’s CEO Jonathan Cornelissen on where he thinks data science is headed and the major challenges facing data teams today. Links: For the rest of April, get free access to DataCamp. Get involved with DataCamp Donates
26 April 2021 •
Before the COVID-19 crisis, we were already acutely aware of the need for a broader conversation around data privacy: look no further than the Snowden revelations, Cambridge Analytica, the New York Times Privacy Project, the General Data Protection Regulation (GDPR) in Europe, and the California Consumer Privacy Act (CCPA). In the age of COVID-19, these issues are far more acute. We also know that governments and businesses exploit crises to consolidate and rearrange power, claiming that citizens need to give up privacy for the sake of security. But is this tradeoff a false dichotomy? And what type of tools are being developed to help us through this crisis? In this episode, Katharine Jarmul, Head of Product at Cape Privacy, a company building systems to leverage secure, privacy-preserving machine learning and collaborative data science, will discuss all this and more, in conversation with Dr. Hugo Bowne-Anderson, data scientist and educator at DataCamp. Links from the show FROM THE INTERVIEW Katharine on Twitter Katharine on LinkedIn Contact Tracing in the Real World (By Ross Anderson) The Price of the Coronavirus Pandemic (By Nick Paumgarten) Do We Need to Give Up Privacy to Fight the Coronavirus? (By Julia Angwin) Introducing the Principles of Equitable Disaster Response (By Greg Bloom) Cybersecurity During COVID-19 (By Bruce Schneier)
14 May 2020 •
This week, Hugo speaks with Sean Law about data science research and development at TD Ameritrade. Sean’s work on the Exploration team uses cutting-edge theories and tools to build proofs of concept. At TD Ameritrade, they think about a wide array of questions, including conversational agents that go beyond chatbots to help customers quickly get to the information they need. They use modern time series analysis and more advanced techniques like recurrent neural networks to predict the next time a customer might call and what they might be calling about, as well as helping investors leverage alternative data sets and make more informed decisions. What does this proof of concept work on the edge of data science look like at TD Ameritrade and how does it differ from building prototypes and products? And how does exploration differ from production? Stick around to find out. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Sean on Twitter Sean's Website TD Ameritrade Careers Page PyData Ann Arbor Meetup PyData Ann Arbor YouTube Channel (Videos) TDA Github Account (Time Series Pattern Matching repo to be open sourced in the coming months) Aura Shows Human Fingerprint on Global Air Quality FROM THE SEGMENTS Guidelines for A/B Testing (with Emily Robinson ~19:20) Guidelines for A/B Testing (By Emily Robinson) 10 Guidelines for A/B Testing Slides (By Emily Robinson) Data Science Best Practices (with Ben Skrainka ~34:50) Debugging (By David J. Agans) Basic Debugging With GDB (By Ben Skrainka) Sneaky Bugs and How to Find Them (with git bisect) (By Wiktor Czajkowski) Good logging practice in Python (By Victor Lin) Original music and sounds by The Sticks.
1 April 2019 •
This week, Hugo speaks with Debbie Berebichez about the importance of critical thinking in data science. Debbie is a physicist, TV host and data scientist and is currently the Chief Data Scientist at Metis in NY. In a world and a professional space plagued by buzz terms like AI, big data, deep learning, and neural networks, conversations around skill sets and less than productive programming language wars, what has happened to critical thinking in data science and data thinking in general? What type of critical thinking skills are even necessary as data science, AI and machine learning become even more present in all of our lives and how spread out do they need to be across organizations and society? Listen to find out! LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Debbie on Twitter Debbie's Website Debbie Berebichez- Media Reel (Video) Deborah Berebichez' Keynote at Grace Hopper Celebration 2017 (Video) Debbie Berebichez on Perseverance and Paying it Forward (Video) Things about the Future and the Future of Things (By Debbie Berebichez, Video) FROM THE SEGMENTS Data Science tools for getting stuff done and giving it to the world (with Jared Lander ~21:55) Lander Analytics Website Docker Website plumber Website Statistical Distributions and their Stories (with Justin Bois ~39:30) Probability distributions and their stories (By Justin Bois) The History of Statistics (By Stephen M. Stigler) The Evolution of the Normal Distribution (By Saul Stahl) Original music and sounds by The Sticks.
25 March 2019 •
This week, Hugo will be speaking with Skipper Seabold about the current and looming credibility crisis in data science. Skipper is Director of Data Science at Civis Analytics, a data science technology and solutions company, and also the creator of the statsmodels package for statistical modeling and computing in Python. Skipper is also a data scientist with a beard bigger than Hugo's. They’re going to be talking about how data science is facing a credibility crisis that is manifesting itself in different ways in different industries, how and why expectations aren’t met and many stakeholders are disillusioned. You’ll see that if the crisis isn’t prevented, the data science labor market may cease to be a seller’s market and we’ll have big missed opportunities. But this isn’t an episode of Black Mirror so they’ll also discuss how to avoid the crisis, taking detours through the role of randomized controlled trials in data science, the rise of methods borrowed from econometrics and how to set realistic expectations around what data science can and can’t do. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Skipper on Twitter Skipper on Github What's the Science in Data Science? (Video by Skipper Seabold) The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics (By Joshua D. Angrist & Jörn-Steffen Pischke, American Economic Association) Project Management for the Unofficial Project Manager: A FranklinCovey Title (By Kory Kogon) Courtyard by Marriott Designing a Hotel Facility with Consumer-Based Marketing Models (Jerry Wind et al., The Institute of Management Sciences) Statsmodels's Documentation FROM THE SEGMENTS Guidelines for A/B Testing (with Emily Robinson ~15:48 & ~35:20) Guidelines for A/B Testing (By Emily Robinson) 10 Guidelines for A/B Testing Slides (By Emily Robinson) Original music and sounds by The Sticks.
18 March 2019 •
This week, Hugo speaks with Noemi Derzsy, a Senior Inventive Scientist at AT&T Labs within the Data Science and AI Research organization, where she does lots of science with lots of data. They’ll be talking about her work at AT&T Labs Research, the mission of which is to look beyond today’s technology solutions to invent disruptive technologies that meet future needs. AT&T Labs works on a multitude of projects, from product development at AT&T, to combating bias and fairness issues in targeted advertising, to creating drones for cell tower inspection research that leverages AI, ML and video analytics. They’ll be talking about some of the work Noemi does, from characterizing human mobility from cellular network data, to characterizing their mobile network to analyze how its topology compares to other reported real social networks, to understanding TV viewership and how engaged people are in different shows. They’ll discuss what the future of data science looks like, whether it will even be around in 2029 and what types of skills would help you land a job in a place like AT&T Labs. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Noemi on Twitter Noemi's Website Human Mobility Characterization from Cellular Network Data (By Richard Becker et al., Communications of the ACM) AT&T Labs Research Website NASA Datanauts Website Open NASA Website FROM THE SEGMENTS Guidelines for A/B Testing (with Emily Robinson ~18:23 & ~36:38) Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health (By Peter C. Austin et al., Journal of Clinical Epidemiology) From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks (By Ya Xu et al., LinkedIn Corp) Guidelines for A/B Testing (By Emily Robinson) 10 Guidelines for A/B Testing Slides (By Emily Robinson) Original music and sounds by The Sticks.
11 March 2019 •
This week, Hugo speaks with Chris Albon about getting your first data science job. Chris is a Data Scientist at Devoted Health, where he uses data science and machine learning to help fix America's healthcare system. Chris is also doing a lot of hiring at Devoted and that’s why he’s so excited today to talk about how to get your first data science job. You may know Chris as co-host of the podcast Partially Derivative, from his educational resources such as his blog and machine learning flashcards or as one of the funniest data scientists on Twitter. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Chris on Twitter Chris's Website Devoted Website Machine Learning Flashcards (By Chris Albon) Machine Learning with Python Cookbook (By Chris Albon) FROM THE SEGMENTS Guidelines for A/B Testing (with Emily Robinson ~26:50) Guidelines for A/B Testing (By Emily Robinson) 10 Guidelines for A/B Testing Slides (By Emily Robinson) Original music and sounds by The Sticks.
4 March 2019 •
This week, Hugo speaks with Reshama Shaikh about women in machine learning and data science, inclusivity and diversity more generally and how being intentional in what you do is essential. Reshama, a freelance data scientist and statistician, is also an organizer of the meetup groups Women in Machine Learning & Data Science (otherwise known as WiMLDS) and PyLadies. She has organized WiMLDS for 4 years and is a Board Member. They’ll discuss her work at WiMLDS and what you can do to support and promote women and gender minorities in data science. They’ll also delve into why women are flourishing in the R community but lagging in Python and discuss more generally how NumFOCUS thinks about diversity and inclusion, including their code of conduct. All this and more. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Reshama’s Blog Reshama on Twitter List of Relevant Conferences (and Code of Conduct info) NYC PyLadies meetup Code of Conduct for NeurIPS and Other Stem Organizations NumFOCUS Diversity & Inclusion in Scientific Computing (DISC) NumFOCUS DISCOVER Cookbook (for inclusive events) fastai deep learning notes WiMLDS (Women in Machine Learning and Data Science) NYC WiMLDS meetup To start a WiMLDS chapter: email firstname.lastname@example.org and more info at our starter kit. WiMLDS Website Global List of WiMLDS Meetup Chapters WiMLDS Paris: They run their meetups in English, so knowledge of French is not required. FROM THE SEGMENTS DataCamp User Stories (with David Sudolsky ~17:27 & ~31:50) Boldr Website Original music and sounds by The Sticks.
25 February 2019 •
This week, Hugo speaks with Marco Blume, Trading Director at Pinnacle Sports. Marco and Hugo will talk about the role of data science in large-scale bets and bookmaking, how Marco is training an army of data scientists and much more. At Pinnacle, Marco uses tight risk-management built on cutting-edge models to provide bets not only on sports but on questions such as who will be the next pope, who will be the world hot dog eating champion, who will land on Mars first and who will be on the Iron Throne at the end of Game of Thrones. They’ll discuss the relations between risk management and uncertainty, how great forecasters are necessarily good at updating their predictions in the light of new data and evidence, how you can model this using Bayesian inference and the future of biometric sensing in sports betting. And, as always, much, much more. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Pinnacle Website Training an army of new data scientists (Presentation by Marco Blume) FROM THE SEGMENTS Data Science Best Practices (with Ben Skrainka ~16:40) Python Debugging With Pdb (By Nathan Jennings) pdb Tutorial (Github) The Visual Python Debugger for Jupyter Notebooks You’ve Always Wanted (By David Taieb) Debugging with RStudio (By Jonathan McPherson) Basics of Debugging Statistical Distributions and their Stories (with Justin Bois at ~36:00) Justin's Website at Caltech Probability distributions and their stories (By Justin Bois) Original music and sounds by The Sticks.
18 February 2019 •
This week on DataFramed, the DataCamp podcast, Hugo speaks with Gabriel Straub, the Head of Data Science and Architecture at the BBC, where his role is to help make the organization more data informed and to make it easier for product teams to build data and machine learning powered products. They’ll be talking about data science and machine learning at the BBC and how they can impact content discoverability, understanding content, putting the right stuff in front of people, how Gabriel and his team develop broader data science & machine learning architecture to make sure best practices are adopted and what it means to apply machine learning in a sensible way. How does the BBC think about incorporating data science into its business, which has been around since 1922 and historically been at the forefront of technological innovation such as in radio and television? Listen to find out! LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Gabriel Straub: It's bigger on the inside (Video) BBC datalab FROM THE SEGMENTS DataCamp User Stories (with Krittika Patil ~16:10 & ~38:12) Kespry (Drone Aerial Intelligence for Industry) Original music and sounds by The Sticks.
11 February 2019 •
This week Hugo speaks with Dr. Brandeis Marshall, about people of color and under-represented groups in data science. They’ll talk about the biggest barriers to entry for people of color, initiatives that currently exist and what we as a community can do to be as diverse and inclusive as possible. Brandeis is an Associate Professor of Computer Science at Spelman College. Her interdisciplinary research lies in the areas of information retrieval, data science, and social media. Other research includes the BlackTwitter Project, which blends data analytics, social impact and race as a lens to understanding cultural sentiments. Brandeis is involved in a number of projects, workshops, and organizations that support data literacy and understanding, share best data practices and broaden participation in data science. LINKS FROM THE SHOW DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on DataFramed?) FROM THE INTERVIEW Brandeis on Twitter The BlackTwitter Project The Impact of Live Tweeting on Social Movements (By Brandeis Marshall, Takeria Blunt, Tayloir Thompson) EvergreenLP: Using a social network as a learning platform (By Brandeis Marshall, Jaye Nias, Tayloir Thompson, Takeria Blunt) Journal of Computing Sciences in Colleges (By Brandeis Marshall) DSX (Data Science eXtension Faculty development and undergraduate instruction in data science) African American Women Computer Science PhDs 500 Women Scientists Black in AI Women in Machine Learning FROM THE SEGMENTS What Data Scientists Really Do (with Hugo Bowne-Anderson & Emily Robinson ~21:30 & ~41:40) What Data Scientists Really Do, According to 35 Data Scientists (Harvard Business Review article by Hugo Bowne-Anderson) What Data Scientists Really Do, According to 50 Data Scientists (Slides from a talk by Hugo Bowne-Anderson) Original music and sounds by The Sticks.
4 February 2019 •
In episode 50, our Season 1, 2018 finale of DataFramed, the DataCamp podcast, Hugo speaks with Cathy O’Neil, data scientist, investigative journalist, consultant, algorithmic auditor and author of the critically acclaimed book Weapons of Math Destruction. Cathy and Hugo discuss the ingredients that make up weapons of math destruction, which are algorithms and models that are important in society, secret and harmful, from models that decide whether you keep your job, a credit card or insurance to algorithms that decide how we’re policed, sentenced to prison or given parole. Cathy and Hugo discuss the current lack of fairness in artificial intelligence, how societal biases are perpetuated by algorithms and how both transparency and auditability of algorithms will be necessary for a fairer future. What does this mean in practice? Tune in to find out. As Cathy says, “Fairness is a statistical concept. It's a notion that we need to understand at an aggregate level.” And, moreover, “data science doesn't just predict the future. It causes the future.” LINKS FROM THE SHOW DATAFRAMED SURVEY DataFramed Survey (take it so that we can make an even better podcast for you) DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on Season 2?) FROM THE INTERVIEW Cathy on Twitter Cathy's Blog Mathbabe Weapons of Math Destruction: How big data increases inequality and threatens democracy by Cathy O'Neil Cathy's Opinion Column, Bloomberg Doing Data Science (By Cathy O'Neil and Rachel Schutt) Cathy O'Neil & Hanna Gunn's "Ethical Matrix" paper coming soon. FROM THE SEGMENTS Data Science Best Practices (with Heather Nolis ~20:30) Using docker to deploy an R plumber API (By Jonathan Nolis and Heather Nolis) Enterprise Web Services with Neural Networks Using R and TensorFlow (By Jonathan Nolis and Heather Nolis) Data Science Best Practices (with Ben Skrainka ~39:35) The Clean Coder Blog (By Robert C. Martin) James Shore’s blog post on Red, Green, Refactor Jeff Knupp’s Python Unittesting tutorial (general unit tests in Python) John Myles White’s Intro to Unit Testing in R Original music and sounds by The Sticks.
26 November 2018 •
Hugo speaks with Wes McKinney, creator of the pandas project for data analysis tools in Python and author of Python for Data Analysis, among many other things. Wes and Hugo talk about data science tool building, what it took to get pandas off the ground and how he approaches building “human interfaces to data” to make individuals more productive. On top of this, they’ll talk about the future of data science tooling, including the Apache Arrow project and how it can facilitate this future, the importance of DataFrames that are portable between programming languages and building tools that facilitate data analysis work in the big data limit. Pandas initially arose from Wes noticing that people were nowhere near as productive as they could be due to lack of tooling, and the projects he’s working on today, which they’ll discuss, arise from the same place and present a bold vision for the future. LINKS FROM THE SHOW DATAFRAMED SURVEY DataFramed Survey (take it so that we can make an even better podcast for you) DATAFRAMED GUEST SUGGESTIONS DataFramed Guest Suggestions (who do you want to hear on Season 2?) FROM THE INTERVIEW Wes on Twitter Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure by Nadia Eghbal pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Ursa Labs FROM THE SEGMENTS Data Science Best Practices (with Ben Skrainka ~17:10) To Explain or To Predict? (By Galit Shmueli) Statistical Modeling: The Two Cultures (By Leo Breiman) The Book of Why (By Judea Pearl & Dana Mackenzie) Studies in Interpretability (with Peadar Coyle at ~39:00) Modelling Loss Curves in Insurance with RStan (By Mick Cooney) Lime: Explaining the predictions of any machine learning classifier Probabilistic Programming Primer Original music and sounds by The Sticks.
19 November 2018 •
In this episode of DataFramed, the DataCamp podcast, Hugo speaks with Angela Bassa about managing data science teams. Angela is Director of Data Science at iRobot, where she leads the team through development of machine learning algorithms, sentiment analysis, and anomaly detection processes. iRobot are the makers of consumer robots that we all know and love, like the Roomba and the Braava, which are, respectively, a robotic vacuum cleaner and a robotic mop. Angela will talk about how to get into data science management, the most important strategies to ensure that your data science team delivers value to the organization, how to hire data scientists and key points to consider as your data science team grows over time, in addition to the types of trade-offs you need to make as a data science manager and how you make the right ones. Along the way, you’ll see why a former marine biologist has the skills and ways of thinking to be a super data scientist at a company like iRobot and you’ll also see the importance of throwing data analysis parties. LINKS FROM THE SHOW FROM THE INTERVIEW Angela on Twitter HBR Newsletters iRobot Careers Data Science Internship FROM THE SEGMENTS Correcting Data Science Misconceptions (w/ Heather Nolis ~18:45) Using docker to deploy an R plumber API (By Jonathan Nolis) Enterprise Web Services with Neural Networks Using R and TensorFlow (By Jonathan Nolis and Heather Nolis) Project of the Month (w/ David Venturi ~38:45) Rise and Fall of Programming Languages (R Project by David Robinson) Learn, Practice, Apply! (By Ramnath Vaidyanathan) Apply to create a DataCamp project! Original music and sounds by The Sticks.
12 November 2018 •
Hugo speaks with Peter Bull about the importance of human-centered design in data science. Peter is a data scientist for social good and co-founder of DrivenData, a company that brings cutting-edge practices in data science and crowdsourcing to some of the world's biggest social challenges and the organizations taking them on, including machine learning competitions for social good. They’ll speak about the practice of considering how humans interact with data and data products and how important it is to consider them while designing your data projects. They’ll see how human-centered design provides a robust and reproducible framework for involving the end-user all through the data work, illuminated by examples such as DrivenData’s work in financial services and Mobile Money in Tanzania. Along the way, they’ll discuss the role of empathy in data science, the increasingly important conversation around data ethics and much, much more. LINKS FROM THE SHOW FROM THE INTERVIEW Peter on Twitter DrivenData Deon (Ethics Checklist) Cookiecutter Data Science If you liked this interview, you might be interested in working with DrivenData! Currently, the team is looking for a software engineer who loves the idea of building Python applications for social impact. Apply Here! FROM THE SEGMENTS Probability Distributions and their Stories (with Justin Bois at ~24:00) Justin's Website at Caltech Probability distributions and their stories (By Justin Bois) Studies in Interpretability (with Peadar Coyle at ~38:10) Interpretable ML Symposium How will the GDPR impact machine learning? (By Andrew Burt) How to use Bayesian Stats in your daily job (Gates, Perry, Zorn (2002)) Fairness in Machine Learning (By Moritz Hardt) Original music and sounds by The Sticks.
5 November 2018 •
In this episode of DataFramed, a DataCamp podcast, Hugo speaks with Arnaub Chatterjee. Arnaub is a Senior Expert and Associate Partner in the Pharmaceutical and Medical Products group at McKinsey & Company. They’ll discuss cutting through the hype about artificial intelligence (AI) and machine learning (ML) in healthcare by looking at practical applications and how McKinsey & Company is helping the industry evolve. Tune in for an insider’s account of what has worked in healthcare, from ML models being used to predict nearly everything in clinical settings, to imaging analytics for disease diagnosis, to wound therapeutics. Will robots and AI replace disciplines such as radiology, ophthalmology, and dermatology? How have the moving parts of data science work evolved in healthcare? What does the future of data science, ML and AI in healthcare hold? Stick around to find out. LINKS FROM THE SHOW FROM THE INTERVIEW McKinsey Analytics on Twitter Hot off the press article for HBR’s Future of Healthcare online forum (By Arnaub Chatterjee) Our latest piece on the promise & challenge of AI (By James Manyika and Jacques Bughin) Are robots coming for our jobs? (mckinsey.com) Analytics Careers page (mckinsey.com) How we help clients in healthcare analytics (mckinsey.com) AI analysis of 400+ use cases, including ones in healthcare (By Michael Chui et al. mckinsey.com) FROM THE SEGMENTS Machines that Multi-task (with Manny Moss) Part 1 at ~21:05 Responsible AI in Consumer Enterprise Hilary Mason, DJ Patil and Mike Loukides on Data Ethics EthicalOS Toolkit Part 2 at ~40:00 21 Definitions of Fairness Tutorial from FAT* (Arvind Narayanan) Kate Crawford's keynote address "The Trouble with Bias" from NIPS 2017 The (im)possibility of Fairness (Sorelle et al. arXiv.org) Learning from disparate data sources (Li Y et al. PubMed.gov) Distributed Multi-task Learning (Liyang Xie et al. KDD.org) The Cost of Fairness in Binary Classification (Aditya Krishna Menon et al. proceedings.mlr.press) Original music and sounds by The Sticks.
29 October 2018 •
In this episode of DataFramed, Hugo speaks with Cassie Kozyrkov, Chief Decision Scientist at Google Cloud. Cassie and Hugo will be talking about data science, decision making and decision intelligence, which Cassie thinks of as data science plus plus, augmented with the social and managerial sciences. They’ll talk about the different and evolving models for how the fruits of data science work can be used to inform robust decision making, along with pros and cons of all the models for embedding data scientists in organizations relative to the decision function. They’ll tackle head on why so many organizations fail at using data to robustly inform decision making, along with best practices for working with data, such as not verifying your results on the data that inspired your models. As Cassie says, “Split your damn data”. Links from the show FROM THE INTERVIEW Cassie on Twitter Is data science a bubble? (By Cassie Kozyrkov, Hackernoon) Incompetence, delegation, and population (By Cassie Kozyrkov, Hackernoon) Populations — You’re doing it wrong (By Cassie Kozyrkov, Hackernoon) What on earth is data science? (By Cassie Kozyrkov, Hackernoon) FROM THE SEGMENTS Probability Distributions and their Stories (with Justin Bois at ~19:45) Justin's Website at Caltech Probability distributions and their stories (By Justin Bois) Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs ~43:45) Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks Multi-Task Learning for NLP, also by Sebastian Ruder GANs for Fake Celebrity Images (Karras et al, Nvidia) Adversarial Multi-Task Learning for Text Classification (Liu et al., arXiv.org) Original music and sounds by The Sticks.
22 October 2018 •
In this episode of DataFramed, Hugo speaks with Brian Granger, co-founder and co-lead of Project Jupyter, physicist and co-creator of the Altair package for statistical visualization in Python. They’ll speak about data science, interactive computing, open source software and Project Jupyter. With over 2.5 million public Jupyter notebooks on GitHub alone, Project Jupyter is a force to be reckoned with. What is interactive computing and why is it important for data science work? What are all the moving parts of the Jupyter ecosystem, from notebooks to JupyterLab to JupyterHub and binder and why are they so relevant as more and more institutions adopt open source software for interactive computing and data science? From Netflix running around 100,000 Jupyter notebook batch jobs a day to LIGO’s Nobel prize winning discovery of gravitational waves publishing all their results reproducibly using Notebooks, Project Jupyter is everywhere. Links from the show FROM THE INTERVIEW Brian on Twitter Project Jupyter Beyond Interactive: Notebook Innovation at Netflix (Ufford, Pacer, Seal, Kelley, Netflix Tech Blog) Gravitational Wave Open Science Center (Tutorials) JupyterCon YouTube Playlist jupyterstream Github Repository FROM THE SEGMENTS Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs) Part 1 at ~24:40 Brief Introduction to Multi-Task Learning (By Friederike Schüür) Overview of Multi-Task Learning Use Cases (By Manny Moss) Multi-Task Learning for the Segmentation of Building Footprints (Bischke et al., arXiv.org) Multi-Task as Question Answering (McCann et al., arXiv.org) The Salesforce Natural Language Decathlon: A Multitask Challenge for NLP Part 2 at ~44:00 Rich Caruana’s Awesome Overview of Multi-Task Learning and Why It Works Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks Massively Multi-Task Network for Drug Discovery, 259 Tasks (!) (Ramsundar et al. arXiv.org) Brief Overview of Multi-Task Learning with Video of Newsie, the Prototype (By Friederike Schüür) Original music and sounds by The Sticks.
15 October 2018 •
Hugo speaks with Andrew Gelman about statistics, data science, polling, and election forecasting. Andy is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University and this week we’ll be talking about the ins and outs of general polling and election forecasting, the biggest challenges in gauging public opinion, the ever-present challenge of getting representative samples in order to model the world and the types of corrections statisticians can and do perform. "Chatting with Andy was an absolute delight and I cannot wait to share it with you!"-Hugo Links from the show FROM THE INTERVIEW Andrew's Blog Andrew on Twitter We Need to Move Beyond Election-Focused Polling (Gelman and Rothschild, Slate) We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results (Cohn, The New York Times). 19 things we learned from the 2016 election (Gelman and Azari, Science, 2017) The best books on How Americans Vote (Gelman, Five Books) The best books on Statistics (Gelman, Five Books) Andrew's Research FROM THE SEGMENTS Statistical Lesson of the Week (with Emily Robinson at ~13:30) The five Cs (Loukides, Mason, and Patil, O'Reilly) Data Science Best Practices (with Ben Skrainka ~40:40) Oberkampf & Roy’s Verification and Validation in Scientific Computing provides a thorough yet very readable treatment A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing (Roy and Oberkampf, Science Direct) Original music and sounds by The Sticks.
8 October 2018 •
Hugo speaks with Vicki Boykis about what full-stack end-to-end data science actually is, how it works in a consulting setting across various industries and why it’s so important in developing modern data-driven solutions to business problems. Vicki is a full-stack data scientist and senior manager at CapTech Consulting, working on projects in machine learning and data engineering. They'll also discuss the increasing adoption of cloud technologies in data science and the associated pitfalls, along with how to equip businesses with the skills to maintain the data products you developed for them. All this and more: Hugo is pumped! Links from the show FROM THE INTERVIEW Vicki's Tech Blog Vicki on Twitter CapTech Consulting Vicki's Tweet about Programming Building a Twitter art bot with Python, AWS, and socialist realism art FROM THE SEGMENTS Data Science Best Practices (with Ben Skrainka ~15:00) Cross-industry standard process for data mining Fundamentals of Machine Learning for Predictive Data Analytics Statistical Lesson of the Week (with Emily Robinson at ~32:05) Sex Bias in Graduate Admissions: Data from Berkeley (Bickel et al., Science, 1975) Time Series Analysis Tutorial with Python Original music and sounds by The Sticks.
1 October 2018 •
Hugo speaks with Allen Downey about uncertainty in data science. Allen is a professor of Computer Science at Olin College and the author of a series of free, open-source textbooks related to software and data science. Allen and Hugo speak about uncertainty in data science and how we, as humans, are not always good at thinking about uncertainty, which we need to be in such an uncertain world. Should we have been surprised at the outcome of the 2016 election? What approaches can we, as a data reporting community, take to communicate around uncertainty better in the future? From election forecasting to health and safety, thinking about uncertainty and using data & data-oriented tools to communicate around uncertainty are essential. Links from the show FROM THE INTERVIEW Data Science Data Optimism Allen's Twitter List of cognitive biases Why are we so surprised? (Allen's Blog) Probably Overthinking It (Allen Downey's Blog) Think Stats (Allen's Book) There is only one test! (Allen's Blog) FROM THE SEGMENT Statistical Distributions and their Stories (with Justin Bois at ~27:00) Justin's Website at Caltech Probability distributions and their stories LeBron James Field Goals Original music and sounds by The Sticks.
24 September 2018 •
Hugo speaks with Renee Teate about the many paths to becoming a data scientist. Renee is a Data Scientist at higher ed analytics start-up HelioCampus, and creator and host of the Becoming a Data Scientist Podcast. In addition to discussing the many possible ways to become a data scientist, they will discuss the common data scientist profiles and how to figure out which ones may be a fit for you. They’ll also dive into the fact that you need to figure out both where you are in terms of skills and knowledge and where you want to go in terms of your career. Renee has a bunch of great suggestions for aspiring data scientists and also flags several important pitfalls and warnings. On top of this, they'll dive into how much statistics, linear algebra and calculus you need to know in order to become an effective data scientist and/or data analyst. Links from the show FROM THE INTERVIEW Becoming a Data Scientist (Renée's Blog) Renée's Twitter Data Sci Guide (Data Science Learning Directory) FROM THE SEGMENTS Statistical Distributions and their Stories (with Justin Bois at ~19:20) Justin's Website at Caltech Probability distributions and their stories Programming Topic of the Week (with Emily Robinson at ~43:20) Categorical Data in the Tidyverse, a DataCamp Course taught by Emily Robinson. R for Data Science Book by Hadley Wickham (Factors Chapter) Inference for Categorical Data, a DataCamp Course taught by Andrew Bray. stringsAsFactors: An unauthorized biography (Roger Peng, July 24, 2015) Wrangling categorical data in R (Amelia McNamara & Nicholas J Horton, August 30, 2017) Original music and sounds by The Sticks.
17 September 2018 •
Hugo speaks with Eric Colson, Chief Algorithms Officer at Stitch Fix, an online personal styling service reinventing the shopping experience by delivering one-to-one personalization to their clients through the combination of data science and human judgment. Eric is responsible for the creation of dozens of algorithms at Stitch Fix that are pervasive to nearly every function of the company, from merchandise, inventory, and marketing to forecasting and demand, operations, and the styling recommender system. Join for all of this and more. Links from the show FROM THE INTERVIEW Stitch Fix Algorithm Tour Warehouse Maps, Movie Recommendation, Structural Biology Advice for Data Scientists on where to work More Human Humans: how our work-life can be improved by ceding tasks to machines. Learning from Textual Feedback (natural Language processing) Deep Style: Teaching machines about style from images Hybrid Designs You Can’t Make this stuff up … or can you? The Blissful Ignorance of the Narrative Fallacy FROM THE SEGMENTS Blog Post of the Week (with Emily Robinson) Doing Good Data Science by Mike Loukides, Hilary Mason and DJ Patil Original music and sounds by The Sticks.
10 September 2018 •
Meet Tanya Cashorali, a founding partner of TCB Analytics, a Boston-based data consultancy. Tanya started her career in bioinformatics and has applied her experience to other industries such as healthcare, finance, retail, and sports. We’ll be talking about what it means to be a data consultant, the wide range of industries that Tanya works in, the impact of data products in her work and the importance of rapid prototyping and getting MVPs or minimum viable products out the door. How does Tanya balance the trade-off between rapid prototyping and building fully mature data products? How does this play out in particular cases in the healthcare and telecommunications spaces? How has her ability to do this evolved as a function of open source software development? We’ll also dive into how general data literacy has evolved, how it can help decision making in business more generally, the data science skills gap and how many data science hiring processes are broken and how to fix them.
3 September 2018 •
Hugo speaks with JD Long, VP of risk management for Renaissance Reinsurance, about applications of data science techniques to the omnipresent worlds of insurance, reinsurance, risk management and uncertainty. What are the biggest challenges in insurance and reinsurance that data science can impact? How does JD go about building risk representations of every deal? How can thinking in a distributed fashion allow us to think about risk and uncertainty? What is the role of empathy in data science?
27 August 2018 •
Hugo speaks with Christie Bahlai, Assistant Professor at Kent State University, about data science, ecology, and the adoption of techniques such as machine learning in academic research. What are the biggest challenges in ecology that data science can help to solve? What does the intersection of open science and data science look like? In scientific research, what is happening at the interface between data science & machine learning methods, which are pattern-based, and traditional research methods, which are classically hypothesis driven? Is there a paradigm shift occurring here? Listen to find out! Links from the show The Bahlai Lab of applied quantitative ecology Christie Bahlai on twitter Hugo's article on What Data Scientists Really Do in Harvard Business Review Hugo's webinar on What Managers Need To Know About Machine Learning
20 August 2018 •
Hugo speaks with Yves Hilpisch about how data science is disrupting finance. Yves’ name is synonymous with Python for Finance and he is founder and managing partner of The Python Quants, a group focusing on the use of open source technologies for financial data science, artificial intelligence, algorithmic trading and computational finance. Why are banks such as Bank of America & JP Morgan adopting the open source data science ecosystem? In which major sub-disciplines of finance is data science having, or could it have, a large impact? How has the rise of data science changed the financial world and how the work is done and thought about? Stick around to find out.
13 August 2018 •
Hugo speaks with Amber Thomas about data journalism, interactive visualization and data storytelling. Amber is a journalist-engineer at The Pudding, which is a collection of data-driven, visual essays. We’ll discuss the ins and outs of what it takes to tell interactive journalistic stories using data visualization and, in the process, we’ll find out what it takes to be successful at data journalism, the trade-off between being a generalist and a specialist, and much more. We’ll explore these issues by focusing on several case studies, including a piece that Amber worked on late last year called “How far is too far? An analysis of driving times to abortion clinics in the US.”
6 August 2018 •
What are the biggest challenges in pharmaceuticals that data science can help to solve? How are data science and statistics generally embedded in organizations such as Pfizer? What aspects of the pharmaceutical business run the gamut of nonclinical statistics? Hugo speaks with Max Kuhn, a software engineer at RStudio who was previously Senior Director of Nonclinical Statistics at Pfizer Global R&D. Max applied models in the pharmaceutical and diagnostic industries for over 18 years.
30 July 2018 •
Hugo speaks with Derek Johnson, an epidemiologist with Doctors Without Borders. Derek leverages statistical methods, experimental design and data scientific techniques to investigate the barriers impeding people from accessing health care in Lahe Township, Myanmar. If you thought data science was all machine learning, SQL databases and convolutional neural nets, this is gonna be a wild ride: to get the data for their baseline health assessments, Derek and his team ride motorcycles into villages in northern Myanmar for weeks on end to perform in-person surveys, equipped with translators and pens and paper because they can’t be guaranteed electricity. Derek also researches the factors associated with the transmission of hepatitis C between family members and has helped to conduct studies in Uganda, Nepal, and India. All this and more.
23 July 2018 •
Hugo speaks with Alan Nichol about chatbots, conversational software and data science. Alan is co-founder and CTO of Rasa, which builds open source machine learning tools for developers and product teams to expand bots beyond answering simple questions. Which verticals is conversational software currently having the biggest impact on? What are the biggest challenges facing the fields of chatbots and conversational software? What misapprehensions do we as a society have about these technologies that experts such as Alan would like to correct? And how can we all build chatbots and conversational software ourselves?
16 July 2018 •
Hugo speaks with Taras Gorishnyy, a Senior Analytics Manager at McKinsey and Head of Data Science at QuantumBlack, a McKinsey company. They discuss the role of data science in management consulting, what it takes to change organizations through data science, how the different moving parts of data science have evolved over the past decade and in which direction they’re heading. You’ll see the impact that data science can have not only in tech, but also in such various verticals as retail, agriculture and the penal system. Taras will also take us through the 5 steps required to change organizations through data science, all of which are necessary. Can you guess what they are? We're really excited to have Taras on the show as DataCamp has had a long relationship with McKinsey, including that McKinsey uses DataCamp for training.
9 July 2018 •
Omoju Miller, a Senior Machine Learning Data Scientist with GitHub, speaks with Hugo about the role of data science in product development at GitHub, what it means to “use computation to build products to solve real-life decision making, practical challenges” and what building data products at GitHub actually looks like. Machine learning has the power to automate so much of the drudgery around data science & software engineering, from automated code review to flagging security vulnerabilities in code, and from recommending repositories to contributors to matching issues with maintainers and contributors and identifying duplicate issues. And just in case that’s not enough, they'll discuss GitHub as a platform for work, not just technical, and, as Omoju has called it, “a collaborative work environment centered around humans.”
2 July 2018 •
What are best practices for organizing data science teams? Having data scientists distributed through companies or having a Centre of Excellence? What are the most important skills for data scientists? Is the ability to use the most sophisticated deep learning models more important than being able to make good powerpoint slides? Find out in this conversation with Jacqueline Nolis, a data science leader in the Seattle area with over a decade of experience. Jacqueline is currently running a consulting firm helping Fortune 500 companies with data science, machine learning, and AI. This interview is with Jacqueline Nolis, but at the time of recording, she went by Jonathan Nolis. Links from the show Jacqueline Nolis' website You're relying on data too much: making decisions worse, not better, by Jacqueline Nolis Hiring data scientists (part 1): what to look for in a candidate, by Jacqueline Nolis Jacqueline on Twitter For more, see our page here
25 June 2018 •
What are the biggest challenges currently facing data security and privacy? What does the GDPR mean for civilians, working data scientists and businesses around the world? Is data anonymization actually possible or a pipe dream? Find out in Hugo's conversation with Katharine Jarmul, a data scientist, consultant, educator and co-founder of KI Protect, a company that provides real-time protection for your data infrastructure, data science and AI. Links from the show KI Protect, providing real-time protection for your data infrastructure. What is GDPR? The summary guide to GDPR compliance in the UK by Matt Burgess for Wired Apple's differential privacy approach For more, see our page here
18 June 2018 •
Why are spreadsheets ubiquitous in data analytics, and why are so many data scientists anti-spreadsheet? Join Jenny Bryan, a software engineer at RStudio & recovering biostatistician who takes special delight in eliminating the small agonies of data analysis, and Hugo to discover why spreadsheets are in fact necessary in data analytics and how spreadsheet workflows can be incorporated into more general data science flows in sustainable and healthy ways. Welcome to the future. Links from the show Best Practices for Using Google Sheets in Your Data Project Jenny Bryan's repository of scary Excel stories Sanesheets, a self-proclaimed > by Jenny Bryan DataCamp's first two free courses on spreadsheets
11 June 2018 •
Community building is an essential aspect of data science. But how do you do it? Find out in Hugo's conversation with Jared Lander, organizer of the New York Open Statistical Programming Meetup and the New York R Conference. Jared is also the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City and an Adjunct Professor of Statistics at Columbia University. How does Jared think about creating safe and welcoming spaces for budding and practicing data scientists of all ilk? How does he put this into practice? How does he make people feel comfortable and at home in a field in which so many intelligent and curious people feel like imposters? What practical & specific considerations are there in creating this home for underrepresented groups? How does he stay ahead of the curve in terms of modern, up-to-date content and speakers for his meetup and conference?
4 June 2018 •
"Cloud computing is a huge revolution in the computing space, and it's also probably going to be one of the most transformative technologies that any of us experience in our lifetime. " Paige Bailey, Senior Cloud Developer Advocate at Microsoft, in this episode of DataFramed. In this conversation with Hugo, Paige reports from the frontier of cloud-based data science technologies, having just been at the Microsoft Build and Google I/O conferences. What is the future of data science in the cloud? How can you get started? Stick around to find out and much, much more.
28 May 2018 •
What do online experiments, data science and product development look like at Booking.com, the world’s largest accommodations provider? Join Hugo's conversation with Lukas Vermeer to find out. Lukas is responsible for experimentation at Booking in the broadest sense of the word: from Infrastructure and Tools used to run experiments, Methodology and Metrics that help people make decisions to Training and Culture that help people understand what to do. They'll be talking about how Booking leverages Data Science to help empower people to experience the world through the three pillars of exploratory analysis, qualitative research and quantitative studies. They'll also take a deep dive into the fact that data science isn't actually anywhere near as objective as you may think.
21 May 2018 •
Building models of the world is dangerous and there are pitfalls everywhere, even down to the assumptions that you make. To find out about many statistical pitfalls, and how to build more robust data scientific models using statistical modeling, whether it be in tech, epidemiology, finance or anything else, join Hugo's chat with Michael Betancourt, a physicist, statistician and one of the core developers of the open source statistical modeling platform Stan.
14 May 2018 •
How can data science help in the fight against cancer? What are its limitations? Find out in this conversation from the frontier of research. Hugo speaks with Sandy Griffith from Flatiron Health, a healthcare technology and services company focused on accelerating cancer research and improving patient care. Sandy is Principal Methodologist on Flatiron's Quantitative Sciences team and is tasked with leveraging data science "to improve lives by learning from the experience of every cancer patient".
7 May 2018 •
Anthony Goldbloom, CEO of Kaggle, speaks with Hugo about Kaggle, data science communities, reproducible data science, machine learning competitions and the future of data science in the cloud. If you thought that Kaggle was merely a platform for machine learning competitions, you have to check out this chat, because these ML comps account for less than a third of activity on Kaggle today. In the discussion: Kaggle kernels for reproducible data science and the evolution of the Kaggle public data platform; the genesis of Kaggle and how Anthony managed to solve the cold start problem of building a two-sided market place; the exciting implications of Kaggle's recent acquisition by Google for the future of cloud-based data science; why Python is dominant on Kaggle.
30 April 2018 •
"We should be looking at Automated Machine Learning tools as more like data science assistants, rather than replacements for data scientists" -- Randy Olson, Lead Data Scientist at Life Epigenetics, Inc. Randy specializes in artificial intelligence, machine learning, and created TPOT, a Data Science Assistant and a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. Will the future of data science be automated? Which verticals will experience the largest disruption? What will the role of data science become? There's one way to find out: jump straight into this chat with Randy and Hugo.
23 April 2018 •
Michelle Gill, a deep learning expert at NVIDIA, an Artificial Intelligence company that builds GPUs, the processors that everybody uses for deep learning, speaks with Hugo about the modern superpower of deep learning and where it has the largest impact, past, present and future, filtered through the lens of Michelle's work at NVIDIA. Where is the modern superpower of deep learning most effective? Where is it not? Where should we channel our skepticism of the hype surrounding it?
16 April 2018 •
Sebastian Raschka, a machine learning aficionado, data analyst, author, Python programmer, open source contributor, computational biologist, and occasional blogger, speaks with Hugo about the role of data science in modern biology and the power of deep learning in today's rapidly evolving data science landscape. How is Sebastian using deep learning to build facial recognition software that also prevents racial and gender profiling? Check out this week's episode to find out.
9 April 2018 •
Drew Conway, world-renowned data scientist, entrepreneur, author, speaker and creator of the Data Science Venn Diagram speaks with Hugo about how to build data science teams, along with the unique challenges of building data science products for industrial users. How does Drew now view the Venn circles he created, those of hacking skills, mathematical and statistical knowledge and substantive expertise, when building out data science teams?
26 March 2018 •
Fake news: how can data science and deep learning be leveraged to detect it? Come on a journey with Mike Tamir, Head of Data Science at Uber ATG, who is building out a data science product that classifies text as news, editorial, satire, hate speech and fake news, among others. We'll also see what types of unique challenges Mike faced in his work at Takt, using data science to service the needs of Fortune 500 companies such as Starbucks. Links from the show FROM THE INTERVIEW FakerFact (Chrome Extension) FakerFact (Firefox Extension) FakerFact The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy FROM THE SEGMENTS The Double-edged Sword of Impact Parts I & 2 (with Friederike Schüür, Cloudera Fast Forward Labs) Media Manipulation and Disinformation Online from Data & Society James Bridle's blog post 'Something is wrong on the internet' The Cost of Fairness in Binary Classification (.pdf), a paper by Menon & Williamson (2018) Multisided Fairness for Recommendation, a paper by Burke (2017) All The Cool Kids, How Do They Fit In? Popularity and Demographic Biases in Recommender Evaluation and Effectiveness, a paper by Ekstrand et al. (2018) The spread of true and false news online, a paper by Vosoughi et al. (2018) Original music and sounds by The Sticks.
12 March 2018 •
Nuclear engineering, data science and open source software development: where do these all intersect? To find out, join Hugo and Katy Huff, Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois where she leads the Advanced Reactors and Fuel Cycles research group.
5 March 2018 •
How does data science help BuzzFeed achieve online virality? What type of mass online experiments do data scientists at BuzzFeed run for this purpose? What products do they develop to make all of this easy and intuitive for content producers? Find out about all of this and more in this episode when Hugo talks with Adam Kelleher, Principal Data Scientist at BuzzFeed and Adjunct Assistant Professor at Columbia University. They'll also dive into the role of thinking about causality in modern data science.
26 February 2018 •
Air pollution, the environment and data science: where do these intersect? Find out in this episode of DataFramed, in which Hugo speaks with Roger Peng, Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, co-director of the Johns Hopkins Data Science Lab and co-founder of the Johns Hopkins Data Science Specialization. Join our discussion about data science, its role in researching the environment and air pollution, massive open online courses for democratizing data science and much more.
19 February 2018 •
Etsy, online experiments and data science are the topics of this episode, in which Hugo speaks with Emily Robinson, a data analyst at Etsy. How are data science and analysis integral to their business and decision making? Join us to find out. We'll also dive into the types of statistical modeling that occur at Etsy and the importance of both diversity and community in data science.
12 February 2018 •
Jake VanderPlas, a data science fellow at the University of Washington's eScience Institute, astronomer, open source beast and renowned Pythonista, joins Hugo to speak about data science, astronomy, the open source development world and the importance of interdisciplinary conversations to data science.
5 February 2018 •
Airbnb's business depends on data science. In this episode, Hugo speaks with Robert Chang, data scientist at Airbnb and previously at Twitter. We'll be chatting about the different types of roles data science can play in digital businesses such as Airbnb and Twitter, how companies at different stages of development actually require divergent types of data science to be done, along with the different models for how data scientists are placed within companies, from the centralized model to the embedded to the hybrid: can you guess which is Robert's favourite? This is a hands-on, practical look at how data science works at Airbnb and digital businesses in general.
29 January 2018 •
David Robinson, a data scientist at Stack Overflow, joins Hugo to speak about the evolving importance of citizen data science and a future in which data literacy is considered a necessary skill to navigate the world, similar to literacy today. We'll speak about many of Dave's projects, including his analysis of Trump's tweets that demonstrated the stark contrast between Trump's own tweets and those of his PR machine. We'll also speak about ways for journalists, software engineers, scientists and people from all walks of life to get up and running doing data science and analysis.
17 January 2018 •
Maelle Salmon, a data scientist who has worked in public health, both in infectious disease and environmental epidemiology, joins Hugo for a chat about the role of data science, statistics and data management in researching the health effects of air pollution and urbanization. In the process, we'll dive into the continual need for open source toolbox development, open data, knowledge organisation and diversity in this emerging discipline.
17 January 2018 •
The trucking industry is being revolutionized by Data Science. And how? Hugo speaks with Ben Skrainka, a data scientist at Convoy, a company that provides trucking services for shippers and carriers powered by technology to drive reliability, transparency, efficiency, and insights. We'll dive into how data science can help to achieve such a trucking revolution, and how this will impact all of us, from truckers to businesses and consumers alike. Along the way, we'll delve into Ben's thoughts on best practices in data science, how the field is evolving and how we can all help to shape the future of this emerging discipline.
17 January 2018 •
Claudia Perlich, Chief Scientist at Dstillery, a role in which she designs, develops, analyzes and optimizes the machine learning algorithms that drive digital advertising, speaks with Hugo about the role of data science in the online advertising world, the predictability of humans, how her team builds real time bidding algorithms and detects bots online, along with the ethical implications of all of these evolving concepts.
17 January 2018 •
Chris Volinsky, AT&T Labs' Assistant Vice President for Big Data Research and a member of the team that won the $1M Netflix Prize, an open competition for improving Netflix's online recommendation system, speaks with Hugo. We'll be discussing the role data science plays in the modern telecommunications network landscape, how it helps a company that serves over 140 million customers and what statistical and data scientific techniques his team uses to work with such large amounts of data. Along the way, we'll dive into the need for more transparency concerning the use of civilian data and Chris's work on the Netflix recommendation system prize.
17 January 2018 •
Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about where data science has come from, where it is now, and the direction it's heading, you've come to the right place. Along the way, we'll delve into the ethics of machine learning, the challenges of AI, automation and the roles of humanity and empathy in data science.
16 January 2018 •
We are super pumped to be launching a weekly data science podcast called DataFramed, in which Hugo Bowne-Anderson, a data scientist and educator at DataCamp, speaks with industry experts about what data science is, what it’s capable of, what it looks like in practice and the direction it is heading over the next decade and into the future. Check out this snippet for a sneak preview!
15 January 2018 •