Hello Data Science

A Friendly Introduction with Applications

Author

Mine Doğucu, Catalina Medina, Alma Castro

Published

January 14, 2026

Preface

The Hello Data Science book is intended for anyone who wants to take on their first data science project and look for patterns, relationships, and meaning in data, without being intimidated while still being challenged. It is currently a work in progress. We have released only the first five chapters as we continue to write the rest of the book. If you would like us to inform you about updates to the book, please sign up using this form.

This book is a result of our teaching of introductory data science courses at a research university, a teaching university, and a community college. Thus, the book is intended first and foremost for university students who are taking their first introductory data science course. We assume no prior programming background and start by introducing basic programming concepts. We assume high-school-level knowledge of algebra. Beyond students, any reader with equivalent prerequisite knowledge may benefit from reading the book and completing the tasks. As data science emerged as a new field, many of us, including instructors, had to learn new topics. So we hope that this book will also serve those who already know some topics in data science but want to fill in the gaps.

Pedagogical Approach

We will start by answering the question that is probably on many readers' minds: do we teach coding the old-school way, or do we welcome generative AI? As education with generative AI evolves, our ideas will too. Our current approach is to start the old-school way. We want readers to understand the basics before they rely on AI. To use the familiar calculator analogy, students still learn addition and subtraction by hand. Once they understand arithmetic, they use calculators to speed up the process. Our current approach is similar: learn the basics first, then use AI later to speed up the process.

It is also worth noting that we ourselves used generative AI in writing this book, including but not limited to drafting certain paragraphs, editing, paraphrasing, and finding datasets and examples. However, the core material and ideas come from the teaching materials we created for our courses.

In our classrooms, we expect students to be actively engaged with the material, and we expect the same of our readers. As you read the book, you will come across code. We recommend that you run this code on your computer as you read, especially in the early chapters while you are getting used to R. You will also come across boxes within the text titled "Test Yourself!". We recommend that you take the time to tackle these questions.

Setting Up Your Computer

Before you begin working on data science projects using this book, you will first need to get your computer ready by installing the latest versions of the following programs.

We hope to provide further instructions for these once we have a more complete book.

Acknowledgments

We would like to thank the National Science Foundation (NSF) for funding the collaborative project under awards #2123366 and #2123384. All authors have collaborated on this project, which has supported improvements to our courses and the creation of one of them.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

About the Authors

Mine Doğucu is Associate Professor of Teaching in the Department of Statistics at the University of California, Irvine. She focuses on modernizing statistics education, instructor training, Bayesian methods, and making data science accessible physically and cognitively. Several of her projects have been funded by the National Science Foundation and the National Institutes of Health. In addition to the Hello Data Science book, she has coauthored the Bayes Rules! book. She finds joy in teaching her students and learning from them, playing with her cats, and going to art museums and galleries with her spouse.

Catalina Medina is an Assistant Professor of Data Science in the Department of Mathematics and Data Science at California State University Channel Islands. Her work is motivated by interdisciplinary, real-world problems, and she believes data science should be taught with real data from a variety of interesting contexts to engage students and create data-literate citizens. Outside of academia, she loves cooking with friends, playing with her cats, and bingeing TV shows.

Alma Castro is a Professor in the Department of Mathematics at Cypress College. Her work has focused on coordinating and teaching the introductory statistics course for non-STEM majors, helping create a more uniform curriculum across sections and developing and implementing open educational resources for the course. Most recently, she has led efforts to introduce data science at her school through the creation of its first data science course, and she has helped develop an associate's degree in data science. Aside from teaching, she enjoys reading, running, and watching series with her husband, but most of all, she loves being a mother to her two beautiful children, Isabella and Mateo.