Hello Data Science
A Friendly Introduction with Applications
Preface
The Hello Data Science book is intended for anyone who wants to take on a first data science project, looking for patterns, relationships, and meaning in data, and to be challenged without being intimidated. It is currently a work in progress: we have released only the first five chapters and are continuing to write the rest of the book. If you would like us to inform you about updates to the book, please sign up using this form.
This book is the result of our experience teaching introductory data science courses at a research university, a teaching university, and a community college. The book is therefore intended first and foremost for university students who are taking their first data science course. We assume no prior programming background and begin by introducing basic programming concepts. We assume high-school level algebra. Beyond students, any reader with equivalent prerequisite knowledge may benefit from reading the book and doing the tasks. As data science emerged as a new field, many of us, including instructors, had to learn new topics. We therefore hope that this book will also serve those who already know some topics in data science but are trying to fill in the gaps.
Pedagogical Approach
We begin with the question that is probably on many readers' minds: do we teach coding the old-school way, or do we welcome generative AI? As the use of generative AI in education evolves, our ideas will too. For now, we start the old-school way: we want readers to understand the basics before they rely on AI. To use the familiar calculator analogy, students still learn addition and subtraction, and only once they understand arithmetic do they use calculators to speed up the process. Our approach is similar: learn the basics first, then use AI later to speed up the process.
It is also worth noting that we ourselves used generative AI in writing this book, including but not limited to drafting certain paragraphs, editing, paraphrasing, and finding datasets and examples. The core material and ideas, however, come from the teaching materials that we created for our courses.
In our classrooms, we expect students to be actively engaged with the material, and the same goes for our readers. As you read the book, you will come across code. We recommend that you run this code on your computer as you read, especially in the early chapters while you are getting used to R. You will also come across boxes within the text titled "Test Yourself!". We recommend that you take the time to tackle these questions.
Setting Up Your Computer
Before you begin working on data science projects with this book, you will need to get your computer ready by installing the latest versions of the following programs and, optionally, setting up git and GitHub.
- Download and install R
- Download and install RStudio Desktop
- Download and install git (optional)
- Sign up for a GitHub account (optional). You may want to consider the username advice from Jenny Bryan.
- Introduce yourself to git (also optional)
- Set up SSH keys (also optional)
We hope to provide further instructions for these steps once the book is more complete.
Acknowledgments
We would like to thank the National Science Foundation (NSF) for funding the collaborative project #2123366 and #2123384. All authors collaborated on this project, which has supported improvements to our courses and the creation of one of them.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.