Pandas for Everyone: Addison-Wesley Data & Analytics

Author: Daniel Y. Chen
Language: English. Paperback, 22 Nov 2017

From the Addison-Wesley Data & Analytics series

Price: 208.90 lei

Old price: 261.12 lei (-20%)
Condition: New

Specifications

ISBN-13: 9780134546933
ISBN-10: 0134546938
Pages: 400
Dimensions: 178 x 232 x 22 mm
Weight: 0.64 kg
Publisher: Addison-Wesley Professional
Series: Addison-Wesley Data & Analytics


Description

This tutorial teaches everything you need to get started with Python programming for the fast-growing field of data analysis. Daniel Chen tightly links each new concept with easy-to-apply, relevant examples from modern data analysis. Unlike other beginner's books, this guide helps today's newcomers learn both Python and its popular Pandas data science toolset in the context of tasks they'll really want to perform. Following the proven Software Carpentry approach to teaching programming, Chen introduces each concept with a simple motivating example, slowly offering deeper insights and expanding your ability to handle concrete tasks. Each chapter is illuminated with a concept map: an intuitive visual index of what you'll learn and an easy way to refer back to what you've already learned. An extensive set of easy-to-read appendices helps you fill knowledge gaps wherever they may exist. Coverage includes:

  • Setting up your Python and Pandas environment
  • Getting started with Pandas dataframes
  • Using dataframes to calculate and perform basic statistical tasks
  • Plotting in Matplotlib
  • Cleaning data, reshaping dataframes, handling missing values, working with dates, and more
  • Building basic data analytics models
  • Applying machine learning techniques: both supervised and unsupervised
  • Creating reproducible documents using literate programming techniques


Biographical Note

Daniel Chen is a graduate student in the interdisciplinary PhD program in Genetics, Bioinformatics & Computational Biology (GBCB) at Virginia Tech. He is involved with Software Carpentry as an instructor and lesson maintainer. He completed his master's degree in public health at Columbia University Mailman School of Public Health in Epidemiology, and currently works at the Social and Decision Analytics Laboratory under the Biocomplexity Institute of Virginia Tech where he is working with data to inform policy decision-making. He is the author of Pandas for Everyone and Pandas Data Analysis with Python Fundamentals LiveLessons.

Table of Contents

Foreword

Preface

Acknowledgments

About the Author

Part I: Introduction

Chapter 1: Pandas DataFrame Basics

1.1 Introduction

1.2 Loading Your First Data Set

1.3 Looking at Columns, Rows, and Cells

1.4 Grouped and Aggregated Calculations

1.5 Basic Plot

1.6 Conclusion

Chapter 2: Pandas Data Structures

2.1 Introduction

2.2 Creating Your Own Data

2.3 The Series

2.4 The DataFrame

2.5 Making Changes to Series and DataFrames

2.6 Exporting and Importing Data

2.7 Conclusion

Chapter 3: Introduction to Plotting

3.1 Introduction

3.2 Matplotlib

3.3 Statistical Graphics Using matplotlib

3.4 Seaborn

3.5 Pandas Objects

3.6 Seaborn Themes and Styles

3.7 Conclusion

Part II: Data Manipulation

Chapter 4: Data Assembly

4.1 Introduction

4.2 Tidy Data

4.3 Concatenation

4.4 Merging Multiple Data Sets

4.5 Conclusion

Chapter 5: Missing Data

5.1 Introduction

5.2 What Is a NaN Value?

5.3 Where Do Missing Values Come From?

5.4 Working with Missing Data

5.5 Conclusion

Chapter 6: Tidy Data

6.1 Introduction

6.2 Columns Contain Values, Not Variables

6.3 Columns Contain Multiple Variables

6.4 Variables in Both Rows and Columns

6.5 Multiple Observational Units in a Table (Normalization)

6.6 Observational Units Across Multiple Tables

6.7 Conclusion

Part III: Data Munging

Chapter 7: Data Types

7.1 Introduction

7.2 Data Types

7.3 Converting Types

7.4 Categorical Data

7.5 Conclusion

Chapter 8: Strings and Text Data

8.1 Introduction

8.2 Strings

8.3 String Methods

8.4 More String Methods

8.5 String Formatting

8.6 Regular Expressions (RegEx)

8.7 The regex Library

8.8 Conclusion

Chapter 9: Apply

9.1 Introduction

9.2 Functions

9.3 Apply (Basics)

9.4 Apply (More Advanced)

9.5 Vectorized Functions

9.6 Lambda Functions

9.7 Conclusion

Chapter 10: Groupby Operations: Split-Apply-Combine

10.1 Introduction

10.2 Aggregate

10.3 Transform

10.4 Filter

10.5 The pandas.core.groupby.DataFrameGroupBy Object

10.6 Working with a MultiIndex

10.7 Conclusion

Chapter 11: The datetime Data Type

11.1 Introduction

11.2 Python's datetime Object

11.3 Converting to datetime

11.4 Loading Data That Include Dates

11.5 Extracting Date Components

11.6 Date Calculations and Timedeltas

11.7 Datetime Methods

11.8 Getting Stock Data

11.9 Subsetting Data Based on Dates

11.10 Date Ranges

11.11 Shifting Values

11.12 Resampling

11.13 Time Zones

11.14 Conclusion

Part IV: Data Modeling

Chapter 12: Linear Models

12.1 Introduction

12.2 Simple Linear Regression

12.3 Multiple Regression

12.4 Keeping Index Labels From sklearn

12.5 Conclusion

Chapter 13: Generalized Linear Models

13.1 Introduction

13.2 Logistic Regression

13.3 Poisson Regression

13.4 More Generalized Linear Models

13.5 Survival Analysis

13.6 Conclusion

Chapter 14: Model Diagnostics

14.1 Introduction

14.2 Residuals

14.3 Comparing Multiple Models

14.4 k-Fold Cross-Validation

14.5 Conclusion

Chapter 15: Regularization

15.1 Introduction

15.2 Why Regularize?

15.3 LASSO Regression

15.4 Ridge Regression

15.5 Elastic Net

15.6 Cross-Validation

15.7 Conclusion

Chapter 16: Clustering

16.1 Introduction

16.2 k-Means

16.3 Hierarchical Clustering

16.4 Conclusion

Part V: Conclusion

Chapter 17: Life Outside of Pandas

17.1 The (Scientific) Computing Stack

17.2 Performance

17.3 Going Bigger and Faster

Chapter 18: Toward a Self-Directed Learner

18.1 It's Dangerous to Go Alone!

18.2 Local Meetups

18.3 Conferences

18.4 The Internet

18.5 Podcasts

18.6 Conclusion

Part VI: Appendixes

Appendix A: Installation

A.1 Installing Anaconda

A.2 Uninstall Anaconda

Appendix B: Command Line

B.1 Installation

B.2 Basics

Appendix C: Project Templates

Appendix D: Using Python

D.1 Command Line and Text Editor

D.2 Python and IPython

D.3 Jupyter

D.4 Integrated Development Environments (IDEs)

Appendix E: Working Directories

Appendix F: Environments

Appendix G: Install Packages

G.1 Updating Packages

Appendix H: Importing Libraries

Appendix I: Lists

Appendix J: Tuples

Appendix K: Dictionaries

Appendix L: Slicing Values

Appendix M: Loops

Appendix N: Comprehensions

Appendix O: Functions

O.1 Default Parameters

O.2 Arbitrary Parameters

Appendix P: Ranges and Generators

Appendix Q: Multiple Assignment

Appendix R: numpy ndarray

Appendix S: Classes

Appendix T: Odo: The Shapeshifter

Index