Pandas for Everyone: Addison-Wesley Data & Analytics
Author: Daniel Y. Chen. Language: English. Paperback – 22 Nov 2017
Price: 208.90 lei
Old price: 261.12 lei
-20% New
39.98€ • 41.53$ • 33.21£
Temporarily unavailable
Specifications
ISBN-10: 0134546938
Pages: 400
Dimensions: 178 x 232 x 22 mm
Weight: 0.64 kg
Publisher: Addison-Wesley Professional
Series: Addison-Wesley Data & Analytics
Description
This tutorial teaches everything you need to get started with Python programming for the fast-growing field of data analysis. Daniel Chen tightly links each new concept with easy-to-apply, relevant examples from modern data analysis. Unlike other beginners' books, this guide helps today's newcomers learn both Python and its popular Pandas data science toolset in the context of tasks they'll really want to perform. Following the proven Software Carpentry approach to teaching programming, Chen introduces each concept with a simple motivating example, slowly offering deeper insights and expanding your ability to handle concrete tasks. Each chapter is illuminated with a concept map: an intuitive visual index of what you'll learn and an easy way to refer back to what you've already learned. An extensive set of easy-to-read appendices helps you fill knowledge gaps wherever they may exist. Coverage includes:
- Setting up your Python and Pandas environment
- Getting started with Pandas dataframes
- Using dataframes to calculate and perform basic statistical tasks
- Plotting in Matplotlib
- Cleaning data, reshaping dataframes, handling missing values, working with dates, and more
- Building basic data analytics models
- Applying machine learning techniques: both supervised and unsupervised
- Creating reproducible documents using literate programming techniques
Contents
Foreword
Preface
Acknowledgments
About the Author
Part I: Introduction
Chapter 1: Pandas DataFrame Basics
1.1 Introduction
1.2 Loading Your First Data Set
1.3 Looking at Columns, Rows, and Cells
1.4 Grouped and Aggregated Calculations
1.5 Basic Plot
1.6 Conclusion
Chapter 2: Pandas Data Structures
2.1 Introduction
2.2 Creating Your Own Data
2.3 The Series
2.4 The DataFrame
2.5 Making Changes to Series and DataFrames
2.6 Exporting and Importing Data
2.7 Conclusion
Chapter 3: Introduction to Plotting
3.1 Introduction
3.2 Matplotlib
3.3 Statistical Graphics Using matplotlib
3.4 Seaborn
3.5 Pandas Objects
3.6 Seaborn Themes and Styles
3.7 Conclusion
Part II: Data Manipulation
Chapter 4: Data Assembly
4.1 Introduction
4.2 Tidy Data
4.3 Concatenation
4.4 Merging Multiple Data Sets
4.5 Conclusion
Chapter 5: Missing Data
5.1 Introduction
5.2 What Is a NaN Value?
5.3 Where Do Missing Values Come From?
5.4 Working with Missing Data
5.5 Conclusion
Chapter 6: Tidy Data
6.1 Introduction
6.2 Columns Contain Values, Not Variables
6.3 Columns Contain Multiple Variables
6.4 Variables in Both Rows and Columns
6.5 Multiple Observational Units in a Table (Normalization)
6.6 Observational Units Across Multiple Tables
6.7 Conclusion
Part III: Data Munging
Chapter 7: Data Types
7.1 Introduction
7.2 Data Types
7.3 Converting Types
7.4 Categorical Data
7.5 Conclusion
Chapter 8: Strings and Text Data
8.1 Introduction
8.2 Strings
8.3 String Methods
8.4 More String Methods
8.5 String Formatting
8.6 Regular Expressions (RegEx)
8.7 The regex Library
8.8 Conclusion
Chapter 9: Apply
9.1 Introduction
9.2 Functions
9.3 Apply (Basics)
9.4 Apply (More Advanced)
9.5 Vectorized Functions
9.6 Lambda Functions
9.7 Conclusion
Chapter 10: Groupby Operations: Split-Apply-Combine
10.1 Introduction
10.2 Aggregate
10.3 Transform
10.4 Filter
10.5 The pandas.core.groupby.DataFrameGroupBy Object
10.6 Working with a MultiIndex
10.7 Conclusion
Chapter 11: The datetime Data Type
11.1 Introduction
11.2 Python's datetime Object
11.3 Converting to datetime
11.4 Loading Data That Include Dates
11.5 Extracting Date Components
11.6 Date Calculations and Timedeltas
11.7 Datetime Methods
11.8 Getting Stock Data
11.9 Subsetting Data Based on Dates
11.10 Date Ranges
11.11 Shifting Values
11.12 Resampling
11.13 Time Zones
11.14 Conclusion
Part IV: Data Modeling
Chapter 12: Linear Models
12.1 Introduction
12.2 Simple Linear Regression
12.3 Multiple Regression
12.4 Keeping Index Labels From sklearn
12.5 Conclusion
Chapter 13: Generalized Linear Models
13.1 Introduction
13.2 Logistic Regression
13.3 Poisson Regression
13.4 More Generalized Linear Models
13.5 Survival Analysis
13.6 Conclusion
Chapter 14: Model Diagnostics
14.1 Introduction
14.2 Residuals
14.3 Comparing Multiple Models
14.4 k-Fold Cross-Validation
14.5 Conclusion
Chapter 15: Regularization
15.1 Introduction
15.2 Why Regularize?
15.3 LASSO Regression
15.4 Ridge Regression
15.5 Elastic Net
15.6 Cross-Validation
15.7 Conclusion
Chapter 16: Clustering
16.1 Introduction
16.2 k-Means
16.3 Hierarchical Clustering
16.4 Conclusion
Part V: Conclusion
Chapter 17: Life Outside of Pandas
17.1 The (Scientific) Computing Stack
17.2 Performance
17.3 Going Bigger and Faster
Chapter 18: Toward a Self-Directed Learner
18.1 It's Dangerous to Go Alone!
18.2 Local Meetups
18.3 Conferences
18.4 The Internet
18.5 Podcasts
18.6 Conclusion
Part VI: Appendixes
Appendix A: Installation
A.1 Installing Anaconda
A.2 Uninstall Anaconda
Appendix B: Command Line
B.1 Installation
B.2 Basics
Appendix C: Project Templates
Appendix D: Using Python
D.1 Command Line and Text Editor
D.2 Python and IPython
D.3 Jupyter
D.4 Integrated Development Environments (IDEs)
Appendix E: Working Directories
Appendix F: Environments
Appendix G: Install Packages
G.1 Updating Packages
Appendix H: Importing Libraries
Appendix I: Lists
Appendix J: Tuples
Appendix K: Dictionaries
Appendix L: Slicing Values
Appendix M: Loops
Appendix N: Comprehensions
Appendix O: Functions
O.1 Default Parameters
O.2 Arbitrary Parameters
Appendix P: Ranges and Generators
Appendix Q: Multiple Assignment
Appendix R: numpy ndarray
Appendix S: Classes
Appendix T: Odo: The Shapeshifter
Index