Computational Analysis of Communication
Author: W van Atteveldt; Language: English; Paperback, 2 Mar 2022
Price: 347.39 lei
New
Express Points: 521
Estimated price in foreign currency:
66.49€ • 69.86$ • 55.88£
Book available
Economy delivery: 18 February-04 March
Express delivery: 04-08 February for 67.23 lei
Phone orders: 021 569.72.76
Specifications
ISBN-13: 9781119680239
ISBN-10: 1119680239
Pages: 336
Dimensions: 178 x 253 x 19 mm
Weight: 0.68 kg
Publisher: Wiley
Place of publication: Hoboken, United States
Contents
Preface xi
Acknowledgments xiii
1 Introduction 1
1.1 The Role of Computational Analysis in the Social Sciences 1
1.2 Why Python and/or R? 3
1.3 How to Use This Book 4
1.4 Installing R and Python 5
1.4.1 Installing R and RStudio 7
1.4.2 Installing Python and Jupyter Notebook 9
1.5 Installing Third-Party Packages 12
2 Getting Started: Fun with Data and Visualizations 13
2.1 Fun With Tweets 14
2.2 Fun With Textual Data 15
2.3 Fun With Visualizing Geographic Information 17
2.4 Fun With Networks 19
3 Programming Concepts for Data Analysis 23
3.1 About Objects and Data Types 24
3.1.1 Storing Single Values: Integers, Floating-Point Numbers, Booleans 25
3.1.2 Storing Text 26
3.1.3 Combining Multiple Values: Lists, Vectors, And Friends 28
3.1.4 Dictionaries 32
3.1.5 From One to More Dimensions: Matrices and n-Dimensional Arrays 33
3.1.6 Making Life Easier: Data Frames 34
3.2 Simple Control Structures: Loops and Conditions 35
3.2.1 Loops 36
3.2.2 Conditional Statements 37
3.3 Functions and Methods 39
4 How to Write Code 43
4.1 Re-using Code: How Not to Re-Invent the Wheel 43
4.2 Understanding Errors and Getting Help 46
4.2.1 Error Messages 46
4.2.2 Debugging Strategies 48
4.3 Best Practice: Beautiful Code, GitHub, and Notebooks 49
5 From File to Data Frame and Back 55
5.1 Why and When Do We Use Data Frames? 56
5.2 Reading and Saving Data 57
5.2.1 The Role of Files 57
5.2.2 Encodings and Dialects 59
5.2.3 File Handling Beyond Data Frames 61
5.3 Data from Online Sources 62
6 Data Wrangling 65
6.1 Filtering, Selecting, and Renaming 66
6.2 Calculating Values 67
6.3 Grouping and Aggregating 69
6.3.1 Combining Multiple Operations 70
6.3.2 Adding Summary Values 71
6.4 Merging Data 72
6.4.1 Equal Units of Analysis 72
6.4.2 Inner and Outer Joins 75
6.4.3 Nested Data 76
6.5 Reshaping Data: Wide To Long And Long To Wide 78
6.6 Restructuring Messy Data 79
7 Exploratory Data Analysis 83
7.1 Simple Exploratory Data Analysis 84
7.2 Visualizing Data 87
7.2.1 Plotting Frequencies and Distributions 88
7.2.2 Plotting Relationships 92
7.2.3 Plotting Geospatial Data 98
7.2.4 Other Possibilities 99
7.3 Clustering and Dimensionality Reduction 100
7.3.1 k-means Clustering 101
7.3.2 Hierarchical Clustering 102
7.3.3 Principal Component Analysis and Singular Value Decomposition 106
8 Statistical Modeling and Supervised Machine Learning 113
8.1 Statistical Modeling and Prediction 115
8.2 Concepts and Principles 117
8.3 Classical Machine Learning: From Naïve Bayes to Neural Networks 122
8.3.1 Naïve Bayes 122
8.3.2 Logistic Regression 124
8.3.3 Support Vector Machines 125
8.3.4 Decision Trees and Random Forests 127
8.3.5 Neural Networks 129
8.4 Deep Learning 130
8.4.1 Convolutional Neural Networks 131
8.5 Validation and Best Practices 133
8.5.1 Finding a Balance Between Precision and Recall 133
8.5.2 Train, Validate, Test 137
8.5.3 Cross-validation and Grid Search 138
9 Processing Text 141
9.1 Text as a String of Characters 142
9.1.1 Methods for Dealing With Text 144
9.2 Regular Expressions 145
9.2.1 Regular Expression Syntax 146
9.2.2 Example Patterns 147
9.3 Using Regular Expressions in Python and R 150
9.3.1 Splitting and Joining Strings, and Extracting Multiple Matches 151
10 Text as Data 155
10.1 The Bag of Words and the Term-Document Matrix 156
10.1.1 Tokenization 157
10.1.2 The DTM as a Sparse Matrix 159
10.1.3 The DTM as a "Bag of Words" 162
10.1.4 The (Unavoidable) Word Cloud 163
10.2 Weighting and Selecting Documents and Terms 164
10.2.1 Removing Stop Words 165
10.2.2 Removing Punctuation and Noise 167
10.2.3 Trimming a DTM 170
10.2.4 Weighting a DTM 171
10.3 Advanced Representation of Text 172
10.3.1 n-grams 173
10.3.2 Collocations 174
10.3.3 Word Embeddings 176
10.3.4 Linguistic Preprocessing 177
10.4 Which Preprocessing to Use? 182
11 Automatic Analysis of Text 184
11.1 Deciding on the Right Method 185
11.2 Obtaining a Review Dataset 187
11.3 Dictionary Approaches to Text Analysis 189
11.4 Supervised Text Analysis: Automatic Classification and Sentiment Analysis 191
11.4.1 Putting Together a Workflow 191
11.4.2 Finding the Best Classifier 194
11.4.3 Using the Model 198
11.4.4 Deep Learning 199
11.5 Unsupervised Text Analysis: Topic Modeling 203
11.5.1 Latent Dirichlet Allocation (LDA) 203
11.5.2 Fitting an LDA Model 206
11.5.3 Analyzing Topic Model Results 207
11.5.4 Validating and Inspecting Topic Models 208
11.5.5 Beyond LDA 209
12 Scraping Online Data 212
12.1 Using Web APIs: From Open Resources to Twitter 213
12.2 Retrieving and Parsing Web Pages 219
12.2.1 Retrieving and Parsing an HTML Page 219
12.2.2 Crawling Websites 223
12.2.3 Dynamic Web Pages 225
12.3 Authentication, Cookies, and Sessions 228
12.3.1 Authentication and APIs 228
12.3.2 Authentication and Webpages 229
12.4 Ethical, Legal, and Practical Considerations 230
13 Network Data 233
13.1 Representing and Visualizing Networks 234
13.2 Social Network Analysis 241
13.2.1 Paths and Reachability 242
13.2.2 Centrality Measures 246
13.2.3 Clustering and Community Detection 248
14 Multimedia Data 258
14.1 Beyond Text Analysis: Images, Audio and Video 259
14.2 Using Existing Libraries and APIs 261
14.3 Storing, Representing, and Converting Images 263
14.4 Image Classification 270
14.4.1 Basic Classification with Shallow Algorithms 272
14.4.2 Deep Learning for Image Analysis 273
14.4.3 Re-using an Open Source CNN 279
15 Scaling Up and Distributing 283
15.1 Storing Data in SQL and noSQL Databases 283
15.1.1 When to Use a Database 283
15.1.2 Choosing the Right Database 285
15.1.3 A Brief Example Using SQLite 286
15.2 Using Cloud Computing 286
15.3 Publishing Your Source 290
15.4 Distributing Your Software as Container 291
16 Where to Go Next 293
16.1 How Far Have We Come? 293
16.2 Where To Go Next? 294
16.3 Open, Transparent, and Ethical Computational Science 295
Bibliography 297
Index 303
Biographical note
Dr. Wouter van Atteveldt is an Associate Professor of Political Communication at Vrije Universiteit, Amsterdam. He is co-founder of the Computational Methods division of the International Communication Association, and Founding Chief Editor of Computational Communication Research. He has published extensively on innovative methods for analyzing political text and contributed to a number of relevant R and Python packages.
Dr. Damian Trilling is an Associate Professor, Department of Communication Science, at the University of Amsterdam, and Associate Editor of Computational Communication Research. His research uses computational methods such as the analysis of digital trace data and large-scale text analysis to study the use and effects of news media. He has developed extensive teaching materials to introduce social scientists to the Python programming language.
Dr. Carlos Arcila Calderón is an Associate Professor, Department of Sociology and Communication, at the University of Salamanca, Chief Editor of the journal Disertaciones, and member of the Editorial Board of Computational Communication Research. He has published extensively on new media and social media studies, and has led the prototype Autocop, a Spark-based environment to run distributed supervised sentiment analysis of Twitter messages.