Data Mining
- What is Data Mining?
- Statistical Limits of Data Mining
- Things Useful to Know
- Outline of the book
MapReduce and the New Software Stack
- Distributed File Systems
- MapReduce
- Algorithms Using MapReduce
- Extensions to MapReduce
- The Communication Cost Model
- Complexity Theory for MapReduce
Finding Similar Items
- Applications of Near-Neighbor Search
- Shingling of Documents
- Similarity-Preserving Summaries of Sets
- Locality-Sensitive Hashing for Documents
- Distance Measures
- The Theory of Locality-Sensitive Functions
- LSH Families for Other Distance Measures
- Applications of Locality-Sensitive Hashing
- Methods for High Degrees of Similarity
Mining Data Streams
- The Stream Data Model
- Sampling Data in a Stream
- Filtering Streams
- Counting Distinct Elements in a Stream
- Estimating Moments
- Counting Ones in a Window
- Decaying Windows
Link Analysis
- PageRank
- Efficient Computation of PageRank
- Topic-Sensitive PageRank
- Link Spam
- Hubs and Authorities
Frequent Itemsets
- The Market-Basket Model
- Market Baskets and the A-Priori Algorithm
- Handling Larger Datasets in Main Memory
- Limited-Pass Algorithms
- Counting Frequent Items in a Stream
Clustering
- Introduction to Clustering Techniques
- Hierarchical Clustering
- K-means Algorithms
- The CURE Algorithm
- Clustering in Non-Euclidean Spaces
- Clustering for Streams and Parallelism
Advertising on the Web
- Issues in On-Line Advertising
- On-Line Algorithms
- The Matching Problem
- The Adwords Problem
- Adwords Implementation
Recommendation Systems
- A Model for Recommendation Systems
- Content-Based Recommendations
- Collaborative Filtering
- Dimensionality Reduction
Mining Social-Network Graphs
- Social Networks as Graphs
- Clustering of Social-Network Graphs
- Direct Discovery of Communities
- Partitioning of Graphs
- Finding Overlapping Communities
- Simrank
- Counting Triangles
- Neighborhood Properties of Graphs
Dimensionality Reduction
- Eigenvalues and Eigenvectors of Symmetric Matrices
- Principal-Component Analysis
- Singular-Value Decomposition
- CUR Decomposition
Large-Scale Machine Learning
- The Machine-Learning Model
- Perceptrons
- Support-Vector Machines
- Learning from Nearest Neighbors
Source: Mining of Massive Datasets. Jure Leskovec; Anand Rajaraman; Jeffrey D. Ullman. Stanford University; Milliway Labs. infolab.stanford.edu