Top 10 algorithms in data mining

While it is difficult to identify a definitive top 10, this paper presents 10 very important data mining/machine learning algorithms.
A Few Useful Things to Know about Machine Learning

Just as the title says, this paper contains many useful tips and gotchas for machine learning.
Random Forests

The original paper on random forests.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

The paper introducing conditional random fields as a framework for building probabilistic models.
Support-Vector Networks

The initial paper on support-vector networks for classification.
The Fast Johnson-Lindenstrauss Transforms

The Johnson-Lindenstrauss lemma states that, for any set of n points in d dimensions, there is a distribution over matrices of size k x d, where k = O(1/eps^2 log n), such that with high probability a matrix A drawn from this distribution preserves all pairwise distances up to a factor of eps (i.e. (1-eps) * ||x-y|| < ||Ax - Ay|| < (1+eps) * ||x-y||). Such a matrix is called a Johnson-Lindenstrauss transform (JLT). This paper was the first to show that you can compute the JLT in fewer than O(kd) operations per point (i.e. you don't need to do the full matrix-vector multiplication). The authors used their faster algorithm to construct one of the fastest known approximate nearest neighbor algorithms.

Ailon, Nir, and Bernard Chazelle. "The fast Johnson-Lindenstrauss transform and approximate nearest neighbors." SIAM Journal on Computing 39.1 (2009): 302-322. Available: https://www.cs.princeton.edu/~chazelle/pubs/FJLT-sicomp09.pdf
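
The distance-preservation guarantee can be checked empirically. The sketch below is a minimal illustration, not the fast transform from the paper: it assumes the plain dense Gaussian construction (i.i.d. N(0, 1/k) entries), which costs the full O(kd) per point that the paper improves on. The function names and the sizes d, k, and n are hypothetical choices made for this example.

```python
import math
import random


def gaussian_projection(d, k, seed=0):
    """Draw a k x d matrix with i.i.d. N(0, 1/k) entries (dense JL construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(A, x):
    """Compute the matrix-vector product A x in O(kd) operations."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dist(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def max_distortion(points, A):
    """Largest | ||Ax - Ay|| / ||x - y|| - 1 | over all pairs of points."""
    projected = [project(A, p) for p in points]
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = dist(projected[i], projected[j]) / dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


# Hypothetical sizes: 20 random points in 500 dimensions, projected down to 200.
rng = random.Random(1)
d, k, n = 500, 200, 20
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
A = gaussian_projection(d, k)
distortion = max_distortion(points, A)
print(f"max pairwise distortion: {distortion:.3f}")
```

The printed value plays the role of eps for this particular draw. Increasing k should shrink it, roughly in proportion to 1/sqrt(k).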
Applications of Machine Learning to Location Data

Using machine learning to design and analyze novel algorithms that leverage location data.
"Why Should I Trust You?" Explaining the Predictions of Any Classifier

This paper introduces LIME, a technique that explains the predictions of any classifier in an interpretable manner.
Multiple Narrative Disentanglement: Unraveling Infinite Jest

Uses an unsupervised approach to natural language processing that classifies narrators in David Foster Wallace's 1,000-page novel.
ImageNet Classification with Deep Convolutional Neural Networks

This paper introduces AlexNet, a neural network architecture that dramatically improved on the state of the art in image classification and is widely regarded as a breakthrough moment for deep learning.
Interpretable machine learning: definitions, methods, and applications

This paper introduces the foundations of the rapidly emerging field of interpretable machine learning.
Distilling the Knowledge in a Neural Network

This seminal paper introduces a method to distill information from an ensemble of neural networks into a single model.
Truncation of Wavelet Matrices: Edge Effects and the Reduction of Topological Control by Freedman

In this paper, Michael Hartley Freedman applies Robion Kirby's “torus trick”, via wavelets, to the problem of compression.

Hosted Papers

📜 A Sparse Johnson-Lindenstrauss Transform

The JLT is still computationally expensive for many applications, so one goal is to minimize the number of operations needed for the matrix multiplication that applies it. This paper showed that an O(k log d) algorithm (i.e. roughly (log(d))^2 when k is on the order of log d) may be attainable, by showing that very sparse, structured random matrices can provide the JL guarantee on pairwise distances.

Dasgupta, Anirban, Ravi Kumar, and Tamás Sarlós. "A sparse Johnson-Lindenstrauss transform." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010. Available: arXiv:1004.4240
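
To see why sparsity helps, here is a minimal sketch of a sparse random projection. It does not reproduce the construction from the paper above. Instead it assumes the simpler Achlioptas-style sparse sign matrix, whose entries are +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each and 0 otherwise. The function names, the sparsity parameter s, and the sizes are hypothetical choices for illustration.

```python
import math
import random


def sparse_sign_matrix(d, k, s=3, seed=0):
    """Build a k x d sparse sign matrix, stored row by row as (column, value) pairs.

    Each entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    and 0 with probability 1 - 1/s, so only about a 1/s fraction is stored.
    """
    rng = random.Random(seed)
    value = math.sqrt(s / k)
    rows = []
    for _ in range(k):
        row = []
        for j in range(d):
            u = rng.random()
            if u < 1.0 / (2 * s):
                row.append((j, value))
            elif u < 1.0 / s:
                row.append((j, -value))
        rows.append(row)
    return rows


def project_sparse(rows, x):
    """Apply the sparse matrix to x, touching only the stored nonzero entries."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in x))


# Hypothetical sizes: one random vector in 600 dimensions, projected down to 200.
rng = random.Random(1)
d, k = 600, 200
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
rows = sparse_sign_matrix(d, k)
y = project_sparse(rows, x)
ratio = norm(y) / norm(x)
density = sum(len(row) for row in rows) / (k * d)
print(f"norm ratio: {ratio:.3f}, fraction of nonzeros: {density:.3f}")
```

With s = 3, about a third of the entries are nonzero, so applying the matrix takes about a third of the multiplications of the dense case, and the norm ratio should still stay close to 1.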
📜 Toward a unified theory of sparse dimensionality reduction in Euclidean space

This paper attempts to lay out a general mathematical framework (in terms of convex analysis and functional analysis) for sparse dimensionality reduction. The first author is a Fields Medalist who is interested in taking techniques for Banach spaces and applying them to this problem. It is a very technical paper that attempts to answer the question, "when does a sparse embedding exist deterministically?" (i.e. without drawing random matrices).

Bourgain, Jean, and Jelani Nelson. "Toward a unified theory of sparse dimensionality reduction in Euclidean space." arXiv preprint arXiv:1311.2542; Accepted in an AMS Journal but unpublished at the moment (2013). Available: http://arxiv.org/abs/1311.2542
📜 Understanding Deep Convolutional Networks by Mallat

Stéphane Mallat proposes a model by which renormalization can identify self-similar structures in deep networks. This video of Curt McMullen discussing renormalization provides more context.
📜 General self-similarity: an overview by Leinster

Dr. Leinster's paper provides a concise, straightforward picture of self-similarity and its role in renormalization.