papers-we-love_papers-we-love/machine_learning/README.md

# Machine Learning

## External Papers

* [Top 10 algorithms in data mining](http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf) - While it is difficult to identify the top 10, this paper contains 10 very important data mining/machine learning algorithms
* [A Few Useful Things to Know about Machine Learning](http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf) - Just like the title says, it contains many useful tips and gotchas for machine learning
* [Random Forests](https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf) - The initial paper on random forests
* [Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data](http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers) - The paper introducing conditional random fields as a framework for building probabilistic models.
* [Support-Vector Networks](http://rd.springer.com/content/pdf/10.1007%2FBF00994018.pdf) - The initial paper on support-vector networks for classification.
* [The Fast Johnson-Lindenstrauss Transforms](https://www.cs.princeton.edu/~chazelle/pubs/FJLT-sicomp09.pdf)

    The Johnson-Lindenstrauss transform (JLT) prescribes that there exists a matrix of size `k x d`, where `k = O(1/eps^2 log d)` such that with high probability, a matrix A drawn from this distribution preserves pairwise distances up to epsilon (e.g. `(1-eps) * ||x-y|| < ||Ax - Ay|| < (1+eps) ||x-y||`). This paper was the first paper to show that you can actually compute the JLT in less that `O(kd)` operations (e.g. you don't need to do the full matrix multiplication). They used their faster algorithm to construct one of the fastest known approximate nearest neighbor algorithms.

    *Ailon, Nir, and Bernard Chazelle. "The fast Johnson-Lindenstrauss transform and approximate nearest neighbors." SIAM Journal on Computing 39.1 (2009): 302-322. Available: https://www.cs.princeton.edu/~chazelle/pubs/FJLT-sicomp09.pdf*

* [Applications of Machine Learning to Location Data](http://www.berkkapicioglu.com/wp-content/uploads/2013/11/thesis_final.pdf) -  Using machine learning to design and analyze novel algorithms that leverage location data.

* ["Why Should I Trust You?" Explaining the Predictions of Any Classifier](http://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf) - This paper introduces an explanation technique for any classifier in a interpretable manner. 

## Hosted Papers

* :scroll: **[A Sparse Johnson-Lindenstrauss Transform](dimensionality_reduction/a-sparse-johnson-lindenstrauss-transform.pdf)**

    The JLT is still computationally expensive for a lot of applications and one goal would be to minimize the overall operations needed to do the aforementioned matrix multiplication. This paper showed that a goal of a `O(k log d)` algorithm (e.g. `(log(d))^2)` may be attainable by showing that very sparse, structured random matrices could provide the *JL* guarantee on pairwise distances.

    *Dasgupta, Anirban, Ravi Kumar, and Tamás Sarlós. "A sparse johnson: Lindenstrauss transform." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010. Available: [arXiv/cs/1004:4240](http://arxiv.org/abs/1004.4240)*

* :scroll: **[Towards a unified theory of sparse dimensionality reduction in Euclidean space](dimensionality_reduction/toward-a-unified-theory-of-sparse-dimensionality-reduction-in-euclidean-space.pdf)**

    This paper attempts to layout the generic mathematical framework (in terms of convex analysis and functional analysis) for sparse dimensionality reduction. The first author is a Fields Medalist who is interested in taking techniques for Banach Spaces and applying them to this problem. This paper is a very technical paper that attempts to answer the question, "when does a sparse embedding exist deterministically?" (e.g. doesn't require drawing random matrices).

    *Bourgain, Jean, and Jelani Nelson. "Toward a unified theory of sparse dimensionality reduction in euclidean space." arXiv preprint arXiv:1311.2542; Accepted in an AMS Journal but unpublished at the moment  (2013). Available: http://arxiv.org/abs/1311.2542*
remove paper with prohibitive copyright Also, fix readme formatting in two files. 2015-06-17 17:00:13 +00:00			`# Machine Learning`
Create README.md just the initial folder structure Update README.md added 3 links Update README.md update links to academic sites 2014-04-05 17:59:48 +00:00
			`## External Papers`

			`* [Top 10 algorithms in data mining](http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf) - While it is difficult to identify the top 10, this paper contains 10 very important data mining/machine learning algorithms`
			`* [A Few Useful Things to Know about Machine Learning](http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf) - Just like the title says, it contains many useful tips and gotchas for machine learning`
Updated Random forests paper location the old one was not responding and https://www.stat.berkeley.edu/~breiman/papers.html links to new location. 2016-03-05 14:56:22 +00:00			`* [Random Forests](https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf) - The initial paper on random forests`
Fix machine learning paper link and spelling (#419) 2016-09-29 21:45:45 +00:00			`* [Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data](http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers) - The paper introducing conditional random fields as a framework for building probabilistic models.`
Add 'Support-Vector Networks' paper. 2015-04-08 23:01:58 +00:00			`* [Support-Vector Networks](http://rd.springer.com/content/pdf/10.1007%2FBF00994018.pdf) - The initial paper on support-vector networks for classification.`
machine learning additions. sublinear README. The Fast Johnson-Lindenstrauss Transform A Sparse Johnson-Lindenstrauss Transform Towards a unified theory of sparse dimensionality reduction in Euclidean space 2015-05-30 22:26:17 +00:00			`* [The Fast Johnson-Lindenstrauss Transforms](https://www.cs.princeton.edu/~chazelle/pubs/FJLT-sicomp09.pdf)`
Create README.md just the initial folder structure Update README.md added 3 links Update README.md update links to academic sites 2014-04-05 17:59:48 +00:00
machine learning additions. sublinear README. The Fast Johnson-Lindenstrauss Transform A Sparse Johnson-Lindenstrauss Transform Towards a unified theory of sparse dimensionality reduction in Euclidean space 2015-05-30 22:26:17 +00:00			The Johnson-Lindenstrauss transform (JLT) prescribes that there exists a matrix of size `k x d`, where `k = O(1/eps^2 log d)` such that with high probability, a matrix A drawn from this distribution preserves pairwise distances up to epsilon (e.g. `(1-eps) * \|\|x-y\|\| < \|\|Ax - Ay\|\| < (1+eps) \|\|x-y\|\|`). This paper was the first paper to show that you can actually compute the JLT in less that `O(kd)` operations (e.g. you don't need to do the full matrix multiplication). They used their faster algorithm to construct one of the fastest known approximate nearest neighbor algorithms.

			`Ailon, Nir, and Bernard Chazelle. "The fast Johnson-Lindenstrauss transform and approximate nearest neighbors." SIAM Journal on Computing 39.1 (2009): 302-322. Available: https://www.cs.princeton.edu/~chazelle/pubs/FJLT-sicomp09.pdf`

Fix machine learning paper link and spelling (#419) 2016-09-29 21:45:45 +00:00			`* [Applications of Machine Learning to Location Data](http://www.berkkapicioglu.com/wp-content/uploads/2013/11/thesis_final.pdf) - Using machine learning to design and analyze novel algorithms that leverage location data.`
Added Applications of Machine Learning to Location Data 2015-09-11 01:00:03 +00:00
Adding the trusting classifiers paper from the Seattle papers we love (#466) chapter, presented on July 2017. 2017-07-09 20:26:39 +00:00			`* ["Why Should I Trust You?" Explaining the Predictions of Any Classifier](http://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf) - This paper introduces an explanation technique for any classifier in a interpretable manner.`

machine learning additions. sublinear README. The Fast Johnson-Lindenstrauss Transform A Sparse Johnson-Lindenstrauss Transform Towards a unified theory of sparse dimensionality reduction in Euclidean space 2015-05-30 22:26:17 +00:00			`## Hosted Papers`

Update to all READMEs for hosted content reorganization of so-called historical papers 2015-10-07 19:11:04 +00:00			`* :scroll: [A Sparse Johnson-Lindenstrauss Transform](dimensionality_reduction/a-sparse-johnson-lindenstrauss-transform.pdf)`
machine learning additions. sublinear README. The Fast Johnson-Lindenstrauss Transform A Sparse Johnson-Lindenstrauss Transform Towards a unified theory of sparse dimensionality reduction in Euclidean space 2015-05-30 22:26:17 +00:00
			The JLT is still computationally expensive for a lot of applications and one goal would be to minimize the overall operations needed to do the aforementioned matrix multiplication. This paper showed that a goal of a `O(k log d)` algorithm (e.g. `(log(d))^2)` may be attainable by showing that very sparse, structured random matrices could provide the JL guarantee on pairwise distances.

			`Dasgupta, Anirban, Ravi Kumar, and Tamás Sarlós. "A sparse johnson: Lindenstrauss transform." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010. Available: [arXiv/cs/1004:4240](http://arxiv.org/abs/1004.4240)`

Update to all READMEs for hosted content reorganization of so-called historical papers 2015-10-07 19:11:04 +00:00			`* :scroll: [Towards a unified theory of sparse dimensionality reduction in Euclidean space](dimensionality_reduction/toward-a-unified-theory-of-sparse-dimensionality-reduction-in-euclidean-space.pdf)`
machine learning additions. sublinear README. The Fast Johnson-Lindenstrauss Transform A Sparse Johnson-Lindenstrauss Transform Towards a unified theory of sparse dimensionality reduction in Euclidean space 2015-05-30 22:26:17 +00:00
			`This paper attempts to layout the generic mathematical framework (in terms of convex analysis and functional analysis) for sparse dimensionality reduction. The first author is a Fields Medalist who is interested in taking techniques for Banach Spaces and applying them to this problem. This paper is a very technical paper that attempts to answer the question, "when does a sparse embedding exist deterministically?" (e.g. doesn't require drawing random matrices).`

			`Bourgain, Jean, and Jelani Nelson. "Toward a unified theory of sparse dimensionality reduction in euclidean space." arXiv preprint arXiv:1311.2542; Accepted in an AMS Journal but unpublished at the moment (2013). Available: http://arxiv.org/abs/1311.2542`
Create README.md just the initial folder structure Update README.md added 3 links Update README.md update links to academic sites 2014-04-05 17:59:48 +00:00