diff --git a/concurrency/experience-withprocesses-and-monitors-in-mesa.pdf b/concurrency/experience-withprocesses-and-monitors-in-mesa.pdf deleted file mode 100644 index 79a34c3..0000000 Binary files a/concurrency/experience-withprocesses-and-monitors-in-mesa.pdf and /dev/null differ diff --git a/distributed_systems/README.md b/distributed_systems/README.md index e15f634..4adab39 100644 --- a/distributed_systems/README.md +++ b/distributed_systems/README.md @@ -83,9 +83,9 @@ * :scroll: [Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays](beehive-lookup-performance-for-power-law-query-distributions-in-peer-to-peer-overlays.pdf) -* :scroll: [Byzantine Chain Replication](bizantine-chain-replication.pdf) +* :scroll: [Byzantine Chain Replication](byzantine-chain-replication.pdf) -* :scroll: [A Byzantine Fault Tolerant Distributed Commit Protocol](bizantine-fault-tolerant-distributed-commit-protocol.pdf) +* :scroll: [A Byzantine Fault Tolerant Distributed Commit Protocol](byzantine-fault-tolerant-distributed-commit-protocol.pdf) * :scroll: [Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services](brewers-conjecture.pdf) @@ -130,7 +130,7 @@ Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](con * :scroll: [Large-scale Incremental Processing Using Distributed Transactions and Notifications](large-scale-incremental-processing-using-distributed-transactions-and-notifications.pdf) -* :scroll: [Life beyond Distributed Transactions: an Apostate’s Opinion](life-beyoud-distributed-transactions-an-apostates-opinion.pdf) +* :scroll: [Life beyond Distributed Transactions: an Apostate’s Opinion](life-beyond-distributed-transactions-an-apostates-opinion.pdf) * :scroll: [MapReduce: Simplified Data Processing on Large Clusters](mapreduce-simplified-data-processing-on-large-clusters.pdf) @@ -154,7 +154,7 @@ Distributed Caching Protocols for Relieving Hot Spots on the World Wide 
Web](con * :scroll: [Signal/Collect: Graph Algorithms for the (Semantic) Web](signal-%26-collect-graph-algorithms-for-the-\(semantic\)-web.pdf) -* :scroll: [Slution of a Problem in +* :scroll: [Solution of a Problem in Concurrent Programming Control](solution-of-a-problem-in-concurrent-programming-control.pdf) * :scroll: [Sparse Partitions](sparse-partitions.pdf) @@ -260,7 +260,7 @@ Full Cluster Geo-replication](tiered-replication-a-cost-effective-alternative-to * :scroll: [“On the Electrodynamics of Moving Bodies” (1905) — Einstein](../physics/on-the-electrodynamics-of-moving-bodies.pdf) By solving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_conductor_problem) that arise in Maxwell’s equations, Einstein’s 1905 paper set the stage for current distributed systems work by demonstrating that there is no absolute frame of reference and by providing an upper bound on the speed of communication. - + ### Testing, Verification, and Correctness * :scroll: [Simple Testing Can Prevent Most Critical Failures: @@ -268,5 +268,3 @@ An Analysis of Production Failures in Distributed Data-Intensive Systems](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf) * :scroll: [IronFleet: Proving Practical Distributed Systems Correct](http://research.microsoft.com/pubs/255833/IronFleet-twocol.pdf) - - diff --git a/distributed_systems/bizantine-chain-replication.pdf b/distributed_systems/byzantine-chain-replication.pdf similarity index 100% rename from distributed_systems/bizantine-chain-replication.pdf rename to distributed_systems/byzantine-chain-replication.pdf diff --git a/distributed_systems/bizantine-fault-tolerant-distributed-commit-protocol.pdf b/distributed_systems/byzantine-fault-tolerant-distributed-commit-protocol.pdf similarity index 100% rename from distributed_systems/bizantine-fault-tolerant-distributed-commit-protocol.pdf rename to distributed_systems/byzantine-fault-tolerant-distributed-commit-protocol.pdf diff --git 
a/distributed_systems/life-beyoud-distributed-transactions-an-apostates-opinion.pdf b/distributed_systems/life-beyond-distributed-transactions-an-apostates-opinion.pdf similarity index 100% rename from distributed_systems/life-beyoud-distributed-transactions-an-apostates-opinion.pdf rename to distributed_systems/life-beyond-distributed-transactions-an-apostates-opinion.pdf diff --git a/information_retrieval/README.md b/information_retrieval/README.md index ab83fc9..bfacac0 100644 --- a/information_retrieval/README.md +++ b/information_retrieval/README.md @@ -17,35 +17,34 @@ The included documents are significantly better results that popular existing models. This paper won a honorable mention at CIKM 2013. -* [:scroll:](pagerank.pdf) [Pagerank Algorithm](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf) - Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd - - This paper introduces the PageRank algorithm, which forms the backbone of - the present day google search engine. Pagerank operates by assessing the - number of incoming and outgoing hyper links to a given web page and ranks the - pages based on the link structure of a page. The authors also implemented +* [:scroll:](the-pagerank-citation-ranking-bringing-order-to-the-web.pdf) [The PageRank Citation Ranking: Bringing Order to the Web](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf) - Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd + + This paper introduces the PageRank algorithm, which forms the backbone of + the present-day Google search engine. PageRank operates by assessing the + number of incoming and outgoing hyperlinks to a given web page and ranks the + pages based on the link structure of a page. 
The authors also implemented PageRank on the backrub system (now called the Google Search - Engine) in the [Anatomy of a Large-Scale Hypertextual Web Search Engine] + Engine) in the [Anatomy of a Large-Scale Hypertextual Web Search Engine] http://infolab.stanford.edu/~backrub/google.html which assigned page ranks to every webpage in the world wide web. Google is currently the most commercially - sucessful generic search engine in the world. + successful generic search engine in the world. -* [:scroll:](ocapi-trec3.pdf) [Okapi System](http://trec.nist.gov/pubs/trec3/papers/city.ps.gz) - Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford +* [:scroll:](okapi-at-trec3.pdf) [Okapi at TREC3](http://trec.nist.gov/pubs/trec3/papers/city.ps.gz) - Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford This paper introduces the now famous Okapi information retrieval - framework which introduces the BM25 ranking function for ranked + framework which introduces the BM25 ranking function for ranked retrieval. It is one of the first implementations of the probabilistic - retrieval frameworks in literature. BM25 is a bag of words retrieval + retrieval frameworks in literature. BM25 is a bag of words retrieval function. The IDF(Inverse document frequency) term can be interpreted via information theory. If a query q appears in n(q) docs the probability - of picking a doc randomly and it containing that term :p(q) = n(q) / D, - where D is the number of documents. The information content based on + of picking a doc randomly and it containing that term :p(q) = n(q) / D, + where D is the number of documents. The information content based on shannon's noisy channel model is = -log(p(q)) = log (D / n(q)). Smoothing by adding a constant to both numberator and demoninator leads to IDF term - used in BM25. BM25 has been shown to be one of the best probabilistic + used in BM25. 
BM25 has been shown to be one of the best probabilistic weighting schemes. While the paper was in postscript form, the committer has changed the format to pdf as per guidelines of papers we love via ps2pdf. -* [:scroll:](hits.pdf) [Hits Algorithm](https://www.cs.cornell.edu/home/kleinber/auth.pdf) - Jon M. Kleinberg +* [:scroll:](authoritative-sources-in-a-hyperlinked-environment.pdf) [Authoritative Sources in a Hyperlinked Environment](https://www.cs.cornell.edu/home/kleinber/auth.pdf) - Jon M. Kleinberg -This paper introduces the HITS algorithm, a link analysis algorithm that rates webpages. Unlike the more famous page rank algorithm, the hits algorithm makes a distinction between webpage behavior classifies them as hubs and authorities. A page is authoratitative (in the sense the page has a large number of incoming links) or acts as a hub (a directory of sort, which can be measured by the number of outgoing link). The hits algorithm computes two scores for a page (authority and hub score) where the algorithm iteratively computes the hub score as sum of authority scores of outgoing links and authority scores as sum of hub scores of incoming links until a convergence is attained. These scores can then be used to rank documents. While this algorithm is famous in academia, its not very widely used in the industry (a variant of this algorithm was used by a company called Teoma which was acquired by AskJeeves) - +This paper introduces the **HITS algorithm**, a link analysis algorithm that rates webpages. Unlike the more famous PageRank algorithm, the HITS algorithm distinguishes between two kinds of webpage behavior and classifies pages as hubs or authorities. A page is authoritative (in the sense that it has a large number of incoming links) or acts as a hub (a directory of sorts, which can be measured by the number of outgoing links). 
The HITS algorithm computes two scores for a page (an authority score and a hub score): it iteratively computes the hub score as the sum of the authority scores of the pages it links to, and the authority score as the sum of the hub scores of the pages linking to it, until convergence is attained. These scores can then be used to rank documents. While this algorithm is famous in academia, it is not very widely used in industry (a variant of this algorithm was used by a company called Teoma, which was acquired by AskJeeves). diff --git a/information_retrieval/README.md~ b/information_retrieval/README.md~ deleted file mode 100644 index 385cca4..0000000 --- a/information_retrieval/README.md~ +++ /dev/null @@ -1,51 +0,0 @@ -## Information Retrieval - -Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. (Says Wikipedia). - -The included documents are - -* [:scroll:](graph_of_word_and_tw_idf.pdf) [Graph of Word and TW-IDF](http://www.lix.polytechnique.fr/~rousseau/papers/rousseau-cikm2013.pdf) - Francois Rousseau & Michalis Vazirgiannis - - The traditional IR system stores term-specific statistics (typically - a term's frequency in each document - which we call TF) in an index. - Such a model ignores dependencies between terms and considers a - document's terms to occur independently of each other (and is aptly - called the bag-of-words model). In this paper the authors use a - statistic that uses a graph representation of a document to encode - dependencies between terms and replace the TF statistic with a new - TW statistic based on the graph constructed and achieve - significantly better results that popular existing models. This - paper won a honorable mention at CIKM 2013. 
- -* [:scroll:](pagerank.pdf) [Pagerank Algorithm](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf) - Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd - - This paper introduces the PageRank algorithm, which forms the backbone of - the present day google search engine. Pagerank operates by assessing the - number of incoming and outgoing hyper links to a given web page and ranks the - pages based on the link structure of a page. The authors also implemented - PageRank on the backrub system (now called the Google Search - Engine) in the [Anatomy of a Large-Scale Hypertextual Web Search Engine] - http://infolab.stanford.edu/~backrub/google.html which assigned page ranks to - every webpage in the world wide web. Google is currently the most commercially - sucessful generic search engine in the world. - -* [:scroll:](ocapi-trec3.pdf) [Okapi System](http://trec.nist.gov/pubs/trec3/papers/city.ps.gz) - Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford - - This paper introduces the now famous Okapi information retrieval - framework which introduces the BM25 ranking function for ranked - retrieval. It is one of the first implementations of the probabilistic - retrieval frameworks in literature. BM25 is a bag of words retrieval - function. The IDF(Inverse document frequency) term can be interpreted - via information theory. If a query q appears in n(q) docs the probability - of picking a doc randomly and it containing that term :p(q) = n(q) / D, - where D is the number of documents. The information content based on - shannon's noisy channel model is = -log(p(q)) = log (D / n(q)). Smoothing - by adding a constant to both numberator and demoninator leads to IDF term - used in BM25. BM25 has been shown to be one of the best probabilistic - weighting schemes. While the paper was in postscript form, the committer has - changed the format to pdf as per guidelines of papers we love via ps2pdf. 
- -* [:scroll:](hits.pdf) [Hits Algorithm](https://www.cs.cornell.edu/home/kleinber/auth.pdf) - Jon M. Kleinberg - -This paper introduces the HITS algorithm, a link analysis algorithm that rates webpages. Unlike the more famous page rank algorithm, the hits algorithm makes a distinction between webpage behavior classifies them as hubs and autho rities. A page is authoratitative (in the sense the page has a large number of incoming links) or acts as a hub (a directory of sort, which can be measured by the number of outgoing link). The hits algorithm computes two scores for a page (authority and hub score) where the algorithm iteratively computes the hub score as sum of authority scores of outgoing links and authority scores as sum of hub scores of incoming links until a convergence is attained. These scores can then be used to rank documents. While this algorithm is famous in academia, its not very widely used in the industry (a variant of this algorithm was used by a company called Teoma which was acquired by AskJeeves) - diff --git a/information_retrieval/hits.pdf b/information_retrieval/authoritative-sources-in-a-hyperlinked-environment.pdf similarity index 100% rename from information_retrieval/hits.pdf rename to information_retrieval/authoritative-sources-in-a-hyperlinked-environment.pdf diff --git a/information_retrieval/ocapi-trec3.pdf b/information_retrieval/okapi-at-trec3.pdf similarity index 100% rename from information_retrieval/ocapi-trec3.pdf rename to information_retrieval/okapi-at-trec3.pdf diff --git a/information_retrieval/pagerank.pdf b/information_retrieval/the-pagerank-citation-ranking-bringing-order-to-the-web.pdf similarity index 100% rename from information_retrieval/pagerank.pdf rename to information_retrieval/the-pagerank-citation-ranking-bringing-order-to-the-web.pdf