diff --git a/distributed_systems/README.md b/distributed_systems/README.md index 6e15f13..d42dc33 100644 --- a/distributed_systems/README.md +++ b/distributed_systems/README.md @@ -4,7 +4,7 @@ * [:scroll:](a-note-on-distributed-computing.pdf) [A Note on Distributed Computing](http://www.eecs.harvard.edu/~waldo/Readings/waldo-94.pdf) -* [A simple totally ordered broadcast protocol](http://labs.yahoo.com/files/ladis08.pdf) +* [A simple totally ordered broadcast protocol](http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf) * [Above the Clouds: A Berkeley View of Cloud Computing](http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf) @@ -142,7 +142,8 @@ By solving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_cond * :scroll: [Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms](consistent-global-states-of-distributed-systems-fundamental-concepts-and-mechanisms.pdf) -* :scroll: [Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](consistent-hashing-and-random-trees.pdf) +* :scroll: [Consistent Hashing and Random Trees: +Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](consistent-hashing-and-random-trees.pdf) * :scroll: [Copysets: Reducing the Frequency of Data Loss in Cloud Storage](copysets-reducing-the-frequency-of-data-loss-in-cloud-storage.pdf) @@ -212,7 +213,8 @@ By solving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_cond * :scroll: [Signal/Collect: Graph Algorithms for the (Semantic) Web](signal-%26-collect-graph-algorithms-for-the-\(semantic\)-web.pdf) -* :scroll: [Slution of a Problem in Concurrent Programming Control](solution-of-a-problem-in-concurrent-programming-control.pdf) +* :scroll: [Slution of a Problem in +Concurrent Programming Control](solution-of-a-problem-in-concurrent-programming-control.pdf) * :scroll: [Spanner: 
Google’s Globally-Distributed Database](spanner-google's-globally-distributed-database.pdf) @@ -222,7 +224,8 @@ By solving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_cond * :scroll: [The Akamai Network: A Platform for High-Performance Internet Applications](the-akamai-network.pdf) -* :scroll: [The Dining CryptographersProblem: Unconditional Sender and Recipient Untraceability](the-dining-cryptographers-problem.pdf) +* :scroll: [The Dining CryptographersProblem: +Unconditional Sender and Recipient Untraceability](the-dining-cryptographers-problem.pdf) * :scroll: [Tor: The Second-Generation Onion Router](tor-the-second-generation-onion-router.pdf) @@ -236,7 +239,8 @@ By solving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_cond * :scroll: [Warp: Multi-Key Transactions for Key-Value Stores](warp-multi-key-transactions-for-key-value-stores.pdf) -* :scroll: [Zab: High-performance broadcast for primary-backup systems ](zab-high-performance-broadcast-for-primary-backup-systems.pdf) +* :scroll: [Zab: High-performance broadcast for primary-backup systems +](zab-high-performance-broadcast-for-primary-backup-systems.pdf) * :scroll: [ZooKeeper: Wait-free coordination for Internet-scale systems](zookeeper-wait-free-coordination-for-internet-scale-systems.pdf) diff --git a/information_retrieval/README.md b/information_retrieval/README.md index 3df9c08..65ac807 100644 --- a/information_retrieval/README.md +++ b/information_retrieval/README.md @@ -17,7 +17,17 @@ The included documents are significantly better results that popular existing models. This paper won a honorable mention at CIKM 2013. 
-* [The Anatomy of a Large-Scale Hypertextual Web Search Engine](http://infolab.stanford.edu/~backrub/google.html) +* [:scroll:](pagerank.pdf) [PageRank Algorithm](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf) - Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd + + This paper introduces the PageRank algorithm, which forms the backbone of + the present-day Google search engine. PageRank ranks web pages by the link + structure of the web: a page's rank depends on the number and the ranks of + the pages that link to it. The authors also implemented PageRank in the + BackRub system (now called the Google search engine), described in + [The Anatomy of a Large-Scale Hypertextual Web Search Engine](http://infolab.stanford.edu/~backrub/google.html), + which assigned a rank to the web pages it crawled. Google is currently + the most commercially successful general-purpose search engine in the + world. * [:scroll:](ocapi-trec3.pdf) [Okapi System](http://trec.nist.gov/pubs/trec3/papers/city.ps.gz) - Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford @@ -34,3 +44,19 @@ The included documents are used in BM25. BM25 has been shown to be one of the best probabilistic weighting schemes. While the paper was in postscript form, the committer has changed the format to pdf as per guidelines of papers we love via ps2pdf. + +* [:scroll:](hits.pdf) [HITS Algorithm](https://www.cs.cornell.edu/home/kleinber/auth.pdf) - Jon M. Kleinberg + + This paper introduces the HITS algorithm, a link analysis algorithm that rates + web pages. Unlike the more famous PageRank algorithm, the HITS algorithm + distinguishes two roles a web page can play and classifies pages as hubs and authorities. + A page is authoritative (in the sense that it has a large number of incoming links) + or acts as a hub (a directory of sorts, which can be measured by the number of outgoing + links). 
The HITS algorithm computes two scores for each page (an authority score and a hub score): + it iteratively sets a page's hub score to the sum of the authority scores of the pages + it links to, and its authority score to the sum of the hub scores of the pages that link + to it, until the scores converge. These scores can then be used to rank documents. While + this algorithm is well known in academia, it is not widely used in industry + (a variant of this algorithm was used by a company called Teoma, which was acquired + by Ask Jeeves). + diff --git a/information_retrieval/hits.pdf b/information_retrieval/hits.pdf new file mode 100644 index 0000000..590b6be Binary files /dev/null and b/information_retrieval/hits.pdf differ diff --git a/information_retrieval/pagerank.pdf b/information_retrieval/pagerank.pdf new file mode 100644 index 0000000..0523ae0 Binary files /dev/null and b/information_retrieval/pagerank.pdf differ
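
Note on the HITS description added above: the iterative hub/authority update it describes can be sketched roughly as follows. This is a minimal illustration and not code from the paper or from this repository. The function names `hits` and `_normalize`, the dict-of-lists graph representation, and the `max_iter` and `tol` parameters are all assumptions made for the example. The scores are rescaled to unit length after every update so they stay bounded, a step the README text does not mention.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(graph, max_iter=100, tol=1e-9):
    """Return (hub, authority) score dicts for a link graph.

    graph maps each page to the pages it links to. This representation
    is an assumption of the sketch.
    """
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    # Reverse index: for each page, the pages that link to it.
    incoming = {page: [] for page in pages}
    for src, targets in graph.items():
        for dst in targets:
            incoming[dst].append(src)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(max_iter):
        # Authority score: sum of hub scores of pages linking in.
        new_auth = _normalize(
            {p: sum(hub[q] for q in incoming[p]) for p in pages}
        )
        # Hub score: sum of authority scores of pages linked to.
        new_hub = _normalize(
            {p: sum(new_auth[q] for q in graph.get(p, ())) for p in pages}
        )
        # Total absolute change across both score vectors.
        delta = sum(
            abs(new_auth[p] - auth[p]) + abs(new_hub[p] - hub[p])
            for p in pages
        )
        hub, auth = new_hub, new_auth
        if delta < tol:
            break

    return hub, auth
```

For example, with `hits({"a": ["c"], "b": ["c"], "c": []})`, page `c` ends up as the only authority and pages `a` and `b` share the hub weight equally.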