# Distributed Systems
## External Papers
* [:scroll:](a-note-on-distributed-computing.pdf) [A Note on Distributed Computing](http://www.eecs.harvard.edu/~waldo/Readings/waldo-94.pdf)
* [A simple totally ordered broadcast protocol](http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf)
* [Above the Clouds: A Berkeley View of Cloud Computing](http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf)
* The Calvin papers:
  * [The Case for Determinism in Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/determinism-vldb10.pdf)
  * [Consistency Tradeoffs in Modern Distributed Database System Design](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf)
  * [Modularity and Scalability in Calvin](http://sites.computer.org/debull/A13june/calvin1.pdf)
  * [Calvin: Fast Distributed Transactions for Partitioned Database Systems](http://www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.pdf)
  * [Lightweight Locking for Main Memory Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/vll-vldb13.pdf)
* [Cassandra - A Decentralized Structured Storage System](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.6751&rep=rep1&type=pdf)
* [Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications](http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf)
* [CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data](http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf)
* [Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS](http://www.cs.cmu.edu/~dga/papers/cops-sosp2011.pdf)
* [Dremel: Interactive Analysis of Web-Scale Datasets](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf)
* [F1: A Distributed SQL Database That Scales](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41344.pdf)
* [HaLoop: Efficient Iterative Data Processing on Large Clusters](http://www.ics.uci.edu/~yingyib/papers/HaLoop_camera_ready.pdf)
* [Hoard: A Scalable Memory Allocator for Multithreaded Applications](http://people.cs.umass.edu/~emery/pubs/berger-asplos2000.pdf)
* [HyperDex: A Distributed, Searchable Key-Value Store](https://cs.uwaterloo.ca/~bernard/hyperdex.pdf)
* [Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial](https://www.cs.cornell.edu/fbs/publications/SMSurvey.pdf)
* [Introduction to a System for Distributed Databases SDD-1](http://www.few.vu.nl/~kgr700/sdd1.pdf)
* [Kafka: a Distributed Messaging System for Log Processing](http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf)
* [Large-scale cluster management at Google with Borg](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf)
* [Linearizability: A Correctness Condition for Concurrent Objects](http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf)
* [Making Reliable Distributed Systems in the Presence of Software Errors](http://www.erlang.org/download/armstrong_thesis_2003.pdf)
* [Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System](http://zoo.cs.yale.edu/classes/cs422/2013/bib/terry95managing.pdf)
* [Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters](http://www.cs.duke.edu/courses/cps399.28/current/papers/sigmod07-YangDasdanEtAl-map_reduce_merge.pdf)
* [MDCC: Multi-Data Center Consistency](https://amplab.cs.berkeley.edu/wp-content/uploads/2013/03/mdcc-eurosys13.pdf)
* [MillWheel: Fault-Tolerant Stream Processing at Internet Scale](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41378.pdf)
* [Omega: flexible, scalable schedulers for large compute clusters](http://research.google.com/pubs/archive/41684.pdf)
* [Optimistic replication](http://pages.cs.wisc.edu/~remzi/Classes/739/Spring2004/Papers/optimistic-survey.pdf)
* [Orleans: Distributed Virtual Actors for Programmability and Scalability](http://research.microsoft.com/apps/pubs/default.aspx?id=210931)
* [Paxos Made Live - An Engineering Perspective](http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf)
* [Practical Byzantine Fault Tolerance and Proactive Recovery](http://www.itu.dk/stud/speciale/bepjea/xwebtex/litt/practical-byzantine-fault-tolerance-and-proactive-recovery.pdf)
* [Pregel: A System for Large-Scale Graph Processing](http://kowshik.github.io/JPregel/pregel_paper.pdf)
* [Replication, History, and Grafting in the Ori File System](http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf)
* [Resilient Overlay Networks](http://nms.lcs.mit.edu/papers/ron-sosp2001.pdf)
* [Sinfonia: A New Paradigm for Building Scalable Distributed Systems](http://www.mshah.org/papers/sosp_2007_aguilera.pdf)
* [Sparrow: Distributed, Low Latency Scheduling](http://people.csail.mit.edu/matei/papers/2013/sosp_sparrow.pdf)
* [The Byzantine Generals Problem](http://www.andrew.cmu.edu/course/15-749/READINGS/required/resilience/lamport82.pdf)
* [:scroll:](the-chubby-lock-service-for-loosely-coupled-distributed-systems.pdf) [The Chubby Lock Service for Loosely-Coupled Distributed Systems](http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf)
* [The Dangers of Replication and a Solution](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.2707&rep=rep1&type=pdf)
* [:scroll:](join-calculus.pdf) [The Join Calculus: a Language for Distributed Mobile Programming](http://research.microsoft.com/en-us/um/people/fournet/papers/join-tutorial.pdf)
* [The Part-Time Parliament](http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf)
* [There Is More Consensus in Egalitarian Parliaments](https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf)
* [Towards a Next Generation Data Center Architecture: Scalability and Commoditization](http://research.microsoft.com/pubs/79348/presto27-greenberg.pdf)
* [Transactional Client-Server Cache Consistency: Alternatives and Performance](http://www.cs.berkeley.edu/~franklin/Papers/p315-franklin.pdf)
* [Unicorn: A System for Searching the Social Graph](http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p871-curtiss.pdf)
* [Unikernels: Library Operating Systems for the Cloud](http://anil.recoil.org/papers/2013-asplos-mirage.pdf)
* [Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms](http://www.cs.utexas.edu/~shmat/courses/cs395t_fall04/chaum81.pdf)
* [Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems](http://www.pmg.csail.mit.edu/papers/vr.pdf)
* [VL2: A Scalable and Flexible Data Center Network](http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf)
## Related Works
### [“On the Electrodynamics of Moving Bodies” (1905) — Einstein](../historical/physics/on-the-electrodynamics-of-moving-bodies.pdf)
By resolving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_conductor_problem) that arise in Maxwell's equations, Einstein's 1905 paper set the stage for current distributed systems work by demonstrating that there is no absolute frame of reference and by providing an upper bound on the speed of communication.
## Other Hosted Papers
* :scroll: [A History of the Virtual Synchrony Replication Model](a-history-of-the-virtual-synchrony-replication-model.pdf)
* :scroll: [A Hundred Impossibility Proofs for Distributed Computing](a-hundred-impossibility-proofs-for-distributed-computing.pdf)
* :scroll: [A response to Cheriton and Skeen's Criticism of Causal and Totally Ordered Communication](a-response-to-cheriton-and-skeens-criticism-of-causal-and-totally-ordered-communication.pdf)
* :scroll: [A Universal Modular ACTOR Formalism for Artificial Intelligence](a-universal-modular-actor-formalism-for-artificial-intelligence.pdf)
* :scroll: [A Versatile Scheme for Routing Highly Variable Traffic in Service Overlays and IP Backbones](a-versatile-scheme-for-routing-highly-variable-traffic-in-service-overlays-and-ip.pdf)
* :scroll: [Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays](beehive-lookup-performance-for-power-law-query-distributions-in-peer-to-peer-overlays.pdf)
* :scroll: [Bigtable: A Distributed Storage System for Structured Data](bigtable-a-distributed-storage-system-for-structured-data.pdf)
* :scroll: [Byzantine Chain Replication](bizantine-chain-replication.pdf)
* :scroll: [A Byzantine Fault Tolerant Distributed Commit Protocol](bizantine-fault-tolerant-distributed-commit-protocol.pdf)
* :scroll: [Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services](brewers-conjecture.pdf)
* :scroll: [Chain Replication for Supporting High Throughput and Availability](chain-replication-for-supporting-high-throughput-and-availability.pdf)
* :scroll: [Commodifying Replicated State Machines with OpenReplica](commodifying-replicated-state-machines-with-openreplica.pdf)
* :scroll: [Consensus in the Presence of Partial Synchrony](consensus-in-presence-of-partial-synchrony.pdf)
* :scroll: [Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms](consistent-global-states-of-distributed-systems-fundamental-concepts-and-mechanisms.pdf)
* :scroll: [Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](consistent-hashing-and-random-trees.pdf)
* :scroll: [Copysets: Reducing the Frequency of Data Loss in Cloud Storage](copysets-reducing-the-frequency-of-data-loss-in-cloud-storage.pdf)
* :scroll: [Dapper, a Large-Scale Distributed Systems Tracing Infrastructure](dapper-a-large-scale-distributed-tracing-infrastructure.pdf)
* :scroll: [Database Metatheory: Asking the Big Queries](database-metatheory--asking-the-big-queries.pdf)
* :scroll: [Distributed Snapshots: Determining Global States of Distributed Systems](distributed-snapshots-determining-global-states-of-distributed-systems.pdf)
* :scroll: [Dynamo: Amazon's Highly Available Key-value Store](dynamo-amazons-highly-available-key-value-store.pdf)
* :scroll: [Eluding Carnivores: File Sharing with Strong Anonymity](eluding-carnivores-file-sharing-with-strong-anonymity.pdf)
* :scroll: [End-to-end arguments in system design](end-to-end-arguments-in-system-design.pdf)
* :scroll: [Epidemic Algorithms for Replicated Database Maintenance](epidemic-algorithms-for-replicated-database-maintenance.pdf)
* :scroll: [Flat Datacenter Storage](flat-datacenter-storage.pdf)
* :scroll: [Freenet: A Distributed Anonymous Information Storage and Retrieval System](freenet-a-distributed-anonymous-information-and-retrieval-system.pdf)
* :scroll: [Harvest, Yield, and Scalable Tolerant Systems](harvest-yield-and-scalable-tolerant-systems.pdf)
* :scroll: [Herbivore: A Scalable and Efficient Protocol for Anonymous Communication](herbivore-a-scalable-and-efficient-protocol-for-anonymous.pdf)
* :scroll: [High-Level Specifications: Lessons from Industry](high-level-specifications--lessons-from-industry.pdf)
* :scroll: [How the Hidden Hand Shapes the Market for Software Reliability](how-the-hidden-hand-shapes-the-market-for-software-reliability.pdf)
* :scroll: [Implementing the Omega failure detector in the crash-recovery failure model](implementing-the-omega-failure-detector-in-crash-recovery-failure-model.pdf)
* :scroll: [Impossibility of Distributed Consensus with One Faulty Process](impossibility-of-consensus-with-one-faulty-process.pdf)
* :scroll: [In Search of an Understandable Consensus Algorithm](in-search-of-an-understandable-consensus-algorithm.pdf)
* :scroll: [Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead](kelips-building-an-efficient-and-stable-p2p-dht-through-increased-memory-and-background-overhead.pdf)
* :scroll: [Large-scale Incremental Processing Using Distributed Transactions and Notifications](large-scale-incremental-processing-using-distributed-transactions-and-notifications.pdf)
* :scroll: [Life beyond Distributed Transactions: an Apostate's Opinion](life-beyoud-distributed-transactions-an-apostates-opinion.pdf)
* :scroll: [MapReduce: Simplified Data Processing on Large Clusters](mapreduce-simplified-data-processing-on-large-clusters.pdf)
* :scroll: [Megastore: Providing Scalable, Highly Available Storage for Interactive Services](megastore-providing-scalable-highly-available-storage-for-interactive-services.pdf)
* :scroll: [Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center](mesos-a-platform-for-fine-grained-resource-sharing-in-the-data-center.pdf)
* :scroll: [A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster](network-challenges-of-data-recovery-in-erasure-coded-distributed-storage-systems.pdf)
* :scroll: [Oblivious routing of highly variable traffic in service overlays and IP backbones](oblivious-routing-of-highly-variable-traffic-in-service-overlays-and-ip-backbones.pdf)
* :scroll: [On proof and progress in mathematics](on-proof-and-progress-in-mathematics.pdf)
* :scroll: [P5: A Protocol for Scalable Anonymous Communication](p5-a-protocal-for-scalable-anonymous-communication.pdf)
* :scroll: [Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems](pastry-scalable-decentralized-object-location-and-routing-for-large-scale-peer-to-peer-systems.pdf)
* :scroll: [Paxos Made Moderately Complex](paxos-made-moderately-complex.pdf)
* :scroll: [Paxos Made Simple](paxos-made-simple.pdf)
* :scroll: [RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters](rados-a-scalable-reliable-storage-service-for-petabyte-scale-storage-clusters.pdf)
* :scroll: [Self-stabilizing Systems in Spite of Distributed Control](self-stabilizing-systems-in-spite-of-distributed-control.pdf)
* :scroll: [SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control](sift-design-and-analysis-of-a-fault-tolerant-computer-for-aircraft-contro.pdf)
* :scroll: [Signal/Collect: Graph Algorithms for the (Semantic) Web](signal-%26-collect-graph-algorithms-for-the-\(semantic\)-web.pdf)
* :scroll: [Solution of a Problem in Concurrent Programming Control](solution-of-a-problem-in-concurrent-programming-control.pdf)
* :scroll: [Spanner: Google's Globally-Distributed Database](spanner-google's-globally-distributed-database.pdf)
* :scroll: [Sparse Partitions](sparse-partitions.pdf)
* :scroll: [Stronger Semantics for Low-Latency Geo-Replicated Storage](stronger-semantics-for-low-latency-geo-replicated-storage.pdf)
* :scroll: [The Akamai Network: A Platform for High-Performance Internet Applications](the-akamai-network.pdf)
* :scroll: [The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability](the-dining-cryptographers-problem.pdf)
* :scroll: [Tor: The Second-Generation Onion Router](tor-the-second-generation-onion-router.pdf)
* :scroll: [Towards a cloud computing research agenda](towards-a-cloud-computing-research-agenda.pdf)
* :scroll: [Transactional storage for geo-replicated systems](transactional-storage-for-geo-replicated-systems.pdf)
* :scroll: [Understanding the Limitations of Causally and Totally Ordered Communication](understanding-the-limitations-of-causally-and-totally-ordered-communication.pdf)
* :scroll: [Viewing Control Structures as Patterns of Passing Messages](viewing-control-structures-as-patterns-of-passing-messages.pdf)
* :scroll: [Warp: Multi-Key Transactions for Key-Value Stores](warp-multi-key-transactions-for-key-value-stores.pdf)
* :scroll: [Zab: High-performance broadcast for primary-backup systems](zab-high-performance-broadcast-for-primary-backup-systems.pdf)
* :scroll: [ZooKeeper: Wait-free coordination for Internet-scale systems](zookeeper-wait-free-coordination-for-internet-scale-systems.pdf)