# Distributed Systems
* General Papers
* Topics
    * [Datastores](#datastores)
    * [Physics](#physics)
    * [Testing, Verification, and Correctness](#testing-verification-and-correctness)
## External Papers
* [:scroll:](a-note-on-distributed-computing.pdf) [A Note on Distributed Computing](https://www.researchgate.net/profile/Ellen-Isaacs/publication/220168963_Why_do_users_like_video/links/02e7e5186b67219c70000000/Why-do-users-like-video.pdf#page=89)
* [A simple totally ordered broadcast protocol](http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf)
* [Above the Clouds: A Berkeley View of Cloud Computing](http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf)
* [Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications](http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf)
* [Kafka: a Distributed Messaging System for Log Processing](http://notes.stephenholiday.com/Kafka.pdf)
* [Large-scale cluster management at Google with Borg](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf)
* [Linearizability: A Correctness Condition for Concurrent Objects](http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf)
* [Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial](https://www.cs.cornell.edu/fbs/publications/SMSurvey.pdf)
* [Hoard: A Scalable Memory Allocator for Multithreaded Applications](http://people.cs.umass.edu/~emery/pubs/berger-asplos2000.pdf)
* [MillWheel: Fault-Tolerant Stream Processing at Internet Scale](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41378.pdf)
* [Omega: flexible, scalable schedulers for large compute clusters](http://research.google.com/pubs/archive/41684.pdf)
* [Orleans: Distributed Virtual Actors for Programmability and Scalability](http://research.microsoft.com/apps/pubs/default.aspx?id=210931)
* [Paxos Made Live - An Engineering Perspective](http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf)
* [Practical Byzantine Fault Tolerance and Proactive Recovery](http://www.microsoft.com/research/wp-content/uploads/2017/01/p398-castro-bft-tocs.pdf)
* [Pregel: A System for Large-Scale Graph Processing](http://kowshik.github.io/JPregel/pregel_paper.pdf)
* [Replication, History, and Grafting in the Ori File System](http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf)
* [Resilient Overlay Networks](http://nms.lcs.mit.edu/papers/ron-sosp2001.pdf)
* [Sinfonia: A New Paradigm for Building Scalable Distributed Systems](http://www.mshah.org/papers/sosp_2007_aguilera.pdf)
* [Sparrow: Distributed, Low Latency Scheduling](http://people.csail.mit.edu/matei/papers/2013/sosp_sparrow.pdf)
* [The Byzantine Generals Problem](http://www.andrew.cmu.edu/course/15-749/READINGS/required/resilience/lamport82.pdf)
* [Hashgraph Consensus: Fair, Fast, Byzantine Fault Tolerance](https://swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf)
* [:scroll:](the-chubby-lock-service-for-loosely-coupled-distributed-systems.pdf) [The Chubby Lock Service for Loosely-Coupled Distributed Systems](http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf)
* [:scroll:](join-calculus.pdf) [The Join Calculus: a Language for Distributed Mobile Programming](http://research.microsoft.com/en-us/um/people/fournet/papers/join-tutorial.pdf)
* [The Part-Time Parliament](http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf)
* [There Is More Consensus in Egalitarian Parliaments](https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf)
* [Transactional Client-Server Cache Consistency: Alternatives and Performance](http://drum.lib.umd.edu/bitstream/handle/1903/751/CS-TR-3511.pdf)
* [Unicorn: A System for Searching the Social Graph](http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p871-curtiss.pdf)
* [Unikernels: Library Operating Systems for the Cloud](http://unikernel.org/files/2013-asplos-mirage.pdf)
* [Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms](http://www.cs.utexas.edu/~shmat/courses/cs395t_fall04/chaum81.pdf)
* [Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems](http://www.pmg.csail.mit.edu/papers/vr.pdf)
* [VL2: A Scalable and Flexible Data Center Network](http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf)
## Other Hosted Papers
* :scroll: [A History of the Virtual Synchrony Replication Model](a-history-of-the-virtual-synchrony-replication-model.pdf)
* :scroll: [A Hundred Impossibility Proofs for Distributed Computing](a-hundred-impossibility-proofs-for-distributed-computing.pdf)
* :scroll: [A response to Cheriton and Skeen's Criticism of Causal and Totally Ordered Communication](a-response-to-cheriton-and-skeens-criticism-of-causal-and-totally-ordered-communication.pdf)
* :scroll: [A Universal Modular ACTOR Formalism for Artificial Intelligence](a-universal-modular-actor-formalism-for-artificial-intelligence.pdf)
* :scroll: [A Versatile Scheme for Routing Highly Variable Traffic in Service Overlays and IP Backbones](a-versatile-scheme-for-routing-highly-variable-traffic-in-service-overlays-and-ip.pdf)
* :scroll: [Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays](beehive-lookup-performance-for-power-law-query-distributions-in-peer-to-peer-overlays.pdf)
* :scroll: [Byzantine Chain Replication](byzantine-chain-replication.pdf)
* :scroll: [A Byzantine Fault Tolerant Distributed Commit Protocol](byzantine-fault-tolerant-distributed-commit-protocol.pdf)
* :scroll: [Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services](brewers-conjecture.pdf)
* :scroll: [Chain Replication for Supporting High Throughput and Availability](chain-replication-for-supporting-high-throughput-and-availability.pdf)
* :scroll: [Commodifying Replicated State Machines with OpenReplica](commodifying-replicated-state-machines-with-openreplica.pdf)
* :scroll: [Consensus in the Presence of Partial Synchrony](consensus-in-presence-of-partial-synchrony.pdf)
* :scroll: [Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms](consistent-global-states-of-distributed-systems-fundamental-concepts-and-mechanisms.pdf)
* :scroll: [Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web](consistent-hashing-and-random-trees.pdf)
* :scroll: [Copysets: Reducing the Frequency of Data Loss in Cloud Storage](copysets-reducing-the-frequency-of-data-loss-in-cloud-storage.pdf)
* :scroll: [Dapper, a Large-Scale Distributed Systems Tracing Infrastructure](dapper-a-large-scale-distributed-tracing-infrastructure.pdf)
* :scroll: [Distributed Snapshots: Determining Global States of Distributed Systems](distributed-snapshots-determining-global-states-of-distributed-systems.pdf)
* :scroll: [Eluding Carnivores: File Sharing with Strong Anonymity](eluding-carnivores-file-sharing-with-strong-anonymity.pdf)
* :scroll: [End-to-end arguments in system design](end-to-end-arguments-in-system-design.pdf)
* :scroll: [Epidemic Algorithms for Replicated Database Maintenance](epidemic-algorithms-for-replicated-database-maintenance.pdf)
* :scroll: [Harvest, Yield, and Scalable Tolerant Systems](harvest-yield-and-scalable-tolerant-systems.pdf)
* :scroll: [Herbivore: A Scalable and Efficient Protocol for Anonymous Communication](herbivore-a-scalable-and-efficient-protocol-for-anonymous.pdf)
* :scroll: [High-Level Specifications: Lessons from Industry](high-level-specifications--lessons-from-industry.pdf)
* :scroll: [How the Hidden Hand Shapes the Market for Software Reliability](how-the-hidden-hand-shapes-the-market-for-software-reliability.pdf)
* :scroll: [Implementing the Omega failure detector in the crash-recovery failure model](implementing-the-omega-failure-detector-in-crash-recovery-failure-model.pdf)
* :scroll: [Impossibility of Distributed Consensus with One Faulty Process](impossibility-of-consensus-with-one-faulty-process.pdf)
* :scroll: [In Search of an Understandable Consensus Algorithm](in-search-of-an-understandable-consensus-algorithm.pdf)
* :scroll: [Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead](kelips-building-an-efficient-and-stable-p2p-dht-through-increased-memory-and-background-overhead.pdf)
* :scroll: [Large-scale Incremental Processing Using Distributed Transactions and Notifications](large-scale-incremental-processing-using-distributed-transactions-and-notifications.pdf)
* :scroll: [Life beyond Distributed Transactions: an Apostate's Opinion](life-beyond-distributed-transactions-an-apostates-opinion.pdf)
* :scroll: [MapReduce: Simplified Data Processing on Large Clusters](mapreduce-simplified-data-processing-on-large-clusters.pdf)
* :scroll: [Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center](mesos-a-platform-for-fine-grained-resource-sharing-in-the-data-center.pdf)
* :scroll: [Oblivious routing of highly variable traffic in service overlays and IP backbones](oblivious-routing-of-highly-variable-traffic-in-service-overlays-and-ip-backbones.pdf)
* :scroll: [On proof and progress in mathematics](on-proof-and-progress-in-mathematics.pdf)
* :scroll: [P5: A Protocol for Scalable Anonymous Communication](p5-a-protocal-for-scalable-anonymous-communication.pdf)
* :scroll: [Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems](pastry-scalable-decentralized-object-location-and-routing-for-large-scale-peer-to-peer-systems.pdf)
* :scroll: [Paxos Made Moderately Complex](paxos-made-moderately-complex.pdf)
* :scroll: [Paxos Made Simple](paxos-made-simple.pdf)
* :scroll: [Self-stabilizing Systems in Spite of Distributed Control](self-stabilizing-systems-in-spite-of-distributed-control.pdf)
* :scroll: [SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control](sift-design-and-analysis-of-a-fault-tolerant-computer-for-aircraft-contro.pdf)
* :scroll: [Signal/Collect: Graph Algorithms for the (Semantic) Web](signal-%26-collect-graph-algorithms-for-the-\(semantic\)-web.pdf)
* :scroll: [Solution of a Problem in Concurrent Programming Control](solution-of-a-problem-in-concurrent-programming-control.pdf)
* :scroll: [Sparse Partitions](sparse-partitions.pdf)
* :scroll: [Stronger Semantics for Low-Latency Geo-Replicated Storage](stronger-semantics-for-low-latency-geo-replicated-storage.pdf)
* :scroll: [The Akamai Network: A Platform for High-Performance Internet Applications](the-akamai-network.pdf)
* :scroll: [The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability](the-dining-cryptographers-problem.pdf)
* :scroll: [Tor: The Second-Generation Onion Router](tor-the-second-generation-onion-router.pdf)
* :scroll: [Towards a cloud computing research agenda](towards-a-cloud-computing-research-agenda.pdf)
* :scroll: [Understanding the Limitations of Causally and Totally Ordered Communication](understanding-the-limitations-of-causally-and-totally-ordered-communication.pdf)
* :scroll: [Viewing Control Structures as Patterns of Passing Messages](viewing-control-structures-as-patterns-of-passing-messages.pdf)
* :scroll: [Warp: Multi-Key Transactions for Key-Value Stores](../datastores/warp-multi-key-transactions-for-key-value-stores.pdf)
* :scroll: [Zab: High-performance broadcast for primary-backup systems](zab-high-performance-broadcast-for-primary-backup-systems.pdf)
* :scroll: [ZooKeeper: Wait-free coordination for Internet-scale systems](zookeeper-wait-free-coordination-for-internet-scale-systems.pdf)
* :scroll: [Tiered Replication: A Cost-effective Alternative to Full Cluster Geo-replication](tiered-replication-a-cost-effective-alternative-to-full-cluster-geo-replication.pdf)
## Topics
### Datastores
* [Calvin: Fast Distributed Transactions for Partitioned Database Systems](http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf)
* [f4: Facebook's Warm BLOB Storage System](http://www.usenix.org/system/files/conference/osdi14/osdi14-paper-muralidhar.pdf)
* [The Case for Determinism in Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/determinism-vldb10.pdf)
* [Consistency Tradeoffs in Modern Distributed Database System Design](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf)
* [Modularity and Scalability in Calvin](http://sites.computer.org/debull/A13june/calvin1.pdf)
* [Lightweight Locking for Main Memory Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/vll-vldb13.pdf)
* [Cassandra - A Decentralized Structured Storage System](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.6751&rep=rep1&type=pdf)
* [CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data](http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf)
* [Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS](http://www.cs.cmu.edu/~dga/papers/cops-sosp2011.pdf)
* [Dremel: Interactive Analysis of Web-Scale Datasets](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf)
* [F1: A Distributed SQL Database That Scales](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41344.pdf)
* [HaLoop: Efficient Iterative Data Processing on Large Clusters](http://homes.cs.washington.edu/~billhowe/pubs/HaLoop.pdf)
* [HyperDex: A Distributed, Searchable Key-Value Store](https://cs.uwaterloo.ca/~bernard/hyperdex.pdf)
* [Introduction to a System for Distributed Databases SDD-1](http://people.eecs.berkeley.edu/~wong/wong_pubs/wong73.pdf)
* [Making Reliable Distributed Systems in the Presence of Software Errors](http://www.erlang.org/download/armstrong_thesis_2003.pdf)
* [Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System](http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p172-terry.pdf)
* [Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters](http://www.cs.duke.edu/courses/cps399.28/current/papers/sigmod07-YangDasdanEtAl-map_reduce_merge.pdf)
* [MDCC: Multi-Data Center Consistency](https://amplab.cs.berkeley.edu/wp-content/uploads/2013/03/mdcc-eurosys13.pdf)
* [Optimistic replication](http://pages.cs.wisc.edu/~remzi/Classes/739/Spring2004/Papers/optimistic-survey.pdf)
* [The Dangers of Replication and a Solution](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.2707&rep=rep1&type=pdf)
* [Towards a Next Generation Data Center Architecture: Scalability and Commoditization](http://research.microsoft.com/pubs/79348/presto27-greenberg.pdf)
* :scroll: [Bigtable: A Distributed Storage System for Structured Data](../datastores/bigtable-a-distributed-storage-system-for-structured-data.pdf)
* :scroll: [Database Metatheory: Asking the Big Queries](../datastores/database-metatheory--asking-the-big-queries.pdf)
* :scroll: [Dynamo: Amazon's Highly Available Key-value Store](../datastores/dynamo-amazons-highly-available-key-value-store.pdf)
* :scroll: [Flat Datacenter Storage](../datastores/flat-datacenter-storage.pdf)
* :scroll: [Freenet: A Distributed Anonymous Information Storage and Retrieval System](../datastores/freenet-a-distributed-anonymous-information-and-retrieval-system.pdf)
* :scroll: [Megastore: Providing Scalable, Highly Available Storage for Interactive Services](../datastores/megastore-providing-scalable-highly-available-storage-for-interactive-services.pdf)
* :scroll: [A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster](../datastores/network-challenges-of-data-recovery-in-erasure-coded-distributed-storage-systems.pdf)
* :scroll: [RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters](../datastores/rados-a-scalable-reliable-storage-service-for-petabyte-scale-storage-clusters.pdf)
* :scroll: [Spanner: Google's Globally-Distributed Database](../datastores/spanner-google's-globally-distributed-database.pdf)
* :scroll: [TAO: Facebook's Distributed Data Store for the Social Graph](../datastores/tao-facebook-distributed-datastore.pdf)
* :scroll: [Transactional storage for geo-replicated systems](../datastores/transactional-storage-for-geo-replicated-systems.pdf)
* :scroll: [Warp: Multi-Key Transactions for Key-Value Stores](../datastores/warp-multi-key-transactions-for-key-value-stores.pdf)
* :scroll: [Spartan: A distributed array framework with smart tiling](../datastores/spartan-a-distributed-array-framework-with-smart-tiling.pdf)
### Physics
* :scroll: [“On the Electrodynamics of Moving Bodies” (1905) — Einstein](../physics/on-the-electrodynamics-of-moving-bodies.pdf)
By resolving the [asymmetries](http://en.wikipedia.org/wiki/Moving_magnet_and_conductor_problem) that arise in Maxwell's equations, Einstein's 1905 paper set the stage for current distributed systems work by demonstrating that there is no absolute frame of reference and by providing an upper bound on the speed of communication.
### <a name="testing-verification-and-correctness"></a>Testing, Verification, and Correctness
* :scroll: [Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf)
* :scroll: [IronFleet: Proving Practical Distributed Systems Correct](https://www.microsoft.com/en-us/research/wp-content/uploads/2015/10/ironfleet.pdf)