papers-we-love_papers-we-love/data_compression/README.md

# Data Compression
* :scroll: [Data Compression](data-compression.pdf)
> This paper surveys a variety of data compression methods spanning almost 40 years of research, from the work of Shannon, Fano, and Huffman in the 1940s to a technique developed in 1986.
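
  As a concrete taste of the classical techniques the survey covers, here is a minimal Huffman-coding sketch (our own illustration, not code from the paper): symbols that occur more often receive shorter prefix-free codes.

  ```python
  # Minimal Huffman coding sketch: frequent symbols get shorter codes.
  import heapq
  from collections import Counter

  def huffman_codes(data: str) -> dict:
      """Build a prefix-free code from symbol frequencies."""
      # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
      heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(Counter(data).items())]
      heapq.heapify(heap)
      tie = len(heap)
      while len(heap) > 1:
          f1, _, left = heapq.heappop(heap)
          f2, _, right = heapq.heappop(heap)
          # Merging two subtrees prepends one more bit to every code inside them.
          merged = {s: "0" + c for s, c in left.items()}
          merged.update({s: "1" + c for s, c in right.items()})
          heapq.heappush(heap, (f1 + f2, tie, merged))
          tie += 1
      return heap[0][2]

  if __name__ == "__main__":
      text = "abracadabra"
      codes = huffman_codes(text)
      encoded = "".join(codes[s] for s in text)
      print(codes, len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
  ```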
## Scientific Data Compression
* :scroll: [Fast Error-bounded Lossy HPC Data Compression with SZ](fast_error_bounded_Lossy_hpc_data_compression_with_sz.pdf)
> This is the first version of SZ. It achieves data reduction through regression-based prediction of data points, encoding only the prediction errors subject to a user-specified error bound (a minimal sketch of this prediction-and-quantization idea appears after this list).
* :scroll: [Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization](Significantly_Improving_Lossy_Compression_for_Scientific_Data_Sets_Based_on_Multidimensional_Prediction_and_Error-Controlled_Quantization.pdf)
> This work is known as SZ-1.4. Here SZ employs multi-dimensional data prediction, so data with more than one dimension is no longer linearized into a single dimension before compression. This preserves more data locality and therefore improves the compression ratio (see the multidimensional predictor sketch after this list).
* :scroll: [Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets](Error-Controlled_Lossy_Compression_Optimized_for_High_Compression_Ratios_of_Scientific_Datasets.pdf)
> This work is known as SZ-2.0. The authors propose an online selection mechanism between two predictors, the mean-integrated Lorenzo predictor and a linear-regression-based predictor, so that the predictor with the higher prediction accuracy, and hence the larger compression ratio, is used (see the predictor-selection sketch after this list).
* :scroll: [Fixed-Rate Compressed Floating-Point Arrays](fixed-rate_compressed_floating_point_arrays.pdf)
* :scroll: [FPC: A High-Speed Compressor for Double-Precision Floating-Point Data](fpc_a_high_speed_compressor_for_double_precision_floating_point_data.pdf)
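
The core mechanism shared by the SZ papers above is error-bounded, prediction-based compression: predict each value from already-reconstructed neighbors, quantize the prediction error against an absolute error bound, and entropy-code the resulting small integers. The sketch below is our own illustration of that idea (not the SZ implementation), using the simplest possible predictor, the preceding value.

```python
# Sketch of SZ-style error-bounded prediction + quantization (illustrative only).
import numpy as np

def compress_1d(data: np.ndarray, err_bound: float):
    """Predict each point from the previously *reconstructed* point and store
    the quantized prediction error; guarantees |x - x_reconstructed| <= err_bound."""
    quants = np.empty(len(data), dtype=np.int64)
    recon = np.empty(len(data), dtype=float)
    prev = 0.0
    for i, x in enumerate(data):
        pred = prev                                # simplest predictor: preceding value
        q = int(round((x - pred) / (2 * err_bound)))
        quants[i] = q
        recon[i] = pred + q * 2 * err_bound        # what the decompressor will reconstruct
        prev = recon[i]
    return quants, recon

if __name__ == "__main__":
    data = np.cumsum(np.random.randn(1000))        # smooth-ish test signal
    quants, recon = compress_1d(data, err_bound=1e-2)
    assert np.max(np.abs(data - recon)) <= 1e-2 + 1e-12
    # The quantization codes cluster around 0, so an entropy coder (e.g. Huffman)
    # compresses them far better than the raw floating-point values.
```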
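
SZ-1.4's multi-dimensional prediction can be illustrated with a 2D Lorenzo-style predictor; the sketch below is our own simplification of the idea, not the paper's code. Each point is predicted from its already-reconstructed west, north, and north-west neighbors, so genuine 2D locality is exploited instead of a flattened 1D stream.

```python
# Sketch of a 2D Lorenzo-style predictor (illustration of the SZ-1.4 idea).
import numpy as np

def lorenzo_predict_2d(recon: np.ndarray, i: int, j: int) -> float:
    """Predict recon[i, j] from reconstructed neighbors; out-of-grid neighbors are 0."""
    west  = recon[i, j - 1] if j > 0 else 0.0
    north = recon[i - 1, j] if i > 0 else 0.0
    nw    = recon[i - 1, j - 1] if (i > 0 and j > 0) else 0.0
    return west + north - nw

# For smooth fields the residual x[i, j] - lorenzo_predict_2d(...) is tiny,
# so the quantization codes from the error-bounded quantizer above compress well.
```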
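
SZ-2.0's contribution is choosing, per region of the data, between the Lorenzo predictor and a linear-regression predictor. The sketch below is a hypothetical illustration of such a selection step under the assumption that the choice is made block by block; the names `choose_predictor` and the 8x8 block size are ours, not the paper's.

```python
# Sketch of SZ-2.0-style online predictor selection (illustrative only).
import numpy as np

def choose_predictor(block: np.ndarray) -> str:
    """Pick 'lorenzo' or 'regression' for a 2D block of data."""
    n, m = block.shape

    # Mean absolute Lorenzo residual (true values used as a cheap proxy for
    # reconstructed neighbors when estimating prediction accuracy).
    pred = block[1:, :-1] + block[:-1, 1:] - block[:-1, :-1]
    lorenzo_err = np.abs(block[1:, 1:] - pred).mean()

    # Mean absolute residual of a fitted plane a*i + b*j + c (linear regression).
    ii, jj = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    A = np.column_stack([ii.ravel(), jj.ravel(), np.ones(n * m)])
    coef, *_ = np.linalg.lstsq(A, block.ravel(), rcond=None)
    regression_err = np.abs(block.ravel() - A @ coef).mean()

    return "lorenzo" if lorenzo_err <= regression_err else "regression"

if __name__ == "__main__":
    block = np.random.rand(8, 8)
    print(choose_predictor(block))
```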