Questions tagged [compression]
68 questions
5
votes
5
answers
771
views
How does data store compression speed up data warehouses?
I often see the claim that various data warehouse/analytical database systems derive significant performance benefits from compressing their data stores. On the face of it, though, this seems to be ...
4
votes
2
answers
754
views
How should I handle different hashes of identical files in .zip archive with different 'last changed' date?
We store zipped files in the storage of a cloud provider which contain certain fields (metadata). These files are derived from other, larger files. Every time we (re)generate these files, their 'last ...
0
votes
5
answers
293
views
Load and process (compressed) data from filesystem in the blink of an eye
We have a huge number of queries hitting our API that request a minor or major extract of some huge files lying around on our mounted hard drives. The data needs to be extracted from the files and ...
30
votes
7
answers
17k
views
How can lossless compression ever exist?
If all data is essentially just a bit string, then all data can be represented as a number. Because a compression algorithm, c(x), must reduce or keep the same length of the input, then the compressed ...
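The counting argument behind this question can be checked directly. The following is only a small sketch (the function name is hypothetical, not from the question): there are 2^n bit strings of length exactly n, but only 2^n - 1 bit strings of any shorter length, so no lossless scheme can shorten every input.

```python
def count_strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings with length 0 .. n-1."""
    return sum(2 ** length for length in range(n))


n = 8
inputs = 2 ** n                                   # strings of length exactly n
shorter_outputs = count_strings_shorter_than(n)   # strings of length < n

# There is one fewer shorter string than there are inputs, so at least
# two inputs would have to share a compressed form (pigeonhole principle).
print(inputs, shorter_outputs)
```

Lossless compressors work in practice because they shorten the inputs that are likely to occur and lengthen (slightly) the ones that are not.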
1
vote
1
answer
2k
views
How to remove unused code from a jar file? [closed]
I have a jar file, for example foo.jar. My code contains a lot of libraries (almost 75 jar dependencies). I am not using anything like Maven or Gradle, I'm just using pure Java with pure jar files as ...
2
votes
1
answer
1k
views
Are Flate compression in PDF and Deflate different algorithms?
I'm trying to make a program that produces PDF files. I've been studying the PDF format specification and specific PDF files whose format I'm trying to mimic. I found the line /FlateDecode in these ...
0
votes
1
answer
388
views
Short and compact barcode
I am writing a C# program where I need to print a lot of small barcodes in a 100x100 grid on a piece of paper. I then scan/photograph the paper and read the barcodes again. Each barcode only needs to ...
4
votes
3
answers
283
views
Is it possible to store N bits of unique combinations, in N-1 bits? If not, why does MD5 get reprimanded for collisions?
Regarding cryptography and the issue of collisions, I posed a question as to whether it was ever possible to store every single possible combination of a bit array of a particular size, in a bit array ...
7
votes
2
answers
645
views
Some misunderstanding in the concept of the Huffman algorithm
What is the difference between the average length of codes and the average length of codewords in the Huffman algorithm? Do both mean the same thing? I am stuck on some facts:
I see a fact that is marked as False:
for a ...
0
votes
1
answer
90
views
Design Question: Compression with Fast Lookup
I have multiple files (one per CountryCode), each of which gets ~5000 entries added per day.
Each entry in the file looks like (256 chars max):
{countryCode_customerId:{"ownerId": "...
7
votes
2
answers
887
views
How does conditional compilation impact product quality, security and code complexity? [closed]
Software libraries targeting resource-constrained environments like embedded systems use conditional compilation to allow consumers to shave space by removing unused features from the final binaries ...
-2
votes
2
answers
590
views
Alternative to RLE for short, infrequent runs
I have 3 number arrays that I need to encode into a URL through query parameters. Example:
http://localhost:3000/?r=133223333302302040&y=10000000000000000000&b=13333332002100122331
This is a ...
2
votes
2
answers
571
views
Compressing EBCDIC file vs UTF8
Today I came across a weird case for which I have no explanation, so here I am.
I have two files with identical content, but one is encoded in UTF-8 and the other one is in IBM EBCDIC. Both of them ...
11
votes
4
answers
2k
views
How to review SQL changes more effectively? [duplicate]
From my experience, SQL code changes almost always tend to be NOT incremental: someone creates a new stored procedure, or modifies an entire embedded SQL query for optimization purposes, or creates a ...
0
votes
2
answers
139
views
Is it possible to transfer data with a really unique seed of a pseudo-random number generator?
I have been thinking about this idea for over 5 years and I don't have the complete technical knowledge to fully grasp the idea I'm having.
The premise of the idea is to have an extremely high base number ...
6
votes
1
answer
9k
views
How is it possible to GZIP a stream before the entire contents are known?
Two things you can do in Java:
Send a gzipped JSON body in response to an HTTP request
Send a StreamingOutput response to an HTTP request, where you begin sending a response before you know the ...
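One reason the two are compatible: the gzip format (RFC 1952) places the CRC-32 and the uncompressed length in a trailer after the compressed data, so a compressor can emit output before the total size is known. A minimal sketch of that idea, written in Python rather than the Java mentioned in the question; `gzip_stream` is a hypothetical helper name and the chunking is an assumption for illustration.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gzip_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield gzip-formatted output incrementally as input chunks arrive."""
    # wbits = MAX_WBITS | 16 asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer and return nothing yet
            yield out
    # flush() emits any buffered data plus the CRC-32/length trailer.
    yield compressor.flush()


# The consumer sees a valid gzip stream once all pieces are joined.
compressed = b"".join(gzip_stream([b"hello ", b"world"]))
print(gzip.decompress(compressed))
```

Each yielded piece could be written to a response body as soon as it is produced; only the final `flush()` depends on having seen the whole input.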
30
votes
5
answers
8k
views
What is the most efficient way to store a numeric range?
This question is about how many bits are required to store a range. Or put another way, for a given number of bits, what is the maximum range that can be stored and how?
Imagine we want to store a ...
6
votes
1
answer
321
views
How can I reduce the amount of storage needed for a gravitational n-body simulation?
I am currently attempting to create a gravitational n-body simulation using a Barnes-Hut algorithm modified to be more amenable to GPU computation. This is primarily as a learning project. My goal is ...
3
votes
2
answers
135
views
Non-precise Input/Using Probability in File Compression
I'm a high school student interested in topics of computer programming.
Recently I became interested in file compression, and in my head I tried to combine this with a completely different part of ...
0
votes
1
answer
547
views
A collision-free hash-like function for use in hash tables and other data structures?
A short introduction to the problem: I'm working with a small database where I have a table of strings (web URLs, to be precise) as pairs: hash|string. Another table references these strings by hash ...
2
votes
3
answers
2k
views
Algorithm for optimizing text compression
I am looking for text compression algorithms (natural language compression, rather than compression of arbitrary binary data).
I have seen for example An Efficient Compression Code for Text ...
5
votes
2
answers
16k
views
How to deal with large data in Websocket message?
I wrote a WebSocket server in Spring Boot and a client in JavaScript. These work fine. I also wrote a second client in Java. When this one attempts to handle a frame after connecting to the host, I ...
-2
votes
1
answer
197
views
Compression techniques for a truly random permutation of a given integer N
Is it possible to compress a truly random permutation using low-order polynomial interpolation? If so, how can it be achieved?
7
votes
4
answers
9k
views
compression algorithm for non-repeating integers
I have an array of unique integers, for example: {1,3,,7,9,31,46,...}, which I want to compress. I have found compression techniques and algorithms for the list of integers, where some integers are ...
0
votes
1
answer
492
views
Find Randomized Sequence Seed To Compress files statistically
I was wondering if what I have in mind already exists in any known compression programs/algorithms or not. We know that a seed gives us a constant sequence of random numbers, so if we are able to find a seed ...
2
votes
1
answer
398
views
Why does WinRAR not compress picture duplicates?
I ran a test today to see how well WinRAR can compress a folder containing several copies of the same picture. For that I put one 300 kB picture into a folder and copied it there 11 times, so that I ...
4
votes
1
answer
371
views
Advantages of application-level data compression?
This question was inspired by MessagePack, but I'm looking for a general answer about the advantages of in-app vs. external compression.
For network I/O, doesn't the transport protocol (at least ...
36
votes
5
answers
25k
views
Does storing plain text data take up less space than storing the equivalent message in binary?
As a web developer I have very little understanding of binary data.
If I take the sentence "Hello world.", convert it to binary, and store it as binary in an SQL database, it seems like the 1s and ...
2
votes
1
answer
105
views
I need to find a set of hierarchical symbols that can represent input binary data in near optimal space. What algorithms can I look into? [closed]
I have a stream of binary data. Assume no prior knowledge about the expected pattern in input data.
The symbols can represent binary data or other symbols, hence hierarchical.
The output should ...
2
votes
0
answers
107
views
Direction-free optimal encoding
Consider an alphabet of k symbols and a requirement to optimally encode a series of values of known frequency. The obvious choice for this is to use Huffman coding, which is known to be optimal for ...
4
votes
5
answers
639
views
Is saving disk space a valid reason to forgo migrating to a standard text format (e.g. JSON)?
A while ago I asked a question about custom text data formats, instead of using existing tools such as XML, JSON, YAML, etc. Now, in favor of converting our custom format to a relational database and ...
-1
votes
2
answers
9k
views
Fast and simple hex compression
I'm working on a project that requires a TCP connection between a client and server. The current protocol encodes the data into hex and then sends it. However, hex increases the length of the payload ...
3
votes
1
answer
521
views
PDF content : text or graphics?
Is there a test to check whether a PDF file contains text or was created by scanning paper sheets?
text : plain text that, for example, I can copy & paste while I am reading the PDF. Not ...
14
votes
2
answers
4k
views
Fast, lossless compression of a video stream
I have a video coming from a stationary camera. Both the resolution and the FPS are quite high. The data I get is in Bayer format and uses 10 bits per pixel. As there's no 10-bit data type on my ...
5
votes
2
answers
585
views
finding optimal token definitions for compression
I have a collection of strings which have a lot of common substrings,
and I'm trying to find a good way to define tokens to compress them.
For instance, if my strings are:
s1 = "String"
s2 = "Bool"
...
2
votes
1
answer
94
views
Efficient saving of long strings with recurring substrings
I have the following problem: I need to save a lot of XML strings of variable length and structure. As is usual with XML, a lot of substrings are the same (some element, attribute, and value combinations). Often the ...
9
votes
2
answers
1k
views
Best compression algorithm for timelapse photos
I have a folder containing about 9,000 JPEG photos (about 30 GB), which I want to archive with some sort of compression. I understand that compressing JPEGs is not normally very effective, but these ...
4
votes
2
answers
986
views
Why use last column of Burrows-Wheeler-Transform
The Burrows-Wheeler-Transform takes a string with length n, creates a matrix with n rows by shifting this string one position to the left for each row. Then the rows are sorted by the first column in ...
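The construction described in this excerpt can be sketched naively as follows. This is only an illustration under stated assumptions: the `$` end marker, the function names, and the choice to sort whole rows lexicographically are not taken from the question, and real implementations use suffix arrays rather than building the full matrix.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Row i of the matrix is the string rotated left by i positions.
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    rotations.sort()
    # The transform is the last column of the sorted matrix.
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the sorted matrix column by column from the last column."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to every row, then re-sort.
        table = sorted(ch + row for ch, row in zip(last_column, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))          # annb$aa
print(inverse_bwt("annb$aa"))  # banana
```

The last column is the one that can be inverted with nothing but a sort, and it tends to group equal characters together, which helps later stages such as run-length or move-to-front coding.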
1
vote
1
answer
1k
views
Best two-way compression algorithm for 32-bit numbers
I need to compress an ID for marketing campaigns. The current campaign ID is a 32-bit integer, but obviously this is too long for a customer to type by hand. I would like to compress this to minimum ...
1
vote
0
answers
326
views
What is this compression algorithm? (Similar to RLE)
In Run Length Encoding (RLE), a large set of information is encoded by storing the quantity of consecutive sequences. A canonical example is:
...
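For reference, plain RLE as described above can be sketched like this. The input string and function names are illustrative assumptions and are not the example elided from the question.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


encoded = rle_encode("WWWBBW")
print(encoded)              # [('W', 3), ('B', 2), ('W', 1)]
print(rle_decode(encoded))  # WWWBBW
```

Note that a run of length 1 still costs a full pair, which is why RLE can expand data that has few repeats.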
3
votes
2
answers
277
views
Sparse set lossy compression algorithm
I am looking for an algorithm or idea for the following problem.
Suppose we have a data type, say a 64-bit integer. Now we have a relatively small set of such items, say a few hundred at most. The simplest ...
4
votes
1
answer
447
views
Minimizing compression overhead in a simple compression algorithm
Note: this question has been re-written to simplify and generalize the problem. The original is available below.
Suppose I created a simple compression scheme for lists of 2-digit numbers. It has 2 ...
1
vote
0
answers
2k
views
Calculating uncompressed file size without uncompressing file in zlib
I am writing a Python program which parses zip (currently only zlib, using DEFLATE compression) files and verifies the correctness of their headers and data. One of the things I'm trying to achieve is ...
2
votes
2
answers
4k
views
Turn on gzip on nginx, upstream (nodejs) or both? [closed]
I have an application running behind a proxy, both on the same machine. Which approach is better suited for compression while preserving reasonable performance?
turn on compression at the ...
2
votes
2
answers
2k
views
What should I do when using Golomb/Rice code for large values?
When using Golomb/Rice code in image compression, we will inevitably encounter large values. Golomb coding uses a tunable parameter M to divide an input value N into two parts: q, the result of a ...
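To make the large-value problem concrete, here is a minimal sketch of the Rice special case, where M is a power of two (M = 2^k). The function names and the unary convention (q ones followed by a zero) are assumptions for illustration; other conventions exist.

```python
def rice_encode(n: int, k: int) -> str:
    """Encode non-negative n as unary quotient plus k-bit remainder."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k               # quotient of n divided by M = 2**k
    r = n & ((1 << k) - 1)   # remainder of that division
    unary = "1" * q + "0"
    binary = format(r, f"0{k}b") if k else ""
    return unary + binary


def rice_decode(bits: str, k: int) -> int:
    """Decode a single codeword produced by rice_encode."""
    q = bits.index("0")      # number of leading ones
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


print(rice_encode(9, 2))          # 11001
print(len(rice_encode(1000, 2)))  # 253
```

The unary part grows linearly with N / M, so a value far above M produces a very long codeword. Common mitigations include adapting k to the data or capping the unary prefix with an escape code, as JPEG-LS does.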
1
vote
4
answers
228
views
Why is there little use of filesharing as compression (outside of libraries)?
Recently I was looking for a program that will run as a daemon and find files that have the same size/type, check if they're the same, then make both a hard link to a single copy if they are. And I ...
2
votes
2
answers
2k
views
How does Yahoo's Smush.It work and why doesn't everyone use it?
I've recently come across an application by Yahoo called SmushIt. Apparently it does lossless compression on images. Sometimes the image size is reduced by as much as 90%. This of course has major ...
5
votes
2
answers
11k
views
How would you go about compressing a list of integers that are non unique and retain the original order?
Let's start out with an example
[1,1,1,5,3,1,1,2,78,2,3,1,1,...,1]
As you can tell by the example, 1 is repeated a lot, but there will be outliers (like 78, and really anything that isn't 1).
The ...
1
vote
2
answers
2k
views
Is hash calculated before/after compression?
I had a question regarding compression and calculation of checksum/hash of data.
I would like to know if checksum has to be calculated before or after the compression of data before transmission. ...
-1
votes
2
answers
410
views
Estimating the space required to store 275305224 5x5 magic squares [closed]
Here are some examples of 5x5 Magic Squares found by some good solvers:
Magic Square Generator by Marcel Roos
This program states that, using a 2.4 GHz Intel processor, it takes about 95 hours to generate all solutions.
...