We Believe in Equal Rating Mozilla seeks to make the full range of the Internet's extraordinary power and innovative potential available to all. We advocate...
Some speakers love Q & A. Others dread it. No matter which group you are in, this session will share tips for how to plan...
This is a weekly call with some of the Reps to discuss all matters about/affecting Reps and invite Reps to share their work with everyone.
“Are you passionate about the Open Web? Do you want to help protect the Internet’s tremendous socioeconomic benefits through policy and advocacy?”
Mozilla is looking for an Internet Policy Manager, to work with the wonderful Raegan in the EU. If you know anyone who might be suitable, please encourage them to apply :-)
CC BY-SA 3.0 Nick Youngson
TL;DR We need you! There are 3 ways to give feedback on the draft of Mozilla’s Community Participation Guidelines. The draft is available in both English and Spanish (and yes, more languages will be included in future!).
1. By joining our #Mozillians Telegram Group or on Twitter tomorrow, Thursday March 9th at 7 AM PST (your time).
2. Directly on Discourse – English or Spanish.
3. Via the Activate Campaign (which comes with recognition of completion) – English or Spanish
Larissa Shapiro, D&I lead at Mozilla, and her team have been working very hard on a 2.0 version of our Community Participation Guidelines. You can find the current version online here.
Why are we revisiting and improving our guidelines?
Diversity is the mix of people, and Inclusion is getting the mix to work well together.
The Community Participation Guidelines draft is intended to support a diverse and inclusive Mozilla by laying out expected behavior, consequences and reporting. Please help us make this resource one that enriches our experiences – making Mozilla an empowered, safe and rewarding place to be.
There are 3 ways to give feedback (and hear from others)
- By joining our #Mozillians Telegram Group or on Twitter tomorrow, Thursday March 9th at 7 AM PST (your time).
- Directly on Discourse – English or Spanish.
- Via the Activate Campaign (which comes with recognition of completion) – English or Spanish
After writing my post "Why I'm Frequently Absent from Open Source", the Changelog podcast invited me to come talk more about it. It was great to dive into open source and talk about some problems it brings. If you are a maintainer of a project, I think you'll connect with a lot of what is said here.
We talk about the burden that is placed on individuals who do open source in their spare time, the guilt that it brings, and try to find some answers.
Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.
Bleach v2.0 released!
Bleach 2.0 is a massive rewrite. Bleach relies on the html5lib library. html5lib 0.99999999 (8 9s) changed the APIs that Bleach was using to sanitize text. As such, in order to support html5lib >= 0.99999999 (8 9s), I needed to rewrite Bleach.
Before embarking on the rewrite, I improved the tests and added a set of tests based on XSS example strings from the OWASP site. Spending quality time with tests before a rewrite or refactor is both illuminating (you get a better understanding of what the requirements are) and also immensely helpful (you know when your rewrite/refactor differs from the original). That was time well spent.
Given that I was doing a rewrite anyways, I decided to take this opportunity to break the Bleach API to make it more flexible and easier to use:
- added Cleaner and Linker classes that you can create once and reuse to reduce redundant work--suggested in #125
- created BleachSanitizerFilter which is now an html5lib filter that can be used anywhere you can use an html5lib filter
- created LinkifyFilter as an html5lib filter that can be used anywhere you use an html5lib filter including as part of cleaning allowing you to clean and linkify in one pass--suggested in #46
- changed arguments for attribute callables and linkify callbacks
- and so on
During and after the rewrite, I improved the documentation converting all the examples to doctest format so they're testable and verifiable and adding examples where there weren't any. This uncovered bugs in the documentation and pointed out some annoyances with the new API.
As I rewrote and refactored code, I focused on making the code simpler and easier to maintain going forward and also documented the intentions so I and others can know what the code should be doing.
I also adjusted the internals to make it easier for users to extend, subclass, swap out and whatever else to adjust the functionality to meet their needs without making Bleach harder to maintain for me or less safe because of additional complexity.
For API-adjustment inspiration, I went through the Bleach issue tracker and tried to address every possible issue with this update: infinite loops, unintended behavior, inflexible APIs, suggested refactorings, features, bugs, etc.
The rewrite took a while. I tried to be meticulous because this is a security library and it's a complicated problem domain and I was working on my own during slow times on work projects. When working on one's own, you don't have the benefit of review. Making sure to have good test coverage and waiting a day to self-review after posting a PR caught a lot of issues. I also go through the PR and add comments explaining why I did things to give context to future me. Those habits help a lot, but probably aren't as good as a code review by someone else.
Some stats
OMG! This blog post is so boring! Walls of text everywhere so far!
There were 61 commits between v1.5 and v2.0:
- Vadim Kotov: 1
- Alexandr N. Zamaraev: 2
- me: 58
I closed out 22 issues--possibly some more.
The rewrite has the following git diff --shortstat:
    64 files changed, 2330 insertions(+), 1128 deletions(-)
Lines of code for Bleach 1.5:
    ~/mozilla/bleach> cloc bleach/ tests/
          11 text files.
          11 unique files.
           0 files ignored.
    http://cloc.sourceforge.net v 1.60  T=0.07 s (152.4 files/s, 25287.2 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Python                          11            353            200           1272
    -------------------------------------------------------------------------------
    SUM:                            11            353            200           1272
    -------------------------------------------------------------------------------
    ~/mozilla/bleach>
Lines of code for Bleach 2.0:
    ~/mozilla/bleach> cloc bleach/ tests/
          49 text files.
          49 unique files.
          36 files ignored.
    http://cloc.sourceforge.net v 1.60  T=0.13 s (101.7 files/s, 20128.5 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Python                          13            545            406           1621
    -------------------------------------------------------------------------------
    SUM:                            13            545            406           1621
    -------------------------------------------------------------------------------
    ~/mozilla/bleach>
Some off-the-cuff performance benchmarks
I ran some timings between Bleach 1.5 and various uses of Bleach 2.0 on the Standup corpus.
Here's the results:
    what?                                       time to clean and linkify
    Bleach 1.5                                  1m33s
    Bleach 2.0 (no code changes)                41s
    Bleach 2.0 (using Cleaner and Linker)       10s
    Bleach 2.0 (clean and linkify--one pass)    7s
How'd I compute the timings?
- I'm using the Standup corpus which has 42000 status messages in it. Each status message is like a tweet--it's short, has some links, possibly has HTML in it, etc.
- I wrote a timing harness that goes through all those status messages and times how long it takes to clean and linkify the status message content, accumulates those timings and then returns the total time spent cleaning and linking.
- I ran that 10 times and took the median. The timing numbers were remarkably stable and there were only a few seconds of difference between the high and low for all of the sets.
- I wrote the median number down in that table above.
- Then I'd adjust the code as specified in the table and run the timings again.
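The steps above can be sketched roughly as follows. This is not the harness used for the numbers in the table; it is a minimal illustration using only the standard library. The names time_corpus, median_of_runs and fake_clean_and_linkify are made up for this sketch, and the stand-in function only escapes angle brackets, so swap in real bleach calls to measure anything meaningful.

```python
import statistics
import time


def time_corpus(process, corpus):
    """Return the total seconds spent calling process() on every item."""
    total = 0.0
    for text in corpus:
        start = time.perf_counter()
        process(text)
        total += time.perf_counter() - start
    return total


def median_of_runs(process, corpus, runs=10):
    """Time the whole corpus several times and return the median total."""
    return statistics.median(time_corpus(process, corpus) for _ in range(runs))


def fake_clean_and_linkify(text):
    # Hypothetical stand-in for the real clean-and-linkify step.
    return text.replace("<", "&lt;").replace(">", "&gt;")


# Tiny synthetic corpus; the real one had 42000 status messages.
corpus = ["status <b>update</b> http://example.com"] * 1000
median_seconds = median_of_runs(fake_clean_and_linkify, corpus)
print(f"median total: {median_seconds:.4f}s")
```

Taking the median rather than the mean keeps one unusually slow run from skewing the number you write down.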
I have several observations/thoughts:
First, holy moly--1m33s to 7s is a HUGE performance improvement.
Second, just switching from Bleach 1.5 to 2.0 and making no code changes (in other words, keeping your calls as bleach.clean and bleach.linkify rather than using Cleaner and Linker and LinkifyFilter), gets you a lot. Depending on whether you have attribute filter callables and linkify callbacks, you may be able to just upgrade the libs and WIN!
Third, switching to reusing Cleaner and Linker also gets you a lot.
Fourth, your mileage may vary depending on the nature of your corpus. For example, Standup status messages are short so if your text fragments are larger, you may see more savings by cleaning and linkifying in one pass because HTML parsing takes more time.
How to upgrade
Upgrading should be straightforward.
Here's the minimal upgrade path:
- Update Bleach to 2.0 and html5lib to >= 0.99999999 (8 9s).
- If you're using attribute callables, you'll need to update them.
- If you're using linkify callbacks, you'll need to update them.
- Read through version 2.0 changes for any other backwards-incompatible changes that might affect you.
- Run your tests and see how it goes.
If you're using html5lib 1.0b8, then you have to explicitly upgrade the version. 1.0b8 is equivalent to html5lib 0.9999999 (7 9s) and that's not supported by Bleach 2.0.
You have to explicitly upgrade because pip will think that 1.0b8 comes after 0.99999999 (8 9s) and it doesn't. So it won't upgrade html5lib for you.
If you're doing 9s, make sure to upgrade to 0.99999999 (8 9s) or higher.
If you're doing 1.0bs, make sure to upgrade to 1.0b9 or higher.
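Here's a small sketch of why the version numbers mislead the tooling. It is a deliberate simplification: it compares only the leading numeric release segment and ignores the pre-release suffix, which is enough to show the problem but is not how pip implements version ordering in full. The helper name release_tuple is made up for this example.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release segment, e.g. '1.0b8' -> (1, 0)."""
    match = re.match(r"\d+(\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split("."))


# 1.0b8 is the same code as 0.9999999 (7 9s), so it is older than
# 0.99999999 (8 9s), but its release segment compares as newer.
print(release_tuple("1.0b8"))       # (1, 0)
print(release_tuple("0.99999999"))  # (0, 99999999)
print(release_tuple("1.0b8") > release_tuple("0.99999999"))  # True
```

Because (1, 0) sorts after (0, 99999999), an installed 1.0b8 already looks like it satisfies ">= 0.99999999", which is why the upgrade has to be explicit.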
If you want better performance:
- Switch to reusing bleach.sanitizer.Cleaner and bleach.linkifier.Linker.
If you have large text fragments:
- Switch to reusing bleach.sanitizer.Cleaner and set filters to include LinkifyFilter which lets you clean and linkify in one step.
Many thanks to James Socol (previous maintainer) for walking me through why things were the way they were.
Many thanks to Geoffrey Sneddon (html5lib maintainer) for answering questions, helping with problems I encountered and all his efforts on html5lib which is a huge library that he works on in his spare time for which he doesn't get anywhere near enough gratitude.
Many thanks to Lonnen (my manager) who heard me talk about html5lib zero point nine nine nine nine nine nine nine nine a bunch.
Also, many thanks to Mozilla for letting me work on this during slow periods of the projects I should be working on. A bunch of Mozilla sites use Bleach, but none of mine do.
Where to go for more
For more specifics on this release, see here: https://bleach.readthedocs.io/en/latest/changes.html#version-2-0-march-8th-2017
Documentation and quickstart here: https://bleach.readthedocs.org/en/v2.0/
Source code and issue tracker here: https://github.com/mozilla/bleach
mconley livehacks on real Firefox bugs while thinking aloud.
After some consideration, I’ve decided to discontinue development of Positron.
Positron was an experimental runtime for creating desktop apps using web technologies. It was based on Firefox, and its principal feature was that it was Electron-compatible. I started working on it—in conjunction with several colleagues—to enable Tofino to run on Gecko.
But Tofino is dead (long live the Browser Futures Group!), and Electron compatibility isn’t essential for a viable Gecko runtime. It’s also hard, since Electron has a large API surface area, is a moving target, requires Node.js integration (itself a moving target), and is designed for Chromium’s process architecture, which is substantially different from Firefox’s.
I’ve previously written about the utility of desktop runtimes (among other embedding projects). I still think they’re valuable for a variety of use cases, and Gecko can provide unique value to desktop application development. I’ll continue to pursue the realization of that value. I just won’t do it through Positron.
With the release of Firefox 52 to all users worldwide, we now have the final Windows XP-supported Firefox release out the door.
This isn’t to say that support is done. As I’ve mentioned before, Windows XP users will be transitioned to the ESR update channel where they’ll continue to receive security updates for the next year or so.
And I don’t expect this to be the end of me having to blog about weird clients that are inexplicably on Windows XP.
However, this does take care of one of the longest-standing data questions I’ve looked at on this blog and in my career at Mozilla. So I feel that it’s worth taking a moment to mark the occasion.
Windows XP is dead. Long live Windows XP.
This is the sumo weekly call
I think I first heard about the Zstandard compression algorithm at a Mercurial developer sprint in 2015. At one end of a large table a few people were uttering expletives out of sheer excitement. At developer gatherings, that's the universal signal for something is awesome. Long story short, a Facebook engineer shared a link to the RealTime Data Compression blog operated by Yann Collet (then known as the author of LZ4 - a compression algorithm known for its insane speeds) and people were completely nerding out over the excellent articles and the data within showing the beginnings of a new general purpose lossless compression algorithm named Zstandard. It promised better-than-deflate/zlib compression ratios and performance on both compression and decompression. This being a Mercurial meeting, many of us were intrigued because zlib is used by Mercurial for various functionality (including on-disk storage and compression over the wire protocol) and zlib operations frequently appear as performance hot spots.
Before I continue, if you are interested in low-level performance and software optimization, I highly recommend perusing the RealTime Data Compression blog. There are some absolute nuggets of info in there.
Anyway, over the months, the news about Zstandard (zstd) kept getting better and more promising. As the 1.0 release neared, the Facebook engineers I interact with (Yann Collet - Zstandard's author - is now employed by Facebook) were absolutely ecstatic about Zstandard and its potential. I was toying around with pre-release versions and was absolutely blown away by the performance and features. I believed the hype.
Zstandard 1.0 was released on August 31, 2016. A few days later, I started the python-zstandard project to provide a fully-featured and Pythonic interface to the underlying zstd C API while not sacrificing safety or performance. The ulterior motive was to leverage those bindings in Mercurial so Zstandard could be a first class citizen in Mercurial, possibly replacing zlib as the default compression algorithm for all operations.
Fast forward six months and I've achieved many of those goals. python-zstandard has a nearly complete interface to the zstd C API. It even exposes some primitives not in the C API, such as batch compression operations that leverage multiple threads and use minimal memory allocations to facilitate insanely fast execution. (Expect a dedicated post on python-zstandard from me soon.)
Mercurial 4.1 ships with the python-zstandard bindings. Two Mercurial 4.1 peers talking to each other will exchange Zstandard compressed data instead of zlib. For a Firefox repository clone, transfer size is reduced from ~1184 MB (zlib level 6) to ~1052 MB (zstd level 3) in the default Mercurial configuration while using ~60% of the CPU that zlib required on the compressor end. When cloning from hg.mozilla.org, the pre-generated zstd clone bundle hosted on a CDN using maximum compression is ~707 MB - ~60% the size of zlib! And, work is ongoing for Mercurial to support Zstandard for on-disk storage, which should bring considerable performance wins over zlib for local operations.
I've learned a lot working on python-zstandard and integrating Zstandard into Mercurial. My primary takeaway is Zstandard is awesome.
In this post, I'm going to extol the virtues of Zstandard and provide reasons why I think you should use it.
Why Zstandard
The main objective of lossless compression is to spend one resource (CPU) so that you may reduce another (I/O). This trade-off is usually made because data - either at rest in storage or in motion over a network or even through a machine via software and memory - is a limiting factor for performance. So if compression is needed for your use case to mitigate I/O being the limiting resource and you can swap in a different compression algorithm that magically reduces both CPU and I/O requirements, that's pretty exciting. At scale, better and more efficient compression can translate to substantial cost savings in infrastructure. It can also lead to improved application performance, translating to better end-user engagement, sales, productivity, etc. This is why companies like Facebook (Zstandard), Google (brotli, snappy, zopfli), and Pied Piper (middle-out) invest in compression.
Today, the most widely used compression algorithm in the world is likely DEFLATE. And, software most often interacts with DEFLATE via what is likely the most widely used software library in the world, zlib.
Being at least 27 years old, DEFLATE is getting a bit long in the tooth. Computers are completely different today than they were in 1990. The Pentium microprocessor debuted in 1993. If memory serves (pun intended), it used PC66 DRAM, which had a transfer rate of 533 MB/s. For comparison, a modern NVMe M.2 SSD (like the Samsung 960 PRO) can read at 3000+ MB/s and write at 2000+ MB/s. In other words, persistent storage today is faster than the RAM from the era when DEFLATE was invented. And of course CPU and network speeds have increased as well. We also have completely different instruction sets on CPUs for well-designed algorithms and software to take advantage of. What I'm trying to say is the market is ripe for DEFLATE and zlib to be dethroned by algorithms and software that take into account the realities of modern computers.
(For the remainder of this post I'll use zlib as a stand-in for DEFLATE because it is simpler.)
Zstandard initially piqued my attention by promising better-than-zlib compression and performance in both the compression and decompression directions. That's impressive. But it isn't unique. Brotli achieves the same, for example. But what kept my attention was Zstandard's rich feature set, tuning abilities, and therefore versatility.
In the sections below, I'll describe some of the benefits of Zstandard in more detail.
Before I do, I need to throw in an obligatory disclaimer about data and numbers that I use. Benchmarking is hard. Benchmarks should not be trusted. There are so many variables that can influence performance and benchmarks. (A recent example that surprised me is the CPU frequency/power ramping properties of Xeon versus non-Xeon Intel CPUs. tl;dr a Xeon won't hit max CPU frequency if only a core or two is busy, meaning that any single or low-threaded benchmark is likely misleading on Xeons unless you change power settings to mitigate its conservative power ramping defaults. And if you change power settings, does that reflect real-life usage?)
Reporting useful and accurate performance numbers for compression is hard because there are so many variables to care about. For example:
- Every corpus is different. Text, JSON, C++, photos, numerical data, etc all exhibit different properties when fed into compression and could cause compression ratios or speeds to vary significantly.
- Few large inputs versus many smaller inputs (some algorithms work better on large inputs; some libraries have high per-operation overhead).
- Memory allocation and use strategy. Performance can vary significantly depending on how a compression library allocates, manages, and uses memory. This can be an implementation specific detail as opposed to a core property of the compression algorithm.
All performance data was obtained on an i7-6700K running Ubuntu 16.10 (Linux 4.8.0) with a mostly stock config. Benchmarks were performed in memory to mitigate storage I/O or filesystem interference. Memory used is DDR4-2133 with a cycle time of 35 clocks.
While I'm pretty positive about Zstandard, it isn't perfect. There are corpora for which Zstandard performs worse than other algorithms, even ones I compare it directly to in this post. So, your mileage may vary. Please enlighten me with your counterexamples by leaving a comment.
With that (rather large) disclaimer out of the way, let's talk about what makes Zstandard awesome.
Flexibility for Speed Versus Size Trade-offs
Compression algorithms typically contain parameters to control how much work to do. You can choose to spend more CPU to (hopefully) achieve better compression or you can spend less CPU to sacrifice compression. (OK, fine, there are other factors like memory usage at play too. I'm simplifying.) This is commonly exposed to end-users as a compression level. (In reality there are often multiple parameters that can be tuned. But I'll just use level as a stand-in to represent the concept.)
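To make the level knob concrete, here is a minimal sketch using Python's standard-library zlib module (not Zstandard, and not the benchmark setup used in this post). The input is synthetic and mildly repetitive, so the sizes and timings it prints say nothing about real corpora; the point is only that one parameter trades CPU time against output size.

```python
import time
import zlib

# Synthetic input; real data will compress very differently.
data = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(20000)
)

results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    results[level] = (len(compressed), elapsed)
    ratio = len(data) / len(compressed)
    print(f"level {level}: {len(compressed)} bytes, ratio {ratio:.2f}, {elapsed:.4f}s")
```

On most inputs the higher levels take longer and produce somewhat smaller output, but how much you gain for the extra CPU depends heavily on the data.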
But even with adjustable compression levels, the performance of many compression algorithms and libraries tends to fall within a relatively narrow window. In other words, many compression algorithms focus on niche markets. For example, LZ4 is super fast but doesn't yield great compression ratios. LZMA yields terrific compression ratios but is extremely slow.
This can be visualized in the following chart showing results when compressing a mozilla-unified Mercurial bundle:
This chart plots the logarithmic compression speed in megabytes per second against achieved compression ratio. The further right a data point is, the better the compression and the smaller the output. The higher up a point is, the faster compression is.
The ideal compression algorithm lives in the top right, which means it compresses well and is fast. But the powers of mathematics push compression algorithms away from the top right.
On to the observations.
LZ4 is highly vertical, which means its compression ratios are limited in variance but it is extremely flexible in speed. So for this data, you might as well stick to a lower compression level because higher values don't buy you much.
Bzip2 is the opposite: a horizontal line. That means it is consistently the same speed while yielding different compression ratios. In other words, you might as well crank bzip2 up to maximum compression because it doesn't have a significant adverse impact on speed.
LZMA and zlib are more interesting because they exhibit more variance in both the compression ratio and speed dimensions. But let's be frank, they are still pretty narrow. LZMA looks pretty good from a shape perspective, but its top speed is just too slow - only ~26 MB/s!
This small window of flexibility means that you often have to choose a compression algorithm based on the speed versus size trade-off you are willing to make at that time. That choice often gets baked into software. And as time passes and your software or data gains popularity, changing the software to swap in or support a new compression algorithm becomes harder because of the cost and disruption it will cause. That's technical debt.
What we really want is a single compression algorithm that occupies lots of space in both dimensions of our chart - a curve that has high variance in both compression speed and ratio. Such an algorithm would allow you to make an easy decision choosing a compression algorithm without locking you into a narrow behavior profile. It would allow you to make a completely different size versus speed trade-off in the future by only adjusting a config knob or two in your application - no swapping of compression algorithms needed!
As you can guess, Zstandard fulfills this role. This can clearly be seen in the following chart (which also adds brotli for comparison).
The advantages of Zstandard (and brotli) are obvious. Zstandard's compression speeds go from ~338 MB/s at level 1 to ~2.6 MB/s at level 22 while covering compression ratios from 3.72 to 6.05. On one end, zstd level 1 is ~3.4x faster than zlib level 1 while achieving better compression than zlib level 9! That fastest speed is only 2x slower than LZ4 level 1. On the other end of the spectrum, zstd level 22 runs ~1 MB/s slower than LZMA at level 9 and produces a file that is only 2.3% larger.
It's worth noting that zstd's C API exposes several knobs for tweaking the compression algorithm. Each compression level maps to a pre-defined set of values for these knobs. It is possible to set these values beyond the ranges exposed by the default compression levels 1 through 22. I've done some basic experimentation with this and have made compression even faster (while sacrificing ratio, of course). This covers the gap between Zstandard and brotli on this end of the tuning curve.
The wide span of compression speeds and ratios is a game changer for compression. Unless you have special requirements such as lightning fast operations (which LZ4 can provide) or special corpora that Zstandard can't handle well, Zstandard is a very safe and flexible choice for general purpose compression.
Multi-threaded Compression
Zstd 1.1.3 contains a multi-threaded compression API that allows a compression operation to leverage multiple threads. The output from this API is compatible with the Zstandard frame format and doesn't require any special handling on the decompression side. In other words, a compressor can switch to the multi-threaded API and decompressors won't care.
This is a big deal for a few reasons. First, today's advancements in computer processors tend to yield more capacity from more cores not from faster clocks and better cycle efficiency (although many cases do benefit greatly from modern instruction sets like AVX and therefore better cycle efficiency). Second, so many compression libraries are only single-threaded and require consumers to invent their own framing formats or storage models to facilitate multi-threading. (See Blosc for such a library.) Lack of a multi-threaded API in the compression library means trusting another piece of software or writing your own multi-threaded code.
The following chart adds a plot of Zstandard multi-threaded compression with 4 threads.
The existing curve for Zstandard basically shifted straight up. Nice!
The ~338 MB/s speed for single-threaded compression on zstd level 1 increases to ~1,376 MB/s with 4 threads. That's ~4.06x faster. And, it is ~2.26x faster than the previous fastest entry, LZ4 at level 1! The output size only increased by ~4 MB or ~0.3% over single-threaded compression.
The scaling properties for multi-threaded compression on this input are terrific: all 4 cores are saturated and the output size barely changed.
Because Zstandard's multi-threaded compression API produces data compatible with any Zstandard decompressor, it can logically be considered an extension of compression levels. This means that the already extremely flexible speed vs ratio curve becomes even wider in the speed axis. Zstandard was already a justifiable choice with its extreme versatility. But when you throw in native multi-threaded compression API support, the flexibility for tuning compression performance is just absurd. With enough cores, you are likely to run into I/O limits long before you exhaust the CPU, at which point you can crank up the compression level and sacrifice as much CPU as you are willing to burn. That's a good position to be in.
Decompression Speed
Compression speed and ratios only tell half the story about a compression algorithm. Except for archiving scenarios where you write once and read rarely, you probably care about decompression performance.
Popular compression algorithms like zlib and bzip2 have less than stellar decompression speeds. On my i7-6700K, zlib decompression can deliver many decompressed data sets at the output end at 200+ MB/s. However, on the input/compressed end, it frequently fails to reach 100 MB/s or even 80 MB/s. This is significant because if your application is reading data over a 1 Gbps network or from a local disk (modern SSDs can read at several hundred MB/s or more), then your application has a CPU bottleneck at decoding the data - and that's before you actually do anything useful with the data in the application layer! (Remember: the idea behind compression is to spend CPU to mitigate an I/O bottleneck. So if compression makes you CPU bound, you've undermined the point of compression!) And if my Skylake CPU running at 4.0 GHz is CPU - not I/O - bound, a Xeon in a data center will be even slower and even more CPU bound (Xeons tend to run at much lower clock speeds - the laws of thermodynamics require that in order to run more cores in the package). In short, if you are using zlib for high throughput scenarios, there's a good chance it is a bottleneck and slowing down your application.
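The arithmetic behind that claim is simple enough to spell out. This sketch assumes decimal units (1 Gbps = 1000 megabits per second) and ignores protocol overhead; the 80 and 100 MB/s figures are the zlib input-side rates quoted above, and the helper name is made up for illustration.

```python
def gbps_to_mb_per_second(gigabits_per_second):
    """Convert a link speed in Gbps to MB/s, assuming decimal units."""
    return gigabits_per_second * 1000 / 8


network_rate = gbps_to_mb_per_second(1)
print(f"1 Gbps link delivers up to {network_rate:.0f} MB/s of compressed input")

for zlib_input_rate in (80, 100):
    utilization = zlib_input_rate / network_rate
    print(
        f"decompressor consuming {zlib_input_rate} MB/s keeps up with "
        f"{utilization:.0%} of the link"
    )
```

Anything under 100% means the decompressor, not the network, is the limiting factor.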
We again measure the speed of algorithms using a Firefox Mercurial bundle. The following charts plot decompression speed versus ratio for this file. The first chart measures decompression speed on the input end of the decompressor. The second measures speed at the output end.
Zstandard matches its great compression speed with great decompression speed. Zstandard can deliver decompressed output at 1000+ MB/s while consuming input at 200-275 MB/s. Furthermore, decompression speed is mostly independent of the compression level. (Although higher compression levels require more memory in the decompressor.) So, if you want to throw more CPU at re-compression later so data at rest takes less space, you can do that without sacrificing read performance. I haven't done the math, but there is probably a break-even point where having dedicated machines re-compress terabytes or petabytes of data at rest offsets the costs of those machines through reduced storage costs.
While Zstandard is not as fast decompressing as LZ4 (which can consume compressed input at 500+ MB/s), its performance is often ~4x faster than zlib. On many CPUs, this puts it well above 1 Gbps, which is often desirable to avoid a bottleneck at the network layer.
It's also worth noting that while Zstandard and brotli were comparable on the compression half of this data, Zstandard has a clear advantage doing decompression.
Finally, you don't appear to pay a price for multi-threaded Zstandard compression on the decompression side (zstdmt in the chart).
Dictionary Support
The examples so far in this post have used a single 4,457 MB piece of input data to measure behavior. Large data can behave completely differently from small data. This is because so much of what compression algorithms do is find patterns that came before so incoming data can be referenced to old data instead of uniquely stored. And if data is small, there isn't much of it that came before to reference!
This is often why many small, independent chunks of input compress poorly compared to a single large chunk. This can be demonstrated by comparing the widely-used zip and tar archive formats. On the surface, both do the same thing: they are a container of files. But they employ compression at different phases. A zip file will zlib compress each entry independently. However, a tar file doesn't use compression internally. Instead, the tar file itself is fed into a compression algorithm and compressed as a whole.
A more extreme example of the differences between zip and tar is the files in the Firefox source checkout. On revision a08ec245fa24 of the Firefox Mercurial repository, a zip file of all files in version control is 430,446,549 bytes versus 322,916,403 bytes for a tar.gz file (1,177,430,383 bytes uncompressed spanning 180,912 files). Using Zstandard, compressing each file discretely at compression level 3 yields 391,387,299 bytes of compressed data versus 294,926,418 as a single stream (without the tar container). Same compression algorithm. Different application method. Drastically different results. That's the impact of input size on compression performance.
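You can reproduce the effect in miniature with nothing but the standard library. This sketch uses zlib rather than Zstandard, and the JSON records are made up for illustration, so the absolute numbers mean nothing; only the gap between the two approaches matters.

```python
import json
import zlib

# Many small, similar inputs (field names are hypothetical).
records = [
    json.dumps({"id": i, "name": f"user{i}", "status": "active"}).encode()
    for i in range(200)
]

# zip-style: every entry is compressed on its own.
independent_total = sum(len(zlib.compress(record)) for record in records)

# tar.gz-style: everything is concatenated, then compressed as one stream.
single_stream_total = len(zlib.compress(b"\n".join(records)))

raw_total = sum(len(record) for record in records)
print(f"raw:            {raw_total} bytes")
print(f"independent:    {independent_total} bytes")
print(f"single stream:  {single_stream_total} bytes")
```

Each independently compressed record starts with an empty history and pays its own framing overhead, while the single stream can reference earlier records for nearly everything after the first.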
While the compression ratio and speed of a single large stream is often better than multiple smaller chunks, there are still use cases that either don't have enough data or prefer independent access to each piece of input (like Firefox's omni.ja file). So a robust compression algorithm should handle small inputs as well as it does large inputs.
Zstandard helps offset the inherent inefficiencies of small inputs by supporting dictionary compression. A dictionary is essentially data used to seed the compressor's state. If the compressor sees data that exists in the dictionary, it references the dictionary instead of storing new data in the compressed output stream. This results in smaller output sizes and better compression ratios. One drawback to this is the dictionary has to be used to decompress data, which means you need to figure out how to distribute the dictionary and ensure it remains in sync with all data producers and consumers. This isn't always trivial.
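Here is a rough illustration of that seeding idea. It assumes zlib's preset dictionary support (the zdict argument in Python's zlib module) as a stand-in for Zstandard, and the dictionary is a hand-picked sample record, not the output of a training algorithm. The payloads are hypothetical.

```python
import zlib

# Hypothetical inputs: the dictionary is simply a record that resembles the payload.
dictionary = b'{"id": 0, "user": "user0", "status": "ok", "tags": ["a", "b"]}'
payload = b'{"id": 42, "user": "user42", "status": "ok", "tags": ["a", "b"]}'

def compress(data, zdict=None):
    # A preset dictionary seeds the compressor's history window.
    if zdict is None:
        c = zlib.compressobj(level=6)
    else:
        c = zlib.compressobj(level=6, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress(data, zdict=None):
    # The decompressor must be handed the exact same dictionary.
    if zdict is None:
        d = zlib.decompressobj()
    else:
        d = zlib.decompressobj(zdict=zdict)
    return d.decompress(data) + d.flush()

plain = compress(payload)
seeded = compress(payload, dictionary)
print(f"input: {len(payload)}  no dictionary: {len(plain)}  with dictionary: {len(seeded)}")

# Round trip works only when the consumer has the dictionary.
restored = decompress(seeded, dictionary)

try:
    decompress(seeded)
    needs_dictionary = False
except zlib.error:
    needs_dictionary = True
print(f"decompression fails without the dictionary: {needs_dictionary}")
```

The failure in the last step is the distribution problem in miniature: every consumer needs the same dictionary the producer used.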
Dictionary compression only works if there is enough repeated data and patterns in the inputs that can be extracted to yield a useful dictionary. Examples of this include markup languages, source code, or pieces of similar data (such as JSON payloads from HTTP API requests or telemetry data), which often have many repeated keywords and patterns.
Dictionaries are typically produced by training them on existing data. Essentially, you feed a bunch of samples into an algorithm that spits out a meaningful and useful dictionary. The more coherency in the data that will be compressed, the better the dictionary and the better the compression ratios.
Dictionaries can have a significant effect on compression ratios and speed.
Let's go back to Firefox's omni.ja file. Compressing each file discretely at zstd level 12 yields 9,177,410 bytes of data. But if we produce a 131,072 byte dictionary by training it on all files within omni.ja, the total size of each file compressed discretely is 7,942,886 bytes. Including the dictionary, the total size is 8,073,958 bytes, 1,103,452 bytes smaller than non-dictionary compression! (The zlib-based omni.ja is 9,783,749 bytes.) So Zstandard plus dictionary compression would likely yield a meaningful ~1.7 MB size reduction to the omni.ja file. This would make the Firefox distribution smaller and may improve startup time (since many files inside omni.ja are accessed at startup), which would make a number of people very happy. (Of course, Firefox doesn't yet contain the zstd C library. And adding it just for this use case may not make sense. But Firefox does ship with the brotli library and brotli supports dictionary compression and has similar performance characteristics to Zstandard, so, uh, someone may want to look into transitioning omni.ja to not zlib.)
But the benefits of dictionary compression don't end at compression ratios: operations with dictionaries can be faster as well!
The following chart shows performance when compressing Mercurial changeset data (describes a Mercurial commit) for the Firefox repository. There are 382,530 discrete inputs spanning 221,429,458 bytes (mean: 579 bytes, median: 306 bytes). (Note: measurements were conducted in Python and therefore may introduce some overhead.)
Aside from zstd level 3 dictionary compression, Zstandard is faster than zlib level 6 across the board (I suspect this one-off is an oddity with the zstd compression parameters at this level and this corpus because zstd level 4 is faster than level 3, which is weird).
It's also worth noting that non-dictionary Zstandard compression has similar compression ratios to zlib. Again, this demonstrates the intrinsic difficulties of compressing small inputs.
But the real takeaway from this data is the speed difference with dictionary compression enabled. Dictionary decompression is 2.2-2.4x faster than non-dictionary decompression. An already respectable ~240 MB/s decompression speed (measured at the output end) becomes ~530 MB/s. Zlib level 6 was ~140 MB/s, so swapping in Zstandard dictionary compression makes things ~3.8x faster. It takes ~1.5s of CPU time to zlib decompress this corpus. So if Mercurial can be taught to use Zstandard dictionary compression for changelog data, certain operations on this corpus will complete ~1.1s faster. That's significant.
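The time savings follow directly from the corpus size and the throughput figures. This is a back-of-the-envelope check, assuming 1 MB means 10^6 bytes and treating the rounded speeds quoted above as exact.

```python
corpus_bytes = 221_429_458  # total size of the changeset corpus, from this post

# Approximate decompression throughput in bytes per second, from this post.
speeds = {
    "zlib level 6": 140e6,
    "zstd, no dictionary": 240e6,
    "zstd, dictionary": 530e6,
}

# Time to decompress the whole corpus at each speed.
seconds = {name: corpus_bytes / speed for name, speed in speeds.items()}
for name, value in seconds.items():
    print(f"{name}: {value:.2f}s")

saved = seconds["zlib level 6"] - seconds["zstd, dictionary"]
speedup = speeds["zstd, dictionary"] / speeds["zlib level 6"]
print(f"saved vs zlib: {saved:.2f}s, speedup: {speedup:.1f}x")
```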
It's worth stating that Zstandard isn't the only compression algorithm or library to support dictionary compression. Brotli and zlib do as well, for example. But, Zstandard's support for dictionary compression seems to be more polished than other libraries I've seen. It has multiple APIs for training dictionaries from sample data. (Brotli has none nor does brotli's documentation say how to generate dictionaries as far as I can tell.)
Dictionary compression is definitely an advanced feature, applicable only to certain use cases (lots of small, similar data). But there's no denying that if you can take advantage of dictionary compression, you may be rewarded with significant performance wins.
A Versatile C API
As part of writing python-zstandard, I've spent a lot of time interfacing with the zstd C API. And, as part of evaluating other compression libraries for use in Mercurial, I've been looking at C APIs for other libraries and the Python bindings to them. A takeaway from this is an appreciation for the quality of zstd's C API.
Many compression library APIs are either too simple or too complex. Zstandard's is in the Goldilocks zone. Aside from a few minor missing features, its C API was more than adequate in its 1.0 release.
What I really appreciate about the zstd C API is that it provides high, medium, and low-level APIs. From the highest level, you throw it pointers to input and output buffers and it does an operation. From the medium level, you use a reusable context holding state and other parameters and it does an operation. From the low-level, you are calling multiple functions and shuffling bytes around, maintaining your own state and potentially bypassing the Zstandard framing format in the process. The different levels give you almost total control over everything. This is critical for performance optimization and when writing bindings for higher-level languages that may have different expectations on the behavior of software. The performance I've achieved in python-zstandard just isn't (easily) possible with other compression libraries because of their lacking API design.
Oftentimes when interacting with a C library I think "if only there were a function to let me do X, my life would be much easier." I rarely have this experience with Zstandard. The C API is well thought out, has almost all the features I want/need, and is pretty easy to use. While most won't notice this difference, it should be a significant advantage for Zstandard in the long run, as more bindings are written and more people have a high-quality experience with it because the C API allows them to.
Zstandard Isn't Perfect
I've been pretty positive about Zstandard so far in this post. In fear of sounding like a fanboy who is so blinded by admiration that he can't see faults and because nothing is perfect, I need to point out some negatives about Zstandard. (Aside: put little faith in the words uttered by someone who can't find a fault in something they praise.)
First, the framing format is a bit heavyweight in some scenarios. The frame header is at least 6 bytes. For input of 256-65791 bytes, recording the original source size and its checksum will result in a 12 byte frame. Zlib, by contrast, is only 6 bytes for this scenario. When storing tens of thousands of compressed records (this is a use case in Mercurial), the frame overhead can matter and this can make it difficult for compressed Zstandard data to be as small as zlib for very small inputs. (It's worth noting that zlib doesn't store the decompressed size in its header. There are pros and cons to this, which I'll discuss in my eventual post about python-zstandard and how it achieves optimal performance.) If the frame overhead matters to you, the zstd C API does expose a block API that operates at a level below the framing format, allowing you to roll your own framing protocol. I also filed a GitHub issue to make the 4 byte magic number optional, which would go a long way to cutting down on frame overhead.
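The 6 byte figure for zlib can be checked with Python's built-in zlib module by compressing the same input with and without the zlib container. This is a small sketch with a made-up payload; it measures zlib only and says nothing about Zstandard's frame.

```python
import zlib

data = b"example payload " * 8  # hypothetical small record

def deflate(payload, wbits):
    # Positive wbits wraps the output in the zlib container;
    # negative wbits emits a raw DEFLATE stream with no header or trailer.
    c = zlib.compressobj(level=6, wbits=wbits)
    return c.compress(payload) + c.flush()

wrapped = deflate(data, 15)   # 2-byte header + DEFLATE data + 4-byte Adler-32
raw = deflate(data, -15)      # DEFLATE data only

overhead = len(wrapped) - len(raw)
print(f"zlib container overhead: {overhead} bytes")

# A raw stream still round-trips, as long as the reader knows it is raw.
restored = zlib.decompressobj(wbits=-15).decompress(raw)
```

Dropping the container is the zlib equivalent of rolling your own framing: the bytes are saved, but the checksum and format identification become your responsibility.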
Second, the C API is not yet fully stabilized. There are a number of functions marked as experimental that aren't exported from the shared library and are only available via static linking. There's a ton of useful functionality in there, including low-level compression parameter adjustment, digested dictionaries (for reusing computed dictionaries across multiple contexts), and the multi-threaded compression API. python-zstandard makes heavy use of these experimental APIs. This requires bundling zstd with python-zstandard and statically linking with this known version because functionality could change at any time. This is a bit annoying, especially for distro packagers.
Third, the low-level compression parameters are under-documented. I think I understand what a lot of them do. But it isn't obvious when I should consider adjusting what. The default compression levels seem to work pretty well and map to reasonable compression parameters. But a few times I've noticed that tweaking things slightly can result in desirable improvements. I wish there were a guide of sorts to help you tune these parameters.
Fourth, dictionary compression is still a bit too complicated and hand-wavy for my liking. I can measure obvious benefits when using it largely out of the box with some corpora. But it isn't always a win and the cost for training dictionaries is too high to justify using it outside of scenarios where you are pretty sure it will be beneficial. When I do use it, I'm not sure which compression levels it works best with, how many samples need to be fed into the dictionary trainer, which training algorithm to use, etc. If that isn't enough, there is also the concept of content-only dictionaries where you use a fulltext as the dictionary. This can be useful for delta-encoding schemes (where compression effectively acts like a diff/delta generator instead of using something like Myers diff). If this topic interests you, there is a thread on the Mercurial developers list where Yann Collet and I discuss this.
Fifth, the patent rights grant. There is some wording in the PATENTS file in the Zstandard project that may... concern lawyers. While Zstandard is covered by the standard BSD 3-Clause license, that supplemental PATENTS file may scare some lawyers enough that you won't be able to use Zstandard. You may want to talk to a lawyer before using Zstandard, especially if you or your company likes initiating patent lawsuits against companies (or wishes to reserve that right - as many companies do), as that is the condition upon which the license terminates. Note that there is a long history between Facebook and consumers of its open source software regarding this language in the PATENTS file. Do a search for React patent grant to read more.
Sixth and finally, Zstandard is still relatively new. I can totally relate to holding off until something new and shiny proves itself. That being said, the Zstandard framing protocol has some escape hatches for future needs. And, the project proved during its pre-1.0 days that it knows how to handle backwards and future compatibility issues. And considering Facebook and others are using Zstandard in production, I wouldn't be too worried. I think the biggest risk is to people (like me) who are writing code against the experimental C APIs. But even then, the changes to the experimental APIs in the past several months have been minor. I'm not losing sleep over it.
That may seem like a long and concerning list. Most of the issues are relatively minor. The language in the PATENTS file may be a showstopper to some. From my perspective, the biggest thing Zstandard has going against it is its youth. But that will only improve with age. While I'm usually pretty conservative about adopting new technology (I've gotten burned enough times that I prefer the neophytes do the field testing for me), the upside to using Zstandard is potentially drastic performance and efficiency gains. And that can translate to success versus failure or millions of dollars in saved infrastructure costs and productivity gains. I'm willing to take my chances.
Conclusion
For the corpora I've thrown at it, Zstandard handily outperforms zlib in almost every dimension. And, it even manages to best other modern compression algorithms like brotli in many tests.
The underlying algorithm and techniques used by Zstandard are highly parameterized, lending themselves to a variety of use cases from embedded hardware to massive data crunching machines with hundreds of gigabytes of memory and dozens of CPU cores.
The C API is well-designed and facilitates high performance and adaptability to numerous use cases. It is batteries included, providing functions to train dictionaries and perform multi-threaded compression.
Zstandard is backed by Facebook and seems to have a healthy open source culture on GitHub. My interactions with Yann Collet have been positive and he seems to be a great project maintainer.
Zstandard is an exciting advancement for data compression and therefore for the entire computing field. As someone who has lived in the world of zlib for years, was a casual user of compression, and thought zlib was good enough for most use cases, I can attest that Zstandard is game changing. After being enlightened to all the advantages of Zstandard, I'll never casually use zlib again: it's just too slow and inflexible for the needs of modern computing. If you use compression, I highly recommend investigating Zstandard.
(I updated the post on 2017-03-08 to include a paragraph about the supplemental license in the PATENTS file.)
Today, the organization WikiLeaks published a compendium of information alleged to be documents from the U.S. Central Intelligence Agency (CIA) pertaining to tools and techniques to compromise the security of mobile phones, computers, and internet-connected devices. We released the following statement on these reports:
If the information released in today’s reports is accurate, then it proves the CIA is undermining the security of the internet – and so is WikiLeaks. We’ve said before that cybersecurity is a shared responsibility, and this is true in this example, regarding the disclosure of security vulnerabilities. It appears that neither the CIA nor WikiLeaks is living up to that standard – the CIA seems to be stockpiling vulnerabilities, and WikiLeaks seems to be using that trove for shock value rather than coordinating disclosure to the affected companies to give them a chance to fix it and protect users.
The government may have legitimate intelligence or law enforcement reasons for delaying disclosure of vulnerabilities (for example, to enable lawful hacking), but these same vulnerabilities can endanger the security of billions of people. These two interests must be balanced, and recent incidents demonstrate just how easily stockpiling vulnerabilities can go awry without proper policies and procedures in place.
Once governments become aware of a security vulnerability, they have a responsibility to consider how and when (not whether) to disclose the vulnerability to the affected company so they can fix the problem and protect users.
We have been advocating for broader, open conversations about disclosure of security vulnerabilities and although today’s disclosures are jarring, we hope this raises awareness of the severity of these issues and the urgency of collaborating on reforms.
Here is the presentation material for my talk entitled The Dark Arts of SSH. Please note this is a single HTML rendering that includes presenter’s notes.
Once a month web developers across the Mozilla community get together (in person and virtually) to share what cool stuff we've been working on. This...
David Bryant: Why WebAssembly is a game changer for the web — and a source of pride for Mozilla and Firefox
With today’s release of Firefox, we are the first browser to support WebAssembly. If you haven’t yet heard of WebAssembly, it’s an emerging standard inspired by our research to enable near-native performance for web applications.
WebAssembly is one of the biggest advances to the Web Platform over the past decade.
To get a quick understanding of WebAssembly, and to get an idea of how some companies are looking at using it, check out this video. You’ll hear from engineers at Mozilla, and partners such as Autodesk, Epic, and Unity.
It’s been a long, winding, and exciting road getting here.
The asm.js sub-language worked impressively well, and we knew the approach could work even better as a first-class web standard. So, using asm.js as a proof of concept, we set out to collaborate with other browser makers to establish such a standard that could run as part of browsers. Together with expert engineers across browser makers, we established consensus on WebAssembly. We expect support for it will soon start shipping in other browsers.
In some ways, WebAssembly changes what it means to be a web developer, as well as the fundamental abilities of the web. With WebAssembly and an accompanying set of tools, programs written in languages like C/C++ can be ported to the web so they run with near-native performance. We expect that, as WebAssembly continues to evolve, you’ll also be able to use it with programming languages often used for mobile apps, like Java, Swift, and C#.
If you’re interested in hearing more about the backstory of WebAssembly, check out this behind-the-scenes look.
WebAssembly is shipping today in Firefox on Windows, MacOS, Linux, and Android. We’re particularly excited about the potential on mobile — do all those apps really need to be native?
If you’d like to try out some applications that use WebAssembly, upgrade to Firefox 52, and check out this demo of Zen Garden by Epic. For your convenience, we’ve embedded a video of the demo below.
If you’re a developer interested in working with WebAssembly, check out WebAssembly documentation on MDN. You might also want to see this series of blog posts by Lin Clark that explain WebAssembly through some cool cartoons.
Here at Mozilla we’re focused on moving the web forward and on making Firefox the best browser, hands down. With WebAssembly shipping today and Project Quantum well underway, we’re more bullish about the web — and about Firefox — than ever.
Why WebAssembly is a game changer for the web — and a source of pride for Mozilla and Firefox was originally published in Mozilla Tech on Medium, where people are continuing the conversation by highlighting and responding to this story.
Over the last month we had a higher rate of commits, failures, and fixes. One large thing is that we turned on stylo specific tests and that was a slightly rocky road. Last month we suggested disabling tests after 2 weeks of seeing the failures. We ended up disabling many tests, but fixing many more.
In addition to more disabling of tests, we implemented a set of Bugzilla whiteboard entries to track our progress:
* [stockwell fixed] – a fix went in (even if it only partially fixed the problem)
  * in the last 2 months, we have 106
* [stockwell disabled] – we disabled the test in at least one config and there is no fix
  * in the last 2 months, we have 61
* [stockwell infra] – infra issues, which are usually externally driven
  * in the last 2 months, we have 11
* [stockwell unknown] – this became less intermittent with no clear reason
  * in the last 2 months, we have 44
* [stockwell needswork] – bugs in progress
  * in the last 2 months, we have 24
We have also been tracking the orange factor and number of high frequency intermittents:
| Week starting | Jan 02, 2017 | Jan 30, 2017 | Feb 27, 2017 |
| Orange Factor (OF) | 13.76 | 10.75 | 9.06 |
| # priority intermittents | 42 | 61 | 32 |
| OF – priority intermittents | 7.25 | 5.78 | 4.78 |
I added a new row here, tracking the Orange Factor assuming all of the high frequency intermittent bugs didn’t exist. This is what the long tail looks like and I am really excited to see that number going down over time. For me a healthy spot would be OF <5.0 and the long tail <3.0.
We also looked at the number of unique bugs and repeat bugs/week. Most bugs have a lifecycle of 2 weeks and 2/3 of the bugs we see in a given week were high frequency (HF) the week prior. For example this past week we had 32 HF bugs and 21 of them were from the previous week (11 were still HF 2 weeks prior).
While it is nice to assume we should just disable all tests, we find that many developers are actively working on these issues and it shows that we have many more fixed bugs than disabled bugs. The main motivation for disabling tests is to reduce the confusion for developers on try and to reduce the work the sheriffs need to do. Taking this data into account we are looking to adjust our policy for disabling slightly:
- all high frequency bugs (>=30 times/week) will be triaged and expected to be resolved in 2 weeks, otherwise we will start the process of disabling the test that is causing the bug
- if a bug occurs >75 times/week, it will be triaged but expectations are that it will be resolved in 1 week, otherwise we will start the process of disabling the test that is causing the bug
- if a bug is reduced below high frequency (< 30 times/week), we will be happy to make a note of that and keep an eye on it, but will not look at disabling the test.
The big change here is that we will be more serious about disabling tests, specifically when a test fails >= 75 times/week. We have had many tests failing at least 50% of the time for weeks, and these show up on almost all try pushes that run these tests. Developers should not be seeing failures like these. Since we are tracking fixed vs disabled, if we determine that we are disabling too much, we can revisit this policy next month.
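The thresholds above can be written down as a small lookup. This is only a sketch of the policy as described in this post, not code from any of our tools. The function name is made up, and because the one-week cutoff is written both as >75 and >= 75, the sketch assumes the >75 wording from the list.

```python
def triage_policy(failures_per_week):
    """Return (weeks_to_resolve, action) for an intermittent bug.

    weeks_to_resolve is None when the bug is below high frequency.
    Assumption: the one-week tier starts above 75 failures/week.
    """
    if failures_per_week > 75:
        return 1, "triage; start disabling the test if not resolved"
    if failures_per_week >= 30:
        return 2, "triage; start disabling the test if not resolved"
    return None, "make a note and keep an eye on it; do not disable"

# Example failure counts (hypothetical).
for count in (12, 30, 75, 120):
    weeks, action = triage_policy(count)
    print(count, weeks, action)
```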
Outside of numbers and policy, our goal is to have a solid policy, process, and toolchain available for self triaging as the year goes on. We are refining the policy and process via manual triage. The toolchain is the other work we are doing. Here are some updates:
- adding BUG_COMPONENTS to all files in m-c (bug 1328351) – slow and steady progress, thanks for the reviews to date! We fell behind while getting SETA completed, but much of the heavy lifting is already done
- retrigger an existing job with additional debugging arguments (bug 1322433) – main discussion is done, figuring out small details, we have a prototype working with little work remaining. Next steps would be to implement the top 3 or 4 use cases.
- add a test-lint job to linux64/mochitest (bug 1323044) – no progress yet; this got put on the backburner as we worked on SETA and focused on triage, whiteboard tags, and BUG_COMPONENTS. We have landed code for using the ‘when’ clause for test jobs (bug 1342963), which is a small piece of this. Getting this initially working will move up in priority soon, and making this work on all harnesses/platforms will most likely be a Google Summer of Code project.
Are there items we should be working on or looking into? Please join our meetings.