Mozilla Nederland - the Dutch Mozilla community

Planet Mozilla - https://planet.mozilla.org/

William Lachance: A principled reorganization of docs.telemetry.mozilla.org

Mon, 11/05/2020 - 17:44

(this post is aimed primarily at an internal audience, but I thought I’d go ahead and make it public as a blog post)

I’ve been thinking a bunch over the past few months about the Mozilla data organization’s documentation story. We have a first class data platform here at Mozilla, but using it to answer questions, especially for newer employees, can be quite intimidating. As we continue our collective journey to becoming a modern data-driven organization, part of the formula for unlocking this promise is making the tools and platforms we create accessible to a broad internal audience.

My data peers are a friendly group of people and we have historically been good at answering questions on forums like the #fx-metrics slack channel: we’ll keep doing this. That said, our time is limited: we need a common resource for helping bring people up to speed on how to use the data platform to answer common questions.

Our documentation site, docs.telemetry.mozilla.org, was meant to be this resource: however in the last couple of years an understanding of its purpose has been (at least partially) lost and it has become somewhat overgrown with content that isn’t very relevant to those it’s intended to help.

This post’s goal is to re-establish a mission for our documentation site — towards the end, some concrete proposals on what to change are also outlined.

Setting the audience

docs.telemetry.mozilla.org was and is meant to be a resource useful for data practitioners within Mozilla.

Examples of different data practitioners and their use cases:

  • A data scientist performing an experiment analysis
  • A data analyst producing a report on the effectiveness of a recent marketing campaign
  • A Firefox engineer trying to understand the performance characteristics of a new feature
  • A technical product manager trying to understand the characteristics of a particular user segment
  • A quality assurance engineer trying to understand the severity of a Firefox crash

There are a range of skills that these different groups bring to the table, but there are some common things we expect a data practitioner to have, whatever their formal job description:

  • At least some programming and analytical experience
  • Comfort working with and understanding complex data systems with multiple levels of abstraction (for example: the relationship between the Firefox browser, which produces Telemetry data, and the backend system which processes it)
  • The time necessary to dig into details

This also excludes a few groups:

  • Senior leadership or executives: they are of course free to use docs.telemetry.mozilla.org if helpful, but it is expected that the type of analytical work covered by the documentation will normally be done by a data practitioner and that relevant concepts and background information will be explained to them (in the form of high-level dashboards, presentations, etc.).
  • Data Engineering: some of the material on docs.telemetry.mozilla.org may be incidentally useful to this internal audience, but explaining the full details of the data platform itself belongs elsewhere.
What do these users need?

In general, a data practitioner is trying to answer a specific set of questions in the context of an exploration. There are a few things that they need:

  • A working knowledge of how to use the technological tools to answer the questions they might have: for example, how to create a SQL query counting the median value of a particular histogram for a particular usage segment (a rough sketch of such a query appears just after this list).
  • A set of guidelines on best practices for measuring specific things: for example, we want people using our telemetry systems to follow well-formulated practices for measuring things like “Monthly Active Users” rather than re-inventing such measures themselves.
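To make the first bullet concrete, here is a minimal sketch of what such a query could look like in BigQuery standard SQL. The table and column names (telemetry_example.histogram_values, segment, value) are hypothetical stand-ins for illustration, not real datasets:

-- Hypothetical example: approximate median of a histogram's recorded values
-- for one usage segment. Table and column names are made up for illustration.
SELECT
  segment,
  APPROX_QUANTILES(value, 100)[OFFSET(50)] AS median_value
FROM
  telemetry_example.histogram_values
WHERE
  segment = 'release'
GROUP BY
  segment
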
What serves this need?

A few years ago, Ryan Harter did an extensive literature review on writing documentation on technical subjects. The takeaway from this exploration is that the global consensus is that we should focus most of our attention on writing practical tutorials that enable our users to perform specific tasks in the service of the above objective.

There is a proverb, allegedly attributed to Confucius, which goes something like this:

“I hear and I forget. I see and I remember. I do and I understand.”

The understanding we want to build is how to use our data systems and tools to answer questions. Some knowledge of how our data platform works is no doubt necessary to accomplish this, but it is mostly functional knowledge we care about imparting to data practitioners: the best way to build this understanding is to guide users in performing tasks.

This makes sense intuitively, but it is also borne out by the data: this is what our users are looking for. Looking through the top pages on Google Analytics, virtually all of them1 refer either to a cookbook or a howto guide:

Happily, this allows us to significantly narrow our focus for docs.telemetry.mozilla.org. We no longer need to worry about:

  • Providing lists or summaries of data tools available inside Mozilla2: we can talk about tools only as needed in the context of the tasks users want to accomplish. We may want to keep such a list handy elsewhere for some other reason (e.g. internal auditing purposes), but we can safely say that it belongs somewhere else, like the Mozilla wiki, mana, or the telemetry.mozilla.org portal.
  • Detailed reference on the technical details of the data platform implementation. Links to this kind of material can be surfaced inside the documentation where relevant, but it is expected that an implementation reference will normally be stored and presented within the context of the tools themselves (a good example would be the existing documentation for GCP ingestion).
  • Detailed reference on all data sets, ping types, or probes: hand-written documentation for this kind of information is difficult to keep up to date with manual processes3 and is best generated automatically and browsed with tools like the probe dictionary.
  • Detailed reference material on how to submit Telemetry. While an overview of how to think about submitting telemetry may be in scope (recall that we consider Firefox engineers a kind of data practitioner), the details are really a separate topic that is better left to another resource which is closer to the implementation (for example, the Firefox Source Documentation or the Glean SDK reference).

Scanning through the above, you’ll see a common theme: avoid overly detailed reference material. The above is not to say that we should avoid background documentation altogether. For example, an understanding of how our data pipeline works is key to understanding how up-to-date a dashboard is expected to be. However, this type of documentation should be written bearing in mind the audience (focusing on what they need to know as data practitioners) and should be surfaced towards the end of the documentation as supporting material.

As an exception, there is also a very small amount of reference documentation which we want to put at top-level because it is so important: for example the standard metrics page describes how we define “MAU” and “DAU”: these are measures that we want to standardize in the organization, and not have someone re-invent every time they produce an analysis or report. However, we should be very cautious about how much of this “front facing” material we include: if we overwhelm our audience with details right out of the gate, they are apt to ignore them.

Concrete actions
  • We should continue working on tutorials on how to perform common tasks: this includes not only the low-level guides that we currently have (e.g. BigQuery and SQL tutorials) but also information on how to effectively use our higher-level, more accessible tools like GLAM and GUD to answer questions.
  • Medium term, we should remove the per-dataset documentation and replace it with a graphical tool for browsing this type of information (perhaps Google Data Catalog). Since this is likely to be a rather involved project, we can keep the existing documentation for now — but for new datasets, we should encourage their authors to write tutorials on how to use them effectively (assuming they are of broad general interest) instead of hand-creating schema definitions that are likely to go out of date quickly.
  • We should set clear expectations and guidelines of what does and doesn’t belong on docs.telemetry.mozilla.org as part of a larger style guide. This style guide should be referenced somewhere prominent (perhaps as part of a pull request template) so that historical knowledge of what this resource is for isn’t lost.
Footnotes
  1. For some reason which I cannot fathom, a broad (non-Mozilla) audience seems unusually interested in our SQL style guide. 

  2. The current Firefox data documentation has a project glossary that is riddled with links to obsolete and unused projects. 

  3. docs.telemetry.mozilla.org has a huge section devoted to derived datasets (20+), many of which are obsolete or not recommended. At the same time, we are missing explicit reference material for the most commonly used tables in BigQuery (e.g. telemetry.main). 


Daniel Stenberg: Manual cURL cURL

Mon, 11/05/2020 - 08:46

The HP Color LaserJet CP3525 Printer looks like any other ordinary printer done by HP. But there’s a difference!

A friend of mine fell over this gem, and told me.

TCP/IP Settings

If you go to the machine’s TCP/IP settings using the built-in web server, the printer offers the ordinary network configuration options, but also one that sticks out a little extra. The “Manual cURL cURL” option! It looks like this:

I could easily confirm that this is genuine. I did the screenshot above by just googling for the string and printer model, since there appear to be printers like this exposing their settings web server to the Internet. Hilarious!

What?

How on earth did that string end up there? Certainly there’s no relation to curl at all except for the actual name used there? Is it a sign that there are basically no humans left at HP who understand what the individual settings on that screen are actually meant for?

Given the contents in the text field, a URL containing the letters WPAD twice, I can only presume this field is actually meant for Web Proxy Auto-Discovery. I spent some time trying to find the user manual for this printer configuration screen but failed. It would’ve been fun to find “manual cURL cURL” described in a manual! They do offer a busload of various manuals, maybe I just missed the right one.

Does it use curl?

Yes, it seems HP generally uses curl, at least: I found the “Open-Source Software License Agreements for HP LaserJet and ScanJet Printers” and it contains the curl license:

The curl license as found in the HP printer open source report.

HP using curl for Print-Uri?

Independently, someone else recently told me about another possible HP + curl connection. This user said his HP printer makes HTTP requests using the user-agent libcurl-agent/1.0:

I haven’t managed to get this confirmed by anyone else (although the license snippet above certainly implies they use curl). That particular user-agent string has also been used all over for a long time, as I believe it is copied widely from the popular libcurl example getinmemory.c, where I made up the user-agent and put it there back in 2004.

Credits

Frank Gevaerts tricked me into going down this rabbit hole as he told me about this string.


Ludovic Hirlimann: Recommendations are moving entities

Sat, 09/05/2020 - 15:02

At my new job we publish an open source web app for map systems using a mix of technologies, and we also offer it as SaaS. Last Thursday I looked at how our Nginx server was configured TLS-wise.

I was thrilled to see the comment in our nginx configuration saying it had been built using Mozilla's SSL config tool. At the same time I was shocked to see that the configuration, which dated from early 2018, was completely out of date. Half of the ciphers were gone. So we took a modern config and applied it.
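For reference, a "modern" Nginx TLS profile at the time looked roughly along these lines. This is a sketch from memory, not our actual configuration; the generator itself should be consulted for the authoritative, current values:

# Sketch of a "modern" profile: TLS 1.3 only, client-preferred cipher order.
ssl_protocols TLSv1.3;
ssl_prefer_server_ciphers off;
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;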

Once done, we turned to the Observatory to check our score, and my colleague and I were disappointed to get an F. So we fixed what we could easily (the ciphers) and filed an issue on our product to make it more secure for our users.

We'll also probably add a calendar entry to check our score on a regular basis, because as the recommendations change, our software configuration will need to change too.


Mozilla Security Blog: May 2020 CA Communication

Fri, 08/05/2020 - 19:05

Mozilla has sent a CA Communication and Survey to inform Certification Authorities (CAs) who have root certificates included in Mozilla’s program about current expectations. Additionally this survey will collect input from CAs on potential changes to Mozilla’s Root Store Policy. This CA communication and survey has been emailed to the Primary Point of Contact (POC) and an email alias for each CA in Mozilla’s program, and they have been asked to respond to the following items:

  1. Review guidance about actions a CA should take if they realize that mandated restrictions regarding COVID-19 will impact their audits or delay revocation of certificates.
  2. Inform Mozilla if their CA’s ability to fulfill the commitments that they made in response to the January 2020 CA Communication has been impeded.
  3. Provide input into potential policy changes that are under consideration, such as limiting maximum lifetimes for TLS certificates and limiting the re-use of domain name verification.

The full communication and survey can be read here. Responses to the survey will be automatically and immediately published by the CCADB.

With this CA Communication, we reiterate that participation in Mozilla’s CA Certificate Program is at our sole discretion, and we will take whatever steps are necessary to keep our users safe. Nevertheless, we believe that the best approach to safeguard that security is to work with CAs as partners, to foster open and frank communication, and to be diligent in looking for ways to improve.

The post May 2020 CA Communication appeared first on Mozilla Security Blog.


William Lachance: This Week in Glean: mozregression telemetry (part 2)

Fri, 08/05/2020 - 16:32

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

This is a special guest post by non-Glean-team member William Lachance!

This is a continuation of an exploration of adding Glean-based telemetry to a python application, in this case mozregression, a tool for automatically finding the source of Firefox regressions (breakage).

When we left off last time, we had written some test scripts and verified that the data was visible in the debug viewer.

Adding Telemetry to mozregression itself

In many ways, this is pretty similar to what I did inside the sample application: the only significant difference is that these are shipped inside a Python application that is meant to be installable via pip. This means we need to specify the pings.yaml and metrics.yaml (located inside the mozregression subdirectory) as package data inside setup.py:

setup(
    name="mozregression",
    ...
    package_data={"mozregression": ["*.yaml"]},
    ...
)

There were also a number of Glean SDK enhancements which we determined were necessary. Most notably, Michael Droettboom added 32-bit Windows wheels to the Glean SDK, which we need to make building the mozregression GUI on Windows possible. In addition, some minor changes needed to be made to Glean’s behaviour for it to work correctly with a command-line tool like mozregression — for example, Glean used to assume that Telemetry would always be disabled via a GUI action so that it would send a deletion ping, but this would obviously not work in an application like mozregression where there is only a configuration file — so for this case, Glean needed to be modified to check if it had been disabled between runs.

Many thanks to Mike (and others on the Glean team) for so patiently listening to my concerns and modifying Glean accordingly.

Getting Data Review

At Mozilla, we don’t just allow random engineers like myself to start collecting data in a product that we ship (even a semi-internal like mozregression). We have a process, overseen by Data Stewards to make sure the information we gather is actually answering important questions and doesn’t unnecessarily collect personally identifiable information (e.g. email addresses).

You can see the specifics of how this worked out in the case of mozregression in bug 1581647.

Documentation

Glean has some fantastic utilities for generating markdown-based documentation on what information is being collected, which I have made available on GitHub:

https://github.com/mozilla/mozregression/blob/master/docs/glean/metrics.md

The generation of this documentation is hooked up to mozregression’s continuous integration, so we can be sure it’s up to date.

I also added a quick note to mozregression’s web site describing the feature, along with (very importantly) instructions on how to turn it off.

Enabling Data Ingestion

Once a Glean-based project has passed data review, getting our infrastructure to ingest it is pretty straightforward. Normally we would suggest just filing a bug and letting us (the data team) handle the details, but since I’m on that team, I’m going to go into a (little bit of) detail about how the sausage is made.

Behind the scenes, we have a collection of ETL (extract-transform-load) scripts in the probe-scraper repository which are responsible for parsing the ping and probe metadata files that I added to mozregression in the step above and then automatically creating BigQuery tables and updating our ingestion machinery to insert data passed to us there.

There’s quite a bit of complicated machinery behind the scenes to make this all work, but since it’s already in place, adding a new thing like this is relatively simple. The changeset I submitted as part of a pull request to probe-scraper was all of 9 lines long:

diff --git a/repositories.yaml b/repositories.yaml
index dffcccf..6212e55 100644
--- a/repositories.yaml
+++ b/repositories.yaml
@@ -239,3 +239,12 @@ firefox-android-release:
   - org.mozilla.components:browser-engine-gecko-beta
   - org.mozilla.appservices:logins
   - org.mozilla.components:support-migration
+mozregression:
+  app_id: org-mozilla-mozregression
+  notification_emails:
+    - wlachance@mozilla.com
+  url: 'https://github.com/mozilla/mozregression'
+  metrics_files:
+    - 'mozregression/metrics.yaml'
+  ping_files:
+    - 'mozregression/pings.yaml'

A Pretty Graph

With the probe scraper change merged and deployed, we can now start querying! A number of tables are automatically created according to the schema outlined above: notably “live” and “stable” tables corresponding to the usage ping. Using sql.telemetry.mozilla.org we can start exploring what’s out there. Here’s a quick query I wrote up:

SELECT
  DATE(submission_timestamp) AS date,
  metrics.string.usage_variant AS variant,
  count(*),
FROM `moz-fx-data-shared-prod`.org_mozilla_mozregression_stable.usage_v1
WHERE DATE(submission_timestamp) >= '2020-04-14'
  AND client_info.app_display_version NOT LIKE '%.dev%'
GROUP BY date, variant;

… which generates a chart like this:

This chart represents the absolute volume of mozregression usage since April 14th 2020 (around the time when we first released a version of mozregression with Glean telemetry), grouped by mozregression “variant” (GUI, console, and mach) and date - you can see that (unsurprisingly?) the GUI has the highest usage. I’ll talk about this more in an upcoming installment, speaking of…

Next Steps

We’re not done yet! Next time, we’ll look into making a public-facing dashboard demonstrating these results and making an aggregated version of the mozregression telemetry data publicly accessible to researchers and the general public. If we’re lucky, there might even be a bit of data science. Stay tuned!


Wladimir Palant: What data does Xiaomi collect about you?

Fri, 08/05/2020 - 13:43

A few days ago I published a very technical article confirming that Xiaomi browsers collect a massive amount of private data. This fact was initially publicized in a Forbes article based on the research by Gabriel Cîrlig and Andrew Tierney. After initially dismissing the report as incorrect, Xiaomi has since updated their Mint and Mi Pro browsers to include an option to disable this tracking in incognito mode.

[Image: Xiaomi demonstrating a privacy fig leaf. Image credits: 1mran IN, Openclipart]

Is the problem solved now? Not really. There is now exactly one non-obvious setting combination where you can have your privacy with these browsers: “Incognito Mode” setting on, “Enhanced Incognito Mode” setting off. With these not being the default and the users not informed about the consequences, very few people will change to this configuration. So the browsers will continue spying on the majority of their user base.

In this article I want to provide a high-level overview of the data being exfiltrated here. TL;DR: Lots and lots of it.


Disclaimer: This article is based entirely on reverse engineering Xiaomi Mint Browser 3.4.3. I haven’t seen the browser in action, so some details might be wrong. Update (2020-05-08): From a quick glance at Xiaomi Mint Browser 3.4.4 which has been released in the meantime, no further changes to this functionality appear to have been implemented.

Event data

When allowed, Xiaomi browsers will send information about a multitude of different events, sometimes with specific data attached. For example, an event will typically be generated when some piece of the user interface shows up or is clicked, an error occurs or the current page’s address is copied to clipboard. There are more interesting events as well however, for example:

  • A page started or finished loading, with the page address attached
  • Change of default search engine, with old and new search engines attached
  • Search via the navigation bar, with the search query attached
  • Reader mode switched on, with the page address attached
  • A tab clicked, with the tab name attached
  • A page being shared, with the page address attached
  • Reminder shown to switch on Incognito Mode, with the porn site that triggered the reminder attached
  • YouTube searches, with the search query attached
  • Video details for a YouTube video opened or closed, with video ID attached
  • YouTube video played, with video ID attached
  • Page or file downloaded, with the address attached
  • Speed dial on the homepage clicked, added or modified, with the target address attached
Generic annotations

Some pieces of data will be attached to every event. These are meant to provide the context, and to group related events of course. This data includes among other things:

  • A randomly generated identifier that is unique to your browser instance. While this identifier is supposed to change every 90 days, this won’t actually happen due to a bug. In most cases, it should be fairly easy to recognize the person behind the identifier.
  • An additional device identifier (this one will stay unchanged even if app data is cleared)
  • If you are logged into your Mi Account: the identifier of this account
  • The exact time of the event
  • Device manufacturer and model
  • Browser version
  • Operating system version
  • Language setting of your Android system
  • Default search engine
  • Mobile network operator
  • Network type (wifi, mobile)
  • Screen size
Conclusions

Even with the recent changes, Xiaomi browsers are massively invading users’ privacy. The amount of data collected by default goes far beyond what’s necessary for application improvement. Instead, Xiaomi appears to be interested in where users go, what they search for and which videos they watch. Even with a fairly obscure setting to disable this tracking, the default behavior isn’t acceptable. If you happen to be using a Xiaomi device, you should install a different browser ASAP.


Daniel Stenberg: video: common mistakes when using libcurl

Fri, 08/05/2020 - 08:02

As I posted previously, I did a webinar and here’s the recording and the slides I used for it.

The slides.

Categorieën: Mozilla-nl planet

The Talospace Project: Firefox 76 on POWER

Fri, 08/05/2020 - 05:46
Firefox 76 is released. Besides other CSS, HTML and developer features, it refines that somewhat obnoxious zooming bar a bit, improves Picture-in-Picture further (great for livestreams: I'm using it a lot for church), and most notably adds critical alerts for website breaches and improved password security (both generating good secure passwords and notifying you when a password used on one of your sites may have been stolen). The .mozconfigs are unchanged from Firefox 67, which is good news, because we've been stable without changing build options for quite a while at this point, and we might be able to start investigating why some build options that should work currently fail. In particular, PGO and LTO would be nice to get working.

Daniel Stenberg: qlog with curl

Thu, 07/05/2020 - 23:38

I want curl to be on the very bleeding edge of protocol development, to aid the Internet protocol development community in testing out protocols early and working out kinks in the protocols and server implementations, using curl’s vast set of tools and switches.

For this, curl supported HTTP/2 really early on and helped shape the protocol and test out servers.

For this reason, curl has supported HTTP/3 since August 2019: a convenient and well-known client that you can use to poke at your brand new HTTP/3 servers, so we can work on getting all the rough edges smoothed out before the protocol reaches its final state.

QUIC tooling

One of the many challenges QUIC and HTTP/3 have is that with a new transport protocol comes entirely new paradigms. With new paradigms like this, we need improved or perhaps even new tools to help us understand the network flows back and forth, to make sure we all have a common understanding of the protocols and to make sure we implement our end-points correctly.

QUIC exists only as an encrypted protocol, meaning that we can no longer easily monitor and passively investigate network traffic like before. QUIC also encrypts more of the protocol than TCP + TLS does, leaving even less for an outsider to see.

The current QUIC analyzer tool lineup gives us two options.

Wireshark

We all of course love Wireshark and if you get a very recent version, you’ll be able to decrypt and view QUIC network data.

With curl, and a few other clients, you can ask to get the necessary TLS secrets exported at run-time with the SSLKEYLOGFILE environment variable. You’ll then be able to see every bit in every packet. This way to extract secrets works with QUIC as well as with the traditional TCP+TLS based protocols.
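As a quick illustration (a sketch only: the URL is a placeholder and the --http3 option requires a curl build with HTTP/3 enabled), exporting the secrets can look like this:

# Write TLS secrets to a file while performing an HTTP/3 transfer
SSLKEYLOGFILE=$HOME/tls-keys.txt curl --http3 https://example.com/
# Point Wireshark at tls-keys.txt (via its TLS key log preference)
# to decrypt the captured traffic.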

qvis/qlog

The qvis/qlog site. If you find the Wireshark network view a little too low level, leaving a lot for you to understand and draw conclusions from on your own, the next-level tool here is the common QUIC logging format called qlog. This is an agreed-upon common standard for logging QUIC traffic, with an accompanying web-based visualizer tool, qvis, that lets you upload your logs and get visualizations generated. This becomes extra powerful if you have logs from both ends!

Starting with this commit (landed in the git master branch on May 7, 2020), all curl builds that support HTTP/3 – independent of what backend you pick – can be told to output qlogs.

Enable qlogging in curl by setting the new standard environment variable QLOGDIR to point to a directory in which you want qlogs to be generated. When you then run curl, you’ll get files created in there named [hex digits].log, where the hex digits are the “SCID” (Source Connection Identifier).
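A minimal sketch of how that can look on the command line (the URL is a placeholder; this assumes an HTTP/3-capable curl build):

# Have curl write qlog files into /tmp/qlogs during an HTTP/3 transfer
mkdir -p /tmp/qlogs
QLOGDIR=/tmp/qlogs curl --http3 https://example.com/
# The resulting files can then be uploaded to the qvis visualizer.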

Credits

qlog and qvis are spear-headed by Robin Marx. qlogging for curl with Quiche was pushed for by Lucas Pardue and Alessandro Ghedini. In the ngtcp2 camp, Tatsuhiro Tsujikawa made it very easy for me to switch it on in curl.

The top image is snapped from the demo sample on the qvis web site.


The Mozilla Blog: Mozilla research shows some machine voices score higher than humans

Thu, 07/05/2020 - 17:36

This blog post is to accompany the publication of the paper Choice of Voices: A Large-Scale Evaluation of Text-to-Speech Voice Quality for Long-Form Content in the Proceedings of CHI’20, by Julia Cambre and Jessica Colnago from CMU, Jim Maddock from Northwestern, and Janice Tsai and Jofish Kaye from Mozilla. 

In 2019, Mozilla’s Voice team developed a method to evaluate the quality of text-to-speech voices. It turns out there was very little that had been done in the world of text to speech to evaluate voice for listening to long-form content — things like articles, book chapters, or blog posts. A lot of the existing work answered the core question of “can you understand this voice?” So a typical test might use a syntactically correct but meaningless sentence, like “The masterly serials withdrew the collaborative brochure”, and have a listener type that in. That way, the listener couldn’t guess missed words from other words in the sentence. But now that we’ve reached a stage of computerized voice quality where so many voices can pass the comprehension test with flying colours, what’s the next step?

How can we determine if a voice is enjoyable to listen to, particularly for long-form content — something you’d listen to for more than a minute or two? Our team had a lot of experience with this: we had worked closely with our colleagues at Pocket to develop the Pocket Listen feature, so you can listen to articles you’ve saved, while driving or cooking. But we still didn’t know how to definitively say that one voice led to a better listening experience than another.

The method we used was developed by our intern Jessica Colnago during her internship at Mozilla, and it’s pretty simple in concept. We took one article, How to Reduce Your Stress in Two Minutes a Day, and we recorded each voice reading that article. Then we had 50 people on Mechanical Turk listen to each recording — 50 different people each time. (You can also listen to clips from most of these recordings to make your own judgement.) Nobody heard the article more than once. And at the end of the article, we’d ask them a couple of questions to check they were actually listening, and to see what they thought about the voice.

For example, we’d ask them to rate how much they liked the voice on a scale of one to five, and how willing they’d be to listen to more content recorded by that voice. We asked them why they thought that voice might be pleasant or unpleasant to listen to. We evaluated 27 voices, and here’s one graph which represents the results. (The paper has lots more rigorous analysis, and we explored various methods to sort the ratings, but the end results are all pretty similar. We also added a few more voices after the paper was finished, which is why there’s different numbers of voices in different places in this research.)

[Voice comparison graph]

As you can see, some voices rated better than others. The ones at the left are the ones people consistently rated positively, and the ones at the right are the ones that people liked less: just as examples, you’ll notice that the default (American) iOS female voice is pretty far to the right, although the Mac default voice has a pretty respectable showing. I was proud to find that the Mozilla Judy Wave1 voice, created by Mozilla research engineer Eren Gölge, is rated up there along with some of the best ones in the field. It turns out the best electronic voices we tested are Mozilla’s voices and the Polly Neural voices from Amazon. And while we still have some licensing questions to figure out, making sure we can create sustainable, publicly accessible, high quality voices, it’s exciting to see that we can do something in an open source way that is competitive with very well funded voice efforts out there, which don’t have the same aim of being private, secure and accessible to all.

We found there were some generalizable experiences. Listeners were 54% more likely to give a higher experience rating to the male voices we tested than the female voices. We also looked at the number of words spoken in a minute. Generally, our results indicated that there is a “just right speed” in the range of 163 to 177 words per minute, and people didn’t like listening to voices that were much faster or slower than that.

But the more interesting result comes from one of the things we did at a pretty late stage in the process, which was to include some humans reading the article directly into a microphone. Those are the voices circled in red:

[Voice comparison graph, with the human voices circled in red]

What we found was that some of our human voices were being rated lower than some of the robot voices. And that’s fascinating. That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans. And before you ask, I listened to those recordings of human voices. You can do the same. Janice (the recording labelled Human 2 in the dataset) has a perfectly normal voice that I find pleasant to listen to. And yet some people were finding these mechanically generated voices better.

That raises a whole host of interesting questions, concerns and opportunities. This is a snapshot of computerized voices, in the last two years or so. Even since we’ve done this study, we’ve seen the quality of voices improve. What happens when computers are more pleasant to listen to than our own voices? What happens when our children might prefer to listen to our computer reading a story than ourselves?

A potentially bigger ethical question comes with the question of persuasion. One question we didn’t ask in this study was whether people trusted or believed the content that was read to them. What happens when we can increase the number of people who believe something simply by changing the voice that it is read in? There are entire careers exploring the boundaries of influence and persuasion; how does easy access to “trustable” voices change our understanding of what signals point to trustworthiness? The BBC has been exploring British attitudes to regional accents in a similar way — drawing, fascinatingly, from a study of how British people reacted to different voices on the radio in 1927. We are clearly continuing a long tradition of analyzing the impact of voice and voices on how we understand and feel about information.

The post Mozilla research shows some machine voices score higher than humans appeared first on The Mozilla Blog.


Hacks.Mozilla.Org: High Performance Web Audio with AudioWorklet in Firefox

Thu, 07/05/2020 - 17:10
Audio Worklets arrive in Firefox

AudioWorklet was first introduced to the web in 2018. Ever since, Mozilla has been investigating how to deliver a “no-compromises” implementation of this feature in the WebAudio API. This week, Audio Worklets landed in the release of Firefox 76. We’re ready to start bridging the gap between what can be done with audio in native applications and what is available on the web.

Now developers can leverage AudioWorklet to write arbitrary audio processing code, enabling the creation of web apps that weren’t possible before. This exciting new functionality raises the bar for emerging web experiences like 3D games, VR, and music production.

Audio worklets bring power and flexibility to general purpose real-time audio synthesis and processing. This begins with the addModule() method to specify a script that can generate audio on the fly or perform arbitrary processing of audio. Various kinds of sources can now be connected through the Web Audio API to an AudioWorkletNode for immediate processing. Source examples include an HTMLMediaElement resource, a local microphone, or remote audio.  Alternatively, the AudioWorklet script itself can be the source of audio.

Benefits

The audio processing code runs on a dedicated real-time system thread for audio processing. This frees the audio from pauses that in the past might have been caused by all the other things happening in the browser.

A process() method registered by the script is called at regular intervals on the real-time thread. Each call provides input and output buffers of PCM (pulse-code modulation) audio samples corresponding to a single AudioContext rendering block.  Processing of input samples produces output samples synchronously. With no latency added to the audio pipeline, we can build more responsive applications. The approach will look familiar to developers experienced with native audio APIs. In native development, this model of registering a callback is ubiquitous. The code registers a callback, which is called by the system to fill in buffers.

Loading a worklet script in an AudioContext, via its audioWorklet property:

<button>Play</button>
<audio src="t.mp3" controls></audio>
<input type=range min=0.5 max=10 step=0.1 value=0.5></input>
<script>
let ac = new AudioContext;
let audioElement = document.querySelector("audio");
let source = ac.createMediaElementSource(audioElement);
async function play() {
  await ac.audioWorklet.addModule('clipper.js');
  ac.resume();
  audioElement.play();
  let softclipper = new AudioWorkletNode(ac, 'soft-clipper-node');
  source.connect(softclipper).connect(ac.destination);
  document.querySelector("input").oninput = function(e) {
    console.log("Amount is now " + e.target.value);
    softclipper.parameters.get("amount").value = e.target.value;
  }
};
document.querySelector("button").onclick = function() {
  play();
}
</script>

clipper.js: Implementing a soft-clipper that can produce a configurable distortion effect. This is simple with an Audio Worklet, but would use lots of memory if done without it:

class SoftClipper extends AudioWorkletProcessor {
  constructor() {
    super()
  }
  static get parameterDescriptors() {
    return [{
      name: 'amount',
      defaultValue: 0.5,
      minValue: 0,
      maxValue: 10,
      automationRate: "k-rate"
    }];
  }
  process(input, output, parameters) {
    // `input` is an array of input ports, each having multiple channels.
    // For each channel of each input port, a Float32Array holds the audio
    // input data.
    // `output` is an array of output ports, each having multiple channels.
    // For each channel of each output port, a Float32Array must be filled
    // to output data.
    // `parameters` is an object having a property for each parameter
    // describing its value over time.
    let amount = parameters["amount"][0];
    let inputPortCount = input.length;
    for (let portIndex = 0; portIndex < input.length; portIndex++) {
      let channelCount = input[portIndex].length;
      for (let channelIndex = 0; channelIndex < channelCount; channelIndex++) {
        let sampleCount = input[portIndex][channelIndex].length;
        for (let sampleIndex = 0; sampleIndex < sampleCount; sampleIndex++) {
          output[0][channelIndex][sampleIndex] =
            Math.tanh(amount * input[portIndex][channelIndex][sampleIndex]);
        }
      }
    }
    return true;
  }
}

registerProcessor('soft-clipper-node', SoftClipper);

Real-time performance

With low latency, however, comes significant responsibility. Let’s draw a parallel from the graphics world, where 60 Hz is the common default screen refresh rate for mobile and desktop devices. Code that determines what to display is expected to run in less than

1000 / 60 = 16.6̇ ms

to ensure no dropped frames.

There are comparable expectations in the audio world. A typical audio system outputs 48000 audio frames per second, and the Web Audio API processes frames in blocks of 128. Thus, all audio computations for 128 frames (the current size of a block in the Web Audio API) must be performed in less than

128 * 1000 / 48000 ≅ 3 ms.

This includes all the process() calls of all the AudioWorkletProcessors in a Web Audio API graph, plus all of the native AudioNode processing.

On modern computers and mobile devices, 3 ms is plenty of time, but some programming patterns are better suited than others for this task. Missing this deadline will cause stuttering in the audio output, which is much more jarring than a dropped frame here and there on a display.

In order to always stay under your time budget, the number one rule of real-time audio programming is “avoid anything that can result in non-deterministic computation time”. Minimize or avoid anything beyond arithmetic operations, other math functions, and reading and writing from buffers.

In particular, for consistent processing times, scripts should keep the frequency of memory allocations to an absolute minimum.  If a working buffer is required, then allocate once and re-use the same buffer for each block of processing. MessagePort communication involves memory allocations, so we suggest you minimize complexity in copied data structures.  Try to do things on the real-time AudioWorklet thread only if absolutely necessary.
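To illustrate the "allocate once, re-use" advice, a processor can create any scratch buffer in its constructor and keep re-using it in process(). This is a minimal sketch; the processor name and buffer size are illustrative, not taken from the article:

class ReusingProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    // Allocate the working buffer once, sized for one 128-frame render
    // block, instead of allocating inside process().
    this.scratch = new Float32Array(128);
  }
  process(inputs, outputs) {
    let input = inputs[0][0];
    let output = outputs[0][0];
    if (!input) {
      return true;
    }
    // Reuse the preallocated buffer for intermediate work...
    this.scratch.set(input);
    // ...and write the result out without any new allocations.
    output.set(this.scratch);
    return true;
  }
}
registerProcessor('reusing-processor', ReusingProcessor);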

Garbage collection

Finally, because JavaScript is a garbage-collected language, and garbage collectors in today’s web browsers are not real-time safe, it’s necessary to minimize the creation of objects that are garbage collectable. This will minimize the non-determinism on the real-time thread.

With that said, the JavaScript JIT compilers and the garbage collectors of current generation JavaScript engines are advanced enough to allow many workloads to just work reliably, with a minimum of care in writing the code. In turn, this allows for rapid prototyping of ideas, or quick demos.

Firefox’s implementation

The principle of minimizing memory allocations, and only doing what is strictly necessary in audio processing, also applies to browser implementations of AudioWorklet.

A mistake in the Web Audio API specification accidentally required creation of new objects on each call to process() for its parameters. This requirement is to be removed from the specification for the sake of performance.  To allow developers to maximize the performance of their apps, Firefox does not create new objects for process() calls unless needed for a change in configuration. Currently, Firefox is the only major browser offering this feature.

If developers are careful to write JavaScript that does not create garbage collectable objects, then the garbage collector in Firefox will never be triggered on the real-time audio processing thread. This is simpler than it sounds, and it’s great for performance. You can use typed arrays, and reuse objects, but don’t use fancy features like promises. These simple pieces of advice go a long way, and only apply to the code that runs on the real-time audio thread.

When building Firefox’s implementation of AudioWorklet, we were extremely critical of the native code paths involved in processing audio.  Great care has been taken to allow developers to ship reliable audio applications on the web. We aim to deliver experiences that are as fast and stable as possible, on all operating systems where Firefox is available.

Several technical investigations supported our performance goals. Here are a few noteworthy ones: Profiling Firefox’s native memory allocation speed; only using threads with real-time priority on the critical path of the audio; and investigating the innards of SpiderMonkey. (SpiderMonkey is the JavaScript virtual machine of Firefox.) This ensures that our JavaScript engine isn’t doing any unbounded operation on the real-time audio threads.

WASM and Workers

The performance and potential of WebAssembly (WASM) is a perfect fit for complex audio processing or synthesis. WASM is available with AudioWorklet. In the professional audio industry, existing signal processing code is overwhelmingly implemented in languages that compile to WASM. Very often, this code is straightforward to compile to WASM and run on the web, because it’s solely doing audio processing. In addition, it is typically designed for a callback interface like what AudioWorklet offers.

For algorithms that need a large batch of processing, and cover significantly more data than a 128-frame block, it is better to split the processing across multiple blocks or perform it in a separate Web Worker thread.  When passing particularly large ArrayBuffers between Worker and AudioWorklet scripts, be sure to transfer ownership to avoid large copies. Then transfer the arrays back to avoid freeing memory on the real-time thread. This approach also avoids the need to allocate new buffers each time.
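A rough sketch of that transfer pattern (the names, message shapes, and the workletNode variable are made up for illustration): the sending side transfers ownership of a large buffer to the AudioWorkletNode's port, and the processor transfers it back when it is done:

// Worker or main-thread side: hand a large buffer to the worklet,
// transferring ownership so no copy is made.
const samples = new Float32Array(10 * 48000); // a hypothetical batch of audio
workletNode.port.postMessage({ type: 'batch', buffer: samples.buffer },
                             [samples.buffer]);

// Inside the AudioWorkletProcessor: take ownership of the incoming buffer.
this.port.onmessage = (event) => {
  if (event.data.type === 'batch') {
    this.batch = new Float32Array(event.data.buffer);
  }
};

// ...and when the batch has been consumed, transfer it back so the
// real-time thread never has to free the memory itself:
// this.port.postMessage({ type: 'done', buffer: this.batch.buffer },
//                       [this.batch.buffer]);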

What’s next for web audio processing

AudioWorklet is the first of three features that will bridge the gap between native and web apps for low-latency audio processing. SharedArrayBuffer and WebAssembly SIMD are two other features that are coming soon to Firefox, and that are very interesting in combination with AudioWorklet. The former, SharedArrayBuffer, enables lock-free programming on the web, which is a technique audio programmers often rely on to reduce non-determinism of their real-time code. The latter, WebAssembly SIMD, will allow speeding up a variety of audio processing algorithms. It’s a technique very frequently found in audio software.

Want to take a closer look at how to use AudioWorklet in your web development work? You’ll find documentation and details of the spec on MDN. To share ideas for the spec, you can visit this WebAudio repo on github. And if you want to get more involved in the WebAudio community, there’s an active webaudio slack for that.

The post High Performance Web Audio with AudioWorklet in Firefox appeared first on Mozilla Hacks - the Web developer blog.


Daniel Stenberg: Review: curl programming

Thu, 07/05/2020 - 09:11

Title: Curl Programming
Author: Dan Gookin
ISBN: 9781704523286
Weight: 181 grams

A book for my library is a book about my library!

Not long ago I discovered that someone had written this book about curl and that someone wasn’t me! (I believe this is a first) Thrilled of course that I could check off this achievement from my list of things I never thought would happen in my life, I was also intrigued and so extremely curious that I simply couldn’t resist ordering myself a copy. The book is dated October 2019, edition 1.0.

I don’t know the author of this book. I didn’t help out. I wasn’t aware of it and I bought my own copy through an online bookstore.

First impressions

It’s very thin! The first page with content is numbered 13 and the last page before the final index is page 110 (6-7 mm thick). Also, as the photo shows somewhat: it’s not a big format book either: 225 x 152 mm. I suppose a positive spin on that could be that it probably fits in a large pocket.

Size comparison with the 2018 printed version of Everything curl.

I’m not the target audience

As the founder of the curl project and my role as lead developer there, I’m not really a good example of whom the author must’ve imagined when he wrote this book. Of course, my own several decades long efforts in documenting curl in hundreds of man pages and the Everything curl book makes me highly biased. When you read me say anything about this book below, you must remember that.

A primary motivation for getting this book was to learn. Not about curl, but how an experienced tech author like Dan teaches curl and libcurl programming, and try to use some of these lessons for my own writing and manual typing going forward.

What’s in the book?

Despite its size, the book is still packed with information. It contains the following chapters after the introduction:

  1. The amazing curl … 13
  2. The libcurl library … 25
  3. Your basic web page grab … 35
  4. Advanced web page grab … 49
  5. curl FTP … 63
  6. MIME form data … 83
  7. Fancy curl tricks … 97

As you can see it spends a total of 12 pages initially on explanations about curl the command line tool and some of the things you can do with it and how before it moves on to libcurl.

The book is explanatory in its style and it is sprinkled with source code examples showing how to do the various tasks with libcurl. I don’t think it is a surprise to anyone that the book focuses on HTTP transfers but it also includes sections on how to work with FTP and a little about SMTP. I think it can work well for someone who wants to get an introduction to libcurl and get into adding Internet transfers for their applications (at least if you’re into HTTP). It is not a complete guide to everything you can do, but then I doubt most users need or even want that. This book should get you going good enough to then allow you to search for the rest of the details on your own.

I think maybe the biggest piece missing in this book, and I really think it is an omission Mr Gookin should fix if he ever does a second edition: there’s virtually no mention of HTTPS or TLS at all. On the current Internet and web, a huge portion of all web pages and page loads done by browsers are done with HTTPS and while it is “just” HTTP with TLS on top, the TLS part itself is worth some special attention. Not the least because certificates and how to deal with them in a libcurl world is an area that sometimes seems hard for users to grasp.

A second thing I noticed no mention of, but I think should’ve been there: a description of curl_easy_getinfo(). It is a versatile function that provides information to users about a just performed transfer. Very useful if you ask me, and a tool in the toolbox every libcurl user should know about.
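For readers who have not met it, here is a minimal sketch of the kind of call the function enables (error handling trimmed for brevity; this is not an excerpt from the book):

#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  long response_code = 0;
  double total_time = 0.0;

  curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
  curl_easy_perform(curl);

  /* Ask libcurl about the transfer that just completed. */
  curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
  curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME, &total_time);
  printf("HTTP %ld in %.2f seconds\n", response_code, total_time);

  curl_easy_cleanup(curl);
  return 0;
}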

The author mentions that he was using libcurl 7.58.0, so that version or later should be fine for all the code shown. Most of the code of course works in older libcurl versions as well.

Comparison to Everything curl

Everything curl is a free and open document describing everything there is to know about curl, including the project itself and curl internals, so it is a much wider scope and effort. It is however primarily provided as a web and PDF version, although you can still buy a printed edition.

Everything curl spends more space on explanations of features and discussion how to do things and isn’t as focused around source code examples as Curl Programming. Everything curl on paper is also thicker and more expensive to buy – but of course much cheaper if you’re fine with the digital version.

Where to buy?

First: decide if you need to buy it. Maybe the docs on the curl site or in Everything curl are already good enough? Then I also need to emphasize that you will not sponsor or help out the curl project itself by buying this book – it is authored and sold entirely on its own.

But if you need a quick introduction with lots of examples to get your libcurl usage going, by all means go ahead. This could be the book you need. I will not link to any online retailer or anything here. You can get it from basically anyone you like.

Mistakes or errors?

I’ve found some mistakes and some ways of phrasing the explanations that I maybe wouldn’t have used, but all in all the author seems to have understood these things and describes functionality and features accurately, in light and easy-going language.

Finally: I would never capitalize curl as Curl or libcurl as Libcurl, not even in a book. Just saying…

Categorieën: Mozilla-nl planet

The Rust Programming Language Blog: Announcing Rust 1.43.1

to, 07/05/2020 - 02:00

The Rust team has published a new point release of Rust, 1.43.1. Rust is a programming language that is empowering everyone to build reliable and efficient software.

If you have a previous version of Rust installed via rustup, getting Rust 1.43.1 is as easy as:

rustup update stable

If you don't have it already, you can get rustup from the appropriate page on our website, and check out the detailed release notes for 1.43.1 on GitHub.

What's in Rust 1.43.1

Rust 1.43.1 addresses two regressions introduced in the 1.43.0 stable release, and updates the OpenSSL version used by Cargo.

Fixed undetectable CPU features

Rust 1.27.0 introduced support for detecting x86 CPU features in the standard library, thanks to the is_x86_feature_detected! macro. Due to an internal refactoring, Rust 1.43.0 prevented the detection of features that can't be used on stable yet (such as AVX-512), even though detecting them was allowed in the past. Rust 1.43.1 fixes this regression. More information on the regression is available in issue #71473.

Fixed broken cargo package --list

Rust 1.43.0 broke support for listing the files included in packages published with Cargo, when inside a workspace with path dependencies or unpublished versions. A fix for the issue is included in Rust 1.43.1. More information on the bug is available in Cargo issue #8151.

OpenSSL updated to 1.1.1g

OpenSSL, one of the dependencies of Cargo, recently released a security advisory. Unfortunately we were not able to include the fix in time for Rust 1.43.0, so we upgraded OpenSSL in Rust 1.43.1. We have no evidence this vulnerability could compromise the security of Cargo users (if you do, please follow our security policy).

Contributors to 1.43.1

Many people came together to create Rust 1.43.1. We couldn't have done it without all of you. Thanks!

Categorieën: Mozilla-nl planet

Nicholas Nethercote: How to speed up the Rust compiler in 2020

to, 07/05/2020 - 00:52

I last wrote in December 2019 about my work on speeding up the Rust compiler. Time for another update.

Incremental compilation

I started the year by profiling incremental compilation and making several improvements there.

#68914: Incremental compilation pushes a great deal of data through a hash function, called SipHasher128, to determine what code has changed since the last compiler invocation. This PR greatly improved the extraction of bytes from the input byte stream (with a lot of back and forth to ensure it worked on both big-endian and little-endian platforms), giving incremental compilation speed-ups of up to 13% across many benchmarks. It also added a lot more comments to explain what is going on in that code, and removed multiple uses of unsafe.

#69332: This PR reverted the part of #68914 that changed the u8to64_le function in a way that made it simpler but slower. This didn’t have much impact on performance because it’s not a hot function, but I’m glad I caught it in case it gets used more in the future. I also added some explanatory comments so nobody else will make the same mistake I did!

#69050: LEB128 encoding is used extensively within Rust crate metadata. Michael Woerister had previously sped up encoding and decoding in #46919, but there was some fat left. This PR carefully minimized the number of operations in the encoding and decoding loops, almost doubling their speed, and giving wins on many benchmarks of up to 5%. It also removed one use of unsafe. In the PR I wrote a detailed description of the approach I took, covering how I found the potential improvement via profiling, the 18 different things I tried (10 of which improved speed), and the final performance results.

LLVM bitcode

Last year I noticed from profiles that rustc spends some time compressing the LLVM bitcode it produces, especially for debug builds. I tried changing it to not compress the bitcode, and that gave some small speed-ups, but also increased the size of compiled artifacts on disk significantly.

Then Alex Crichton told me something important: the compiler always produces both object code and bitcode for crates. The object code is used when compiling normally, and the bitcode is used when compiling with link-time optimization (LTO), which is rare. A user is only ever doing one or the other, so producing both kinds of code is typically a waste of time and disk space.

In #66598 I tried a simple fix for this: add a new flag to rustc that tells it to omit the LLVM bitcode. Cargo could then use this flag whenever LTO wasn’t being used. After some discussion we decided it was too simplistic, and filed issue #66961 for a more extensive change. That involved getting rid of the use of compressed bitcode by instead storing uncompressed bitcode in a section in the object code (a standard format used by clang), and introducing the flag for Cargo to use to disable the production of bitcode.

The part of rustc that deals with all this was messy. The compiler can produce many different kinds of output: assembly code, object code, LLVM IR, and LLVM bitcode in a couple of possible formats. Some of these outputs are dependent on other outputs, and the choices on what to produce depend on various command line options, as well as details of the particular target platform. The internal state used to track output production relied on many boolean values, and various nonsensical combinations of these boolean values were possible.

When faced with messy code that I need to understand, my standard approach is to start refactoring. I wrote #70289, #70345, and #70384 to clean up code generation, #70297, #70729, and #71374 to clean up command-line option handling, and #70644 to clean up module configuration. Those changes gave me some familiarity with the code and simplified it, and I was then able to write #70458, which did the main change.

Meanwhile, Alex Crichton wrote the Cargo support for the new -Cembed-bitcode=no option (and also answered a lot of my questions). Then I fixed rustc-perf so it would use the correct revisions of rustc and Cargo together, without which the change would erroneously look like a performance regression on CI. Then we went through a full compiler-team approval and final comment period for the new command-line option, and it was ready to land.

Unfortunately, while running the pre-landing tests we discovered that some linkers can’t handle having bitcode in the special section. This problem was only discovered at the last minute because only then are all tests run on all platforms. Oh dear, time for plan B. I ended up writing #71323 which went back to the original, simple approach, with a flag called -Cbitcode-in-rlib=no. [EDIT: note that libstd is still compiled with -Cbitcode-in-rlib=yes, which means that libstd rlibs will still work with both LTO and non-LTO builds.]

The end result was one of the bigger performance improvements I have worked on. For debug builds we saw wins on a wide range of benchmarks of up to 18%, and for opt builds we saw wins of up to 4%. The size of rlibs on disk has also shrunk by roughly 15-20%. Thanks to Alex for all the help he gave me on this!

Anybody who invokes rustc directly instead of using Cargo might want to use -Cbitcode-in-rlib=no to get the improvements.

[EDIT (May 7, 2020): Alex subsequently got the bitcode-in-object-code-section approach working in #71528 by adding the appropriate “ignore this section, linker” incantations to the generated code. He then changed the option name back to the original -Cembed-bitcode=no in #71716. Thanks again, Alex!]

Miscellaneous improvements

#67079: Last year in #64545 I introduced a variant of the shallow_resolved function that was specialized for a hot calling pattern. This PR specialized that function some more, winning up to 2% on a couple of benchmarks.

#67340: This PR shrunk the size of the Nonterminal type from 240 bytes to 40 bytes, reducing the number of memcpy calls (because memcpy is used to copy values larger than 128 bytes), giving wins on a few benchmarks of up to 2%.

#68694: InferCtxt is a type that contained seven different data structures within RefCells. Several hot operations would borrow most or all of the RefCells, one after the other. This PR grouped the seven data structures together under a single RefCell in order to reduce the number of borrows performed, for wins of up to 5%.

#68790: This PR made a couple of small improvements to the merge_from_succ function, giving 1% wins on a couple of benchmarks.

#68848: The compiler’s macro parsing code had a loop that instantiated a large, complex value (of type Parser) on each iteration, but most of those iterations did not modify the value. This PR changed the code so it initializes a single Parser value outside the loop and then uses Cow to avoid cloning it except for the modifying iterations, speeding up the html5ever benchmark by up to 15%. (An aside: I have used Cow several times, and while the concept is straightforward I find the details hard to remember. I have to re-read the documentation each time. Getting the code to work is always fiddly, and I’m never confident I will get it to compile successfully… but once I do it works flawlessly.)

#69256: This PR marked with #[inline] some small hot functions relating to metadata reading and writing, for 1-5% improvements across a number of benchmarks.

#70837: There is a function called find_library_crate that does exactly what its name suggests. It did a lot of repetitive prefix and suffix matching on file names stored as PathBufs. The matching was slow, involving lots of re-parsing of paths within PathBuf methods, because PathBuf isn’t really designed for this kind of thing. This PR pre-emptively extracted the names of the relevant files as strings and stored them alongside the PathBufs, and changed the matching to use those strings instead, giving wins on various benchmarks of up to 3%.

#70876: Cache::predecessors is an oft-called function that produces a vector of vectors, and the inner vectors are usually small. This PR changed the inner vector to a SmallVec for some very small wins of up to 0.5% on various benchmarks.

Other stuff

I added support to rustc-perf for the compiler’s self-profiler. This gives us one more profiling tool to use on the benchmark suite on local machines.

I found that using LLD as the linker when building rustc itself reduced the time taken for linking from about 93 seconds to about 41 seconds. (On my Linux machine I do this by preceding the build command with RUSTFLAGS="-C link-arg=-fuse-ld=lld".) LLD is a really fast linker! #39915 is the three-year-old issue open for making LLD the default linker for rustc, but unfortunately it has stalled. Alexis Beingessner wrote a nice summary of the current situation. If anyone with knowledge of linkers wants to work on that issue, it could be a huge win for many Rust users.

Failures

Not everything I tried worked. Here are some notable failures.

#69152: As mentioned above, #68914 greatly improved SipHasher128, the hash function used by incremental compilation. That hash function is a 128-bit version of the default 64-bit hash function used by Rust hash tables. I tried porting those same improvements to the default hasher. The goal was not to improve rustc’s speed, because it uses FxHasher instead of default hashing, but to improve the speed of all Rust programs that do use default hashing. Unfortunately, this caused some compile-time regressions for complex reasons discussed in detail in the PR, and so I abandoned it. I did manage to remove some dead code in the default hasher in #69471, though.

#69153: While working on #69152, I tried switching from FxHasher back to the improved default hasher (i.e. the one that ended up not landing) for all hash tables within rustc. The results were terrible; every single benchmark regressed! The smallest regression was 4%, the largest was 85%. This demonstrates (a) how heavily rustc uses hash tables, and (b) how much faster FxHasher is than the default hasher when working with small keys.

I tried using ahash for all hash tables within rustc. It is advertised as being as fast as FxHasher but higher quality. I found it made rustc a tiny bit slower. Furthermore, ahash is not deterministic across different builds, because it uses const_random! when initializing hasher state. This could cause extra noise in perf runs, which would be bad. (Edit: It would also prevent reproducible builds, which would also be bad.)

I tried changing the SipHasher128 function used for incremental compilation from the Sip24 algorithm to the faster but lower-quality Sip13 algorithm. I got wins of up to 3%, but wasn’t confident about the safety of the change and so didn’t pursue it further.

#69157: Some follow-up measurements after #69050 suggested that its changes to LEB128 decoding were not as clear a win as they first appeared. (The improvements to encoding were still definitive.) The performance of decoding appears to be sensitive to non-local changes, perhaps due to differences in how the decoding functions are inlined throughout the compiler. This PR reverted some of the changes from #69050 because my initial follow-up measurements suggested they might have been pessimizations. But then several sets of additional follow-up measurements taken after rebasing multiple times suggested that the reversions sometimes regressed performance. The reversions also made the code uglier, so I abandoned this PR.

#66405: Each obligation held by ObligationForest can be in one of several states, and transitions between those states occur at various points. This PR reduced the number of states from five to three, and greatly reduced the number of state transitions, which won up to 4% on a few benchmarks. However, it ended up causing some drastic regressions for some users, so in #67471 I reverted those changes.

#60608: This issue suggests using FxIndexSet in some places where currently an FxHashMap plus a Vec are used. I tried it for the symbol table and it was a significant regression for a few benchmarks.

Progress

Since my last blog post, compile times have seen some more good improvements. The following screenshot shows wall-time changes on the benchmark suite since then (2019-12-08 to 2020-04-22).

Table of compiler performance results.

The biggest changes are in the synthetic stress tests await-call-tree-debug, wf-projection-stress-65510, and ctfe-stress-4, which aren’t representative of typical code and aren’t that important.

Overall it’s good news, with many improvements (green), some in the double digits, and relatively few regressions (red). Many thanks to everybody who helped with all the performance improvements that landed during this period.

Categorieën: Mozilla-nl planet

Dustin J. Mitchell: Debugging Docker Connection Reset by Peer

wo, 06/05/2020 - 23:00

(this post is co-written with @imbstack and cross-posted on his blog)

Symptoms

At the end of January this year the Taskcluster team was alerted to networking issues in a user’s tasks. The first report involved ETIMEDOUT but later on it became clear that the more frequent issue was involving ECONNRESET in the middle of downloading artifacts necessary to run the tests in the tasks. It seemed it was only occurring on downloads from Google (https://dl.google.com) on our workers running in GCP, and only with relatively large artifacts. This led us to initially blame some bit of infrastructure outside of Taskcluster but eventually we found the issue to be with how Docker was handling networking on our worker machines.

Investigation

The initial stages of the investigation were focused on exploring possible causes of the error and on finding a way to reproduce the error.

Investigation of an intermittent error in a high-volume system like this is slow and difficult work. It’s difficult to know if an intervention fixed the issue just because the error does not recur. And it’s difficult to know if an intervention did not fix the issue, as “Connection reset by peer” can be due to transient network hiccups. It’s also difficult to gather data from production systems as the quantity of data per failure is unmanageably high.

We explored a few possible causes of the issue, all of which turned out to be dead ends.

  • Rate Limiting or Abuse Prevention - The TC team has seen cases where downloads from compute clouds were limited as a form of abuse prevention. Like many CI processes, the WPT jobs download Chrome on every run, and it’s possible that a series of back-to-back tasks on the same worker could appear malicious to an abuse-prevention device.
  • Outages of the download server - This was unlikely, given Google’s operational standards, but worth exploring since the issues seemed limited to dl.google.com.
  • Exhaustion of Cloud NAT addresses - Resource exhaustion in the compute cloud might have been related. This was easily ruled out with the observation that workers are not using Cloud NAT.

At the same time, several of us were working on reproducing the issue in more controlled circumstances. This began with interactive sessions on Taskcluster workers, and soon progressed to a script that reproduced the issue easily on a GCP instance running the VM image used to run workers. An important observation here was that the issue only reproduced inside of a docker container: downloads from the host worked just fine. This seemed to affect all docker images, not just the image used in WPT jobs.

At this point, we were able to use Taskcluster itself to reproduce the issue at scale, creating a task group of identical tasks running the reproduction recipe. The “completed” tasks in that group are the successful reproductions.

Armed with quick, reliable reproduction, we were able to start capturing dumps of the network traffic. From these, we learned that the downloads were failing mid-download (tens of MB into a ~65MB file). We were also able to confirm that the error is, indeed, a TCP RST segment from the peer.

Searches for similar issues around this time found a blog post entitled “Fix a random network Connection Reset issue in Docker/Kubernetes”, which matched our issue in many respects. It’s a long read, but the summary is that conntrack, which is responsible for maintaining NAT tables in the Linux kernel, sometimes gets mixed up and labels a valid packet as INVALID. The default configuration of iptables forwarding rules is to ignore INVALID packets, meaning that they fall through to the default ACCEPT for the FILTER table. Since the port is not open on the host, the host replies with an RST segment. Docker containers use NAT to translate between the IP of the container and the IP of the host, so this would explain why the issue only occurs in a Docker container.

We were, indeed, seeing INVALID packets as revealed by conntrack -S, but there were some differences from our situation, so we continued investigating. In particular, in the blog post the connection errors were seen in the opposite direction, and involved a local server for which the author had added some explicit firewall rules.

Since we hypothesized that NAT was involved, we captured packet traces both inside the Docker container and on the host interface, and combined the two. The results were pretty interesting! In the dump output below, 74.125.195.136 is dl.google.com, 10.138.0.12 is the host IP, and 172.17.0.2 is the container IP. 10.138.0.12 is a private IP, suggesting that there is an additional layer of NAT going on between the host IP and the Internet, but this was not the issue.

A “normal” data segment looks like

22:26:19.414064 ethertype IPv4 (0x0800), length 26820: 74.125.195.136.https > 10.138.0.12.60790: Flags [.], seq 35556934:35583686, ack 789, win 265, options [nop,nop,TS val 2940395388 ecr 3057320826], length 26752
22:26:19.414076 ethertype IPv4 (0x0800), length 26818: 74.125.195.136.https > 172.17.0.2.60790: Flags [.], seq 35556934:35583686, ack 789, win 265, options [nop,nop,TS val 2940395388 ecr 3057320826], length 26752

here the first line is outside the container and the second line is inside the container; the SNAT translation has rewritten the host IP to the container IP. The sequence numbers give the range of bytes in the segment, as an offset from the initial sequence number, so we are almost 34MB into the download (from a total of about 65MB) at this point.

We began by looking at the end of the connection, when it failed.

A 22:26:19.414064 ethertype IPv4 (0x0800), length 26820: 74.125.195.136.https > 10.138.0.12.60790: Flags [.], seq 35556934:35583686, ack 789, win 265, options [nop,nop,TS val 2940395388 ecr 3057320826], length 26752
  22:26:19.414076 ethertype IPv4 (0x0800), length 26818: 74.125.195.136.https > 172.17.0.2.60790: Flags [.], seq 35556934:35583686, ack 789, win 265, options [nop,nop,TS val 2940395388 ecr 3057320826], length 26752
B 22:26:19.414077 ethertype IPv4 (0x0800), length 2884: 74.125.195.136.https > 10.138.0.12.60790: Flags [.], seq 34355910:34358726, ack 789, win 265, options [nop,nop,TS val 2940395383 ecr 3057320821], length 2816
C 22:26:19.414091 ethertype IPv4 (0x0800), length 56: 10.138.0.12.60790 > 74.125.195.136.https: Flags [R], seq 821696165, win 0, length 0
...
X 22:26:19.416605 ethertype IPv4 (0x0800), length 66: 172.17.0.2.60790 > 74.125.195.136.https: Flags [.], ack 35731526, win 1408, options [nop,nop,TS val 3057320829 ecr 2940395388], length 0
  22:26:19.416626 ethertype IPv4 (0x0800), length 68: 10.138.0.12.60790 > 74.125.195.136.https: Flags [.], ack 35731526, win 1408, options [nop,nop,TS val 3057320829 ecr 2940395388], length 0
Y 22:26:19.416715 ethertype IPv4 (0x0800), length 56: 74.125.195.136.https > 10.138.0.12.60790: Flags [R], seq 3900322453, win 0, length 0
  22:26:19.416735 ethertype IPv4 (0x0800), length 54: 74.125.195.136.https > 172.17.0.2.60790: Flags [R], seq 3900322453, win 0, length 0

Segment (A) is a normal data segment, forwarded to the container. But (B) has a much lower sequence number, about 1MB earlier in the stream, and it is not forwarded to the docker container. Notably, (B) is also about 1/10 the size of the normal data segments – we never figured out why that is the case. Instead, we see an RST segment (C) sent back to dl.google.com. This situation repeats a few times: normal segment forwarded, late segment dropped, RST segment sent to peer.

Finally, the docker container sends an ACK segment (X) for the segments it has received so far, and this is answered by an RST segment (Y) from the peer, and that RST segment is forwarded to the container. This final RST segment is reasonable from the peer’s perspective: we have already reset its connection, so by the time it gets (X) the connection has been destroyed. But this is the first the container has heard of any trouble on the connection, so it fails with “Connection reset by peer”.

So it seems that the low-sequence-number segments are being flagged as INVALID by conntrack and causing it to send RST segments. That’s a little surprising – why is conntrack paying attention to sequence numbers at all? From this article it appears this is a security measure, helping to protect sockets behind the NAT from various attacks on TCP.

The second surprise here is that such late TCP segments are present. Scrolling back through the dump output, there are many such packets – enough that manually labeling them is infeasible. However, graphing the sequence numbers shows a clear pattern:

sequence number graph

Note that this covers only the last 16ms of the connection (the horizontal axis is in seconds), carrying about 200MB of data (the vertical axis is sequence numbers, indicating bytes). The “fork” in the pattern shows a split between the up-to-date segments, which seem to accelerate, and the delayed segments. The delayed segments are only slightly delayed - 2-3ms. But a spot-check of a few sequence ranges in the dump shows that they had already been retransmitted by the time they were delivered. When such late segments were not dropped by conntrack, the receiver replied to them with what’s known as a duplicate ACK, a form of selective ACK that says “I have received that segment, and in fact I’ve received many segments since then.”

Our best guess here is that some network intermediary has added a slight delay to some packets. But since the RTT on this connection is so short, that delay is relatively huge and puts the delayed packets outside of the window where conntrack is willing to accept them. That helps explain why other downloads, from hosts outside of the Google infrastructure, do not see this issue: either they do not traverse the intermediary delaying these packets, or the RTT is long enough that a few ms is not enough to result in packets being marked INVALID.

Resolution

After we posted these results in the issue, our users realized these symptoms looked a lot like a Moby libnetwork bug. We adopted a workaround mentioned there, where we use conntrack to drop invalid packets in iptables rather than trigger RSTs:

iptables -I INPUT -m conntrack --ctstate INVALID -j DROP

The drawbacks of that approach listed in the bug are acceptable for our uses. After baking a new machine image we tried to reproduce the issue at scale, as we had done during the debugging of this issue, and were not able to. We updated all of our worker pools to use this image the next day and it seems like we’re now in the clear.

Security Implications

As we uncovered this behavior, there was some concern among the team that this represented a security issue. When conntrack marks a packet as INVALID and it is handled on the host, it’s possible that the same port on the host is in use, and the packet could be treated as part of that connection. However, TCP identifies connections with a “four-tuple” of source IP and port + destination IP and port, and the tuples cannot match, or the remote end would have been unable to distinguish the connection “through” the NAT from the connection terminating on the host. So there is no issue of confusion between connections here.

However, there is the possibility of a denial of service. If an attacker can guess the four-tuple for an existing connection and forge an INVALID packet matching it, the resulting RST would destroy the connection. This is probably only an issue if the attacker is on the same network as the docker host, as otherwise reverse-path filtering would discard such a forged packet.

At any rate, this issue appears to be fixed in more recent distributions.

Thanks

@hexcles, @djmitche, @imbstack, @stephenmcgruer

Categorieën: Mozilla-nl planet

The Mozilla Blog: More on COVID Surveillance: Mobile Phone Location

wo, 06/05/2020 - 20:01

Previously I wrote about the use of mobile apps for COVID contact tracing. This idea has gotten a lot of attention in the tech press — probably because there are some quite interesting privacy issues — but there is another approach to monitoring people’s locations using their devices that has already been used in Taiwan and Israel, namely mobile phone location data. While this isn’t something that people think about a lot, your mobile phone has to be in constant contact with the mobile system and the system can use that information to determine your location. Mobile phones already use network-based location to provide emergency location services and for what’s called assisted GPS, in which mobile-tower based location is used along with satellite-based GPS, but it can, of course, be used for services the user might be less excited about, such as real-time surveillance of their location. In addition to measurements taken from the tower, a number of mobile services share location history with service providers, for instance to provide directions in mapping applications or as part of your Google account.

If what you are trying to do is carry out as much COVID surveillance as cheaply as possible, this kind of data has several big advantages over mobile phone apps. First, it’s already being collected, so you don’t need to get anyone to install an app. Second, it’s extremely detailed because it has everyone’s location and not just who they have been in contact with. The primary disadvantage of mobile phone location data is accuracy; in some absolute sense, assisted GPS is amazingly accurate, especially to those old enough to remember when handheld GPS was barely a thing, but generally we’re talking about accuracies on the scale of meters to tens of meters, which is not good enough to tell whether you have been in close contact with someone. This is still useful enough for many applications and we’re seeing this kind of data used for a number of anti-COVID purposes such as detecting people crowding in a given location, determining when people have broken quarantine and measuring bulk movements.

But of course, all of this is only possible because everyone is already carrying around a tracking device in their pocket all the time and they don’t even think about it. These systems just routinely log information about your location whether you downloaded some app or not, and it’s just a limitation of the current technology that that information isn’t precise down to the meter (and this kind of positioning technology has gotten better over time because precise localization of mobile devices is key to getting good performance). By contrast, nearly all of the designs for mobile contact tracing explicitly prioritize privacy. Even the centralized designs like BlueTrace that have the weakest privacy properties still go out of their way to avoid leaking information, mostly by not collecting it. So, for instance, if you test positive BlueTrace tells the government who you have been in contact with; if you aren’t exposed to Coronavirus, the government doesn’t learn much about you1.

The important distinction to draw here is between policy controls to protect privacy and technical controls to protect privacy. Although the mobile network gets to collect a huge amount of data on you, this data is to some extent protected by policy: laws, regulations, and corporate commitments constraining how that data can be used2 and you have to trust that those policies will be followed. By contrast, the privacy protections in the various COVID-19 contact tracing apps are largely technical: they don’t rely on trusting the health authority to behave properly because the health authority doesn’t have the information in its hands in the first place. Another way to think about this is that technical controls are “rigid” in that they don’t depend on human discretion: this is obviously an advantage for users who don’t want to have to trust government, big tech companies, etc. but it’s also a disadvantage in that it makes it difficult to respond to new circumstances. For instance, Google was able to quickly take mobility measurements using stored location history because people were already sharing that with them, but the new Apple/Google contact tracing will require people to download new software and maybe opt-in, which can be slow and result in low uptake.

The point here isn’t to argue that one type of control is necessarily better or worse than another. In fact, it’s quite common to have systems which depend on a mix of these3. However, when you are trying to evaluate the privacy and security properties of a system, you need to keep this distinction firmly in mind: every policy control depends on someone or a set of someones behaving correctly, and therefore either requires that you trust them to do so or have some mechanism for ensuring that they in fact are.

  1. Except that whenever you contact the government servers for new TempIDs it learns something about your current location. 
  2. For instance, the United States Supreme Court recently ruled that the government requires a warrant to get mobile phone location records. 
  3. For instance, the Web certificate system, which relies extensively on procedural controls but is increasingly backed up by technical safeguards such as Certificate Transparency.

The post More on COVID Surveillance: Mobile Phone Location appeared first on The Mozilla Blog.

Categorieën: Mozilla-nl planet

The Mozilla Blog: Mozilla announces the first three COVID-19 Solutions Fund Recipients

wo, 06/05/2020 - 15:59

In less than two weeks, Mozilla received more than 160 applications from 30 countries for its COVID-19 Solutions Fund Awards. Today, the Mozilla Open Source Support Program (MOSS) is excited to announce its first three recipients. This Fund was established at the end of March, to offer up to $50,000 each to open source technology projects responding to the COVID-19 pandemic.

VentMon, created by Public Invention in Austin, Texas, improves testing of open-source emergency ventilator designs that are attempting to address the current and expected shortage of ventilators.

The same machine and software will also provide monitoring and alarms for critical care specialists using life-critical ventilators. It is a simple inline device plugged into the airway of an emergency ventilator, that measures flow and pressure (and thereby volume), making sure the ventilator is performing to specification, such as the UK RVMS spec. If a ventilator fails, VentMon raises an audio and internet alarm. It can be used for testing before deployment, as well as ICU patient monitoring. The makers received a $20,000 award which enables them to buy parts for the Ventmon to support more than 20 open source engineering teams trying to build ventilators.

Based in the Bay Area, Recidiviz is a tech non-profit that’s built a modeling tool that helps prison administrators and government officials forecast the impact of COVID-19 on their prisons and jails. This data enables them to better assess changes they can make to slow the spread, like reducing density in prison populations or granting early release to people who are deemed to pose low risk to public safety.

It is impossible to physically distance in most prison settings, and so incarcerated populations are at dangerous risk of COVID-19 infection. Recidiviz’s tool was downloaded by 47 states within 48hrs of launch. The MOSS Committee approved a $50,000 award.

“We want to make it easier for data to inform everything that criminal justice decision-makers do,” said Clementine Jacoby, CEO and Co-Founder of Recidiviz. “The pandemic made this mission even more critical and this funding will help us bring our COVID-19 model online. Already more than thirty states have used the tool to understand where the next outbreak may happen or how their decisions can flatten the curve and reduce impact on community hospital beds, incarcerated populations, and staff.”

COVID-19 Supplies NYC is a project created by 3DBrooklyn, producing around 2,000 face shields a week, which are urgently needed in the city. They will use their award to make and distribute more face shields, using 3D printing technology and an open source design. They also maintain a database that allows them to collect requests from institutions that need face shields as well as offers from people with 3D printers to produce parts for the face shields. The Committee approved a $20,000 award.

“Mozilla has long believed in the power of open source technology to better the internet and the world,” said Jochai Ben-Avie, Head of International Public Policy and Administrator of the Program. “It’s been inspiring to see so many open source developers step up and collaborate on solutions to increase the capacity of healthcare systems to cope with this crisis.”

In the coming weeks Mozilla will announce the remaining winning applicants. The application form has been closed for now, owing to the high number of submissions already being reviewed.

The post Mozilla announces the first three COVID-19 Solutions Fund Recipients appeared first on The Mozilla Blog.

Categorieën: Mozilla-nl planet

Hacks.Mozilla.Org: Firefox 76: Audio worklets and other tricks

ti, 05/05/2020 - 16:30

Note: This post is also available in: 简体中文 (Chinese (Simplified)), 繁體中文 (Chinese (Traditional)), Español (Spanish).

Hello folks, hope you are all doing well and staying safe.

A new version of your favourite browser is always worth looking forward to, and here we are with Firefox 76! Web platform support sees some great new additions in this release, such as Audio Worklets and Intl improvements, on the JavaScript side. Also, we’ve added a number of nice improvements into Firefox DevTools to make development easier and quicker.

As always, read on for the highlights, or find the full list of additions in the following articles:

Developer tools additions

There are interesting DevTools updates in this release throughout every panel. And upcoming features can be previewed now in Firefox Dev Edition.

More JavaScript productivity tricks

Firefox JavaScript debugging just got even better.

Ignore entire folders in Debugger

Oftentimes, debugging efforts only focus on specific files that are likely to contain the culprit. With “blackboxing” you can tell the Debugger to ignore the files you don’t need to debug.

Now it’s easier to do this for folders as well, thanks to Stepan Stava‘s new context menu in the Debugger’s sources pane. You can limit “ignoring” to files inside or outside of the selected folder. Combine this with “Set directory root” for a laser-focused debugging experience.

Animation showing how we've combined ignoring files in folders and with directory root for focused debugging.

Collapsed output for larger console snippets

The Console‘s multi-line editor mode is great for iterating on longer code snippets. Early feedback showed that users didn’t want the code repeated in the Console output, to avoid clutter. Thanks to thelehhman‘s contribution, code snippets with multiple lines are neatly collapsed and can be expanded on demand.

Animation showing how to iterate on long script expressions with Console's multi-line input mode.

Copy full URLs in call stack

Copying stacks in the Debugger makes it possible to share snapshots during stepping. This helps you file better bugs, and facilitates handover to your colleagues. In order to provide collaborators the full context of a bug, the call stack pane‘s “Copy stack trace” menu now copies full URLs, not just filenames.

screenshot of 'copy stack trace' in action in the Debugger

Always offer “Expand All” in Firefox’s JSON preview

Built-in previews for JSON files make it easy to search through responses and explore API endpoints. This also works well for large files, where data can be expanded as needed. Thanks to a contribution from zacnomore, the “Expand All” option is now always visible.

More network inspection tricks

Firefox 76 provides even easier access to network information via the Network Monitor.

Action Cable support in WebSocket inspection

WebSocket libraries use a variety of formats to encode their messages. We want to make sure that their payloads are properly parsed and formatted, so you can read them. Over the past releases, we added support for Socket.IO, SignalR, and WAMP WebSocket message inspection. Thanks to contributor Uday Mewada, Action Cable messages are now nicely formatted too.

action cable websocket message formatting in devtools

Hiding WebSocket Control Frames

WebSocket control frames are used by servers and browsers to manage real-time connections but don’t contain any data. Contributor kishlaya.j jumped in to hide control frames by default, cutting out a little more noise from your debugging. In case you need to see them, they can be enabled in the sent/received dropdown.

Resize Network table columns to fit content

Network request and response data can be overwhelming as you move from scanning real-time updates to focus on specific data points. Customizing the visible Network panel columns lets you adapt the output to the problem at hand. In the past, this required a lot of dragging and resizing. Thanks to Farooq AR, you can now double-click the table’s resize handles to scale a column’s width to fit its content, as in modern data tables.

Animation showing how to double-click column headers for quickly fitting column sized to their content

Better Network response details and copying

We’ve received feedback that it should be easier to copy parts of the network data for further analysis.

Now the “Response” section of Network details has been modernized to make inspection and copying easier, by rendering faster and being more reliable. We’ll be adding more ease of use improvements to Network analysis in the near future, thanks to your input.

Community contributions

Fresh in Dev Edition: CSS Compatibility Panel

Developer Edition is Firefox’s pre-release channel, which offers early access to tooling and platform features. Its settings enable more functionality for developers by default. We like to bring new features quickly to Developer Edition to gather your feedback, including the following highlights.

Foremost, in the release of Dev Edition 77 we are seeking input for our new compatibility panel. This panel will inform you about any CSS properties that might not be supported in other browsers, and will be accessible from the Inspector.

Compatibility panel summarizing 2 issues for the current element

Please try it out and use the built-in “Feedback” link to report how well it works for you and how we can further improve it.

Web platform updates

Let’s explore what Firefox 76 brings to the table in terms of web platform updates.

Audio worklets

Audio worklets offer a useful way of running custom JavaScript audio processing code. The difference between audio worklets and their predecessor, ScriptProcessorNodes, is that worklets run off the main thread in a similar way to web workers, solving the performance problems encountered previously.

The basic idea is this: You define a custom AudioWorkletProcessor, which will handle the processing. Next, register it.

// white-noise-processor.js
class WhiteNoiseProcessor extends AudioWorkletProcessor {
  process (inputs, outputs, parameters) {
    const output = outputs[0]
    output.forEach(channel => {
      for (let i = 0; i < channel.length; i++) {
        channel[i] = Math.random() * 2 - 1
      }
    })
    return true
  }
}

registerProcessor('white-noise-processor', WhiteNoiseProcessor)

Over in your main script, you then load the processor, create an instance of AudioWorkletNode, and pass it the name of the processor. Finally, you connect the node to an audio graph.

async function createAudioProcessor() {
  const audioContext = new AudioContext()
  await audioContext.audioWorklet.addModule('white-noise-processor.js')
  const whiteNoiseNode = new AudioWorkletNode(audioContext, 'white-noise-processor')
  whiteNoiseNode.connect(audioContext.destination)
}

Read our Background audio processing using AudioWorklet guide for more information.

Other updates

Aside from worklets, we’ve added some other web platform features.

HTML <input>s

The HTML <input> element’s min and max attributes now work correctly when the value of min is greater than the value of max, for control types whose values are periodic. (Periodic values repeat in regular intervals, wrapping around from the end back to the start again.) This is particularly helpful with date and time inputs for example, where you might want to specify a time range of 11 PM to 2 AM.
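As a minimal sketch of that case (assuming a page with a <body> to append to), the following creates a time input whose allowed range wraps around midnight; exact validity reporting may vary, so the expected result is noted as a comment:

// A time range of 11 PM to 2 AM: min is greater than max, which now works
// correctly for periodic control types such as time.
const lateNight = document.createElement('input');
lateNight.type = 'time';
lateNight.min = '23:00';
lateNight.max = '02:00';
document.body.appendChild(lateNight);

// Values inside the wrapped range, such as '23:30' or '01:15', are in range;
// a value outside it, such as '12:00', should be reported as out of range.
lateNight.value = '12:00';
console.log(lateNight.validity.valid); // expected: false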

Intl improvements

The numberingSystem and calendar options of the Intl.NumberFormat, Intl.DateTimeFormat, and Intl.RelativeTimeFormat constructors are now enabled by default.

Try these examples:

const number = 123456.789;

console.log(new Intl.NumberFormat('en-US', { numberingSystem: 'latn' }).format(number));
console.log(new Intl.NumberFormat('en-US', { numberingSystem: 'arab' }).format(number));
console.log(new Intl.NumberFormat('en-US', { numberingSystem: 'thai' }).format(number));

var date = Date.now();

console.log(new Intl.DateTimeFormat('th', { calendar: 'buddhist' }).format(date));
console.log(new Intl.DateTimeFormat('th', { calendar: 'gregory' }).format(date));
console.log(new Intl.DateTimeFormat('th', { calendar: 'chinese' }).format(date));

Intersection observer

The IntersectionObserver() constructor now accepts both Document and Element objects as its root. In this context, the root is the area whose bounding box is considered the viewport for the purposes of observation.
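Here is a minimal sketch of the new capability (the '.lazy' selector is just a hypothetical example); the document itself can now be passed as the root:

const observer = new IntersectionObserver(
  (entries) => {
    entries.forEach((entry) => {
      if (entry.isIntersecting) {
        console.log('now visible within the root:', entry.target);
      }
    });
  },
  // A Document may now be used as the root, alongside the Element roots
  // that were already supported.
  { root: document, threshold: 0.25 }
);

document.querySelectorAll('.lazy').forEach((el) => observer.observe(el));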

Browser extensions

The Firefox Profiler is a tool to help analyze and improve the performance of your site in Firefox. Now it will show markers when network requests are suspended by extensions’ blocking webRequest handlers. This is especially useful to developers of content blocker extensions, enabling them to ensure that Firefox remains at top speed.
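For context, a blocking webRequest handler is a WebExtension listener that can suspend or cancel requests before they proceed. The sketch below is a hypothetical example (the filtered hostname is made up, and the extension would need the webRequest and webRequestBlocking permissions); the time a request spends suspended in a listener like this is what the new profiler markers make visible.

// background.js of a content-blocking extension (hypothetical example)
browser.webRequest.onBeforeRequest.addListener(
  (details) => {
    // The request is suspended while this listener runs.
    if (details.url.includes('ads.example.com')) {
      return { cancel: true }; // block the request
    }
    // Returning nothing lets the request continue.
  },
  { urls: ['<all_urls>'] },
  ['blocking']
);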

Here’s a screenshot of the Firefox profiler in action:

Firefox profiler extension UI

Summary

And that’s it for the newest edition of Firefox — we hope you enjoy the new features! As always, feel free to give feedback and ask questions in the comments.

The post Firefox 76: Audio worklets and other tricks appeared first on Mozilla Hacks - the Web developer blog.

Categorieën: Mozilla-nl planet

The Firefox Frontier: More reasons you can trust Firefox with your passwords

ti, 05/05/2020 - 15:00

There’s no doubt that during the last couple of weeks you’ve been signing up for new online services like streaming movies and shows, ordering takeout or getting produce delivered to … Read more

The post More reasons you can trust Firefox with your passwords appeared first on The Firefox Frontier.

Categorieën: Mozilla-nl planet
