
Mozilla Nederland
The Dutch Mozilla community

Planet Mozilla - http://planet.mozilla.org/
Updated: 1 week 3 hours ago

The Mozilla Blog: Firefox and Emerging Markets Leadership

Thu, 25/04/2019 - 22:05

Building on the success of Firefox Quantum, we have a renewed focus on better enabling people to take control of their internet-connected lives as their trusted personal agent — through continued evolution of the browser and web platform — and with new products and services that provide enhanced security, privacy and user agency across connected life.

To accelerate this work, we’re announcing some changes to our senior leadership team:

Dave Camp has been appointed SVP Firefox. In this new role, Dave will be responsible for overall Firefox product and web platform development.

As a long-time Mozillian, Dave joined Mozilla in 2006 to work on Gecko, building networking and security features, and was a contributor to the release of Firefox 3. After a short stint at a startup, he rejoined Mozilla in 2011 as part of the Firefox Developer Tools team. Dave has since served in a variety of senior leadership roles within the Firefox product organization, most recently leading the Firefox engineering team through the launch of Firefox Quantum.

Under Dave’s leadership the new Firefox organization will pull together all product management, engineering, technology and operations in support of our Firefox products, services and web platform. As part of this change, we are also announcing the promotion of Marissa (Reese) Wood to VP Firefox Product Management, and Joe Hildebrand to VP Firefox Engineering. Both Joe and Reese have been key drivers of the continued development of our core browser across platforms, and the expansion of the Firefox portfolio of products and services globally.

In addition, we are increasing our investment and focus in emerging markets. Building on the early success of products like Firefox Lite, which we launched in India earlier this year, we are also formally establishing an emerging markets team based in Taipei:

Stan Leong has been appointed VP and General Manager, Emerging Markets. In this new role, Stan will be responsible for our product development and go-to-market strategy for the region. Stan joins us from DCX Technology, where he was Global Head of Emerging Product Engineering. He has a great combination of start-up and large-company experience, having spent years at Hewlett Packard, and he has worked extensively in the Asian markets.

As part of this, Mark Mayo, who has served as our Chief Product Officer (CPO), will move into a new role focused on strategic product development initiatives with an initial emphasis on accelerating our emerging markets strategy. We will be conducting an executive search for a CPO to lead the ongoing development and evolution of our global product portfolio.

I’m confident that with these changes, we are well positioned to continue the evolution of the browser and web platform and introduce new products and services that provide enhanced security, privacy and user agency across connected life.

The post Firefox and Emerging Markets Leadership appeared first on The Mozilla Blog.

Categories: Mozilla-nl planet

Nathan Froyd: an unexpected benefit of standardizing on clang-cl

Thu, 25/04/2019 - 18:45

I wrote several months ago about our impending decision to switch to clang-cl on Windows.  In the intervening months, we did that, and we also dropped MSVC as a supported compiler.  (We still build on Linux with GCC, and will probably continue to do that for some time.)  One (extremely welcome) consequence of the switch to clang-cl has only become clear to me in the past couple of weeks: using assembly language across platforms is no longer painful.

First, a little bit of background: GCC (and Clang) support a feature called inline assembly, which enables you to write little snippets of assembly code directly in your C/C++ program.  The syntax is baroque, it’s incredibly easy to shoot yourself in the foot with it, and it’s incredibly useful for a variety of low-level things.  MSVC supports inline assembly as well, but only on x86, and with a completely different syntax than GCC.
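
To make the syntax split concrete, here is a small hedged sketch (mine, not from the post) of the same trivial operation written once as GCC/Clang-style extended inline assembly and once as an MSVC-style __asm block; the instructions and helper names are illustrative only.

// GCC/Clang extended inline assembly: constraint strings, usable on many architectures
// (this particular snippet is x86/x86-64 only because of the instruction used).
static inline unsigned add_one_gnu(unsigned x) {
  asm("addl $1, %0"   // AT&T syntax
      : "+r"(x));     // "+r": x is both input and output, kept in a register
  return x;
}

#ifdef _MSC_VER
// MSVC inline assembly: a completely different syntax, and only available on 32-bit x86.
static unsigned add_one_msvc(unsigned x) {
  unsigned result;
  __asm {
    mov eax, x
    add eax, 1
    mov result, eax
  }
  return result;
}
#endif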

OK, so maybe you want to put your code in a separate assembly file instead.  The complementary assembler for GCC (courtesy of binutils) is called gas, with its own specific syntax for various low-level details.  If you give gcc an assembly file, it knows to pass it directly to gas, and will even run the C preprocessor on the assembly before invoking gas if you request that.  So you only ever need to invoke gcc to compile everything, and the right thing will just happen. MSVC, by contrast, requires you to invoke a separate, differently-named assembler for each architecture, with different assembly language syntaxes (e.g. directives for the x86-64 assembler are quite different from those for the arm64 assembler), and preprocessing files beforehand requires you to jump through hoops.  (To be fair, a number of these details are handled for you if you’re building from inside Visual Studio; the differences are only annoying to handle in cross-platform build systems.)

In short, dealing with assembler in a world where you have to support MSVC is somewhat painful.  You have to copy-and-paste code, or maybe you write Perl scripts to translate from the gas syntax to whatever flavor of syntax the Microsoft assembler you’re using expects.  Your build system needs to handle Windows and non-Windows differently for assembly files, and may even need to handle different architectures for Windows differently.  Things like our ICU data generation have been made somewhat more complex than necessary to support Windows platforms.

Enter clang-cl.  Since clang-cl is just clang under the hood, it handles being passed assembly files on the command line in the same way and will even preprocess them for you.  Additionally, clang-cl contains a gas-compatible assembly syntax parser, so assembly files that you pass on the command line are parsed by clang-cl and therefore you can now write a single assembly syntax that works on Unix-y and Windows platforms.  (You do, of course, have to handle differing platform calling conventions and the like, but that’s simplified by having a preprocessor available all the time.)  Finally, clang-cl supports GCC-style inline assembly, so you don’t even have to drop into separate assembly files if you don’t want to.

In short, clang-cl solves every problem that made assembly usage painful on Windows. Might we have a future world where open source projects that have to deal with any amount of assembly standardize on clang-cl for their Windows support, and declare MSVC unsupported?

Categories: Mozilla-nl planet

Henri Sivonen: It’s Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

Thu, 25/04/2019 - 15:30

Henri Sivonen, 2019-04-24

Disclosure: I work for Mozilla, and my professional activity includes being the Gecko module owner for character encodings.

Disclaimer: Even though this document links to code and documents written as part of my Mozilla activities, this document is written in a personal capacity.

Summary

Text processing facilities in the C++ standard library have been mostly agnostic of the actual character encoding of text. The few operations that are sensitive to the actual character encoding are defined to behave according to the implementation-defined “narrow execution encoding” (for buffers of char) and the implementation-defined “wide execution encoding” (for buffers of wchar_t).

Meanwhile, over the last two decades, a different dominant design has arisen for text processing in other programming languages as well as in C and C++ usage despite what the C and C++ standard-library facilities provide: Representing text as Unicode, and only Unicode, internally in the application even if some other representation is required externally for backward compatibility.

I think the C++ standard should adopt the approach of “Unicode-only internally” for new text processing facilities and should not support non-Unicode execution encodings in newly-introduced features. This allows new features to have less abstraction obfuscation for Unicode usage, avoids digging legacy applications deeper into non-Unicode commitment, and avoids the specification and implementation effort of adapting new features to make sense for non-Unicode execution encodings.

Concretely, I suggest:

  • In new features, do not support numbers other than Unicode scalar values as a numbering scheme for abstract characters, and design new APIs to be aware of Unicode scalar values as appropriate instead of allowing other numbering schemes. (I.e. make Unicode the only coded character set supported for new features.)
  • Use char32_t directly as the concrete type for an individual Unicode scalar value without allowing for parametrization of the type that conceptually represents a Unicode scalar value. (For sequences of Unicode scalar values, UTF-8 is preferred.)
  • When introducing new text processing facilities (other than the next item on this list), support only UTF in-memory text representations: UTF-8 and, potentially, depending on feature, also UTF-16 or also UTF-16 and UTF-32. That is, do not seek to make new text processing features applicable to non-UTF execution encodings. (This document should not be taken as a request to add features for UTF-16 or UTF-32 beyond iteration over string views by scalar value. To avoid distraction from the main point, this document should also not be taken as advocating against providing any particular feature for UTF-16 or UTF-32.)
  • Non-UTF character encodings may be supported in a conversion API whose purpose is to convert from a legacy encoding into a UTF-only representation near the IO boundary or at the boundary between a legacy part (that relies on execution encoding) and a new part (that uses Unicode) of an application. Such APIs should be std::span-based instead of iterator-based.
  • When an operation logically requires a valid sequence of Unicode scalar values, the API must either define the operation to fail upon encountering invalid UTF-8/16/32 or must replace each error with a U+FFFD REPLACEMENT CHARACTER as follows: What constitutes a single error in UTF-8 is defined in the WHATWG Encoding Standard (which matches the “best practice” from the Unicode Standard). In UTF-16, each unpaired surrogate is an error. In UTF-32, each code unit whose numeric value isn’t a valid Unicode scalar value is an error.
  • Instead of standardizing Text_view as proposed, standardize a way to obtain a Unicode scalar value iterator from std::u8string_view, std::u16string_view, and std::u32string_view.
Context

This write-up is in response to (and in disagreement with) the “Character Types” section in the P0244R2 Text_view paper:

This library defines a character class template parameterized by character set type used to represent character values. The purpose of this class template is to make explicit the association of a code point value and a character set.

It has been suggested that char32_t be supported as a character type that is implicitly associated with the Unicode character set and that values of this type always be interpreted as Unicode code point values. This suggestion is intended to enable UTF-32 string literals to be directly usable as sequences of character values (in addition to being sequences of code unit and code point values). This has a cost in that it prohibits use of the char32_t type as a code unit or code point type for other encodings. Non-Unicode encodings, including the encodings used for ordinary and wide string literals, would still require a distinct character type (such as a specialization of the character class template) so that the correct character set can be inferred from objects of the character type.

This suggestion raises concerns for the author. To a certain degree, it can be accommodated by removing the current members of the character class template in favor of free functions and type trait templates. However, it results in ambiguities when enumerating the elements of a UTF-32 string literal; are the elements code point or character values? Well, the answer would be both (and code unit values as well). This raises the potential for inadvertently writing (generic) code that confuses code points and characters, runs as expected for UTF-32 encodings, but fails to compile for other encodings. The author would prefer to enforce correct code via the type system and is unaware of any particular benefits that the ability to treat UTF-32 string literals as sequences of character type would bring.

It has also been suggested that char32_t might suffice as the only character type; that decoding of any encoded string include implicit transcoding to Unicode code points. The author believes that this suggestion is not feasible for several reasons:

  1. Some encodings use character sets that define characters such that round trip transcoding to Unicode and back fails to preserve the original code point value. For example, Shift-JIS (Microsoft code page 932) defines duplicate code points for the same character for compatibility with IBM and NEC character set extensions.
    https://support.microsoft.com/en-us/kb/170559 [sic; dead link]
  2. Transcoding to Unicode for all non-Unicode encodings would carry non-negligible performance costs and would pessimize platforms such as IBM’s z/OS that use EBCIDC by default for the non-Unicode execution character sets.

To summarize, it raises three concerns:

  1. Ambiguity between code units and scalar values (the paper says “code points”, but I say “scalar values” to emphasize the exclusion of surrogates) in the UTF-32 case.
  2. Some encodings, particularly Microsoft code page 932, can represent one Unicode scalar value in more than one way, so the distinction of which way does not round-trip.
  3. Transcoding non-Unicode execution encodings has a performance cost that pessimizes particularly IBM z/OS.
Terminology and Background

(This section and the next section should not be taken as ’splaining to SG16 what they already know. The over-explaining is meant to make this document more coherent for a broader audience of readers who might be interested in C++ standardization without full familiarity with text processing terminology or background, or the details of Microsoft code page 932.)

An abstract character is an atomic unit of text. Depending on writing system, the analysis of what constitutes an atomic unit may differ, but a given implementation on a computer has to identify some things as atomic units. Unicode’s opinion of what is an abstract character is the most widely applied opinion. In fact, Unicode itself has multiple opinions on this, and Unicode Normalization Forms bridge these multiple opinions.

A character set is a set of abstract characters. In principle, a set of characters can be defined without assigning numbers to them.

A coded character set assigns a number, called a code point, to each abstract character in the character set.

When the Unicode code space was extended beyond the Basic Multilingual Plane, some code points were set aside for the UTF-16 surrogate mechanism and, therefore, do not represent abstract characters. A Unicode scalar value is a Unicode code point that is not a surrogate code point. For consistency with Unicode, I use the term scalar value below when referring to non-Unicode coded character sets, too.

A character encoding is a way to represent a conceptual sequence of scalar values from one or more coded character sets as a concrete sequence of bytes. The bytes are called code units. Unicode defines in-memory Unicode encoding forms whose code unit is not a byte: UTF-16 and UTF-32. (For these Unicode encoding forms, there are corresponding Unicode encoding schemes that use byte code units and represent a non-byte code unit from a corresponding encoding form as multiple bytes and, therefore, could be used in byte-oriented IO even though UTF-8 is preferred for interchange. UTF-8, of course, uses byte code units as both a Unicode encoding form and as a Unicode encoding scheme.)

Coded character sets that assign scalar values in the range 0...255 (decimal) can be considered to trivially imply a character encoding for themselves: You just store the scalar value as an unsigned byte value. (Often such coded character sets import US-ASCII as the lower half.)

However, it is possible to define less obvious encodings even for character sets that only have up to 256 characters. IBM has several EBCDIC character encodings for the set of characters defined in ISO-8859-1. That is, compared to the trivial ISO-8859-1 encoding (the original, not the Web alias for windows-1252), these EBCDIC encodings permute the byte value assignments.

Unicode is the universal coded character set that by design includes abstract characters from all notable legacy coded character sets such that character encodings for legacy coded character sets can be redefined to represent Unicode scalar values. Consider representing ż in the ISO-8859-2 encoding. When we treat the ISO-8859-2 encoding as an encoding for the Unicode coded character set (as opposed to treating it as an encoding for the ISO-8859-2 coded character set), byte 0xBF decodes to Unicode scalar value U+017C (and not as scalar value 0xBF).
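
As a concrete, hedged illustration of that distinction, here is a sketch of mine (not from the article) that decodes a single ISO-8859-2 byte as an encoding of the Unicode coded character set; only the byte discussed above is spelled out, and the rest of the table is omitted.

#include <cstdint>

// Decode one ISO-8859-2 byte to a Unicode scalar value (sketch: only a few cases shown).
char32_t decode_iso_8859_2_byte(std::uint8_t byte) {
  if (byte == 0xBF) {
    return U'\u017C';  // ż: byte 0xBF decodes to U+017C, not to the scalar value 0xBF
  }
  if (byte < 0xA0) {
    return static_cast<char32_t>(byte);  // this range happens to coincide with Unicode
  }
  return U'\uFFFD';  // remaining table entries omitted in this sketch
}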

A compatibility character is a character that according to Unicode principles should not be a distinct abstract character but that Unicode nonetheless codes as a distinct abstract character because some legacy coded character set treated it as distinct.

The Microsoft Code Page 932 Issue

Usually in C++ a “character type” refers to a code unit type, but the Text_view paper uses the term “character type” to refer to a Unicode scalar value when the encoding is a Unicode encoding form. The paper implies that an analogous non-Unicode type exists for Microsoft code page 932 (Microsoft’s version of Shift_JIS), but does one really exist?

Microsoft code page 932 takes the 8-bit encoding of the JIS X 0201 coded character set, whose upper half is half-width katakana and whose lower half is ASCII-based, and replaces the lower half with actual US-ASCII (moving the difference between US-ASCII and the lower half of 8-bit-encoded JIS X 0201 into a font problem!). It then takes the JIS X 0208 coded character set and represents it with two-byte sequences (for the lead byte making use of the unassigned range of JIS X 0201). JIS X 0208 code points aren’t really one-dimensional scalars, but instead two-dimensional row and column numbers in a 94 by 94 grid. (See the first 94 rows of the visualization supplied with the Encoding Standard; avoid opening the link on a RAM-limited device!) Shift_JIS / Microsoft code page 932 does not put these two numbers into bytes directly, but conceptually arranges each two rows of 94 columns into one row of 188 columns and then transforms these new row and column numbers into bytes with some offsetting.

While the JIS X 0208 grid is rearranged into 47 rows of a 188-column grid, the full 188-column grid has 60 rows. The last 13 rows are used for IBM extensions and for private use. The private use area maps to the (start of the) Unicode Private Use Area. (See a visualization of the rearranged grid with the private use part showing up as unassigned; again avoid opening the link on a RAM-limited device.)

The extension part is where the concern that the Text_view paper seeks to address comes in. NEC and IBM came up with some characters that they felt JIS X 0208 needed to be extended with. NEC’s own extensions go onto row 13 (in one-based numbering) of the 94 by 94 JIS X 0208 grid (unallocated in JIS X 0208 proper), so that extension can safely be treated as if it had always been part of JIS X 0208 itself. The IBM extension, however, goes onto the last 3 rows of the 60-row Shift_JIS grid, i.e. outside the space that the JIS X 0208 94 by 94 grid maps to. However, US-ASCII, the half-width katakana part of JIS X 0201, and JIS X 0208 are also encoded, in a different way, by EUC-JP. EUC-JP can only encode the 94 by 94 grid of JIS X 0208. To make the IBM extensions fit into the 94 by 94 grid, NEC relocated the IBM extensions within the 94 by 94 grid in space that the JIS X 0208 standard left unallocated.

When considering IBM Shift_JIS and NEC EUC-JP (without later JIS X 0213 extension), both encode the same set of characters, but in a different way. Furthermore, both can round-trip via Unicode. Unicode principles analyze some of the IBM extension kanji as duplicates of kanji that were already in the original JIS X 0208. However, to enable round-tripping (which was thought worthwhile to achieve at the time), Unicode treats the IBM duplicates as compatibility characters. (Round-tripping is lost, of course, if the text decoded into Unicode is normalized such that compatibility characters are replaced with their canonical equivalents before re-encoding.)

This brings us to the issue that the Text_view paper treats as significant: Since Shift_JIS can represent the whole 94 by 94 JIS X 0208 grid and NEC put the IBM extension there, a naïve conversion from EUC-JP to Shift_JIS can fail to relocate the IBM extension characters to the end of the Shift_JIS code space and can put them in the position where they land if the 94 by 94 grid is simply transformed as the first 47 rows of the 188-column-wide Shift_JIS grid. When decoding to Unicode, Microsoft code page 932 supports both locations for the IBM extensions, but when encoding from Unicode, it has to pick one way of doing things, and it picks the end of the Shift_JIS code space.

That is, Unicode does not assign another set of compatibility characters to Microsoft code page 932’s duplication of the IBM extensions, so despite NEC EUC-JP and IBM Shift_JIS being round-trippable via Unicode, Microsoft code page 932, i.e. Microsoft Shift_JIS, is not. This makes sense considering that there is no analysis that claims the IBM and NEC instances of the IBM extensions as semantically different: They clearly have provenance that indicates that the duplication isn’t an attempt to make a distinction in meaning. The Text_view paper takes the position that C++ should round-trip the NEC instance of the IBM extensions in Microsoft code page 932 as distinct from the IBM instance of the IBM extensions even though Microsoft’s own implementation does not. In fact, the whole point of the Text_view paper mentioning Microsoft code page 932 is to give an example of a legacy encoding that doesn’t round-trip via Unicode, despite Unicode generally having been designed to round-trip legacy encodings, and to opine that it ought to round-trip in C++.

So:

  • The Text_view paper wants there to exist a non-transcoding-based, non-Unicode analog for what for UTF-8 would be a Unicode scalar value but for Microsoft code page 932 instead.
  • The standards that Microsoft code page 932 has been built on do not give us such a scalar.
    • Even if the private use space and the extensions are considered to occupy a consistent grid with the JIS X 0208 characters, the US-ASCII plus JIS X 0201 part is not placed on the same grid.
    • The canonical way of referring to JIS X 0208 independently of bytes isn’t a reference by one-dimensional scalar but a reference by two (one-based) numbers identifying a cell on the 94 by 94 grid.
  • The Text_view paper wants the scalar to be defined such that a distinction between the IBM instance of the IBM extensions and the NEC instance of the IBM extensions is maintained even though Microsoft, the originator of the code page, does not treat these two instances as meaningfully distinct.
Inferring a Coded Character Set from an Encoding

(This section is based on the constraints imposed by Text_view paper instead of being based on what the reference implementation does for Microsoft code page 932. From code inspection, it appears that support for multi-byte narrow execution encodings is unimplemented, and when trying to verify this experimentally, I timed out trying to get it running due to an internal compiler error when trying to build with a newer GCC and a GCC compilation error when trying to build the known-good GCC revision.)

While the standards don’t provide a scalar value definition for Microsoft code page 932, it’s easy to make one up based on tradition: Traditionally, the two-byte characters in CJK legacy encodings have been referred to by interpreting the two bytes as 16-bit big-endian unsigned number presented as hexadecimal (and single-byte characters as a 8-bit unsigned number).

As an example, let’s consider 猪 (which Wiktionary translates as wild boar). Its canonical Unicode scalar value is U+732A. That’s what the JIS X 0208 instance decodes to when decoding Microsoft code page 932 into Unicode. The compatibility character for the IBM kanji purpose is U+FA16. That’s what both the IBM instance of the IBM extension and the NEC instance of the IBM extension decode to when decoding Microsoft code page 932 into Unicode. (For reasons unknown to me, Unicode couples U+FA16 with the IBM kanji compatibility purpose and assigns another compatibility character, U+FAA0, for compatibility with North Korean KPS 10721-2000 standard, which is irrelevant to Microsoft code page 932. Note that not all IBM kanji have corresponding DPRK compatibility characters, so we couldn’t repurpose the DPRK compatibility characters for distinguishing the IBM and NEC instances of the IBM extensions even if we wanted to.)

When interpreting the Microsoft code page 932 bytes as a big-endian integer, the JIS X 0208 instance of 猪 would be 0x9296, the IBM instance would be 0xFB5E, and the NEC instance would be 0xEE42. To highlight how these “scalars” are coupled with the encoding instead of the standard character sets that the encodings originally encode, in EUC-JP the JIS X 0208 instance would be 0xC3F6 and the NEC instance would be 0xFBA3. Also, for illustration, if the same rule was applied to UTF-8, the scalar would be 0xE78CAA instead of U+732A. Clearly, we don’t want the scalars to be different between UTF-8, UTF-16, and UTF-32, so it is at least theoretically unsatisfactory for Microsoft code page 932 and EUC-JP to get different scalars for what are clearly the same characters in the underlying character sets.
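
A tiny sketch of mine (not the Text_view paper’s) of that traditional derivation; plugging in the Microsoft code page 932 bytes 0x92 0x96 for the JIS X 0208 instance of 猪 reproduces the 0x9296 quoted above.

#include <cstdint>

// Interpret a two-byte legacy sequence as a big-endian 16-bit "scalar".
std::uint16_t legacy_scalar(std::uint8_t lead, std::uint8_t trail) {
  return static_cast<std::uint16_t>((lead << 8) | trail);
}

// legacy_scalar(0x92, 0x96) == 0x9296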

It would be possible to do something else that’d give the same scalar values for Shift_JIS and EUC-JP without a lookup table. We could number the characters on the two-dimensional grid starting with 256 for the top left cell to reserve the scalars 0…255 for the JIS X 0201 part. It’s worth noting, though, that this approach wouldn’t work well for Korean and Simplified Chinese encodings that take inspiration from the 94 by 94 structure of JIS X 0208. KS X 1001 and GB2312 also define a 94 by 94 grid like JIS X 0208. However, while Microsoft code page 932 extends the grid down, so a consecutive numbering would just add greater numbers to the end, Microsoft code pages 949 and 936 extend the KS X 1001 and GB2312 grids above and to the left, which means that a consecutive numbering of the extended grid would be totally different from the consecutive numbering of the unextended grid. On the other hand, interpreting each byte pair as a big-endian 16-bit integer would yield the same values in the extended and unextended Korean and Simplified Chinese cases. (See visualizations for 949 and 936; again avoid opening on a RAM-limited device. Search for “U+3000” to locate the top left corner of the original 94 by 94 grid.)

What About EBCDIC?

Text_view wants to avoid transcoding overhead on z/OS, but z/OS has multiple character encodings for the ISO-8859-1 character set. It seems conceptually bogus for all these to have different scalar values for the same character set. However, for all of them to have the same scalar values, a lookup table-based permutation would be needed. If that table permuted to the ISO-8859-1 order, it would be the same as the Unicode order, at which point the scalar values might as well be Unicode scalar values, which Text_view wanted to avoid on z/OS citing performance concerns. (Of course, z/OS also has EBCDIC encodings whose character set is not ISO-8859-1.)

What About GB18030?

The whole point of GB18030 is that it encodes Unicode scalar values in a way that makes the encoding byte-compatible with GBK (Microsoft code page 936) and GB2312. This operation is inherently lookup table-dependent. Inventing a scalar definition for GB18030 that achieved the Text_view goal of avoiding lookup tables would break the design goal of GB18030 that it encodes all Unicode scalar values. (In the Web Platform, due to legacy reasons, it covers all but one scalar value and represents one scalar value twice.)

What’s Wrong with This?

Let’s evaluate the above in the light of P1238R0, the SG16: Unicode Direction paper.

The reason why Text_view tries to fit Unicode-motivated operations onto legacy encodings is that, as noted by “1.1 Constraint: The ordinary and wide execution encodings are implementation defined”, non-UTF execution encodings exist. This is, obviously, true. However, I disagree with the conclusion of making new features apply to these pre-existing execution encodings. I think there is no obligation to adapt new features to make sense for non-UTF execution encodings. It should be sufficient to keep existing legacy code running, i.e. not removing existing features should be sufficient. On the topic of wchar_t, the Unicode Direction paper says “1.4. Constraint: wchar_t is a portability deadend”. I think char with non-UTF-8 execution encoding should also be declared a deadend, whereas the Unicode Direction paper merely notes “1.3. Constraint: There is no portable primary execution encoding”. Making new features work with a deadend foundation lures applications deeper into deadends, which is bad.

While inferring scalar values for an encoding by interpreting the encoded bytes for each character as a big-endian integer (thereby effectively inferring a, potentially non-standard, coded character set from an encoding) might be argued to be traditional enough to fit “2.1. Guideline: Avoid excessive inventiveness; look for existing practice”, it is a bad fit for “1.6. Constraint: Implementors cannot afford to rewrite ICU”. If there is concern that implementors do not have the bandwidth to implement text processing features from scratch and, therefore, should be prepared to delegate to ICU, it makes no sense to make implementations or the C++ standard come up with non-Unicode numberings for abstract characters, since such numberings aren’t supported by ICU and necessarily would require writing new code for anachronistic non-Unicode schemes.

Aside: Maybe analyzing the approach of using byte sequences interpreted as big-endian numbers looks like attacking a straw man and there could be some other non-Unicode numbering instead, such as the consecutive numbering outlined above. Any alternative non-Unicode numbering would still fail “1.6. Constraint: Implementors cannot afford to rewrite ICU” and would also fail “2.1. Guideline: Avoid excessive inventiveness; look for existing practice”.

Furthermore, I think the Text_view paper’s aspiration of distinguishing between the IBM and NEC instances of the IBM extensions in Microsoft code page 932 fails “2.1. Guideline: Avoid excessive inventiveness; look for existing practice”, because it effectively amounts to inventing additional compatibility characters that aren’t recognized as distinct by Unicode or the originator of the code page (Microsoft).

Moreover, iterating over a buffer of text by scalar value is a relatively simple operation when considering the range of operations that make sense to offer for Unicode text but that may not obviously fit non-UTF execution encodings. For example, in the light of “4.2. Directive: Standardize generic interfaces for Unicode algorithms” it would be reasonable and expected to provide operations for performing Unicode Normalization on strings. What does it mean to normalize a string to Unicode Normalization Form D under the ISO-8859-1 execution encoding? What does it mean to apply any Unicode Normalization Form under the windows-1258 execution encoding, which represents Vietnamese in a way that doesn’t match any Unicode Normalization Form? If the answer just is to make these no-ops for non-UTF encodings, would that be the right answer for GB18030? Coming up with answers other than just saying that new text processing operations shouldn’t try to fit non-UTF encodings at all would very quickly violate the guideline to “Avoid excessive inventiveness”.

Looking at other programming languages in the light of “2.1. Guideline: Avoid excessive inventiveness; look for existing practice” provides the way forward. Notable other languages have settled on not supporting coded character sets other than Unicode. That is, only the Unicode way of assigning scalar values to abstract characters is supported. Interoperability with legacy character encodings is achieved by decoding into Unicode upon input and, if non-UTF-8 output is truly required for interoperability, by encoding into legacy encoding upon output. The Unicode Direction paper already acknowledges this dominant design in “4.4. Directive: Improve support for transcoding at program boundaries”. I think C++ should consider the boundary between non-UTF-8 char and non-UTF-16/32 wchar_t on one hand and Unicode (preferably represented as UTF-8) on the other hand as a similar transcoding boundary between legacy code and new code such that new text processing features (other than the encoding conversion feature itself!) are provided on the char8_t/char16_t/char32_t side but not on the non-UTF execution encoding side. That is, while the Text_view paper says “Transcoding to Unicode for all non-Unicode encodings would carry non-negligible performance costs and would pessimize platforms such as IBM’s z/OS that use EBCIDC [sic] by default for the non-Unicode execution character sets.”, I think it’s more appropriate to impose such a cost at the boundary of legacy and future parts of z/OS programs than to contaminate all new text processing APIs with the question “What does this operation even mean for non-UTF encodings generally and EBCDIC encodings specifically?”. (In the case of Windows, the system already works in UTF-16 internally, so all narrow execution encodings already involve transcoding at the system interface boundary. In that context, it seems inappropriate to pretend that the legacy narrow execution encodings on Windows were somehow free of transcoding cost to begin with.)

To avoid a distraction from my main point, I’m explicitly not opining in this document on whether new text processing features should be available for sequences of char when the narrow execution encoding is UTF-8, for sequences of wchar_t when sizeof(wchar_t) is 2 and the wide execution encoding is UTF-16, or for sequences of wchar_t when sizeof(wchar_t) is 4 and the wide execution encoding is UTF-32.

The Type for a Unicode Scalar Value Should Be char32_t

The conclusion of the previous section is that new C++ facilities should not support number assignments to abstract characters other than Unicode, i.e. should not support coded character sets (either standardized or inferred from an encoding) other than Unicode. The conclusion makes it unnecessary to abstract type-wise over Unicode scalar values and some other kinds of scalar values. It just leaves the question of what the concrete type for a Unicode scalar value should be.

The Text_view paper says:

“It has been suggested that char32_t be supported as a character type that is implicitly associated with the Unicode character set and that values of this type always be interpreted as Unicode code point values. This suggestion is intended to enable UTF-32 string literals to be directly usable as sequences of character values (in addition to being sequences of code unit and code point values). This has a cost in that it prohibits use of the char32_t type as a code unit or code point type for other encodings.”

I disagree with this and am firmly in the camp that char32_t should be the type for a Unicode scalar value.

The sentence “This has a cost in that it prohibits use of the char32_t type as a code unit or code point type for other encodings.” is particularly alarming. Seeking to use char32_t as a code unit type for encodings other than UTF-32 would dilute the meaning of char32_t into another wchar_t mess. (I’m happy to see that P1041R4 “Make char16_t/char32_t string literals be UTF-16/32” was voted into C++20.)

As for the appropriateness of using the same type both for a UTF-32 code unit and a Unicode scalar value, the whole point of UTF-32 is that its code unit value is directly the Unicode scalar value. That is what UTF-32 is all about, and UTF-32 has nothing else to offer: The value space that UTF-32 can represent is more compactly represented by UTF-8 and UTF-16 both of which are more commonly needed for interoperation with existing interfaces. When having the code units be directly the scalar values is UTF-32’s whole point, it would be unhelpful to distinguish type-wise between UTF-32 code units and Unicode scalar values. (Also, considering that buffers of UTF-32 are rarely useful but iterators yielding Unicode scalar values make sense, it would be sad to make the iterators have a complicated type.)

To provide interfaces that are generic across std::u8string_view, std::u16string_view, and std::u32string_view (and, thereby, strings for which these views can be taken), all of these should have a way to obtain a scalar value iterator that yields char32_t values. To make sure such iterators really yield only Unicode scalar values in an interoperable way, the iterator should yield U+FFFD upon error. What constitutes a single error in UTF-8 is defined in the WHATWG Encoding Standard (matches the “best practice” from the Unicode Standard). In UTF-16, each unpaired surrogate is an error. In UTF-32, each code unit whose numeric value isn’t a valid Unicode scalar value is an error. (The last sentence might be taken as admission that UTF-32 code units and scalar values are not the same after all. It is not. It is merely an acknowledgement that C++ does not statically prevent programs that could erroneously put an invalid value into a buffer that is supposed to be UTF-32.)
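
To make the shape of such a facility concrete, here is a rough sketch of mine (not proposed wording) of a single decode step over std::u8string_view that yields char32_t scalar values and substitutes U+FFFD on error; for brevity it only approximates the WHATWG error boundaries described above.

#include <cstddef>
#include <string_view>

struct ScalarResult {
  char32_t scalar;       // decoded scalar value, or U+FFFD on error
  std::size_t consumed;  // number of code units consumed from the front of the view
};

// Decode one Unicode scalar value from the front of a UTF-8 view (sketch).
ScalarResult next_scalar(std::u8string_view s) {
  constexpr char32_t kReplacement = U'\uFFFD';
  if (s.empty()) return {kReplacement, 0};  // callers should check for emptiness first

  unsigned char b0 = static_cast<unsigned char>(s[0]);
  if (b0 < 0x80) return {b0, 1};  // ASCII fast path

  std::size_t trail_count;
  char32_t cp;
  unsigned char lower = 0x80, upper = 0xBF;  // allowed range for the first trail byte
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    trail_count = 1; cp = b0 & 0x1F;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    trail_count = 2; cp = b0 & 0x0F;
    if (b0 == 0xE0) lower = 0xA0;   // reject overlong forms
    if (b0 == 0xED) upper = 0x9F;   // reject surrogates
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    trail_count = 3; cp = b0 & 0x07;
    if (b0 == 0xF0) lower = 0x90;   // reject overlong forms
    if (b0 == 0xF4) upper = 0x8F;   // cap at U+10FFFF
  } else {
    return {kReplacement, 1};       // invalid lead byte (0x80..0xC1, 0xF5..0xFF)
  }

  std::size_t i = 1;
  for (; i <= trail_count && i < s.size(); ++i) {
    unsigned char b = static_cast<unsigned char>(s[i]);
    unsigned char lo = (i == 1) ? lower : 0x80;
    unsigned char hi = (i == 1) ? upper : 0xBF;
    if (b < lo || b > hi) return {kReplacement, i};  // bytes seen so far form one error
    cp = (cp << 6) | (b & 0x3F);
  }
  if (i <= trail_count) return {kReplacement, i};    // truncated sequence at end of input
  return {cp, trail_count + 1};
}

A scalar-value iterator over std::u8string_view could then repeatedly call something like this, advancing by the consumed count each time and yielding the scalar.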

In general, new APIs should be defined to handle invalid UTF-8/16/32 either according to the replacement behavior described in the previous paragraph or by stopping and signaling error on the first error. In particular, the replacement behavior should not be left as implementation-defined, considering that differences in the replacement behavior between V8 and Blink lead to a bug. (See another write-up on this topic.)

Transcoding Should Be std::span-Based Instead of Iterator-Based

Since the above contemplates a conversion facility between legacy encodings and Unicode encoding forms, it seems on-topic to briefly opine on what such an API should look like. The Text_view paper says:

Transcoding between encodings that use the same character set is currently possible. The following example transcodes a UTF-8 string to UTF-16.

std::string in = get_a_utf8_string();
std::u16string out;
std::back_insert_iterator<std::u16string> out_it{out};
auto tv_in = make_text_view<utf8_encoding>(in);
auto tv_out = make_otext_iterator<utf16_encoding>(out_it);
std::copy(tv_in.begin(), tv_in.end(), tv_out);

Transcoding between encodings that use different character sets is not currently supported due to lack of interfaces to transcode a code point from one character set to the code point of a different one.

Additionally, naively transcoding between encodings using std::copy() works, but is not optimal; techniques are known to accelerate transcoding between some sets of encoding. For example, SIMD instructions can be utilized in some cases to transcode multiple code points in parallel.

Future work is intended to enable optimized transcoding and transcoding between distinct character sets.

I agree with the assessment that iterator and std::copy()-based transcoding is not optimal due to SIMD considerations. To enable the use of SIMD, the input and output should be std::spans, which, unlike iterators, allow the converter to look at more than one element of the std::span at a time. I have designed and implemented such an API for C++, and I invite SG16 to adopt its general API design. I have written a document that covers the API design problems that I sought to address and the design of the API (in Rust, but directly applicable to C++). (Please don’t be distracted by the implementation internals being Rust instead of C++. The API design is still valid for C++ even if the design constraint of the implementation internals being behind C linkage is removed. Also, please don’t be distracted by the API predating char8_t.)
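
As a purely hypothetical sketch of the shape I have in mind (the names and types below are mine, not an existing or proposed API), a std::span-based conversion step could look roughly like this; handing the converter whole input and output chunks is what leaves room for SIMD fast paths internally:

#include <cstddef>
#include <span>

enum class ConversionStatus { Done, NeedMoreOutput };

struct ConversionResult {
  ConversionStatus status;
  std::size_t bytes_read;     // elements consumed from the input span
  std::size_t units_written;  // elements written to the output span
};

class LegacyToUtf8Decoder {
 public:
  // Decode as much of `input` (bytes in some legacy encoding) as fits into `output`
  // (UTF-8 code units), replacing malformed input with U+FFFD. `last` marks the end of
  // the stream so that a trailing partial sequence is reported instead of buffered.
  ConversionResult decode(std::span<const std::byte> input,
                          std::span<char8_t> output,
                          bool last);
};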

Implications for Text_view

Above I’ve opined that only UTF-8, UTF-16, and UTF-32 (as Unicode encoding forms—not as Unicode encoding schemes!) should be supported for iteration by scalar value and that legacy encodings should be addressed by a conversion facility. Therefore, I think that Text_view should not be standardized as proposed. Instead, I think std::u8string_view, std::u16string_view, and std::u32string_view should gain a way to obtain a Unicode scalar value iterator (that yields values of type char32_t), and a std::span-based encoding conversion API should be provided as a distinct feature (as opposed to trying to connect Unicode scalar value iterators with std::copy()).

Categories: Mozilla-nl planet

The Rust Programming Language Blog: Announcing Rust 1.34.1

Thu, 25/04/2019 - 02:00

The Rust team is happy to announce a new version of Rust, 1.34.1, and a new version of rustup, 1.18.1. Rust is a programming language that is empowering everyone to build reliable and efficient software.

If you have a previous version of Rust installed via rustup, getting Rust 1.34.1 and rustup 1.18.1 is as easy as:

$ rustup update stable

If you don't have it already, you can get rustup from the appropriate page on our website.

What's in 1.34.1 stable

This patch release fixes two false positives and a panic when checking macros in Clippy. Clippy is a tool which provides a collection of lints to catch common mistakes and improve your Rust code.

False positive in clippy::redundant_closure

A false positive in the redundant_closure lint was fixed. The lint did not take into account differences in the number of borrows.

In the following snippet, the method required expects dep: &D but the actual type of dep is &&D:

dependencies.iter().filter(|dep| dep.required());

Clippy erroneously suggested .filter(Dependency::required), which is rejected by the compiler due to the difference in borrows.

False positive in clippy::missing_const_for_fn

Another false positive in the missing_const_for_fn lint was fixed. This lint did not take into account that functions inside trait implementations cannot be const fns. For example, when given the following snippet, the lint would trigger:

#[derive(PartialEq, Eq)] // warning: this could be a const_fn
struct Point(isize, isize);

impl std::ops::Add for Point {
    type Output = Self;

    fn add(self, other: Self) -> Self { // warning: this could be a const_fn
        Point(self.0 + other.0, self.1 + other.1)
    }
}

What's new in rustup 1.18.1

A recent rustup release, 1.18.0, introduced a regression that prevented installing Rust through the shell script on older platforms. A patch was released that fixes the issue by not forcing TLS v1.2 on platforms that don't support it.

You can check out other rustup changes in its full release notes.

Categories: Mozilla-nl planet

Mike Conley: Firefox Front-End Performance Update #17

Wed, 24/04/2019 - 23:52

Hello, folks. I wanted to give a quick update on what the Firefox Front-end Performance team is up to, so let’s get into it.

The name of the game continues to be start-up performance. We made some really solid in-roads last quarter, and this year we want to continue to apply pressure. Specifically, we want to focus on reducing IO (specifically, main-thread IO) during browser start-up.

Reducing main thread IO during start-up

There are lots of ways to reduce IO – in the best case, we can avoid start-up IO altogether by not doing something (or deferring it until much later). In other cases, when the browser might be servicing events on the main thread, we can move IO onto another thread. We can also re-organize, pack or compress files differently so that they’re read off of the disk more efficiently.

If you want to change something, the first step is measuring it. Thankfully, my colleague Florian has written a rather brilliant test that lets us take accounting of how much IO is going on during start-up. The test is deterministic enough that he’s been able to write a whitelist for the various ways we touch the disk on the main thread during start-up, and that whitelist means we’ve made it much more difficult for new IO to be introduced on that thread.

That whitelist has been processed by the team and turned into bugs, bucketed by the start-up phase where the IO is occurring. The next step is to estimate the effort and potential payoff of fixing those bugs, and then try to whittle down the whitelist.

And that’s effectively where we’re at. We’re at the point now where we’ve got a big list of work in front of us, and we have the fun task of burning that list down!

Being better at loading DLLs on Windows

While investigating the warm-up service for Windows, Doug Thayer noticed that we were loading DLLs during start-up oddly. Specifically, using a tool called RAMMap, he noticed that we were loading DLLs using “read ahead” (eagerly reading the entirety of the DLL into memory) into a region of memory marked as not-executable. This means that anytime we actually wanted to call a library function within that DLL, we needed to load it again into an executable region of memory.

Doug also noticed that we were unnecessarily doing ReadAhead for the same libraries in the content process. This wasn’t necessary, because by the time the content process wanted to load these libraries, the parent process would have already done it and it’d still be “warm” in the system file cache.

We’re not sure why we were doing this ReadAhead-into-unexecutable-memory work – its existence in the Firefox source code goes back many, many years, and the information we’ve been able to gather about the change is pretty scant at best, even with version control. Our top hypothesis is that this was a performance optimization that made more sense in the Windows XP era, but has since stopped making sense as Windows has evolved.

UPDATE: Ehsan pointed us to this bug where the change likely first landed. It’s a long and wind-y bug, but it seems as if this was indeed a performance optimization, and efforts were put in to side-step effects from Prefetch. I suspect that later changes to how Prefetch and SuperFetch work ultimately negated this optimization.

Doug hacked together a quick prototype to try loading DLLs in a more sensible way, and he was able to capture quite an improvement in start-up time on our reference hardware:

Figure caption: This graph measures various start-up metrics. The scatter of datapoints on the left shows the “control” build, and they tighten up on the right with the “test” build. Lower is better.

At this point, we all got pretty excited. The next step was to confirm Doug’s findings, so I took his control and test builds, and tested them independently on the reference hardware using frame recording. There was a smaller[1], but still detectable, improvement in the test build. At this point, we decided it was worth pursuing.

Doug put together a patch, got it reviewed and landed, and we immediately saw an impact in our internal benchmarks.

We’re also seeing the impact reflected in Telemetry. The first Nightly build with Doug Thayer’s patch went out on April 14th, and we’re starting to see a nice dip in some of our graphs here:

Figure caption: This graph measures the time at which the browser window reports that it has first painted. April 14th is the second-last date on the X axis, and the Y axis is time. The top-most line plots the 95th percentile, and there’s a nice dip appearing around April 14th.

There are other graphs that I’d normally show for improvements like this, except that we started tracking an unrelated regression on April 16th which kind of muddies the visualization. Bad timing, I guess!

We expect this improvement to have the greatest impact on weaker hardware with slower disks, but we’ll be avoiding some unnecessary work for all Windows users, and that gets a thumbs-up in my books.

If all goes well, this fix should roll out in Firefox 68, which reaches our release audience on July 9th!

  [1] My test machine has SuperFetch disabled to help reduce noise and inconsistency with start-up tests, and we suspect SuperFetch is able to optimize start-up better in the test build.

Categories: Mozilla-nl planet

Daniel Stenberg: Why they use curl

Wed, 24/04/2019 - 23:41

As a reader of my blog you know curl. You also most probably already know why you would use curl and if I’m right, you’re also a fan of using the right tool for the job. But do you know why others use curl and why they switch from other solutions to relying on curl for their current and future data transfers? Let me tell you the top reasons I’m told by users.

Logging and exact error handling

What exactly happened in the transfer, and why, are terribly important questions to some users, and with curl you have the tools to figure that out and also be sure that curl either returns failure or the command worked. This clear and binary distinction is important to users for whom every single file transfer is important. For example, some of the largest and most well-known banks in the world use curl in their back-ends, where each file transfer can mean a transfer of extremely large sums of money.

A few years ago I helped a money transaction service switch to curl to get that exact line in the sand figured out. To know exactly and with certainty if money had been transferred – or not – for a given operation. Vital for their business.

curl does not have the browsers’ lenient approach of “anything goes as long as we get something to show” when it comes to the Internet protocols.

Verbose goodness

curl’s verbose output options allow users to see exactly what curl sends and receives in a quick and non-complicated way. This is invaluable for developers to figure out what’s happening and what’s wrong, at either end involved in the data transfer.

curl’s verbose options allow developers to see all sent and received data even when encryption is used. And if that is not enough, its SSLKEYLOGFILE support allows you to take it to the next level when you need to!
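
For readers using curl as a library rather than the command-line tool, the same verbosity is one option away. Here is a minimal sketch (mine, not from the post) in C++ using libcurl’s CURLOPT_VERBOSE, which is the library-level equivalent of running `curl -v`; SSLKEYLOGFILE is an environment variable you would export before running the program.

#include <cstdio>
#include <curl/curl.h>

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL *easy = curl_easy_init();
  if (easy) {
    curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(easy, CURLOPT_VERBOSE, 1L);  // dump request/response traffic to stderr
    CURLcode res = curl_easy_perform(easy);
    if (res != CURLE_OK)
      std::fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));
    curl_easy_cleanup(easy);
  }
  curl_global_cleanup();
  return 0;
}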

Same behavior over time

Users sometimes upgrade their curl installations after several years of not having done so. Bumping a software’s version after several years and many releases (any software, really) can often be a journey and an adventure, as things have changed, behavior is different, and things that previously worked no longer do.

With curl however, you can upgrade to a version that is a decade newer, with lots of new fancy features and old crummy bugs fixed, only to see that everything that used to work back in the day still works – the same way. With curl, you can be sure that there’s an enormous focus on maintaining old functionality when going forward.

Present on all platforms

Because curl is highly portable, our users can get curl on just about any platform you can think of and use it with the same options and behaviors across them all. Learn curl on one platform, then continue to use it the same way on the next system. Platforms and their individual popularity vary over time, and we are happy to let users pick the ones they like – and you can be sure that curl will run on them all.

Performance

When doing the occasional file transfer every once in a while, raw transfer performance doesn’t matter much. Most of the time will then just be waiting on the network anyway. You can easily get away with your Python and Java frameworks’ multiple levels of overhead and excessive memory consumption.

Users who scan the Internet or otherwise perform many thousands of transfers per second from a large number of threads and machines realize that they need fewer machines that spend less CPU time if they build their file transfer solutions on top of curl. In curl we have a focus on only doing what’s required and it’s a lean and trimmed solution with a well-documented API built purely for Internet data transfers.

The features you want

The author of a banking application recently explained to us that one of the top reasons they switched to curl for their Internet data transfers is curl’s ability to keep the file name from the URL.

curl is a feature-packed tool and library that most likely already supports the protocols you need and provides the power features you want, with a healthy amount of “extension points” where you can hook in your own custom solutions.

Support and documentation

No other tool or library for internet transfers has even close to the same amount of documentation, examples available on the net, existing user base that can help out, and friendly users to support you when you run into issues. Ask questions on the mailing lists, post a bug on the bug tracker or even show your non-working code on stackoverflow to further your project.

curl is really the only Internet transfer option available if you want something that’s old and battle-proven by the giants of the industry, that is trustworthy and high-performing, and for which you can also buy commercial support today.

This blog post was also co-posted on wolfssl.com.

Categories: Mozilla-nl planet

Ryan Harter: When the Bootstrap Breaks - ODSC 2019

Wed, 24/04/2019 - 09:00

I'm excited to announce that I'll be presenting at the Open Data Science Conference in Boston next week. My colleague Saptarshi and I will be talking about When the Bootstrap Breaks.

I've included the abstract below, but the high-level goal of this talk is to strip some varnish off the bootstrap. Folks often look to the bootstrap as a panacea for weird data, but all tools have their failure cases. We plan on highlighting some problems we ran into when trying to use the bootstrap for Firefox data and how we dealt with the issues, both in theory and in practice.

Abstract:

Resampling methods like the bootstrap are becoming increasingly common in modern data science. For good reason too; the bootstrap is incredibly powerful. Unlike t-statistics, the bootstrap doesn’t depend on a normality assumption nor require any arcane formulas. You’re no longer limited to working with well understood metrics like means. One can easily build tools that compute confidence for an arbitrary metric. What’s the standard error of a Median? Who cares! I used the bootstrap.

With all of these benefits the bootstrap begins to look a little magical. That’s dangerous. To understand your tool you need to understand how it fails, how to spot the failure, and what to do when it does. As it turns out, methods like the bootstrap and the t-test struggle with very similar types of data. We’ll explore how these two methods compare on troublesome data sets and discuss when to use one over the other.

In this talk we’ll explore what types of data the bootstrap has trouble with. Then we’ll discuss how to identify these problems in the wild and how to deal with the problematic data. We will explore simulated data and share the code to conduct the simulations yourself. However, this isn’t just a theoretical problem. We’ll also explore real Firefox data and discuss how Firefox’s data science team handles this data when analyzing experiments.

At the end of this session you’ll leave with a firm understanding of the bootstrap. Even better, you’ll understand how to spot potential issues in your data and avoid false confidence in your results.
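
To ground the idea from the abstract, here is a compact sketch of mine (not material from the talk, which uses Firefox data) of a percentile bootstrap confidence interval for a median; the talk is precisely about the cases where this simple recipe quietly gives a misleading interval.

#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

double median(std::vector<double> v) {
  std::sort(v.begin(), v.end());
  std::size_t n = v.size();
  return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Percentile bootstrap (assumes a non-empty sample): resample with replacement,
// recompute the metric each time, and read the 2.5th/97.5th percentiles off the
// resulting distribution of the metric.
std::pair<double, double> bootstrap_median_ci(const std::vector<double>& sample,
                                              int resamples = 10000) {
  std::mt19937 rng(42);
  std::uniform_int_distribution<std::size_t> pick(0, sample.size() - 1);
  std::vector<double> medians;
  medians.reserve(resamples);
  std::vector<double> resample(sample.size());
  for (int i = 0; i < resamples; ++i) {
    for (double& x : resample) x = sample[pick(rng)];  // resample with replacement
    medians.push_back(median(resample));
  }
  std::sort(medians.begin(), medians.end());
  return {medians[static_cast<std::size_t>(0.025 * resamples)],
          medians[static_cast<std::size_t>(0.975 * resamples)]};
}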

Categories: Mozilla-nl planet

The Mozilla Blog: It’s Complicated: Mozilla’s 2019 Internet Health Report

Wed, 24/04/2019 - 05:00
Our annual open-source report examines how humanity and the internet intersect. Here’s what we found

 

Today, Mozilla is publishing the 2019 Internet Health Report — our third annual examination of the internet, its impact on society and how it influences our everyday lives.

http://internethealthreport.org/2019/

The Report paints a mixed picture of what life online looks like today. We’re more connected than ever, with humanity passing the ‘50% of us are now online’ mark earlier this year. And, while almost all of us enjoy the upsides of being connected, we also worry about how the internet and social media are impacting our children, our jobs and our democracies.

When we published last year’s Report, the world was watching the Facebook-Cambridge Analytica scandal unfold — and these worries were starting to grow. Millions of people were realizing that widespread, laissez-faire sharing of our personal data, the massive growth and centralization of the tech industry, and the misuse of online ads and social media were adding up to a big mess.

Over the past year, more and more people started asking: what are we going to do about this mess? How do we push the digital world in a better direction?

As people asked these questions, our ability to see the underlying problems with the system — and to imagine solutions — has evolved tremendously. Recently, we’ve seen governments across Europe step up efforts to monitor and thwart disinformation ahead of the upcoming EU elections. We’ve seen the big tech companies try everything from making ads more transparent to improving content recommendation algorithms to setting up ethics boards (albeit with limited effect and with critics saying ‘you need to do much more!’). And, we’ve seen CEOs and policymakers and activists wrestling with each other over where to go next. We have not ‘fixed’ the problems, but it does feel like we’ve entered a new, sustained era of debate about what a healthy digital society should look like.

The 2019 Internet Health Report examines the story behind these stories, using interviews with experts, data analysis and visualization, and original reporting. It was also built with input from you, the reader: In 2018, we asked readers what issues they wanted to see in the next Report.

In the Report’s three spotlight articles, we unpack three big issues: One examines the need for better machine decision making — that is, asking questions like Who designs the algorithms? and What data do they feed on? and Who is being discriminated against? Another examines ways to rethink the ad economy, so surveillance and addiction are no longer design necessities.  The third spotlight article examines the rise of smart cities, and how local governments can integrate tech in a way that serves the public good, not commercial interests.

Of course, the Report isn’t limited to just three topics. Other highlights include articles on the threat of deepfakes, the potential of user-owned social media platforms, pornography literacy initiatives, investment in undersea cables, and the dangers of sharing DNA results online.

So, what’s our conclusion? How healthy is the internet right now? It’s complicated — the digital environment is a complex ecosystem, just like the planet we live on. There have been a number of positive trends in the past year that show that the internet — and our relationship with it — is getting healthier:

Calls for privacy are becoming mainstream. The last year brought a tectonic shift in public awareness about privacy and security in the digital world, in great part due to the Cambridge Analytica scandal. That awareness is continuing to grow — and also translate into action. European regulators, with help from civil society watchdogs and individual internet users, are enforcing the GDPR: In recent months, Google has been fined €50 million for GDPR violations in France, and tens of thousands of violation complaints have been filed across the continent.

There’s a movement to build more responsible AI. As the flaws with today’s AI become more apparent, technologists and activists are speaking up and building solutions. Initiatives like the Safe Face Pledge seek facial analysis technology that serves the common good. And experts like Joy Buolamwini, founder of the Algorithmic Justice League, are lending their insight to influential bodies like the Federal Trade Commission and the EU’s Global Tech Panel.

Questions about the impact of ‘big tech’ are growing. Over the past year, more and more people focused their attention on the fact that eight companies control much of the internet. As a result, cities are emerging as a counterweight, ensuring municipal technology prioritizes human rights over profit — the Cities for Digital Rights Coalition now has more than two dozen participants. Employees at Google, Amazon, and Microsoft are demanding that their employers don’t use or sell their tech for nefarious purposes. And ideas like platform cooperativism and collaborative ownership are beginning to be discussed as alternatives.

On the flipside, there are many areas where things have gotten worse over the past year — or where there are new developments that worry us:

Internet censorship is flourishing. Governments worldwide continue to restrict internet access in a multitude of ways, ranging from outright censorship to requiring people to pay additional taxes to use social media. In 2018, there were 188 documented internet shutdowns around the world. And a new form of repression is emerging: internet slowdowns. Governments and law enforcement restrict access to the point where a single tweet takes hours to load. These slowdowns diffuse blame, making it easier for oppressive regimes to deny responsibility.

Biometrics are being abused. When large swaths of a population don’t have access to physical IDs, digital ID systems have the potential to make a positive difference. But in practice, digital ID schemes often benefit heavy-handed governments and private actors, not individuals. In India, over 1 billion citizens were put at risk by a vulnerability in Aadhaar, the government’s biometric ID system. And in Kenya, human rights groups took the government to court over its soon-to-be-mandatory National Integrated Identity Management System (NIIMS), which is designed to capture people’s DNA information, the GPS location of their home, and more.

AI is amplifying injustice. Tech giants in the U.S. and China are training and deploying AI at a breakneck pace that doesn’t account for potential harms and externalities. As a result, technology used in law enforcement, banking, job recruitment, and advertising often discriminates against women and people of color due to flawed data, false assumptions, and lack of technical audits. Some companies are creating ‘ethics boards’ to allay concerns — but critics say these boards have little or no impact.

When you look at trends like these — and many others across the Report — the upshot is: the internet has the potential both to uplift and connect us. But it also has the potential to harm and tear us apart. This has become clearer to more and more people in the last few years. It has also become clear that we need to step up and do something if we want the digital world to net out as a positive for humanity rather than a negative.

The good news is that more and more people are dedicating their lives to creating a healthier, more humane digital world. In this year’s Report, you’ll hear from technologists in Ethiopia, digital rights lawyers in Poland, human rights researchers from Iran and China, and dozens of others. We’re indebted to these individuals for the work they do every day. And also to the countless people in the Mozilla community — 200+ staff, fellows, volunteers, like-minded organizations — who helped make this Report possible and who are committed to making the internet a better place for all of us.

This Report is designed to be both a reflection and resource for this kind of work. It is meant to offer technologists and designers inspiration about what they might build; to give policymakers context and ideas for the laws they need to write; and, most of all, to provide citizens and activists with a picture of where others are pushing for a better internet, in the hope that more and more people around the world will push for change themselves. Ultimately, it is by more and more of us doing something in our work and our lives that we will create an internet that is open, human and humane.

I urge you to read the Report, leave comments and share widely.

PS. This year, you can explore all these topics through reading “playlists,” curated by influential people in the internet health space like Esra’a Al Shafei, Luis Diaz Carlos, Joi Ito and others.

The post It’s Complicated: Mozilla’s 2019 Internet Health Report appeared first on The Mozilla Blog.

Categorieën: Mozilla-nl planet

Mark Surman: Why AI + consumer tech?

ti, 23/04/2019 - 22:41

In my last post, I shared some early thoughts on how Mozilla is thinking about AI as part of our overall internet health agenda. I noted in that post that we’re leaning towards consumer tech as the focus and backdrop for whatever goals we take on in AI. In our draft issue brief we say:

Mozilla is particularly interested in how automated decision making is being used in consumer products and services. We want to make sure the interests of all users are designed into these products and services. Where they aren’t, we want to call that out.

After talking to nearly 100 AI experts and activists, this consumer tech focus feels right for Mozilla. But it also raises a number of questions: what do we mean by consumer tech? What is in scope for this work? And what is not? Are we missing something important with this focus?

At its simplest, the consumer tech platforms that we are talking about are general purpose internet products and services aimed at a wide audience for personal use. These include things like social networks, search engines, retail e-commerce, home assistants, computers, smartphones, fitness trackers, self-driving cars, etc. — almost all of which are connected to the internet and are fueled by our personal data. The leading players in these areas are companies like Google, Amazon, Facebook, Microsoft and Apple in the US as well as companies like Baidu, TenCent, and AliBaba in China. These companies are also amongst the biggest developers and users of AI, setting trends and shipping technology that shapes the whole of the tech industry. And, as long as we remain in the era of machine learning, these companies have a disproportionate advantage in AI development as they control huge amounts of data and computing power that can be used to train automated systems.

Given the power of the big tech companies in shaping the AI agenda — and the growing pervasiveness of automated decision making in the tech we all use everyday — we believe we need to set a higher bar for the development, use and impact of AI in consumer products and services. We need a way to reward companies who reach that bar. And push back and hold to account those who do not.

Of course, AI isn’t bad or good on its own — it is just another tool in the toolbox of computer engineering. Benefits, harms and side effects come from how systems are designed, what data is selected to train them and what business rules they are given. For example, if you search for ‘doctor’ on Google, you mostly see white doctors because that bias is in the training data. Similarly, content algorithms on sites like YouTube often recommend increasingly extreme content because the main business rule they are optimized for is to keep people on the site or app for as long as possible. Humans — and the companies they work in — can avoid or fix problems like these. Helping them do so is important work. It’s worth doing.

Of course, there are important issues related to AI and the health of the internet that go beyond consumer technology. The use of biased facial recognition software by police and immigration authorities. Similarly biased and unfair resume sorting algorithms used by human resource departments as part of hiring processes. The use of AI by the military to automate and add precision to killing from a distance. Ensuring that human rights and dignity are protected as the use of machine decision making grows within government and the back offices of big business is critical. Luckily, there is an amazing crew of organizations stepping up to address these issues such as AI Now in the US and Algorithm Watch in Europe. Mozilla taking a lead in these areas wouldn’t add much. Here, we should play a supporting role.

In contrast, there are few players focused squarely on how AI is showing up in consumer products and services. Yet this is one of the places where the power and the impact of AI is moving most rapidly. Also, importantly, consumer tech is the field on which Mozilla has always played. As we try to shape where AI is headed, it makes sense to do so here. We’re already doing so in small ways with technology, showing a more open way to approach machine learning with projects like Deep Speech and Common Voice. However, we have a chance to do much more by using our community, brand and credibility to push the development and use of AI in consumer tech in the right direction. We might do this as a watchdog. Or by collecting a brain trust of fellows with new ideas about privacy in AI. Or by helping to push for policies that lead to real accountability. There are many options. Whatever we pick, it feels like the consumer tech space is both in need of attention and well suited to the strengths that Mozilla brings to the table.

I would say that we’re 90% decided that consumer tech is the right place to focus Mozilla’s internet health movement building work around AI. That means there is a 9/10 chance that this is where we will go — but there is a chance that we hear something at this stage that changes this thinking in a meaningful way. As we zero in on this decision, I’d be interested to know what others think: If we go in this direction, what are the most important things to be thinking about? Where are the big opportunities? On the flip side, are there important things we’ll be missing if we go down this path? Feel free to comment on this post, tweet or email me if you have thoughts.

The post Why AI + consumer tech? appeared first on Mark Surman.

Categorieën: Mozilla-nl planet

The Firefox Frontier: 5 times when video ads autoplay and ruin everything.

ti, 23/04/2019 - 17:42

The room is dark and silent. Suddenly, a loud noise pierces your ears. You panic as everyone turns in your direction. You just wanted to read an article about cute … Read more

The post 5 times when video ads autoplay and ruin everything. appeared first on The Firefox Frontier.

Categorieën: Mozilla-nl planet

Ian Bicking: “Users want control” is a shoulder shrug

ti, 23/04/2019 - 07:00

Making the claim “users want control” is the same as saying you don’t know what users want, you don’t know what is good, and you don’t know what their goals are.

I first started thinking about this during the debate over what would become the ACA. The rhetoric was filled with this idea that people want choice in their medical care: people want control.

No! People want good health care. If they don’t trust systems to provide them good health care, if they don’t trust their providers to understand their priorities, then choice is the fallback: it’s how you work the system when the system isn’t working for you. And it sucks! Here you are, in the middle of some health issue, with treatments and symptoms and the rest of your life duties, and now you have to become a researcher on top of it? But the politicians and the pundits could not stop talking about control.

Control is what you need when you want something and it won’t happen on its own. But (usually) it’s not control you want, it’s just a means.

So when we say users want control over X – their privacy, their security, their data, their history – we are first acknowledging that current systems act against users, but we aren’t proposing any real solution. We’re avoiding even talking about the problems.

For instance, we say “users want control over their privacy,” but what people really want is some subset of:

  1. To avoid embarrassment
  2. To avoid persecution
  3. … sometimes for doing illegal and wrong things
  4. To keep from having the creeping sensation they left something sitting out that they didn’t want to
  5. They want to make some political statement against surveillance
  6. They want to keep things from the prying eyes of those close to them
  7. They want to avoid being manipulated by bad-faith messaging

There’s no easy answers, not everyone holds all these desires, but these are concrete ways of thinking about what people want. They don’t all point in the same direction. (And then consider the complex implications of someone else talking about you!)

There are some cases when a person really does want control. If the person wants to determine their own path, if having choice is itself a personal goal, then you need control. That’s a goal about who you are, not just what you get. It’s worth identifying moments when this is important. But if a person does not pay attention to something, then that person probably does not identify with the topic and is not seeking control over it. “Privacy advocates” pay attention to privacy, and attain a sense of identity from the very act of being mindful of their own privacy. Everyone else does not.

Let’s think about another example: users want control over their data. What are some things they want?

  1. They don’t want to lose their data
  2. They don’t want their data used to hold them hostage (e.g., to a subscription service)
  3. They don’t want to delete data and have it still reappear
  4. They want to use their data however they want, but more likely they want their data available for use by some other service or tool
  5. They feel it’s unfair if their data is used for commercial purposes without any compensation
  6. They are offended if their data is used to manipulate themselves or others
  7. They don’t want their data used against them in manipulative ways
  8. They want to have shared ownership of data with other people
  9. They want to prevent unauthorized or malicious access to their data

Again these motivations are often against each other. A person wants to be able to copy their data between services, but also delete their data permanently and completely. People don’t want to lose their data, but having personal control over your data is a great way to lose it, or even to lose control over it. The professionalization and centralization of data management by services has mostly improved access control and reliability.

When we simply say users want control, we’re giving up on understanding people’s specific desires. Still, it’s not exactly wrong: it’s reasonable to assume people will use control to achieve their desires. But if, as technologists, we can’t map functionality to desire, it’s a bit of a stretch to imagine everyone else will figure it out on the fly.

Categorieën: Mozilla-nl planet

This Week In Rust: This Week in Rust 283

ti, 23/04/2019 - 06:00

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

This week's crate is color-backtrace, a crate to give panic backtraces more information (and some color, too). Thanks to Willi Kappler for the suggestion!

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

221 pull requests were merged in the last week

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

No RFCs were approved this week.

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now.

RFCs

Tracking Issues & PRs

New RFCs

Upcoming Events

Africa

Asia Pacific

Europe

North America

If you are running a Rust event please add it to the calendar to get it mentioned here. Please remember to add a link to the event too. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

No quote was selected for QotW.

Please submit your quotes for next week!

This Week in Rust is edited by: nasa42, llogiq, and Flavsditz.

Discuss on r/rust.

Categorieën: Mozilla-nl planet

The Rust Programming Language Blog: Rust's 2019 roadmap

ti, 23/04/2019 - 02:00

Each year the Rust community comes together to set out a roadmap. This year, in addition to the survey, we put out a call for blog posts in December, which resulted in 73 blog posts written over the span of a few weeks. The end result is the recently-merged 2019 roadmap RFC. To get all of the details, please give it a read, but this post lays out some of the highlights.

The theme: Maturity

In short, 2019 will be a year of rejuvenation and maturation for the Rust project. We shipped a lot of stuff last year, and grew a lot. Now it's time to take a step back, take stock, and prepare for the future.

The work we have planned for this year falls into three major categories:

  • Governance: improving how the project is run
  • Finish long-standing requests: closing out work we've started but never finished
  • Polish: improving the overall quality of the language and tooling
Governance

Over the last three years, the Rust project has grown a lot. Rust used to have a core team of 8 members. When we added sub-teams in 2015, we grew to 23 members. We've now grown to over 100 — that's bigger than many companies! And of course, looking beyond the teams, the size of the Rust community as a whole has grown tremendously as well. As a result of this growth, we've found that the processes which served us well when we were a smaller project are starting to feel some strain.

Many of the teams have announced plans to review and revamp their processes so as to scale better. Often this can be as simple as taking the time to write down things that previously were understood only informally — sometimes it means establishing new structures.

Because of this widespread interest in governance, we've also created a new Governance Working Group. This group is going to be devoted to working with each team to hone its governance structure and to help pass lessons and strategies between teams.

Additionally, the RFC process has been a great boon for Rust, but as we've grown, there have been times when it didn't work so well. We may look at revising the process this year.

Long-standing requests

There are a number of exciting initiatives that have been sitting in a limbo state — the majority of the design is done, but there are some lingering complications that we haven't had time to work out. This year we hope to take a fresh look at some of these problems and try hard to resolve them.

Examples include:

  • The Cargo team and custom registries
  • The Language team is taking a look at async/await, specialization, const generics, and generic associated types
  • The Libs team wants to finish custom allocators
Polish

Finally, the last few years have also seen a lot of foundational work. The compiler, for example, was massively refactored to support incremental compilation and to be better prepared for IDEs. Now that we've got these pieces in place, we want to do the “polish” work that really makes for a great experience.

Examples:

  • Compile times and IDE support
  • Polishing the specification of the language by improving the reference and laying out the unsafe code guidelines
  • The WebAssembly WG's work this year includes polishing our wasm support, for example, debugging
Conclusion

This post only covered a few examples of the plans we've been making. If you'd like to see the full details, take a look at the RFC itself.

Here's to a great 2019 for Rust!

Categorieën: Mozilla-nl planet

Mozilla VR Blog: VoxelJS: Chunking Magic

ti, 23/04/2019 - 01:05

A couple of weeks ago I relaunched VoxelJS with modern ThreeJS and modules support. Today I'm going to talk a little bit about how VoxelJS works internally, specifically how voxels are represented and drawn. This is the key magic part of a voxel engine, and I owe a tremendous debt to Max Ogden, James Halliday and Mikola Lysenko.

Voxels are represented by numbers in a large three dimensional array. Each number says what type of block goes in that block slot, with 0 representing empty. The challenge is how to represent a potentially infinite set of voxels without slowing the computer to a crawl. The only way to do this is to load just a portion of the world at a time.
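As a rough illustration of that representation, here is a minimal sketch of a chunk-sized voxel grid stored as a flat typed array. The class name, chunk size and indexing scheme are illustrative assumptions, not the actual VoxelJS API.

```typescript
// Minimal sketch of a voxel chunk: block types stored in a flat typed array,
// indexed by (x, y, z). A value of 0 means "empty".
const CHUNK_SIZE = 16;

class Chunk {
  // One byte per voxel is enough for a small palette of block types.
  private blocks = new Uint8Array(CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE);

  private index(x: number, y: number, z: number): number {
    return x + CHUNK_SIZE * (y + CHUNK_SIZE * z);
  }

  get(x: number, y: number, z: number): number {
    return this.blocks[this.index(x, y, z)];
  }

  set(x: number, y: number, z: number, blockType: number): void {
    this.blocks[this.index(x, y, z)] = blockType;
    // A real engine would also mark this chunk "dirty" here so it gets remeshed.
  }
}
```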

The Challenge

We could do this by generating a large structure, say 512 x 512 x 512 blocks, then setting entries in the array whenever the player creates or destroys a block.

This process would work, but we have a problem. A grid of numbers is great for representing voxels, but we don’t have a way to draw them. ThreeJS (and by extension, WebGL) only thinks in terms of polygons. So we need a process to turn the voxel grid into a polygon mesh. This process is called meshing.

The simple way to do meshing would be to create a cube object for every voxel in the original data. This would work but it would be incredibly slow. Plus most cubes are inside terrain and would never be seen, but they would still take up memory in the GPU and have to be culled from the view. Clearly we need to find a way to create more efficient meshes.

Even if we have a better way to do the meshing we still have another problem. When the player creates or destroys a block we need to update the data. Updating a single entry in an array is easy, but then we have to recreate the entire polygon mesh. That’s going to be incredibly slow if we must rebuild the entire mesh of the entire world every time the player creates a block, which will happen a lot in a game like Minecraft. VoxelJS has a solution to both of our problems: chunks.

Chunks

The world is divided into large cubes called chunks, each composed of a uniform set of blocks. By default each chunk is 16x16x16 blocks, though you can adjust this depending on your need. Each chunk is represented by an array of block types, just as before, and when any block in the chunk changes, the entire chunk is transformed into a new polygon mesh and sent to the GPU. Because the chunk is so much smaller than before this shouldn’t take very long. In practice it can be done in less than a few milliseconds on a laptop. Only the single chunk which changed needs to be updated; the rest can stay in GPU memory and just be redrawn every frame (which is something modern GPUs excel at).

As the player moves around the world, VoxelJS maintains a list of chunks that are nearby. This is defined as chunks within a certain radius of the player (also configurable). As chunks go out of the nearby list they are deleted and new chunks are created. Thus at any given time only a small number of chunks are in memory.
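A minimal sketch of that bookkeeping might look like the following. The chunk size, the string key format and the radius handling are illustrative assumptions rather than VoxelJS's actual implementation.

```typescript
// Sketch: map a world position to chunk coordinates and keep only the chunks
// within a radius of the player loaded.
type ChunkKey = string; // e.g. "2,0,-1"

function chunkCoord(worldPos: number, chunkSize = 16): number {
  return Math.floor(worldPos / chunkSize);
}

function nearbyChunkKeys(
  player: { x: number; y: number; z: number },
  radius: number
): Set<ChunkKey> {
  const [cx, cy, cz] = [player.x, player.y, player.z].map(v => chunkCoord(v));
  const keys = new Set<ChunkKey>();
  for (let x = cx - radius; x <= cx + radius; x++) {
    for (let y = cy - radius; y <= cy + radius; y++) {
      for (let z = cz - radius; z <= cz + radius; z++) {
        keys.add(`${x},${y},${z}`);
      }
    }
  }
  return keys;
}

// Chunks whose keys fall out of this set can be unloaded; keys that are new
// get generated, meshed and uploaded to the GPU.
```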

This chunking system works pretty well with the naive algorithm that creates a cube for every voxel, but it still ends up using a tremendous amount of GPU memory and can be slow to draw, especially on mobile GPUs. We can do better.

Meshing

The first optimization is to only create cube geometry for voxels that are actually visible from the outside. All internal voxels can be culled. In VoxelJS this is implemented by the CullingMesher in the main src directory.
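The heart of that culling step is a simple neighbour test: a cube face is only worth emitting when the voxel on the other side of it is empty. Here is a hedged sketch of that test, assuming a `getBlock` lookup that can also answer for neighbouring chunks at the edges; it is not the actual CullingMesher code.

```typescript
// A face is emitted only when the neighbouring voxel in that direction is empty (type 0).
const FACE_DIRECTIONS: Array<[number, number, number]> = [
  [1, 0, 0], [-1, 0, 0],
  [0, 1, 0], [0, -1, 0],
  [0, 0, 1], [0, 0, -1],
];

function visibleFaces(
  getBlock: (x: number, y: number, z: number) => number,
  x: number, y: number, z: number
): Array<[number, number, number]> {
  if (getBlock(x, y, z) === 0) return []; // empty voxels produce no geometry
  return FACE_DIRECTIONS.filter(([dx, dy, dz]) => getBlock(x + dx, y + dy, z + dz) === 0);
}
```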

Culling works a lot better than naive meshing, but we could still do better. If there is a 10 x 10 wall it would be represented by 100 cubes (multiplied by another 12 when each cube is turned into triangles); yet in theory the wall could be drawn with a single box object.

To handle this case VoxelJS has another algorithm called the GreedyMesher. This produces vastly more efficient meshes and takes only slightly longer to compute than the culling mesh.

There is a downside to greedy meshing, however. Texture mapping is very easy on cubes, especially when using a texture atlas, and the same goes for ambient occlusion. For arbitrarily sized rectangular shapes, though, the texturing and lighting calculations become a lot more complicated.

Texturing

Once we have the voxels turned into geometry we can upload it to the GPU, but this won’t let us draw different block types differently. We could do this with vertex colors, but for a real Minecraft like environment we need textures. A mesh can only be drawn with a single texture in WebGL. (Actually it is possible to use multiple textures per polygon using Texture Arrays but those are only supported with WebGL 2). In order to render multiple textures on a single mesh, which is required for chunks that have more than one block type, we must combine the different images into a single texture called a texture atlas.

To use a texture atlas we need to tell the GPU which part of the atlas to draw for each triangle in the mesh. VoxelJS does this by adding a subrects attribute to the buffer that holds the vertex data. Then the fragment shader can use this data to calculate the correct UV values for the required sub-texture.
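For illustration, here is a small sketch of how a per-block sub-rectangle in a grid-style atlas could be computed on the CPU side, with the shader-side mapping described in a comment. The grid layout and the names used here are assumptions, not the exact VoxelJS code.

```typescript
// Sketch: compute the atlas sub-rectangle for a block type, assuming a square
// atlas of equally sized tiles laid out in a grid.
interface SubRect {
  u: number;    // left edge of the tile in UV space
  v: number;    // top edge of the tile in UV space
  size: number; // tile size in UV space
}

function atlasSubRect(blockType: number, tilesPerRow: number): SubRect {
  const size = 1 / tilesPerRow;
  const col = blockType % tilesPerRow;
  const row = Math.floor(blockType / tilesPerRow);
  return { u: col * size, v: row * size, size };
}

// If (u, v, size) is passed along as a vertex attribute, the fragment shader can
// map the mesh's local UVs into the tile roughly as:
//   finalUV = vec2(u, v) + fract(localUV) * size
```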

Animated textures

VoxelJS can also handle animated textures for special blocks like water and lava. This is done by uploading all of the animation frames next to each other in the texture atlas. A frameCount attribute is passed to the shader along with a time uniform. The shader uses this information to offset the texture coordinates to get the particular frame that should be drawn.
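Here is a hedged sketch of that frame selection, written in TypeScript to mirror what the shader computes from its `time` uniform and `frameCount` attribute; the frame-duration parameter is an assumption for illustration.

```typescript
// Pick the current animation frame and convert it into a U offset within the atlas,
// assuming the frames of one animation sit next to each other in a single row.
function frameOffsetU(
  time: number,          // seconds, e.g. from a time uniform
  frameCount: number,    // number of frames packed into the atlas for this block
  frameDuration: number, // seconds per frame (assumed fixed for all animations)
  tileSize: number       // width of one tile in UV space
): number {
  const frame = Math.floor(time / frameDuration) % frameCount;
  return frame * tileSize; // shift the U coordinate by whole tiles
}
```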

Flaws

This animation system works pretty well but it does require all animations to have the same speed. To fix this I think we would need to have a speed attribute passed to the shader.

Currently VoxelJS supports both culled and greedy meshing. You can switch between them at runtime by swapping out the mesher and regenerating all chunks. Ambient occlusion lighting works wonderfully with culled meshing but does not currently work correctly with the greedy algorithm. Additionally the texturing code for greedy meshing is far more complicated and less efficient than it should be. All of these are open issues to be fixed.

Getting Involved

I hope this gives you insight into how VoxelJS Next works and how you can customize it to your needs.

If you'd like to get involved I've created a list of issues that are great for people new to the project.

Categorieën: Mozilla-nl planet

Mozilla B-Team: happy bmo push day!

mo, 22/04/2019 - 17:59

Bugfixes + enabling the new security feature for API keys.

release tag

the following changes have been pushed to bugzilla.mozilla.org:

  • [1541303] Default component bug type is not set as expected; enhancement severity is still used for existing bugs
  • [1543760] When cloning a bug, the bug is added to ‘Regressed by’ of the new bug
  • [1543718] Obsolete attachments should have a strikethrough
  • [1543798] Do not treat email addresses with invalid.bugs as unassigned…

View On WordPress

Categorieën: Mozilla-nl planet

Daniel Stenberg: curl + hackerone = TRUE

mo, 22/04/2019 - 17:03

There seems to be no end to updated posts about bug bounties in the curl project these days. Not long ago I mentioned the then new program that sadly enough was cancelled only a few months after its birth.

Now we are back with a new and refreshed bug bounty program! The curl bug bounty program reborn.

This new program, which hopefully will manage to survive a while, is set up in cooperation with the major bug bounty player out there: HackerOne.

Basic rules

If you find or suspect a security related issue in curl or libcurl, report it! (and don’t speak about it in public at all until an agreed future date.)

You’re entitled to ask for a bounty for all and every valid and confirmed security problem that wasn’t already reported and that exists in the latest public release.

The curl security team will then assess the report and the problem and will then reward money depending on bug severity and other details.

Where does the money come from?

We intend to use funds and money from wherever we can. The HackerOne Internet Bug Bounty program helps us, donations collected over at opencollective will be used, and so will dedicated company sponsorships.

We will of course also greatly appreciate any direct sponsorships from companies for this program. You can help curl getting even better by adding funds to the bounty program and help us reward hard-working researchers.

Why bounties at all?

We compete for the security researchers’ time and attention with other projects, both open and proprietary. The projects that can help put food on these researchers’ tables might have a better chance of getting them to use their tools, time, skills and fingers to find our problems instead of someone else’s.

Finding and disclosing security problems can be very time and resource consuming. We want to make it less likely that people give up their attempts before they find anything. We can help full and part time security engineers sustain their livelihood by paying for the fruits of their labor. At least a little bit.

Only released code?

The state of the code repository in git is not subject to bounties. We need to allow developers to make mistakes and to experiment a little in the git repository, while we expect and want every actual public release to be free from security vulnerabilities.

So yes, the obvious downside with this is that someone could spot an issue in git and decide not to report it, since reporting it then earns no money, and instead hope that the flaw lingers around and ships in a release – and then report it and claim the reward money. I think we just have to trust that this will not become standard practice, and if we in fact notice that someone tries to exploit the bounty in this manner, we can consider counter-measures then.

How about money for the patches?

There’s of course always a discussion about why we should pay anyone for bugs at all, and why we pay only for reported security problems and not for the heroes who authored the code in the first place, nor for the good people who write the patches that fix the reported issues. Those are valid questions. We would of course rather pay every contributor a lot of money, but we don’t have the funds for that. Getting funding for this kind of dedicated bug bounty seems to be doable, whereas a generic “pay the contributors” fund is trickier: it is both harder to attract money for and really hard to distribute fairly in an open project of curl’s nature.

How much money?

At the start of this program the award amounts are as follows. We reward up to these amounts for vulnerabilities of the following security levels:

Critical: 2,000 USD
High: 1,500 USD
Medium: 1,000 USD
Low: 500 USD

Depending on how things go, how fast we drain the fund and how much companies help us refill, the amounts may change over time.

Found a security flaw?

Report it!

Categorieën: Mozilla-nl planet

Niko Matsakis: AiC: Collaborative summary documents

mo, 22/04/2019 - 06:00

In my previous post, I talked about the idea of mapping the solution space:

When we talk about the RFC process, we always emphasize that the point of RFC discussion is not to select the best answer; rather, the point is to map the solution space. That is, to explore what the possible tradeoffs are and to really look for alternatives. This mapping process also means exploring the ups and downs of the current solutions on the table.

One of the challenges I see with how we often do design is that this “solution space” is actually quite implicit. We are exploring it through comments, but each comment is only tracing out one path through the terrain. I wanted to see if we could try to represent the solution space explicitly. This post is a kind of “experience report” on one such experiment, what I am calling a collaborative summary document (in contrast to the more standard summary comment that we often do).

The idea: a collaborative summary document

I’ll get into the details below, but the basic idea was to create a shared document that tried to present, in a neutral fashion, the arguments for and against a particular change. I asked the people to stop commenting in the thread and instead read over the document, look for things they disagreed with, and offer suggestions for how it could be improved.

My hope was that we could not only get a thorough summary from the process, but also do something deeper: change the focus of the conversation from “advocating for a particular point of view” towards “trying to ensure a complete and fair summary”. I figured that after this period was done, people would likely go back to being advocates for their position, but at least for some time we could try to put those feelings aside.

So how did it go?

Overall, I felt very positive about the experience and I am keen to try it again. I think that something like “collaborative summary documents” could become a standard part of our process. Still, I think it’s going to take some practice trying this a few times to figure out the best structure. Moreover, I think it is not a silver bullet: to realize the full potential, we’re going to have to make other changes too.

What I did in depth

What I did more specifically was to create a Dropbox Paper document. This document contained my best effort at summarizing the issue at hand, but it was not meant to be just my work. The idea was that we would all jointly try to produce the best summary we could.

After that, I made an announcement on the original thread asking people to participate in the document. Specifically, as the document states, the idea was for people to do something like this:

  • Read the document, looking for things they didn’t agree with or felt were unfairly represented.
  • Leave a comment explaining their concern; or, better, supplying alternate wording that they did agree with
    • The intention was always to preserve what they felt was the sense of the initial comment, but to make it more precise or less judgemental.

I was then playing the role of editor, taking these comments and trying to incorporate them into the whole. The idea was that, as people edited the document, we would gradually approach a fixed point, where there was nothing left to edit.

Structure of the shared document

Initially, when I created the document, I structured it into two sections – basically “pro” and “con”. The issue at hand was a particular change to the Futures API (the details don’t matter here). In this case, the first section advocated for the change, and the second section advocated against it. So, something like this (for a fictional problem):

Pro:

We should make this change because of X and Y. The options we have now (X1, X2) aren’t satisfying because of problem Z.

Con:

This change isn’t needed. While it would make X easier, there are already other useful ways to solve that problem (such as X1, X2). Similarly, the goal of Y isn’t very desirable in the first place because of A, B, and C.

I quickly found this structure rather limiting. It made it hard to compare the arguments – as you can see here, there are often “references” between the two sections (e.g., the con section refers to the argument X and tries to rebut it). Trying to compare and consider these points required a lot of jumping back and forth between the sections.

Using nested bullets to match up arguments

So I decided to restructure the document to integrate the arguments for and against. I created nesting to show when one point was directly in response to another. For example, it might read like this (this is not an actual point; those were much more detailed):

  • Pro: We should make this change because of X.
    • Con: However, there is already the option of X1 and X2 to satisfy that use-case.
      • Pro: But X1 and X2 suffer from Z.
  • Pro: We should make this change because of Y and Z.
    • Con: Those goals aren’t as important because of A, B, and C.

Furthermore, I tried to make the first bullet point a bit special – it would be the one that encapsulated the heart of the dispute, from my POV, with the later bullet points getting progressively more into the weeds.

Nested bullets felt better, but we can do better still I bet

I definitely preferred the structure of nested bullets to the original structure, but it didn’t feel perfect. For one thing, it requires me to summarize each argument into a single paragraph. Sometimes this felt “squished”. I didn’t love the repeated “pro” and “con”. Also, things don’t always fit neatly into a tree; sometimes I had to “cross-reference” between points on the tree (e.g., referencing another bullet that had a detailed look at the trade-offs).

If I were to do this again, I might tinker a bit more with the format. The most extreme option would be to try and use a “wiki-like” format. This would allow for free inter-linking, of course, and would let us hide details into a recursive structure. But I worry it’s too much freedom.

Adding “narratives” on top of the “core facts”

One thing I found that surprised me a bit: the summary document aimed to summarize the “core facts” of the discussion – in so doing, I hoped to summarize the two sides of the argument. But I found that facts alone cannot give a “complete” summary: to give a complete summary, you also need to present those facts “in context”. Or, put another way, you also need to explain the weighting that each side puts on the facts.

In other words, the document did a good job of enumerating the various concerns and “facets” of the discussion. But it didn’t do a good job of explaining why you might fall on one side or the other.

I tried to address this by crafting a “summary comment” on the main thread. This comment had a very specific form. It began by trying to identify the “core tradeoff” – the crux of the disagreement:

So the core tradeoff here is this:

  • By leaving the design as is, we keep it as simple and ergonomic as it can be;
    • but, if we wish to pass implicit parameters to the future when polling, we must use TLS.

It then identifies some of the “facets” of the space which different people weight in different ways:

So, which way you fall will depend on

  • how important you think it is for Future to be ergonomic
    • and naturally how much of an ergonomic hit you believe this to be
    • how likely you think it is for us to want to add implicit parameters
    • how much of a problem you think it is to use TLS for those implicit parameters

And then it tried to tell a series of “narratives”. Basically to tell the story of each group that was involved and why that led them to assign different weights to those points above. Those weights in turn led to a different opinion on the overall issue.

For example:

I think a number of people feel that, by now, between Rust and other ecosystems, we have a pretty good handle on what sort of data we want to thread around and what the best way is to do it. Further, they feel that TLS or passing parameters explicitly is the best solution approach for those cases. Therefore, they prefer to leave the design as is, and keep things simple. (More details in the doc, of course.)

Or, on the other side:

Others, however, feel like there is additional data they want to pass implicitly and they do not feel convinced that TLS is the best choice, and that this concern outweighs the ergonomic costs. Therefore, they would rather adopt the PR and keep our options open.

Finally, it’s worth noting that there aren’t always just two sides. In fact, in this case I identified a third camp:

Finally, I think there is a third position that says that this controversy just isn’t that important. The performance hit of TLS, if you wind up using it, seems to be minimal. Similarly, the clarity/ergonomics of Future are not as critical, as users who write async fn will not implement it directly, and/or perhaps the effect is not so large. These folks probably could go either way, but would mostly like us to stop debating it and start building stuff. =)

One downside of writing the narratives in a standard summary comment was that it was not “part of” the main document. It feels to me like these narratives are a pretty key part of the whole thing. In fact, it was only once I added these narratives that I really felt I started to understand why one might choose one way or the other when it came to this decision.

If I were to do this again, I would make narratives more of a first-class entity in the document itself. I think I would also focus on some other “meta-level reasoning”, such as fears and risks. I think it’s worth thinking, for any given decision, “what if we make the wrong call” – e.g., in this case, what happens if we decide not to future proof, but then we regret it; in contrast, what happens if we decide to add future proofing, but we never use it.

We never achieved “shared ownership” of the summary

One of my goals was that we could, at least for a moment, disconnect people from their particular position and turn their attention towards the goal of achieving a shared and complete summary. I didn’t feel that we were very successful in this goal.

For one thing, most participants simply left comments on parts they disagreed with; they didn’t themselves suggest alternate wording. That meant that I personally had to take their complaint and try to find some “middle ground” that accommodated the concern but preserved the original point. This was stressful for me and a lot of work. More importantly, it meant that most people continued to interact with the document as advocates for their point-of-view, rather than trying to step back and advocate for the completeness of the summary.

In other words: when you see a sentence you disagree with, it is easy to say that you disagree with it. It is much harder to rephrase it in a way that you do agree with – but which still preserves (what you believe to be) the original intent. Doing so requires you to think about what the other person likely meant, and how you can preserve that.

However, one possible reason that people may have been reluctant to offer suggestions is that, often, it was hard to make “small edits” that addressed people’s concerns. Especially early on, I found that, in order to address some comment, I would have to make larger restructurings. For example, taking a small sentence and expanding it to a bullet point of its own.

Finally, some people who were active on the thread didn’t participate in the doc. Or, if they did, they did so by leaving comments on the original GitHub thread. This is not surprising: I was asking people to do something new and unfamiliar. Also, this whole process played out relatively quickly, and I suspect some people just didn’t even see the document before it was done.

If I were to do this again, I would want to start it earlier in the process. I would also want to consider synchronous meetings, where we could go try to process edits as a group (but I think it would take some thought to figure out how to run such a meeting).

In terms of functioning asynchronously, I would probably change to use a Google Doc instead of a Dropbox Paper. Google Docs have a better workflow for suggesting edits, I believe, as well as a richer permissions model.

Finally, I would try to draw a harder line in trying to get people to “own” the document and suggest edits of their own. I think the challenge of trying to neutrally represent someone else’s point of view is pretty powerful.

Concluding remarks

Conducting this exercise taught me some key lessons:

  • We should experiment with the best way to describe the back-and-forth (I found it better to put closely related points together, for example, rather than grouping the arguments into ‘pro and con’).
  • We should include not only the “core facts” but also the “narratives” that weave those facts together.
  • We should do this summary process earlier and we should try to find better ways to encourage participation.

Overall, I felt very good about the idea of “collaborative summary documents”. I think they are a clear improvement over the “summary comment”, which was the prior state of the art.

If nothing else, the quality of the summary itself was greatly improved by being a collaborative document. I felt like I had a pretty good understanding of the question when I started, but getting feedback from others on the things they felt I misunderstood, or just the places where my writing was unclear, was very useful.

But of course my aims run larger. I hope that we can change how design work feels, by encouraging all of us to deeply understand the design space (and to understand what motivates the other side). My experiment with this summary document left me feeling pretty convinced that it could be a part of the solution.

Feedback

I’ve created a discussion thread on the internals forum where you can leave questions or comments. I’ll definitely read them and I will try to respond, though I often get overwhelmed[1], so don’t feel offended if I fail to do so.

Footnotes
  1. So many things, so little time.

Categorieën: Mozilla-nl planet

The Servo Blog: This Month In Servo 128

mo, 22/04/2019 - 02:30

In the past month, we merged 189 PRs in the Servo organization’s repositories.

Planning and Status

Our roadmap is available online, including the team’s plans for 2019.

This week’s status updates are here.

Exciting works in progress

Notable Additions
  • ferjm added support for replaying media that has ended.
  • ferjm fixed a panic that could occur when playing audio on certain platforms.
  • nox ensured a source of unsafety in layout now panics instead of causing undefined behaviour.
  • soniasinglad removed the need for OpenSSL binaries to be present in order to run tests.
  • ceyusa implemented support for EGL-only hardware accelerated media playback.
  • Manishearth improved the transform-related parts of the WebXR implementation.
  • AZWN exposed some hidden unsafe behaviour in Promise-related APIs.
  • ferjm added support for using MediaStream as media sources.
  • georgeroman implemented support for the media canPlayType API.
  • JHBalaji and others added support for value curve automation in the WebAudio API.
  • jdm implemented a sampling profiler.
  • gterzian made the sampling profiler limit the total number of samples stored.
  • Manishearth fixed a race in the WebRTC backend.
  • kamal-umudlu added support for using the fullscreen capabilities of the OS for the Fullscreen API.
  • jdm extended the set of supported GStreamer packages on Windows.
  • pylbrecht added measurements for layout queries that are forced to wait on an ongoing layout operation to complete.
  • TheGoddessInari improved the MSVC detection in Servo’s build system on Windows.
  • sbansal3096 fixed a panic when importing a stylesheet via CSSOM APIs.
  • georgeroman implemented the missing XMLSerializer API.
  • KwanEsq fixed web compatibility issue with a CSSOM API.
  • aditj added support for the DeleteCookies WebDriver API.
  • peterjoel redesigned the preferences support to better support preferences at compile-time.
  • gterzian added a thread pool for the network code.
  • lucasfantacuci refactored a bunch of code that makes network requests to use a builder pattern.
  • cdeler implemented the missing DOMException constructor API.
  • gterzian and jdm added Linux support to the thread sampling implementation.
New Contributors

Interested in helping build a web browser? Take a look at our curated list of issues that are good for new contributors!

Categorieën: Mozilla-nl planet

Christopher Arnold

sn, 20/04/2019 - 00:59


At last year’s Game Developers Conference I had the chance to experience new immersive video environments that are being created by game developers releasing titles for the new Oculus, HTC Vive and Google Daydream platforms. One developer at the conference, Opaque Multimedia, demonstrated "Earthlight", which gave the participant an opportunity to crawl on the outside of the International Space Station as the earth rotated below. In the simulation, a Microsoft Kinect sensor was following the position of my hands. But what I saw in the visor was that my hands were enclosed in an astronaut’s suit. The visual experience was so compelling that when my hands missed the rungs of the ladder I felt a palpable sense of urgency because the environment was so realistically depicted. (The space station was rendered as a scale model of the actual space station using the "Unreal" game physics engine.) The experience was so far beyond what I’d experienced a decade ago with crowd-sourced simulated environments like Second Life, where artists create 3D worlds in a server-hosted environment that other people could visit as avatars.
Since that time I’ve seen some fascinating demonstrations at Mozilla’s Virtual Reality developer events.  I’ve had the chance to witness a 360 degree video of a skydive, used the WoofbertVR application to visit real art gallery collections displayed in a simulated art gallery, spectated a simulated launch and lunar landing of Apollo 11, and browsed 360 photography depicting dozens of fascinating destinations around the globe.  This is quite a compelling and satisfying way to experience visual splendor depicted spatially.  With the New York Times and  iMax now entering the industry, we can anticipate an incredible surfeit of media content to take us to places in the world we might never have a chance to go.
Still, the experiences of these simulated spaces seem very ethereal, which brings me to another emerging field. At Mozilla Festival in London a few years ago, I had a chance to meet Yasuaki Kakehi of Keio University in Japan, who was demonstrating a haptic feedback device called Techtile. The Techtile was akin to a microphone for physical feedback that could then be transmitted over the web to another mirror device. When he put marbles in one cup, another person holding an empty cup could feel the rattle of the marbles as if the same marble impacts were happening on the sides of the empty cup held by the observer. The sense was so realistic, it was hard to believe that it was entirely synthesized and transmitted over the Internet. Subsequently, at the Consumer Electronics Show, I witnessed another of these haptic speakers. But this one conveyed the sense not by mirroring precise physical impacts, but by giving precisely timed pulses, which the holder could feel as an implied sense of force direction without the device actually moving the user's hand at all. It was a haptic illusion instead of a precise physical sensation.
As haptics work advances, it has the potential to impact common everyday experiences beyond the theoretical and experimental demonstrations I experienced. Haptic devices are available in the new Honda cars on sale this year as Road Departure Mitigation, whereby steering wheels can simulate rumble strips on the sides of a lane just by sensing the painted lines on the pavement with cameras. I am also very excited to see this field expand to include music. At Ryerson University's SMART lab, Dr. Maria Karam, Dr. Deborah Fels and Dr. Frank Russo applied the concepts of haptics and somatosensory depiction of music to people who didn't have the capability of appreciating music aurally. Their first product, called the Emoti-chair, breaks up the frequency range of music to depict different audio qualities spatially across the listener's back. This is based on the concept that the human cochlea is essentially a large coiled surface upon which sounds of different frequencies resonate and are felt at different locations. While I don't have perfect pitch, I think having a spatial perception of tonal scale would allow me to develop a cognitive sense of pitch correctness to compensate, using a listening aid like this. Fortunately, Dr. Karam is advancing this work to introduce new form factors to the commercial market in coming years.
Over many years I have had the chance to study various forms of folk percussion.  One of the most interesting drumming experiences I have had was a visit to Lombok, Indonesia, where I saw a Gamelan performance in a small village accompanied by the large Gendang Belek drums.  The Gendang Belek is a large barrel drum worn with a strap that goes over the shoulders.  When the drum is struck, the reverberation is so fierce and powerful that it shakes the entire body, resonating through the spine.  I also had an opportunity to study Japanese taiko while living in Japan.  The taiko resonates in the listener's chest.  But the experience of bone conduction through the spine is an altogether more intense way to experience rhythm.
Because I am such an avid fan of physical experiences of music, I frequently gravitate toward bass-heavy music.  I tend to play it on a subwoofer-heavy car stereo, or seek out nightclub or festival performances where large speakers animate the lower frequencies.  I can imagine that if more people had the physical experience of drumming that I've had, instead of just the auditory experience of it, more people would enjoy making music themselves.

As more innovators like TADs Inc. (an offshoot of the Ryerson University project) bring physical experiences of music to the general consumer, I look forward to experiencing my music in greater depth.

Categorieën: Mozilla-nl planet

Christopher Arnold: How a speech-based internet will change our perceptions

sn, 20/04/2019 - 00:52
https://en.wikipedia.org/wiki/Beowulf
A long time ago I remember reading Stephen Pinker discussing the evolution of language.  I had read Beowulf, Chaucer and Shakespeare, so I was quite interested in these linguistic adaptations over time.  Language shifts rapidly through the ages, to the point that even the English of 500 years ago sounds foreign to us now.  His thesis in the piece was that English is going to shift toward the way Chinese speakers pronounce and use it.  Essentially, the majority of speakers will determine the rules of the language’s direction.  There are more Chinese people in the world than native English speakers, so as they adopt and adapt the language, more of us will speak like the greater factions of our language’s custodians.  The future speakers of English will determine its course.  By force of "majority rules," language will go in the direction of its greatest use, which will be the Pangea of the global populace seeking common linguistic currency with others of foreign tongues.  Just as the US dollar is at present an “exchange currency” standard between foreign economies, English is the shortest path between any two ESL speakers, no matter their backgrounds.
Subsequently, I heard these concepts reiterated in a Scientific American podcast.  The idea there was that English, when spoken by those who learned it as a second language, is easier for other speakers to understand than native-spoken English.  British, Indian, Irish, Aussie, New Zealand and American English are relics in a shift, happening very fast, away from all of them.  As much as we appreciate each, they are all toast.  Corners will be cut and idiomatic usage will be lost, as the fastest path to information conveyance determines the path that language takes in its evolution.  English will continue to be a mutt language flavored by those who adopt and co-opt it.  Ultimately, no matter what the original language was, its common use will set the rules of the future.  So we can say goodbye to grammar as native speakers know it.  There is a shift happening that is greater than our traditions, and we must brace as this evolution takes us with it to a linguistic future determined by others.
I’m a person who has greatly appreciated idiomatic and aphoristic usage of English.  So I’m one of those now-old codgers who cringe at the gradual degradation of the language.  But I’m listening to an evolution in process, a shift toward a language of broader and greater utility.  So the cringes I feel are reactions to the time-saving adaptations of our language as it becomes something greater than it has been in the past.  Brits likely thought and felt the same as their linguistic empire expanded.  This is just a slightly stranger shift.
This evening I was in the kitchen, and I decided to ask Amazon Alexa to play some Led Zeppelin.  This was a band that existed back in the 1970s, the era during which I grew up, and I knew their entire corpus very well.  So when I started hearing one of my favorite songs, I knew this was not what I had asked for.  It was a good rendering for sure, but it was not Robert Plant singing.  Puzzled, I asked Alexa who was playing.  She responded “Lez Zeppelin.”  This was a new band to me, and a very good cover band, I admit.  (You can read about them here: http://www.lezzeppelin.com/)  But why hadn't Alexa responded to my initial request?  Was it because Atlantic Records hadn't licensed Led Zeppelin's actual catalog for Amazon Prime subscribers?
Two things struck me.  First, we aren’t going to be tailoring our English to Chinese ESL speech patterns as Mr. Pinker predicted.  Second, we’re probably going to be shifting our speech patterns to what Alexa, Siri, Cortana and Google Home can actually understand.  They are the new ESL vector that we hadn't anticipated a decade ago.  It is their use of English that will become conventional, as English is already the de facto language of computing, and therefore our language is now the slave to code.
What this means for the band that used to be called Led Zeppelin is that such an entity will no longer be discoverable.  In the future, if people say “Led Zeppelin” to Alexa, she’ll respond with Lez Zeppelin (the rights-available version of the band formerly known as "Led Zeppelin").  Give humanity 100 years or so, and the idea of a band called Led Zeppelin will seem strange to most folks.  Five generations removed, nobody will care who the original author was.  The "rights" holder will be irrelevant.  The only thing that will matter in 100 years is what the bot suggests.
Our language isn't ours.  It is the path to the convenient.  In bot-speak, names are approximate, and rights (the stalwart protectors aside) are meaningless.  Our concepts of trademarks, rights ownership and the like are going to be steamrolled by other factors, by other "agents" acting at the user's behest.  Language serves the needs of the spontaneous and the immediate.
Categorieën: Mozilla-nl planet
