Mozilla Nederland: the Dutch Mozilla community

Planet Mozilla - http://planet.mozilla.org/
Updated: 2 weeks 2 days ago

JavaScript at Mozilla: JavaScript Startup Bytecode Cache

Tue, 12/12/2017 - 15:00

We want to make Firefox load pages as fast as possible, so that the content of each loaded page is available on your screen as soon as possible.

The JavaScript Startup Bytecode Cache (JSBC) is an optimization made to improve the time to fully load a page. As with many optimizations, this is a trade-off between speed and memory. In exchange for faster page-load, we store extra data in a cache.

The Context

When Firefox loads a web page, it is likely that this web page will need to run some JavaScript code. This code is transferred to the browser from the network, from the network cache, or from a service worker.

JavaScript is a general purpose programming language that is used to improve web pages. It makes them more interactive, it can request dynamically loaded content, and it can improve the way web pages are programmed with libraries.

JavaScript libraries are collections of code with a wide range of scope and usage. Most of a library’s code (~70%) is not used while the web page is starting up. A web page’s startup lasts beyond the first paint: it goes beyond the point when all resources are loaded, and can even last a few seconds after the page feels ready to be used.

When all the bytes of a JavaScript source have been received, we run a syntax parser. The goal of the syntax parser is to raise JavaScript syntax errors as soon as possible. If the source is large enough, it is syntax parsed on a different thread to avoid blocking the rest of the web page.

As soon as we know that there are no syntax errors, we can start the execution by doing a full parse of the executed functions to generate bytecode. The bytecode is a format that simplifies the execution of the JavaScript code by an interpreter, and then by the Just-In-Time compilers (JITs). The bytecode is much larger than the source code, so Firefox only generates the bytecode of executed functions.
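As a rough illustration of that pipeline, here is a small Python sketch. The size threshold, thread pool, and function names are invented for illustration; the real logic lives in SpiderMonkey's C++ and uses different heuristics.

import ast
from concurrent.futures import ThreadPoolExecutor

OFF_THREAD_MIN_SOURCE_LENGTH = 100_000   # hypothetical cutoff
helper_threads = ThreadPoolExecutor(max_workers=2)

def start_syntax_parse(source):
    """Syntax-only pass: raises SyntaxError early, produces no bytecode."""
    if len(source) >= OFF_THREAD_MIN_SOURCE_LENGTH:
        # Large sources are checked on a helper thread so the main
        # (rendering) thread is not blocked.
        return helper_threads.submit(ast.parse, source)
    ast.parse(source)
    return None

def execute_script(source, pending_check):
    if pending_check is not None:
        pending_check.result()   # join: re-raises any syntax error
    # Bytecode generation (the "full parse") then happens lazily, per
    # executed function, so unused functions never pay that cost.
    exec(compile(source, "<script>", "exec"), {})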

[Sequence diagram: a script request, showing the overhead of the syntax parser and the full parser in the content process. The content process requests a script through IPC, the cache replies with the source, and the source is syntax parsed, full parsed, and executed.]

The Design

The JSBC aims at improving the startup of web pages by saving the bytecode of used functions in the network cache.

Saving the bytecode in the cache removes the need for the syntax-parser and replaces the full parser with a decoder. A decoder has the advantages of being smaller and faster than a parser. Therefore, when the cache is present and valid, we can run less code and use less memory to get the result of the full parser.

Having a bytecode cache, however, causes two problems. The first problem concerns the cache: as JavaScript can be updated on the server, we have to ensure that the bytecode is up to date with the current source code. The second problem concerns the serialization and deserialization of JavaScript: as the page has to keep rendering at the same time, we have to ensure that we never block the main event loop used to render web pages.

Alternate Data Type

While designing the JSBC, it became clear that we should not re-implement a cache.

At first sight, a cache sounds like something that maps a URL to a set of bytes. In reality, due to invalidation rules, disk space, the mirroring of the disk in RAM, and user actions, handling a cache can become a full-time job.

Another way to implement a cache, as we currently do for asm.js and WebAssembly, is to map the source content to the decoded / compiled version of the source. This is impractical for the JSBC for two reasons: invalidation and user actions would have to be propagated from the first cache to this other one; and we need the full source before we can get the bytecode, so this would create a race between parsing and a disk load, which, due to Firefox’s sandbox rules, would also have to go through interprocess communication (IPC).

The approach chosen for the JSBC wasn’t to implement any new caching mechanism, but to reuse the one already available in the network cache. The network cache is used to handle most URLs, except those handled by a service worker or those using some Firefox-internal privileged protocols.

The bytecode is stored in the network cache alongside the source code as “alternate content”; the user of the cache can request either one.

To request a resource, a page that is sandboxed in a content process creates a channel. This channel is then mirrored through IPC in the parent process, which resolves the protocol and dispatches it to the network. If the resource is already available in the cache, the cached version is used after verifying the validity of the resource using the ETag provided by the server. The cached content is transferred through IPC back to the content process.

To request bytecode, the content process annotates the channel with a preferred data type. When this annotation is present, the parent process, which has access to the cache, will look for an alternate content with the same data type. If there is no alternate content or if the data type differs, then the original content (the JavaScript source) is transferred. Otherwise, the alternate content (the bytecode) is transferred back to the content process with an extra flag repeating the data type.
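A rough sketch of that parent-side decision, with invented type and field names (the real implementation lives in Firefox's networking code, not in Python):

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CacheEntry:
    source: bytes                      # the JavaScript as served
    alt_type: Optional[str] = None     # e.g. a bytecode data type tag
    alt_data: Optional[bytes] = None   # the serialized bytecode

def answer_script_request(entry: CacheEntry,
                          preferred_alt_type: Optional[str]) -> Tuple[bytes, Optional[str]]:
    """Parent process: decide what to send back over IPC."""
    if (preferred_alt_type is not None
            and entry.alt_type == preferred_alt_type
            and entry.alt_data is not None):
        # Matching alternate content exists: ship the bytecode, flagged
        # with its data type so the content process decodes instead of parsing.
        return entry.alt_data, entry.alt_type
    # Otherwise fall back to the original content (the source).
    return entry.source, None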

[Sequence diagram: the same script request when the cache already holds bytecode as alternate data; the content process receives the bytecode directly, so the syntax parser and the full parser are skipped.]

To save the bytecode, the content process has to keep the channel alive after having requested an alternate data type. When the bytecode is ready to be encoded, it opens a stream to the parent process. The parent process will save the given stream as the alternate data for the resource that was initially requested.

This API was implemented in Firefox’s network stack by Valentin Gosu and Michal Novotny; their work was necessary to make this project possible. The first advantage of this interface is that it can also be implemented by service workers, which Eden Chuang and the service worker team are currently adding in Firefox 59. The second advantage is that the interface is not specific to JavaScript at all: we could also use it to save other forms of cached content, such as decoded images or precompiled WebAssembly scripts.

Serialization & Deserialization

SpiderMonkey already had a serialization and deserialization mechanism named XDR. This part of the code was used in the past to encode and decode Firefox’s internal JavaScript files, to improve Firefox startup. Unfortunately, XDR serialization and deserialization could not handle lazily parsed JavaScript functions, and both operations block the main thread of execution.

Saving Lazy Functions

Since 2012, Firefox parses functions lazily. XDR was meant to encode fully parsed files. With the JSBC, we need to avoid parsing unused functions. Most of the functions that are shipped to users are either never used, or not used during page startup. Thus we added support for encoding functions the way they are represented when they are unused: with start and end offsets into the source. As a result, unused functions only consume a minimal amount of space in addition to that taken by the source code.
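The idea can be sketched with a toy record type (invented here; not XDR's actual format): a function is stored either as bare source offsets, or as offsets plus its bytecode.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EncodedFunction:
    start: int                  # offset of the function in the source
    end: int
    bytecode: Optional[bytes]   # None for functions that never ran

def encode_function(start: int, end: int, was_executed: bool,
                    generate_bytecode: Callable[[int, int], bytes]) -> EncodedFunction:
    if not was_executed:
        # Unused function: only the offsets are stored, so it costs
        # almost nothing beyond the embedded source itself.
        return EncodedFunction(start, end, None)
    return EncodedFunction(start, end, generate_bytecode(start, end))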

Once a web page has started up, or in case of a different execution path, the source might be needed to delazify functions that were not cached. As such, the source must be available without blocking. The solution is to embed the source within the bytecode cache content. Instead of storing the raw source the way it is served by the network cache, we store it in UCS-2 encoding as compressed chunks¹, the same way we represent it in memory.

Non-Blocking Transcoding

XDR is a main-thread only blocking process that can serialize and deserialize bytecode. Blocking the main thread is problematic on the web, as it hangs the browser, making it unresponsive until the operation finishes. Without rewriting XDR, we managed to make this work such that it does not block the event loop. Unfortunately, deserialization and serialization could not both be handled the same way.

Deserialization was the easier of the two. As we already support parsing JavaScript sources off the main thread, decoding is just a special case of parsing, which produces the same output with a different input. So if the content to decode is large enough, we transfer the work to another thread so it can be decoded without blocking the main thread.

Serialization was more difficult. As it uses resources handled by the garbage collector, it must remain on the main thread. Thus, we cannot use another thread as with deserialization. In addition, the garbage collector might reclaim the bytecode of unused functions, and some objects attached to the bytecode might be mutated after execution, such as the object held by the JSOP_OBJECT opcode. To work around these issues, we are incrementally encoding the bytecode as soon as it is created, before its first execution.

To incrementally encode with XDR without rewriting it, we encode each function separately, along with location markers where the encoded function should be stored. Before the first execution we encode the JavaScript file with all functions encoded as lazy functions. When the function is requested, we generate the bytecode and replace the encoded lazy functions with the version that contains the bytecode. Before saving the serialization in the cache, we replace all location markers with the actual content, thus linearizing the content as if we had produced it in a single pass.
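Here is a toy model of that incremental scheme, with invented structures: each function gets a slot that starts out holding its lazy encoding and can be overwritten once its bytecode exists, and a final pass splices everything into one linear buffer.

class IncrementalEncoder:
    def __init__(self):
        self.chunks = []   # encoded pieces, in file order
        self.slots = {}    # function id -> index into self.chunks

    def encode_lazy_function(self, fn_id, offsets_blob):
        # Before the first execution, every function is encoded lazily.
        self.slots[fn_id] = len(self.chunks)
        self.chunks.append(offsets_blob)

    def replace_with_bytecode(self, fn_id, bytecode_blob):
        # Called when a function is executed and its bytecode has just
        # been generated, before it can be mutated or collected.
        self.chunks[self.slots[fn_id]] = bytecode_blob

    def linearize(self):
        # Produce the cache content as if it had been encoded in one pass.
        return b"".join(self.chunks)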

The Outcome

The Threshold

The JSBC is a trade-off between encoding/decoding time and memory usage, where the right balance is determined by the number of times the page is visited. As this is a trade-off, we have the ability to choose where to set the threshold, based on heuristics.

To find the threshold, we measured the time needed to encode, the time gained by decoding, and the distribution of cache hits. The best threshold is the value that minimizes the cost function over all page loads. Thus we are comparing the cost of loading a page without any optimization (x1), the cost of loading a page and encoding the bytecode (x1.02 — x1.1), and the cost of decoding the bytecode (x0.52 — x0.7). In the best/worst case, the cache hit threshold would be 1 or 2, if we only considered the time aspect of the equation.
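To make the trade-off concrete, here is a back-of-the-envelope cost model. The multiplier ranges come from the measurements above; the single-number choices within those ranges and the visit count are illustrative assumptions.

PLAIN_COST = 1.0    # load without the JSBC
ENCODE_COST = 1.06  # somewhere in the x1.02 - x1.1 range
DECODE_COST = 0.61  # somewhere in the x0.52 - x0.7 range

def total_cost(visits, threshold):
    """Relative cost of `visits` loads of one page when the bytecode is
    encoded on the `threshold`-th visit and decoded on later visits."""
    cost = 0.0
    for visit in range(1, visits + 1):
        if visit < threshold:
            cost += PLAIN_COST
        elif visit == threshold:
            cost += ENCODE_COST
        else:
            cost += DECODE_COST
    return cost

print(total_cost(10, threshold=4))   # ~7.72, versus 10.0 with no cache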

As humans, though, it seems we should not penalize the first visit to a website by saving content to disk. You might read a one-time article on a page that you will never visit again, and saving the cached bytecode to disk for future visits sounds like a net loss. For this reason, the current threshold is set to encode the bytecode only on the 4th visit, making it available on the 5th and subsequent visits.

The Results

The JSBC is surprisingly effective, and instead of going deeper into explanations, let’s see how it behaves on real websites that frequently load JavaScript, such as Wikipedia, Google Search results, Twitter, Amazon and the Facebook Timeline.

This graph represents the average time between the start of navigation and when the onload event for each website is fired, with and without the JavaScript Startup Bytecode Cache (JSBC). The error bars are the first quartile, median and third quartile values, over the set of approximately 30 page loads for each configuration. These results were collected on a new profile with the apptelemetry addon and with the tracking protection enabled.

While this graph shows the improvement for all pages (wikipedia: 7.8%, google: 4.9%, twitter: 5.4%, amazon: 4.9% and facebook: 12%), this does not account for the fact that these pages continue to load even after the onload event. The JSBC is configured to capture the execution of scripts until the page becomes idle.

Telemetry results gathered during an experiment on Firefox Nightly’s users reported that when the JSBC is enabled, page loads are on average 43ms faster, while being effective on only half of the page loads.

The JSBC neither improves benchmarks nor regresses them. Benchmarks’ behaviour does not represent what users actually do when visiting a website — they would not reload the pages 15 times to check the number of CPU cycles. The JSBC is tuned to capture everything until the page becomes idle. Benchmarks are tuned to avoid having an impatient developer watching a blank screen for ages, and thus they do not wait for the bytecode to be saved before starting over.

Thanks to Benjamin Bouvier and Valentin Gosu for proofreading this blog post and suggesting improvements, and a special thank you to Steve Fink and Jérémie Patonnier for improving this blog post.

¹ compressed chunk: Due to a recent regression this is no longer the case, and it might be interesting for a new contributor to fix.


Daniel Stenberg: The curl year 2017

Tue, 12/12/2017 - 10:43

I’m about to take an extended vacation for the rest of the year and into the beginning of the next, so I decided to sum up the year from a curl angle now, a few weeks early. (So some numbers will grow a bit more after this post.)

2017

So what did we do this year in the project, how did curl change?

The first curl release of the year was version 7.53.0 and the last one was 7.57.0. In the separate blog posts on 7.55.0, 7.56.0 and 7.57.0 you’ll note that we kept adding new goodies and useful features. We produced a total of 9 releases containing 683 bug fixes. We announced twelve security problems. (Down from 24 last year.)

At least 125 different authors wrote code that was merged into curl this year, in the 1500 commits that were made. We never had this many different authors during a single year before in the project’s entire lifetime! (The 114 authors during 2016 was the previous all-time high.)

We added more than 160 new names to the THANKS document for their help in improving curl. The total number of contributors is now over 1660.

This year we truly started to use Travis for CI builds and grew from a mere two builds per commit and PR up to nineteen (with additional ones run on AppVeyor and elsewhere). The current build set is a very good verification that most things still compile and work after a PR is merged. (See also the testing curl article.)

Mozilla announced that they too will use colon-slash-slash in their logo. Of course, we all know who had it in their logo first… =)


In March 2017, we had our first ever curl get-together as we arranged curl up 2017 over a weekend in Nuremberg, Germany. It was very inspiring, and meeting parts of the team in real life was truly a blast. This was so good we intend to do it again: curl up 2018 will happen.

curl turned 19 years old in March. In May it surpassed 5,000 stars on GitHub.

Also in May, we moved over the official curl site (and my personal site) to get hosted by Fastly. We were beginning to have problems handling the bandwidth and load, and in one single step all our worries were graciously taken care of!

We got curl entered into the OSS-Fuzz project, and Max Dymond even got a reward from Google for his curl-fuzzing integration work. Thanks to that project throwing heaps of junk at libcurl’s APIs, we’ve found and fixed many issues.

The source code (for the tool and library only) is now at about 143,378 lines of code. It grew around 7,057 lines during the year. The primary reasons for the code growth were:

  1. the new libssh-powered SSH backend (not yet released)
  2. the new mime API (in 7.56.0) and
  3. the new multi-SSL backend support (also in 7.56.0).

Your maintainer’s view

Oh what an eventful year it has been for me personally.

The first interim meeting for QUIC took place in Japan, and I participated from remote. After all, I’m all set on having curl support QUIC and I’ll keep track of where the protocol is going! I’ve participated in more interim meetings after that, all from remote so far.

I talked curl on the main track at FOSDEM in early February (and about HTTP/2 in the Mozilla devroom). I’ve followed that up and have also had the pleasure of talking in front of audiences in Stockholm, Budapest, Jönköping and Prague throughout the year.


I went to London and “represented curl” in the third edition of the HTTP workshop, where HTTP protocol details were discussed and disassembled, and new plans for the future of HTTP were laid out.


In late June I meant to go to San Francisco to a Mozilla “all hands” conference, but instead I was denied boarding on the flight. That event got a crazy amount of attention and I received massive amounts of love from new and old friends. I have not yet tried to enter the US again, but my plan is to try again in 2018…

I wrote and published my h2c tool, meant to help developers convert a set of HTTP headers into a working curl command line.

The single occasion that overshadows all other events and happenings for me this year, by far, was without doubt when I was awarded the Polhem Prize and got a gold medal from none other than His Majesty the King of Sweden himself. For all my work and years spent on curl, no less.

Not really curl related, but in November I was also glad to be part of the huge Firefox Quantum release. The biggest Firefox release ever, and one that has been received really well.

I’ve managed to commit over 800 changes to curl through the year, which is 54% of the total and more commits than I’ve done in curl during a single year since 2005 (in which I did 855 commits). I attribute this increase mostly to inspiration from curl up and the prize, but I think it also happened thanks to excellent feedback and motivation brought by my fellow curl hackers.

We’re running towards the end of 2017 with me being the individual who made the most commits in curl every single month for the last 28 months.

2018?

More things to come!


Emily Dunham: Some northwest area tech conferences and their approximate dates

Tue, 12/12/2017 - 09:00

Somebody asked me recently about what conferences a developer in the Pacific Northwest looking to attend more FOSS events should consider. Here’s an incomplete list of conferences I’ve attended or have heard good things about, plus the approximate times of year to expect their CFPs.

The Southern California Linux Expo (SCaLE) is a large, established Linux and FOSS conference in Pasadena, California. Look for its CFP at socallinuxexpo.org in September, and expect the conference to be scheduled in late February or early March each year.

If you don’t mind a short flight inland, OpenWest is a similar conference held in Utah each year. Look for its CFP in March at openwest.org, and expect the conference to happen around July. I especially enjoy the way that OpenWest brings the conference scene to a bunch of fantastic technologists who don’t always make it onto the national or international conference circuit.

Moving northward, there are a couple of DevOps Days conferences in the area: look for a PDX DevOps Days CFP around March and the conference around September, and keep an eye out in case Boise DevOps Days returns.

If you’re into a balance of intersectional community and technical content, consider OSBridge (opensourcebridge.org) held in Portland around June, and OSFeels (osfeels.com) held around July in Seattle.

In Washington state, LinuxFest Northwest (CFP around December, conference around April, linuxfestnorthwest.org) in Bellingham, and SeaGL (seagl.org, CFP around June, conference around October) in Seattle are solid grass-roots FOSS conferences. For infosec in the area, consider toorcamp (toorcamp.toorcon.net, registration around March, conference around June) in the San Juan Islands.

And finally, if a full conference seems like overkill, consider attending a BarCamp event in your area. Portland has CAT BarCamp (catbarcamp.org) at Portland State University around October, and Corvallis has Beaver BarCamp (beaverbarcamp.org) each April.

This is by no means a complete list of conferences in the area, and I haven’t even tried to list the myriad specialized events that spring up around any technology. Meetup, and calagator.org for the Portland area, are also great places to find out about meetups and events.


This Week In Rust: This Week in Rust 212

Tue, 12/12/2017 - 06:00

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Crate of the Week

This week's crate is printpdf, a pure Rust PDF-writing library that already has a lot of features (though I note a lot of bool-taking methods). Thanks to Felix Schütt for the suggestion!

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

105 pull requests were merged in the last week

New Contributors
  • Agustin Chiappe Berrini
  • Jonathan Strong
  • JRegimbal
  • Timo
Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

No RFCs were approved this week.

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now. This week's FCPs are:

New RFCs

Upcoming Events

If you are running a Rust event please add it to the calendar to get it mentioned here. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

Although rusting is generally a negative aspect of iron, a particular form of rusting, known as “stable rust,” causes the object to have a thin coating of rust over the top, and if kept in low relative humidity, makes the “stable” layer protective to the iron below

Wikipedia on rusting of iron.

Thanks to leodasvacas for the suggestion!

Submit your quotes for next week!

This Week in Rust is edited by: nasa42 and llogiq.


Gregory Szorc: High-level Problems with Git and How to Fix Them

Mon, 11/12/2017 - 11:30

I have a... complicated relationship with Git.

When Git first came onto the scene in the mid 2000's, I was initially skeptical because of its horrible user interface. But once I learned it, I appreciated its speed and features - especially the ease with which you could create feature branches, merge, and even create commits offline (which was a big deal in the era when Subversion was the dominant version control tool in open source and you needed to speak with a server in order to commit code). When I started using Git day-to-day, it was such an obvious improvement over what I was using before (mainly Subversion and even CVS).

When I started working for Mozilla in 2011, I was exposed to the Mercurial version control tool, which then - and still today - hosts the canonical repository for Firefox. I didn't like Mercurial initially. Actually, I despised it. I thought it was slow and its features lacking. And I frequently encountered repository corruption.

My first experience learning the internals of both Git and Mercurial came when I found myself hacking on hg-git - a tool that allows you to convert Git and Mercurial repositories to/from each other. I was hacking on hg-git so I could improve the performance of converting Mercurial repositories to Git repositories. And I was doing that because I wanted to use Git - not Mercurial - to hack on Firefox. I was trying to enable an unofficial Git mirror of the Firefox repository to synchronize faster so it would be more usable. The ulterior motive was to demonstrate that Git is a superior version control tool and that Firefox should switch its canonical version control tool from Mercurial to Git.

In what is a textbook definition of irony, what happened instead was I actually learned how Mercurial worked, interacted with the Mercurial Community, realized that Mozilla's documentation and developer practices were... lacking, and that Mercurial was actually a much, much more pleasant tool to use than Git. It's an old post, but I summarized my conversion four and a half years ago. This started a chain of events that somehow resulted in me contributing a ton of patches to Mercurial, taking stewardship of hg.mozilla.org, and becoming a member of the Mercurial Steering Committee - the governance group for the Mercurial Project.

I've been an advocate of Mercurial over the years. Some would probably say I'm a Mercurial fanboy. I reject that characterization because fanboy has connotations that imply I'm ignorant of realities. I'm well aware of Mercurial's faults and weaknesses. I'm well aware of Mercurial's relative lack of popularity. I'm well aware that this lack of popularity almost certainly turns away contributors to Firefox and other Mozilla projects because people don't want to have to learn a new tool. I'm well aware that there are changes underway to enable Git to scale to very large repositories and that these changes could threaten Mercurial's scalability advantages over Git, making choices to use Mercurial even harder to defend. (As an aside, the party most responsible for pushing Git to adopt architectural changes to enable it to scale these days is Microsoft. Could anyone have foreseen that?!)

I've achieved mastery in both Git and Mercurial. I know their internals and their command line interfaces extremely well. I understand the architecture and principles upon which both are built. I'm also exposed to some very experienced and knowledgeable people in the Mercurial Community. People who have been around version control for much, much longer than me and have knowledge of random version control tools you've probably never heard of. This knowledge and exposure allows me to make connections and see opportunities for version control that quite frankly most do not.

In this post, I'll be talking about some high-level, high-impact problems with Git and possible solutions for them. My primary goal in this post is to foster positive change in Git and the services around it. While I personally prefer Mercurial, improving Git is good for everyone. Put another way, I want my knowledge and perspective from being part of a version control community to be put to good use wherever it can be.

Speaking of Mercurial, as I said, I'm a heavy contributor and am somewhat influential in the Mercurial Community. I want to be clear that my opinions in this post are my own and I'm not speaking on behalf of the Mercurial Project or the larger Mercurial Community. I also don't intend to claim that Mercurial is holier-than-thou. Mercurial has tons of user interface failings and deficiencies. And I'll even admit to being frustrated that some systemic failings in Mercurial have gone unaddressed for as long as they have. But that's for another post. This post is about Git. Let's get started.

The Staging Area

The staging area is a feature that should not be enabled in the default Git configuration.

Most people see version control as an obstacle standing in the way of accomplishing some other task. They just want to save their progress towards some goal. In other words, they want version control to be a save file feature in their workflow.

Unfortunately, modern version control tools don't work that way. For starters, they require people to specify a commit message every time they save. This in and of itself can be annoying. But we generally accept that as the price you pay for version control: that commit message has value to others (or even your future self). So you must record it.

Most people want the barrier to saving changes to be effortless. A commit message is already too annoying for many users! The Git staging area establishes a higher barrier to saving. Instead of just saving your changes, you must first stage your changes to be saved.

If you requested save in your favorite GUI application, text editor, etc and it popped open a select the changes you would like to save dialog, you would rightly think just save all my changes already, dammit. But this is exactly what Git does with its staging area! Git is saying I know all the changes you made: now tell me which changes you'd like to save. To the average user, this is infuriating because it works in contrast to how the save feature works in almost every other application.

There is a counterargument to be made here. You could say that the editor/application/etc is complex - that it has multiple contexts (files) - that each context is independent - and that the user should have full control over which contexts (files) - and even changes within those contexts - to save. I agree: this is a compelling feature. However, it isn't an appropriate default feature. The ability to pick which changes to save is a power-user feature. Most users just want to save all the changes all the time. So that should be the default behavior. And the Git staging area should be an opt-in feature.

If intrinsic workflow warts aren't enough, the Git staging area has a horrible user interface. It is often referred to as the cache for historical reasons. Cache of course means something to anyone who knows anything about computers or programming. And Git's use of cache doesn't at all align with that common definition. Yet the terminology in Git persists. You have to run commands like git diff --cached to examine the state of the staging area. Huh?!

But Git also refers to the staging area as the index. And this terminology also appears in Git commands! git help commit has numerous references to the index. Let's see what git help glossary has to say:

index
    A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. Truth be told, it can also contain a second, and even a third version of a working tree, which are used when merging.

index entry
    The information regarding a particular file, stored in the index. An index entry can be unmerged, if a merge was started, but not yet finished (i.e. if the index contains multiple versions of that file).

In terms of end-user documentation, this is a train wreck. It tells the lay user absolutely nothing about what the index actually is. Instead, it casually throws out references to stat information (requires the user know what the stat() function call and struct are) and objects (a Git term for a piece of data stored by Git). It even undermines its own credibility with that truth be told sentence. This definition is so bad that it would probably improve user understanding if it were deleted!

Of course, git help index says No manual entry for gitindex. So there is literally no hope for you to get a concise, understandable definition of the index. Instead, it is one of those concepts that you think you learn from interacting with it all the time. Oh, when I git add something it gets into this state where git commit will actually save it.

And even if you know what the Git staging area/index/cached is, it can still confound you. Do you know the interaction between uncommitted changes in the staging area and working directory when you git rebase? What about git checkout? What about the various git reset invocations? I have a confession: I can't remember all the edge cases either. To play it safe, I try to make sure all my outstanding changes are committed before I run something like git rebase because I know that will be safe.

The Git staging area doesn't have to be this complicated. A re-branding away from index to staging area would go a long way. Adding an alias from git diff --staged to git diff --cached and removing references to the cache from common user commands would make a lot of sense and reduce end-user confusion.

Of course, the Git staging area doesn't really need to exist at all! The staging area is essentially a soft commit. It performs the save progress role - the basic requirement of a version control tool. And in some aspects it is actually a better save progress implementation than a commit because it doesn't require you to type a commit message! Because the staging area is a soft commit, all workflows using it can be modeled as if it were a real commit and the staging area didn't exist at all! For example, instead of git add --interactive + git commit, you can run git commit --interactive. Or if you wish to incrementally add new changes to an in-progress commit, you can run git commit --amend or git commit --amend --interactive or git commit --amend --all. If you actually understand the various modes of git reset, you can use those to uncommit. Of course, the user interface to performing these actions in Git today is a bit convoluted. But if the staging area didn't exist, new high-level commands like git amend and git uncommit could certainly be invented.

To the average user, the staging area is a complicated concept. I'm a power user. I understand its purpose and how to harness its power. Yet when I use Mercurial (which doesn't have a staging area), I don't miss the staging area at all. Instead, I learn that all operations involving the staging area can be modeled as other fundamental primitives (like commit amend) that you are likely to encounter anyway. The staging area therefore constitutes an unnecessary burden and cognitive load on users. While powerful, its complexity and incurred confusion does not justify its existence in the default Git configuration. The staging area is a power-user feature and should be opt-in by default.

Branches and Remotes Management is Complex and Time-Consuming

When I first used Git (coming from CVS and Subversion), I thought branches and remotes were incredible because they enabled new workflows that allowed you to easily track multiple lines of work across many repositories. And ~10 years later, I still believe the workflows they enable are important. However, having amassed a broader perspective, I also believe their implementation is poor and this unnecessarily confuses many users and wastes the time of all users.

My initial zen moment with Git - the time when Git finally clicked for me - was when I understood Git's object model: that Git is just a content indexed key-value store consisting of different object types (blobs, trees, and commits) that have a particular relationship with each other. Refs are symbolic names pointing to Git commit objects. And Git branches - both local and remote - are just refs having a well-defined naming convention (refs/heads/<name> for local branches and refs/remotes/<remote>/<name> for remote branches). Even tags and notes are defined via refs.

Refs are a necessary primitive in Git because the Git storage model is to throw all objects into a single, key-value namespace. Since the store is content indexed and the key name is a cryptographic hash of the object's content (which for all intents and purposes is random gibberish to end-users), the Git store by itself is unable to locate objects. If all you had was the key-value store and you wanted to find all commits, you would need to walk every object in the store and read it to see if it is a commit object. You'd then need to buffer metadata about those objects in memory so you could reassemble them into say a DAG to facilitate looking at commit history. This approach obviously doesn't scale. Refs short-circuit this process by providing pointers to objects of importance. It may help to think of the set of refs as an index into the Git store.

Refs also serve another role: as guards against garbage collection. I won't go into details about loose objects and packfiles, but it's worth noting that Git's key-value store also behaves in ways similar to a generational garbage collector like you would find in programming languages such as Java and Python. The important thing to know is that Git will garbage collect (read: delete) objects that are unused. And the mechanism it uses to determine which objects are unused is to iterate through refs and walk all transitive references from that initial pointer. If there is an object in the store that can't be traced back to a ref, it is unreachable and can be deleted.
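A simplified model of that reachability rule, as a toy Python object store (not Git's on-disk format): objects are keyed by their content hash, refs are the entry points, and garbage collection keeps only what some ref can transitively reach.

import hashlib

class ToyStore:
    def __init__(self):
        self.objects = {}   # hex digest -> (payload, parent digests)
        self.refs = {}      # e.g. "refs/heads/master" -> hex digest

    def add_commit(self, payload, parents=()):
        key = hashlib.sha1(payload).hexdigest()
        self.objects[key] = (payload, tuple(parents))
        return key

    def reachable(self):
        """Every object transitively referenced from some ref."""
        seen, stack = set(), list(self.refs.values())
        while stack:
            key = stack.pop()
            if key in seen or key not in self.objects:
                continue
            seen.add(key)
            stack.extend(self.objects[key][1])
        return seen

    def gc(self):
        """Drop objects that no ref can reach."""
        keep = self.reachable()
        self.objects = {k: v for k, v in self.objects.items() if k in keep}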

Reflogs maintain the history of a value for a ref: for each ref they contain a log of what commit it was pointing to, when that pointer was established, who established it, etc. Reflogs serve two purposes: facilitating undoing a previous action and holding a reference to old data to prevent it from being garbage collected. The two use cases are related: if you don't care about undo, you don't need the old reference to prevent garbage collection.

This design of Git's store is actually quite sensible. It's not perfect (nothing is). But it is a solid foundation to build a version control tool (or even other data storage applications) on top of.

The title of this section has to do with sub-optimal branches and remotes management. But I've hardly said anything about branches or remotes! And this leads me to my main complaint about Git's branches and remotes: that they are a very thin veneer over refs. The properties of Git's underlying key-value store unnecessarily bleed into user-facing concepts (like branches and remotes) and therefore dictate sub-optimal practices. This is what's referred to as a leaky abstraction.

I'll give some examples.

As I stated above, many users treat version control as a save file step in their workflow. I believe that any step that interferes with users saving their work is user hostile. This even includes writing a commit message! I already argued that the staging area significantly interferes with this critical task. Git branches do as well.

If we were designing a version control tool from scratch (or if you were a new user to version control), you would probably think that a sane feature/requirement would be to update to any revision and start making changes. In Git speak, this would be something like git checkout b201e96f, make some file changes, git commit. I think that's a pretty basic workflow requirement for a version control tool. And the workflow I suggested is pretty intuitive: choose the thing to start working on, make some changes, then save those changes.

Let's see what happens when we actually do this:

$ git checkout b201e96f
Note: checking out 'b201e96f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at b201e96f94... Merge branch 'rs/config-write-section-fix' into maint

$ echo 'my change' >> README.md
$ git commit -a -m 'my change'
[detached HEAD aeb0c997ff] my change
 1 file changed, 1 insertion(+)

$ git push indygreg
fatal: You are not currently on a branch.
To push the history leading to the current (detached HEAD)
state now, use

    git push indygreg HEAD:<name-of-remote-branch>

$ git checkout master
Warning: you are leaving 1 commit behind, not connected to
any of your branches:

  aeb0c997ff my change

If you want to keep it by creating a new branch, this may be a good time
to do so with:

  git branch <new-branch-name> aeb0c997ff

Switched to branch 'master'
Your branch is up to date with 'origin/master'.

I know what all these messages mean because I've mastered Git. But if you were a newcomer (or even a seasoned user), you might be very confused. Just so we're on the same page, here is what's happening (along with some commentary).

When I run git checkout b201e96f, Git is trying to tell me that I'm potentially doing something that could result in the loss of my data. A golden rule of version control tools is don't lose the user's data. When I run git checkout, Git should be stating the risk for data loss very clearly. But instead, the If you want to create a new branch sentence is hiding this fact by instead phrasing things around retaining commits you create rather than the possible loss of data. It's up to the user to make the connection that retaining commits you create actually means don't eat my data. Preventing data loss is critical and Git should not mince words here!

The git commit seems to work like normal. However, since we're in a detached HEAD state (a phrase that is likely gibberish to most users), that commit isn't referred to by any ref, so it can be lost easily. Git should be telling me that I just committed something it may not be able to find in the future. But it doesn't. Again, Git isn't being as protective of my data as it needs to be.

The failure in the git push command is essentially telling me I need to give things a name in order to push. Pushing is effectively remote save. And I'm going to apply my reasoning about version control tools not interfering with save to pushing as well: Git is adding an extra barrier to remote save by refusing to push commits without a branch attached and by doing so is being user hostile.

Finally, we git checkout master to move to another commit. Here, Git is actually doing something halfway reasonable. It is telling me I'm leaving commits behind, which commits those are, and the command to use to keep those commits. The warning is good but not great. I think it needs to be stronger to reflect the risk around data loss if that suggested Git commit isn't executed. (Of course, the reflog for HEAD will ensure that data isn't immediately deleted. But users shouldn't need to involve reflogs to not lose data that wasn't rewritten.)

The point I want to make is that Git doesn't allow you to just update and save. Because its dumb store requires pointers to relevant commits (refs) and because that requirement isn't abstracted away or paved over by user-friendly features in the frontend, Git is effectively requiring end-users to define names (branches) for all commits. If you fail to define a name, it gets a lot harder to find your commits, exchange them, and Git may delete your data. While it is technically possible to not create branches, the version control tool is essentially unusable without them.

When local branches are exchanged, they appear as remote branches to others. Essentially, you give each instance of the repository a name (the remote). And branches/refs fetched from a named remote appear as a ref in the ref namespace for that remote. e.g. refs/remotes/origin holds refs for the origin remote. (Git allows you to not have to specify the refs/remotes part, so you can refer to e.g. refs/remotes/origin/master as origin/master.)

Again, if you were designing a version control tool from scratch or you were a new Git user, you'd probably think remote refs would make good starting points for work. For example, if you know you should be saving new work on top of the master branch, you might be inclined to begin that work by running git checkout origin/master. But like our specific-commit checkout above:

$ git checkout origin/master
Note: checking out 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 95ec6b1b33... RelNotes: the eighth batch

This is the same message we got for a direct checkout. But we did supply a ref/remote branch name. What gives? Essentially, Git tries to enforce that the refs/remotes/ namespace is read-only and only updated by operations that exchange data with a remote, namely git fetch, git pull, and git push.

For this to work correctly, you need to create a new local branch (which initially points to the commit that refs/remotes/origin/master points to) and then switch/activate that local branch.

I could go on talking about all the subtle nuances of how Git branches are managed. But I won't.

If you've used Git, you know you need to use branches. You may or may not recognize just how frequently you have to type a branch name into a git command. I guarantee that if you are familiar with version control tools and workflows that aren't based on having to manage refs to track data, you will find Git's forced usage of refs and branches a bit absurd. I half jokingly refer to Git as Game of Refs. I say that because coming from Mercurial (which doesn't require you to name things), Git workflows feel to me like all I'm doing is typing the names of branches and refs into git commands. I feel like I'm wasting my precious time telling Git the names of things only because this is necessary to placate the leaky abstraction of Git's storage layer which requires references to relevant commits.

Git and version control doesn't have to be this way.

As I said, my Mercurial workflow doesn't rely on naming things. Unlike Git, Mercurial's store has an explicit (not shared) storage location for commits (changesets in Mercurial parlance). And this data structure is ordered, meaning a changeset later always occurs after its parent/predecessor. This means that Mercurial can open a single file/index to quickly find all changesets. Because Mercurial doesn't need pointers to commits of relevance, names aren't required.
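A toy model of such an ordered changelog (not Mercurial's actual revlog format): appending preserves topological order, so enumerating every changeset is a single sequential read and no external pointers are needed.

class ToyChangelog:
    def __init__(self):
        self.entries = []   # revision number -> (parent revs, payload)

    def add(self, payload, parents=()):
        # Parents must already exist, so they always have smaller
        # revision numbers than their children.
        assert all(p < len(self.entries) for p in parents)
        self.entries.append((tuple(parents), payload))
        return len(self.entries) - 1   # the new revision number

    def all_revisions(self):
        return range(len(self.entries))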

My Zen of Mercurial moment came when I realized you didn't have to name things in Mercurial. Having used Git before Mercurial, I was conditioned to always be naming things. This is the Git way after all. And, truth be told, it is common to name things in Mercurial as well. Mercurial's named branches were the way to do feature branches in Mercurial for years. Some used the MQ extension (essentially a port of quilt), which also requires naming individual patches. Git users coming to Mercurial were missing Git branches and Mercurial's bookmarks were a poor port of Git branches.

But recently, more and more Mercurial users have been coming to the realization that names aren't really necessary. If the tool doesn't actually require naming things, why force users to name things? As long as users can find the commits they need to find, do you actually need names?

As a demonstration, my Mercurial workflow leans heavily on the hg show work and hg show stack commands. You will need to enable the show extension by putting the following in your hgrc config file to use them:

[extensions]
show =

Running hg show work (I have also set the config commands.show.aliasprefix=s to enable me to type hg swork) finds all in-progress changesets and other likely-relevant changesets (those with names and DAG heads). It prints a concise DAG of those changesets:

hg show work output

And hg show stack shows just the current line of work and its relationship to other important heads:

hg show stack output

Aside from the @ bookmark/name set on that top-most changeset, there are no names! (That @ comes from the remote repository, which has set that name.)

Outside of code archeology workflows, hg show work shows the changesets I care about 95% of the time. With all I care about (my in-progress work and possible rebase targets) rendered concisely, I don't have to name things because I can just find whatever I'm looking for by running hg show work! Yes, you need to run hg show work, visually scan for what you are looking for, and copy a (random) hash fragment into a number of commands. This sounds like a lot of work. But I believe it is far less work than naming things. Only when you practice this workflow do you realize just how much time you actually spend finding and then typing names into hg and - especially - git commands! The ability to just hg update to a changeset and commit without having to name things is just so liberating. It feels like my version control tool is putting up fewer barriers and letting me work quickly.

Another benefit of hg show work and hg show stack are that they present a concise DAG visualization to users. This helps educate users about the underlying shape of repository data. When you see connected nodes on a graph and how they change over time, it makes it a lot easier to understand concepts like merge and rebase.

This nameless workflow may sound radical. But that's because we're all conditioned to naming things. I initially thought it was crazy as well. But once you have a mechanism that gives you rapid access to data you care about (hg show work in Mercurial's case), names become very optional. Now, a pure nameless workflow isn't without its limitations. You want names to identify the main targets for work (e.g. the master branch). And when you exchange work with others, names are easier to work with, especially since names survive rewriting. But in my experience, most of my commits are only exchanged with me (synchronizing my in-progress commits across devices) and with code review tools (which don't really need names and can operate against raw commits). My most frequent use of names comes when I'm in repository maintainer mode and I need to ensure commits have names for others to reference.

Could Git support nameless workflows? In theory it can.

Git needs refs to find relevant commits in its store. And the wire protocol uses refs to exchange data. So refs have to exist for Git to function (assuming Git doesn't radically change its storage and exchange mechanisms to mitigate the need for refs, but that would be a massive change and I don't see this happening).

While there is a fundamental requirement for refs to exist, this doesn't necessarily mean that user-facing names must exist. The reason that we need branches today is because branches are little more than a ref with special behavior. It is theoretically possible to invent a mechanism that transparently maps nameless commits onto refs. For example, you could create a refs/nameless/ namespace that was automatically populated with DAG heads that didn't have names attached. And Git could exchange these refs just like it can branches today. It would be a lot of work to think through all the implications and to design and implement support for nameless development in Git. But I think it is possible.
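A sketch of how such a refs/nameless/ namespace could be populated, assuming you can enumerate commits, their parents, and the existing named refs. The namespace and the behavior are the hypothetical proposal above, not an existing Git feature.

def nameless_heads(commits, named_refs):
    """commits: sha -> list of parent shas; named_refs: ref name -> sha.
    Returns synthetic refs for DAG heads that no named ref points at."""
    has_child = set()
    for parents in commits.values():
        has_child.update(parents)

    named = set(named_refs.values())
    heads = [sha for sha in commits if sha not in has_child and sha not in named]

    # Give each nameless head a stable ref so it can be exchanged and
    # protected from garbage collection without the user naming it.
    return {"refs/nameless/" + sha[:12]: sha for sha in heads}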

I encourage the Git community to investigate supporting nameless workflows. Having adopted this workflow in Mercurial, Git's workflow around naming branches feels heavyweight and restrictive to me. Put another way, nameless commits are actually lighter-weight branches than Git branches! To the common user who just wants version control to be a save feature, requiring names establishes a barrier towards that goal. So removing the naming requirement would make Git simpler and more approachable to new users.

Forks aren't the Model You are Looking For

This section is more about hosted Git services (like GitHub, Bitbucket, and GitLab) than Git itself. But since hosted Git services are synonymous with Git and interaction with a hosted Git services is a regular part of a common Git user's workflow, I feel like I need to cover it. (For what it's worth, my experience at Mozilla tells me that a large percentage of people who say I prefer Git or we should use Git actually mean I like GitHub. Git and GitHub/Bitbucket/GitLab are effectively the same thing in the minds of many and anyone finding themselves discussing version control needs to keep this in mind because Git is more than just the command line tool: it is an ecosystem.)

I'll come right out and say it: I think forks are a relatively poor model for collaborating. They are light years better than what existed before. But they are still so far from the turn-key experience that should be possible. The fork hasn't really changed much since the current implementation of it was made popular by GitHub many years ago. And I view this as a general failure of hosted services to innovate.

So we have a shared understanding: a fork (as implemented on GitHub, Bitbucket, GitLab, etc) is essentially a complete copy of a repository (a git clone if using Git) and a fresh workspace for additional value-added services the hosting provider offers (pull requests, issues, wikis, project tracking, release tracking, etc). If you open the main web page for a fork on these services, it looks just like the main project's. You know it is a fork because there are cosmetics somewhere (typically next to the project/repository name) saying forked from.

Before service providers adopted the fork terminology, fork was used in open source to refer to a splintering of a project. If someone or a group of people didn't like the direction a project was taking, wanted to take over ownership of a project because of stagnation, etc, they would fork it. The fork was based on the original (and there may even be active collaboration between the fork and original), but the intent of the fork was to create distance between the original project and its new incarnation. A new entity that was sufficiently independent of the original.

Forks on service providers mostly retain this old school fork model. The fork gets a new copy of issues, wikis, etc. And anyone who forks establishes what looks like an independent incarnation of a project. It's worth noting that the execution varies by service provider. For example, GitHub won't enable Issues for a fork by default, thereby encouraging people to file issues against the upstream project it was forked from. (This is good default behavior.)

And I know why service providers (initially) implemented things this way: it was easy. If you are building a product, it's simpler to just say a user's version of this project is a git clone and they get a fresh database. On a technical level, this meets the traditional definition of fork. And rather than introduce a new term into the vernacular, they just re-purposed fork (albeit with softer connotations, since the traditional fork commonly implied there was some form of strife precipitating a fork).

To help differentiate flavors of forks, I'm going to define the terms soft fork and hard fork. A soft fork is a fork that exists for purposes of collaboration. The differentiating feature between a soft fork and hard fork is whether the fork is intended to be used as its own project. If it is, it is a hard fork. If not - if all changes are intended to be merged into the upstream project and be consumed from there - it is a soft fork.

I don't have concrete numbers, but I'm willing to wager that the vast majority of forks on Git service providers which have changes are soft forks rather than hard forks. In other words, these forks exist purely as a conduit to collaborate with the canonical/upstream project (or to facilitate a short-lived one-off change).

The current implementation of fork - which borrows a lot from its predecessor of the same name - is a good - but not great - way to facilitate collaboration. It isn't great because it technically resembles what you'd expect to see for hard fork use cases even though it is used predominantly with soft forks. This mismatch creates problems.

If you were to take a step back and invent your own version control hosted service and weren't tainted by exposure to existing services and were willing to think a bit beyond making it a glorified frontend for the git command line interface, you might realize that the problem you are solving - the product you are selling - is collaboration as a service, not a Git hosting service. And if your product is collaboration, then implementing your collaboration model around the hard fork model with strong barriers between the original project and its forks is counterproductive and undermines your own product. But this is how GitHub, Bitbucket, GitLab, and others have implemented their product!

To improve collaboration on hosted version control services, the concept of a fork needs to be significantly curtailed. It should be replaced by a UI and workflow that revolve around the central, canonical repository.

You shouldn't need to create your own server-side clone or fork of a repository in order to contribute. Instead, you should be able to clone the canonical repository. When you create commits, those commits should be stored in - or at least more tightly affiliated with - the original project, not inside a fork.

One potential implementation is doable today. I'm going to call it workspaces. Here's how it would work.

There would exist a namespace for refs that can be controlled by the user. For example, on GitHub (where my username is indygreg), if I wanted to contribute to some random project, I would git push my refs somewhere under refs/users/indygreg/ directly to that project's repository. No forking necessary: I would just clone the repo and push to my workspace under it. You could do this today by configuring your Git refspec properly. For pushes, it would look something like refs/heads/*:refs/users/indygreg/* (that tells Git to map local refs under refs/heads/ to refs/users/indygreg/ on the remote repository). If this became a popular feature, presumably the Git wire protocol could be taught to advertise it so that Git clients automatically configure themselves to push to user-specific workspaces attached to the original repository.

There are several advantages to such a workspace model. Many of them revolve around eliminating forks.

At initial contribution time, no server-side fork is necessary in order to contribute. You would be able to clone and contribute without waiting for or configuring a fork. Or if you can create commits from the web interface, the clone wouldn't even be necessary! Lowering the barrier to contribution is a good thing, especially if collaboration is the product you are selling.

In the web UI, workspaces would also revolve around the source project and not be off in their own world like forks are today. People could more easily see what others are up to. And fetching their work would require typing in their username as opposed to configuring a whole new remote. This would bring communities closer and hopefully lead to better collaboration.

Not requiring forks also eliminates the need to synchronize your fork with the upstream repository. I don't know about you, but one of the things that bothers me about the Game of Refs that Git imposes is that I have to keep my refs in sync with the upstream refs. When I fetch from origin and pull down a new master branch, I need to git merge that branch into my local master branch. Then I need to push that new master branch to my fork. This is quite tedious. And it is easy to merge the wrong branches and get your branch state out of whack. There are better ways to map remote refs into your local names to make this far less confusing.

Another win here is not having to push and store data multiple times. When working on a fork (which is a separate repository), after you git fetch changes from upstream, you need to eventually git push those into your fork. If you've ever worked on a large repository and didn't have a super fast Internet connection, you may have been stymied by having to git push large amounts of data to your fork. This is quite annoying, especially for people with slow Internet connections. Wouldn't it be nice if that git push only pushed the data that was truly new and didn't already exist somewhere else on the server? A workspace model where development all occurs in the original repository would fix this. As a bonus, it would make the storage problem on servers easier because you would eliminate thousands of forks and you probably wouldn't have to care as much about data duplication across repos/clones because the version control tool solves a lot of this problem for you, courtesy of having all data live alongside or in the original repository instead of in a fork.

Another win from workspace-centric development would be the potential to do more user-friendly things after pull/merge requests are incorporated in the official project. For example, the ref in your workspace could be deleted automatically. This would ease the burden on users to clean up after their submissions are accepted. Again, instead of mashing keys to play the Game of Refs, this would all be taken care of for you automatically. (Yes, I know there are scripts and shell aliases to make this more turn-key. But user-friendly behavior shouldn't have to be opt-in: it should be the default.)

But workspaces aren't all rainbows and unicorns. There are access control concerns. You probably don't want users able to mutate the workspaces of other users. Or do you? You can make a compelling case that project administrators should have that ability. And what if someone pushes bad or illegal content to a workspace and you receive a cease and desist? Can you take down just the offending workspace while complying with the order? And what happens if the original project is deleted? Do all its workspaces die with it? These are not trivial concerns. But they don't feel impossible to tackle either.

Workspaces are only one potential alternative to forks, and I can come up with multiple implementations of the workspace concept, although many of them are constrained by the current features of the Git wire protocol. But Git is (finally) getting a more extensible wire protocol, so hopefully this will enable nice things.

I challenge Git service providers like GitHub, Bitbucket, and GitLab to think outside the box and implement something better than how forks are implemented today. It will be a large shift. But I think users will appreciate it in the long run.

Conclusion

Git is a ubiquitous version control tool. But it is frequently lampooned for its poor usability and documentation. We even have research papers telling us which parts are bad. Nobody I know has had a pleasant initial experience with Git. And it is clear that few people actually understand Git: most just know the command incantations they need to know to accomplish a small set of common activities. (If you are such a person, there is nothing to be ashamed about: Git is a hard tool.)

Popular Git-based hosting and collaboration services (such as GitHub, Bitbucket, and GitLab) exist. While they've made strides to make it easier to commit data to a Git repository (I purposefully avoid saying use Git because the most usable tools seem to avoid the git command line interface as much as possible), they are often a thin veneer over Git itself (see forks). And Git is a thin veneer over a content indexed key-value store (see forced usage of bookmarks).

As an industry, we should be concerned about the lousy usability of Git and the tools and services that surround it. Some may say that Git - with its near monopoly over version control mindset - is a success. I have a different view: I think it is a failure that a tool with a user experience this bad has achieved the success it has.

The cost of Git's poor usability can be measured in tens if not hundreds of millions of dollars in time people have wasted because they couldn't figure out how to use Git. Git should be viewed as a source of embarrassment, not a success story.

What's really concerning is that the usability problems of Git have been known for years. Yet it is as popular as ever and there have been few substantial usability improvements. We do have some alternative frontends floating around. But these haven't caught on.

I'm at a loss to understand how an open source tool as popular as Git has remained so mediocre for so long. The source code is out there. Anybody can submit a patch to fix it. Why is it that so many people get tripped up by the same poor usability issues years after Git became the common version control tool? It certainly appears that as an industry we have been unable or unwilling to address systemic deficiencies in a critical tool. Why this is, I'm not sure.

Despite my pessimism about Git's usability and its poor track record of being attentive to the needs of people who aren't power users, I'm optimistic that the future will be brighter. While the ~7000 words in this post pale in comparison to the aggregate word count that has been written about Git, hopefully this post strikes a nerve and causes positive change. Just because one generation has toiled with the usability problems of Git doesn't mean the next generation has to suffer through the same. Git can be improved and I encourage that change to happen. The three issues above and their possible solutions would be a good place to start.

Categorieën: Mozilla-nl planet

Jared Hirsch: Test Pilot in 2018: Lessons Learned from Screenshots

za, 09/12/2017 - 01:36
Test Pilot in 2018: Lessons learned from graduating Screenshots into Firefox

Wil and I were talking about the Bugzilla vs Github question for Screenshots a couple of days ago, and I have to admit that I’ve come around to “let’s just use Bugzilla for everything, and just ride the trains, and work with the existing processes as much as possible.”

I think it’s important to point out that this is a complete departure from my original point of view. Getting an inside view of how and why Firefox works the way it does has changed my mind. Everything just moves slower, with so many good reasons for doing so (good topic for another blog post). Given that our goal is to hand Screenshots off, just going with the existing processes, minimizing friction, is the way to go.

If Test Pilot’s goals include landing stuff in Firefox, what does this mean for the way that we run earlier steps in the product development process?

Suggestions for experiments that the Test Pilot team will graduate into Firefox

Ship maximal, not minimal, features

I don’t think we should plan on meaningful iteration once a feature lands in Firefox. It’s just fundamentally too slow to iterate rapidly, and it’s way too hard for a very small team to ship features faster than that default speed (again, many good reasons for that friction).

The next time we graduate something into Firefox, we should plan to ship much more than a minimum viable product, because we likely won’t get far past that initial landing point.

When in Firefox, embrace the Firefox ways

Everything we do, once an experiment gets approval to graduate, should be totally Firefox-centric. Move to Bugzilla for bugs (except, maybe, for server bugs). Mozilla-central for code, starting with one huge import from Github (again, except for server code). Git-cinnabar works really well, if you prefer git over mercurial. We have committers within the team now, and relationships with reviewers, so the code side of things is pretty well sorted.

Similarly for processes: we should just go with the existing processes to the best of our ability, which is to say, constantly ask the gatekeepers if we’re doing it right. Embrace the fact that everything in Firefox is high-touch, and use existing personal relationships to ask early and often if we’ve missed any important steps. We will always miss something, whether it’s new rules for some step, or neglecting to get signoff from some newly-created team, but we can plan for that in the schedule. I think we’ve hit most of the big surprises in shipping Screenshots.

Aim for bigger audiences in Test Pilot

Because it’s difficult to iterate on features when code is inside Firefox, we should make our Test Pilot audience as close to a release audience as possible. We want to aim for the everyday users, not the early adopters. I think we can do this by just advertising Test Pilot experiments more heavily.

By gathering data from a larger audience, our data will be more representative of the release audience, and give us a better chance of feature success.

Aim for web-flavored features / avoid dramatic changes to the Firefox UI

Speaking as the person who did “the weird Gecko stuff” on both Universal Search and Min-Vid, doing novel things with the current Firefox UI is hard, for all the reasons Firefox things are hard: the learning curve is significant and it’s high-touch. Knowledge of how the code works is confined to a small group of people, the docs aren’t great, and learning how things work requires reading the source code, plus asking for help on IRC.

Given that our team’s strengths lie in web development, we will be more successful if our features focus on the webby things: cross-device, mobile, or cloud integration; bringing the ‘best of the web’ into the browser. This is stuff we’re already good at, stuff that could be a differentiator for Firefox just as much as new Firefox UI, and we can iterate much more quickly on server code.

This said, if Firefox Product wants to reshape the browser UI itself in powerful, unexpected, novel ways, we can do it, but we should have some Firefox or Platform devs committed to the team for a given experiment.

Categorieën: Mozilla-nl planet

Mozilla VR Blog: Experimenting with AR and the Web on iOS

vr, 08/12/2017 - 20:48
Experimenting with AR and the Web on iOS

Today, we’re happy to announce that the WebXR Viewer app is available for download on iTunes. In our recent announcement of our Mixed Reality program, we talked about some explorations we were doing to extend WebVR to include AR and MR technology. In that post, we pointed at an iOS WebXR Viewer app we had developed to allow us to experiment with these ideas on top of Apple’s ARKit. While the app is open source, we wanted to let developers experiment with web-based AR without having to build the app themselves.

The WebXR Viewer app lets you view web pages created with the webxr-polyfill Javascript library, an experimental library we created as part of our explorations. This app is not intended to be a full-fledged web browser, but rather a way to test, demonstrate and share AR experiments created with web technology.

Code written with the webxr-polyfill runs in this app, as well as in Google’s experimental WebARonARCore APK on Android. We are working on supporting other AR and VR browsers, including WebVR on desktop.


We’ve also been working on integrating the webxr-polyfill into the popular three.js graphics library and the A-Frame framework to make it easy for three.js and A-Frame developers to try out these ideas. We are actively working on these libraries and using them in our own projects; while they are works-in-progress, each contains some simple examples to help you get started with them. We welcome feedback and contributions!


What’s Next?

We are not the only company interested in how WebVR could be extended to support AR and MR; for example, Google released a WebAR extension to WebVR with the WebARonARCore application mentioned above, and discussions on the WebVR standards documents have been lively around these issues.

As a result, the companies developing the WebVR API (including us) recently decided to rename the WebVR 2.0 proposal to the WebXR Device API and rename the WebVR Community Group to the Immersive Web Community Group, to reflect broad agreement that AR and VR devices should be exposed through a common API. The WebXR API we created was based on WebVR 2.0; we will be aligning it with the WebXR Device API as it develops and continue using it to explore ideas for exposing additional AR concepts into WebXR.

We’ve been working on this app since earlier this fall, before the WebVR community decided to move from WebVR to WebXR, and we look forward to continuing to update the app and libraries as the WebXR Device API is developed. We will continue to use this app as a platform for our experiments with WebXR on iOS using ARKit, and welcome others (both inside and outside the Immersive Web Community Group) to work with us on the app, the JavaScript libraries, and demonstrations of how the web can support AR and VR moving forward.

Categorieën: Mozilla-nl planet

The Firefox Frontier: What’s on the new Firefox menus? Speed

vr, 08/12/2017 - 19:16

If you’ve downloaded the new Firefox, you’ve probably noticed that we did some redecorating. Our new UI design (we call it Photon) is bright, bold, and inspired by the speed … Read more

The post What’s on the new Firefox menus? Speed appeared first on The Firefox Frontier.

Categorieën: Mozilla-nl planet

Will Kahn-Greene: html5lib-python 1.0 released!

vr, 08/12/2017 - 18:00
html5lib-python v1.0 released!

Yesterday, Geoffrey released html5lib 1.0 [1]! The changes aren't wildly interesting.

The more interesting part for me is how the release happened. I'm going to spend the rest of this post talking about that.

[1] Technically there was a 1.0 release followed by a 1.0.1 release because the 1.0 release had issues.

The story of Bleach and html5lib

I work on Bleach which is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML. It relies heavily on another library called html5lib-python. Most of the work that I do on Bleach consists of figuring out how to make html5lib do what I need it to do.

Over the last few years, maintainers of the html5lib library have been working towards a 1.0. Those well-meaning efforts got them into a versioning model which had some unenthusing properties. I would often talk to people about how I was having difficulties with Bleach and html5lib 0.99999999 (8 9s) and I'd have to mentally count how many 9s I had said. It was goofy [2].

In an attempt to deal with the effects of the versioning, there's a parallel set of versions that start with 1.0b. Because there are two sets of versions, it was a total pain in the ass to correctly specify which versions of html5lib Bleach worked with.
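To make the pain concrete, here is a minimal, hypothetical setup.py excerpt - the package name and the exact pins are invented for illustration, not copied from Bleach - showing what it looks like to try to span both the 0.9999999x releases and the parallel 1.0b pre-releases at once.

# Hypothetical setup.py excerpt; not Bleach's real requirements.
from setuptools import setup

setup(
    name="example-sanitizer",
    version="0.1",
    install_requires=[
        # One specifier has to cover the 0.9999999x line *and* the parallel
        # 1.0bx line while excluding a known-broken release. Counting the 9s
        # correctly is left as an exercise for the maintainer.
        "html5lib>=0.99999999,!=1.0b1,<2.0",
    ],
)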

While working on Bleach 2.0, I bumped into a few bugs and upstreamed a patch for at least one of them. That patch sat in the PR queue for months. That's what got me wondering--is this project dead?

I tracked down Geoffrey and talked with him a bit on IRC. He seems to be the only active maintainer. He was really busy with other things, html5lib doesn't pay at all, there's a ton of stuff to do, he's burned out, and recently there have been spats of negative comments in the issues and PRs. Generally the project had a lot of stop energy.

Some time in August, I offered to step up as an interim maintainer and shepherd html5lib to 1.0. The goals being:

  1. land or close as many old PRs as possible
  2. triage, fix, and close as many issues as possible
  3. clean up testing and CI
  4. clean up documentation
  5. ship 1.0 which ends the versioning issues
[2] Many things in life are goofy.

Thoughts on being an interim maintainer

I see a lot of open source projects that are in trouble in the sense that they don't have a critical mass of people and energy. When the sole part-time volunteer maintainer burns out, the project languishes. Then the entitled users show up, complain, demand changes, and talk about how horrible the situation is and everyone should be ashamed. It's tough--people are frustrated and then do a bunch of things that make everything so much worse. How do projects escape the raging inferno death spiral?

For a while now, I've been thinking about a model for open source projects where someone else pops in as an interim maintainer for a short period of time with specific goals and then steps down. Maybe this alleviates users' frustrations? Maybe this gives the part-time volunteer burned-out maintainer a breather? Maybe this can get the project moving again? Maybe the temporary interim maintainer can make some of the hard decisions that a regular long-term maintainer just can't?

I wondered if I should try that model out here. In the process of convincing myself that stepping up as an interim maintainer was a good idea [3], I looked at projects that rely on html5lib [4]:

  • pip vendors it
  • Bleach relies upon it heavily, so anything that uses Bleach uses html5lib (jupyter, hypermark, readme_renderer, tensorflow, ...)
  • most web browsers (Firefox, Chrome, servo, etc) have it in their repositories because web-platform-tests uses it

I talked with Geoffrey and offered to step up with these goals in mind.

I started with cleaning up the milestones in GitHub. I bumped everything from the 0.9999999999 (10 9s) milestone which I determined will never happen into a 1.0 milestone. I used this as a bucket for collecting all the issues and PRs that piqued my interest.

I went through the issue tracker and triaged all the issues. I tried to get steps to reproduce and any other data that would help resolve the issue. I closed some issues I didn't think would ever get resolved.

I triaged all the pull requests. Some of them had been open for a long time. I apologized to people who had spent their time to upstream a fix that sat around for years. In some cases, the changes had bitrotted severely and had to be redone [5].

Then I plugged away at issues and pull requests for a couple of months and pushed anything out of the milestone that wasn't well-defined or something we couldn't fix in a week.

At the end of all that, Geoffrey released version 1.0 and here we are today!

[3] I have precious little free time, so this decision had sweeping consequences for my life, my work, and people around me.

[4] Recently, I discovered libraries.io - it's a pretty amazing project. They have a page for html5lib. I had written a (mediocre) tool that does vaguely similar things.

[5] This is what happens on projects that don't have a critical mass of energy/people. It sucks for everyone involved.

Conclusion and thoughts

I finished up as interim maintainer for html5lib. I don't think I'm going to continue actively as a maintainer. Yes, Bleach uses it, but I've got other things I should be doing.

I think this was an interesting experiment. I also think it was a successful experiment in regards to achieving my stated goals, but I don't know if it gave the project much momentum to continue forward.

I'd love to see other examples of interim maintainers stepping up, achieving specific goals, and then stepping down again. Does it bring in new people to the community? Does it affect the raging inferno death spiral at all? What kinds of projects would benefit from this the most? What kinds of projects wouldn't benefit at all?

Categorieën: Mozilla-nl planet

The Mozilla Blog: Mozilla Awards Research Grants to Fund Top Research Projects

vr, 08/12/2017 - 17:53

We are happy to announce the results of the Mozilla Research Grant program for the second half of 2017. This was a competitive process, with over 70 applicants. After three rounds of judging, we selected a total of fourteen proposals, ranging from building tools to support open web platform projects like Rust and WebAssembly to designing digital assistants for low- and middle- income families and exploring decentralized web projects in the Orkney Islands. All these projects support Mozilla’s mission to make the Internet safer, more empowering, and more accessible.

The Mozilla Research Grants program is part of Mozilla’s Emerging Technologies commitment to being a world-class example of inclusive innovation and impact culture, and reflects Mozilla’s commitment to open innovation, continuously exploring new possibilities with and for diverse communities.

  • Zhendong Su, University of California, Davis: Practical, Rigorous Testing of the Mozilla Rust and bindgen Compilers
  • Ross Tate, Cornell University: Inferable Typed WebAssembly
  • Laura Watts, IT University of Copenhagen: Shaping community-based managed services (‘Orkney Cloud Saga’)
  • Svetlana Yarosh, University of Minnesota: Children & Parent Using Speech Interfaces for Informational Queries
  • Serge Egelman, UC Berkeley / International Computer Science Institute: Towards Usable IoT Access Controls in the Home
  • Alexis Hiniker, University of Washington: Understanding Design Opportunities for In-Home Digital Assistants for Low- and Middle-Income Families
  • Blase Ur, University of Chicago: Improving Communication About Privacy in Web Browsers
  • Wendy Ju, Cornell Tech: Video Data Corpus of People Reacting to Chatbot Answers to Enable Error Recognition and Repair
  • Katherine Isbister, University of California Santa Cruz: Designing for VR Publics: Creating the right interaction infrastructure for pro-social connection, privacy, inclusivity, and co-mingling in social VR
  • Sanjeev Arora, Princeton University and the Institute for Advanced Study: Compact representations of meaning of natural language: Toward a rigorous and interpretable study
  • Rachel Cummings, Georgia Tech: Differentially Private Analysis of Growing Datasets
  • Tongping Liu, University of Texas at San Antonio: Guarder: Defending Heap Vulnerabilities with Flexible Guarantee and Better Performance

 

The Mozilla Foundation will also be providing grants in support of two additional proposals:

 

  • J. Nathan Matias, CivilServant (incubated by Global Voices): Preventing online harassment with Community A/B Test Systems
  • Donghee Yvette Wohn, New Jersey Institute of Technology: Dealing with Harassment: Moderation Practices of Female and LGBT Live Streamers

 

Congratulations to all successfully funded applicants! The 2018H1 round of grant proposals will open in the Spring; more information is available at https://research.mozilla.org/research-grants/.

Jofish Kaye, Principal Research Scientist, Emerging Technologies, Mozilla

The post Mozilla Awards Research Grants to Fund Top Research Projects appeared first on The Mozilla Blog.

Categorieën: Mozilla-nl planet

Gregory Szorc: Good Riddance to AUFS

vr, 08/12/2017 - 16:00

For over a year, AUFS - a layering filesystem for Linux - has been giving me fits.

As I initially measured last year, AUFS has... suboptimal performance characteristics. The crux of the problem is that AUFS obtains a global lock in the Linux kernel (at least version 3.13) for various I/O operations, including stat(). If you have more than a couple of active CPU cores, the overhead from excessive kernel locking inside _raw_spin_lock() can add more overhead than extra CPU cores add capacity. That's right: under certain workloads, adding more CPU cores actually slows down execution due to cores being starved waiting for a global lock in the kernel!

If that weren't enough, AUFS can also violate POSIX filesystem guarantees under load. It appears that AUFS sometimes forgets about created files or has race conditions that prevent created files from being visible to readers until many seconds later! I think this issue only occurs when there are concurrent threads creating files.

These two characteristics of AUFS have inflicted a lot of hardship on Firefox's continuous integration. Large parts of Firefox's CI execute in Docker. And the host environment for Docker has historically used Ubuntu 14.04 with Linux 3.13 and Docker using AUFS. AUFS was/is the default storage driver for many versions of Docker. When this storage driver is used, all files inside Docker containers are backed by AUFS unless a Docker volume (a directory bind mounted from the host filesystem - EXT4 in our case) is in play.

When we started using EC2 instances with more CPU cores, we weren't getting a linear speedup for CPU bound operations. Instead, CPU cycles were being spent inside the kernel. Stack profiling showed AUFS as the culprit. We were thus unable to leverage more powerful EC2 instances because adding more cores would only provide marginal to negative gains against significant cost expenditure.

We worked around this problem by making heavy use of Docker volumes for tasks incurring significant I/O. This included version control clones and checkouts.

Somewhere along the line, we discovered that AUFS volumes were also the cause of several random file not found errors throughout automation. Initially, we thought many of these errors were due to bugs in the underlying tools (Mercurial and Firefox's build system were common victims because they do lots of concurrent I/O). When the bugs mysteriously went away after ensuring certain operations were performed on EXT4 volumes, we were able to blame AUFS for the myriad of filesystem consistency problems.

Earlier today, we pushed out a change to upgrade Firefox's CI to Linux 4.4 and switched Docker from AUFS to overlayfs (using the overlay2 storage driver). The improvements exceeded my expectations.

Linux build times have decreased by ~4 minutes, from ~750s to ~510s.

Linux Rust test times have decreased by ~4 minutes, from ~615s to ~380s.

Linux PGO build times have decreased by ~5 minutes, from ~2130s to ~1820s.

And this is just the build side of the world. I don't have numbers off hand, but I suspect many tests also got a nice speedup from this change.

Multiplied by thousands of tasks per day and factoring in the cost to operate these machines, the elimination of AUFS has substantially increased the efficiency (and reliability) of Firefox CI and easily saved Mozilla tens of thousands of dollars per year. And that's just factoring in the savings in the AWS bill. Time is money and people are a lot more expensive than AWS instances (you can run over 3,000 c5.large EC2 instances at spot pricing for what it costs to employ me when I'm on the clock). So the real win here comes from Firefox developers being able to move faster because their builds and tests complete several minutes faster.

In conclusion, if you care about performance or filesystem correctness, avoid AUFS. Use overlayfs instead.

Categorieën: Mozilla-nl planet

Dustin J. Mitchell: Parameterized Roles

vr, 08/12/2017 - 16:00

The roles functionality in Taskcluster is a kind of “macro expansion”: given the roles

group:admins -> admin-scope-1
                admin-scope-2
                assume:group:devs
group:devs   -> dev-scope

the scopeset ["assume:group:admins", "my-scope"] expands to

[ "admin-scope-1", "admin-scope-2", "assume:group:admins", "assume:group:devs", "dev-scope", "my-scope", ]

because the assume:group:admins expanded the group:admins role, and that recursively expanded the group:devs role.

However, this macro expansion did not allow any parameters, similar to allowing function calls but without any arguments.

The result is that we have a lot of roles that look the same. For example, project-admin:.. roles all have similar scopes (with the project name included in them), and a big warning in the description saying “DO NOT EDIT”.

Role Parameters

Now we can do better! A role’s scopes can now include <..>. When expanding, this string is replaced by the portion of the scope that matched the * in the roleId. An example makes this clear:

project-admin:* ->
  assume:hook-id:project-<..>/*
  assume:project:<..>:*
  auth:create-client:project/<..>/*
  auth:create-role:hook-id:project-<..>/*
  auth:create-role:project:<..>:*
  auth:delete-client:project/<..>/*
  auth:delete-role:hook-id:project-<..>/*
  auth:delete-role:project:<..>:*
  auth:disable-client:project/<..>/*
  auth:enable-client:project/<..>/*
  auth:reset-access-token:project/<..>/*
  auth:update-client:project/<..>/*
  auth:update-role:hook-id:project-<..>/*
  auth:update-role:project:<..>:*
  hooks:modify-hook:project-<..>/*
  hooks:trigger-hook:project-<..>/*
  index:insert-task:project.<..>.*
  project:<..>:*
  queue:get-artifact:project/<..>/*
  queue:route:index.project.<..>.*
  secrets:get:project/<..>/*
  secrets:set:project/<..>/*

With the above parameterized role in place, we can delete all of the existing project-admin:.. roles: this one will do the job. A client that has assume:project-admin:bugzilla in its scopes will have assume:hook-id:project-bugzilla/* and all the rest in its expandedScopes.

There’s one caveat: a client with assume:project-admin:nss* will have assume:hook-id:project-nss* – note the loss of the trailing /. The * consumes any part of the scope after the <..>. In practice, as in this case, this is not an issue, but it could certainly surprise the unwary.

Implementation

Parameterized roles seem pretty simple, but they’re not!

Efficiency

Before parameterized roles the Taskcluster-Auth service would pre-compute the full expansion of every role. That meant that any API call requiring expansion of a set of scopes only needed to combine the expansion of each scope in the set – a linear operation. This avoided a (potentially exponential-time!) recursive expansion, trading some up-front time pre-computing for a faster response to API calls.

With parameterized roles, such pre-computation is not possible. Depending on the parameter value, the expansion of a role may or may not match other roles. Continuing the example above, the role assume:project:focus:xyz would be expanded when the parameter is focus, but not when the parameter is bugzilla.

The fix was to implement the recursive approach, but in such a way that non-pathological cases have reasonable performance. We use a trie which, given a scope, returns the set of scopes from any matching roles along with the position at which those scopes matched a * in the roleId. In principle, then, we resolve a scopeset by using this trie to expand (by one level) each of the scopes in the scopeset, substituting parameters as necessary, and recursively expand the resulting scopes.

To resolve a scope set, we use a queue to “flatten” the recursion, and keep track of the accumulated scopes as we proceed. We already had some utility functions that allow us to make a few key optimizations. First, it’s only necessary to expand scopes that start with assume: (or, for completeness, things like * or assu*). More importantly, if a scope is already included in the seen scopeset, then we need not enqueue it for recursion – it has already been accounted for.
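As an illustration of the approach, here is a stripped-down Python sketch of the queue-based expansion. It is not the actual Taskcluster-Auth implementation - there is no trie, role matching is reduced to simple prefix handling, and the trailing-* caveat described earlier is ignored - but it shows the parameter substitution and the seen-scope optimization.

def expand_scopes(scopeset, roles):
    """Expand a scope set against (possibly parameterized) roles.

    roles maps a roleId (optionally ending in '*') to a list of scopes,
    which may contain the '<..>' parameter placeholder.
    """
    seen = set(scopeset)
    queue = [s for s in scopeset if s.startswith("assume:")]
    while queue:
        assumed = queue.pop()[len("assume:"):]
        for role_id, role_scopes in roles.items():
            if role_id.endswith("*"):
                prefix = role_id[:-1]
                if not assumed.startswith(prefix):
                    continue
                param = assumed[len(prefix):]   # the part matched by '*'
            elif assumed == role_id:
                param = ""
            else:
                continue
            for role_scope in role_scopes:
                expanded = role_scope.replace("<..>", param)
                if expanded not in seen:        # already accounted for?
                    seen.add(expanded)
                    if expanded.startswith("assume:"):
                        queue.append(expanded)  # recurse via the queue
    return sorted(seen)

roles = {
    "group:admins": ["admin-scope-1", "admin-scope-2", "assume:group:devs"],
    "group:devs": ["dev-scope"],
    "project-admin:*": ["assume:hook-id:project-<..>/*",
                        "secrets:get:project/<..>/*"],
}
print(expand_scopes(["assume:group:admins", "my-scope"], roles))
print(expand_scopes(["assume:project-admin:bugzilla"], roles))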

In the end, the new implementation is tens of milliseconds slower for some of the more common queries. While not ideal, in practice that has not been problematic. If necessary, some simple caching might be added, as many expansions repeat exactly.

Loops

An advantage of the pre-computation was that it could seek a “fixed point” where further expansion does not change the set of expanded scopes. This allowed roles to refer to one another:

some-role -> assume:another-role
another*  -> assume:some-role

A naïve recursive resolver might loop forever on such an input, but could easily track already-seen scopes and avoid recursing on them again. The situation is much worse with parameterized roles. Consider:

some-role-*    -> assume:another-role-<..>x
another-role-* -> assume:some-role-<..>y

A simple recursive expansion of assume:some-role-abc would result in an infinite set of roles:

assume:another-role-abcx
assume:some-role-abcxy
assume:another-role-abcxyx
assume:some-role-abcxyxy
...

We forbid such constructions using a cycle check, configured to reject only cycles that involve parameters. That permits the former example while prohibiting the latter.
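A rough sketch of that policy, assuming the role graph has already been reduced to an adjacency mapping with a flag per edge (deriving that mapping from the actual scope patterns is the hard part and is not shown here):

def find_parameterized_cycle(edges):
    """Return a cycle that involves a parameterized edge, or None.

    edges maps a roleId to a list of (other_roleId, uses_parameter) pairs.
    """
    def dfs(node, path, flags):
        for nxt, uses_param in edges.get(node, []):
            if nxt in path:
                i = path.index(nxt)
                # Only cycles with at least one parameterized edge are rejected.
                if uses_param or any(flags[i:]):
                    return path[i:] + [nxt]
                continue
            found = dfs(nxt, path + [nxt], flags + [uses_param])
            if found:
                return found
        return None

    for start in edges:
        cycle = dfs(start, [start], [])
        if cycle:
            return cycle
    return None

# A loop with no parameters involved: permitted.
ok = {"some-role": [("another-role", False)],
      "another-role": [("some-role", False)]}
# A loop that passes a growing parameter back and forth: rejected.
bad = {"some-role-*": [("another-role-*", True)],
       "another-role-*": [("some-role-*", True)]}

print(find_parameterized_cycle(ok))   # None
print(find_parameterized_cycle(bad))  # a cycle, so this configuration is refused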

Atomic Modifications

But even that is not enough! The existing implementation of roles stored each role in a row in Azure Table Storage. Azure provides concurrent access to and modification of rows in this storage, so it’s conceivable that two roles which together form a cycle could be added simultaneously. Cycle checks for each row insertion would each see only one of the rows, but the result after both insertions would cause a cycle. Cycles will crash the Taskcluster-Auth service, which will bring down the rest of Taskcluster. Then a lot of people will have a bad day.

To fix this, we moved roles to Azure Blob Storage, putting all roles in a single blob. This service uses ETags to implement atomic modifications, so we can perform a cycle check before committing and be sure that no cyclical configuration is stored.
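For illustration, here is a rough sketch of the ETag-guarded read-modify-write pattern this enables, expressed against Azure Blob Storage's HTTP conditional-request mechanism using the requests library. The URL, auth headers, and cycle check are placeholders; the real service has its own storage layer.

import json
import requests

BLOB_URL = "https://example.blob.core.windows.net/auth/roles.json"  # placeholder

def check_for_cycles(roles):
    pass  # stands in for the parameter-aware cycle check described above

def update_roles(modify, auth_headers):
    """Apply modify() to the stored roles, retrying if someone races us."""
    while True:
        resp = requests.get(BLOB_URL, headers=auth_headers)
        resp.raise_for_status()
        roles = json.loads(resp.text)
        etag = resp.headers["ETag"]

        new_roles = modify(roles)
        check_for_cycles(new_roles)  # reject cyclic configurations up front

        put = requests.put(
            BLOB_URL,
            data=json.dumps(new_roles),
            headers={**auth_headers,
                     "x-ms-blob-type": "BlockBlob",
                     # Only write if nobody modified the blob since we read it.
                     "If-Match": etag},
        )
        if put.status_code == 412:  # Precondition Failed: lost the race
            continue                # re-read and try again
        put.raise_for_status()
        return new_roles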

What’s Next

The parameterized role support is running in production now, but we have not yet updated any roles, aside from a few test roles, to use it. The next steps are to use the support to address a few known weak points in role configuration, including the project administration roles used as an example above.

Categorieën: Mozilla-nl planet

Alessio Placitelli: Add-on recommendations for Firefox users: a prototype recommender system leveraging existing data sources

vr, 08/12/2017 - 12:59
By: Alessio Placitelli, Ben Miroglio, Jason Thomas, Shell Escalante and Martin Lopatka. With special recognition of the development efforts of Roberto Vitillo who kickstarted this project, Mauro Doglio for massive contributions to the code base during his time at Mozilla, Florian Hartmann, who contributed efforts towards prototyping the ensemble linear combiner, Stuart Colville for coordinating … →
Categorieën: Mozilla-nl planet

Ryan Harter: CLI for alerts via Slack

vr, 08/12/2017 - 09:00

I finally got a chance to scratch an itch today.

Problem

When working with bigger ETL jobs, I frequently run into jobs that take hours to run. I usually either step away from the computer or work on something less important while the job runs. I don't have a good …

Categorieën: Mozilla-nl planet

Mozilla Open Innovation Team: From Visual Assets to WebVR Experiences: Mozilla Launches Second Part of its WebVR Medieval Fantasy…

do, 07/12/2017 - 18:00
From Visual Assets to WebVR Experiences: Mozilla Launches Second Part of its WebVR Medieval Fantasy Challenge!

Mozilla seeks to continually grow a robust community around A-Frame and WebVR. This is why we partnered with Sketchfab to create hundreds of medieval fantasy assets for the WebVR community to use. Today we are moving forward to use these assets in new and innovative ways such as games, scenes and character interaction.

Launching part two of our WebVR Medieval Fantasy Design Challenge, we are calling for developers to create stories and scenes that build on the visually stunning collection of assets previously created by talented designers from all around the world.

This challenge started December 5th and will run until February 28th. We are excited to be able to offer great prizes such as a VR-enabled laptop, an Oculus headset, Pixel phones, Daydream headsets, and even a few Lenovo Jedi Challenge kits.

Through this challenge, we hope to get submissions not only from people who have been using WebVR for a long time but also those who are new to the community and would like to take the plunge into the world of WebVR!

Thor and the Midgard Serpent by MrEmjeR on Sketchfab

Therefore, we have written an extensive tutorial to get new users started in A-Frame and will be using a slack group specifically for this challenge to support new users with quick tips and answers to questions. You can also review the A-Frame blog for more information on building and for project inspiration.

Baba Yaga’s hut by Inuciian on Sketchfab

Check out the Challenge Site for further information and how to submit your work. We can’t wait to see what you come up with!

From Visual Assets to WebVR Experiences: Mozilla Launches Second Part of its WebVR Medieval Fantasy… was originally published in Mozilla Open Innovation on Medium, where people are continuing the conversation by highlighting and responding to this story.

Categorieën: Mozilla-nl planet

The Firefox Frontier: New Firefox Preference Center, Feels As Fast As It Runs

do, 07/12/2017 - 15:44

We’ve released an all-new Firefox to the world, and it includes a lot of new browser technology that’s super fast. People and press everywhere are loving it. We’ve made many … Read more

The post New Firefox Preference Center, Feels As Fast As It Runs appeared first on The Firefox Frontier.

Categorieën: Mozilla-nl planet

The Mozilla Blog: Mozilla is Funding Art About Online Privacy and Security

do, 07/12/2017 - 15:15
Mozilla’s Creative Media Grants support art and artists raising awareness about surveillance, tracking, and hacking

 

La convocatoria para solicitudes está disponible en Español aquí

The Mozilla Manifesto states that “Individuals’ security and privacy on the Internet are fundamental and must not be treated as optional.”

Today, Mozilla is seeking artists, media producers, and storytellers who share that belief — and who use their art to make a difference.

Mozilla’s Creative Media Grants program is now accepting submissions. The program awards grants ranging from $10,000 to $35,000 for films, apps, storytelling, and other forms of media that explore topics like mass surveillance and the erosion of online privacy.

What we’re looking for

We seek to support producers creating work on the web, about the web, and for a broad public. Producers should share Mozilla’s concern that the private communications of internet citizens are increasingly being monitored and monetized by state and corporate actors.

As we move to an era of ubiquitous and connected digital technology, Mozilla sees a vital role for media produced in the public interest that advocates for internet citizens being informed, empowered and in control of their digital lives.

Imagine: An open-source browser extension that reveals how much Facebook really knows about you. Or artwork and journalism that examine how women’s personal data is tracked and commodified online.

(These are real projects, created by artists who now receive Mozilla Creative Media grants. Learn more about Data Selfie and Chupadados.)

The audiences for this work should be primarily in Europe and Latin America.

While this does not preclude makers from other regions from applying, content and approach must be relevant to one of these two regions.

How to apply

To apply, download the application guide.

Lee la descripción del proyecto en Español aquí.

All applications must be in English, and applicants are encouraged to read the application guide. Applications will be open until Midnight Pacific time December 31st, 2017. Applications will be reviewed January 2018.

The post Mozilla is Funding Art About Online Privacy and Security appeared first on The Mozilla Blog.

Categorieën: Mozilla-nl planet

William Lachance: Perfherder summer of contribution thoughts

di, 29/09/2015 - 06:00

A few months ago, Joel Maher announced the Perfherder summer of contribution. We wrapped things up there a few weeks ago, so I guess it’s about time I wrote up a bit about how things went.

As a reminder, the idea of summer of contribution was to give a set of contributors the opportunity to make a substantial contribution to a project we were working on (in this case, the Perfherder performance sheriffing system). We would ask that they sign up to do 5–10 hours of work a week for at least 8 weeks. In return, Joel and myself would make ourselves available as mentors to answer questions about the project whenever they ran into trouble.

To get things rolling, I split off a bunch of work that we felt would be reasonable to do by a contributor into bugs of varying difficulty levels (assigning them the bugzilla whiteboard tag ateam-summer-of-contribution). When someone first expressed interest in working on the project, I’d assign them a relatively easy front end one, just to cover the basics of working with the project (checking out code, making a change, submitting a PR to github). If they made it through that, I’d go on to assign them slightly harder or more complex tasks which dealt with other parts of the codebase, the nature of which depended on what they wanted to learn more about. Perfherder essentially has two components: a data storage and analysis backend written in Python and Django, and a web-based frontend written in JS and Angular. There was (still is) lots to do on both, which gave contributors lots of choice.

This system worked pretty well for attracting people. I think we got at least 5 people interested and contributing useful patches within the first couple of weeks. In general I think onboarding went well. Having good documentation for Perfherder / Treeherder on the wiki certainly helped. We had lots of the usual problems getting people familiar with git and submitting proper pull requests: we use a somewhat clumsy combination of bugzilla and github to manage treeherder issues (we “attach” PRs to bugs as plaintext), which can be a bit offputting to newcomers. But once they got past these issues, things went relatively smoothly.

A few weeks in, I set up a fortnightly skype call for people to join and update status and ask questions. This proved to be quite useful: it let me and Joel articulate the higher-level vision for the project to people (which can be difficult to summarize in text) but more importantly it was also a great opportunity for people to ask questions and raise concerns about the project in a free-form, high-bandwidth environment. In general I’m not a big fan of meetings (especially status report meetings) but I think these were pretty useful. Being able to hear someone else’s voice definitely goes a long way to establishing trust that you just can’t get in the same way over email and irc.

I think our biggest challenge was retention. Due to (understandable) time commitments and constraints only one person (Mike Ling) was really able to stick with it until the end. Still, I’m pretty happy with that success rate: if you stop and think about it, even a 10-hour a week time investment is a fair bit to ask. Some of the people who didn’t quite make it were quite awesome, I hope they come back some day.

On that note, a special thanks to Mike Ling for sticking with us this long (he’s still around and doing useful things long after the program ended). He’s done some really fantastic work inside Perfherder and the project is much better for it. I think my two favorite features that he wrote up are the improved test chooser which I talked about a few months ago and a get related platform / branch feature which is a big time saver when trying to determine when a performance regression was first introduced.

I took the time to do a short email interview with him last week. Here’s what he had to say:

  1. Tell us a little bit about yourself. Where do you live? What is it you do when not contributing to Perfherder?

I’m a postgraduate student of NanChang HangKong university in China whose major is Internet of things. Actually,there are a lot of things I would like to do when I am AFK, play basketball, video game, reading books and listening music, just name it ; )

  2. How did you find out about the ateam summer of contribution program?

well, I remember when I still a new comer of treeherder, I totally don’t know how to start my contribution. So, I just go to treeherder irc and ask for advice. As I recall, emorley and jfrench talk with me and give me a lot of hits. Then Will (wlach) send me an Email about ateam summer of contribution and perfherder. He told me it’s a good opportunity to learn more about treeherder and how to work like a team! I almost jump out of bed (I receive that email just before get asleep) and reply with YES. Thank you Will!

  3. What did you find most challenging in the summer of contribution?

I think the most challenging thing is I not only need to know how to code but also need to know how treeherder actually work. It’s a awesome project and there are a ton of things I haven’t heard before (i.e T-test, regression). So I still have a long way to go before I familiar with it.

  4. What advice would you give to future ateam contributors?

The only thing you need to do is bring your question to irc and ask. Do not hesitate to ask for help if you need it! All the people in here are nice and willing to help. Enjoy it!

Categorieën: Mozilla-nl planet

William Lachance: More Perfherder updates

vr, 07/08/2015 - 06:00

Since my last update, we’ve been trucking along with improvements to Perfherder, the project for making Firefox performance sheriffing and analysis easier.

Compare visualization improvements

I’ve been spending quite a bit of time trying to fix up the display of information in the compare view, to address feedback from developers and hopefully generally streamline things. Vladan (from the perf team) referred me to Blake Winton, who provided tons of awesome suggestions on how to present things more concisely.

Here’s an old versus new picture:

[Screenshots: the old compare view (2015-07-14) and the new compare view (2015-08-07)]

Summary of significant changes in this view:

  • Removed or consolidated several types of numerical information which were overwhelming or confusing (e.g. presenting both numerical and percentage standard deviation in their own columns).
  • Added tooltips all over the place to explain what’s being displayed.
  • Highlight more strongly when it appears there aren’t enough runs to make a definitive determination on whether there was a regression or improvement.
  • Improve display of visual indicator of magnitude of regression/improvement (providing a pseudo-scale showing where the change ranges from 0% – 20%+).
  • Provide more detail on the two changesets being compared in the header and make it easier to retrigger them (thanks to Mike Ling).
  • Much better and more intuitive error handling when something goes wrong (also thanks to Mike Ling).

The point of these changes isn’t necessarily to make everything “immediately obvious” to people. We’re not building general purpose software here: Perfherder will always be a rather specialized tool which presumes significant domain knowledge on the part of the people using it. However, even for our audience, it turns out that there’s a lot of room to improve our presentation: reducing the amount of extraneous noise helps people zero in on the things they really need to care about.

Special thanks to everyone who took time out of their schedules to provide so much good feedback, in particular Avi Halmachi, Glandium, and Joel Maher.

Of course more suggestions are always welcome. Please give it a try and file bugs against the perfherder component if you find anything you’d like to see changed or improved.

Getting the word out

Hammersmith:mozilla-central wlach$ hg push -f try
pushing to ssh://hg.mozilla.org/try
no revisions specified to push; using . to avoid pushing multiple heads
searching for changes
remote: waiting for lock on repository /repo/hg/mozilla/try held by 'hgssh1.dmz.scl3.mozilla.com:8270'
remote: got lock after 4 seconds
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: Trying to insert into pushlog.
remote: Inserted into the pushlog db successfully.
remote:
remote: View your change here:
remote:   https://hg.mozilla.org/try/rev/e0aa56fb4ace
remote:
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0aa56fb4ace
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=e0aa56fb4ace

Try pushes incorporating Talos jobs now automatically link to perfherder’s compare view, both in the output from mercurial and in the emails the system sends. One of the challenges we’ve been facing up to this point is just letting developers know that Perfherder exists and it can help them either avoid or resolve performance regressions. I believe this will help.

Data quality and ingestion improvements

Over the past couple weeks, we’ve been comparing our regression detection code when run against Graphserver data to Perfherder data. In doing so, we discovered that we’ve sometimes been using the wrong algorithm (geometric mean) to summarize some of our tests, leading to unexpected and less meaningful results. For example, the v8_7 benchmark uses a custom weighting algorithm for its score, to account for the fact that the things it tests have a particular range of expected values.
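As a toy illustration of why the summarization algorithm matters, the same set of subtest results can produce very different single-number summaries. The subtest names, values, and weights below are made up, and v8_7's real scoring is more involved than a weighted mean; this only shows how much the choice of summary function changes the result.

import math

subtest_scores = {"crypto": 12000.0, "splay": 300.0, "regexp": 4500.0}
weights = {"crypto": 1.0, "splay": 3.0, "regexp": 2.0}

# Geometric mean: treats every subtest equally, in log space.
geometric_mean = math.exp(
    sum(math.log(v) for v in subtest_scores.values()) / len(subtest_scores)
)

# Weighted arithmetic mean: emphasizes the subtests the benchmark cares about.
weighted_mean = (
    sum(weights[k] * v for k, v in subtest_scores.items()) / sum(weights.values())
)

print(round(geometric_mean, 1))  # ~2530.3
print(round(weighted_mean, 1))   # 3650.0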

To hopefully prevent this from happening again in the future, we’ve decided to move the test summarization code out of Perfherder back into Talos (bug 1184966). This has the additional benefit of creating a stronger connection between the content of the Talos logs and what Perfherder displays in its comparison and graph views, which has thrown people off in the past.

Continuing data challenges

Having better tools for visualizing this stuff is great, but it also highlights some continuing problems we’ve had with data quality. It turns out that our automation setup often produces qualitatively different performance results for the exact same set of data, depending on when and how the tests are run.

A certain amount of random noise is always expected when running performance tests. As much as we might try to make them uniform, our testing machines and environments are just not 100% identical. That we expect and can deal with: our standard approach is just to retrigger runs, to make sure we get a representative sample of data from our population of machines.

The problem comes when there’s a pattern to the noise: we’ve already noticed that tests run on the weekends produce different results (see Joel’s post from a year ago, “A case of the weekends”) but it seems as if there’s other circumstances where one set of results will be different from another, depending on the time that each set of tests was run. Some tests and platforms (e.g. the a11yr suite, MacOS X 10.10) seem particularly susceptible to this issue.

We need to find better ways of dealing with this problem, as it can result in a lot of wasted time and energy, for both sheriffs and developers. See for example bug 1190877, which concerned a completely spurious regression on the tresize benchmark that was initially blamed on some changes to the media code – in this case, Joel speculates that the linux64 test machines we use might have changed from under us in some way, but we really don’t know yet.

I see two approaches possible here:

  1. Figure out what’s causing the same machines to produce qualitatively different result distributions and address that. This is of course the ideal solution, but it requires coordination with other parts of the organization, who are likely quite busy, so it might be hard.
  2. Figure out better ways of detecting and managing these sorts of cases. I have noticed that the standard deviation inside the results tends to be higher when we have spurious regressions/improvements (see for example this compare view for the aforementioned “regression”). Knowing what we do, maybe there are statistical methods we could use to detect bad data? (See the sketch after this list for one simple idea.)
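Here is the kind of simple check I have in mind: a minimal sketch with made-up numbers and an arbitrary threshold, not something Perfherder does today. It flags a set of replicate runs whose relative standard deviation is suspiciously high.

import statistics

def looks_noisy(replicates, max_rel_stddev=0.05):
    """Return True if the replicates vary more than we normally expect."""
    mean = statistics.mean(replicates)
    stddev = statistics.stdev(replicates)
    return (stddev / mean) > max_rel_stddev

stable = [612.0, 608.5, 615.2, 610.9]      # ordinary run-to-run noise
suspicious = [612.0, 498.3, 701.7, 590.2]  # something changed under us

print(looks_noisy(stable))      # False
print(looks_noisy(suspicious))  # True: worth retriggering or distrusting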

For now, I’m leaning towards (2). I don’t think we’ll ever completely solve this problem and I think coming up with better approaches to understanding and managing it will pay the largest dividends. Open to other opinions of course!

Categorieën: Mozilla-nl planet

William Lachance: Perfherder update

di, 14/07/2015 - 06:00

Haven’t been doing enough blogging about Perfherder (our project to make Talos and other per-checkin performance data more useful) recently. Let’s fix that. We’ve been making some good progress, helped in part by a group of new contributors that joined us through an experimental “summer of contribution” program.

Comparison mode

Inspired by Compare Talos, we’ve designed something similar which hooks into the perfherder backend. This has already gotten some interest: see this post on dev.tree-management and this one on dev.platform. We’re working towards building something that will be really useful both for (1) illustrating that the performance regressions we detect are real and (2) helping developers figure out the impact of their changes before they land them.

[Screenshots of the comparison view (2015-07-14)]

Most of the initial work was done by Joel Maher with lots of review for aesthetics and correctness by me. Avi Halmachi from the Performance Team also helped out with the t-test model for detecting the confidence that we have that a difference in performance was real. Lately myself and Mike Ling (one of our summer of contribution members) have been working on further improving the interface for usability — I’m hopeful that we’ll soon have something implemented that’s broadly usable and comprehensible to the Mozilla Firefox and Platform developer community.
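For the curious, the core of that confidence calculation is just a two-sample t-test over the replicate runs from the base and new revisions. The sketch below uses scipy and made-up numbers; it shows the underlying statistics, not Perfherder's exact implementation.

from scipy import stats

base_runs = [612.0, 608.5, 615.2, 610.9, 613.4]
new_runs = [598.1, 601.7, 596.4, 600.2, 599.0]

# Welch's t-test: do the two sets of runs plausibly share the same mean?
t_stat, p_value = stats.ttest_ind(base_runs, new_runs, equal_var=False)

base_mean = sum(base_runs) / len(base_runs)
new_mean = sum(new_runs) / len(new_runs)
delta_pct = 100.0 * (new_mean / base_mean - 1)

# A small p-value means the difference is unlikely to be run-to-run noise.
print("change: %+.1f%%, p-value: %.4f" % (delta_pct, p_value))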

Graphs improvements

Although it’s received slightly less attention lately than the comparison view above, we’ve been making steady progress on the graphs view of performance series. Aside from demonstrations and presentations, the primary use case for this is being able to detect visually sustained changes in the result distribution for talos tests, which is often necessary to be able to confirm regressions. Notable recent changes include a much easier way of selecting tests to add to the graph from Mike Ling and more readable/parseable urls from Akhilesh Pillai (another summer of contribution participant).

[Screenshot of the graphs view (2015-07-14)]

Performance alerts

I’ve also been steadily working on making Perfherder generate alerts when there is a significant discontinuity in the performance numbers, similar to what GraphServer does now. Currently we have an option to generate a static CSV file of these alerts, but the eventual plan is to insert these things into a persistent database. After that’s done, we can actually work on creating a UI inside Perfherder to replace alertmanager (which currently uses GraphServer data) and start using this thing to sheriff performance regressions — putting the herder into perfherder.

As part of this, I’ve converted the graphserver alert generation code into a standalone python library, which has already proven useful as a component in the Raptor project for FirefoxOS. Yay modularity and reusability.

Python API

I’ve also been working on creating and improving a python API to access Treeherder data, which includes Perfherder. This lets you do interesting things, like dynamically run various types of statistical analysis on the data stored in the production instance of Perfherder (no need to ask me for a database dump or other credentials). I’ve been using this to perform validation of the data we’re storing and debug various tricky problems. For example, I found out last week that we were storing up to 200 duplicate entries in each performance series due to double data ingestion — oops.

You can also use this API to dynamically create interesting graphs and visualizations using ipython notebook, here’s a simple example of me plotting the last 7 days of youtube.com pageload data inline in a notebook:

[Screenshot: an ipython notebook cell plotting youtube.com pageload data]
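For readers who only have the text: here is a rough, hypothetical stand-in for such a notebook cell. It assumes the series has already been fetched through the client as (timestamp, value) pairs - the data points below are invented - and only shows the plotting half.

from datetime import datetime, timedelta
import matplotlib.pyplot as plt

# Pretend these pairs came back from the Perfherder/Treeherder client.
now = datetime(2015, 7, 14)
series = [(now - timedelta(days=7) + timedelta(hours=6 * i),
           620.0 + (i % 5) * 3.5)
          for i in range(28)]

times, values = zip(*series)
plt.plot(times, values, marker=".")
plt.title("youtube.com pageload, last 7 days (made-up data)")
plt.ylabel("load time (ms)")
plt.gcf().autofmt_xdate()
plt.show()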

[original]

Categorieën: Mozilla-nl planet
