I have been working recently on a new Platform Engineering initiative called Uptime, the goal of which is to reduce Firefox’s crash rate on both desktop and mobile. As a result I’ve been spending a lot of time looking at crash reports, particularly on the Nightly channel. This in turn has increased my appreciation of how important Nightly channel users are.
A crash report from a Nightly user is much more useful than a crash report from a non-Nightly user, for two reasons.
- If a developer lands a change that triggers crashes for Nightly users, they will get fast feedback via crash reports, often within a day or two. This maximizes the likelihood of a fix, because the particular change will be fresh in the developer’s mind. Also, backing out changes is usually easy at this point. In contrast, finding out about a crash weeks or months later is less useful.
- Because a new Nightly build is done every night, if a new crash signature appears, we have a fairly small regression window. This makes it easier to identify which change caused the new crashes.
Also, Nightly builds contain some extra diagnostics and checks that can help identify a range of problems. (See MOZ_DIAGNOSTIC_ASSERT for one example.)
If we could significantly increase the size of our Nightly user population, that would definitely help reduce crash rates. We would get data about a wider range of crashes. We would also get stronger signals for specific crash-causing defects. This is important because the number of crash reports received for each Nightly build is relatively low, and it’s often the case that a cluster of crash reports that come from two or more different users will receive more attention than a cluster that comes from a single user.
(You might be wondering how we distinguish those two cases. Each crash report doesn’t contain enough information to individually identify the user — unless the user entered their email address into the crash reporting form — but crash reports do contain enough information that you can usually tell if two different crash reports have come from two different users. For example, the installation time alone is usually enough, because it’s measured to the nearest second.)
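As a rough illustration of that idea, here is a minimal sketch that counts how many distinct installations contributed to each crash signature, using install time as the distinguishing key. The field names `signature` and `install_time` and the sample data are assumptions made for this example, not the actual crash report schema.

```python
from collections import defaultdict


def distinct_installs_per_signature(reports):
    """Map each crash signature to the number of distinct install times seen.

    `reports` is an iterable of dicts with hypothetical keys
    "signature" and "install_time" (seconds since the epoch).
    """
    installs = defaultdict(set)
    for report in reports:
        installs[report["signature"]].add(report["install_time"])
    return {sig: len(times) for sig, times in installs.items()}


# Hypothetical sample data: two reports from one install, two from different installs.
reports = [
    {"signature": "OOM | small", "install_time": 1462000001},
    {"signature": "OOM | small", "install_time": 1462000001},
    {"signature": "js::Foo", "install_time": 1462000001},
    {"signature": "js::Foo", "install_time": 1461555555},
]
counts = distinct_installs_per_signature(reports)
print(counts)
```

A cluster whose count is 2 or more very likely comes from more than one user, which is the case described above as receiving more attention.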
All this is doubly true on Android, where the number of Nightly users is much smaller than on Windows, Mac and Linux.
Using the Nightly channel is not the best choice for everyone. There are some disadvantages.
- Nightly is less stable than later channels, but not drastically so. The crash rate is typically 1.5–2.5 times higher than Beta or Release, though occasionally it spikes higher for a short period. So a Nightly user should be comfortable with the prospect of less stability.
- Nightly gets updated every 24 hours, which some people would find annoying.
There are also advantages.
- Nightly users get to experience new features and fixes immediately.
- Nightly users get the satisfaction that they are helping produce a better Firefox. The frustration of any crash is offset by the knowledge that the information in the corresponding crash report is disproportionately valuable. Indeed, there’s a non-trivial likelihood that a single crash report from a Nightly user will receive individual attention from an engineer.
If you, or somebody you know, thinks that those advantages outweigh the disadvantages, please consider switching. Thank you.
Following up on my previous post, I counted the fraction of instructions in Firefox opt/debug libxul.so that use each XMM/YMM register.
- As before, debug builds are heavily weighted towards use of the first few registers, and opt builds allocate across more registers as you'd expect.
- In debug builds, usage of the higher-numbered registers (up to 7) is a combination of va_start spilling all parameter registers (0-7) to the stack, and handwritten-assembly. It looks like almost all the handwritten assembly in Firefox restricts itself to registers 0-7, presumably so it works in x86-32 as well. Maybe some of that code would benefit from being updated for x86-64 with more registers?
- In opt builds there's a clear drop-off in usage after register 7, more than can be explained by handwritten assembly or va_start spilling (since those equally affect debug). It's not related to caller/callee-saves status because all MM registers are caller-saves on Linux. It appears that in some functions experiencing moderate register pressure, gcc has freely used registers 0-7 but avoided using 8-15. Maybe that's because the latter require longer instruction encodings in some cases. You don't see the same drop-off moving to the upper eight GP registers, which have the same encoding length issue, but that may be because of callee-saves and generally increased register pressure.
- In libxul at least, MM registers are used far less often than GP registers. Register 0, the most-used by far, is used by barely 1% of instructions, comparable to the least-used GP registers. Registers 8 to 15 are each used by less than 0.1% of instructions.
As before, these are static counts and I'd expect weighting instructions by dynamic frequency would change the results dramatically --- on the right workloads --- since most of the hand-written assembly in Firefox is hand-written specifically to optimize use of MM registers in hot loops.
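For readers who want to try a similar static count, here is a minimal sketch. It is not necessarily the method used for the numbers above. It assumes the input is the text output of `objdump -d`, where each instruction line has the tab-separated form address, hex bytes, mnemonic and operands, and it treats XMMn and YMMn as the same register n. The function name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches xmm0..xmm15 and ymm0..ymm15 in either AT&T (%xmm0) or Intel (xmm0) syntax.
VECTOR_REG = re.compile(r"\b[xy]mm(\d+)\b")


def register_usage_fractions(disassembly_lines):
    """Return {register number: fraction of instructions that mention it}."""
    total = 0
    uses = Counter()
    for line in disassembly_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip labels, blank lines and byte-only continuation lines.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        total += 1
        text = "\t".join(fields[2:])
        # Count each register at most once per instruction.
        for reg in {int(n) for n in VECTOR_REG.findall(text)}:
            uses[reg] += 1
    if total == 0:
        return {}
    return {reg: count / total for reg, count in sorted(uses.items())}


# Hypothetical sample resembling objdump output.
sample = [
    "0000000000401000 <example>:\n",
    "  401000:\t66 0f ef c0          \tpxor   %xmm0,%xmm0\n",
    "  401004:\tc5 fc 58 c1          \tvaddps %ymm1,%ymm0,%ymm0\n",
    "  401008:\t48 89 e5             \tmov    %rsp,%rbp\n",
    "  40100b:\tc3                   \tret\n",
]
fractions = register_usage_fractions(sample)
print(fractions)
```

To run it on a real library, the same function can be fed the lines of `objdump -d libxul.so` read from standard input. These remain static counts, so the caveat about dynamic frequency applies equally here.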
Update: One interesting takeaway is that you have eight huge registers (256 bits each, 512 soon) unused by most code. That creates some interesting possibilities...
Intern Q&A: Bugzilla, Session 1
In this role, Alex Salkever will be responsible for driving strategic positioning and marketing communications campaigns. Alex will oversee the global communications, social media, user support and content marketing teams and work across the organization to develop impactful outbound communications for Mozilla and Firefox products.
Alex was most recently Chief Marketing Officer of Silk.co, a data publishing and visualization startup, where he led efforts focused on user growth and platform partnerships. Alex has held a variety of senior marketing, marketing communications and product marketing roles working on products in the fields of scientific instruments, cloud computing, telecommunications and Internet of Things. In these various capacities, Alex has managed campaigns across all aspects of marketing and product marketing including PR, content marketing, user acquisition, developer marketing and marketing analytics.
Alex also brings to Mozilla his experience as a former Technology Editor for BusinessWeek.com. Among his many accomplishments, Alex is the co-author of “The Immigrant Exodus”, a book named to The Economist Book of the Year List in the Business Books category in 2012.
mconley livehacks on real Firefox bugs while thinking aloud.
This is the SUMO weekly call
When you say "remote workers", you have already failed in your intent to have an effective team. Companies setting up a "friendly remote work environment" are most of the time failing to understand the nature of the issues. The best way to set up a distributed team is to forget about the workers and focus on the work itself.
- Mike Taylor in Texas, USA
- Hallvord Steen in Norway
- Adam Stevenson in Ontario, Canada
- Karl Dubost (myself) in Kanagawa, Japan
- We have contributors in India, Japan, France, Romania, Brazil, Mexico, etc.
We are working together.
The most important part of creating a successful distributed team is to stop thinking that there are remote workers on your team. You need to consider that the work can be done from anywhere by anyone competent for the job. This will give you a good base for organizing the work in terms of process, protocols and tools to be productive and effective. That's the key, the only one.
- Choose open first: Opening a private discussion is a lot harder than making a private comment on a discussion.
- Record Action Items: Have action items which are identifiable by all the team members you are working with (and more broadly when possible; see "Choose open first"). These items need an owner, an unambiguous actionable task, a target or context and a deadline.
- Record any meetings: When there is a meeting, write down detailed minutes on the spot. Give these minutes a unique and stable URI. The context might be useful for another team or a new employee later on.
- Share your work assets with others: Give access to anything you produce: code, documents, etc. Share it as early as possible, again with stable and public URIs.
- Share your worklog: This helps others to decide if they can request more things from you. This will help them to decide if they can make progress on their own job.
- Web Archived Mailing-Lists: Set up your mailing-list archives in a way that makes them accessible to everyone (to the world if possible, to the entire company, and in some very rare contexts to only your team).
There are many other small tips to make this more effective, but these will go a long way in achieving your goals.
The amazing benefit of working that way is that it doesn't only allow for a distributed team; it also makes the whole organization more robust by having solid information flow management.
PS: I have been working in a distributed way at W3C, Opera and now Mozilla for the last 15 years. In my work history, I still consider the W3C (2000-2008) the best place for distributed work among staff. I don't know about W3C today.
I recently got to spend a week back at the heart of an excellent, delightful, inspiring technical community: Recurse Center or RC. This friendly group consists mostly of programmers from around the world who have, at some point, participated in RC’s three-month “retreat” in New York City to work on whatever projects happen to interest them. The retreat’s motto is “never graduate”, and so participants continue to support each other’s technical growth and curiosity forever and ever.
I’m an RC alum from 2014! RC’s retreat is how I ended up contributing to open source software and eventually gathering the courage to join Mozilla. Before RC, despite already having thousands of hours of programming and fancy math under my belt, I held myself back with doubts about whether I’m a “real programmer”, whatever that stereotype means. That subconscious negativity hasn’t magically disappeared, but I’ve had a lot of good experiences in the past few years to help me manage it. Today, RC helps me stay excited about learning all the things for the sake of learning all the things.
A retreat at RC looks something like this: you put your life more-or-less on hold, move to NYC, and spend three months tinkering in a big, open office with around fifty fellow (thoughtful, kind, enthusiastic) programmers. During my 2014 retreat, I worked mostly on lowish-level networking things in Python, pair programmed on whatever else people happened to be working on, gave and received code review, chatted with wise “residents”, attended spontaneous workshops, presentations and so on.
- How to implement a basic debugger?
- How to improve the technical interview process?
- What holds developers back or slows them down? What unnecessary assumptions do we have about our tools and their limitations?
RC’s retreat is a great environment for growing as a developer, but I don’t want to make it sound like it’s all effortless whimsy. Both the hardest and most wonderful part of RC (and many other groups) is being surrounded by extremely impressive, positive people who never seem to struggle with anything. It’s easy to slip into showing off our knowledge or to get distracted by measuring ourselves against our peers. Sometimes this is impostor syndrome. Sometimes it’s the myth of the 10x developer. RC puts a lot of effort into being a safe space where you can reveal your ignorance and ask questions, but insecurity can always be a challenge.
Similarly, the main benefit of RC is learning from your peers, but the usual ways of doing this seem to be geared toward people who are outgoing and think out loud. These are valuable skills, but when we focus on them exclusively we don’t hear from people who have different defaults. There is also little structure provided by RC so you are free to self-organize and exchange ideas as you deem appropriate. The risk is that quiet people are allowed to hide in their quiet corners, and then everyone misses out on their contributions. I think RC makes efforts to balance this out, but the overall lack of structure means you really have to take charge of how you learn from others. I’m definitely better at this than I used to be.
RC is an experiment and it’s always changing. Although at this point my involvement is mostly passive, I’m glad to be a part of it. I love that I’ve been able to work closely with vastly different people, getting an inside look at their work habits and ways of thinking. Now, long after my “never-graduation”, the RC community continues to expose me to a variety of ideas about technology and learning in a way that makes us all get better. Continuous improvement, yeah!
If you want to schedule a real task from the command line, feel free to give it a try:
Here's the output of scheduling a Linux64 debug task.
NOTE: It will not post to Treeherder
NOTE: It will open a new tab asking you to grant access to your TaskCluster temp credentials.
```
(TC_scheduling) armenzg@armenzg-thinkpad:~/repos/TC_developer_scheduling_experiments$ python schedule_linux64_task.py
04:48:50 root Setting INFO level
04:48:50 mozci.taskcluster.tc We're going to open a new tab and authenticate you with TaskCluster.
Opening browser window to login.taskcluster.net
Asking you to grant temporary credentials to:
04:48:54 mozci.taskcluster.tc Inspect the task in https://tools.taskcluster.net/task-inspector/#bmt-5IqPTwmn8JrMzdofGg
```
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.