This is the 2014q3 Input post-mortem. It was a better quarter than 2014q2--that one kind of sucked: it started out well, then got busy, and by the end I was pretty overwhelmed.
Things to know:
- Input is Mozilla's product feedback site.
- Fjord is the code that runs Input.
- I unilaterally decided to extend 2014q3 to October 6th.
- I am Will Kahn-Greene and I'm the primary developer on Input.
We added a bunch of code this quarter:
- 2014q3: October 7th, 2014: 23466 total, 11614 Python
Compare to previous quarters:
- 2014q1: April 1st, 2014: 15195 total, 6953 Python
- 2014q2: July 1st, 2014: 20456 total, 9247 Python
Nothing wildly interesting there other than noting that the codebase for Input continues to grow.

Contributor stats
Ian Kronquist was the Input intern for Summer 2014. He contributed several fixes to Input. Yay!
We spent a bunch of time making our docs and Vagrant provisioning script less buggy so as to reduce the problems new contributors have when working on Input. I talked with several people about things they're interested in working on. Plus several people did some really great work on Input.
Generally, I think Input is at a point where it's not too hard to get up and running; we've got several lists of bugs that are good ones to start with, and the documentation is good-ish. I think the thing that's hampering us right now is that I'm not spending enough time and energy answering questions, managing the work and keeping things going.
Anyhow, welcome L. Guruprasad, Adam Okoye and Ruben Vereecken! Additionally, many special thanks to L. Guruprasad, who fixed a lot of issues with the Vagrant provisioning scripts. That work is long and tedious, but it helps everyone.

Accomplishments
Dashboards for everyone: We wrote an API and some compelling examples of dashboards you can build using the API. It's being used in a few places now. We'll grow it going forward as needs arise. I'm pretty psyched about this since it makes it possible for people with needs to help themselves and not have to wait for me to get around to their work. Dashboards for everyone project plan.
Vagrant: We took the work I did last quarter and improved upon it, rewrote the docs and have a decent Vagrant setup now. Reduce contributor pain project plan.
Abuse detection: Ian spent his internship working on an abuse classifier so that we can more proactively detect and prevent abusive feedback from littering Input. We gathered some interesting data and the next step is probably to change the approach we used and apply some more complex ML things to the problem. The key here is that we want to detect abuse with confidence and not accidentally catch swaths of non-abuse. Input feedback has some peculiar properties that make this difficult. Reduce the abuse project plan.
Loop support: Loop is now using Input for user sentiment feedback.
Heartbeat support: User Advocacy is working on a project to give us a better baseline for user sentiment. This project was titled Heartbeat, but I'm not sure whether that'll change or not. Regardless, we added support for the initial prototype. Heartbeat project plan.
Data retention policy: We've been talking about a data retention policy for some time. We decided on one, finalized it and codified it in code.
Shed the last vestiges of Playdoh and funfactory: We shed the last bits of Playdoh and funfactory. Input uses the same protections and security decisions those two projects enforced, but without being tied to some of the infrastructure decisions. This made it easier to switch to peep-based requirements management.
Switched to FactoryBoy and overhauled tests: Tests run pretty fast in Fjord now. We switched to FactoryBoy, so writing model-based tests is a lot easier than the stuff we had before.

Summary
Better than 2014q2 and we fixed some more technical debt further making it easier to develop for and maintain Input. Still, there's lots of work to do.
I started looking into exploiting our Telemetry data to determine which add-ons are causing performance issues with Firefox. So far there are three metrics that I plan to correlate with add-ons:
- startup time,
- shutdown time,
- background hangs.
In this post I am going over my findings for the first scenario, i.e. the relation between startup time and installed add-ons.
In an ideal world, all add-ons would have a uniform way to initialize themselves which could be instrumented. Unfortunately that's not possible: many add-ons use asynchronous facilities and/or rely on observer notifications for initialization. In other words, there is no good way to measure the initialization time for all add-ons without touching their codebases individually.
This is the sort of problem that screams for a multi-way ANOVA but, after some thought and data exploration, it turns out that the interaction terms between add-ons can be dropped, i.e. the relation between add-ons and the startup time can be modeled as a purely additive one. Since a multi-way ANOVA is equivalent to a linear regression between a set of predictors and their interactions, the problem can be modeled with a generalized linear model where, for each Telemetry submission, the add-on map is represented as a boolean vector of dummy variables taking the value 1 if the add-on is installed and 0 if it is not.
Startup time depends on many other factors that are not taken into account in the model, like current system load and hard drive parameters. This means it would be very surprising, to say the least, if one could predict the startup time without those variables. That doesn't mean we can't explain part of the variance! In fact, after training the model on the data collected during the past month, it yielded a score of about 0.15, which means we can explain about 15% of the variance. Again, since we are not trying to predict the startup time accurately, this is not necessarily a bad result. The F ratio, which relates the variance between add-ons to the variance within add-ons, is significant, which indicates that whether certain add-ons are installed does influence the startup time.
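To make the dummy-variable encoding concrete, here is a toy sketch of the approach. The add-on names and timings are made up for illustration, and this is plain least squares on five fake submissions rather than the actual Telemetry pipeline; unlike the real data (where the score was about 0.15), this toy data happens to fit exactly.

```python
import numpy as np

addons = ["addon-a", "addon-b", "addon-c"]

# Each hypothetical Telemetry submission: (installed add-ons, startup ms).
submissions = [
    ({"addon-a"}, 9000),
    ({"addon-a", "addon-b"}, 15000),
    ({"addon-b"}, 11000),
    ({"addon-c"}, 6500),
    (set(), 5000),
]

# Dummy-encode each submission: an intercept column for the baseline
# startup time, plus one 0/1 column per add-on (1 = installed).
X = np.array([[1] + [1 if a in installed else 0 for a in addons]
              for installed, _ in submissions], dtype=float)
y = np.array([t for _, t in submissions], dtype=float)

# Ordinary least squares: each add-on's coefficient estimates its
# startup-time overhead relative to the baseline.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
baseline, overheads = coefs[0], dict(zip(addons, coefs[1:]))
```

With this toy data the fit recovers a 5000 ms baseline and overheads of 4000, 6000 and 1500 ms for the three fake add-ons; on real data you would additionally look at the standard errors and p-values of the coefficients, as described below.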
Many of the p-values of the predictors' coefficients are highly significant (<< 0.001); it's just a matter of sorting the significant results by their effect size to determine the add-ons that cause a notable slowdown of Firefox during startup:
The horizontal axis measures the startup time overhead with respect to the average startup time of Firefox. For instance, Yandex Elements seems to be slowing down startup by about 8 seconds on average. The error-bars represent the standard errors of the sampling distributions of the coefficients.
Note that the model is based on a very small fraction of our user base, i.e. the subset that has Telemetry enabled, so there clearly is some implicit bias. The picture might be different for a truly random sample of our users; nevertheless, it is an indication of where to start digging deeper.
The next step is to “dashboardify” the whole thing and contact the developers of the various add-ons. We are also considering notifying users, in a yet to be determined way, when the browser detects add-ons that are known to cause performance issues.
I’m back at the screen after a week of paternity leave, and I’ll be working part-time for next two weeks while we settle in to the new family routine at home.
In the meantime, I wanted to mention a Mozilla contributor analysis project in case people would like to get involved.
We have a wiki page now, which means it’s a real thing. And here are some words my sleep-deprived brain prepared for you earlier today:
The goal and scope of the work:
Explore existing contribution datasets to look for possible insights and metrics that would be useful to monitor on an ongoing basis, before the co-incident workweek in Portland at the beginning of December.
- Stress-test our current capacity to use existing contribution data
- Look for actionable insights to support Mozilla-wide community building efforts
- Run ad-hoc analysis before building any ‘tools’
- If useful, prototype tools that can be re-used for ongoing insights into community health
- Build processes so that contributors can get involved in this metrics work
- Document gaps in our existing data / knowledge
- Document ideas for future analysis and exploration
I’m very excited that three members of the community have already offered to support the project and we’ve barely even started.
In the end, these numbers we’re looking at are about the community, and for the benefit of the community, so the more community involvement there is in this process, the better.
If you’re interested in data analysis, or know someone who is, send them the link.
This project is one of my priorities over the following 4-8 weeks. On that note, this looks quite appealing right now.
So I’m going make more tea and eat more biscuits.
This is discussed very briefly on my about page, but I figured it could use a bit of a longer discussion. I generally consider myself to have joined the Mozilla community in ~2006. I know that I was using Mozilla Firefox, Mozilla Thunderbird, and Mozilla Sunbird way before that (probably since ~2004, which is when I built my own computer). But I was just an enthusiast then, running beta builds, then alpha and eventually nightly builds. (This was way back when the things we ran were more dangerous: Minefield and Shredder.)
Anyway, back to 2006… I initially got involved in a more technical fashion by writing extensions (or maybe it was GreaseMonkey scripts). I don't really have any way to prove this though; I don't seem to have any of that code. (This was before widespread distributed version control.) Anyway, let's just assume this 2006 date is correct.
My first patch was in 2008, to move a function from the Provider for Google Calendar to the calendar core so that I could use it in Thundershows: a calendar provider for TV shows. (As far as I know, I'm one of a handful of people to actually implement a calendar provider.) I found the calendar project much easier to get involved in than other aspects of Mozilla since it was so much smaller. (I also toyed with adding an entire new protocol to Thunderbird, which R Kent James has now done successfully!)
I then came across Instantbird in ~2008 (sometime in the Instantbird 0.1 era). I thought this was great: Mozilla was finally making an instant messaging client! Well, I was kind of right… Instantbird is not an official Mozilla project, but it was exactly what I wanted! The guys (mostly Florian Quèze) in the #instantbird IRC channel were awesome: kind, patient, helpful, and welcoming. They were the ones who really introduced me to the Mozilla way of doing things. I fixed my first bug for Instantbird in 2010 and haven't stopped since! I've since added IRC support and am now one of the lead developers. I've mentored Google Summer of Code students twice (2013 and 2014), contribute to Thunderbird, and am a peer of the chat code shared between Instantbird and Thunderbird. (I do also occasionally contribute to other projects.)

Thundershows was my first project to really have other users: I had people filing bugs, asking for new features, etc. It was great! I even had someone (years later) tell me in #instantbird that they had loved Thundershows! My second bug dealt with the same set of code and had tests committed (by me) over 5 years after the initial patch. Oops! My work was based off of some experiments Joshua Cranmer did to add support for web forums to Thunderbird. After all this time, I still want that extension. Oh, also rkent did EXACTLY what I wanted years later: add Twitter support to Thunderbird. But not Firefox: after seven years (and over 1800 commits), I've never fixed a bug in Firefox, although I have had code committed to mozilla-central.
We’ve been using the functions packaged in Redo for a few years now at Mozilla. One of the things we’ve been striving for with it is the ability to write the most natural code possible. In its simplest form, retry takes a callable that may raise, the exceptions to retry on, and a callable to run for cleanup before another attempt, all as arguments. As a result, we have a number of code blocks like this, which don’t feel very Pythonic:

    retry(self.session.request, sleeptime=5, max_sleeptime=15,
          retry_exceptions=(requests.HTTPError,
                            requests.ConnectionError),
          attempts=self.retries,
          kwargs=dict(method=method, url=url, data=data,
                      config=self.config, timeout=self.timeout,
                      auth=self.auth, params=params))
It’s particularly unfortunate that you’re forced to let retry do your exception handling and cleanup; I find that it makes the code a lot less readable. It’s also not possible to do anything in a finally block unless you wrap the retry in one.
Recently, Chris AtLee discovered a new method of doing retries that results in much cleaner and more readable code. With it, the above block can be rewritten as:

    for attempt in retrier(attempts=self.retries):
        try:
            self.session.request(method=method, url=url, data=data,
                                 config=self.config, timeout=self.timeout,
                                 auth=self.auth, params=params)
            break
        except (requests.HTTPError, requests.ConnectionError):
            pass
retrier simply handles the mechanics of tracking attempts and sleeping, leaving your code to do all of its own exception handling and cleanup, just as if you weren’t retrying at all. Note that the break at the end of the try block matters: without it, the loop would continue and self.session.request would run again even after it succeeded.
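For readers curious what a retrier-style generator does under the hood, here is a minimal sketch of the idea. The real implementation ships in the Redo package and takes more options, so treat this signature and the backoff policy as illustrative rather than Redo's exact API:

```python
import time

def retrier(attempts=5, sleeptime=10, max_sleeptime=300):
    """Yield once per attempt, sleeping between attempts with
    exponential backoff capped at max_sleeptime."""
    for attempt in range(attempts):
        yield attempt
        if attempt < attempts - 1:
            time.sleep(min(sleeptime * 2 ** attempt, max_sleeptime))
```

Because the sleep happens after the yield, a caller that breaks out of the loop on success abandons the generator before it ever sleeps, so you only pay for backoff when another attempt is actually needed.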
I released Redo 1.3 with this new functionality this morning – enjoy!
In tonight’s linear algebra class, I made the mistake of leaving my paper notebook home. Ok, I thought, I’ll just use Amaya and see how that goes.
Not so well, it turns out. Twenty minutes of lecture equals a frantic “where is that thing?”, and nothing learned…
- The template for a MathML subscript is in a different panel from the template for a MathML summation (“sigma”), and you have to switch between panels constantly.
- If you want two subscripts (and in linear algebra, two subscripts for an element is common), you get a modal dialog. (Really? How many subscripts does an element need?)
- Where’s the special “M” symbol for matrix spaces? (I’d post it, but WordPress eats everything after that U+1D544 character!) We can get the real number set with ℝ.
- The UI for Amaya is hard-coded, so I can’t change it at all.
- Amaya’s copy & paste support is terrible.
- It takes about 2 seconds to write [Aᵢ]₁ⱼ with pen & paper. In Amaya that takes somewhere around ten seconds, plus the dialog I mentioned earlier.
- Oh, and the instructor’s going on, keeping a pace for students using pen & paper… there’s no chance of me keeping up with him.
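For a sense of where those ten seconds go, the markup behind a single doubly-subscripted matrix element looks roughly like this (hand-written per the MathML spec; the exact tree Amaya builds may differ):

```xml
<!-- "[A sub i] sub 1j": two nested subscript templates -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <msub>
    <mrow>
      <mo>[</mo>
      <msub><mi>A</mi><mi>i</mi></msub>
      <mo>]</mo>
    </mrow>
    <mrow><mn>1</mn><mi>j</mi></mrow>
  </msub>
</math>
```

Every one of those elements corresponds to a template, panel switch, or dialog in a structure editor, which is where the pen's five strokes win.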
After twenty minutes of trying to quickly jot down what he’s saying, without comprehension, I end up with some symbolic gobbledygook that’s probably about 80% of a match to what the instructor is actually saying. But what I was able to write down was complete nonsense.
I ended up switching to scratch paper and pen, where I was not only able to keep up, but ask some insightful questions.
(Incidentally, I glanced at LibreOffice tonight as well. I saw immediately that I’d have fundamentally the same problems: unfamiliar UI and lots of context switching. Too much to really listen to what the instructor’s saying.)

How does a computer compete with pen & paper?
Later tonight, I realized: if it only takes five quick, essentially subconscious pen strokes to draw Aᵢ, and a fair bit of training to teach someone the equivalent keystrokes in an editor… then maybe a keyboard and mouse are the wrong tools to give a student. Maybe something closer to pen & paper is best for quickly jotting something down, and then translating it to markup later… which sounds like character recognition.
Hmm, isn’t that something digital tablets and styluses are somewhat good at? Maybe not perfect, but easier for a human to work with than a memorized set of keystrokes.
Now, I am starting to understand why computer manufacturers (and Firefox OS developers) are putting so much effort into supporting touchscreens: because they’re useful for taking notes, at least. Once again, I’ve somewhat missed the boat.

How does this impact my editor project?
The good news is this is way too complicated for me to even attempt in my proof-of-concept editor that I’m trying to build. (The proof of concept is about making each XML language an add-on to the core editor.)
The bad news is if I ever want students to regularly use computers in a mathematics classroom (which is the raison d’être I even started working with computers as a child), I’m going to need to support tablet computers and styluses. That’s a whole can of worms I’m not even remotely prepared to look at. This raises the bar extremely high. I’m writing this blog post mainly for myself as a future reference, but it means I’ve just discovered a Very Hard Problem is really a Much, Much Harder Problem than I ever imagined.
Bugzilla code critters blab your security sinners, warns Mozilla
The Mozilla Foundation has warned of a number of recently discovered vulnerabilities in its Bugzilla bug-tracking tool that could give attackers access to sensitive information about software projects. One particularly serious flaw allows attackers to ...
Bugzilla 0-day can reveal 0-day bugs in OSS giants like Mozilla, Red Hat (Ars Technica)
Bugzilla Vulnerability Puts Bug Collections in Harm's Way (Threatpost)
Bugzilla Zero-Day Exposes Zero-Day Bugs (Krebs on Security)