Adding Firefox use counter data to HTTPArchive #971

Open
zcorpan opened this issue Jan 31, 2023 · 8 comments
@zcorpan

zcorpan commented Jan 31, 2023

Hey folks,

In HTTPArchive/legacy.httparchive.org#59 Chrome's use counter data was added to httparchive.

The relevant code in this repo seems to be https://github.com/HTTPArchive/data-pipeline/blob/41fe511951797d25cebc71097726c8b65497212b/modules/import_all.py#L146

I'd like to have Firefox use counter data also be available in httparchive. (Maybe more things also, but starting with use counters.) To that end, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1813593 so that the data can be extracted locally.

Are there considerations we should know about for this to work?

cc @emilio @janodvarko

@rviscomi
Member

We only run tests in Chrome, so I don't think this would be feasible. @pmeenan WDYT?

@zcorpan
Author

zcorpan commented Jan 31, 2023

Why not? We'd need to run tests in desktop Firefox, in addition to the current desktop Chrome and Android Chrome. I don't think it's necessarily useful to collect and store everything for Firefox that is currently stored for Chrome; that would increase storage by 50%. But storing only use counter data seems negligible.

@tunetheweb
Member

It’s not just storage. It’s also crawl capacity.

@pmeenan
Member

pmeenan commented Jan 31, 2023

As it stands right now, it takes ~25,000 VMs the better part of a week to collect the data. Technically it is pretty easy to support, but financially it would increase the running costs by ~30% (assuming we'd only run one config). I'm guessing some form of additional sponsorship would be needed to cover the costs.

@zcorpan
Author

zcorpan commented Jan 31, 2023

OK, thanks. What would that amount to in USD?

@rviscomi
Member

30% of our current crawl expenses would come out to about $20k per month.

@zcorpan
Author

zcorpan commented Jan 31, 2023

That is likely more than the value Mozilla would get from the data. 🙂

For web compat analysis, the sample_data URLs (10k pages) would still be better than nothing. Assuming a full run would be 12,500,000 URLs (httparchive.pages.2023_01_01_desktop has 12,647,566 rows), 10k pages would be 0.08% of the cost, which is about $16 per month.

Would it be feasible to start there?
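The estimate above can be reproduced with a few lines of Python. This is only a sketch: it assumes the crawl cost scales linearly with the number of URLs, which nobody in this thread has confirmed, and the constant and function names are made up for illustration. The figures themselves are the ones quoted above.

```python
# Figures quoted in this thread.
FULL_CRAWL_URLS = 12_500_000    # rounded from 12,647,566 rows
SAMPLE_URLS = 10_000            # size of the sample_data URL set
FULL_FIREFOX_COST_USD = 20_000  # per month, estimated for one Firefox config


def sample_cost(sample_urls: int, full_urls: int, full_cost: float) -> tuple[float, float]:
    """Return (fraction of the full crawl, estimated cost).

    Assumption: cost is proportional to the number of URLs crawled.
    """
    fraction = sample_urls / full_urls
    return fraction, fraction * full_cost


fraction, cost = sample_cost(SAMPLE_URLS, FULL_CRAWL_URLS, FULL_FIREFOX_COST_USD)
print(f"{fraction:.2%} of the crawl, about ${cost:.0f} per month")
```

Fixed overheads (VM startup, orchestration) would likely make the real number somewhat higher than a strictly proportional estimate.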

@zcorpan
Author

zcorpan commented Feb 6, 2023

Update: https://bugzilla.mozilla.org/show_bug.cgi?id=1813593 is now fixed (thanks @emilio!). It's possible to set these prefs to log use counter data to stderr:

  • dom.use_counters.dump.document
  • dom.use_counters.dump.worker
  • dom.use_counters.dump.page

For the purpose of this issue, using page and worker but not document makes the most sense. (document is logged for each document, including e.g. SVGs; these accumulate into page, which is per top-level page.) worker use counters don't accumulate into page, so they need to be included separately.
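One way to apply that selection is to write the prefs into a `user.js` file in the Firefox profile used for the test run. The sketch below generates such a file's contents. It assumes the three prefs are booleans, and `USE_COUNTER_PREFS` and `render_user_js` are hypothetical names, not part of any existing tooling.

```python
# Assumption: the dump prefs are booleans. page and worker are enabled,
# document is left off because it accumulates into page anyway.
USE_COUNTER_PREFS = {
    "dom.use_counters.dump.document": False,
    "dom.use_counters.dump.worker": True,
    "dom.use_counters.dump.page": True,
}


def render_user_js(prefs: dict[str, bool]) -> str:
    """Render boolean prefs as user.js lines: user_pref("name", true);"""
    lines = []
    for name, enabled in prefs.items():
        value = "true" if enabled else "false"
        lines.append(f'user_pref("{name}", {value});')
    return "\n".join(lines) + "\n"


print(render_user_js(USE_COUNTER_PREFS), end="")
```

The output would be saved as `user.js` in the profile directory before launching Firefox. How a test agent actually injects prefs may differ.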

The logged output looks like this:

USE_COUNTER_PAGE: USE_COUNTER2_DOCUMENTOPEN_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_Display_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_FontStyle_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_FontWeight_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/

You need to close the page for some of the use counters to be added to the log.
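For collecting this into a pipeline, the stderr output would need to be parsed. The sketch below handles the line shape shown above. It assumes worker lines follow the same `USE_COUNTER_<SCOPE>: <counter> - <url>` pattern, which is not confirmed here since only page lines are shown; `parse_use_counters` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches lines like:
#   USE_COUNTER_PAGE: USE_COUNTER2_DOCUMENTOPEN_PAGE - http://example.com/
# Assumption: other scopes (e.g. WORKER) use the same shape as PAGE.
LINE_RE = re.compile(r"^USE_COUNTER_(?P<scope>[A-Z]+): (?P<counter>\S+) - (?P<url>\S+)$")


def parse_use_counters(stderr_text: str) -> dict[tuple[str, str], set[str]]:
    """Group counter names by (scope, url); unrelated stderr lines are skipped."""
    counters: dict[tuple[str, str], set[str]] = defaultdict(set)
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group("scope"), match.group("url"))
            counters[key].add(match.group("counter"))
    return dict(counters)


sample = """\
USE_COUNTER_PAGE: USE_COUNTER2_DOCUMENTOPEN_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
some unrelated stderr line
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_Display_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
"""
result = parse_use_counters(sample)
for (scope, url), names in result.items():
    print(scope, url, sorted(names))
```

Because some counters are only logged when the page is closed, the parser should run on the complete stderr output after the browser has finished with the page.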

@max-ostapenko max-ostapenko transferred this issue from HTTPArchive/data-pipeline Dec 18, 2024