Support standard citation options #1325
Comments
I think we should also configure each chapter that way, to make a better impression. |
@rviscomi looks like the limit for Google Scholar is 5MB.
Our ebook is 17MB, so it is not eligible.
We actually include most of this data as Structured Data in the chapters already (JavaScript example). We could include the extra metadata too in an effort to get indexed there, but again most of the chapters are larger than the 5MB maximum. For example, the CSS chapter comes in at 52MB on a high-resolution screen once the interactive graphs have loaded! The non-interactive version with fallback images is 1.2MB, but I'm not sure whether the Scholar bot will crawl that. |
My understanding is the 5MB limit applies to the PDF version but not the HTML version, which can be indexed and searchable in Scholar. There may be metadata that we can add to the HTML version of the ebook to make it more search friendly. |
Not convinced about that:
Also, at the moment we explicitly stop Google from indexing the HTML ebook page, to stop it competing with the PDF version:
Similarly only the PDF version is in our sitemap. Would presumably need to change both of those as part of this if we did want to proceed. |
There is some Scholar-friendly metadata we could add: https://scholar.google.com/intl/en/scholar/inclusion.html#indexing
For example:
<meta name="citation_title" content="The testis isoform of the phosphorylase kinase catalytic subunit (PhK-T) plays a critical role in regulation of glycogen mobilization in developing lung">
<meta name="citation_author" content="Liu, Li">
<meta name="citation_author" content="Rannels, Stephen R.">
<meta name="citation_author" content="Falconieri, Mary">
<meta name="citation_author" content="Phillips, Karen S.">
<meta name="citation_author" content="Wolpert, Ellen B.">
<meta name="citation_author" content="Weaver, Timothy E.">
<meta name="citation_publication_date" content="1996/05/17">
<meta name="citation_journal_title" content="Journal of Biological Chemistry">
<meta name="citation_volume" content="271">
<meta name="citation_issue" content="20">
<meta name="citation_firstpage" content="11761">
<meta name="citation_lastpage" content="11766">
<meta name="citation_pdf_url" content="http://www.example.com/content/271/20/11761.full.pdf">
Would be great to have our own content appearing in Scholar, in addition to the citations from other research papers! |
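As a rough sketch of how tags like these could be generated per chapter: the function name `scholar_meta_tags`, the author names, and the PDF URL below are hypothetical and not taken from the Almanac codebase. Only the `citation_*` tag names and the `YYYY/MM/DD` date format come from the example above.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date, pdf_url):
    """Return Scholar-style citation_* meta tags, one tag per line.

    `authors` is a list of display names; each gets its own
    citation_author tag, mirroring the example above.
    """
    pairs = [("citation_title", title)]
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    pairs.append(("citation_pdf_url", pdf_url))
    # Escape attribute values so quotes or ampersands in a title
    # cannot break the generated HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


# Hypothetical sample values, for illustration only.
tags = scholar_meta_tags(
    title="The 2019 Web Almanac: JavaScript",
    authors=["Author One", "Author Two"],
    publication_date="2019/11/11",
    pdf_url="https://example.com/ebook.pdf",
)
print(tags)
```

Emitting one `citation_author` tag per author, rather than a single combined tag, follows the pattern in the Scholar example.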
@rviscomi, I think it would be useful to provide a citation recommendation for each chapter (as text and BibTeX). See an example here. Then people know how they should cite, and all references will be uniform. Otherwise, it will be hard for Scholar to assign references to a chapter if authors reference the chapters differently. Maybe we can use this as the title in our recommendation: |
Really like that idea @nrllh! Adding the design label to loop in @HTTPArchive/designers to think about how to expose the citation UX. |
Is the idea to provide a standard MLA/LaTeX-type citation block that can be copied elsewhere? |
@shantsis yes, that's our goal. |
@shantsis nice work, I like it! Does anyone else have any feedback or suggestions? If not we can pass this to the dev team for implementation. |
Here is my suggestion: BibTeX - based on this template:
The output for the 2020 Security chapter would then be:
MLA - based on this template:
The output for the 2020 Security chapter would then be:
IEEE - based on this template:
The output for the 2020 Security chapter would then be:
I think APA style is irrelevant for us (see here). |
Nice. Any thoughts on where it should go? After the Conclusion, before Explore the results? |
I think this is a good place |
Yup or right below that and above the author. Either works :) |
OK seems like we have the agreed approach and the design. So I've changed the title of the issue and updated the first comment. @HTTPArchive/developers anyone want to take this one? |
Some of them have started to show in skeleton form - but I'm not sure if that's because of #2191 or because they were already being cited (interestingly, one shows in a translated form - which suggests it's probably the latter): They do say:
It will be interesting to see if the 2021 chapters are indexed quicker since they have these. Or maybe they're just not deemed a good fit for Google Scholar (despite being cited by several of the other papers). 🤷 Either way, I still think it would be good to have the human-readable citation options at the bottom of the page, as we have been asked once or twice about how to cite this officially. |
Pretty sure it's the latter: the metadata has the title "The 2019 Web Almanac: JavaScript", not what is shown in the image. I'm pretty sure Google Scholar creates
Indeed! Though having just looked at the Privacy chapter, the publication date is weird: 2021/05/02?
That Google Scholar guide also mentions:
So it might fail because it does not think there is enough information...
Certainly, we might cite it as well at some point :) |
Update from the university: since this is targeted towards a non-academic audience, they think "Scientific outreach" is the best category, so they don't consider it a book or journal. (maybe they'd be happier if we had an ISBN/ISSN) Google Scholar also doesn't appear to have picked up the 2021 chapters (yet). |
@VictorLeP I think Google Scholar will also not index it. If we get a DOI or ISBN, it'll be perfect. Our 2020 version was published in Google Books[1] (and the Play Store), but it still doesn't appear in Scholar. [1] https://www.google.de/books/edition/The_2020_Web_Almanac/wqcPEAAAQBAJ?hl=de&gbpv=0 |
An ISBN seems to cost $125 (in the US); a DOI can be derived from an ISBN. There seem to be a number of ways to get only a DOI, possibly for free. It seems you usually do have to upload some file. One provider missing in those posts is OSF, which provides DOIs and has an option to "soft redirect" to a link (that is, you get a pop-up). It might actually be nice if we could get one DOI per chapter instead of one for the Almanac as a whole. |
Also need to remember the translations. So at $125 per language, per chapter, per year, that could add up! Though you can often buy them in bulk much cheaper. We discussed getting ISBNs here: #1219 I'm not sure we need to get into Google Scholar. It's a nice-to-have, since we are cited in so many articles in there already, and it's potentially another way of making the content available to those that might not otherwise find it. But other than that, I'm not desperate to invest in an ISBN or DOI just to get cited in there. However, I do think it would be good to tell people how to cite our articles with the above suggested addition to our web pages, since we are cited a lot and we have been asked the question before. |
I don't think you need to (or can) get an ISBN per chapter, but it would still be X years times Y languages, so it could indeed get expensive fast. A standard way to cite might actually be sufficient. As I mentioned, Google Scholar picks up on these citations, so it might be an indirect way to get indexed there. I think that my submission of the chapter metadata to the KU Leuven repository will also trigger a Google Scholar entry (albeit only for the Privacy chapter). In terms of the citation itself, I don't really see why we couldn't go for an actual (book) chapter, for example with this (BibLaTeX!) template:
@inbook{ WebAlmanac.{year}.{chapter_number},
author = "{author1_lastname, author1_firstname}
{ and author2_lastname, author2_firstname}
{ and author3_lastname, author3_firstname}",
title = "{chapter}",
booktitle = "{year} Web Almanac",
chapter = {chapter_number},
pages = "{ebook_pages}",
publisher = "HTTP Archive",
year = "{year}",
url = "{url}"
} |
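To make the template concrete, here is a minimal sketch of filling it in for one chapter. The function name `biblatex_inbook` is hypothetical, the chapter number 11 is an assumption taken from the `citation_issue` value in the Privacy chapter's metadata, and `pages` is left out because the ebook page range is not given in this thread.

```python
def biblatex_inbook(year, chapter_number, chapter, authors, url, ebook_pages=None):
    """Fill the @inbook template above.

    `authors` is a list of (lastname, firstname) tuples, joined with
    " and " as BibTeX/BibLaTeX expects for multiple authors.
    """
    author_field = " and ".join(f"{last}, {first}" for last, first in authors)
    fields = [
        ("author", f'"{author_field}"'),
        ("title", f'"{chapter}"'),
        ("booktitle", f'"{year} Web Almanac"'),
        ("chapter", str(chapter_number)),
    ]
    # Only emit `pages` when a page range is actually known.
    if ebook_pages is not None:
        fields.append(("pages", f'"{ebook_pages}"'))
    fields += [
        ("publisher", '"HTTP Archive"'),
        ("year", f'"{year}"'),
        ("url", f'"{url}"'),
    ]
    body = ",\n".join(f"  {name} = {value}" for name, value in fields)
    return f"@inbook{{WebAlmanac.{year}.{chapter_number},\n{body}\n}}"


entry = biblatex_inbook(
    year=2021,
    chapter_number=11,  # assumption: taken from the citation_issue metadata
    chapter="Privacy",
    authors=[("Dimova", "Yana"), ("Le Pochat", "Victor")],
    url="https://almanac.httparchive.org/en/2021/privacy",
)
print(entry)
```

Passing authors as (lastname, firstname) pairs avoids guessing where a multi-word surname such as "Le Pochat" begins.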
BTW we have this metadata in the chapters already:
<meta name="citation_title" content="The 2021 Web Almanac: Privacy">
<meta name="citation_author" content="Yana Dimova">
<meta name="citation_author" content="Victor Le Pochat">
<meta name="citation_publication_date" content="2021/11/17">
<meta name="citation_journal_title" content="The 2021 Web Almanac">
<meta name="citation_volume" content="3">
<meta name="citation_issue" content="11">
<meta name="citation_publisher" content="HTTP Archive">
<meta name="citation_technical_report_institution" content="HTTP Archive">
<meta name="citation_language" content="English">
<meta name="citation_fulltext_html_url" content="https://almanac.httparchive.org/en/2021/privacy">
This was added in May this year. And we've had this JSON-LD metadata in there too since the original 2019 launch:
{
"@context": "http://schema.org",
"@type": "Article",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://almanac.httparchive.org/en/2021/privacy"
},
"headline": "Privacy | 2021 | The Web Almanac by HTTP Archive",
"image": {
"@type": "ImageObject",
"url": "https://almanac.httparchive.org/static/images/2020/privacy/hero_lg.jpg",
"height": 433,
"width": 866
},
"publisher": {
"@type": "Organization",
"name": "HTTP Archive",
"logo": {
"@type": "ImageObject",
"url": "https://almanac.httparchive.org/static/images/ha.png",
"height": 160,
"width": 320
},
"sameAs": [
"https://httparchive.org",
"https://twitter.com/HTTPArchive",
"https://github.com/HTTPArchive"
]
},
"author":
[{
"@type": "Person",
"sameAs": [
"https://almanac.httparchive.org/en/2021/contributors#ydimova"
,"https://github.com/ydimova"
],
"name": "Yana Dimova"
},{
"@type": "Person",
"sameAs": [
"https://almanac.httparchive.org/en/2021/contributors#victorlep"
,"https://twitter.com/VictorLePochat"
,"https://github.com/VictorLeP"
,"https://www.linkedin.com/in/victor-le-pochat/"
],
"name": "Victor Le Pochat"
}]
,
"description": "Privacy chapter of the 2021 Web Almanac covering adoption and impact of online tracking, privacy preference signals and browser initiatives for a privacy-friendlier web.",
"datePublished": "2021-11-17T00:00:00.000Z",
"dateModified": "2021-12-04T00:00:00.000Z"
} |
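Since both formats describe the same chapter, one way to keep them consistent (and to catch an odd publication date) would be to derive the `citation_*` values from the JSON-LD. The following is only a sketch: the function name `citation_fields_from_json_ld` is hypothetical, and the sample input is a trimmed copy of the JSON-LD above.

```python
import json
from datetime import datetime


def citation_fields_from_json_ld(json_ld_text):
    """Map a chapter's JSON-LD onto the matching citation_* values."""
    data = json.loads(json_ld_text)
    # JSON-LD uses an ISO timestamp; the citation tags use YYYY/MM/DD.
    published = datetime.strptime(data["datePublished"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "citation_author": [person["name"] for person in data["author"]],
        "citation_publication_date": published.strftime("%Y/%m/%d"),
        "citation_fulltext_html_url": data["mainEntityOfPage"]["@id"],
    }


# Trimmed copy of the JSON-LD shown above.
sample = """
{
  "@type": "Article",
  "mainEntityOfPage": {"@type": "WebPage", "@id": "https://almanac.httparchive.org/en/2021/privacy"},
  "author": [
    {"@type": "Person", "name": "Yana Dimova"},
    {"@type": "Person", "name": "Victor Le Pochat"}
  ],
  "datePublished": "2021-11-17T00:00:00.000Z"
}
"""
fields = citation_fields_from_json_ld(sample)
print(fields)
```

Comparing the derived date with the `citation_publication_date` actually served on the page would flag any mismatch between the two sets of metadata.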
If all we want is a DOI, I was recommended Zenodo. It’s a CERN project, completely free, allows 50GB per upload. Takes about 2min to upload one PDF with minimal metadata, longer if we fill in a lot of details. Here is an upload I made of three pages from this year’s accessibility chapter, on their sandbox server: https://sandbox.zenodo.org/record/1112032. Those three pages got a pretend DOI of 10.5072/zenodo.1112032. |
* Add BibTeX citation box (#1325)
  Implemented a citation box in BibTeX format as per the discussion in issue #1325. Details: #1325
* Update page.css
  Fixing fff issue from linter.
* Linting errors
* Add DOI
* Formatting
* Internationalisation
---------
Co-authored-by: Mike Gifford <[email protected]>
Co-authored-by: Barry Pollard <[email protected]>
See https://scholar.google.com/intl/en/scholar/inclusion.html#indexing
I'd love to see Google Scholar automatically crawling and indexing our ebook content.
Edit - this was completed in #2191
We should also add citation options at the bottom of each chapter as discussed in #1325 (comment) and #1325 (comment)