[WIP] [R] Replace vignettes and examples #11123
Open
ref #9810
closes #10746
This PR adds a new introductory vignette which replaces most of the previous ones, and modifies the code examples throughout functions aimed at interactive usage to call xgboost() instead of xgb.train().
Motivation
Since XGBoost was first published on CRAN, its adoption and mindshare have risen substantially, to the point that it has become the standard for boosted decision trees. At this point, I don't think the package needs to provide any introduction to the concepts of gradient boosting, cross-validation, evaluation metrics, and so on: people who use R are already going to be familiar with those, and the things it compares against (like the package 'gbm') have become obsolete by now.
In addition, the documentation and tutorials for XGBoost have mostly moved to the online docs. Any R-specific documents become outdated rather quickly and are less likely to be seen by a random user. Most of the Python examples and guides should in any event work with the R interface with very minimal modifications, such as replacing a Python dict of parameters with an R list.
Apart from XGBoost becoming a standard-use library, the features it supports have expanded over time, and much of the material that was there before, such as the first vignette, contained tips that are not applicable to the current state of the library, like manually one-hot encoding categorical features.
Hence, I decided to remove the previous vignettes and create a new one from scratch, which contains only examples around the usage of the R interface and its conventions.
Help needed
It would be ideal if this vignette could also get added to the online docs.
Thus, I created the vignette as a quarto file (.qmd), which has the option to render to both .html (what CRAN hosts) and .md (which can be included in the .rst files).
However, getting it to render to .md required building the vignette with jupyter instead of knitr, which in turn requires installing Python, jupyter, ipykernel, and the "ir" kernel that runs R in jupyter, plus registering that kernel in the user-level config for jupyter. Adding the line "jupyter: ir" also makes the default quarto render (e.g. as used by the "knit" button in RStudio) build the .html vignette using jupyter instead of knitr, which is most definitely not going to work on CRAN servers. I don't know how to solve this.
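For reference, a rough sketch of the kind of front matter this refers to. The title is a placeholder, and the choice of "gfm" as the markdown output format is an assumption rather than necessarily what this PR uses; only the "jupyter: ir" line is taken from the description above.

```yaml
---
title: "XGBoost for R introduction"   # placeholder title, not the actual one
format:
  html: default    # the output CRAN would host
  gfm: default     # assumed markdown flavor for the .md used by the online docs
jupyter: ir        # selects the jupyter engine with the R ("ir") kernel
---
```

Because the "jupyter" key sits at the top level of the front matter, it applies to every listed format, which is why the .html render also stops using knitr.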
It would also be nice if a CI job could automatically build the .md file for the online docs from the .qmd source of the vignette.
(CCing @mayer79 and @jameslamb)