Adding purged cross-validation for time series datasets #115
Open
mmerce wants to merge 1 commit into whizzml:master from
jaor
approved these changes
May 23, 2025
Member
looks good to me, just a few cosmetic nits below.
- The second, third and fourth steps are repeated with each of the k parts,
  so that k evaluations are generated
- Finally, the evaluation metrics are averaged to provide the cross-validation
  metrics.
Member
just a note outside the scope of this PR: it'd be nice if in the metadata's description we could use a pointer to a file, something like:
"description": {"file": "./readme.md"}
@@ -0,0 +1,18 @@
# Script for purged k-fold cross-validation

The objective of this script is create a purged k-fold cross validation
starting form any classification model

(and (= model-type "linearregression") (not regression?))))
(when error
  (raise {"message" (str "The " model-type " cannot be used to"
                         " predict " (if regression? "regressions"
                                         "classifications")
                         ".")
          "code" 106}))))
batch (round (/ rows k-folds))
k-fold-fn (lambda (x)
            (log-info "range" (str (+ 1 (* x batch))) (str (+ 1 (* (+ x 1) batch))))
            (create-dataset {"origin_dataset" dataset-id
Member
i would define a variable for range in the let, instead of computing it twice
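To illustrate the suggestion, here is a minimal sketch of the same arithmetic, written in Python rather than WhizzML so it can be run locally. The names `fold_range` and `fold_dataset_args` are hypothetical, and the `"range"` key in the returned map is an assumption, since the quoted hunk is cut off right after `"origin_dataset"`. Python's `round` may also treat exact halves differently from WhizzML's.

```python
def fold_range(fold_index, rows, k_folds):
    """Return the 1-based [start, end] rows for one fold (hypothetical helper)."""
    # Same arithmetic as the quoted hunk: batch = round(rows / k-folds)
    batch = round(rows / k_folds)
    return [1 + fold_index * batch, 1 + (fold_index + 1) * batch]


def fold_dataset_args(dataset_id, fold_index, rows, k_folds):
    """Build the dataset arguments, computing the range only once."""
    fold_rng = fold_range(fold_index, rows, k_folds)
    # The same value feeds both the log line and the arguments.
    print("range", fold_rng[0], fold_rng[1])
    # Assumption: the range is passed under a "range" key.
    return {"origin_dataset": dataset_id, "range": fold_rng}
```

In the WhizzML source, the equivalent change would be binding the range inside the existing `let` and reusing that binding in both `log-info` and `create-dataset`.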
pruning-rows (round (* (/ rows 100) 7.5)))
(log-info "range" (str (+ 1 pruning-rows)) (str (- rows pruning-rows 1)))
(create-dataset {"origin_dataset" ds-id
                 "range" [(+ 1 pruning-rows) (- rows pruning-rows 1)]})))
Member
same thing about defining a variable for range (there could even be a function for computing it)
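A helper along these lines is what I have in mind, sketched in Python for illustration only. `pruning_range` is a hypothetical name, and the 7.5 default is simply the constant from the quoted hunk turned into a parameter.

```python
def pruning_range(rows, pct=7.5):
    """Return the [start, end] range left after pruning (hypothetical helper)."""
    # Same arithmetic as the quoted hunk: round((rows / 100) * 7.5)
    pruning_rows = round(rows / 100 * pct)
    return [1 + pruning_rows, rows - pruning_rows - 1]


# Compute the range once, then reuse it for logging and for the dataset call.
rng = pruning_range(1000)
print("range", rng[0], rng[1])
```

With a helper like this, the log line and the `"range"` argument cannot drift apart if the formula changes later.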
The following script performs a k-fold cross-validation compatible with time
series datasets. Test datasets are created by sampling linearly the original
dataset and some data is removed from the test dataset edges to avoid leakage.
Member
maybe it is worth mentioning what a dataset "edge" is, unless it's pretty common terminology in this context (i for one don't know what it is :))
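For what it's worth, my reading of the pruning code is that an "edge" means the first and last rows of a contiguous block of rows. That is an assumption on my part, not something the readme states. A simplified Python sketch of that reading, with the hypothetical name `purge_edges` and a symmetric trim (the quoted range is slightly different at the upper bound):

```python
def purge_edges(row_numbers, pct=7.5):
    """Drop the first and last pct% of a contiguous block of rows."""
    n = len(row_numbers)
    drop = round(n / 100 * pct)
    # Rows at both ends are discarded, the middle is kept.
    return row_numbers[drop:n - drop]


block = list(range(1, 201))   # rows 1..200 of one block
kept = purge_edges(block)     # rows at both edges are removed
```

If that reading is right, a sentence in the readme saying so would be enough.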
No description provided.