Overview
fastcpd (fast change point detection) is a fast implementation of change point detection methods in R and Python.
Documentation
- R documentation: fastcpd.xingchi.li
- Python documentation: fastcpd.xingchi.li/python
Installation
R
# install.packages("pak")
pak::pak("doccstat/fastcpd")
# or install from CRAN
install.packages("fastcpd")
Usage
R
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd::fastcpd.ar(x[3 + seq_len(n)], 3, r.progress = FALSE)
summary(result)
#>
#> Call:
#> fastcpd::fastcpd.ar(data = x[3 + seq_len(n)], order = 3, r.progress = FALSE)
#>
#> Change points:
#> 614
#>
#> Cost values:
#> 2754.116 2038.945
#>
#> Parameters:
#> segment 1 segment 2
#> 1 0.57120256 0.2371809
#> 2 -0.20985108 0.4031244
#> 3 0.08221978 0.2290323
plot(result)
Python WIP
import fastcpd.segmentation
from numpy import concatenate
from numpy.random import normal, multivariate_normal
covariance_mat = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
data = concatenate((multivariate_normal([0, 0, 0], covariance_mat, 300),
multivariate_normal([50, 50, 50], covariance_mat, 400),
multivariate_normal([2, 2, 2], covariance_mat, 300)))
fastcpd.segmentation.mean(data)
import fastcpd.variance_estimation
fastcpd.variance_estimation.mean(data)
Comparison
set.seed(1)
n <- 10^8
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
run_isolated <- function(expr) {
callr::r(function(e, n) {
set.seed(1)
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
system.time(eval(e))
}, args = list(e = substitute(expr), n = n))
}
print(run_isolated(fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1)))
#> user system elapsed
#> 9.497 6.734 15.928
print(run_isolated(mosum::mosum(c(mean_data), G = 40)))
#> user system elapsed
#> 9.145 6.797 16.007
print(run_isolated(fpop::Fpop(mean_data, 2 * log(n))))
#> user system elapsed
#> 44.749 2.635 47.486
print(run_isolated(changepoint::cpt.mean(mean_data, method = "PELT")))
#> user system elapsed
#> 31.332 6.178 37.555
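Each call above is timed on the same task: locating a single shift in the mean at the midpoint of the series. As a rough illustration of that task only, the sketch below finds one mean change by brute-force least squares in plain Python. It is not the algorithm used by fastcpd or by any of the packages compared here; the function name `best_mean_split` and the small simulated series are assumptions made for this example.

```python
import random


def best_mean_split(values):
    """Return the index k that minimizes the total squared error when
    values[:k] and values[k:] are each fitted by their own mean."""
    n = len(values)
    total_sum = sum(values)
    total_sq = sum(v * v for v in values)
    best_k, best_cost = None, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    for k in range(1, n):
        # Running sums let us evaluate every split in O(n) overall.
        left_sum += values[k - 1]
        left_sq += values[k - 1] ** 2
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared deviations from the segment mean, per segment.
        cost = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical toy series: 300 points around 0, then 400 points around 50.
random.seed(1)
signal = [random.gauss(0, 10) for _ in range(300)]
signal += [random.gauss(50, 10) for _ in range(400)]

change_point = best_mean_split(signal)
print("estimated change point:", change_point)
```

With a shift this large relative to the noise, the estimate should land at or very near index 300. Handling multiple change points and very long series efficiently is what the benchmarked packages are designed for.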
library(microbenchmark)
mb_result <- microbenchmark(
baseline = callr::r(function(n) {
set.seed(1)
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
}, args = list(n = n)),
changepoint = callr::r(function(n) {
set.seed(1)
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
changepoint::cpt.mean(mean_data, method = "PELT")
}, args = list(n = n)),
fpop = callr::r(function(n) {
set.seed(1)
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
fpop::Fpop(mean_data, 2 * log(n))
}, args = list(n = n)),
mosum = callr::r(function(n) {
set.seed(1)
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
mosum::mosum(c(mean_data), G = 40)
}, args = list(n = n)),
fastcpd = callr::r(function(n) {
set.seed(1)
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1)
}, args = list(n = n)),
times = 5
)
baseline_median <- median(mb_result$time[mb_result$expr == "baseline"])
mb_net <- mb_result[mb_result$expr != "baseline", ]
mb_net$time <- mb_net$time - baseline_median
mb_net$expr <- droplevels(mb_net$expr)
class(mb_net) <- class(mb_result)
ggplot2::autoplot(mb_net)
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the microbenchmark package.
#> Please report the issue at
#> <https://github.com/joshuaulrich/microbenchmark/issues/>.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
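In the benchmark above, every entry regenerates the data inside a fresh process, so each raw timing includes the cost of process startup and data generation. The baseline entry measures that overhead alone, and its median is subtracted from the other timings before plotting. The sketch below shows the same adjustment in Python. The timings and the names `method_a` and `method_b` are made-up placeholders, not measurements.

```python
from statistics import median

# Made-up timings in seconds, for illustration only (not measured results).
timings = {
    "baseline": [2.0, 2.2, 2.1],   # startup + data generation only
    "method_a": [5.0, 5.3, 5.1],   # overhead + hypothetical method A
    "method_b": [9.4, 9.0, 9.2],   # overhead + hypothetical method B
}

# Median overhead shared by every benchmarked entry.
baseline_median = median(timings["baseline"])

# Drop the baseline entry and subtract the overhead from each remaining run.
net_timings = {
    name: [t - baseline_median for t in runs]
    for name, runs in timings.items()
    if name != "baseline"
}

for name, runs in net_timings.items():
    print(name, [round(t, 2) for t in runs])
```

Subtracting a median rather than pairing runs one to one keeps the adjustment robust to a single slow baseline run, at the cost of ignoring run-to-run variation in the overhead itself.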
FAQ
Should I install suggested packages?
The suggested packages are not required for the main functionality of the package. They are only required for the vignettes. If you want to learn more about the package comparison and other vignettes, you can check out the vignettes on CRAN or the pkgdown-generated documentation.
I encountered problems related to gfortran on Mac OSX or Linux!
The package should install on Mac and any Linux distribution without problems if all the dependencies are installed. However, if you encounter problems related to gfortran, it might be because RcppArmadillo has not been installed yet. Try the Mac OSX Stack Overflow solution or the Linux Stack Overflow solution if you have trouble installing RcppArmadillo.
We welcome contributions from everyone. Please follow the instructions below to make contributions.
Fork the repo.
Create a new branch from the main branch.
Make changes and commit them.
- Please follow Google’s R style guide for naming variables and functions.
- If you are adding a new family of models with new cost functions and the corresponding gradient and Hessian, please add them to src/fastcpd_class_cost.cc with proper examples and tests in vignettes/gallery.Rmd and tests/testthat/test-gallery.R.
- Add the family name to src/fastcpd_constants.h.
- [Recommended] Add a new wrapper function in R/fastcpd_wrappers.R for the new family of models and move the examples to the new wrapper function as roxygen examples.
- Add the new wrapper function to the corresponding section in _pkgdown.yml.
Push the changes to your fork.
Create a pull request.
Make sure the pull request does not create new warnings or errors in devtools::check().
Trouble installing Python package.
Python headers are required to install the Python package. If you are using Ubuntu, you can install the headers with:
sudo apt install python3-dev
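To check whether the headers are already present before installing anything, one option is to look for Python.h in the include directory reported by the interpreter. This is a minimal sketch using only the standard library. It assumes the headers live in the default include path for the interpreter that runs it, which may not hold for every distribution or virtual environment.

```python
import os
import sysconfig

# Directory where this interpreter expects its C headers to live.
include_dir = sysconfig.get_paths()["include"]
header_path = os.path.join(include_dir, "Python.h")

# True if the main CPython header file exists at the expected location.
has_headers = os.path.isfile(header_path)

if has_headers:
    print("Found Python headers at", header_path)
else:
    print("Python.h not found in", include_dir)
```

If the header is reported as missing, installing the development package for your distribution, such as python3-dev on Ubuntu, is the usual fix.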
Encountered a bug or unintended behavior?
- File a ticket at GitHub Issues.
- Contact the authors specified in DESCRIPTION.
Acknowledgements
Special thanks to clODE.
