@misc{jiang2023symbolic,
abstract = {Learning symbolic expressions directly from experiment data is a vital step
in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches
are limited to learning simple expressions. Regressing expressions involving
many independent variables still remains out of reach. Motivated by the control
variable experiments widely utilized in science, we propose Control Variable
Genetic Programming (CVGP) for symbolic regression over many independent
variables. CVGP expedites symbolic expression discovery via customized
experiment design, rather than learning from a fixed dataset collected a
priori. CVGP starts by fitting simple expressions involving a small set of
independent variables using genetic programming, under controlled experiments
where the other variables are held constant. It then extends the expressions
learned in previous generations by adding new independent variables, using new
control variable experiments in which these variables are allowed to vary.
Theoretically, we show that CVGP, as an incremental building approach, can yield an
exponential reduction in the search space when learning a class of expressions.
Experimentally, CVGP outperforms several baselines in learning symbolic
expressions involving multiple independent variables.},
added-at = {2023-10-23T12:11:27.000+0200},
author = {Jiang, Nan and Xue, Yexiang},
biburl = {https://www.bibsonomy.org/bibtex/21d501b9fb5098bac4065d68531f0c35f/adulny},
description = {[2306.08057] Symbolic Regression via Control Variable Genetic Programming},
interhash = {c18928685daacfdd5e88cf632835fc64},
intrahash = {1d501b9fb5098bac4065d68531f0c35f},
keywords = {ak-symbolic-numeric control deep-learning from:adulny genetic genetic-programming scientific-discovery sem_ws23 symbolic-regression variable},
  note = {arXiv:2306.08057},
timestamp = {2023-10-23T12:13:15.000+0200},
title = {Symbolic Regression via Control Variable Genetic Programming},
url = {http://arxiv.org/abs/2306.08057},
year = 2023
}