Goals
- Discuss the Pitfalls of Passive Sampling: The Latin Hypercube and Sobol sampling algorithms ignore the current data set and any previously trained models when choosing new samples.
- Introduce Selective Sobol Sampling: Our proposed algorithm actively utilizes existing data and previously trained models to identify the most informative samples.
- Analytical Examples: Simple examples illustrate how Selective Sobol Sampling outperforms conventional methods.
- DWSIM Example: A chemical distillation column example demonstrates that SSS yields better surrogates.
Key Takeaways

Sampling Processes
Latin Hypercube and Sobol Sampling
Selective Sobol Sampling
Outline of the Selective Sobol Sampling Algorithm:
Selective Sobol Sampling places no restrictions on the type of the existing data set. SSS can therefore be used, for example, when a Latin hypercube sample set fails to meet performance requirements and additional samples are needed. This flexibility makes the algorithm a powerful tool whenever a well-fitted surrogate model is necessary.

Summary of the Algorithm’s Inputs and Outputs
- Candidate Inputs: A large collection of potential inputs (the examples below use a Sobol set of size 100,000).
- Current Training Data: The existing data used for training the model.
- Surrogate Model: A model trained using the current training data.
- Number of Input Points to Select: The desired quantity of input points to be chosen from the candidate set.
- Selected Inputs: A subset of the candidate inputs that have been chosen based on the algorithm’s criteria.
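The inputs and outputs above can be sketched as a single selection routine. The post does not spell out the exact selection criterion, so the sketch below uses query-by-committee disagreement between surrogate models as a hypothetical stand-in, written in Python (the post's experiments use MATLAB):

```python
import numpy as np

def select_inputs(candidates, committee, k):
    """Pick the k candidate inputs where surrogate committee members
    disagree most. The disagreement score is an assumption standing in
    for the post's actual informativeness criterion."""
    preds = np.stack([m(candidates) for m in committee])  # (n_models, n_candidates)
    scores = preds.std(axis=0)        # per-candidate disagreement
    top = np.argsort(scores)[-k:]     # indices of the k largest scores
    return candidates[top]

# Toy committee: two surrogates that agree at the domain edges
# but disagree in the middle, so the midpoint is selected.
cands = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
committee = [lambda x: x.ravel(), lambda x: x.ravel() ** 2]
picked = select_inputs(cands, committee, k=1)
```

Whatever the scoring rule, the shape of the routine matches the algorithm's interface: a large candidate pool and a trained surrogate go in, and a small informative subset comes out.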
Experimental Setup
- Selective Sobol Sampling: The proposed approach creates input samples. If available, the network is warm-started with a previously trained surrogate model.
- Warm Start Sobol Sampling: MATLAB’s sobolset function creates input samples. If available, the network is warm-started with a previously trained surrogate model.
- Cold Start Sobol Sampling: MATLAB’s sobolset function creates input samples.
- Latin Hypercube: MATLAB’s lhsdesign function creates input samples.
- Random: MATLAB’s rand function creates input samples.
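The passive baselines above are one-liners in most environments. As a point of reference, here are Python equivalents of the MATLAB generators named in the list (`scipy.stats.qmc` for Sobol and Latin hypercube, NumPy for uniform random); the dimensions and sample count are arbitrary:

```python
import numpy as np
from scipy.stats import qmc

d, n = 2, 8  # input dimensions, number of samples (power of 2 for Sobol)

sobol = qmc.Sobol(d=d, scramble=False).random(n)  # ~ MATLAB sobolset
lhs = qmc.LatinHypercube(d=d).random(n)           # ~ MATLAB lhsdesign
rnd = np.random.rand(n, d)                        # ~ MATLAB rand
```

All three return points in the unit hypercube, which are then scaled to the physical input bounds before evaluating the model.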
Analytical Examples
Motivating Example


The figures below show the surrogate model’s convergence behavior under the different sampling approaches. The same 2000-sample testing set was used to evaluate every metric presented; see the associated code for details on the model architecture and training methods. Warm-start sampling approaches are drawn with dotted lines to indicate their dependence on a previous model, and sample sets were grown in increments of 100 points. SSS outperforms all passive sampling methods and consistently reduces the error metrics as more samples are used.
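The evaluation protocol behind these plots is simple: hold one test set fixed and re-measure the error as the training set grows in steps of 100. A minimal sketch of that loop, with a polynomial fit of a sine as a hypothetical stand-in for the surrogate and the true model:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)  # stand-in for the expensive model

# Fixed 2000-sample test set, reused for every evaluation.
x_test = rng.random(2000)
y_test = f(x_test)

def fit_and_eval(n_train):
    """Fit a stand-in surrogate on n_train samples; report test RMSE."""
    x = rng.random(n_train)
    coeffs = np.polyfit(x, f(x), deg=7)  # polynomial surrogate (assumption)
    err = np.polyval(coeffs, x_test) - y_test
    return float(np.sqrt(np.mean(err ** 2)))

# Grow the training set in increments of 100, as in the experiments.
rmse = [fit_and_eval(n) for n in range(100, 501, 100)]
```

Keeping the test set fixed is what makes the curves for the different sampling strategies directly comparable.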


Quality, Not Quantity

Multiple-Input Multiple-Output Example


Simple Example

Heaviside Example
$$
y =
\begin{cases}
0 & \text{if } x < 0.5 \\
1 & \text{if } x \geq 0.5
\end{cases}
$$
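In code, this step function is a one-line comparison (a NumPy sketch; the post's experiments use MATLAB):

```python
import numpy as np

def heaviside_half(x):
    # Step at x = 0.5: y = 0 for x < 0.5, y = 1 for x >= 0.5.
    return (np.asarray(x) >= 0.5).astype(float)
```

The discontinuity at 0.5 is exactly what makes this a hard target for a surrogate trained on passively chosen samples: accuracy depends on how many points land near the step.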



DWSIM Example



The animated figures below display the convergence behavior of the three Sobol sampling methods used in this experiment. The first figure displays all predicted outputs, while the second shows only the propane molar fraction. Note that SSS dramatically improved the model’s prediction of the propane molar fraction, especially when that parameter is larger than average. As the final plot shows, this parameter is small in most cases, but given the input-parameter bounds and the system’s behavior, it can occasionally be much larger. A surrogate model must capture the complete behavior of the system, including these unusual cases, and it was exactly this behavior of distillation columns that motivated us to develop Selective Sobol Sampling.



Algorithm’s Computation Effort


Expanding an Existing Set

Other Variations
- Expanding Input Bounds: The algorithm can be adapted when input bounds need to be adjusted for an existing surrogate model. The existing data set and selected inputs do not need to have the same bounds.
- Additional Inputs: The method can be adapted to introduce an additional input to an existing model, preventing unnecessary resampling if additional parameters or inputs are needed as a workflow evolves.
- Continuous Learning: The diagram above can be modified so that model sampling and surrogate training occur concurrently, reducing the time to obtain a well-fitted surrogate model.