04f4305931bedd43ca76190b25de120239d2dfe6,deepchem/splits/tests/test_splitter.py,TestSplitter,test_multitask_stratified_column_indices_masked,#TestSplitter#,381
Before Change
w[:n_samples // 2] = 0
stratified_splitter = dc.splits.RandomStratifiedSplitter()
split_indices = stratified_splitter.get_task_split_indices(
    y, w, frac_split=.5)
w_present = (w != 0)
y_present = y * w_present
for task in range(n_tasks):
    split_index = split_indices[task]
    task_actives = np.count_nonzero(y_present[:, task])
    # The split index should partition the dataset's present actives in half.
    assert np.count_nonzero(y_present[:split_index, task]) == int(
        task_actives / 2)
After Change
dataset = dc.data.NumpyDataset(X, y, w)
stratified_splitter = dc.splits.RandomStratifiedSplitter()
train, valid, test = stratified_splitter.split(dataset, 0.5, 0, 0.5)
w_present = (w != 0)
y_present = y * w_present
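The masking step shared by both versions (`w_present`, `y_present`) can be illustrated with a minimal NumPy-only sketch. The toy data, the random seed, and the helper `count_present_actives` are assumptions made for illustration and do not come from the original test; the sketch only shows that zero-weight labels drop out of the active counts, not how `RandomStratifiedSplitter` itself behaves.

```python
import numpy as np

# Hypothetical toy data (shapes and seed are illustrative assumptions).
rng = np.random.default_rng(0)
n_samples, n_tasks = 100, 3
y = rng.integers(0, 2, size=(n_samples, n_tasks))
w = np.ones((n_samples, n_tasks))
w[:n_samples // 2] = 0  # mask the first half of the labels, as in the snippet

# Same masking as in the test: a label counts only if its weight is nonzero.
w_present = (w != 0)
y_present = y * w_present


def count_present_actives(y_present, rows):
    """Hypothetical helper: per-task count of labels that are present and nonzero."""
    return np.count_nonzero(y_present[rows], axis=0)


# Rows whose weights were zeroed contribute no actives at all.
actives_masked = count_present_actives(y_present, np.arange(n_samples // 2))
# Over the whole dataset, only the unmasked second half contributes.
actives_total = count_present_actives(y_present, np.arange(n_samples))
```

Under these assumptions, `actives_masked` is zero for every task, so any per-task balance check on a split only sees labels from rows with nonzero weight.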
In pattern: SUPERPATTERN
Frequency: 4
Non-data size: 6
Instances
Project Name: deepchem/deepchem
Commit Name: 04f4305931bedd43ca76190b25de120239d2dfe6
Time: 2020-09-28
Author: peastman@stanford.edu
File Name: deepchem/splits/tests/test_splitter.py
Class Name: TestSplitter
Method Name: test_multitask_stratified_column_indices_masked
Project Name: deepchem/deepchem
Commit Name: 04f4305931bedd43ca76190b25de120239d2dfe6
Time: 2020-09-28
Author: peastman@stanford.edu
File Name: deepchem/splits/tests/test_splitter.py
Class Name: TestSplitter
Method Name: test_multitask_stratified_column_indices
Project Name: deepchem/deepchem
Commit Name: 04f4305931bedd43ca76190b25de120239d2dfe6
Time: 2020-09-28
Author: peastman@stanford.edu
File Name: deepchem/splits/tests/test_splitter.py
Class Name: TestSplitter
Method Name: test_singletask_stratified_column_indices
Project Name: deepchem/deepchem
Commit Name: 04f4305931bedd43ca76190b25de120239d2dfe6
Time: 2020-09-28
Author: peastman@stanford.edu
File Name: deepchem/splits/tests/test_splitter.py
Class Name: TestSplitter
Method Name: test_singletask_stratified_column_indices_mask