Transforms a multitask dataset to a collection of singletask datasets.
tasks = dataset.get_task_names()
assert len(tasks) == len(task_dirs)
log("Splitting multitask dataset into singletask datasets", dataset.verbose)
task_datasets = [
    DiskDataset.create_dataset([], task_dirs[task_num], [task])
    for (task_num, task) in enumerate(tasks)
]
# task_metadata_rows = {task: [] for task in tasks}
for shard_num, (X, y, w, ids) in enumerate(dataset.itershards()):
    log("Processing shard %d" % shard_num, dataset.verbose)
    basename = "dataset-%d" % shard_num
    for task_num, task in enumerate(tasks):
        log("\tTask %s" % task, dataset.verbose)
After Change
Transforms a multitask dataset to a collection of singletask datasets.
tasks = dataset.get_task_names()
assert len(tasks) == len(task_dirs)
logger.info("Splitting multitask dataset into singletask datasets")
task_datasets = [
    DiskDataset.create_dataset([], task_dirs[task_num], [task])
    for (task_num, task) in enumerate(tasks)
]
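
The change replaces the `log(message, dataset.verbose)` helper with a call to `logger.info(...)`. The snippet does not show how `logger` is defined, so the following is only a minimal sketch that assumes a standard-library `logging` logger. The logger name `singletask_split_demo`, the `ListHandler` class, and the `announce_split` function are hypothetical and exist purely for illustration. It shows that verbosity is then governed by the logger's level rather than by a per-dataset `verbose` flag.

```python
import logging

# Assumption: a standard-library logger; the name is hypothetical.
logger = logging.getLogger("singletask_split_demo")


class ListHandler(logging.Handler):
    """Hypothetical handler that stores messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # getMessage() applies the %-style arguments to the format string.
        self.messages.append(record.getMessage())


def announce_split(tasks):
    """Emit the same messages as the snippet, without a verbose flag."""
    logger.info("Splitting multitask dataset into singletask datasets")
    for task in tasks:
        logger.info("\tTask %s", task)


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

# At INFO level, every message is recorded.
logger.setLevel(logging.INFO)
announce_split(["task0", "task1"])
info_count = len(handler.messages)

# At WARNING level, the same calls are silently dropped.
logger.setLevel(logging.WARNING)
announce_split(["task0", "task1"])
warning_count = len(handler.messages) - info_count
```

Passing the task name as a separate argument (`logger.info("\tTask %s", task)`) defers string formatting until a handler actually needs the message, which is the usual idiom with `logging`.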