6367318f455aa8c27b6341c9b98794351dfd168e,examples/pytorch/pinsage/data_utils.py,,train_test_split_by_time,#Any#Any#Any#,9
Before Change
df["test_mask"] = np.zeros((len(df),), dtype=np.bool)
df = df.sort_values([item, timestamp])
for track_id in df[item].unique():
    idx = (df[item] == track_id).to_numpy().nonzero()[0]
    idx = df.index[idx]
    if len(idx) > 1:
        df.loc[idx[-1], "train_mask"] = False
        df.loc[idx[-1], "test_mask"] = True
    if len(idx) > 2:
After Change
        df.iloc[-2, -2] = True
    return df
df = df.groupby(item).apply(train_test_split).compute(scheduler="processes").sort_index()
print(df[df[item] == df[item].unique()[0]].sort_values(timestamp))
return df["train_mask"].to_numpy().nonzero()[0], \
       df["val_mask"].to_numpy().nonzero()[0], \
       df["test_mask"].to_numpy().nonzero()[0]
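Illustrative sketch (not part of the commit): both fragments above appear to implement a per-group, time-ordered hold-out, where the latest row of each group goes to the test mask and the second-latest to the validation mask. The version below shows that idea with plain pandas instead of the Dask-based `groupby(...).apply(...)` used after the change. The function name `split_by_time_sketch`, the column names in the usage example, and the `reset_index` step are assumptions made for illustration. It uses the built-in `bool` dtype because the `np.bool` alias was removed in NumPy 1.24.

```python
import numpy as np
import pandas as pd


def split_by_time_sketch(df, timestamp, item):
    """Hypothetical helper: latest row per group -> test, second-latest -> val."""
    # Assumption: positions and labels should coincide, so reset the index.
    df = df.reset_index(drop=True).copy()
    df["train_mask"] = np.ones(len(df), dtype=bool)
    df["val_mask"] = np.zeros(len(df), dtype=bool)
    df["test_mask"] = np.zeros(len(df), dtype=bool)

    # Order rows by group and time, then rank each row from the newest (0)
    # to the oldest within its group.
    ordered = df.sort_values([item, timestamp])
    rank_from_end = ordered.groupby(item).cumcount(ascending=False)
    group_size = ordered[item].map(ordered[item].value_counts())

    # Same thresholds as the fragments above: a group needs more than one row
    # to give up a test row, and more than two to give up a validation row.
    test_idx = ordered.index[(rank_from_end == 0) & (group_size > 1)]
    val_idx = ordered.index[(rank_from_end == 1) & (group_size > 2)]

    df.loc[test_idx, "train_mask"] = False
    df.loc[test_idx, "test_mask"] = True
    df.loc[val_idx, "train_mask"] = False
    df.loc[val_idx, "val_mask"] = True

    return (
        df["train_mask"].to_numpy().nonzero()[0],
        df["val_mask"].to_numpy().nonzero()[0],
        df["test_mask"].to_numpy().nonzero()[0],
    )


# Usage with made-up data: three groups of sizes 3, 2 and 1.
example = pd.DataFrame(
    {"item": ["a", "a", "a", "b", "b", "c"], "ts": [3, 1, 2, 5, 4, 9]}
)
train_rows, val_rows, test_rows = split_by_time_sketch(example, "ts", "item")
```

Because the ranking is vectorized, this sketch avoids both the Python-level loop over unique IDs and the positional `iloc[-2, -2]` column indexing, which depends on the mask columns being the last three in the frame.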
In pattern: SUPERPATTERN
Frequency: 3
Non-data size: 3
Instances
Project Name: dmlc/dgl
Commit Name: 6367318f455aa8c27b6341c9b98794351dfd168e
Time: 2020-08-17
Author: coin2028@hotmail.com
File Name: examples/pytorch/pinsage/data_utils.py
Class Name:
Method Name: train_test_split_by_time
Project Name: deepchem/deepchem
Commit Name: d62d3a866dcb8f937d6aa5e869c309cee8a784ce
Time: 2016-07-11
Author: bharath.ramsundar@gmail.com
File Name: deepchem/models/multitask.py
Class Name: SingletaskToMultitask
Method Name: fit
Project Name: deepchem/deepchem
Commit Name: 2109dc81a70406c98305256587ab57e64135b96a
Time: 2016-07-29
Author: apappu97@gmail.com
File Name: deepchem/splits/tests/test_splitter.py
Class Name: TestSplitters
Method Name: test_stratified_multitask_split