ea692ade194392250df2e4681764090868bdca82,horovod/spark/torch/estimator.py,TorchModel,_transform,#TorchModel#Any#,410
Before Change
# Spark has to infer whether a field is nullable or not from a limited number of samples.
# It does not always get it right. We copy the nullable boolean variable for the fields
# from the original dataframe to the final DF schema.
nullables = {field.name: field.nullable for field in df.schema.fields}
for field in final_output_schema.fields:
    if field.name in nullables:
        field.nullable = nullables[field.name]
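The Before Change snippet copies each field's `nullable` flag from the input DataFrame's schema onto the output schema, matching fields by name. Below is a minimal sketch of that name-keyed copy. It uses a plain `Field` dataclass as a hypothetical stand-in for pyspark's `StructField`, so it runs without Spark:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    # Hypothetical stand-in for pyspark.sql.types.StructField.
    name: str
    nullable: bool


def copy_nullability(source: List[Field], target: List[Field]) -> List[Field]:
    """Copy `nullable` from `source` onto same-named fields in `target`."""
    nullables = {f.name: f.nullable for f in source}
    for f in target:
        # Fields missing from the source keep whatever was inferred.
        if f.name in nullables:
            f.nullable = nullables[f.name]
    return target


original = [Field("features", False), Field("label", False)]
inferred = [Field("features", True), Field("label", True), Field("prediction", True)]
result = copy_nullability(original, inferred)
print([(f.name, f.nullable) for f in result])
# [('features', False), ('label', False), ('prediction', True)]
```

Fields that exist only in the output, such as a new prediction column, keep their inferred nullability. The After Change version avoids the problem differently by constructing the output fields itself.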
After Change
# append output schema
override_fields = df.limit(1).rdd.mapPartitions(predict).toDF().schema.fields[-len(output_cols):]
for name, override, label in zip(output_cols, override_fields, label_cols):
    # default data type as label type
    data_type = metadata[label]["spark_data_type"]()
    if type(override.dataType) == VectorUDT:
        # Override output to vector. This is mainly for torch's classification loss
        # where label is a scalar but model output is a vector.
        data_type = VectorUDT()
    final_output_fields.append(StructField(name=name, dataType=data_type, nullable=True))
final_output_schema = StructType(final_output_fields)
pred_rdd = df.rdd.mapPartitions(predict)
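The After Change code builds each output field explicitly. It runs `predict` on a one-row sample to see which type the model actually emits and defaults to the label's Spark type. It switches to `VectorUDT` when the sampled prediction is a vector, and it marks every output column nullable. Below is a minimal sketch of that per-column rule. It assumes type-name strings as hypothetical stand-ins for Spark type objects, and the column names are illustrative only:

```python
from typing import Dict, List, Tuple


def resolve_output_fields(
    output_cols: List[str],
    label_cols: List[str],
    label_types: Dict[str, str],
    sampled_types: List[str],
) -> List[Tuple[str, str, bool]]:
    """Return (name, type, nullable) for each output column.

    label_types maps label name -> declared type (a stand-in for
    metadata[label]["spark_data_type"]); sampled_types are the types
    inferred from predicting on a single row, in output-column order.
    """
    fields = []
    for name, sampled, label in zip(output_cols, sampled_types, label_cols):
        # Default to the label's declared type.
        data_type = label_types[label]
        # A vector prediction (e.g. class scores for a scalar label) wins.
        if sampled == "vector":
            data_type = "vector"
        # Output columns are always declared nullable.
        fields.append((name, data_type, True))
    return fields


fields = resolve_output_fields(
    ["label__output", "score__output"],
    ["label", "score"],
    {"label": "long", "score": "double"},
    ["vector", "double"],
)
print(fields)
# [('label__output', 'vector', True), ('score__output', 'double', True)]
```

Sampling a single row keeps the extra inference pass cheap. The declared label type still applies whenever the model's output shape matches the label.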
In pattern: SUPERPATTERN
Frequency: 3
Non-data size: 7
Instances Project Name: horovod/horovod
Commit Name: ea692ade194392250df2e4681764090868bdca82
Time: 2021-02-04
Author: irasit@users.noreply.github.com
File Name: horovod/spark/torch/estimator.py
Class Name: TorchModel
Method Name: _transform
Project Name: gboeing/osmnx
Commit Name: 313b79ce9cc8538a78edfc82ccc7b02c23766287
Time: 2020-10-20
Author: 44049940+Labulitiolle@users.noreply.github.com
File Name: osmnx/utils_graph.py
Class Name:
Method Name: graph_from_gdfs
Project Name: GPflow/GPflow
Commit Name: 0b9e1f064ab1ce1d994f86686e7d662a46095e36
Time: 2020-03-30
Author: st--@users.noreply.github.com
File Name: doc/source/notebooks/advanced/mcmc.pct.py
Class Name:
Method Name: marginal_samples