71432d1890209628e189edf04d7d623160943718,autokeras/hypermodel/preprocessor.py,TextToNgramVector,transform,#TextToNgramVector#Any#Any#,182

Before Change


        data = self.vectorizer.transform([sentence]).toarray()
        if self.selector:
            data = self.selector.transform(data).astype("float32")
        return data[0]

    def output_types(self):
        return (tf.float32,)

After Change


        # Calculate term frequency (tf) at the document level
        tf = np.zeros(len(self.vocabulary), dtype=int)
        x = nest.flatten(x)[0].numpy().decode("utf-8")
        token_pattern = re.compile(r"(?u)\b\w\w+\b")
        tokens = self._word_ngram(token_pattern.findall(x.lower()))

        for feature in tokens:
            if feature in self.vocabulary:
                feature_idx = self.vocabulary[feature]
                tf[feature_idx] += 1
        result = tf * self.k_best_idf_values
        result = normalize([result], norm=self.norm, copy=False)[0]
        return result
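
The "After Change" code counts vocabulary hits in one document, scales the counts by precomputed IDF weights, and normalizes the result. The following is a minimal standalone sketch of that idea, not the autokeras implementation. It assumes unigram tokens only (the original passes tokens through `self._word_ngram`), hard-codes L2 normalization (the original uses `self.norm`), and uses the hypothetical names `tfidf_vector`, `vocabulary`, and `idf_values` in place of the class attributes.

```python
import math
import re

# Same token pattern as in the "After Change" snippet: words of 2+ characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tfidf_vector(text, vocabulary, idf_values):
    """Return an L2-normalized TF-IDF vector for a single document.

    `vocabulary` maps a token to its column index and `idf_values` holds one
    IDF weight per column. Both are hypothetical stand-ins for
    self.vocabulary and self.k_best_idf_values.
    """
    # Term frequency: count only tokens that appear in the vocabulary.
    counts = [0] * len(vocabulary)
    for token in TOKEN_PATTERN.findall(text.lower()):
        idx = vocabulary.get(token)
        if idx is not None:
            counts[idx] += 1

    # Scale each count by its IDF weight.
    weighted = [count * weight for count, weight in zip(counts, idf_values)]

    # L2-normalize; leave an all-zero vector unchanged to avoid dividing by zero.
    length = math.sqrt(sum(value * value for value in weighted))
    if length == 0:
        return weighted
    return [value / length for value in weighted]


vocab = {"cat": 0, "dog": 1, "fish": 2}
idf = [1.0, 2.0, 3.0]
vector = tfidf_vector("The cat saw a cat and a dog", vocab, idf)
```

In this example "cat" occurs twice with weight 1.0 and "dog" once with weight 2.0, so both columns carry the same weighted value before normalization, and "fish" stays at zero.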
In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 7

Instances


Project Name: keras-team/autokeras
Commit Name: 71432d1890209628e189edf04d7d623160943718
Time: 2019-12-18
Author: 33369174+Davidsirui@users.noreply.github.com
File Name: autokeras/hypermodel/preprocessor.py
Class Name: TextToNgramVector
Method Name: transform


Project Name: samuelclay/NewsBlur
Commit Name: bd334ef20fdccb74d310ca00b1134388645ba0a5
Time: 2014-07-21
Author: samuel@ofbrooklyn.com
File Name: vendor/readability/encoding.py
Class Name:
Method Name: get_encoding