d1196006be574c16473df6efed448f9fa308a680,tests/test_preprocess.py,,test_tmpreproc_en_lemmatize,#Any#,455

Before Change




def test_tmpreproc_en_lemmatize(tmpreproc_en):
    tokens = tmpreproc_en.tokenize().tokens
    lemmata = tmpreproc_en.pos_tag().lemmatize().tokens

    assert set(tokens.keys()) == set(lemmata.keys())

After Change


        dt_ = lemmata[dl]
        assert len(dt) == len(dt_)

    assert len(tmpreproc_en.vocabulary) < len(vocab)

    _check_save_load_state(tmpreproc_en)
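The Before and After snippets together check three invariants of lemmatization. Document labels are unchanged. Each document keeps its token count, since every token maps to one lemma. The vocabulary shrinks. Below is a minimal sketch of those checks on plain data. It does not use tmtoolkit. It assumes tokens and lemmata are dicts mapping document labels to token lists, which matches how the test indexes `lemmata[dl]`. It also assumes "vocabulary" means the set of unique tokens across all documents, and that `vocab` in the fragment was captured before lemmatization. The names `vocabulary` and `lemmatize_invariants_hold` and the toy data are hypothetical.

```python
from typing import Dict, List, Set

Docs = Dict[str, List[str]]  # document label -> list of tokens


def vocabulary(docs: Docs) -> Set[str]:
    """Unique tokens across all documents (assumed meaning of 'vocabulary')."""
    return {t for doc_tokens in docs.values() for t in doc_tokens}


def lemmatize_invariants_hold(tokens: Docs, lemmata: Docs) -> bool:
    # 1. Lemmatization must not add or drop documents.
    if set(tokens.keys()) != set(lemmata.keys()):
        return False
    # 2. Each token is replaced by exactly one lemma, so the
    #    per-document token counts must match.
    for dl, dt in tokens.items():
        dt_ = lemmata[dl]
        if len(dt) != len(dt_):
            return False
    # 3. Merging inflected forms should shrink the vocabulary. The check is
    #    strict, as in the test, so it only holds if some forms actually merge.
    return len(vocabulary(lemmata)) < len(vocabulary(tokens))


# Hypothetical toy data, not taken from the tmtoolkit test corpus.
tokens = {"doc1": ["cats", "are", "running"], "doc2": ["a", "cat", "ran"]}
lemmata = {"doc1": ["cat", "be", "run"], "doc2": ["a", "cat", "run"]}

print(lemmatize_invariants_hold(tokens, lemmata))  # True
```

In the toy data the vocabulary drops from 6 unique tokens to 4 because "cats"/"cat" and "running"/"ran" collapse. That makes the strict `<` comparison succeed.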

In pattern: SUPERPATTERN

Frequency: 7

Non-data size: 3

Instances


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: d1196006be574c16473df6efed448f9fa308a680
Time: 2019-03-06
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_en_lemmatize


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: bbca1fca586636e0bf90336893937956c9962c7d
Time: 2019-03-07
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_en_clean_tokens


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: 1273d579c5ce666aaf8ff20942ce83681bf5eb06
Time: 2019-03-13
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_de_lemmatize


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: 1070ee6fe00f2a3b03273e6a6dbf5625ab4dffc7
Time: 2019-03-12
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_en_get_dtm


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: ab1359b176f8ac95ac443735395d8a316be2df16
Time: 2019-03-06
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_en_vocabulary


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: 1273d579c5ce666aaf8ff20942ce83681bf5eb06
Time: 2019-03-13
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_de_tokenize


Project Name: WZBSocialScienceCenter/tmtoolkit
Commit Name: bbca1fca586636e0bf90336893937956c9962c7d
Time: 2019-03-07
Author: markus.konrad@wzb.eu
File Name: tests/test_preprocess.py
Class Name:
Method Name: test_tmpreproc_en_remove_special_chars_in_tokens