0e364f4b0f70679ca984c0ba5629c569135804a4,pysbd/segmenter.py,Segmenter,sentences_with_char_spans,#Segmenter#Any#,50

Before Change


        # since SENTENCE_BOUNDARY_REGEX doesn't account
        # for trailing whitespace, \s* is used as a suffix
        # to keep the text non-destructive when segments are joined
        sent_spans = set((match.group(), match.start(), match.end()) for sent in sentences
                for match in re.finditer("{0}\s*".format(re.escape(sent)),
                self.original_text))
        sorted_spans = sorted(sent_spans, key=lambda x: x[1])
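A standalone reproduction of the Before logic helps show what the set-based approach returns when one sentence also occurs inside another. The function name `spans_via_set` and the input text are hypothetical. The pattern is written as a raw string because `"\s*"` in a normal string literal triggers an invalid-escape warning in newer Python versions.

```python
import re


def spans_via_set(text, sentences):
    """Collect every match of every sentence (plus trailing whitespace),
    de-duplicate the (text, start, end) tuples in a set, then sort them
    by start index, mirroring the Before version."""
    sent_spans = set(
        (match.group(), match.start(), match.end())
        for sent in sentences
        for match in re.finditer(r"{0}\s*".format(re.escape(sent)), text)
    )
    return sorted(sent_spans, key=lambda x: x[1])


# Hypothetical input: "Hi." also occurs inside "Say Hi."
print(spans_via_set("Hi. Say Hi. Bye.", ["Hi.", "Say Hi.", "Bye."]))
# [('Hi. ', 0, 4), ('Say Hi. ', 4, 12), ('Hi. ', 8, 12), ('Bye.', 12, 16)]
```

Three sentences produce four spans: the extra `('Hi. ', 8, 12)` overlaps the span of `"Say Hi."` because every occurrence of every sentence is kept.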

After Change


        # since SENTENCE_BOUNDARY_REGEX doesn't account
        # for trailing whitespace, \s* is used as a suffix
        # to keep the text non-destructive when segments are joined
        sent_spans = []
        prior_start_char_idx = 0
        for sent in sentences:
            for match in re.finditer(r"{0}\s*".format(re.escape(sent)), self.original_text):
                match_str = match.group()
                match_start_idx, match_end_idx = match.span()
                if match_start_idx >= prior_start_char_idx:
                    # accept the match only if this is the first sentence
                    # or its span starts at or after the prior sentence's
                    # span, so spans stay in document order
                    sent_spans.append(
                        TextSpan(match_str, match_start_idx, match_end_idx))
                    prior_start_char_idx = match_start_idx
                    break
        return sent_spans
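The ordered scan in the After version can be sketched as a self-contained function. This is an illustrative variant, not the commit's exact code. `TextSpan` here is a stand-in namedtuple whose field names (`sent`, `start`, `end`) are assumed. Unlike the commit, the sketch searches from the end of the previously accepted span using `Pattern.search(string, pos)`.

```python
import re
from collections import namedtuple

# Stand-in for the span type used above; field names are assumptions.
TextSpan = namedtuple("TextSpan", ["sent", "start", "end"])


def sentences_with_char_spans(original_text, sentences):
    """Map each segmented sentence to a (text, start, end) span in
    original_text, scanning left to right so spans stay in order."""
    sent_spans = []
    prior_end_idx = 0  # end index of the last accepted span
    for sent in sentences:
        # \s* absorbs trailing whitespace so the joined spans rebuild the text
        pattern = re.compile(r"{0}\s*".format(re.escape(sent)))
        # leftmost match that starts at or after the previous span's end
        match = pattern.search(original_text, prior_end_idx)
        if match:
            start, end = match.span()
            sent_spans.append(TextSpan(match.group(), start, end))
            prior_end_idx = end
    return sent_spans


spans = sentences_with_char_spans("Hi. Hi. Bye.", ["Hi.", "Hi.", "Bye."])
print([(s.start, s.end) for s in spans])  # [(0, 4), (4, 8), (8, 12)]
```

The commit compares each match's start with the previous match's start (`match_start_idx >= prior_start_char_idx`). For input like `"Hi. Hi. Bye."`, that test would accept the second `"Hi."` again at index 0, because 0 >= 0. Comparing against the previous span's end, as in this sketch, avoids that duplicate.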
In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 5

Instances


Project Name: nipunsadvilkar/pySBD
Commit Name: 0e364f4b0f70679ca984c0ba5629c569135804a4
Time: 2020-07-26
Author: nipunsadvilkar@gmail.com
File Name: pysbd/segmenter.py
Class Name: Segmenter
Method Name: sentences_with_char_spans


Project Name: MaybeShewill-CV/CRNN_Tensorflow
Commit Name: ed66679b71989f55cc25d7adf69e386ad27c2063
Time: 2019-03-22
Author: luoyao@baidu.com
File Name: data_provider/tf_io_pipline_fast_tools.py
Class Name: CrnnFeatureWriter
Method Name: run


Project Name: studioml/studio
Commit Name: 13986978d4545aa429a7fc233d8e39718d52e255
Time: 2020-08-11
Author: andrei.denissov@cognizant.com
File Name: studio/keyvalue_provider.py
Class Name: KeyValueProvider
Method Name: checkpoint_experiment