0e364f4b0f70679ca984c0ba5629c569135804a4,pysbd/segmenter.py,Segmenter,sentences_with_char_spans,#Segmenter#Any#,50

Before Change


        # since SENTENCE_BOUNDARY_REGEX doesn't account
        # for trailing whitespace, \s* is used as a suffix
        # so the text is preserved when segments are joined
        sent_spans = set((match.group(), match.start(), match.end()) for sent in sentences
                for match in re.finditer("{0}\s*".format(re.escape(sent)),
                self.original_text))
        sorted_spans = sorted(sent_spans, key=lambda x: x[1])

After Change


        # since SENTENCE_BOUNDARY_REGEX doesn't account
        # for trailing whitespace, \s* & is used as a suffix
        # so the text is preserved when segments are joined
        sent_spans = []
        prior_start_char_idx = 0
        for sent in sentences:
            for match in re.finditer(r"{0}\s*".format(re.escape(sent)), self.original_text):
                match_str = match.group()
                match_start_idx, match_end_idx = match.span()
                if match_start_idx >= prior_start_char_idx:
                    # accept the match only if the current sentence is
                    # the first one, or its span starts at or after the
                    # start of the prior sentence's span
                    sent_spans.append(
                        TextSpan(match_str, match_start_idx, match_end_idx))
                    prior_start_char_idx = match_start_idx
                    break
        return sent_spans
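
The loop in the After Change snippet can be tried out in isolation. The following is a minimal sketch, not the project's code: it assumes `TextSpan` behaves like a simple three-field record (modelled here with a hypothetical `namedtuple` stand-in), and it takes the original text as a plain function argument instead of reading `self.original_text`.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the TextSpan type used in the snippet above;
# the field names are an assumption made for this sketch.
TextSpan = namedtuple("TextSpan", ["sent", "start", "end"])


def sentences_with_char_spans(original_text, sentences):
    """Map each sentence to a (text, start, end) span in original_text."""
    sent_spans = []
    prior_start_char_idx = 0
    for sent in sentences:
        # \s* absorbs trailing whitespace so the joined spans
        # reproduce the original text.
        pattern = r"{0}\s*".format(re.escape(sent))
        for match in re.finditer(pattern, original_text):
            start, end = match.span()
            # Take the first match that does not start before the
            # previously accepted span, then move on to the next sentence.
            if start >= prior_start_char_idx:
                sent_spans.append(TextSpan(match.group(), start, end))
                prior_start_char_idx = start
                break
    return sent_spans


text = "My name is Jonas.  I work in Berlin. "
spans = sentences_with_char_spans(
    text, ["My name is Jonas.", "I work in Berlin."])
for span in spans:
    print(span)
```

Because the spans are collected in sentence order, no final sort is needed, unlike the set-based Before Change version. Note that the check compares against the prior span's start, so in this sketch a sentence identical to its predecessor would match at the same position again.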

In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 5

Instances

Link

Project Name: nipunsadvilkar/pySBD

Commit Name: 0e364f4b0f70679ca984c0ba5629c569135804a4

Time: 2020-07-26

Author: nipunsadvilkar@gmail.com

File Name: pysbd/segmenter.py

Class Name: Segmenter

Method Name: sentences_with_char_spans

Link

Project Name: MaybeShewill-CV/CRNN_Tensorflow

Commit Name: ed66679b71989f55cc25d7adf69e386ad27c2063

Time: 2019-03-22

Author: luoyao@baidu.com

File Name: data_provider/tf_io_pipline_fast_tools.py

Class Name: CrnnFeatureWriter

Method Name: run

Link

Project Name: studioml/studio

Commit Name: 13986978d4545aa429a7fc233d8e39718d52e255

Time: 2020-08-11

Author: andrei.denissov@cognizant.com

File Name: studio/keyvalue_provider.py

Class Name: KeyValueProvider

Method Name: checkpoint_experiment