29d9d12a26718405763103e997285026f8fd4563,pysbd/segmenter.py,Segmenter,sentences_with_char_spans,#Segmenter#Any#,50

Before Change


        # for trailing whitespaces \s* is used as suffix
        # to keep non-destructive text after segments joins
        return [TextSpan(m.group(), m.start(), m.end()) for sent in sentences
                for m in re.finditer("{0}\s*".format(re.escape(sent)),
                self.original_text)]

    def segment(self, text):

After Change


        # since SENTENCE_BOUNDARY_REGEX doesn't account
        # for trailing whitespaces \s* is used as suffix
        # to keep non-destructive text after segments joins
        sent_spans = set((match.group(), match.start(), match.end()) for sent in sentences
                for match in re.finditer("{0}\s*".format(re.escape(sent)),
                self.original_text))
        sorted_spans = sorted(sent_spans, key=lambda x: x[1])
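
The effect of the change can be reproduced outside the library. The sketch below is a minimal, standalone approximation and not the pySBD implementation: `TextSpan` is a hypothetical namedtuple stand-in, the function takes the text as an argument instead of reading `self.original_text`, and the sample text is invented. It illustrates why collecting matches in a set and sorting by start offset helps when the same sentence occurs more than once. Each occurrence of a repeated sentence is searched across the whole text, so without the set every match would be reported once per occurrence.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the library's TextSpan; field names are assumptions.
TextSpan = namedtuple("TextSpan", ["sent", "start", "end"])


def sentences_with_char_spans(original_text, sentences):
    # Every sentence is searched across the whole text, so a sentence that
    # appears twice in `sentences` produces the same tuples twice.
    # The set drops those exact duplicates.
    sent_spans = set(
        (match.group(), match.start(), match.end())
        for sent in sentences
        for match in re.finditer(r"{0}\s*".format(re.escape(sent)), original_text)
    )
    # Sets are unordered, so restore document order by start offset.
    sorted_spans = sorted(sent_spans, key=lambda x: x[1])
    return [TextSpan(*span) for span in sorted_spans]


text = "Hi there. Bye. Hi there. "
sentences = ["Hi there.", "Bye.", "Hi there."]
spans = sentences_with_char_spans(text, sentences)

for span in spans:
    print(span)
```

In this example, joining the `sent` fields of the returned spans gives back the input text unchanged, which is the non-destructive property the comments refer to. Sentences that are substrings of other sentences can still yield overlapping matches; this sketch does not address that case.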
In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 5

Instances


Project Name: nipunsadvilkar/pySBD
Commit Name: 29d9d12a26718405763103e997285026f8fd4563
Time: 2020-07-21
Author: nipunsadvilkar@gmail.com
File Name: pysbd/segmenter.py
Class Name: Segmenter
Method Name: sentences_with_char_spans



Project Name: estnltk/estnltk
Commit Name: cd1167b2085bbc51f606e9bac647c8ac0ad21576
Time: 2015-07-06
Author: amatsin@gmail.com
File Name: estnltk/wiki/internalLink.py
Class Name:
Method Name: addIntLinks


Project Name: snipsco/snips-nlu
Commit Name: 346705a7703d6beebaa3e033520865943440e259
Time: 2017-02-24
Author: clement.doumouro@snips.ai
File Name: custom_intent_parser/entity_extractor/regex_entity_extractor.py
Class Name: RegexEntityExtractor
Method Name: get_entities