# NOTE: this would not be the case if multiple context spans make up the same token
if char_loc == -1:
tokenized_context.append(default_context)
elif token in ["\n"]:
tokenized_context.append(context_by_char_loc[current_char_loc][1])
else:
if char_loc > context_by_char_loc[current_char_loc][0]:
# After Change  (diff-tool artifact — not original source; safe to remove)
# TODO: this is a workaround that has no guarantees of being correct
raise ValueError("Context cannot be fully matched as it appears to not cover the end of the sequence for token {}".format(token))
if token.strip() not in context_by_char_loc[current_char_loc][2]:
warnings.warn("subtoken: {} has matched up with the context for token: {}".format(repr(token), repr(context_by_char_loc[current_char_loc][2])))
tokenized_context.append(context_by_char_loc[current_char_loc][1])
assert len(tokenized_context) == len(encoded_output.token_ends)