Training involved selectively masking tokens based on a lightweight predictor: a small binary classifier attached to the embedding layer. Tokens predicted as "low-information" (e.g., the prepositions "de", "para", "com" or the conjunctions "e", "ou", "mas") are assigned a null vector and bypass the middle transformer layers. This reduces FLOPs by roughly 30% while maintaining more than 98% of the full model's F1 score on standard benchmarks such as LeNER-Br (legal named entity recognition) and MiniHateBR (hate speech detection).
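The routing idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a fixed word list stands in for the learned binary classifier, `middle_layers` is a placeholder rather than a real transformer stack, and the function names are hypothetical.

```python
# Sketch only: a word lookup stands in for the learned binary classifier.
LOW_INFO = {"de", "para", "com", "e", "ou", "mas"}  # examples named in the text


def predict_low_information(token):
    """Hypothetical stand-in for the lightweight predictor."""
    return token.lower() in LOW_INFO


def middle_layers(vector):
    """Placeholder for the middle transformer layers (not a real model)."""
    return [x * 2.0 for x in vector]


def route(tokens, embeddings):
    """Give low-information tokens a null vector; process the rest normally.

    Returns the per-token outputs and the fraction of tokens that were skipped.
    """
    if not tokens:
        return [], 0.0
    outputs, skipped = [], 0
    for token, vec in zip(tokens, embeddings):
        if predict_low_information(token):
            outputs.append([0.0] * len(vec))  # null vector, middle layers bypassed
            skipped += 1
        else:
            outputs.append(middle_layers(vec))
    return outputs, skipped / len(tokens)


tokens = ["O", "juiz", "de", "Brasília", "decidiu", "e", "arquivou"]
embeddings = [[1.0, 0.5] for _ in tokens]  # toy two-dimensional embeddings
outputs, skipped_fraction = route(tokens, embeddings)
print(f"skipped {skipped_fraction:.0%} of tokens")
```

The skipped-token fraction is not the same thing as the FLOPs reduction: the actual saving depends on how many layers are bypassed and on how attention is computed over the remaining tokens, neither of which this sketch models.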
By separating these files, the repacker allows users to skip downloading languages they don't speak, significantly reducing the initial download size.

How the File is Used During Installation

If you skip fg-selective-brazilian.bin during the initial download, you cannot simply "turn on" Brazilian Portuguese later. You would need to download that specific .bin file and re-run the installer.