Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper • 1908.10084
How to use barealek/peftech-v1-plus with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("barealek/peftech-v1-plus")
sentences = [
"Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: \"Since women say men only think with their dicks do you think she would get offended if I asked her to blow my mind.\" π I hate the people I work with fucking clowns",
"Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: /r/ENLIGHTENEDCENTRISM Because someone who wants equality and a nazi are equally as bad, and homophobes have absolutely *no track record* of not letting gays keep practicing their ~~comedy~~ life. As opposed to SJWs who have gone into history responsible for villifying, suppressing and outright killing sexual minorities. But yeah no, middle ground all the way babyyy. You're the smartest guy on Reddit!",
"Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: they do not care about me or you, they care about what they can take from you and what they can make you do for them.",
"Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: @smkndofpnutdssr @ACLU 70 years ago everyone was brainwashed into being christian and also had coathanger abortions because it was the Great Depression and then thousands on women died because they had unsafe abortions π"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from microsoft/harrier-oss-v1-270m. It maps sentences and paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
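Every input in the snippet above carries the same instruction prefix followed by a `Query:` line. A small helper can keep that formatting consistent. This is a minimal sketch: the names `INSTRUCTION` and `format_query` are illustrative and not part of the model's or the library's API.

```python
# Instruction text copied from the example inputs above.
INSTRUCTION = (
    "Retrieve text with a similar pragmatic profile, including safety, "
    "emotion, sentiment, language, and identity-target signals"
)


def format_query(text: str, instruction: str = INSTRUCTION) -> str:
    """Wrap raw text in the instruction/query layout used by the examples (hypothetical helper)."""
    return f"Instruct: {instruction}\nQuery: {text}"


# Placeholder texts; real inputs would be the documents or queries to embed.
sentences = [format_query(t) for t in ["first example text", "second example text"]]
print(sentences[0])
```

The resulting list of strings can then be passed to `model.encode(sentences)` exactly as in the snippet above.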
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
(1): Pooling({'word_embedding_dimension': 640, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
(2): Dense({'in_features': 640, 'out_features': 896, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(3): Normalize()
)
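The module list above reads as a pipeline: take the hidden state of the last token, project it from 640 to 896 dimensions with a bias-free linear layer, then L2-normalize the result. The following is a toy, pure-Python sketch of those three steps under stated assumptions: the dimensions are shrunk to 3 and 4, the weights are made up, and padding handling (which batched inputs would need) is ignored. It illustrates the data flow and is not the library's implementation.

```python
import math
from typing import List

Vector = List[float]


def last_token_pool(token_states: List[Vector]) -> Vector:
    """Use the final token's hidden state as the sequence representation."""
    return token_states[-1]


def dense_no_bias(x: Vector, weight: List[Vector]) -> Vector:
    """Linear projection y = W x with no bias; weight has shape (out, in)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


def l2_normalize(x: Vector) -> Vector:
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


# Toy stand-ins: 2 tokens with hidden size 3 (the real model uses 640).
token_states = [[0.1, 0.2, 0.3], [1.0, 0.0, 2.0]]
# Made-up 4x3 projection matrix (the real Dense layer maps 640 to 896).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

projected = dense_no_bias(last_token_pool(token_states), weight)
embedding = l2_normalize(projected)
print(len(embedding))
```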
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("barealek/peftech-v1-plus")
# Run inference
sentences = [
"Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: Everyone in my country has been killing each other for years over religion and they're not even different religion just different branches of Christianity and I quickly realised it's all pointless",
'Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: Me when my family confronts me about all the queer content on my social media URL',
'Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals\nQuery: Good to see Tomas Rosicky playing tdae #ARSvQPR',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 896]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0078, 0.6172, 0.5234],
# [0.6172, 1.0000, 0.5859],
# [0.5234, 0.5859, 1.0000]], dtype=torch.bfloat16)
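Because the architecture ends in a Normalize() module, every embedding has unit length, so cosine similarity reduces to a plain dot product. The sketch below shows this with made-up 3-dimensional vectors; `cosine_matrix` is an illustrative name, and the sketch assumes the model uses cosine similarity, which is the library's default. The last line shows that 1.0078 is what a single bfloat16 rounding step above 1.0 looks like, which likely explains the first diagonal entry in the output above.

```python
import math
from typing import List


def cosine_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity, computed as dot products of unit vectors."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


# Made-up vectors standing in for real 896-dimensional embeddings.
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sims = cosine_matrix(toy)
print([[round(s, 4) for s in row] for row in sims])

# bfloat16 keeps 7 explicit mantissa bits, so the next value after 1.0 is 1 + 2**-7.
print(1 + 2 ** -7)  # 1.0078125
```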
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | | | |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals | Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals | 1.0 |
| Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals | Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals | 0.0 |
| Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals | Instruct: Retrieve text with a similar pragmatic profile, including safety, emotion, sentiment, language, and identity-target signals | 1.0 |
Loss: main.SplitHeadContrastiveDistillationLoss

Non-default hyperparameters:
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 1
multi_dataset_batch_sampler: round_robin

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.3399 | 500 | 0.0316 |
| 0.6798 | 1000 | 0.0315 |
| 1.0197 | 1500 | 0.031 |
| 1.3596 | 2000 | 0.0298 |
| 1.6995 | 2500 | 0.0302 |
| 0.3399 | 500 | 0.0288 |
| 0.6798 | 1000 | 0.029 |
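The Epoch and Step columns are enough to back out the size of one epoch. The arithmetic below is a small sketch: it divides step by epoch for each of the first five logged rows and multiplies the result by the batch size of 16 from the hyperparameters. The sample count assumes a single device, full batches, and no gradient accumulation (gradient_accumulation_steps is 1), so it should be read as an estimate rather than a reported dataset size.

```python
# (epoch, step) pairs copied from the first five rows of the training log.
log_rows = [(0.3399, 500), (0.6798, 1000), (1.0197, 1500), (1.3596, 2000), (1.6995, 2500)]
batch_size = 16  # per_device_train_batch_size from the hyperparameters

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in log_rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Upper bound on training pairs per epoch, assuming one device and full batches.
approx_samples = steps_per_epoch * batch_size

print(steps_per_epoch, approx_samples)  # 1471 23536
```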
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model
microsoft/harrier-oss-v1-270m