How to use juanlofer/bge-base-fastdds-summaries-20epochs-666seed with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("juanlofer/bge-base-fastdds-summaries-20epochs-666seed")
sentences = [
"<summary>The \"_fastdds_statistics_sample_datas\" topic tracks the number of data messages or fragments sent by a DataWriter to deliver a single sample, excluding built-in and statistics DataWriters.</summary>",
" If several new data changes are received at once, the callbacks may\n be triggered just once, instead of once per change. The application\n must keep *reading* or *taking* until no new changes are available.",
"The \"_fastdds_statistics_sample_datas\" statistics topic collects the\nnumber of user's data messages (or data fragments in case that the\nmessage size is large enough to require RTPS fragmentation) that have\nbeen sent by the user's DataWriter to completely deliver a single\nsample. This topic does not apply to builtin (related to Discovery)\nand statistics DataWriters.",
"+------------------------------------------------+-----------------------------------------+------------+-------------+\n| Name | Description | Values | Default |\n|================================================|=========================================|============|=============|\n| \"<disable_heartbeat_piggyback>\" | See DisableHeartbeatPiggyback. | \"bool\" | \"false\" |\n+------------------------------------------------+-----------------------------------------+------------+-------------+"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
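The module listing above reads as a three-stage pipeline: the BERT encoder produces one vector per token, the Pooling stage keeps only the first ([CLS]) token's vector (pooling_mode_cls_token is True), and Normalize rescales the result to unit length. The following is a minimal pure-Python sketch of the last two stages. It uses toy 4-dimensional token vectors in place of real 768-dimensional encoder output, and the function names are illustrative, not library API.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's ([CLS]) vector as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy stand-in for encoder output: 3 tokens, 4 dimensions each (assumed values).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the final stage yields unit-length vectors, the cosine similarity between two embeddings reduces to their dot product.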
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("juanlofer/bge-base-fastdds-summaries-20epochs-666seed")
# Run inference
sentences = [
'The transport layer provides communication services between DDS entities, using UDPv4, UDPv6, TCPv4, TCPv6, and SHM transports.',
'* **TCPv4**: TCP communication over IPv4 (see TCP Transport).',
'The following table shows the supported primitive types and their\ncorresponding "TypeKind". The "TypeKind" is used to query the\nDynamicTypeBuilderFactory for the specific primitive DynamicType.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
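
The evaluation in this card reports results at 768, 512, 256, 128, and 64 dimensions, which is the usual Matryoshka setup: a shorter embedding is obtained by keeping the leading components of the full one. The sketch below shows that truncation step on toy 8-dimensional vectors. The helper names and vector values are assumptions for illustration; with the real model, the inputs would be rows of the `embeddings` array above.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values, 8 dimensions instead of 768).
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.1, 0.0, 0.1]
passage = [0.8, 0.2, 0.4, 0.1, 0.30, 0.0, 0.2, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine_similarity(q, p), 4))
```

Re-normalizing after truncation matters because a prefix of a unit vector is generally no longer unit length. Recent sentence-transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`; whether it is available depends on the installed version.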

Dataset: dim_768, evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3341 |
| cosine_accuracy@3 | 0.4455 |
| cosine_accuracy@5 | 0.5035 |
| cosine_accuracy@10 | 0.5661 |
| cosine_precision@1 | 0.3341 |
| cosine_precision@3 | 0.1485 |
| cosine_precision@5 | 0.1007 |
| cosine_precision@10 | 0.0566 |
| cosine_recall@1 | 0.3341 |
| cosine_recall@3 | 0.4455 |
| cosine_recall@5 | 0.5035 |
| cosine_recall@10 | 0.5661 |
| cosine_ndcg@10 | 0.4437 |
| cosine_mrr@10 | 0.4054 |
| cosine_map@100 | 0.416 |

Dataset: dim_512, evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3364 |
| cosine_accuracy@3 | 0.4478 |
| cosine_accuracy@5 | 0.4965 |
| cosine_accuracy@10 | 0.5777 |
| cosine_precision@1 | 0.3364 |
| cosine_precision@3 | 0.1493 |
| cosine_precision@5 | 0.0993 |
| cosine_precision@10 | 0.0578 |
| cosine_recall@1 | 0.3364 |
| cosine_recall@3 | 0.4478 |
| cosine_recall@5 | 0.4965 |
| cosine_recall@10 | 0.5777 |
| cosine_ndcg@10 | 0.4463 |
| cosine_mrr@10 | 0.4057 |
| cosine_map@100 | 0.4154 |

Dataset: dim_256, evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3271 |
| cosine_accuracy@3 | 0.4478 |
| cosine_accuracy@5 | 0.4988 |
| cosine_accuracy@10 | 0.5754 |
| cosine_precision@1 | 0.3271 |
| cosine_precision@3 | 0.1493 |
| cosine_precision@5 | 0.0998 |
| cosine_precision@10 | 0.0575 |
| cosine_recall@1 | 0.3271 |
| cosine_recall@3 | 0.4478 |
| cosine_recall@5 | 0.4988 |
| cosine_recall@10 | 0.5754 |
| cosine_ndcg@10 | 0.4414 |
| cosine_mrr@10 | 0.3997 |
| cosine_map@100 | 0.4105 |

Dataset: dim_128, evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3155 |
| cosine_accuracy@3 | 0.4292 |
| cosine_accuracy@5 | 0.4803 |
| cosine_accuracy@10 | 0.5754 |
| cosine_precision@1 | 0.3155 |
| cosine_precision@3 | 0.1431 |
| cosine_precision@5 | 0.0961 |
| cosine_precision@10 | 0.0575 |
| cosine_recall@1 | 0.3155 |
| cosine_recall@3 | 0.4292 |
| cosine_recall@5 | 0.4803 |
| cosine_recall@10 | 0.5754 |
| cosine_ndcg@10 | 0.4328 |
| cosine_mrr@10 | 0.389 |
| cosine_map@100 | 0.3994 |

Dataset: dim_64, evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.2854 |
| cosine_accuracy@3 | 0.4153 |
| cosine_accuracy@5 | 0.4687 |
| cosine_accuracy@10 | 0.5568 |
| cosine_precision@1 | 0.2854 |
| cosine_precision@3 | 0.1384 |
| cosine_precision@5 | 0.0937 |
| cosine_precision@10 | 0.0557 |
| cosine_recall@1 | 0.2854 |
| cosine_recall@3 | 0.4153 |
| cosine_recall@5 | 0.4687 |
| cosine_recall@10 | 0.5568 |
| cosine_ndcg@10 | 0.4098 |
| cosine_mrr@10 | 0.3641 |
| cosine_map@100 | 0.3744 |
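
In every table above, accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.4455 / 3 ≈ 0.1485). That pattern is what one would expect when each query has exactly one relevant document. Under that assumption, the per-query metrics can be sketched as below. The function names and toy rankings are hypothetical and are not taken from InformationRetrievalEvaluator itself.

```python
import math


def metrics_at_k(ranked_ids, relevant_id, k):
    """Per-query retrieval metrics, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    if relevant_id in top_k:
        rank = top_k.index(relevant_id) + 1  # 1-based position of the hit
        hit = 1.0
        reciprocal_rank = 1.0 / rank
        ndcg = 1.0 / math.log2(rank + 1)  # ideal DCG is 1 with one relevant doc
    else:
        hit = reciprocal_rank = ndcg = 0.0
    return {
        "accuracy": hit,
        "precision": hit / k,
        "recall": hit,
        "mrr": reciprocal_rank,
        "ndcg": ndcg,
    }


def average(per_query):
    """Average each metric over all queries."""
    return {
        key: sum(m[key] for m in per_query) / len(per_query)
        for key in per_query[0]
    }


# Toy data: (ranked document ids, id of the single relevant document).
rankings = [
    (["d1", "d2", "d3"], "d1"),  # hit at rank 1
    (["d4", "d5", "d6"], "d6"),  # hit at rank 3
    (["d7", "d8", "d9"], "d0"),  # relevant document not retrieved
]

scores = average([metrics_at_k(ids, rel, k=3) for ids, rel in rankings])
print(scores)
```

With a single relevant document per query, accuracy and recall coincide by construction, which is why the corresponding rows in the tables carry identical values.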
Non-default hyperparameters:

eval_strategy: epoch
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
gradient_accumulation_steps: 16
learning_rate: 2e-05
num_train_epochs: 20
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: True
tf32: False
load_best_model_at_end: True
optim: adamw_torch_fused
batch_sampler: no_duplicates

All hyperparameters:

overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 16
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: False
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|---|---|---|---|---|---|---|---|
| 0.6584 | 10 | 5.9441 | - | - | - | - | - |
| 0.9877 | 15 | - | 0.3686 | 0.3792 | 0.3819 | 0.3414 | 0.3795 |
| 1.3128 | 20 | 4.7953 | - | - | - | - | - |
| 1.9712 | 30 | 3.77 | 0.3854 | 0.3963 | 0.3962 | 0.3682 | 0.3995 |
| 2.6255 | 40 | 2.9211 | - | - | - | - | - |
| 2.9547 | 45 | - | 0.3866 | 0.3919 | 0.3958 | 0.3759 | 0.3963 |
| 3.2798 | 50 | 2.4548 | - | - | - | - | - |
| 3.9383 | 60 | 2.0513 | - | - | - | - | - |
| 4.0041 | 61 | - | 0.3808 | 0.4018 | 0.3980 | 0.3647 | 0.3962 |
| 4.5926 | 70 | 1.5898 | - | - | - | - | - |
| 4.9877 | 76 | - | 0.3829 | 0.4029 | 0.4035 | 0.3625 | 0.4014 |
| 5.2469 | 80 | 1.4677 | - | - | - | - | - |
| 5.9053 | 90 | 1.1974 | - | - | - | - | - |
| 5.9712 | 91 | - | 0.3918 | 0.4006 | 0.4041 | 0.3654 | 0.4033 |
| 6.5597 | 100 | 0.9285 | - | - | - | - | - |
| 6.9547 | 106 | - | 0.3914 | 0.4019 | 0.4033 | 0.3678 | 0.4014 |
| 7.2140 | 110 | 0.9214 | - | - | - | - | - |
| 7.8724 | 120 | 0.8141 | - | - | - | - | - |
| 8.0041 | 122 | - | 0.3914 | 0.3993 | 0.4071 | 0.3670 | 0.4027 |
| 8.5267 | 130 | 0.6706 | - | - | - | - | - |
| 8.9877 | 137 | - | 0.3903 | 0.4033 | 0.4060 | 0.3721 | 0.4060 |
| 9.1811 | 140 | 0.6388 | - | - | - | - | - |
| 9.8395 | 150 | 0.5466 | - | - | - | - | - |
| 9.9712 | 152 | - | 0.3915 | 0.4020 | 0.4079 | 0.3673 | 0.4046 |
| 10.4938 | 160 | 0.466 | - | - | - | - | - |
| 10.9547 | 167 | - | 0.3963 | 0.4069 | 0.4112 | 0.3697 | 0.4078 |
| 11.1481 | 170 | 0.4709 | - | - | - | - | - |
| 11.8066 | 180 | 0.437 | - | - | - | - | - |
| 12.0041 | 183 | - | 0.4003 | 0.4051 | 0.4096 | 0.3701 | 0.4059 |
| 12.4609 | 190 | 0.3678 | - | - | - | - | - |
| 12.9877 | 198 | - | 0.3976 | 0.4075 | 0.4088 | 0.3713 | 0.4080 |
| 13.1152 | 200 | 0.3944 | - | - | - | - | - |
| 13.7737 | 210 | 0.361 | - | - | - | - | - |
| 13.9712 | 213 | - | 0.3966 | 0.4091 | 0.4096 | 0.3724 | 0.4107 |
| 14.4280 | 220 | 0.2977 | - | - | - | - | - |
| 14.9547 | 228 | - | 0.3979 | 0.4102 | 0.4149 | 0.3744 | 0.4143 |
| 15.0823 | 230 | 0.3306 | - | - | - | - | - |
| 15.7407 | 240 | 0.3075 | - | - | - | - | - |
| 16.0041 | 244 | - | 0.3991 | 0.4102 | 0.4156 | 0.3726 | 0.4148 |
| 16.3951 | 250 | 0.2777 | - | - | - | - | - |
| 16.9877 | 259 | - | 0.3990 | 0.4101 | 0.4154 | 0.3743 | 0.4167 |
| 17.0494 | 260 | 0.3044 | - | - | - | - | - |
| 17.7078 | 270 | 0.2885 | - | - | - | - | - |
| 17.9712 | 274 | - | 0.3991 | 0.4099 | 0.4153 | 0.3746 | 0.4167 |
| 18.3621 | 280 | 0.2862 | - | - | - | - | - |
| 18.9547 | 289 | - | 0.3994 | 0.4105 | 0.4154 | 0.3743 | 0.4156 |
| 19.0165 | 290 | 0.2974 | - | - | - | - | - |
| 19.6749 | 300 | 0.2648 | 0.3994 | 0.4105 | 0.4154 | 0.3744 | 0.4160 |
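
The hyperparameters list learning_rate 2e-05, lr_scheduler_type cosine, and warmup_ratio 0.1, and the log above ends at step 300. As a rough sketch of what such a schedule looks like, the code below assumes linear warmup followed by a half-cosine decay to zero, which is the common shape for this setting; the trainer's exact implementation (including how it rounds the warmup length) may differ. The function name and defaults are illustrative.

```python
import math


def lr_at_step(step, base_lr=2e-05, total_steps=300, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    # Assumption: the warmup length is the rounded product of ratio and total steps.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 15, 30, 165, 300):
    print(step, f"{lr_at_step(step):.2e}")
```

Under these assumptions the rate climbs to its 2e-05 peak at step 30, falls to half of that at step 165, and reaches zero at step 300. Separately, per_device_train_batch_size 16 combined with gradient_accumulation_steps 16 means each optimizer step covers 256 samples per device.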
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
BAAI/bge-base-en-v1.5