SentenceTransformer

This is a sentence-transformers model trained on the parquet dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text
  • Training Dataset:
    • parquet

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'EuroBertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'lasttoken', 'include_prompt': True})
  (2): Normalize({})
)
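
The three modules above can be read as a pipeline: the transformer yields one vector per token, the pooling layer keeps the vector of the last token, and the final layer scales it to unit length. Below is a minimal pure-Python sketch of the last two steps. It assumes right-padded input and uses toy 3-dimensional vectors in place of the 768-dimensional ones; the function names are illustrative and are not part of the library.

```python
import math

def last_token_pool(token_embeddings, attention_mask):
    """Return the embedding of the last non-padding token (assumes right padding)."""
    last_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return token_embeddings[last_index]

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring the role of the Normalize module."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy sequence: three real tokens followed by one padding position.
token_embeddings = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 0.0, 4.0], [9.0, 9.0, 9.0]]
attention_mask = [1, 1, 1, 0]

sentence_embedding = l2_normalize(last_token_pool(token_embeddings, attention_mask))
print(sentence_embedding)  # [0.6, 0.0, 0.8]
```

Because the output has unit length, the cosine similarity of two embeddings reduces to their dot product.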

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("TeraflopAI/teraflopai-jina-nano-legal")
# Run inference
queries = [
    "Under what circumstances can a plaintiff recover damages for lost profits when a defendant's breach of contract involves the failure to provide telephone service?",
]
documents = [
    'UNITED STATES COURT OF APPEALS\n                            UNITED STATES COURT OF APPEALS\n                    FOR THE FIRST CIRCUIT\n                                FOR THE FIRST CIRCUIT\n                                         \nNo. 94-1711\n\n                  SAS OF PUERTO RICO, INC.,\n\n                    Plaintiff, Appellant,\n\n                              v.\n\n                PUERTO RICO TELEPHONE COMPANY,\n\n                     Defendant, Appellee. \n\n                                         \n\n         APPEAL FROM THE UNITED STATES DISTRICT COURT\n\n               FOR THE DISTRICT OF PUERTO RICO\n\n        [Hon. Jose Antonio Fuste, U.S. District Judge]\n                                                                 \n\n                                         \n\n                            Before\n\n                    Torruella, Chief Judge,\n                                                      \n\n                    Boudin, Circuit Judge,\n                                                     \n\n              and Boyle,* Senior District Judge.\n                                                           \n\n                                         \n\nLaurence  Z.  Shiekman  with  whom  M.  Duncan  Grant,  Frank   M.\n                                                                              \nRapoport,  Michael A. Ceramella and Pepper, Hamilton & Scheetz were on\n                                                                      \nbrief for appellant.\nPhilip J. Mause with whom Joaquin A. Marquez and Drinker Biddle  &\n                                                                              \nReath were on brief for appellee.\n             \n\n                                         \n\n                      February 21, 1995\n                                         \n\n                \n\n*Of the District of Rhode Island, sitting by designation.',
    'Nor did the BIA abuse its discretion by denying Chen\'s motion to reopen, which alleged that he suffered from the ineffective assistance of counsel. To prevail on such a claim, the alien must first comply with certain procedures set forth in Matter of Lozada, 19 I. & N. Dec. 637 (BIA 1988). Here, the BIA properly noted that besides filing a supporting affidavit, Chen made no effort to comply with the requirements enumerated in Lozada. Chen not only failed to notify his former counsel of the allegations of ineffective assistance and to allow him an opportunity to respond, he also failed to file a complaint with a disciplinary authority or provide an explanation for not doing so. See Twum v. INS, 411 F.3d 54, 59 (2d Cir.2005) (citing Lozada, 19 I. & N. Dec. at 639). \n\nBy failing to substantially comply with Lozada, Chen "forfeit[ed][his] ineffective assistance of counsel claim." Jian Yun Zheng v. U.S. Dep\'t of Justice, 409 F.3d 43, 47 (2d Cir.2005). While it is true that "slavish adherence" to Lozada\'s requirements is not necessary in certain circumstances, and while the BIA acknowledged in its June 2006 decision that the brief written by Chen\'s former counsel was deficient, this is not a case in which the facts supporting a "claim of ineffective assistance are clear on the face of the record," which may excuse the failure to comply with Lozada. Yi Long Yang, 478 F.3d at 142-43. The facts here are distinct from the circumstances presented in Yi Long Yang, in that Chen\'s former counsel was not disbarred, nor was there evidence that the agency explicitly assumed his competence. See id. at 142.',
    'We believe the same prerequisite should operate in this\ncase. The requirement that parties seeking Rule 60(b) relief\nshow some prospect of succeeding on the merits flows from\nthe basic principle that courts should revive previously-\ndismissed claims only if they have some reason to believe that\ndoing so will not ultimately waste judicial resources. See\nMurray,\n52 F.3d at 355\n. This principle holds true here:\nreviving Thomas\'s appeal will constitute an "empty exercise\nor futile gesture,"\nid.,\n unless Thomas has some possibility of\nprevailing. \n\n       Indeed, we see two especially good reasons to condition\nthe grant of Thomas\'s motion for reconsideration on his\ndemonstrating a chance of succeeding on the merits. First,\nThomas claims that his appeal should be reinstated because\nthe PLRA\'s three-strikes provision is unconstitutional as\napplied to him. For this court to reach out and decide this\ndifficult and important question simply to reinstate a pointless\nappeal would violate the norm of constitutional avoidance to\nwhich we generally adhere. See Kalka v. Hawk,\n215 F.3d 90, 97\n (D.C. Cir. 2000) ("Federal courts should not decide\nconstitutional questions unless it is necessary to do so.").\nSecond, the PLRA provides that a court "shall dismiss" an\nIFP litigant\'s case if the "appeal . . . is frivolous or malicious\n. . . [or] fails to state a claim on which relief may be granted."\n28 U.S.C. § 1915\n(e)(2). Thus, even were we to grant Thomas\nIFP status and reinstate his appeal, we would then have to\n\x0c                                7\npromptly dismiss the case if his claims lack merit. What could\nbe a more "futile gesture" than reinstating an appeal only to\nthen immediately dismiss it?',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6921, -0.0759,  0.0219]])
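
The similarity row printed above can be turned into a ranking by sorting document indices by score. Here is a small sketch using the three scores from the example output; the variable names are illustrative.

```python
# Scores copied from the example output above (one query, three documents).
similarities = [0.6921, -0.0759, 0.0219]

# Sort document indices from most to least similar to the query.
ranking = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

The first document, the telephone-service appeal, ranks highest for the lost-profits query.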

Evaluation

Metrics

Information Retrieval

Metric                 Value
cosine_accuracy@1      0.8919
cosine_accuracy@3      0.9569
cosine_accuracy@5      0.9727
cosine_accuracy@10     0.9853
cosine_precision@1     0.8919
cosine_precision@3     0.319
cosine_precision@5     0.1945
cosine_precision@10    0.0985
cosine_recall@1        0.8919
cosine_recall@3        0.9569
cosine_recall@5        0.9727
cosine_recall@10       0.9853
cosine_ndcg@10         0.9413
cosine_mrr@10          0.9269
cosine_map@100         0.9276
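
The reported values are consistent with each query having exactly one relevant document: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example, 0.9569 / 3 ≈ 0.319). Under that assumption, these metrics all follow from the 1-based rank of the relevant document for each query. Below is a minimal sketch with made-up ranks; the function name and data are hypothetical and are not taken from the evaluation code.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    `ranks` holds the 1-based rank of the single relevant document per query,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n          # fraction of queries with a hit in the top k
    precision = hits / (n * k)   # at most one of the k results is relevant
    recall = hits / n            # one relevant document, so same as accuracy
    mrr = sum(1.0 / r for r in ranks if r is not None and r <= k) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr}

# Hypothetical ranks for four queries.
ranks = [1, 2, 1, 7]
metrics = retrieval_metrics(ranks, k=3)
print(metrics)
```

Applying the same relation to the reported accuracy@5 of 0.9727 gives a precision@5 of about 0.1945, which matches the value listed.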

Training Details

Training Dataset

parquet

  • Dataset: parquet

  • Size: 36,118,859 training samples

  • Columns: question and answer

  • Approximate statistics based on the first 1000 samples:

                 question                 answer
    type         string                   string
    details      • min: 15 tokens         • min: 40 tokens
                 • mean: 28.74 tokens     • mean: 308.21 tokens
                 • max: 50 tokens         • max: 512 tokens
  • Samples:

    question answer
    What is the legal standard and procedure for granting a defendant's request for leave to withdraw as counsel when the attorney certifies that no nonfrivolous issues exist for appeal? Appeal by the defendant from two judgments of the Supreme Court, Queens County (Rosengarten, J.), both rendered May 5, 2003, convicting him of burglary in the first degree, robbery in the first degree, and burglary in the second degree under indictment No. 3417/01, and burglary in the first degree, robbery in the first degree, and burglary in the second degree under indictment No. 1182/02, upon his pleas of guilty, and imposing sentences.

    Ordered that the judgments are affirmed.

    We have reviewed the record and agree with the defendant's assigned counsel that there are no nonfrivolous issues which could be raised on appeal. Counsel's application for leave to withdraw as counsel is granted (see Anders v California, 386 US 738 [1967]; People v Paige, 54 AD2d 631 [1976]; cf. People v *606Gonzalez, 47 NY2d 606 [1979]). Adams, J.P., Cozier, Ritter and Skelos, JJ., concur.
    Are state-law tort claims alleging defective labeling of generic drugs preempted by federal law? ORDER

    JOSEPH N. LAPLANTE, District Judge.

    This case presents a question currently pending before three different federal courts of appeal: whether state-law tort claims alleging the defective labeling of generic drugs are preempted by federal law. See Morris v. Wyeth, Inc., No. 09-5509 (6th Cir. Apr. 27, 2009); Demahy v. Wyeth, Inc., No. 08-31204 (5th Cir. Dec. 16, 2008); Mensing v. Wyeth, Inc., No. 08-3850 (8th Cir. Dec. 10, 2008). The defendants, Mutual Pharmaceutical Company, Inc. and United Research Laboratories, Inc., move for judgment on the pleadings, see Fed.R.Civ.P. 12(c), on claims by the plaintiffs, Karen L. and Gregory S. Bartlett, alleging that Karen suffered serious injuries from Sulindac, a generic drug manufactured by the defendants. The defendants argue that all of the plaintiffs' state-law causes of action are pre-empted by Title I of the Drug Price Competition and Patent Term Restoration Act of 1984, 1 part of the Hatch-Waxman Amendments to the Federal Food, Drug,...
    Under what circumstances can a trial court's finding of competency to stand trial be challenged when multiple expert evaluations conclude the defendant is competent but exhibit bizarre conduct? Prior to trial, Card was examined by two court-appointed psychologists for the purpose of determining whether he was competent to stand trial. Following examinations, both psychologists concluded that Card was competent to stand trial pursuant to the criteria set forth in Rule 3.211, Florida Rule of Criminal Procedure. After the initial reports of the two court-appointed *1175 experts were filed, the defense filed a motion for the appointment of a forensic psychiatrist to examine Card. The court acquiesced to this request. Although the forensic psychiatrist did not file his report with the court until a few months after the court issued its order finding Card competent to stand trial, the forensic psychiatrist also concluded that Card was competent.
    Further, although the various reports filed by the experts indicate bizarre conduct and behavioral problems, the trial court was never presented with evidence providing reasonable grounds to believe that Card was not competent to stand tria...

  • Loss: MatryoshkaLoss with these parameters:

    {
        "loss": "CachedMultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
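
Because the loss is applied at 768, 512, 256 and 128 dimensions, the leading components of each embedding are trained to be usable on their own. A common way to exploit this is to keep only the first d components and re-normalize before computing cosine similarity. Below is a minimal sketch on a toy 4-dimensional vector; the helper name is illustrative, and retrieval quality at each size is something to check on your own data.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy unit-length vector standing in for a 768-dimensional embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_normalize(full, dim=2)
print(len(short), short)
```

Sentence Transformers also accepts a truncate_dim argument when constructing the model, which truncates the output embeddings for you.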
    

Evaluation Dataset

parquet

  • Dataset: parquet
  • Size: 10,000 evaluation samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
                 question                 answer
    type         string                   string
    details      • min: 14 tokens         • min: 67 tokens
                 • mean: 28.88 tokens     • mean: 312.84 tokens
                 • max: 52 tokens         • max: 512 tokens
  • Samples:
    question answer
    What specific factors do Texas courts consider when determining if terminating a parent's rights serves the child's best interest? In determining whether termination is in the child's best interest, we apply the following factors laid out in Holley v. Adams, 544 S.W.2d 367, 371–72 (Tex. 1976). Those factors include, but are not limited to:

    1. The child's desires;
    2. The child's physical and emotional needs, now and in the future;
    3. The emotional and physical danger to the child, now and in the future;
    4. The parental ability of the individuals seeking custody;
    5. The programs available to assist these individuals in promoting the child's best interest;
    6. The plans for the child by the individual or agency seeking custody;
    7. The stability of the home or proposed placement;
    8. The parent's act or omissions that may indicate the existing parent-child relationship is not the proper one; and
    9. Any excuse for the parent's acts or omissions.
    Under what circumstances do separate criminal acts fail to constitute a single continuous transaction for the purpose of admitting evidence of one act to prove another? ¶37 The case before us is far more analogous to Hildreth than to the others. Gallegos stabbed Victim in a park and was later apprehended. Then, while at the police station, Gallegos acted violently, resulting in additional charges. Gallegos's violent behavior at the police station did not "facilitate[ ] flight" from the earlier attack, nor could the later crimes be characterized as "a single [violent] spree," as we would characterize a string of robberies, for example. See Benson , 2014 UT App 92 , ¶¶ 13-14, 325 P.3d 855 . Neither do Gallegos's crimes demonstrate "a distinct behavioral arc of increasingly aggressive and opportunistic transgressions." Burke , 2011 UT App 168 , ¶ 24, 256 P.3d 1102 . Instead, this case is more like Hildreth , where the defendant committed a sequence of offenses, but those offenses were not otherwise related to each other. See 2010 UT App 209 , ¶ 32, 238 P.3d 444 . Here, the stabbing at the park and the violent behavior at the police station are so indepen...
    What level of culpability, such as actual knowledge or reckless disregard, must a plaintiff prove to establish an Eighth Amendment violation for deliberate indifference? Wilson v. Seiter, ___ U.S. at -, ___, 111 S.Ct. at 2324-25, 2327.
    The Seventh Circuit recently observed that "[i]n order to show `deliberate indifference,' a plaintiff is required to prove that the prison official's action was deliberate or reckless in the criminal sense." Santiago v. Lane, 894 F.2d 218 (7th Cir.1990) (emphasis added) (footnote omitted). The United States Supreme Court has cited the Seventh Circuit's criminal recklessness standard with approval. Whitley v. Albers, 475 U.S. 312, 321, 106 S.Ct. 1078, 1085, 89 L.Ed.2d 251 (1986), citing Duckworth v. Franzen, 780 F.2d 645, 653 (7th Cir.1985), cert. denied, 479 U.S. 816, 107 S.Ct. 71, 93 L.Ed.2d 28 (1986). In Franzen, the Seventh Circuit noted that punishment under the Eighth Amendment "implies at a minimum actual knowledge of impending harm easily preventable, so that a conscious, culpable refusal to prevent the harm can be inferred from the defendant's failure to prevent it." 780 F.2d at 653. See also Wilks v. You...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CachedMultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
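
The inner loss treats every other answer in the batch as a negative for a given question: similarities are scaled, and a cross-entropy term pushes each question toward its own answer. The Matryoshka wrapper then sums this loss over the listed dimensions using the listed weights. Below is a simplified pure-Python sketch of that idea. The scale of 20 is an assumption, the gradient caching that gives the cached variant its name is omitted, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def in_batch_negatives_loss(questions, answers, scale=20.0):
    """Mean cross-entropy where answers[i] is the positive for questions[i]."""
    total = 0.0
    for i, q in enumerate(questions):
        logits = [scale * cosine(q, a) for a in answers]
        log_sum_exp = math.log(sum(math.exp(l) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(questions)

def matryoshka_loss(questions, answers, dims, weights):
    """Weighted sum of the inner loss over truncated embeddings."""
    return sum(
        w * in_batch_negatives_loss([q[:d] for q in questions], [a[:d] for a in answers])
        for d, w in zip(dims, weights)
    )

# Toy batch of two question/answer pairs with 4-dimensional embeddings.
questions = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
answers = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
loss = matryoshka_loss(questions, answers, dims=[4, 2], weights=[1, 1])
print(loss)
```

With a training batch of 2048 pairs per device, each question is contrasted against thousands of in-batch answers, which is why the no_duplicates batch sampler matters: a duplicated answer in the batch would act as a false negative.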
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 2048
  • num_train_epochs: 1
  • lr_scheduler_type: cosine
  • warmup_steps: 0.1
  • bf16: True
  • per_device_eval_batch_size: 512
  • prompts: {'question': 'Query: ', 'answer': 'Document: '}
  • batch_sampler: no_duplicates
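
The schedule settings can be pictured as a linear warmup followed by cosine decay. The sketch below assumes that warmup_steps: 0.1 is interpreted as a fraction of the total number of steps. It uses the learning rate of 5e-05 from the full hyperparameter list and the 4409 steps shown in the training logs. It is a simplified stand-in for the trainer's scheduler, not a copy of it.

```python
import math

def lr_at(step, total_steps=4409, peak_lr=5e-5, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, end of warmup, roughly mid-decay, and the last step.
for step in (0, 440, 2425, 4409):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks after roughly 440 steps and falls back to zero at the final step.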

All Hyperparameters

  • per_device_train_batch_size: 2048
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 512
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: {'question': 'Query: ', 'answer': 'Document: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss jina-nano_lr5e-5_warmup0.1_bs8k-caselaw_cosine_ndcg@10
0.0023 10 5.4249 - -
0.0045 20 5.3218 - -
0.0068 30 4.9505 - -
0.0091 40 4.4023 - -
0.0113 50 3.3952 - -
0.0136 60 2.6740 - -
0.0159 70 2.1146 - -
0.0181 80 1.8614 - -
0.0204 90 1.6447 - -
0.0227 100 1.5237 - -
0.0249 110 1.4115 - -
0.0272 120 1.3367 - -
0.0295 130 1.2759 - -
0.0318 140 1.2270 - -
0.0340 150 1.1613 - -
0.0363 160 1.1364 - -
0.0386 170 1.0903 - -
0.0408 180 1.0284 - -
0.0431 190 1.0257 - -
0.0454 200 0.9858 - -
0.0476 210 0.9542 - -
0.0499 220 0.9389 - -
0.0522 230 0.9077 - -
0.0544 240 0.8852 - -
0.0567 250 0.8622 - -
0.0590 260 0.8439 - -
0.0612 270 0.8217 - -
0.0635 280 0.8145 - -
0.0658 290 0.8084 - -
0.0680 300 0.7711 - -
0.0703 310 0.7579 - -
0.0726 320 0.7618 - -
0.0748 330 0.7377 - -
0.0771 340 0.7256 - -
0.0794 350 0.7078 - -
0.0817 360 0.7000 - -
0.0839 370 0.6873 - -
0.0862 380 0.6894 - -
0.0885 390 0.6586 - -
0.0907 400 0.6632 - -
0.0930 410 0.6472 - -
0.0953 420 0.6528 - -
0.0975 430 0.6439 - -
0.0998 440 0.6319 - -
0.1021 450 0.6030 - -
0.1043 460 0.6269 - -
0.1066 470 0.6085 - -
0.1089 480 0.5979 - -
0.1111 490 0.5931 - -
0.1134 500 0.5888 - -
0.1157 510 0.5724 - -
0.1179 520 0.5808 - -
0.1202 530 0.5661 - -
0.1225 540 0.5597 - -
0.1247 550 0.5463 - -
0.1270 560 0.5348 - -
0.1293 570 0.5364 - -
0.1315 580 0.5389 - -
0.1338 590 0.5351 - -
0.1361 600 0.5199 - -
0.1384 610 0.5188 - -
0.1406 620 0.5135 - -
0.1429 630 0.5083 - -
0.1452 640 0.4999 - -
0.1474 650 0.4986 - -
0.1497 660 0.4980 - -
0.1520 670 0.4951 - -
0.1542 680 0.4930 - -
0.1565 690 0.4842 - -
0.1588 700 0.4848 - -
0.1610 710 0.4835 - -
0.1633 720 0.4846 - -
0.1656 730 0.4760 - -
0.1678 740 0.4719 - -
0.1701 750 0.4746 - -
0.1724 760 0.4576 - -
0.1746 770 0.4709 - -
0.1769 780 0.4430 - -
0.1792 790 0.4496 - -
0.1814 800 0.4570 - -
0.1837 810 0.4452 - -
0.1860 820 0.4495 - -
0.1883 830 0.4468 - -
0.1905 840 0.4468 - -
0.1928 850 0.4446 - -
0.1951 860 0.4372 - -
0.1973 870 0.4376 - -
0.1996 880 0.4287 - -
0.2019 890 0.4268 - -
0.2041 900 0.4239 - -
0.2064 910 0.4274 - -
0.2087 920 0.4234 - -
0.2109 930 0.4190 - -
0.2132 940 0.4260 - -
0.2155 950 0.4247 - -
0.2177 960 0.4137 - -
0.2200 970 0.4141 - -
0.2223 980 0.4114 - -
0.2245 990 0.4137 - -
0.2268 1000 0.4088 - -
0.2291 1010 0.3989 - -
0.2313 1020 0.4162 - -
0.2336 1030 0.4048 - -
0.2359 1040 0.4026 - -
0.2381 1050 0.4044 - -
0.2404 1060 0.4017 - -
0.2427 1070 0.4064 - -
0.2450 1080 0.3970 - -
0.2472 1090 0.3985 - -
0.2495 1100 0.3884 - -
0.2518 1110 0.3841 - -
0.2540 1120 0.3960 - -
0.2563 1130 0.3964 - -
0.2586 1140 0.3910 - -
0.2608 1150 0.3892 - -
0.2631 1160 0.3839 - -
0.2654 1170 0.3786 - -
0.2676 1180 0.3800 - -
0.2699 1190 0.3865 - -
0.2722 1200 0.3812 - -
0.2744 1210 0.3748 - -
0.2767 1220 0.3823 - -
0.2790 1230 0.3771 - -
0.2812 1240 0.3732 - -
0.2835 1250 0.3707 - -
0.2858 1260 0.3709 - -
0.2880 1270 0.3689 - -
0.2903 1280 0.3793 - -
0.2926 1290 0.3704 - -
0.2949 1300 0.3697 - -
0.2971 1310 0.3647 - -
0.2994 1320 0.3687 - -
0.3001 1323 - 0.1011 0.9300
0.3017 1330 0.3662 - -
0.3039 1340 0.3656 - -
0.3062 1350 0.3626 - -
0.3085 1360 0.3625 - -
0.3107 1370 0.3663 - -
0.3130 1380 0.3525 - -
0.3153 1390 0.3496 - -
0.3175 1400 0.3588 - -
0.3198 1410 0.3579 - -
0.3221 1420 0.3487 - -
0.3243 1430 0.3537 - -
0.3266 1440 0.3521 - -
0.3289 1450 0.3512 - -
0.3311 1460 0.3576 - -
0.3334 1470 0.3510 - -
0.3357 1480 0.3467 - -
0.3379 1490 0.3488 - -
0.3402 1500 0.3410 - -
0.3425 1510 0.3425 - -
0.3447 1520 0.3558 - -
0.3470 1530 0.3483 - -
0.3493 1540 0.3490 - -
0.3516 1550 0.3435 - -
0.3538 1560 0.3420 - -
0.3561 1570 0.3363 - -
0.3584 1580 0.3470 - -
0.3606 1590 0.3415 - -
0.3629 1600 0.3491 - -
0.3652 1610 0.3386 - -
0.3674 1620 0.3389 - -
0.3697 1630 0.3334 - -
0.3720 1640 0.3403 - -
0.3742 1650 0.3330 - -
0.3765 1660 0.3359 - -
0.3788 1670 0.3319 - -
0.3810 1680 0.3359 - -
0.3833 1690 0.3354 - -
0.3856 1700 0.3308 - -
0.3878 1710 0.3359 - -
0.3901 1720 0.3312 - -
0.3924 1730 0.3374 - -
0.3946 1740 0.3260 - -
0.3969 1750 0.3307 - -
0.3992 1760 0.3269 - -
0.4015 1770 0.3285 - -
0.4037 1780 0.3296 - -
0.4060 1790 0.3332 - -
0.4083 1800 0.3350 - -
0.4105 1810 0.3291 - -
0.4128 1820 0.3253 - -
0.4151 1830 0.3256 - -
0.4173 1840 0.3274 - -
0.4196 1850 0.3200 - -
0.4219 1860 0.3147 - -
0.4241 1870 0.3221 - -
0.4264 1880 0.3194 - -
0.4287 1890 0.3250 - -
0.4309 1900 0.3249 - -
0.4332 1910 0.3159 - -
0.4355 1920 0.3291 - -
0.4377 1930 0.3179 - -
0.4400 1940 0.3256 - -
0.4423 1950 0.3223 - -
0.4445 1960 0.3182 - -
0.4468 1970 0.3120 - -
0.4491 1980 0.3151 - -
0.4513 1990 0.3162 - -
0.4536 2000 0.3090 - -
0.4559 2010 0.3098 - -
0.4582 2020 0.3136 - -
0.4604 2030 0.3171 - -
0.4627 2040 0.3138 - -
0.4650 2050 0.3142 - -
0.4672 2060 0.3118 - -
0.4695 2070 0.3207 - -
0.4718 2080 0.3166 - -
0.4740 2090 0.3205 - -
0.4763 2100 0.3201 - -
0.4786 2110 0.3060 - -
0.4808 2120 0.3158 - -
0.4831 2130 0.3118 - -
0.4854 2140 0.3107 - -
0.4876 2150 0.3037 - -
0.4899 2160 0.3071 - -
0.4922 2170 0.3126 - -
0.4944 2180 0.3108 - -
0.4967 2190 0.3061 - -
0.4990 2200 0.3023 - -
0.5012 2210 0.3134 - -
0.5035 2220 0.3065 - -
0.5058 2230 0.3100 - -
0.5081 2240 0.3067 - -
0.5103 2250 0.3087 - -
0.5126 2260 0.3021 - -
0.5149 2270 0.3022 - -
0.5171 2280 0.3109 - -
0.5194 2290 0.3072 - -
0.5217 2300 0.3043 - -
0.5239 2310 0.2994 - -
0.5262 2320 0.3024 - -
0.5285 2330 0.3041 - -
0.5307 2340 0.3021 - -
0.5330 2350 0.2989 - -
0.5353 2360 0.3041 - -
0.5375 2370 0.2980 - -
0.5398 2380 0.2995 - -
0.5421 2390 0.2961 - -
0.5443 2400 0.2929 - -
0.5466 2410 0.2972 - -
0.5489 2420 0.3008 - -
0.5511 2430 0.3028 - -
0.5534 2440 0.2995 - -
0.5557 2450 0.3066 - -
0.5579 2460 0.3032 - -
0.5602 2470 0.2961 - -
0.5625 2480 0.2957 - -
0.5648 2490 0.3023 - -
0.5670 2500 0.3043 - -
0.5693 2510 0.3009 - -
0.5716 2520 0.2968 - -
0.5738 2530 0.2969 - -
0.5761 2540 0.2882 - -
0.5784 2550 0.2956 - -
0.5806 2560 0.2983 - -
0.5829 2570 0.2951 - -
0.5852 2580 0.2885 - -
0.5874 2590 0.2977 - -
0.5897 2600 0.2970 - -
0.5920 2610 0.2925 - -
0.5942 2620 0.2835 - -
0.5965 2630 0.2973 - -
0.5988 2640 0.2915 - -
0.6001 2646 - 0.0793 0.9393
0.6010 2650 0.2957 - -
0.6033 2660 0.2978 - -
0.6056 2670 0.2961 - -
0.6078 2680 0.2990 - -
0.6101 2690 0.2980 - -
0.6124 2700 0.2937 - -
0.6147 2710 0.2954 - -
0.6169 2720 0.2903 - -
0.6192 2730 0.2992 - -
0.6215 2740 0.2894 - -
0.6237 2750 0.2895 - -
0.6260 2760 0.2905 - -
0.6283 2770 0.2907 - -
0.6305 2780 0.2919 - -
0.6328 2790 0.2821 - -
0.6351 2800 0.2964 - -
0.6373 2810 0.2917 - -
0.6396 2820 0.2910 - -
0.6419 2830 0.2914 - -
0.6441 2840 0.2965 - -
0.6464 2850 0.2905 - -
0.6487 2860 0.2951 - -
0.6509 2870 0.2909 - -
0.6532 2880 0.2849 - -
0.6555 2890 0.2879 - -
0.6577 2900 0.2895 - -
0.6600 2910 0.2921 - -
0.6623 2920 0.2875 - -
0.6645 2930 0.2876 - -
0.6668 2940 0.2953 - -
0.6691 2950 0.2885 - -
0.6714 2960 0.2860 - -
0.6736 2970 0.2850 - -
0.6759 2980 0.2943 - -
0.6782 2990 0.2877 - -
0.6804 3000 0.2856 - -
0.6827 3010 0.2951 - -
0.6850 3020 0.2856 - -
0.6872 3030 0.2945 - -
0.6895 3040 0.2860 - -
0.6918 3050 0.2917 - -
0.6940 3060 0.2905 - -
0.6963 3070 0.2978 - -
0.6986 3080 0.2915 - -
0.7008 3090 0.2860 - -
0.7031 3100 0.2864 - -
0.7054 3110 0.2920 - -
0.7076 3120 0.2813 - -
0.7099 3130 0.2853 - -
0.7122 3140 0.2827 - -
0.7144 3150 0.2865 - -
0.7167 3160 0.2829 - -
0.7190 3170 0.2816 - -
0.7213 3180 0.2835 - -
0.7235 3190 0.2866 - -
0.7258 3200 0.2846 - -
0.7281 3210 0.2835 - -
0.7303 3220 0.2874 - -
0.7326 3230 0.2808 - -
0.7349 3240 0.2898 - -
0.7371 3250 0.2850 - -
0.7394 3260 0.2912 - -
0.7417 3270 0.2878 - -
0.7439 3280 0.2880 - -
0.7462 3290 0.2932 - -
0.7485 3300 0.2818 - -
0.7507 3310 0.2840 - -
0.7530 3320 0.2833 - -
0.7553 3330 0.2868 - -
0.7575 3340 0.2871 - -
0.7598 3350 0.2825 - -
0.7621 3360 0.2914 - -
0.7643 3370 0.2899 - -
0.7666 3380 0.2845 - -
0.7689 3390 0.2875 - -
0.7711 3400 0.2921 - -
0.7734 3410 0.2823 - -
0.7757 3420 0.2869 - -
0.7780 3430 0.2909 - -
0.7802 3440 0.2818 - -
0.7825 3450 0.2902 - -
0.7848 3460 0.2845 - -
0.7870 3470 0.2838 - -
0.7893 3480 0.2889 - -
0.7916 3490 0.2831 - -
0.7938 3500 0.2898 - -
0.7961 3510 0.2846 - -
0.7984 3520 0.2904 - -
0.8006 3530 0.2870 - -
0.8029 3540 0.2733 - -
0.8052 3550 0.2823 - -
0.8074 3560 0.2805 - -
0.8097 3570 0.2847 - -
0.8120 3580 0.2785 - -
0.8142 3590 0.2895 - -
0.8165 3600 0.2793 - -
0.8188 3610 0.2875 - -
0.8210 3620 0.2883 - -
0.8233 3630 0.2839 - -
0.8256 3640 0.2833 - -
0.8279 3650 0.2841 - -
0.8301 3660 0.2867 - -
0.8324 3670 0.2882 - -
0.8347 3680 0.2772 - -
0.8369 3690 0.2921 - -
0.8392 3700 0.2866 - -
0.8415 3710 0.2773 - -
0.8437 3720 0.2826 - -
0.8460 3730 0.2802 - -
0.8483 3740 0.2847 - -
0.8505 3750 0.2828 - -
0.8528 3760 0.2926 - -
0.8551 3770 0.2869 - -
0.8573 3780 0.2807 - -
0.8596 3790 0.2897 - -
0.8619 3800 0.2833 - -
0.8641 3810 0.2856 - -
0.8664 3820 0.2880 - -
0.8687 3830 0.2913 - -
0.8709 3840 0.2857 - -
0.8732 3850 0.2864 - -
0.8755 3860 0.2796 - -
0.8778 3870 0.2796 - -
0.8800 3880 0.2850 - -
0.8823 3890 0.2863 - -
0.8846 3900 0.2912 - -
0.8868 3910 0.2837 - -
0.8891 3920 0.2907 - -
0.8914 3930 0.2849 - -
0.8936 3940 0.2822 - -
0.8959 3950 0.2839 - -
0.8982 3960 0.2810 - -
0.9002 3969 - 0.0754 0.9412
0.9004 3970 0.2905 - -
0.9027 3980 0.2904 - -
0.9050 3990 0.2768 - -
0.9072 4000 0.2836 - -
0.9095 4010 0.2885 - -
0.9118 4020 0.2873 - -
0.9140 4030 0.2857 - -
0.9163 4040 0.2850 - -
0.9186 4050 0.2828 - -
0.9208 4060 0.2848 - -
0.9231 4070 0.2858 - -
0.9254 4080 0.2841 - -
0.9276 4090 0.2792 - -
0.9299 4100 0.2784 - -
0.9322 4110 0.2845 - -
0.9345 4120 0.2869 - -
0.9367 4130 0.2810 - -
0.9390 4140 0.2841 - -
0.9413 4150 0.2788 - -
0.9435 4160 0.2938 - -
0.9458 4170 0.2870 - -
0.9481 4180 0.2785 - -
0.9503 4190 0.2795 - -
0.9526 4200 0.2863 - -
0.9549 4210 0.2869 - -
0.9571 4220 0.2815 - -
0.9594 4230 0.2842 - -
0.9617 4240 0.2828 - -
0.9639 4250 0.2822 - -
0.9662 4260 0.2806 - -
0.9685 4270 0.2820 - -
0.9707 4280 0.2858 - -
0.9730 4290 0.2810 - -
0.9753 4300 0.2864 - -
0.9775 4310 0.2790 - -
0.9798 4320 0.2803 - -
0.9821 4330 0.2815 - -
0.9844 4340 0.2767 - -
0.9866 4350 0.2899 - -
0.9889 4360 0.2839 - -
0.9912 4370 0.2831 - -
0.9934 4380 0.2911 - -
0.9957 4390 0.2913 - -
0.9980 4400 0.2834 - -
1.0 4409 - 0.0753 0.9413
-1 -1 - - 0.9413
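
The step count in the log can be reconciled with the dataset size. The run name contains bs8k, which suggests an effective batch of 8192 pairs per step. With per_device_train_batch_size: 2048, that would correspond to four devices; this is an inference and is not stated in the card. With dataloader_drop_last: True, the incomplete final batch is discarded:

```python
train_samples = 36_118_859
per_device_batch = 2048
num_devices = 4  # assumption inferred from "bs8k" in the run name

effective_batch = per_device_batch * num_devices
steps_per_epoch = train_samples // effective_batch  # drop_last discards the remainder
dropped_samples = train_samples - steps_per_epoch * effective_batch

print(effective_batch, steps_per_epoch, dropped_samples)  # 8192 4409 331
```

The result matches the final logged step of 4409, and 10 / 4409 ≈ 0.0023 matches the epoch value on the first log row.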

Training Time

  • Training: 11.6 hours
  • Evaluation: 21.5 minutes
  • Total: 11.9 hours

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.8.0
  • PyTorch: 2.11.0+cu130
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}