tanaos-NER-v1: A small but performant Named Entity Recognition model

This model was created by Tanaos with the Artifex Python library.

This is a multilingual Named Entity Recognition model (16+ languages) based on FacebookAI/roberta-base. It was fine-tuned on a synthetic dataset to recognize entities in text and classify them into the following 14 categories:

| Entity | Description |
| --- | --- |
| `PERSON` | Individual people, fictional characters |
| `ORG` | Companies, institutions, agencies |
| `LOCATION` | Geographical areas |
| `DATE` | Absolute or relative dates, including years, months and/or days |
| `TIME` | Specific time of the day |
| `PERCENT` | Percentage expressions |
| `NUMBER` | Numeric measurements or expressions |
| `FACILITY` | Buildings, airports, highways, etc. |
| `PRODUCT` | Objects, vehicles, food, etc. bearing a specific name |
| `WORK_OF_ART` | Titles of creative works |
| `LANGUAGE` | Natural or programming languages |
| `NORP` | National, religious or political groups |
| `ADDRESS` | Full addresses |
| `PHONE_NUMBER` | Telephone numbers |

These entity types were chosen to cover a broad range of common named entities that are useful in many NLP applications, regardless of domain. The goal is a versatile, general-purpose Named Entity Recognition model that can be applied across industries and use cases.

How to Use

Via the Artifex library (`pip install artifex`)

```python
from artifex import Artifex

ner = Artifex().named_entity_recognition

print(ner("John landed in Barcelona at 15:45."))
# >>> [{'entity_group': 'PERSON', 'score': np.float32(0.92174554), 'word': 'John', 'start': 0, 'end': 4}, {'entity_group': 'LOCATION', 'score': np.float32(0.9853817), 'word': ' Barcelona', 'start': 15, 'end': 24}, {'entity_group': 'TIME', 'score': np.float32(0.98645407), 'word': ' 15:45.', 'start': 28, 'end': 34}]
```

Via the Transformers library

```python
from transformers import pipeline

ner = pipeline(
    task="token-classification",
    model="tanaos/tanaos-NER-v1",
    aggregation_strategy="first"
)

print(ner("John landed in Barcelona at 15:45."))
# >>> [{'entity_group': 'PERSON', 'score': np.float32(0.92174554), 'word': 'John', 'start': 0, 'end': 4}, {'entity_group': 'LOCATION', 'score': np.float32(0.9853817), 'word': ' Barcelona', 'start': 15, 'end': 24}, {'entity_group': 'TIME', 'score': np.float32(0.98645407), 'word': ' 15:45.', 'start': 28, 'end': 34}]
```
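In the output above, `word` can carry a leading space (`' Barcelona'`), likely an artifact of RoBERTa's byte-level BPE tokenizer, which folds the preceding space into the token. The `start` and `end` character offsets let you slice the exact surface string from the input instead. The sketch below groups entities by type using those offsets; `group_entities` is a hypothetical helper, and the sample list copies the output shown above with scores written as plain floats so the snippet runs without the model.

```python
from collections import defaultdict


def group_entities(text, entities):
    """Group NER pipeline output by entity type.

    Slices the exact surface string from `text` using the "start"/"end"
    character offsets, since "word" may include a leading space.
    """
    grouped = defaultdict(list)
    for ent in entities:
        span = text[ent["start"]:ent["end"]]
        grouped[ent["entity_group"]].append(span)
    return dict(grouped)


text = "John landed in Barcelona at 15:45."
# Sample output copied from the example above (scores as plain floats).
entities = [
    {"entity_group": "PERSON", "score": 0.92174554, "word": "John", "start": 0, "end": 4},
    {"entity_group": "LOCATION", "score": 0.9853817, "word": " Barcelona", "start": 15, "end": 24},
    {"entity_group": "TIME", "score": 0.98645407, "word": " 15:45.", "start": 28, "end": 34},
]

print(group_entities(text, entities))
# {'PERSON': ['John'], 'LOCATION': ['Barcelona'], 'TIME': ['15:45.']}
```

Note that the `TIME` span in the sample output includes the sentence-final period; strip trailing punctuation afterwards if your application needs clean values.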

Model Description

Base model: FacebookAI/roberta-base
Task: Token classification (Named Entity Recognition)
Languages: Multilingual (16+ languages)
Fine-tuning data: A synthetic, custom dataset of around 10,000 passages, each containing multiple named entities across 14 categories.
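Because NER is a token-classification task, the model predicts one label per token, and the pipeline then merges consecutive tokens into entity spans. The sketch below illustrates that merging step on word-level labels, roughly what `aggregation_strategy="first"` does after giving each word the label of its first sub-word token. It assumes a BIO label scheme (`B-PERSON`, `I-PERSON`, `O`); that scheme is an assumption for illustration, and the model's real label names are listed in its `config.id2label`. `decode_bio` is a hypothetical helper, not part of Transformers or Artifex.

```python
def decode_bio(words, labels):
    """Merge word-level BIO labels into (entity_type, text) spans.

    Assumes labels look like "B-PERSON", "I-PERSON" or "O" (hypothetical
    scheme for illustration; check the model's config.id2label).
    """
    spans = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        if label == "O":
            # Outside any entity: close the open span, if there is one.
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
            continue
        prefix, ent_type = label.split("-", 1)
        if prefix == "B" or ent_type != current_type:
            # A new entity starts: flush the previous one, if any.
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = ent_type, [word]
        else:
            # "I-" label continuing the same entity type.
            current_words.append(word)
    if current_type:
        spans.append((current_type, " ".join(current_words)))
    return spans


words = ["John", "landed", "in", "Barcelona", "at", "15:45."]
labels = ["B-PERSON", "O", "O", "B-LOCATION", "O", "B-TIME"]
print(decode_bio(words, labels))
# [('PERSON', 'John'), ('LOCATION', 'Barcelona'), ('TIME', '15:45.')]
```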

Training Details

This model was trained using the Artifex Python library (`pip install artifex`) by providing the following instructions and generating 10,000 synthetic training samples:

```python
from artifex import Artifex

ner = Artifex().named_entity_recognition

ner.train(
    named_entities={
        "PERSON": "Individual people, fictional characters",
        "ORG": "Companies, institutions, agencies",
        "LOCATION": "Geographical areas",
        "DATE": "Absolute or relative dates, including years, months and/or days",
        "TIME": "Specific time of the day",
        "PERCENT": "Percentage expressions",
        "NUMBER": "Numeric measurements or expressions",
        "FACILITY": "Buildings, airports, highways, etc.",
        "PRODUCT": "Objects, vehicles, food, etc. bearing a specific name",
        "WORK_OF_ART": "Titles of creative works",
        "LANGUAGE": "Natural or programming languages",
        "NORP": "National, religious or political groups",
        "ADDRESS": "full addresses",
        "PHONE_NUMBER": "telephone numbers",
    },
    domain="general",
    num_samples=10000
)
```
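The 14 keys of `named_entities` define the model's label space. As a rough illustration only: if those labels were encoded in a BIO scheme (an assumption; this card does not say which scheme Artifex uses), the classification head would need 2 × 14 + 1 = 29 output classes. `build_bio_labels` is a hypothetical helper; to see the actual mapping, inspect `id2label` in the published model's configuration.

```python
# The 14 entity types passed to ner.train() above.
ENTITY_TYPES = [
    "PERSON", "ORG", "LOCATION", "DATE", "TIME", "PERCENT", "NUMBER",
    "FACILITY", "PRODUCT", "WORK_OF_ART", "LANGUAGE", "NORP",
    "ADDRESS", "PHONE_NUMBER",
]


def build_bio_labels(entity_types):
    """Build id2label/label2id maps under an assumed BIO scheme.

    One "O" (outside) label, plus a "B-" (begin) and "I-" (inside)
    label for each entity type.
    """
    labels = ["O"]
    for ent in entity_types:
        labels += [f"B-{ent}", f"I-{ent}"]
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_bio_labels(ENTITY_TYPES)
print(len(id2label))  # 29
```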

Intended Uses

This model is intended to:

- Extract and classify named entities from text in a variety of applications, such as chatbots, information extraction systems, and data analysis tools.
- Be used in multilingual contexts, supporting 16+ languages.
- Serve as a general-purpose NER model applicable across industries and use cases.
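In applications such as information extraction systems, callers often keep only the entity types they need and discard low-confidence predictions. The sketch below does this on the pipeline's output format; `filter_entities` and its default `min_score=0.8` are hypothetical choices for illustration, not values recommended by the model authors.

```python
def filter_entities(entities, allowed_types, min_score=0.8):
    """Keep entities of the requested types whose score clears a threshold.

    min_score=0.8 is an arbitrary illustrative default. float() also
    accepts the np.float32 scores returned by the real pipeline.
    """
    return [
        ent for ent in entities
        if ent["entity_group"] in allowed_types and float(ent["score"]) >= min_score
    ]


# Sample output for "John landed in Barcelona at 15:45." (scores as plain floats).
entities = [
    {"entity_group": "PERSON", "score": 0.92174554, "word": "John", "start": 0, "end": 4},
    {"entity_group": "LOCATION", "score": 0.9853817, "word": " Barcelona", "start": 15, "end": 24},
    {"entity_group": "TIME", "score": 0.98645407, "word": " 15:45.", "start": 28, "end": 34},
]

# Keep people and places, but only very confident predictions.
kept = filter_entities(entities, {"PERSON", "LOCATION"}, min_score=0.95)
print([ent["word"].strip() for ent in kept])
# ['Barcelona']
```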

Not intended for:

- Highly specialized domains requiring custom entity types not covered by the 14 categories in this model.
- Idioms, slang, or very informal text, where entity recognition may be less reliable.

Model size: 0.1B parameters (Safetensors, F32)