Processors

Processors are transformers for deep learning (DL) datasets. A Dataset object can use them to transform an item before returning it to the caller.
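To make the idea concrete, here is a minimal sketch of a dataset that runs each requested item through a chain of processors before returning it. `SketchProcessor`, `LowercaseProcessor`, and `SketchDataset` are hypothetical names invented for this illustration; the library's actual `Processor` and Dataset interfaces may differ.

```python
from typing import Any, List, Sequence


class SketchProcessor:
    """Hypothetical base class, not the library's Processor."""

    def process(self, item: Any) -> Any:
        # Default behaviour: return the item unchanged.
        return item

    def __call__(self, item: Any) -> Any:
        return self.process(item)


class LowercaseProcessor(SketchProcessor):
    """Example transformer that lowercases a string item."""

    def process(self, item: str) -> str:
        return item.lower()


class SketchDataset:
    """Hypothetical dataset that applies processors on item access."""

    def __init__(self, items: Sequence[Any], processors: Sequence[SketchProcessor]) -> None:
        self.items: List[Any] = list(items)
        self.processors: List[SketchProcessor] = list(processors)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Any:
        item = self.items[index]
        # Apply each processor in order before returning the item.
        for processor in self.processors:
            item = processor(item)
        return item


dataset = SketchDataset(["Cat", "DOG"], [LowercaseProcessor()])
first, second = dataset[0], dataset[1]
```

The raw items stay untouched in storage; the transformation happens only when an item is requested.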

CategoryProcessor

Bases: Processor

Create a vocabulary from training data and use it to numericalize categories/text.

Attributes:

Name Type Description
vocab Sequence[str]

Vocabulary used for numericalizing tokens.

otoi Dict[str, int]

Mapping of tokens to their integer indices.
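The two attributes can be read as inverse lookups of each other. The snippet below is an illustrative sketch using plain Python objects rather than the class itself; the example tokens are made up, and the real class may order its vocabulary differently.

```python
# Illustrative vocabulary; the tokens here are invented for the example.
vocab = ["cat", "dog", "fish"]

# otoi maps each token to its position in vocab.
otoi = {token: index for index, token in enumerate(vocab)}

# Numericalize with otoi, then recover the tokens by indexing into vocab.
ids = [otoi[token] for token in ["dog", "cat"]]
tokens = [vocab[i] for i in ids]
```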

__call__(items)

Create a vocabulary from items if one doesn't already exist, then return the numerical IDs of the items.

Parameters:

Name Type Description Default
items Sequence[str] | str

Data to numericalize.

required

Returns:

Type Description
list[int] | int

Numerical IDs of items.
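The build-once behaviour described above could look roughly like the following. This is a sketch under stated assumptions, not the library's implementation: `SketchCategoryProcessor` is a hypothetical stand-in, and building the vocabulary from the sorted unique tokens is an assumption made only so the example is deterministic.

```python
from typing import Dict, List, Optional, Sequence, Union


class SketchCategoryProcessor:
    """Hypothetical stand-in for CategoryProcessor."""

    def __init__(self, vocab: Optional[Sequence[str]] = None) -> None:
        self.vocab: Optional[List[str]] = None
        self.otoi: Optional[Dict[str, int]] = None
        if vocab is not None:
            self._set_vocab(vocab)

    def _set_vocab(self, vocab: Sequence[str]) -> None:
        self.vocab = list(vocab)
        self.otoi = {token: index for index, token in enumerate(self.vocab)}

    def process(self, items: Union[Sequence[str], str]) -> Union[List[int], int]:
        # A single string yields a single ID; a sequence yields a list of IDs.
        if isinstance(items, str):
            return self.otoi[items]
        return [self.otoi[item] for item in items]

    def __call__(self, items: Union[Sequence[str], str]) -> Union[List[int], int]:
        if self.vocab is None:
            # Assumption: the vocabulary is the sorted set of unique tokens.
            unique = {items} if isinstance(items, str) else set(items)
            self._set_vocab(sorted(unique))
        return self.process(items)


processor = SketchCategoryProcessor()
train_ids = processor(["red", "green", "red", "blue"])  # first call builds the vocabulary
valid_ids = processor(["blue", "red"])                  # later calls reuse it
single_id = processor("green")
```

Because the vocabulary is created only on the first call, calling the processor on training data first and on validation data afterwards gives both splits the same token-to-ID mapping.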

__init__(vocab=None)

Parameters:

Name Type Description Default
vocab Sequence[str] | None

Vocabulary to use for numericalizing tokens.

None

deprocess(idxs)

Denumericalize item(s) by converting IDs to actual tokens.

Parameters:

Name Type Description Default
idxs Iterable[int] | int

IDs to denumericalize.

required

Returns:

Type Description
list[str] | str

Tokens that correspond to each ID.
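As a rough illustration of the int-or-iterable contract, the helper below mirrors the documented parameter and return types. `deprocess_sketch` is a hypothetical free function that takes the vocabulary as an explicit argument; the real method presumably reads it from the processor's `vocab` attribute.

```python
from typing import Iterable, List, Sequence, Union


def deprocess_sketch(
    vocab: Sequence[str], idxs: Union[Iterable[int], int]
) -> Union[List[str], str]:
    """Hypothetical helper: convert ID(s) back to token(s)."""
    # A single ID maps to a single token.
    if isinstance(idxs, int):
        return vocab[idxs]
    # Any iterable of IDs maps to a list of tokens, in the same order.
    return [vocab[i] for i in idxs]


vocab = ["cat", "dog", "fish"]
one_token = deprocess_sketch(vocab, 2)
many_tokens = deprocess_sketch(vocab, [1, 0, 1])
```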

process(items)

Numericalize item(s).

Parameters:

Name Type Description Default
items Iterable[str] | str

Items to numericalize.

required

Returns:

Type Description
list[int] | int

Numerical IDs of passed items.

Processor

Base class for all processors.