Processors
Processors are transformers for deep learning datasets. A Dataset object can use them to transform an item before returning it.
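As a rough illustration of the idea above, the sketch below shows a dataset applying processors to an item before returning it. `SketchDataset` and the way it chains processors are assumptions made for illustration; they are not taken from the library's Dataset class.

```python
from typing import Any, Callable, Sequence


class SketchDataset:
    """Hypothetical dataset that runs processors on an item before returning it."""

    def __init__(
        self,
        items: Sequence[Any],
        processors: Sequence[Callable[[Any], Any]] = (),
    ) -> None:
        self.items = items
        self.processors = list(processors)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Any:
        item = self.items[idx]
        # Apply each processor in order; the output of one feeds the next.
        for proc in self.processors:
            item = proc(item)
        return item


# Plain callables stand in for Processor subclasses here.
ds = SketchDataset(["  Cat ", "DOG"], processors=[str.strip, str.lower])
first = ds[0]
second = ds[1]
```

The raw items stay untouched in `ds.items`; the transformation happens only when an item is requested.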
CategoryProcessor
Bases: Processor
Create a vocabulary from training data and use it to numericalize categories/text.
Attributes:
Name | Type | Description |
---|---|---|
vocab | Sequence[str] | Vocabulary used for numericalizing tokens. |
otoi | Dict[str, int] | Mapping of tokens to their integer indices. |
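The two attributes can be read as inverses of each other. The snippet below is a minimal sketch that assumes each token's ID is its position in `vocab`, which the descriptions suggest but do not state outright.

```python
vocab = ["cat", "dog", "fish"]

# Assumption: otoi maps each token to its position in vocab.
otoi = {token: idx for idx, token in enumerate(vocab)}

dog_id = otoi["dog"]          # token -> ID
dog_token = vocab[dog_id]     # ID -> token
```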
__call__(items)
Create a vocabulary from items if it doesn't already exist and return their numerical IDs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
items | Sequence[str] \| str | Data to numericalize. | required |
Returns:
Type | Description |
---|---|
list[int] \| int | Numerical IDs of items. |
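The "create the vocabulary if it doesn't already exist" behaviour might look roughly like the sketch below. `SketchCategoryProcessor` is a hypothetical stand-in, and building the vocabulary from the sorted unique tokens is an assumption; the real class may order tokens differently or handle unknown tokens in another way.

```python
from typing import Dict, List, Optional, Sequence, Union


class SketchCategoryProcessor:
    """Hypothetical stand-in; not the library's CategoryProcessor."""

    def __init__(self, vocab: Optional[Sequence[str]] = None) -> None:
        self.vocab: Optional[List[str]] = None
        self.otoi: Optional[Dict[str, int]] = None
        if vocab is not None:
            self._set_vocab(vocab)

    def _set_vocab(self, vocab: Sequence[str]) -> None:
        self.vocab = list(vocab)
        self.otoi = {token: idx for idx, token in enumerate(self.vocab)}

    def __call__(self, items: Union[Sequence[str], str]) -> Union[List[int], int]:
        if self.vocab is None:
            tokens = [items] if isinstance(items, str) else items
            # Assumption: vocabulary is the sorted set of unique tokens.
            self._set_vocab(sorted(set(tokens)))
        if isinstance(items, str):
            return self.otoi[items]
        # Tokens missing from the vocabulary raise KeyError in this sketch.
        return [self.otoi[token] for token in items]


proc = SketchCategoryProcessor()
train_ids = proc(["dog", "cat", "dog"])  # first call builds the vocabulary
valid_ids = proc(["cat", "cat"])         # later calls reuse it
```

Because the vocabulary is built only once, later calls on validation data produce IDs that are consistent with the training data.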
__init__(vocab=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab | Sequence[str] \| None | Vocabulary to use for numericalizing tokens. | None |
deprocess(idxs)
Denumericalize item(s) by converting IDs to actual tokens.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idxs | Iterable[int] \| int | IDs to denumericalize. | required |
Returns:
Type | Description |
---|---|
list[str] \| str | Tokens corresponding to each ID. |
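A minimal sketch of denumericalization follows, written as a standalone function so it runs on its own. `sketch_deprocess` is a hypothetical helper, and indexing directly into `vocab` assumes that IDs are positions in the vocabulary.

```python
from typing import Iterable, List, Sequence, Union


def sketch_deprocess(
    vocab: Sequence[str], idxs: Union[Iterable[int], int]
) -> Union[List[str], str]:
    """Hypothetical helper: map ID(s) back to token(s)."""
    if isinstance(idxs, int):
        # A single ID yields a single token.
        return vocab[idxs]
    # Any iterable of IDs yields a list of tokens.
    return [vocab[i] for i in idxs]


vocab = ["cat", "dog", "fish"]
tokens = sketch_deprocess(vocab, [1, 0, 2])
single = sketch_deprocess(vocab, 2)
```

The return type mirrors the input: a single ID gives a single token, and an iterable gives a list.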
process(items)
Numericalize item(s).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
items | Iterable[str] \| str | Items to numericalize. | required |
Returns:
Type | Description |
---|---|
list[int] \| int | Numerical IDs of passed items. |
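The scalar-versus-iterable handling can be sketched as below. `sketch_process` is a hypothetical helper that takes the token-to-ID mapping explicitly; the actual method presumably reads it from the processor's `otoi` attribute.

```python
from typing import Dict, Iterable, List, Union


def sketch_process(
    otoi: Dict[str, int], items: Union[Iterable[str], str]
) -> Union[List[int], int]:
    """Hypothetical helper: map token(s) to ID(s)."""
    # A str is itself iterable, so it must be checked first; otherwise
    # "dog" would be treated as the tokens "d", "o", "g".
    if isinstance(items, str):
        return otoi[items]
    return [otoi[token] for token in items]


otoi = {"cat": 0, "dog": 1}
one_id = sketch_process(otoi, "dog")
many_ids = sketch_process(otoi, (t for t in ["cat", "dog", "cat"]))
```

Checking for `str` before iterating is the key design point, since a generic iterable check alone cannot tell a single token from a collection of tokens.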
Processor
Base class for all processors.