Contextual Embeddings

Contextual embeddings are dynamic word representations that change based on the context in which a word appears. Unlike static word embeddings such as word2vec or GloVe, which assign each word a single fixed vector, they capture word meaning in context, addressing polysemy: the "bank" in "river bank" and the "bank" in "bank account" receive different vectors.

Core Concepts

Context-Dependent Representation

  • Words have different vectors in different contexts (demonstrated below)
  • Supports word sense disambiguation
  • Preserves contextual information
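
To make this concrete, the sketch below compares the vectors BERT assigns to the word "bank" in two different sentences (an illustrative example assuming the transformers and torch packages; "bank" happens to survive WordPiece tokenization as a single token):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

def word_vector(sentence, word):
    # Contextual vector of the first occurrence of `word` in `sentence`
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v_river = word_vector("He sat on the bank of the river.", "bank")
v_money = word_vector("She deposited cash at the bank.", "bank")
# Noticeably below 1.0: the same word gets different vectors
print(torch.cosine_similarity(v_river, v_money, dim=0).item())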

Types of Models

1. BERT (Bidirectional Encoder Representations from Transformers)

import torch
from transformers import BertModel, BertTokenizer

def get_bert_embeddings(text):
    # Loading inside the function keeps the example self-contained;
    # in practice, load the tokenizer and model once and reuse them
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')

    inputs = tokenizer(text, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)

    # Shape: (batch_size, sequence_length, hidden_size)
    return outputs.last_hidden_state

2. GPT (Generative Pre-trained Transformer)

import torch
from transformers import GPT2Model, GPT2Tokenizer

def get_gpt_embeddings(text):
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2Model.from_pretrained('gpt2')

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # GPT-2 is unidirectional: each token's representation conditions
    # only on the tokens to its left, unlike BERT's bidirectional attention
    return outputs.last_hidden_state

3. RoBERTa (A Robustly Optimized BERT Pretraining Approach)

import torch
from transformers import RobertaModel, RobertaTokenizer

def get_roberta_embeddings(text):
    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
    model = RobertaModel.from_pretrained('roberta-base')

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # RoBERTa uses a byte-level BPE tokenizer rather than WordPiece
    return outputs.last_hidden_state

Implementation

Basic Usage

import torch
from transformers import BertModel, BertTokenizer

class ContextualEmbedder:
    def __init__(self, model_name='bert-base-uncased'):
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.model = BertModel.from_pretrained(model_name)
        self.model.eval()

    def get_embeddings(self, text, layer=-1):
        inputs = self.tokenizer(text,
                                return_tensors="pt",
                                padding=True,
                                truncation=True)

        # hidden_states is only populated when explicitly requested
        with torch.no_grad():
            outputs = self.model(**inputs, output_hidden_states=True)

        # Get embeddings from a specific layer
        if layer == -1:
            return outputs.last_hidden_state
        else:
            return outputs.hidden_states[layer]
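
A quick usage check (the hidden size of bert-base-uncased is 768):

embedder = ContextualEmbedder()
embeddings = embedder.get_embeddings("Contextual embeddings are dynamic.")
print(embeddings.shape)  # (1, number_of_tokens, 768)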

Advanced Features

1. Layer Selection

Different layers capture different aspects of the input: lower layers tend to encode surface and syntactic features, while higher layers capture more semantic information.

def get_layer_embeddings(inputs, model, layer_nums):
    # `inputs` must already be tokenized, e.g. tokenizer(text, return_tensors="pt");
    # passing raw text to the model would fail
    outputs = model(**inputs, output_hidden_states=True)

    # Get the specified layers (index 0 is the embedding layer,
    # 1..num_layers are the transformer layers)
    selected_layers = [outputs.hidden_states[i]
                       for i in layer_nums]

    return selected_layers
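
For example, to compare the first and last transformer layers (a sketch; the model, sentence, and layer choice are illustrative):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("The quick brown fox.", return_tensors="pt")
first, last = get_layer_embeddings(inputs, model, layer_nums=[1, -1])
print(first.shape, last.shape)  # both (1, sequence_length, 768)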

2. Subword Aggregation

import torch

def aggregate_subwords(tokens, embeddings):
    # Merge WordPiece pieces (marked with '##') back into word-level
    # vectors by averaging the pieces of each word
    word_embeddings = []
    current_word = []

    for token, embedding in zip(tokens, embeddings):
        if token.startswith('##'):
            current_word.append(embedding)
        else:
            if current_word:
                word_embeddings.append(torch.mean(
                    torch.stack(current_word), dim=0))
            current_word = [embedding]

    # Flush the final word, which the loop above never appends
    if current_word:
        word_embeddings.append(torch.mean(
            torch.stack(current_word), dim=0))

    return word_embeddings
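
For example, a rare word that WordPiece splits into several pieces comes back as a single averaged vector (a sketch; the exact token split depends on the vocabulary):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("unaffable weather", return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state[0]

# [CLS] and [SEP] count as ordinary words here; drop them if unwanted
word_vectors = aggregate_subwords(tokens, token_embeddings)
print(len(tokens), '->', len(word_vectors))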

Applications

  1. Advanced NLP Tasks:
    • Named Entity Recognition
    • Question Answering
    • Text Classification
  2. Semantic Analysis:
    • Word Sense Disambiguation
    • Semantic Role Labeling
    • Coreference Resolution

Best Practices

1. Model Selection

  • Consider task requirements
  • Balance size and performance
  • Evaluate domain compatibility

2. Fine-tuning

  • Adapt to specific domains (a minimal loop is sketched below)
  • Use task-specific data
  • Monitor for overfitting
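
A minimal fine-tuning loop, sketched under the assumption of a binary classification task where train_loader is a DataLoader yielding tokenized batches with input_ids, attention_mask, and labels (the names and hyperparameters are illustrative):

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in train_loader:  # assumed DataLoader of tokenized batches
    optimizer.zero_grad()
    outputs = model(**batch)   # the model computes the loss from `labels`
    outputs.loss.backward()
    optimizer.step()

Monitoring a held-out validation loss each epoch is the usual guard against the overfitting mentioned above.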

3. Resource Management

  • Handle batch processing (see the sketch after this list)
  • Implement caching
  • Optimize memory usage
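
A minimal sketch of batched embedding extraction that keeps memory bounded; the batch size is an illustrative choice:

import torch

def embed_in_batches(texts, tokenizer, model, batch_size=32):
    # Process texts in fixed-size batches to bound peak memory usage
    all_embeddings = []
    model.eval()
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        with torch.no_grad():  # no gradients needed for inference
            outputs = model(**inputs)
        # Mean-pool each sequence into a single vector
        all_embeddings.append(outputs.last_hidden_state.mean(dim=1))
    return torch.cat(all_embeddings, dim=0)

Caching can then be layered on top, for example by keying stored vectors on a hash of the input text.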

Implementation Example

import torch
from transformers import AutoModel, AutoTokenizer

class AdvancedContextualEmbedder:
    def __init__(self,
                 model_name='bert-base-uncased',
                 max_length=512):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()
        self.max_length = max_length

    def get_contextual_embeddings(self,
                                  text,
                                  layers=None,
                                  pooling='mean'):
        # Tokenize
        inputs = self.tokenizer(text,
                                return_tensors="pt",
                                padding=True,
                                truncation=True,
                                max_length=self.max_length)

        # Get embeddings (hidden_states must be requested explicitly)
        with torch.no_grad():
            outputs = self.model(**inputs,
                                 output_hidden_states=True)

        # Select layers: last hidden layer by default, otherwise the
        # average of the requested layers
        if layers is None:
            embeddings = outputs.last_hidden_state
        else:
            embeddings = torch.stack(
                [outputs.hidden_states[i] for i in layers]
            ).mean(dim=0)

        # Pool over tokens; the attention mask excludes padding positions
        mask = inputs['attention_mask'].unsqueeze(-1)
        if pooling == 'mean':
            return (embeddings * mask).sum(dim=1) / mask.sum(dim=1)
        elif pooling == 'max':
            return embeddings.masked_fill(mask == 0, float('-inf')).max(dim=1)[0]

        return embeddings
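
Usage, averaging the last four hidden layers, a combination explored in the original BERT paper's feature-based experiments (the sentence is illustrative):

embedder = AdvancedContextualEmbedder()
vec = embedder.get_contextual_embeddings(
    "The bank approved the loan.",
    layers=[-4, -3, -2, -1],  # average the last four layers
    pooling='mean')
print(vec.shape)  # (1, 768) for bert-base-uncased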

Advantages

  1. Context Awareness:
    • Dynamic representations
    • Word sense disambiguation
    • Contextual understanding
  2. Performance:
    • State-of-the-art results
    • Transfer learning
    • Few-shot capabilities

Limitations

  1. Computational Cost:
    • Resource-intensive inference
    • Slower processing than static embeddings
    • High memory requirements
  2. Complexity:
    • Large, intricate architectures
    • Harder training and debugging
    • Sensitive hyperparameter tuning

Summary

Contextual embeddings represent a significant advancement in text representation, offering dynamic, context-aware word vectors. While they require more computational resources than static embeddings, their superior performance in capturing word meaning and context makes them invaluable for modern NLP applications.