Contextual Embeddings
Contextual embeddings are dynamic word representations that change depending on the surrounding text. Unlike static word embeddings (such as word2vec or GloVe), which assign each word a single fixed vector, they produce a different vector for every occurrence of a word, which lets them handle polysemy and other context-dependent variation in meaning.
Core Concepts
Context-Dependent Representation
- Words have different vectors in different contexts
- Captures word sense disambiguation
- Preserves contextual information
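For example, the sketch below (assuming the Hugging Face transformers library and PyTorch are installed; the sentences are illustrative) compares the vector for "bank" in two sentences. The cosine similarity comes out well below 1.0, whereas a static embedding would assign the identical vector in both contexts.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def embed_word(sentence, word):
    # Contextual vector of the first occurrence of `word` in `sentence`
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return outputs.last_hidden_state[0, tokens.index(word)]

v_river = embed_word("He sat on the bank of the river.", "bank")
v_money = embed_word("She deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v_river, v_money, dim=0).item())  # noticeably < 1.0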
Types of Models
1. BERT (Bidirectional Encoder Representations from Transformers)
from transformers import BertModel, BertTokenizer

def get_bert_embeddings(text):
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    inputs = tokenizer(text, return_tensors="pt",
                       padding=True, truncation=True)
    outputs = model(**inputs)
    return outputs.last_hidden_state
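A quick, hypothetical sanity check of the helper above; the result contains one vector per token, with 768 dimensions for bert-base-uncased:
embeddings = get_bert_embeddings("Contextual embeddings capture meaning in context.")
print(embeddings.shape)  # torch.Size([1, num_tokens, 768])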
2. GPT (Generative Pre-trained Transformer)
from transformers import GPT2Model, GPT2Tokenizer

def get_gpt_embeddings(text):
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2Model.from_pretrained('gpt2')
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    return outputs.last_hidden_state
3. RoBERTa
from transformers import RobertaModel, RobertaTokenizer

def get_roberta_embeddings(text):
    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
    model = RobertaModel.from_pretrained('roberta-base')
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    return outputs.last_hidden_state
Implementation
Basic Usage
from transformers import BertModel, BertTokenizer

class ContextualEmbedder:
    def __init__(self, model_name='bert-base-uncased'):
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.model = BertModel.from_pretrained(model_name)

    def get_embeddings(self, text, layer=-1):
        inputs = self.tokenizer(text,
                                return_tensors="pt",
                                padding=True,
                                truncation=True)
        # hidden_states is only populated when output_hidden_states=True
        outputs = self.model(**inputs, output_hidden_states=True)
        # Get specific layer embeddings
        if layer == -1:
            return outputs.last_hidden_state
        else:
            return outputs.hidden_states[layer]
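A brief usage sketch of the class above (the sentence is illustrative; note that hidden_states[0] is the embedding-layer output, so layer=4 selects the output of the fourth encoder block):
embedder = ContextualEmbedder()
last_layer = embedder.get_embeddings("The bank approved the loan.")
layer_4 = embedder.get_embeddings("The bank approved the loan.", layer=4)
print(last_layer.shape, layer_4.shape)  # both (1, num_tokens, 768)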
Advanced Features
1. Layer Selection
Different layers capture different aspects of the input; lower layers tend to encode surface and syntactic features, while higher layers capture more semantic information:
def get_layer_embeddings(inputs, model, layer_nums):
    # `inputs` is the tokenized batch returned by the tokenizer
    outputs = model(**inputs, output_hidden_states=True)
    # Get specified layers (index 0 is the embedding-layer output)
    selected_layers = [outputs.hidden_states[i]
                       for i in layer_nums]
    return selected_layers
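A common heuristic, used in the feature-based experiments of the original BERT paper, is to combine the last few layers. The sketch below averages the last four, reusing the tokenizer and model loaded earlier (the sentence is illustrative):
import torch

inputs = tokenizer("Contextual embeddings are dynamic.", return_tensors="pt")
last_four = get_layer_embeddings(inputs, model, layer_nums=[-4, -3, -2, -1])
combined = torch.stack(last_four).mean(dim=0)  # (1, num_tokens, hidden_size)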
2. Subword Aggregation
import torch

def aggregate_subwords(tokens, embeddings):
    # Merge BERT WordPiece pieces ('##...') into word-level vectors
    word_embeddings = []
    current_word = []
    for token, embedding in zip(tokens, embeddings):
        if token.startswith('##'):
            current_word.append(embedding)
        else:
            if current_word:
                word_embeddings.append(torch.mean(
                    torch.stack(current_word), dim=0))
            current_word = [embedding]
    # Flush the final word, which the loop above never emits
    if current_word:
        word_embeddings.append(torch.mean(torch.stack(current_word), dim=0))
    return word_embeddings
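A hedged usage sketch, again reusing the BERT tokenizer and model loaded earlier. Note that the '##' prefix is specific to BERT's WordPiece vocabulary; byte-level BPE tokenizers such as those of GPT-2 and RoBERTa mark word boundaries differently, so this helper would need adapting for them.
inputs = tokenizer("Embeddings are unbelievably useful.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
word_vectors = aggregate_subwords(tokens, outputs.last_hidden_state[0])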
Applications
- Advanced NLP Tasks:
  - Named Entity Recognition
  - Question Answering
  - Text Classification
- Semantic Analysis:
  - Word Sense Disambiguation
  - Semantic Role Labeling
  - Coreference Resolution
Best Practices
1. Model Selection
- Consider task requirements
- Balance size and performance
- Evaluate domain compatibility
2. Fine-tuning
- Adapt to specific domains
- Use task-specific data
- Monitor overfitting
3. Resource Management
- Handle batch processing
- Implement caching
- Optimize memory usage
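A minimal sketch of the resource-management points above, batching inputs, disabling gradient tracking, and caching repeated texts (the cache structure, batch size, and mean pooling are illustrative choices rather than prescriptions):
import torch

_cache = {}

def embed_batch(texts, tokenizer, model, batch_size=16):
    # Only run the model on texts we have not embedded before
    uncached = [t for t in texts if t not in _cache]
    for start in range(0, len(uncached), batch_size):
        batch = uncached[start:start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        with torch.no_grad():  # inference only: avoids storing gradients
            outputs = model(**inputs)
        # Mean-pool over tokens, respecting the attention mask
        mask = inputs["attention_mask"].unsqueeze(-1)
        pooled = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
        for text, vec in zip(batch, pooled):
            _cache[text] = vec
    return {t: _cache[t] for t in texts}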
Implementation Example
import torch
from transformers import AutoModel, AutoTokenizer

class AdvancedContextualEmbedder:
    def __init__(self,
                 model_name='bert-base-uncased',
                 max_length=512):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.max_length = max_length

    def get_contextual_embeddings(self,
                                  text,
                                  layers=None,
                                  pooling='mean'):
        # Tokenize
        inputs = self.tokenizer(text,
                                return_tensors="pt",
                                padding=True,
                                truncation=True,
                                max_length=self.max_length)
        # Get embeddings (hidden_states requires output_hidden_states=True)
        outputs = self.model(**inputs,
                             output_hidden_states=True)
        # Select layers
        if layers is None:
            embeddings = outputs.last_hidden_state
        else:
            embeddings = torch.stack(
                [outputs.hidden_states[i] for i in layers]
            ).mean(dim=0)
        # Apply pooling over the token dimension
        if pooling == 'mean':
            return embeddings.mean(dim=1)
        elif pooling == 'max':
            return embeddings.max(dim=1)[0]
        return embeddings
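A hedged usage example of the class above (the sentences and layer selection are illustrative):
embedder = AdvancedContextualEmbedder()
sentences = ["The bank raised interest rates.",
             "They walked along the river bank."]
vectors = embedder.get_contextual_embeddings(sentences,
                                             layers=[-4, -3, -2, -1],
                                             pooling='mean')
print(vectors.shape)  # (2, 768) for bert-base-uncased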
Advantages
- Context Awareness:
  - Dynamic representations
  - Word sense disambiguation
  - Contextual understanding
- Performance:
  - State-of-the-art results
  - Transfer learning
  - Few-shot capabilities
Limitations
- Computational Cost:
  - Resource intensive
  - Slower processing
  - Memory requirements
- Complexity:
  - Model complexity
  - Training difficulty
  - Parameter tuning
Summary
Contextual embeddings represent a significant advancement in text representation, offering dynamic, context-aware word vectors. While they require more computational resources than static embeddings, their superior performance in capturing word meaning and context makes them invaluable for modern NLP applications.