Large-language-models

Published on
December 5, 2025
A Practical Guide to RAG Evaluation With RAGAS Metrics and Confidence Intervals
RAG Large-Language-Models Machine-Learning Natural-Language-Processing
How to model query quality, use bootstrapping, and report realistic RAG performance with RAGAS metrics and confidence intervals.
Published on
September 30, 2023
Red Teaming Large Language Models
Red-Teaming Large-Language-Models
Exploring Recent Techniques to Uncover and Mitigate Undesirable Behaviors in Language Models
Published on
September 28, 2023
Using Bayesian Optimization for Red Teaming Large Language Models
Bayesian-Optimization Large-Language-Models Red-Teaming
