YouTube Knowledge Base

OpenClaw skill status — generated 2026-03-31T11:20:00Z

Pipeline Operational — 30/30 Videos Processed

All videos from both channels have been transcribed, chunked, and embedded, and semantic search is live. YouTube's HTTP 429 rate limiting on the timedtext API was resolved by combining three changes: --sleep-subtitles 2, --remote-components ejs:github, and the bgutil-ytdlp-pot-provider plugin.

Health: Operational (last run: 30 processed, 0 failed)
Channels: 2 (Dr. Jones, DC; Josh Holyfield)
Videos: 30 / 30 (100% success rate)
Chunks: 77 (all embedded, 1536-dim)
Database: 692 KB (~/.openclaw/memory/youtube_kb.sqlite)
Cron: every 6h (youtube-kb-ingest, isolated session)

Monitored Channels

Channel | Handle | Status | Videos | Chunks | Last Polled
Dr. Jones, DC | @DrJonesDC | active | 15 (all done) | 44 | 2026-03-31 10:59
Josh Holyfield | @josh_holyfield | active | 15 (all done) | 33 | 2026-03-31 10:59

Videos (30)

Title | Channel | Published | Status
The PERFECT GLP-1 Dinner | DrJonesDC | 2026-03-29 | done
The BEST GLP-1 Snacks! | DrJonesDC | 2026-03-26 | done
10 Things You Should NEVER Do on Retatrutide | DrJonesDC | 2026-03-24 | done
The PERFECT GLP-1 Lunch | DrJonesDC | 2026-03-22 | done
Insulin ISN'T What You Think It Is | DrJonesDC | 2026-03-20 | done
The GLP-1 Weight Loss Cheat Sheet | DrJonesDC | 2026-03-17 | done
GLP-1 Plateau Fix | DrJonesDC | 2026-03-14 | done
GERD While Dieting? | DrJonesDC | 2026-03-12 | done
GLP-1 Gut Fix | DrJonesDC | 2026-03-10 | done
Treat The Root Cause | DrJonesDC | 2026-03-07 | done
Oral vs Injectable BPC | DrJonesDC | 2026-03-05 | done
How Much Muscle LOSS From Ozempic? | DrJonesDC | 2026-03-03 | done
Peptide Stacking: Genius or Mistake? | DrJonesDC | 2026-02-28 | done
5 Early Warning Signs of Lupus | DrJonesDC | 2026-02-27 | done
Always Tired? WATCH THIS! | DrJonesDC | 2026-02-24 | done
New MOTS-C Study Changes How I Stack | josh_holyfield | 2026-03-29 | done
What's the Best Workout Split for Muscle Growth? | josh_holyfield | 2026-03-26 | done
How to Stack SS-31 + MOTS-C | josh_holyfield | 2026-03-22 | done
What Is Selank? The Russian Anxiety Peptide | josh_holyfield | 2026-03-19 | done
Why TRT Raised Your Blood Pressure | josh_holyfield | 2026-03-15 | done
Metformin Vicious Cycle | josh_holyfield | 2026-03-12 | done
Does TRT Raise Heart Attack Risk? | josh_holyfield | 2026-03-08 | done
Peptide Injection Lumps and Nodules | josh_holyfield | 2026-03-05 | done
Does TRT Cause Prostate Cancer? | josh_holyfield | 2026-03-01 | done
Free Testosterone vs Total Testosterone | josh_holyfield | 2026-02-26 | done
Thymosin Alpha-1 vs Thymosin Beta-4 | josh_holyfield | 2026-02-22 | done
Best Testosterone Boosting Supplements | josh_holyfield | 2026-02-19 | done
Cerebrolysin: The Peptide for Brain Repair | josh_holyfield | 2026-02-15 | done
The Ultimate BPC-157 Guide | josh_holyfield | 2026-02-12 | done
How to Actually Optimize Your Testosterone | josh_holyfield | 2026-02-08 | done

Ingestion Run History

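# | Started | Duration | Discovered | Processed | Failed | Chunks | Embeddings
8 | 10:59:23 | 18m 44s | 0 | 30 | 0 | 77 | 77
7 | 10:14:09 | 14m 32s | 0 | 0 | 27 | 0 | 0
6 | 10:12:28 | 16m 11s | 0 | 0 | 30 | 0 | 0
5 | 02:38:33 | <1s | 15 | 0 | 0 | 0 | 0
4 | 02:17:10 | 28s | 15 | 0 | 15 | 0 | 0
3 | 01:49:06 | 19s | 0 | 0 | 15 | 0 | 0
2 | 01:48:57 | <1s | 0 | 0 | 0 | 0 | 0
1 | 01:48:32 | 1s | 15 | 0 | 0 | 0 | 0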

All times are UTC on 2026-03-31. Runs 1-5: initial setup, discovery, and pre-fix failures. Runs 6-7: 429 failures before the fix. Run 8: first fully successful run after the yt-dlp fix.

Pipeline Architecture

RSS Feeds: Working
Poll & Discover: Working
Extract Transcript: yt-dlp + PO tokens
Chunk Text: ~400 tokens
Embed: OpenAI text-embedding-3-small
SQLite: 692 KB

Transcripts are extracted with yt-dlp using a Node.js runtime, --sleep-subtitles 2, --remote-components ejs:github, and the bgutil-ytdlp-pot-provider plugin. Google cookies authenticate the requests, and a 30s delay between videos provides rate limiting.
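
The extraction settings described here can be sketched as a command builder. This is a minimal sketch, not the actual ingest.py code: the function name build_ytdlp_command, the output template, the English subtitle filter, and the VTT format are assumptions made for illustration. Only --sleep-subtitles 2, --remote-components ejs:github, and the use of a cookies file come from this report. The Node.js runtime and the bgutil-ytdlp-pot-provider plugin are assumed to be installed alongside yt-dlp and are not configured in the sketch.

```python
from pathlib import Path


def build_ytdlp_command(video_url: str, cookies_file: Path, out_dir: Path) -> list[str]:
    """Assemble a yt-dlp invocation that fetches subtitles only (hypothetical helper)."""
    return [
        "yt-dlp",
        "--skip-download",                      # transcripts only, never the media
        "--write-subs",                         # uploader-provided subtitles
        "--write-auto-subs",                    # auto-generated captions as well
        "--sub-langs", "en.*",                  # assumption: English tracks only
        "--sub-format", "vtt",                  # assumption: WebVTT output
        "--sleep-subtitles", "2",               # from the report: 2s per subtitle request
        "--remote-components", "ejs:github",    # from the report
        "--cookies", str(cookies_file),         # Google cookies for authenticated requests
        "--output", str(out_dir / "%(id)s.%(ext)s"),  # assumption: one file per video id
        video_url,
    ]
```

The resulting list could be passed to subprocess.run(cmd, check=True), with a time.sleep(30) between videos to match the reported request delay.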

Scripts & Capabilities

manage_channels.py

Channel lifecycle management

  • add — Add by @handle, channel ID, or URL
  • remove — Soft-delete (preserves data)
  • list — All channels with poll timestamps
  • stats [channel] — Video counts, chunks, DB size

ingest.py

Full ingestion pipeline with rate limiting

  • RSS → transcript → chunk → embed → store
  • yt-dlp + cookies + PO tokens (primary)
  • youtube-transcript-api (fallback)
  • --dry-run --no-embed --retry-failed --channel --verbose
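
The discovery step at the front of this pipeline can be sketched as follows. It is an illustration under stated assumptions, not the code in ingest.py: parse_feed, new_videos, and SAMPLE_FEED are hypothetical names, and the sample XML is a hand-written stand-in for a YouTube channel feed (served as Atom at https://www.youtube.com/feeds/videos.xml?channel_id=...). Fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespaces used by YouTube's Atom channel feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def parse_feed(xml_text: str) -> list[dict[str, str]]:
    """Return one dict per feed entry with video id, title, and publish time."""
    root = ET.fromstring(xml_text)
    videos = []
    for entry in root.findall("atom:entry", NS):
        videos.append({
            "video_id": entry.findtext("yt:videoId", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return videos


def new_videos(feed_videos: list[dict[str, str]], known_ids: set[str]) -> list[dict[str, str]]:
    """Keep only entries whose video id is not already stored."""
    return [v for v in feed_videos if v["video_id"] not in known_ids]


# Hand-written sample standing in for a real feed response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>VIDEO_ID_1</yt:videoId>
    <title>Example title one</title>
    <published>2026-03-29T15:00:00+00:00</published>
  </entry>
  <entry>
    <yt:videoId>VIDEO_ID_2</yt:videoId>
    <title>Example title two</title>
    <published>2026-03-26T15:00:00+00:00</published>
  </entry>
</feed>"""
```

Calling new_videos(parse_feed(SAMPLE_FEED), known_ids) with the ids already in the database yields only the videos that still need a transcript, which is the list a dry run would report.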

search_kb.py

Semantic and keyword search

  • Vector search via OpenAI text-embedding-3-small
  • SQL LIKE keyword fallback (no API key needed)
  • --channel --since --top N --json
  • Compact output for agent context windows
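
Vector search over BLOB embeddings can be sketched with the standard library alone. This is a hedged illustration, not the search_kb.py implementation: it assumes embeddings are stored as little-endian float32 arrays, and pack_embedding, unpack_embedding, cosine_similarity, and top_n are hypothetical names. The query vector would come from the same embedding model as the stored chunks; that API call is omitted.

```python
import math
import struct


def pack_embedding(vector: list[float]) -> bytes:
    """Serialize a vector as little-endian float32 (assumed storage format)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> list[float]:
    """Inverse of pack_embedding: 4 bytes per float32 component."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query: list[float], rows: list[tuple[str, bytes]], n: int = 5) -> list[tuple[float, str]]:
    """Rank (chunk_text, embedding_blob) rows by similarity to the query vector."""
    scored = [(cosine_similarity(query, unpack_embedding(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

A brute-force scan like this is reasonable at the current size of 77 chunks; a larger knowledge base would likely want a vectorized or indexed approach instead.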

schema.sql

SQLite schema (4 tables, 8 indexes)

  • channels — Metadata, RSS URL, active flag
  • videos — Status (0/1/-1/-2), fail tracking
  • transcript_chunks — Text, timestamps, BLOB embeddings
  • ingest_runs — Per-run metrics
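
To make the table roles concrete, here is a reduced sketch of such a schema with a stats-style query. Every column name is an assumption chosen for illustration; the real schema.sql is not reproduced in this report, and only two of its eight indexes are imitated. The status codes are left unexplained because the report lists them (0/1/-1/-2) without stating their meanings.

```python
import sqlite3

# Hypothetical, simplified schema: table names come from the report, columns do not.
SCHEMA = """
CREATE TABLE channels (
    id      INTEGER PRIMARY KEY,
    handle  TEXT NOT NULL UNIQUE,
    rss_url TEXT,
    active  INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE videos (
    id         INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL REFERENCES channels(id),
    title      TEXT NOT NULL,
    status     INTEGER NOT NULL DEFAULT 0,  -- codes 0/1/-1/-2; meanings not stated
    fail_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE transcript_chunks (
    id            INTEGER PRIMARY KEY,
    video_id      INTEGER NOT NULL REFERENCES videos(id),
    text          TEXT NOT NULL,
    start_seconds REAL,
    embedding     BLOB
);
CREATE TABLE ingest_runs (
    id         INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL,
    discovered INTEGER NOT NULL DEFAULT 0,
    processed  INTEGER NOT NULL DEFAULT 0,
    failed     INTEGER NOT NULL DEFAULT 0,
    chunks     INTEGER NOT NULL DEFAULT 0,
    embeddings INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_videos_channel ON videos(channel_id);
CREATE INDEX idx_chunks_video ON transcript_chunks(video_id);
"""


def channel_stats(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Return (handle, video_count, chunk_count) per channel."""
    return conn.execute(
        """
        SELECT c.handle, COUNT(DISTINCT v.id), COUNT(t.id)
        FROM channels c
        LEFT JOIN videos v ON v.channel_id = c.id
        LEFT JOIN transcript_chunks t ON t.video_id = v.id
        GROUP BY c.handle
        ORDER BY c.handle
        """
    ).fetchall()
```

Running conn.executescript(SCHEMA) against sqlite3.connect(":memory:") gives a throwaway database for trying the query without touching youtube_kb.sqlite.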

Configuration

Environment

  • OPENAI_API_KEY: set
  • CLOUDFLARE_API_TOKEN: set
  • CLOUDFLARE_ACCOUNT_ID: set
  • YT_KB_PROXY_URL: not needed
  • Google Cookies: present

Infrastructure

  • Python: 3.12.3
  • yt-dlp: 2026.3.17
  • openai SDK: 2.30.0
  • PO Token Plugin: bgutil 1.3.1
  • Embedding model: text-embedding-3-small
  • Chunk target: 400 tokens / 50 overlap
  • Request delay: 30s between videos
  • Subtitle sleep: 2s per request
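
The chunk target of 400 tokens with 50 overlap corresponds to a sliding window that advances 350 tokens at a time. The sketch below shows that windowing on an already tokenized list. It is an illustration only: chunk_tokens is a hypothetical name, and how the real pipeline tokenizes transcripts (model tokens versus words) is not stated in this report.

```python
def chunk_tokens(tokens: list, target: int = 400, overlap: int = 50) -> list[list]:
    """Split tokens into windows of up to `target` items sharing `overlap` items."""
    if target <= 0 or not 0 <= overlap < target:
        raise ValueError("need target > 0 and 0 <= overlap < target")
    step = target - overlap  # 350 with the configured values
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target])
        if start + target >= len(tokens):
            break  # the window already reached the end of the transcript
    return chunks
```

For a 1000-token transcript this produces three chunks starting at tokens 0, 350, and 700, with the last one shorter than the target. A whitespace split such as text.split() is enough to try it out.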

Cron Schedule

Job | Schedule | Agent | Timeout | Status
youtube-kb-ingest | 0 */6 * * * ET | main (isolated) | 900s | needs OpenRouter credits
youtube-kb-retry-failed | 15 3 * * 0 ET | main (isolated) | - | deferred

Remaining Items

Cron Agent Billing

Cron jobs require OpenRouter credits for the agent session. The user has topped up the account; the next scheduled run will validate end-to-end automated ingestion.

Cron Timeout

Current 900s timeout is tight for 30+ videos with 30s delays. Consider bumping to 1800s once more channels are added.
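
The arithmetic behind this concern can be checked with a short sketch. The helper names are hypothetical, and the only inputs are figures from this report: the 30s delay between videos, the 900s and 1800s timeouts, and run 8's duration of 18m 44s.

```python
def delay_floor_seconds(video_count: int, delay_seconds: int = 30) -> int:
    """Time spent only on the pauses between consecutive videos."""
    return max(video_count - 1, 0) * delay_seconds


def duration_to_seconds(text: str) -> int:
    """Convert a duration such as '18m 44s' to seconds (handles only Nm and Ns parts)."""
    total = 0
    for part in text.split():
        if part.endswith("m"):
            total += int(part[:-1]) * 60
        elif part.endswith("s"):
            total += int(part[:-1])
    return total


def fits_timeout(run_seconds: int, timeout_seconds: int) -> bool:
    """True if a run of the given length finishes within the timeout."""
    return run_seconds <= timeout_seconds
```

With 30 videos the pauses alone take 870s, which leaves almost no room under 900s. Run 8's 18m 44s is 1124s, so a full 30-video pass would not fit in 900s but would fit in 1800s. Scheduled runs that only pick up newly published videos should presumably be much shorter, so the limit matters mainly for backfills and retries.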

Weekly Retry Job

youtube-kb-retry-failed should be registered after the first successful week of automated ingestion.

Build Timeline

01:45 UTC  Skill created: directories, venv, scripts, SKILL.md, schema
01:48  First dry run: 15 Fireship videos discovered
01:49  Full ingest attempt: all 429'd (datacenter IP blocked)
02:00  Tailscale exit node configured (residential IP)
02:17  Channels switched to DrJonesDC + josh_holyfield
02:20  Diagnosed: timedtext endpoint 429 even on residential IP
02:35  Google cookies added: auth confirmed but still 429
10:12  Cron runs fail (OpenRouter billing + 429 still active)
10:50  Research: PO tokens required for timedtext API
10:55  Fix: --sleep-subtitles 2 + bgutil-ytdlp-pot-provider
10:59  Run 8: 30/30 videos processed, 77 chunks, 77 embeddings