If you watch a lot of educational material on YouTube, you might wish that you could more easily remember what you learnt. You can now use Large Language Models (LLMs) like ChatGPT and Gemini to help with this process.
To create a demo, I used a small playlist of about 15 videos from the Google Cloud Next '24 conference (the talks covered topics such as Vertex AI, Retrieval Augmented Generation (RAG), vector databases, LlamaIndex, LangChain, etc.) and built an AI-powered question answering system on top of it using off-the-shelf software tools.
Using my system, I can ask questions over the entire playlist and get a summarized answer with citations pointing to specific locations in the videos.
Here are the steps to do this:
1 Use Airtable to manage the workflow
If you have more than 10-15 videos in the playlist, I recommend using some software to manage the process rather than tracking it by hand. I use Airtable because of its simple user interface and overall pleasant UX.
Create a new Airtable base, and add the YouTube URL, title and description for each video, as shown below. I have shared the Airtable table at the bottom of this article.
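If you prefer to script this step instead of filling in the table by hand, here is a minimal sketch that pulls the video list from the playlist with yt-dlp and creates rows through Airtable's REST API. The token, base ID, table name and field names are placeholders you would need to match to your own base; this is just one way to do it, not part of the core workflow.

```python
# Sketch: populate the Airtable tracker from a YouTube playlist.
# Assumes `pip install yt-dlp requests` and an Airtable personal access token.
import requests
import yt_dlp

AIRTABLE_TOKEN = "YOUR_AIRTABLE_TOKEN"   # placeholder
BASE_ID = "appXXXXXXXXXXXXXX"            # placeholder
TABLE_NAME = "Videos"                    # placeholder; match your table name
PLAYLIST_URL = "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"

# Fetch the playlist entries (titles and URLs) without downloading any video.
with yt_dlp.YoutubeDL({"extract_flat": True, "quiet": True}) as ydl:
    playlist = ydl.extract_info(PLAYLIST_URL, download=False)

for entry in playlist["entries"]:
    record = {
        "fields": {
            "YouTube URL": entry["url"],              # assumed column names
            "Title": entry.get("title", ""),
            "Description": entry.get("description") or "",
        }
    }
    # Create one row per video in the Airtable table.
    requests.post(
        f"https://api.airtable.com/v0/{BASE_ID}/{TABLE_NAME}",
        headers={"Authorization": f"Bearer {AIRTABLE_TOKEN}"},
        json=record,
        timeout=30,
    ).raise_for_status()
```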

2 Transcribe YouTube videos using TurboScribe
I recommend getting a monthly or yearly subscription to TurboScribe.
Here is why:
- it offers nearly unlimited transcription
- it is very accurate, even with multiple speakers and non-native accents (remember, this is very common for technical presentations like the ones at the Google Cloud Next conference)
- it provides an easy-to-use interface for labelling speakers and making minor edits to the transcript
- it provides the transcript output in multiple formats, which makes it quite versatile for our purpose
How to use TurboScribe:
First, select a video from your playlist and upload it to TurboScribe. I suggest creating a new folder (I have called mine demo) if you want to follow along.
You can do this by clicking on the Transcribe Files button.

Click on the hyperlink icon.

Provide the YouTube video link (copied from your Airtable).

TurboScribe will load the video details. Choose the following settings: highest accuracy, detect speakers automatically, and transcribe to English.

Click on the Transcribe button at the bottom. This begins the transcription process, and you will see an updated status each time you refresh the page.

A few minutes later, the transcription is complete and the status changes to a green checkmark. (On the paid plan, transcription is much faster.)
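As an aside: if you would rather script the transcription step (and are willing to give up TurboScribe's speaker-labelling workflow described in the next step), one alternative is to run OpenAI's open-source Whisper model locally. This is only a minimal sketch, assuming yt-dlp, openai-whisper and ffmpeg are installed; the URL and file names are placeholders, and note that Whisper on its own does not separate speakers.

```python
# Sketch: transcribe one video locally with Whisper instead of TurboScribe.
# Assumes `pip install yt-dlp openai-whisper` and ffmpeg on the PATH.
import yt_dlp
import whisper

VIDEO_URL = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"  # from your Airtable

# Download just the audio track.
with yt_dlp.YoutubeDL({"format": "bestaudio/best", "outtmpl": "talk.%(ext)s"}) as ydl:
    info = ydl.extract_info(VIDEO_URL, download=True)
    audio_path = ydl.prepare_filename(info)

# Transcribe with timestamps; larger models are slower but more accurate.
model = whisper.load_model("medium")
result = model.transcribe(audio_path)

# Write a plain-text transcript with segment timestamps
# (roughly what TurboScribe's PDF export gives you, minus the speaker labels).
with open("talk_transcript.txt", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        f.write(f"[{seg['start']:.0f}s - {seg['end']:.0f}s] {seg['text'].strip()}\n")
```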

3 Identify the speakers
Click on the transcript to see its contents.
In the right pane, click on “Edit transcript” to rename the generic speaker labels to actual speaker names.

You can identify the speaker names by reading the transcript or by watching the video.
Hover over the speaker label and you will see a button to edit the speaker name.

Clicking it opens a popup that shows all the speakers, so you can edit all their names in one place.

Click on “Done Editing” after adding all the speaker names.

Now do the same for all the videos in your playlist.
I recommend adding a prefix to each file name to indicate that it has been labelled.
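Since Airtable is already tracking the workflow, you can optionally record this progress there too. Below is a minimal sketch that ticks a hypothetical "Labelled" checkbox on a record through Airtable's REST API; the field name and record ID are assumptions for illustration, not something the workflow above requires.

```python
# Sketch: mark a video as transcribed and speaker-labelled in the Airtable tracker.
# The "Labelled" checkbox field and the IDs below are placeholders/assumptions.
import requests

AIRTABLE_TOKEN = "YOUR_AIRTABLE_TOKEN"   # placeholder
BASE_ID = "appXXXXXXXXXXXXXX"            # placeholder
TABLE_NAME = "Videos"                    # placeholder
RECORD_ID = "recXXXXXXXXXXXXXX"          # the row for the video you just finished

requests.patch(
    f"https://api.airtable.com/v0/{BASE_ID}/{TABLE_NAME}/{RECORD_ID}",
    headers={"Authorization": f"Bearer {AIRTABLE_TOKEN}"},
    json={"fields": {"Labelled": True}},  # only the fields you send are updated
    timeout=30,
).raise_for_status()
```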
4 Upload PDF files to ChatPDF and query them
Download the PDF files for the entire folder by clicking on the Export Folder button.

Export in PDF format, and make sure you include the Section Timestamps.

Create a paid account on ChatPDF (you need a paid account to be able to query the entire folder).
Create a folder inside ChatPDF based on the playlist name. In this example, I have called it demo.

Now upload all the exported PDF files into the folder you just created.

Now you can query across the entire folder and ask questions specific to its contents. Notice that the answer includes citations, and you can click on a citation to jump to the specific page in the PDF file.
Once you are on that page, you can see the video name and the approximate timestamp, so you can play the specific video segment if you are interested.
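If you want to script this kind of Q&A instead of using the web UI, ChatPDF also exposes an HTTP API. As far as I know the API works per document rather than per folder, so folder-wide questions like the ones below are easiest in the UI; the endpoints and response fields in this sketch are based on ChatPDF's public API docs and should be treated as assumptions to verify against the current docs. The API key and file path are placeholders.

```python
# Sketch: ask a question about a single transcript PDF via ChatPDF's HTTP API.
# Endpoints/fields are assumptions based on ChatPDF's public docs; verify before use.
import requests

CHATPDF_API_KEY = "sec_XXXXXXXXXXXX"  # placeholder
HEADERS = {"x-api-key": CHATPDF_API_KEY}

# 1. Upload one exported transcript PDF and get back a source id.
with open("demo/gen-ai-design-patterns.pdf", "rb") as f:  # placeholder path
    resp = requests.post(
        "https://api.chatpdf.com/v1/sources/add-file",
        headers=HEADERS,
        files={"file": f},
        timeout=120,
    )
resp.raise_for_status()
source_id = resp.json()["sourceId"]

# 2. Ask a question about that transcript.
resp = requests.post(
    "https://api.chatpdf.com/v1/chats/message",
    headers=HEADERS,
    json={
        "sourceId": source_id,
        "messages": [
            {"role": "user", "content": "What is agentic RAG, and when is it useful?"}
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])
```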

In addition to searching across the videos, you can also ask questions that combine “general knowledge” with specific information from the videos in the playlist.
Since the folder maps directly to the playlist, asking a question across the folder is effectively the same as asking it across the entire video playlist.
Here is an example:

In my opinion, the most interesting thing about that answer is not that you can “ground” the reply in the video playlist, but that you also get very specific examples (the citations).
This is a fairly unique type of video search capability, and it is only possible because of LLMs.
5 Create a video outline for active learning and better recall
While text summarization is a well-known benefit of LLMs, this system also lets you generate an outline with timestamps.
Here is an example from one of the videos in the playlist. When you copy/paste the transcript PDF contents into Google Gemini Flash, this is the outline it produces. (When I tried this with GPT-4o, the results were not as good.) If you prefer to script this step, see the sketch at the end of this section.
Generative AI Use Cases and Design Patterns with Databases – Outline
1. Introduction (0:00 – 3:09)
- Pranav Nambiar and Curtis Van Gint introduce the session and the topic of Generative AI (Gen AI) use cases and design patterns. (0:00 – 0:48)
- Pranav asks about audience experience with Gen AI and design patterns. (0:49 – 3:09)
- Highlighting the rapid growth of Gen AI across industries and its impact.
2. Building for the Future (3:12 – 3:38)
- Curtis emphasizes the need to design Gen AI applications for the future, focusing on hyper-personalization, intelligence, reliability, efficiency, and user experience.
3. Key Concepts and Design Patterns (3:41 – 7:39)
- Pranav outlines the session’s structure covering semantic search, LLM-based generation, RAG, ReAct, orchestration, and a summary. (3:41 – 4:09)
- A brief explanation of how the session aims to cater to both beginners and advanced Gen AI users. (4:10 – 7:39)
4. Semantic Search/Vector Search (7:40 – 10:36)
- Curtis explains the two phases of semantic search: ingestion and retrieval. (7:40 – 8:15)
- Detailing the four steps of ingestion: loading data, chunking, embedding, and storing. (8:16 – 9:00)
- Illustrating the retrieval process using a coffee shop recommendation example. (9:01 – 9:59)
- Presenting a real-world use case of legal Q&A and its implementation with AlloyDB and vector search. (10:00 – 10:36)
5. Vector Search Integration and Use Cases (10:39 – 15:20)
- Pranav emphasizes the importance of vector search integration within databases for efficiency and data management. (10:39 – 11:34)
- Discussing the availability of vector search capabilities in various Google Cloud database offerings. (11:35 – 12:13)
- Exploring key use cases of vector search across different industries: basic question answering, product recommendations, record matching, anomaly/fraud detection. (12:14 – 15:20)
6. LLM-Based Generation and Prompt Engineering (15:21 – 18:55)
- Curtis introduces LLMs, their training, and their capabilities in recognizing, predicting, and generating human language. (15:21 – 15:57)
- Prompting as the fundamental concept of working with LLMs, illustrated with an example of document summarization. (15:58 – 16:47)
- Discussing different prompt engineering strategies: zero-shot, few-shot, and chain-of-thought prompting. (16:48 – 18:05)
- Emphasizing best practices for prompt engineering: including examples, being concise, and involving explanation. (18:06 – 18:55)
7. LLM-Based Generation Use Cases (18:58 – 21:34)
- Pranav presents key use cases for LLM-based generation: content generation, document summarization, natural language classification, and translation. (18:58 – 21:34)
8. Retrieval Augmented Generation (RAG) (21:36 – 23:22)
- Curtis introduces RAG as a method to ground LLMs with contextual information from external sources. (21:36 – 22:11)
- Distinguishing between static grounding (fixed information) and dynamic grounding (live/operational data). (22:12 – 22:52)
- Demonstrating the process of incorporating grounded information into prompts for personalized responses. (22:53 – 23:22)
9. RAG Use Cases and the ReAct Pattern (23:23 – 27:13)
- Pranav presents key use cases for RAG: personalized content generation, Q&A bots, enhanced search, and adaptive recommendations/explanations. (23:23 – 24:52)
- Introducing the ReAct pattern as an advanced approach for reasoning and acting with LLMs. (24:53 – 25:32)
- Explaining the three parts of the reasoning exercise: thoughts, actions, and observations. (25:33 – 27:13)
10. ReAct and LLM Agents: Building Agents with Tools (27:16 – 34:19)
- Curtis explains the process of building agents using tools within the ReAct framework, highlighting the prompt generation and tool manifest. (27:16 – 28:58)
- Illustrating the thought-action-observation loop in agent execution, emphasizing the use of tools for specific actions. (28:59 – 30:16)
- Introducing agentic RAG for dynamic information retrieval within the agent. (30:17 – 31:50)
- Providing best practices for building tools: keeping them simple, addressing security, and optimizing performance. (31:51 – 33:40)
- Exploring advanced patterns: plan and execute, retrieval augmented fine tuning (RAFT), and multi-agent architectures. (33:41 – 34:19)
11. LLM Agent Use Cases and Orchestration (34:22 – 40:11)
- Pranav presents key use cases for LLM agents: task assistants, workflow automations, smart chatbots, and on-demand reporting. (34:22 – 36:51)
- Introducing orchestration frameworks as a means to integrate and manage components of a Gen AI application. (36:52 – 37:10)
- Curtis explains the role of orchestration tools in simplifying Gen AI application development and leveraging existing patterns. (37:11 – 37:46)
12. Orchestration Tools and Frameworks (37:47 – 40:11)
- Discussing the capabilities of Vertex AI extensions, function calling, and open-source frameworks like LangChain and Llama Index. (37:47 – 39:25)
- Highlighting Google Cloud’s integration with LangChain and its components: VectorStore, DocumentLoaders/Savers, and ChatHistory. (39:26 – 40:11)
13. Demo: Cymbal Air Customer Assistant (40:12 – 45:07)
- Pranav and Curtis demonstrate the Cymbal Air Customer Assistant, showcasing its functionality and use of agentic flow. (40:12 – 45:07)
- The demo involves booking a flight, checking flight status, and highlighting security concerns through an attempt to access another user’s information.
14. Architecture and Summary (45:08 – 46:35)
- Curtis presents the architecture of the Cymbal Air Customer Assistant, highlighting the interaction between the user, agent, Vertex AI, tools, and databases. (45:08 – 45:50)
- Pranav summarizes the key takeaways from the session: the importance of grounding LLMs, the usefulness of various design patterns, and the significance of orchestration frameworks. (45:51 – 46:14)
- Emphasizing the potential of AI and data to create magical and impactful experiences. (46:15 – 46:35)
Now you can simply copy-paste this outline into your note-taking software.
This is a good system if you are interested in "active learning".
You spend a few extra minutes summarizing the video with an LLM, and in return you have notes that you can read and search anytime in the future.
I use Workflowy, which (in my opinion) is the best outlining software: it provides very fast search and the ability to expand and collapse information in a way that makes it easy to review your notes.
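Finally, if you would rather script the summarization step than paste the transcript into the Gemini web app, here is a minimal sketch using the google-generativeai Python package. The model name, prompt wording and file path are placeholders, and extracting the PDF text with pypdf is my own assumption rather than part of the workflow above.

```python
# Sketch: generate a timestamped outline from a transcript PDF with Gemini Flash.
# Assumes `pip install google-generativeai pypdf` and a Gemini API key.
import google.generativeai as genai
from pypdf import PdfReader

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder

# Pull the text (including the section timestamps) out of the exported PDF.
reader = PdfReader("demo/gen-ai-design-patterns.pdf")  # placeholder path
transcript = "\n".join((page.extract_text() or "") for page in reader.pages)

prompt = (
    "Create a hierarchical outline of this talk for my notes. "
    "Group the content into numbered sections, keep the speaker names, "
    "and include the start/end timestamps for each point.\n\n" + transcript
)

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(prompt)
print(response.text)  # paste this into your note-taking software
```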
