Building a golden dataset is an essential part of AI engineering. A golden dataset, also known as a ground truth dataset, is a carefully curated collection of data that serves as a benchmark for evaluating AI model performance. Comparing your AI's responses against this ground truth set of answers gives you a measurable basis for confidence in their quality and accuracy, and lets you compute quantitative metrics such as those provided by Ragas (Retrieval Augmented Generation Assessment).
Creating a dataset with multiple contributors can be challenging. It often feels like an awkward dance: you need substantial work from other people, but you don't want to overburden them.
My First Experience Building a Golden Dataset
Identifying Topic Areas
When I created my first golden dataset at a large organization, I began by compiling a list of idea spaces or product areas to base our questions on. To determine these areas, we analyzed our most frequently accessed documentation. For example, if customers are primarily looking at features A, B, and C, those three features become the categories you generate questions for.
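As a rough sketch of that analysis, the snippet below sums documentation page views per product area and keeps the top few. The paths, view counts, and the rule that the second path segment names the product area are all made up for illustration; real numbers would come from whatever analytics tool you use.

```python
from collections import Counter

# Hypothetical page-view data: (documentation path, view count).
page_views = [
    ("/docs/feature-a/setup", 120),
    ("/docs/feature-b/overview", 75),
    ("/docs/feature-a/troubleshooting", 60),
    ("/docs/feature-c/api", 40),
    ("/docs/feature-d/faq", 5),
]


def top_product_areas(views, n=3):
    """Sum views per product area and return the n most viewed areas."""
    totals = Counter()
    for path, count in views:
        parts = path.strip("/").split("/")
        # Assumption: the second path segment identifies the product area.
        area = parts[1] if len(parts) > 1 else "unknown"
        totals[area] += count
    return totals.most_common(n)


top_areas = top_product_areas(page_views)
print(top_areas)
```

The areas that come out on top are the candidates for question categories.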
Generating Questions
After establishing categories, I used a combination of sources to generate questions:
- FAQs
- User forums
- Customer feedback
- Subject Matter Experts (SMEs)
- AI-generated questions
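Pulling from several sources tends to produce near-duplicate questions. Here is a minimal sketch of merging them into one list while remembering where each question first came from. The source names and sample questions are invented, and the normalization is deliberately simple (case and punctuation only), so it won't catch paraphrases.

```python
import re


def normalize(question):
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", question.lower())
    return " ".join(cleaned.split())


def merge_questions(sources):
    """Merge questions from all sources, keeping the first copy of each."""
    seen = {}
    for source, questions in sources.items():
        for question in questions:
            key = normalize(question)
            if key and key not in seen:
                seen[key] = {"question": question.strip(), "source": source}
    return list(seen.values())


# Hypothetical input: one list of questions per source.
sources = {
    "faq": ["How do I reset my password?"],
    "forum": ["how do I reset my password", "Can I export reports to CSV?"],
    "ai_generated": ["Can I export reports to CSV?  "],
}

merged = merge_questions(sources)
for entry in merged:
    print(entry["source"], "|", entry["question"])
```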
Initial Approach
Once I had a solid list of questions, I placed them in an Excel spreadsheet with columns for:
- SME answers
- Reference links
I then distributed this spreadsheet to numerous SMEs, asking them to select and answer two to three questions within their expertise.
Challenges Encountered
Throughout this process, I encountered several challenges:
- Incomplete answers: Responses were often brief (e.g., "yes," "no," "kind of, you can find X here: <link>").
- Inconsistent use of columns: Some answers lacked reference links, while others combined text and links in a single column.
- Low response rate: Many SMEs didn't participate due to time constraints or unclear instructions.
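The first two problems are easy to detect automatically once the spreadsheet is exported. The sketch below flags rows with answers that are too brief, links pasted into the answer column, and missing reference links. The column names (`sme_answer`, `reference_link`) and the eight-word threshold are my assumptions, not anything standard, so adjust them to your sheet.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MIN_WORDS = 8  # Arbitrary threshold for "too brief"; tune for your data.


def check_row(row):
    """Return a list of problems found in one spreadsheet row."""
    issues = []
    answer = row.get("sme_answer", "").strip()
    link = row.get("reference_link", "").strip()

    if not answer:
        issues.append("no answer")
    elif len(answer.split()) < MIN_WORDS:
        issues.append("answer too brief")

    if URL_PATTERN.search(answer):
        issues.append("link in answer column")

    if not URL_PATTERN.fullmatch(link):
        issues.append("missing or malformed reference link")

    return issues


# Hypothetical rows, as they might look after exporting the sheet.
rows = [
    {"sme_answer": "yes", "reference_link": ""},
    {"sme_answer": "See https://example.com/docs/x", "reference_link": ""},
    {
        "sme_answer": "Yes. Admins can export any report as CSV from the Reports page.",
        "reference_link": "https://example.com/docs/reports",
    },
]

for number, row in enumerate(rows, start=1):
    print(number, check_row(row))
```

A check like this won't fix a low response rate, but it tells you which rows to send back before they end up in the dataset.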
Improving the Process
"Don't Make Me Think" Principle
People are busy, and adding to their cognitive load can be counterproductive. Applying the "Don't Make Me Think" principle, borrowed from Steve Krug's book on web usability, can significantly improve the process. It's generally easier for people to review and correct existing content than to create it from scratch.
Revised Approach
For future golden dataset creation, I would:
- Compile a list of topic categories to test the AI against.
- Use various sources to generate a comprehensive list of questions.
- Use AI to generate initial responses to each question.
- Assign specific sets of questions to each SME, asking them to review and correct only if necessary.
This approach reduces the workload on SMEs while still leveraging their expertise to ensure accuracy.
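To make the last two steps concrete, here is a minimal sketch that attaches a draft answer to each question and hands it to an SME whose expertise matches the category, rotating through SMEs so nobody gets the whole pile. Everything named here is hypothetical: `draft_answer` is a placeholder where a real model call would go, and the SME names, categories, and `pending_review` status are invented.

```python
from collections import defaultdict
from itertools import cycle


def draft_answer(question):
    """Placeholder for a model call; swap in your own LLM client here."""
    return f"[DRAFT - needs SME review] {question}"


def assign_for_review(questions, sme_expertise):
    """Attach a draft to each question and assign a matching SME reviewer."""
    # One round-robin iterator per category spreads the load evenly.
    reviewers = {cat: cycle(smes) for cat, smes in sme_expertise.items() if smes}
    assignments = defaultdict(list)
    for item in questions:
        pool = reviewers.get(item["category"])
        sme = next(pool) if pool is not None else "unassigned"
        assignments[sme].append(
            {**item, "draft": draft_answer(item["question"]), "status": "pending_review"}
        )
    return dict(assignments)


# Hypothetical questions and SME expertise map.
questions = [
    {"question": "How do I enable feature A?", "category": "feature-a"},
    {"question": "What limits apply to feature A?", "category": "feature-a"},
    {"question": "Does feature B support SSO?", "category": "feature-b"},
]
sme_expertise = {"feature-a": ["alice", "bob"], "feature-b": ["carol"]}

assignments = assign_for_review(questions, sme_expertise)
for sme, items in assignments.items():
    print(sme, [item["question"] for item in items])
```

Questions in a category nobody covers land under "unassigned", which is a useful signal that you need another SME.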
Best Practices for Creating and Maintaining a Golden Dataset
- Regularly update the dataset to reflect new information and product changes.
- Implement a version control system to track changes and maintain dataset integrity.
- Establish a review cycle to ensure ongoing accuracy and relevance.
- Use a diverse group of SMEs to cover various aspects of your product or service.
- Implement a user-friendly interface for SMEs to review and edit entries easily.
- Idea: Create a Google Form linked to a Google Sheets backend. Generate unique links for each SME, directing them to a specific set of questions. This approach would streamline the response process, making it easier for SMEs to contribute.
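The unique-link idea can be sketched without committing to any particular form tool. The snippet below builds one URL per SME carrying a random token and that SME's question IDs. The base URL and the `token` and `questions` parameter names are placeholders I made up; this is not the Google Forms link format, so you would adapt the parameters to whatever your form tool accepts.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical review endpoint; replace with your form's real URL.
BASE_URL = "https://example.com/golden-dataset/review"


def build_review_links(assignments):
    """Return one unique review link per SME, encoding their question IDs."""
    links = {}
    for sme, question_ids in assignments.items():
        params = {
            "token": secrets.token_urlsafe(16),  # random, hard-to-guess ID
            "questions": ",".join(question_ids),
        }
        links[sme] = f"{BASE_URL}?{urlencode(params)}"
    return links


# Hypothetical assignment of question IDs to SMEs.
links = build_review_links({"alice": ["q1", "q2"], "bob": ["q3"]})

for sme, link in links.items():
    # Decode the link again to show which questions it points to.
    query = parse_qs(urlparse(link).query)
    print(sme, query["questions"][0])
```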
Conclusion
Building a golden dataset is crucial for AI engineering, but it requires careful planning and execution. By learning from past experiences and implementing best practices, you can create a more efficient and effective process for developing and maintaining your golden dataset. This, in turn, gives you more reliable evaluations, which help you build more accurate AI models and better user experiences.
Cheers,
Paulo