Building a Golden Dataset for AI Engineering: Lessons and Best Practices


Building a golden dataset is an essential part of AI engineering. A golden dataset, also known as a ground truth dataset, is a carefully curated collection of data that serves as a benchmark for evaluating AI model performance. Comparing your AI's responses against this set of ground truth answers gives you confidence in their quality and accuracy, and it lets you compute quantitative metrics such as those provided by RAGAS (Retrieval Augmented Generation Assessment).
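To make "quantitative metrics" concrete, here is a minimal sketch that scores model answers against golden answers using token-overlap F1, a simple metric used in extractive question-answering benchmarks such as SQuAD. It is a stand-in for illustration, not the RAGAS API; RAGAS provides its own, more sophisticated metrics. The sample questions and answers are made up.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and the golden answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A tiny hypothetical golden dataset: each entry pairs a question with its approved answer.
golden = [
    {"question": "Does feature A support SSO?", "answer": "Yes, feature A supports SAML SSO."},
    {"question": "Where are logs stored?", "answer": "Logs are stored in the logs bucket."},
]
model_answers = ["Feature A supports SAML SSO.", "Logs go to the audit table."]

scores = [token_f1(pred, entry["answer"]) for pred, entry in zip(model_answers, golden)]
mean_f1 = sum(scores) / len(scores)
```

Lexical overlap rewards wording rather than meaning, which is one reason dedicated evaluation frameworks exist; it is still a useful first baseline once the golden answers are in place.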

Creating a dataset with multiple contributors can be challenging. It often feels like an awkward dance: you want to accomplish substantial work without overburdening the people helping you.

My First Experience Building a Golden Dataset

Identifying Topic Areas

When I created my first golden dataset at a large organization, I began by compiling a list of idea spaces, or product areas, to base our questions on. To determine these areas, we analyzed our most frequently accessed documentation. For example, if customers primarily look at documentation for features A, B, and C, those features become the categories you generate questions for.
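One way to turn documentation analytics into topic categories is to sum page views per product area and keep the top few. The sketch below assumes a hypothetical export of (page path, view count) pairs and assumes the second path segment names the product area; adjust both to whatever your analytics tool produces.

```python
from collections import Counter

# Hypothetical export of documentation page views: (page_path, view_count).
page_views = [
    ("/docs/feature-a/setup", 1200),
    ("/docs/feature-b/billing", 950),
    ("/docs/feature-a/sso", 800),
    ("/docs/feature-c/api", 700),
    ("/docs/feature-d/themes", 90),
]


def product_area(path: str) -> str:
    """Assume the second path segment names the product area, e.g. /docs/feature-a/..."""
    parts = path.strip("/").split("/")
    return parts[1] if len(parts) > 1 else parts[0]


def top_areas(views: list[tuple[str, int]], n: int = 3) -> list[tuple[str, int]]:
    """Sum views per product area and return the n most-viewed areas."""
    totals: Counter = Counter()
    for path, count in views:
        totals[product_area(path)] += count
    return totals.most_common(n)


categories = top_areas(page_views)
```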

Generating Questions

After establishing categories, I used a combination of sources to generate questions:

FAQs
User forums
Customer feedback
Subject Matter Experts (SMEs)
AI-generated questions
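Questions from these sources often overlap. A minimal sketch for merging them, assuming each source is simply a list of question strings, deduplicates on normalized text and records which sources asked each question. It only catches near-identical wording; paraphrases would still need manual review or a semantic-similarity step. The source names are hypothetical.

```python
import re


def normalize_question(text: str) -> str:
    """Collapse case, punctuation, and whitespace so near-identical questions match."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def merge_questions(sources: dict[str, list[str]]) -> list[dict]:
    """Combine questions from several sources, keeping the first wording seen
    and recording every source that asked the same question."""
    merged: dict[str, dict] = {}
    for source, questions in sources.items():
        for question in questions:
            key = normalize_question(question)
            if not key:
                continue  # skip blank or punctuation-only entries
            entry = merged.setdefault(key, {"question": question.strip(), "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())


sources = {
    "faq": ["How do I enable SSO?", "What does feature B cost?"],
    "forum": ["how do I enable SSO", "Can feature C export to CSV?"],
    "ai_generated": ["What does feature B cost?"],
}
questions = merge_questions(sources)
```

Keeping the source list per question is handy later: a question that shows up in FAQs, forums, and customer feedback is a strong candidate for the dataset.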

Initial Approach

Once I had a solid list of questions, I placed them in an Excel spreadsheet, with each question accompanied by columns for:

SME answers
Reference links

I then distributed this spreadsheet to numerous SMEs, asking them to select and answer two to three questions within their expertise.
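A sketch of such a template, generated as CSV so Excel can open it directly. The `id` and `category` columns and the exact column names are assumptions added for illustration.

```python
import csv
import io

# Assumed column layout: question metadata plus the SME answer and reference link columns.
COLUMNS = ["id", "category", "question", "sme_answer", "reference_links"]


def build_template(questions: list[tuple[str, str]]) -> str:
    """Return CSV text with one row per (category, question) and blank answer columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for i, (category, question) in enumerate(questions, start=1):
        writer.writerow({
            "id": f"Q{i:03d}",  # stable IDs make it easier to match answers back to questions
            "category": category,
            "question": question,
            "sme_answer": "",
            "reference_links": "",
        })
    return buffer.getvalue()


csv_text = build_template([
    ("Feature A", "How do I enable SSO?"),
    ("Feature B", "What does feature B cost?"),
])
```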

Challenges Encountered

Throughout this process, I encountered several challenges:

Incomplete answers: Responses were often brief (e.g., "yes," "no," "kind of, you can find X here: <link>").
Inconsistent use of columns: Some answers lacked reference links, while others combined text and links in a single column.
Low response rate: Many SMEs didn't participate due to time constraints or unclear instructions.
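Several of these problems can be caught automatically before a response enters the dataset. This sketch checks one row for a missing or very short answer, a link pasted into the answer column, and a missing reference link. The eight-word threshold is an arbitrary assumption; tune it for your content.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def check_row(answer: str, links: str, min_words: int = 8) -> list[str]:
    """Return a list of problems found in one SME response row."""
    problems = []
    answer = answer.strip()
    if not answer:
        problems.append("missing answer")
    # Count words with any URLs removed, so a bare link does not pass as an answer.
    elif len(URL_PATTERN.sub("", answer).split()) < min_words:
        problems.append("answer too short")
    if URL_PATTERN.search(answer):
        problems.append("link in answer column")
    if not URL_PATTERN.search(links):
        problems.append("missing reference link")
    return problems
```

For example, `check_row("yes", "")` reports both a too-short answer and a missing reference link, so the row can go back to the SME with a specific request instead of a vague "please add more detail".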

Improving the Process

"Don't Make Me Think" Principle

People are busy, and adding to their cognitive load is counterproductive. Applying the "Don't Make Me Think" principle, from Steve Krug's book on web usability, can significantly improve the process: it is generally easier for people to review and correct existing content than to create it from scratch.

Revised Approach

For future golden dataset creation, I would:

Compile a list of topic categories to test the AI against.
Use various sources to generate a comprehensive list of questions.
Use AI to generate initial responses to each question.
Assign specific sets of questions to each SME, asking them to review and correct only if necessary.
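The assignment step can be sketched as a greedy balancer: each question goes to the least-loaded SME whose expertise covers its category, with a cap per person. The data shapes, the `ai_draft` field (holding the AI-generated answer the SME reviews), and the cap of three are assumptions for illustration.

```python
from collections import defaultdict


def assign_questions(
    questions: list[dict],
    experts: dict[str, set[str]],
    max_per_sme: int = 3,
) -> tuple[dict[str, list[str]], list[str]]:
    """Give each question to the least-loaded SME whose expertise covers its category.

    questions: dicts with "id", "category", and "ai_draft" keys.
    experts: SME name -> set of categories they can review.
    Returns (assignments, unassigned question IDs).
    """
    load: defaultdict[str, list[str]] = defaultdict(list)
    unassigned = []
    for q in questions:
        candidates = [
            name for name, cats in experts.items()
            if q["category"] in cats and len(load[name]) < max_per_sme
        ]
        if not candidates:
            unassigned.append(q["id"])
            continue
        # Least-loaded first; ties broken alphabetically for repeatable output.
        chosen = min(candidates, key=lambda name: (len(load[name]), name))
        load[chosen].append(q["id"])
    # Drop SMEs who were considered but never given a question.
    return {name: ids for name, ids in load.items() if ids}, unassigned


questions = [
    {"id": "Q1", "category": "A", "ai_draft": "Draft answer for Q1"},
    {"id": "Q2", "category": "A", "ai_draft": "Draft answer for Q2"},
    {"id": "Q3", "category": "B", "ai_draft": "Draft answer for Q3"},
    {"id": "Q4", "category": "C", "ai_draft": "Draft answer for Q4"},
]
experts = {"dana": {"A", "B"}, "lee": {"A"}}
assignments, unassigned = assign_questions(questions, experts)
```

Questions left unassigned point to a gap in SME coverage, which is worth fixing before the review round starts.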

This approach reduces the workload on SMEs while still leveraging their expertise to ensure accuracy.

Best Practices for Creating and Maintaining a Golden Dataset

Regularly update the dataset to reflect new information and product changes.
Implement a version control system to track changes and maintain dataset integrity.
Establish a review cycle to ensure ongoing accuracy and relevance.
Use a diverse group of SMEs to cover various aspects of your product or service.
Implement a user-friendly interface for SMEs to review and edit entries easily.
- Idea: Create a Google Form linked to a Google Sheets backend. Generate unique links for each SME, directing them to a specific set of questions. This approach would streamline the response process, making it easier for SMEs to contribute.
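Google Forms can generate pre-filled links, but the underlying idea is tool-agnostic: give each SME an unguessable token and map it to their question set. The form URL and the `token` parameter below are placeholders, not a Google Forms API.

```python
import secrets
from urllib.parse import urlencode

# Placeholder URL; the "token" query parameter is a hypothetical field your
# form or sheet would use to look up which questions to show.
FORM_URL = "https://forms.example.com/golden-dataset"


def make_links(assignments: dict[str, list[str]]) -> tuple[dict[str, str], dict[str, dict]]:
    """Create one unguessable link per SME plus a token -> {sme, questions} lookup table."""
    links: dict[str, str] = {}
    lookup: dict[str, dict] = {}
    for sme, question_ids in assignments.items():
        token = secrets.token_urlsafe(16)  # URL-safe random string
        lookup[token] = {"sme": sme, "questions": list(question_ids)}
        links[sme] = f"{FORM_URL}?{urlencode({'token': token})}"
    return links, lookup


links, lookup = make_links({"dana": ["Q1", "Q3"], "lee": ["Q2"]})
```

Storing the lookup table alongside the sheet also tells you who has not responded yet, which helps with follow-ups.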

Conclusion

Building a golden dataset is crucial for AI engineering, but it requires careful planning and execution. By learning from past experiences and implementing best practices, you can create a more efficient and effective process for developing and maintaining your golden dataset. This, in turn, leads to more reliable evaluation of your AI models and, ultimately, better user experiences.

Cheers,

Paulo