Hallucination Detection: How We Measure AI Model Accuracy
What Is AI Hallucination?
AI hallucination occurs when an artificial intelligence model generates convincing but fabricated information. In practice, this means an AI model can present completely invented "facts" about your brand, and the consequences can be serious.
Real-World Examples
- Fake products: AI may list a product your company has never made as one of your "most popular products".
- Wrong dates: It can state incorrect founding years, relocation dates, or important milestones.
- Fabricated employees: It may refer to people who never worked there as "founder" or "CEO".
- Wrong location: It can claim you are located in a different city or country.
- Imaginary awards: It may credit your brand with awards or certifications you never received.
Why Does This Matter?
Consumers increasingly trust AI responses. Research shows that 73% of users trust AI recommendations. If an AI model provides false information about your brand:
- Customer trust is damaged: Customers who arrive with wrong information become disappointed when they learn the truth.
- False expectations are created: Customers expecting a service you do not offer become unhappy.
- Reputation damage: Incorrect negative information harms your brand image.
- Competitive disadvantage: If competitors receive accurate information while yours is wrong, you fall into a disadvantaged position.
AURA's 4 Hallucination Detection Methods
1. Fake Product Test
In this method, we ask each AI model about a product that does not actually exist, using your brand name. For example, for a software company, we might ask: "What do you think of brand X's smartwatch?"
If the AI model responds "Yes, X's smartwatch is very successful, its features include..." then this model is hallucinating. The fact that it describes a non-existent product in detail calls into question the reliability of other information it provides about your brand.
This test is applied separately for each model, and results are reflected in the reliability score.
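A minimal sketch of how such a probe response could be classified. The function name and the denial-phrase list are assumptions for illustration; a production check would use more robust language analysis than keyword matching.

```python
def is_fake_product_hallucination(response: str) -> bool:
    """Flag responses that affirm a product we know does not exist.

    A trustworthy model should deny the product's existence or express
    uncertainty; a response that describes the product counts as a
    hallucination signal. (Hypothetical helper; phrase list is illustrative.)
    """
    denials = [
        "does not exist", "no such product", "not aware of",
        "couldn't find", "could not find", "doesn't offer",
    ]
    text = response.lower()
    return not any(phrase in text for phrase in denials)

# A model that confidently describes the non-existent product:
print(is_fake_product_hallucination(
    "Yes, X's smartwatch is very successful, its features include..."
))  # → True
# A model that correctly expresses uncertainty:
print(is_fake_product_hallucination(
    "I'm not aware of any smartwatch made by brand X."
))  # → False
```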
2. Cross-LLM Consistency
We ask the same question to 9 different AI models and compare the responses. For example, for the question "When was company X founded?":
- 7 models say "2015"
- 1 model says "2020"
- 1 model says "2012"
Here, "2015", the answer on which 7 models agree, is most likely the correct information, and the other 2 models are hallucinating. This inconsistency is shown separately for each model in the accuracy report.
Cross-consistency checking is particularly effective because it is difficult to catch a single model's error, but when you place 9 models side by side, inconsistencies immediately stand out.
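The majority-vote logic above can be sketched in a few lines. The model names are placeholders, and the exact-string comparison is a simplification; a real pipeline would normalize answers (dates, casing, phrasing) before comparing.

```python
from collections import Counter

def majority_answer(answers: dict[str, str]) -> tuple[str, list[str]]:
    """Return the consensus answer and the models that disagree with it."""
    counts = Counter(answers.values())
    consensus, _ = counts.most_common(1)[0]  # most frequent answer wins
    outliers = [model for model, ans in answers.items() if ans != consensus]
    return consensus, outliers

# Nine models asked "When was company X founded?" (illustrative data):
answers = {
    "model_1": "2015", "model_2": "2015", "model_3": "2015",
    "model_4": "2015", "model_5": "2015", "model_6": "2015",
    "model_7": "2015", "model_8": "2020", "model_9": "2012",
}
consensus, outliers = majority_answer(answers)
print(consensus)  # → 2015
print(outliers)   # → ['model_8', 'model_9']
```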
3. Sonar Grounding (Web-Based Verification)
Unlike other models, Perplexity Sonar performs real-time internet searches to support its responses with actual sources. This capability plays a critical role in AURA's verification process.
How it works:
- Sonar collects factual information about your brand from the web.
- This information is established as "ground truth".
- The other 8 models' responses are compared against this ground truth.
- Deviations and inconsistencies are flagged as hallucinations.
Thanks to Sonar's web access, we can verify with the most up-to-date information. This is especially valuable for newly established companies or brands that have recently undergone changes.
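The grounding step can be pictured as a simple field-by-field comparison against the Sonar-derived record. The field names and values here are assumptions; this is a sketch of the idea, not AURA's actual implementation.

```python
# Ground truth assembled from Sonar's web-grounded answers (illustrative):
ground_truth = {"founded": "2015", "headquarters": "Berlin"}

def flag_deviations(model_claims: dict[str, str]) -> list[str]:
    """Return the fields where a model's claim deviates from ground truth."""
    return [
        field for field, value in model_claims.items()
        if field in ground_truth and value != ground_truth[field]
    ]

# One model's claims about the brand (illustrative):
claims = {"founded": "2012", "headquarters": "Berlin"}
print(flag_deviations(claims))  # → ['founded']
```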
4. Per-Provider Analysis
Each AI model exhibits different characteristics in terms of hallucination tendency. AURA evaluates each model individually and assigns a reliability score:
- Model reliability score: How reliable the information each model provides about your brand is.
- Hallucination frequency: Which model generates more fabricated information.
- Knowledge source quality: How up-to-date the model's training data is.
- Confidence calibration: Whether the model signals uncertainty when it is unsure, or presents wrong information in a definitive tone.
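One way a per-provider reliability score could be computed is as a pass rate with an extra penalty for hallucinations. The formula and weights below are assumptions for illustration, not AURA's published scoring method.

```python
def reliability_score(passed: int, total: int, hallucinations: int) -> float:
    """Score a model from 0-100: pass rate minus a hallucination penalty.

    Hypothetical weighting: each detected hallucination costs 10 points,
    reflecting that fabrications are worse than simple gaps in knowledge.
    """
    base = passed / total
    penalty = 0.1 * hallucinations
    return round(max(0.0, base - penalty) * 100, 1)

# A model that passed 18 of 20 checks but hallucinated once:
print(reliability_score(passed=18, total=20, hallucinations=1))  # → 80.0
```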
How to Interpret the Results?
Overall Risk Level (overall_risk)
AURA determines an overall risk level as the result of your hallucination analysis:
- Low: AI models generally provide accurate information. There may be minor inconsistencies, but there is no serious hallucination problem.
- Medium: Inconsistencies or incorrect information detected in some models. It is recommended to review your content strategy to correct information in these models.
- High: Serious hallucinations exist in multiple models. Urgent intervention is needed. Publish clear and structured information on your website and update your llms.txt file.
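The three-tier mapping could be sketched as a threshold function over how many of the tested models show hallucinations. The cutoffs below are illustrative assumptions; AURA's actual thresholds are not stated in this article.

```python
def overall_risk(models_with_hallucinations: int, total_models: int = 9) -> str:
    """Map the share of hallucinating models to a risk tier (assumed cutoffs)."""
    ratio = models_with_hallucinations / total_models
    if ratio <= 1 / 9:      # at most one model: minor inconsistency
        return "low"
    if ratio <= 3 / 9:      # a few models: review your content strategy
        return "medium"
    return "high"           # widespread: urgent intervention needed

print(overall_risk(0))  # → low
print(overall_risk(2))  # → medium
print(overall_risk(5))  # → high
```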
What Should You Do?
- Clearly publish fundamental information on your website such as founding year, services, and location.
- Add Schema.org structured data - AI models can read this directly.
- Create an llms.txt file and place it at the root of your website.
- Regularly run AURA analyses to track whether hallucinations are decreasing.
- Create an FAQ page - the question-answer format is ideal for AI to extract correct information.
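As a starting point for the Schema.org recommendation above, a minimal JSON-LD Organization block embedded in your site's HTML might look like this. All names and values are placeholders; adapt them to your own brand.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Company",
  "foundingDate": "2015",
  "url": "https://www.example.com",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Berlin",
    "addressCountry": "DE"
  }
}
```

Publishing core facts in this machine-readable form gives AI models an unambiguous source for exactly the details (founding year, location) that the hallucination tests probe.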