Worker AI Usage in Daily Tasks
- Jonathan H. Westover, PhD
Over a billion people are using AI in 2026, and most no longer limit themselves to ChatGPT, trying other assistants as well. Yet many of these tools still ‘hallucinate’, confidently producing incorrect data.
Analyzing how people use LLMs for daily tasks, an April 2026 report from Open Resource Applications, a company that develops and offers free, human-built and human-verified AI tools, compared which assignments users give to AI most often and which of those assignments are most vulnerable to AI ‘hallucinations’.
Mathematical calculations are the easiest tasks for AI to get wrong, with an average accuracy of only 0.38 out of 1.
The best LLM for daily tasks is Gemini 3 Pro (Preview), which scores highest on four of the five most common daily assignments.
Teaching, fitness, and health-related answers share the same weakness: most models carry a roughly 33% risk of ‘hallucinating’ part of the answer without reliable sources.
The study collected the most common tasks assigned to AI based on public records of generative artificial intelligence usage. To assess LLM models’ performance, the research matched each task category to the most relevant benchmarks, using datasets from MMLU-Pro, GPQA, IFEval, WildBench and Omni-MATH. The accuracy scores were calculated for each model and then averaged for each task. The study also includes the models that performed the best in each assignment.
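The aggregation step described above can be sketched in a few lines. This is an illustrative reconstruction, not the study's actual code: the model names and accuracy values below are hypothetical stand-ins for the benchmark results the report drew from.

```python
from statistics import mean

# Hypothetical per-model accuracy scores; the real study used benchmark
# results from MMLU-Pro, GPQA, IFEval, WildBench and Omni-MATH.
scores = {
    "Mathematical Calculation": {"model_a": 0.35, "model_b": 0.42, "model_c": 0.39},
    "Data Analysis":            {"model_a": 0.50, "model_b": 0.55, "model_c": 0.52},
}

def summarize(by_model):
    """Return (average accuracy across models, best-scoring model) for one task."""
    return round(mean(by_model.values()), 4), max(by_model, key=by_model.get)

for task, by_model in scores.items():
    avg, best = summarize(by_model)
    print(f"{task}: average accuracy {avg}, best model {best}")
```

Averaging across models ranks tasks by overall difficulty, while the per-task maximum identifies the best model for each assignment, which is how the table below pairs the two.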
Here are the top 5 most difficult tasks for AI to complete without adding incorrect information:
| Everyday Task | Benchmark | Average Accuracy | Best Model |
| --- | --- | --- | --- |
| Mathematical Calculation | Omni-MATH | 0.3861 | GPT-5 mini (2025-08-07) |
| Data Analysis | GPQA | 0.522 | Gemini 3 Pro (Preview) |
| Tutoring or Teaching | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
| Health, Fitness, Beauty or Self-Care | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
| Specific Information | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
You can check the full research findings by following this link.
AI Is Bad At Math
Large Language Models (LLMs) are built to analyze and generate text; calculation is not part of their primary function. This is one of the reasons AI is often wrong on even the simplest math tasks. Most models score only 0.38 out of 1 on accuracy, meaning roughly three times out of five the final result can be ‘hallucinated’.
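A common mitigation is to avoid trusting an LLM's arithmetic at all: extract the expression and evaluate it deterministically, then compare. The sketch below assumes a made-up model answer (`llm_answer` is a hypothetical stand-in for model output) and uses Python's `ast` module to evaluate basic arithmetic safely, without `eval`.

```python
import ast
import operator

# Map AST operator nodes to real arithmetic functions
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    """Evaluate a basic arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

llm_answer = 8128  # hypothetical model output for "127 * 64 + 63" (dropped the +63)
true_value = safe_eval("127 * 64 + 63")
print(true_value, "matches LLM" if llm_answer == true_value else "LLM hallucinated")
```

Delegating the calculation to deterministic code catches the kind of plausible-looking but wrong arithmetic the report measured.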
AI Cannot Perform Data Analysis With Incomplete Datasets
Data analysis involves inspecting, cleaning, and transforming data, and while it seems AI should handle it easily, models return correct results in only 52% of cases. This happens because LLMs prioritize predicting the next plausible token, a word or a number, in a longer sequence, rather than retrieving the correct data.
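The next-token mechanism can be illustrated with a toy bigram model. This is a deliberately tiny, hypothetical example, nothing like a real LLM in scale, but it shows the same failure mode: the model emits the statistically most frequent continuation, whether or not it is the correct value for the question at hand.

```python
from collections import Counter

# Tiny made-up "training text" in which one number appears more often
corpus = "the average was 0.52 the average was 0.52 the average was 0.67".split()

# Count which token most often follows each token
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, Counter())[nxt] += 1

def predict_next(token):
    """Return the most frequent continuation seen in training."""
    return follows[token].most_common(1)[0][0]

# Asked to continue "was", the model picks the most common value, "0.52",
# regardless of which figure is actually correct in a new context.
print(predict_next("was"))  # -> 0.52
```

The guess is statistically reasonable but not grounded in the data being analyzed, which is exactly why incomplete datasets produce confident wrong answers.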
AIs Cannot Be Your Teacher
While many users turn to AI for teaching, most language models score only 0.67 out of 1 on accuracy on learning tasks. The model most likely to give correct data or create a useful learning exercise is Gemini 3 Pro (Preview).
“Teaching is 100% about giving students correct information, and right now, most AIs cannot achieve that,” comments a spokesperson from Open Resource Applications. “LLMs’ output is often wrong when the data given to them is incomplete, or when a larger context is required.”
Health, Fitness, Beauty, and Self-Care Are Better Left For Professionals
Similar to teaching materials, most AIs score 0.67/1 for accuracy when it comes to health and beauty-related topics. Most of the time, LLMs will be able to search and summarize information from the Internet, but even one wrong source or a lack of data can lead to AI hallucinations that can be dangerous for users’ health.
AI Will Make Up Information Instead of Admitting It Cannot Find It
AI scores 0.67/1 on average for accuracy when it comes to specific information queries. When LLMs are given a niche topic with few sources or incomplete data, they will ‘predict’ the answer instead of admitting they cannot help. For most of these tasks, Gemini 3 Pro (Preview) showed better results than other language models, but no model was able to avoid making up information 100% of the time.
“LLMs are a very useful tool, but users need to understand their primary function and limitations. AIs are at their best when they help you edit text you’ve drafted, brainstorm ideas, or take part in a game or role play. In fields like mathematics or medicine, AI should be used only with professionals on hand to check its work. Otherwise, users may end up with completely wrong data.”