#deepresearch


"Ordinary users don’t want to learn about the relative strengths and weaknesses of various products like Operator and Deep Research. They just want to ask ChatGPT a question and have it figure out the best way to answer it.

It’s a promising idea, but how well does it work in practice? On Friday, I asked ChatGPT Agent to perform four real-world tasks for me: buying groceries, purchasing a light bulb, planning an itinerary, and filtering a spreadsheet.

I found that ChatGPT Agent is dramatically better than its predecessor at grocery shopping. But it still made mistakes at this task. More broadly, the agent is nowhere close to the level of reliability required for me to really trust it.

And as a result I doubt that this iteration of computer-use technology will get a lot of use. Because an agent that frequently does the wrong thing is often worse than useless."

understandingai.org/p/chatgpt-

Understanding AI · ChatGPT Agent: a big improvement but still not very useful, by Timothy B. Lee

"Kimi-Researcher:
End-to-End RL Training for Emerging Agentic Capabilities"

Kimi-Researcher is an agentic thinking model that can do multi-step planning, reasoning, and tool use. It uses 3 main tools: a parallel, real-time internal search tool; a text-based browser tool for web tasks; and a coding tool for code execution. Kimi-Researcher was trained entirely through end-to-end agentic reinforcement learning.
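Out of curiosity about what such an agent loop looks like in practice, here is a minimal, hypothetical Python sketch of a plan-act-observe loop with search, browser, and code-execution tools. Every name in it (call_model, the tool functions, the dict-based action format) is an assumption for illustration, not Kimi-Researcher's actual interface; the real system would put an RL-trained model behind call_model and optimize it end to end on the final result.

```python
# Minimal sketch of a plan -> act -> observe agent loop with three tool types,
# loosely mirroring the search / browser / code-execution setup described above.
# All names here are hypothetical placeholders, not Kimi-Researcher's real API.
import subprocess
import sys


def search_tool(query: str) -> str:
    """Placeholder for a real-time search backend."""
    return f"[search results for: {query}]"


def browser_tool(url: str) -> str:
    """Placeholder for a text-based browser fetch."""
    return f"[page text from: {url}]"


def code_tool(source: str) -> str:
    """Run a short Python snippet in a subprocess and capture its output."""
    result = subprocess.run([sys.executable, "-c", source],
                            capture_output=True, text=True, timeout=30)
    return result.stdout or result.stderr


TOOLS = {"search": search_tool, "browser": browser_tool, "code": code_tool}


def call_model(history: list[dict]) -> dict:
    """Stub for the reasoning model; a real agent would call an LLM here."""
    # After one tool observation, pretend the model is ready to answer.
    if any(turn["role"] == "tool" for turn in history):
        return {"type": "answer", "text": "Draft report based on gathered sources."}
    return {"type": "tool", "name": "search", "input": history[0]["content"]}


def run_agent(task: str, max_steps: int = 10) -> str:
    """Loop until the model emits a final answer or the step budget runs out."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)
        if action["type"] == "answer":
            return action["text"]
        observation = TOOLS[action["name"]](action["input"])
        history.append({"role": "tool", "content": observation})
    return "Stopped: step budget exhausted."


if __name__ == "__main__":
    print(run_agent("Survey recent work on agentic RL training."))
```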

OpenAI's ChatGPT Deep Research report generator now lets you save your reports as nicely formatted PDF files with inline images and tables. It still has an issue: the references at the end of the report are not in proper bibliographic format for academic settings.
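If you need those references in an academic format, one stopgap is to convert them to BibTeX yourself. This is purely a sketch: it assumes the exported references are simple "Title - URL" lines, which may not match what the report actually contains, so adjust the parsing accordingly.

```python
# Sketch only: turn plain "Title - URL" reference lines into rough BibTeX
# @misc entries. The input format is an assumption, not ChatGPT's actual output.
import re


def to_bibtex(lines: list[str]) -> str:
    entries = []
    for i, line in enumerate(lines, start=1):
        title, _, url = line.partition(" - ")
        # Build a crude citation key from the first word of the title.
        key = re.sub(r"\W+", "", title.split()[0].lower()) + str(i)
        entry = (
            "@misc{" + key + ",\n"
            "  title = {" + title.strip() + "},\n"
            "  howpublished = {\\url{" + url.strip() + "}}\n"
            "}"
        )
        entries.append(entry)
    return "\n\n".join(entries)


print(to_bibtex(["Attention Is All You Need - https://arxiv.org/abs/1706.03762"]))
```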

You can now connect your Box and #Dropbox accounts to #DeepResearch on #ChatGPT and pull in data for the #AI to use in its #research. Please do not do this: you will add to the unending pool of data being sucked out of creators and repurposed for profit, and not your profit or benefit. #copyright #KM #research #legalresearch #education #intellectualproperty bleepingcomputer.com/news/arti

I am having trouble generating deep research reports from the AI2 Scholar QA tool because they time out at 240 seconds. So I have to add a clause to my prompt: "Generate only 5 sections in your report so that you do not time out on writing a complete report." Otherwise, the tool tries to generate a report with six to eight sections and aborts with a timeout error that does not save any of the generated report text or references.
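For what it's worth, the same workaround can be scripted: retry with the section-limiting clause whenever the call times out. This is only a sketch; generate_report is a hypothetical stand-in for however Scholar QA is actually called, and the 240-second limit comes from the behavior described above.

```python
# Hypothetical sketch of the workaround above: retry with a 5-section cap
# when the (assumed) 240-second limit is hit. generate_report() is a
# placeholder that simulates the timeout so the retry path can be exercised.
SECTION_CAP_CLAUSE = (
    "Generate only 5 sections in your report "
    "so that you do not time out on writing a complete report."
)


def generate_report(prompt: str, timeout_s: int = 240) -> str:
    """Placeholder for the real Scholar QA call; simulates the observed timeout."""
    if SECTION_CAP_CLAUSE not in prompt:
        raise TimeoutError(f"report generation exceeded {timeout_s} seconds")
    return "[5-section report]"


def robust_report(question: str) -> str:
    try:
        return generate_report(question)
    except TimeoutError:
        # Fall back to the explicit section cap from the prompt clause above.
        return generate_report(f"{question}\n\n{SECTION_CAP_CLAUSE}")


print(robust_report("State of the art in retrieval-augmented generation"))
```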

AIのDeep Researchを使い比べ--ChatGPT、Gemini、Perplexity、Grokの違いは?
(Comparing AI Deep Research--What's the difference between ChatGPT, Gemini, Perplexity, and Grok?)

Nice recent article comparing several tools for deep research. The article is in Japanese, so open it in a browser such as Chrome that can auto-translate each page for you (right-click and choose the translate option).

news.yahoo.co.jp/articles/ac54

I've been exploring this deep research tool mentioned by Tom Dörr on X. The tool, available at research.u14.app, has many configuration options for how you set it up. It also offers more interactive features during report generation that you can use to steer the direction of the report, ask it to generate more output, and so on. It is from the repo: github.com/u14app/deep-researc
This would be good for exploratory research.

I'm looking at various AI deep research tools, and I'm finding that what is just as valuable, or sometimes far more valuable, than the actual report is the ability of tools like Genspark to show how they reasoned and which articles they "read" in producing their final report. These reasoning breadcrumbs are great for exploratory search and lateral thinking.

"DeepSeek-R1 Thoughtology:
Let’s <think> about LLM reasoning"

Interesting, very long paper about how reasoning works in DeepSeek-R1. One finding is that enhanced reasoning creates a dual-use risk: better capabilities but worse safety. Future models might move away from this model's single chain of reasoning toward diverse reasoning strategies to enhance problem-solving flexibility.

mcgill-nlp.github.io/thoughtol

McGill NLP · DeepSeek-R1 Thoughtology: Let's <think> about LLM reasoning

Large Reasoning Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. Instead of directly producing an answer for a given input, DeepSeek-R1 creates detailed multi-step reasoning chains, seemingly "thinking" about a problem before providing an answer. This reasoning process is publicly available to the user, creating endless opportunities for studying the reasoning behaviour of the model and opening up the field of Thoughtology. Starting from a taxonomy of DeepSeek-R1's basic building blocks of reasoning, our analyses on DeepSeek-R1 investigate the impact and controllability of thought length, management of long or confusing contexts, cultural and safety concerns, and the status of DeepSeek-R1 vis-à-vis cognitive phenomena, such as human-like language processing and world modelling. Our findings paint a nuanced picture. Notably, we show DeepSeek-R1 has a 'sweet spot' of reasoning, where extra inference time can impair model performance. Furthermore, we find a tendency for DeepSeek-R1 to persistently ruminate on previously explored problem formulations, obstructing further exploration. We also note strong safety vulnerabilities of DeepSeek-R1 compared to its non-reasoning counterpart, which can also compromise safety-aligned LLMs.

I ran a little experiment with the free "Deep Research" feature of perplexity.ai on the question of "State of the Art in Generative Retrieval-Augmented Models in Information Retrieval."

The generated report is shown below. It consulted 71 academic sources, but only one source was listed in the report. Also, I can browse the list of 71 sources, but there is no way to export it, and it does not appear in the Perplexity Page below:

perplexity.ai/page/state-of-th