#imageannotation

If you haven’t checked in on #IMMARKUS lately (understandable, there’s been a lot going on!), here’s the news: we’ve added even more transcription service options.

You can now run OCR or full-text transcription with a single click using:

• Anthropic Claude
• Azure Computer Vision
• Google Gemini
• Google Vision OCR
• LLaMA & Qwen via kluster.ai
• OCR.space
• OpenAI GPT
• Volcano Engine Doubao 1.5 Vision Pro

Try it out here: immarkus.xmarkus.org

Google's ImageInWords (IIW) is a framework for creating hyper-detailed image descriptions. The process starts with object detectors and a Vision-Language Model (VLM) generating initial captions, which are then refined by human annotators. This results in a high-quality dataset of 9018 images with detailed descriptions, improving AI training for image generation and classification.

google.github.io/imageinwords/
#AI #MachineLearning #ImageAnnotation #ComputerVision #AIResearch

ImageInWords: Unlocking Hyper-Detailed Image Descriptions (google.github.io)
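
For readers who want to picture the seeded workflow described in the post above, here is a minimal sketch, not the actual IIW code. It assumes a detector and a VLM produce a draft caption that a human annotator then refines. Every name in it (`Annotation`, `seed_annotation`, `refine`, `run_pipeline`, and the stub functions) is hypothetical and stands in for real models and real annotators.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Annotation:
    image_id: str
    detected_objects: List[str]
    seed_caption: str
    final_description: str = ""  # filled in by the human refinement step


def seed_annotation(
    image_id: str,
    detect_objects: Callable[[str], List[str]],
    draft_caption: Callable[[str, List[str]], str],
) -> Annotation:
    """Machine step: detect objects, then draft an initial caption."""
    objects = detect_objects(image_id)
    caption = draft_caption(image_id, objects)
    return Annotation(image_id, objects, caption)


def refine(
    annotation: Annotation,
    human_edit: Callable[[str, List[str]], str],
) -> Annotation:
    """Human step: an annotator rewrites and extends the draft caption."""
    annotation.final_description = human_edit(
        annotation.seed_caption, annotation.detected_objects
    )
    return annotation


# Hypothetical stubs standing in for a real detector, VLM, and annotator.
def stub_detector(image_id: str) -> List[str]:
    return ["dog", "ball"]


def stub_vlm(image_id: str, objects: List[str]) -> str:
    return f"A photo showing {', '.join(objects)}."


def stub_annotator(caption: str, objects: List[str]) -> str:
    return caption + " The dog is mid-jump on a lawn."


def run_pipeline(image_id: str) -> Annotation:
    seeded = seed_annotation(image_id, stub_detector, stub_vlm)
    return refine(seeded, stub_annotator)


if __name__ == "__main__":
    print(run_pipeline("img-001").final_description)
```

The point of the split is that the machine-generated draft only seeds the description; the text that ends up in the dataset is whatever the human step returns.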