OpenAI Launches Images in ChatGPT with GPT-4o

OpenAI’s GPT-4o: Where AI meets Typography

OpenAI’s new “Images in ChatGPT” feature lets users create images directly within the ChatGPT interface. Built on the GPT-4o model, it generates images in the course of a conversation, a significant step for AI content creation.

“Images in ChatGPT” is available on every tier: Plus, Pro, Team, and Free. OpenAI spokesperson Taya Christianson said that free-tier users will have usage limits similar to those for DALL-E 3, around three images per day, though the limits may change depending on demand. OpenAI will keep DALL-E accessible through a dedicated GPT.

OpenAI research lead Gabriel Goh described GPT-4o as an “omnimodal” system that processes multiple types of data, including text, images, audio, and video. The model also improves on “binding”, a frequent weak point in AI image generation: keeping each object’s attributes attached to the right object. GPT-4o can handle 15 to 20 objects in one image while keeping their colors and shapes distinct, which previous models failed to do.

Improved text rendering stands out as one of the model’s major advances. AI-generated images have long displayed distorted or meaningless text. Goh described the development work as an iterative process that spanned several months before reaching the desired outcome. Small text is still not rendered perfectly, but the team reached a level of consistency that makes text in images functionally reliable.

Unlike most image generators, which use diffusion models, this system employs an autoregressive architecture. The autoregressive method generates an image from left to right and top to bottom, much as text is generated, and this sequential approach may contribute to the improved text rendering and binding.
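To make the idea of sequential generation concrete, here is a minimal toy sketch in Python. It is not OpenAI’s implementation, and nothing in it comes from the briefing: the function names, the tiny token vocabulary, and the stand-in “model” are all hypothetical. It only shows the ordering, with each image token produced in turn, row by row, conditioned on everything generated before it.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions top to bottom, left to right within each row."""
    for row in range(height):
        for col in range(width):
            yield row, col


def toy_next_token(prefix, vocab_size, rng):
    """Hypothetical stand-in for a learned model.

    A real system would predict the next token from the whole prefix.
    This toy version just tends to repeat the previous token, which is
    enough to show that each step depends on what came before.
    """
    if prefix and rng.random() < 0.7:
        return prefix[-1]
    return rng.randrange(vocab_size)


def generate_image_tokens(height, width, vocab_size=4, seed=0):
    """Fill a height x width grid of tokens one position at a time."""
    rng = random.Random(seed)
    prefix = []  # flat sequence of every token generated so far
    grid = [[None] * width for _ in range(height)]
    for row, col in raster_order(height, width):
        token = toy_next_token(prefix, vocab_size, rng)
        prefix.append(token)
        grid[row][col] = token
    return grid


if __name__ == "__main__":
    for line in generate_image_tokens(4, 8):
        print(" ".join(str(token) for token in line))
```

A diffusion model, by contrast, refines the whole image at once over many denoising steps, so it has no comparable left-to-right prefix.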

During a briefing, OpenAI demonstrated the system creating scientific diagrams, such as Newton’s prism experiment with precise labeling, along with multi-panel comics with consistent characters and dialogue and informational posters with accurate text. The demonstrations also covered practical applications, such as images with transparent backgrounds for stickers, restaurant menus, and logos.

Jackie Shannon, who leads multimodal products at ChatGPT, pointed out how the system draws on world knowledge. She explained that when she creates an image herself, she is limited by her artistic ability but enriched by everything she knows about the world. The model brings that kind of world knowledge into the process, so when you request an image of Newton’s prism experiment, you get it without needing to explain the experiment.

OpenAI maintains that the improved quality and capabilities make the slightly longer wait worthwhile. Shannon acknowledged that latency still needs to improve, but said the gains in image quality and world knowledge compensate for it.

OpenAI emphasized that it has put strong safeguards in place to address concerns about misuse. The system blocks the creation of sexual deepfakes, refuses CSAM requests, and guards against watermark removal. All images it creates will carry standard C2PA metadata identifying them as OpenAI-generated, although they have no visible watermark. The company also operates internal tools for image verification.

Shannon emphasized that no system is perfect in this area, and that the company considers this an initial effort and remains committed to strengthening its safeguards. Users retain ownership of ChatGPT-generated images and can use them freely as long as they adhere to OpenAI’s usage policies.

With “Images in ChatGPT”, OpenAI strengthens its main product and brings visually expressive tools into its conversational platform.
