There’s a new Google AI model in town, and it can generate or modify images as effortlessly as text, all within a chatbot conversation. The results aren’t perfect, but this style of photo editing will likely soon be available to everyone.
Last week, Google expanded access to Gemini 2.0 Flash’s native image-generation capabilities, making the experimental feature available to anyone using Google AI Studio. The multimodal technology, previously limited to testers since December, combines native text and image processing into a single AI model.
The new model, named “Gemini 2.0 Flash (Image Generation) Experimental,” flew somewhat under the radar last week, but it has been garnering more attention over the past few days due to its ability to remove watermarks from images, albeit with artifacts and a reduction in image quality.
That isn’t the only trick. Gemini 2.0 Flash can add and remove objects, edit scenery, change lighting, attempt to shift camera angles, zoom in or out, and perform other transformations, all with varying degrees of success depending on the subject matter, style, and image in question.
To make this possible, Google trained Gemini 2.0 on a massive collection of images (converted into tokens) alongside text. The model’s “knowledge” of images occupies the same neural network space as its knowledge of world concepts from text sources, allowing it to directly generate image tokens that are then translated back into images and presented to the user.
So, what exactly can Gemini 2.0 Flash do? Notably, its support for conversational image editing lets users iteratively refine images through natural-language dialogue across multiple sequential prompts. You can talk with it, telling it what to add, remove, or change. It’s not flawless, but it marks the beginning of a new kind of native image-editing capability.
We ran Gemini 2.0 Flash through a battery of informal AI image-editing tests, and the results are shown below. For example, we removed a rabbit from an image of a grassy yard. We also removed a chicken from a cluttered garage. In each case, Gemini fills in the background with its best guess.
With this latest innovation, what does the future hold for software like Photoshop?