Using AI for photo editing or image generation has often been frustrating, at least in Austin Carr's experience. Today he writes about a new Google tool that's changing that. Plus: A major leap in space science might be blunted by US cuts to research funding, and Nike sees opportunity in WNBA shoe deals.

For the past week, I've been nerding out over a new artificial intelligence model called "nano banana." The release began receiving raves for its wild image editing and generation capabilities, only for Google to reveal it was behind the mysterious launch. The company has since incorporated the model into its Gemini AI chatbot, and the photo quality and output speeds are pretty darn nuts, especially considering it's free to use.

It wasn't long ago that these kinds of photo services required expensive subscriptions or separate apps for, say, editing pictures with AI versus generating images from prompts. But access to such tools is quickly expanding, and Google in particular is stuffing Gemini with free universal features. Nicole Brichtova, image product lead for Google's AI research division DeepMind, says it's part of a big push to make sure Gemini can interact well with any type of media, be that text, photo, video or voice. She says Google's goal for Gemini is "to be able to input any modality and output any modality. That's 100% where we're going."

It's a welcome shift from the fragmented experience that kicked off the AI era. It was annoying (and confusing) to have to navigate to Midjourney's Discord server in order to do text-to-image generations. I paid for, and inevitably canceled, subscriptions to several AI-enhanced editing apps from Adobe Inc. because the capabilities were so narrow and full of frustrating content restrictions. I've resisted buying Apple's latest $1,000 iPhone merely to get native AI photo features already available on my older device through third-party software.

By contrast, Gemini can now handle pretty much any image request I have through nano banana. (Brichtova says "nano banana" is a random nickname a tired developer came up with in the middle of the night; the official title is 2.5 Flash Image.) It has easily removed unwanted objects from my photos, altered lighting and text, and completely reimagined the style of my images in endless ways. I even had Gemini restore and colorize some family photographs from the 1940s; my relatives were blown away by the vividness. Other users have discovered incredibly clever ways to tap into nano banana's data savvy, such as asking the AI to show photorealistic views based on Google Maps pinpoints.

[Photo: A family photo, before and after the author used Google's Gemini AI tool. Photographer: Austin Carr/Bloomberg]

Brichtova says one of the bigger consumer pain points that the team aimed to solve was AI model latency. Anyone who's tried these models knows they often produce images at dial-up-era speeds, a huge source of frustration for people trying to swiftly mock up something at work or in chats with friends. "If you're trying to edit something conversationally and it's really snappy," she says, "it makes for a good user experience." Indeed, the new Gemini usually takes less than five seconds to deliver me an image. In several cases, I copied the same prompt into ChatGPT, and it took more than a minute to handle the processing, with results not all that different from Google's. That makes me question my $20-per-month OpenAI subscription.

Another pain point has been visual consistency. Until now, when you asked an AI to touch up a photo or do incremental edits, models would often glitch and erroneously alter faces or other objects within the frame. With nano banana, that's much less of an issue, even when it comes to rendering text.

Gemini's output isn't perfect: there have been instances where nano banana hallucinated in its photo edits or returned an unchanged image that ignored my prompt. Brichtova says Gemini will continue to improve and is intended to deliver a baseline of quality for most people; if you need more specialized output, there will always be premium services available. The hope is that, just as Flickr proved during the Web 2.0 days and Instagram during the mobile boom, photos will once again be a killer application, this time for AI. "All of these technology shifts tap into some innate human need," Brichtova explains. "Seeing yourself, seeing your family, documenting your life, being able to tell stories about it: new technologies enable those desires in different ways."