Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what’s Gemini? How can you use it? And how does it stack up to other generative AI tools such as OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features, and news about Google’s plans for Gemini are released.
What is Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family. Developed by Google’s AI research labs DeepMind and Google Research, it comes in four flavors:
- Gemini Ultra, a very large model.
- Gemini Pro, a large model, though smaller than Ultra. The latest version, Gemini 2.0 Pro Experimental, is Google’s flagship.
- Gemini Flash, a speedier, “distilled” version of Pro. It also comes in a slightly smaller and faster version, called Gemini Flash-Lite, and a version with reasoning capabilities, called Gemini Flash Thinking Experimental.
- Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, which is meant to run offline.
All Gemini models were trained to be natively multimodal, that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in different languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn’t necessarily the case with Gemini models.
We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution, particularly if you’re intending on using Gemini commercially.
What’s the difference between the Gemini apps and Gemini models?
Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.
Gemini on the web lives at gemini.google.com. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.
On Android, it also recently became possible to bring up the Gemini overlay on top of any app to ask questions about what’s on the screen (e.g., a YouTube video). Just press and hold a supported smartphone’s power button or say, “Hey Google”; you’ll see the overlay pop up.
Gemini apps can accept images as well as voice commands and text (including files like PDFs and, soon, videos, either uploaded or imported from Google Drive), and they can generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.
Gemini Advanced
The Gemini apps aren’t the only means of recruiting Gemini models’ assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.
To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically a part of Google One, the AI Premium Plan costs $20 per month and provides access to Gemini in Google Workspace apps like Docs, Maps, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more sophisticated Gemini models to the Gemini apps.
Gemini Advanced users get extras here and there, too, like priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger “context window.” Gemini Advanced can remember the content of, and reason across, roughly 750,000 words in a conversation (or 1,500 pages of documents). That’s compared to the 24,000 words (or 48 pages) the vanilla Gemini app can handle.

Gemini Advanced also gives users access to Google’s Deep Research feature, which uses “advanced reasoning” and “long context capabilities” to generate research briefs. After you prompt the chatbot, it creates a multi-step research plan and asks you to approve it; then Gemini takes a few minutes to search the web and generate an extensive report based on your query. It’s meant to answer more complex questions such as, “Can you help me redesign my kitchen?”
Google also offers Gemini Advanced users a memory feature that lets the chatbot use your old conversations with Gemini as context for your current conversation. Gemini Advanced users also get increased usage for NotebookLM, the company’s product that turns PDFs into AI-generated podcasts.
Gemini Advanced users also get access to Google’s experimental version of Gemini 2.0 Pro, the company’s flagship model that’s optimized for difficult coding and math problems.
Another Gemini Advanced exclusive is trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences, and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes.
Gemini across Google services is also available to corporate customers through two plans, Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business costs as little as $6 per user per month, while Gemini Enterprise, which adds meeting note-taking and translated captions as well as document classification and labeling, is generally more expensive but is priced based on a business’s needs. (Both plans require an annual commitment.)
In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.
Google’s AI chatbot recently came to Maps, where Gemini can summarize reviews about coffee shops or offer recommendations about how to spend a day visiting a foreign city.
Gemini’s reach extends to Drive as well, where it can summarize files and folders and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages.

Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll consider the web page you’re on to make recommendations.
Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.
Gemini extensions and Gems
Gems, announced at Google I/O 2024, are custom chatbots powered by Gemini models that Gemini Advanced users can create. Gems can be generated from natural language descriptions (for example, “You’re my running coach. Give me a daily running plan”) and shared with others or kept private.
Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they’ll be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks.

Speaking of integrations, the Gemini apps on the web and mobile can tap into Google services via what Google calls “Gemini extensions.” Gemini today integrates with Google Drive, Gmail, and YouTube to respond to queries such as “Could you summarize my last three emails?” Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music, and Utilities, the Android-exclusive apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and so on.
Gemini Live in-depth voice chats
An experience called Gemini Live allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini apps on mobile and on the Pixel Buds Pro 2, where it can be accessed even when your phone’s locked.
With Gemini Live enabled, you can interrupt Gemini while the chatbot’s speaking (in one of several new voices) to ask a clarifying question, and it’ll adapt to your speech patterns in real time. At some point, Gemini is supposed to gain visual understanding, allowing it to see and respond to your surroundings, either via photos or video captured by your smartphone’s cameras.

Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.
You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it’s super useful, but it’s early days, admittedly.
Image generation via Imagen 3
Gemini users can generate artwork and images using Google’s built-in Imagen 3 model.
Google says that Imagen 3 understands the text prompts it translates into images more accurately than its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and is the best Imagen model yet for rendering text.

Back in February 2024, Google was forced to pause Gemini’s ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google’s paid Gemini plans (e.g., Gemini Advanced), as part of a pilot program.
Gemini for teens
In June, Google introduced a teen-focused Gemini experience, allowing students to sign up via their Google Workspace for Education school accounts.
The teen-focused Gemini has “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” to (as Google phrases it) “help teens use AI responsibly.” Otherwise, it’s nearly identical to the standard Gemini experience, down to the “double check” feature that looks across the web to see if Gemini’s responses are accurate.
Gemini in smart home devices
A growing number of Google-made devices tap Gemini for enhanced functionality, from the Google TV Streamer to the Pixel 9 and 9 Pro to the newest Nest Learning Thermostat.
On the Google TV Streamer, Gemini uses your preferences to curate content suggestions across your subscriptions and to summarize reviews and even entire seasons of TV.

On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant’s conversational and analytic capabilities.
Later this year, subscribers to Google’s Nest Aware plan will get a preview of new Gemini-powered experiences like AI descriptions for Nest camera footage, natural language video search, and recommended automations. Nest cameras will understand what’s happening in real-time video feeds (e.g., when a dog’s digging in the garden), while the companion Google Home app will surface videos and create device automations given a description (e.g., “Did the kids leave their bikes in the driveway?,” “Have my Nest thermostat turn on the heating when I get home from work every Tuesday”).

Also later this year, Google Assistant will get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural. Improved voices are on the way, in addition to the ability to ask follow-up questions and “[more] easily go back and forth.”
What can the Gemini models do?
Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future.
Of course, it’s a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that was more or less aspirational, not live.
Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.
Assuming for the purposes of this article that Google is being truthful with its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:
What you can do with Gemini Ultra
Google says that Gemini Ultra, thanks to its multimodality, can be used to help with things like physics homework, solving problems step-by-step on a worksheet, and pointing out possible mistakes in already filled-in answers.
However, we haven’t seen much of Gemini Ultra in recent months. The model doesn’t appear in the Gemini app and isn’t listed on Google Gemini’s API pricing page. That doesn’t mean Google won’t bring Gemini Ultra back to the forefront of its offerings in the future, though.
Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers, for instance, and update a chart from one by generating the formulas necessary to re-create the chart with more timely data.
Gemini Ultra technically supports image generation. But that capability hasn’t made its way into the productized version of the model yet, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.
Ultra is available as an API through Vertex AI, Google’s fully managed AI dev platform, and AI Studio, Google’s web-based tool for app and platform developers.
Gemini Pro’s capabilities
Google says that its latest Pro model, Gemini 2.0 Pro, is its best model yet for coding performance and complex prompts. It’s currently available as an experimental version, meaning it can have unexpected issues.
Gemini 2.0 Pro outperforms its predecessor, Gemini 1.5 Pro, in benchmarks measuring coding, reasoning, math, and factual accuracy. The model can take in up to 1.4 million words, two hours of video, or 22 hours of audio and can reason across or answer questions about that data (more or less).
However, Gemini 1.5 Pro still powers Google’s Deep Research feature.
Gemini 2.0 Pro works alongside a feature called code execution, released in June alongside Gemini 1.5 Pro, which aims to reduce bugs in code that the model generates by iteratively refining that code over several steps. (Code execution also supports Gemini Flash.)
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo, and MSCI, or to source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.
AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range, provide examples to give tone and style instructions, and tune Pro’s safety settings.
Vertex AI Agent Builder lets people build Gemini-powered “agents” within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style and then applies that knowledge to help generate new ideas consistent with the style.
Gemini Flash is lighter but packs a punch
Google calls Gemini 2.0 Flash its AI model for the agentic era. The model can natively generate images and audio in addition to text, and it can use tools like Google Search and interact with external APIs.
The 2.0 Flash model is faster than Gemini’s previous generation of models and even outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. You can try Gemini 2.0 Flash in the Gemini web or mobile app, and through Google’s AI developer platforms.
In December, Google released a “thinking” version of Gemini 2.0 Flash that’s capable of “reasoning,” in which the AI model takes a few seconds to work backwards through a problem before it gives an answer.
In February, Google made Gemini 2.0 Flash Thinking available in the Gemini app. The same month, Google also released a smaller version called Gemini 2.0 Flash-Lite. The company says this model outperforms its Gemini 1.5 Flash model but runs at the same price and speed.
An offshoot of Gemini Pro that’s small and efficient, built for narrow, high-frequency generative AI workloads, Flash is multimodal like Gemini Pro, meaning it can analyze audio, video, images, and text (but it can only generate text). Google says that Flash is particularly well suited for tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables.
Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (e.g., a knowledge base or database of research papers) in a cache that Gemini models can access quickly and relatively cheaply. Context caching is an additional fee on top of other Gemini model usage fees, however.
Gemini Nano can run on your phone
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) devices instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection, and, in a nod to privacy, no data leaves their phone in the process.

Nano is also in Gboard, Google’s keyboard replacement. There, it powers a feature called Smart Reply, which helps suggest the next thing you’ll want to say when having a conversation in a messaging app such as WhatsApp.
In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like “excited,” “formal,” and “lyrical.”
Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.
How much do the Gemini models cost?
Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, and 2.0 Flash-Lite are available through Google’s Gemini API for building apps and services, all with free options. But the free options impose usage limits and leave out certain features, like context caching and batching.
Gemini models are otherwise pay-as-you-go. Here’s the base pricing (not including add-ons like context caching) as of September 2024:
- Gemini 1.5 Pro: $1.25 per 1 million input tokens (for prompts up to 128K tokens) or $2.50 per 1 million input tokens (for prompts longer than 128K tokens); $5 per 1 million output tokens (for prompts up to 128K tokens) or $10 per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens) or 15 cents per 1 million input tokens (for prompts longer than 128K tokens); 30 cents per 1 million output tokens (for prompts up to 128K tokens) or 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 2.0 Flash: 10 cents per 1 million input tokens and 40 cents per 1 million output tokens. For audio specifically, it costs 70 cents per 1 million input tokens and, likewise, 40 cents per 1 million output tokens.
- Gemini 2.0 Flash-Lite: 7.5 cents per 1 million input tokens and 30 cents per 1 million output tokens.
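To make the tiered rates above concrete, here is a minimal Python sketch that estimates the base cost of a single text request from the listed per-million-token prices. It assumes that “128K tokens” means 128,000 tokens and that the prompt length alone decides which tier applies to both input and output (as the list describes); it ignores context caching, batching, free-tier allowances, and 2.0 Flash’s separate audio rate. The model keys and the `estimate_cost` function are hypothetical labels for this illustration, not part of Google’s SDK, and prices may have changed since publication.

```python
from dataclasses import dataclass

# Assumption: "128K tokens" in the price list means 128,000 tokens.
LONG_PROMPT_THRESHOLD = 128_000


@dataclass(frozen=True)
class Price:
    """Base text prices in US dollars per 1 million tokens (from the list above)."""
    input_short: float   # input rate, prompts up to 128K tokens
    output_short: float  # output rate, prompts up to 128K tokens
    input_long: float    # input rate, prompts longer than 128K tokens
    output_long: float   # output rate, prompts longer than 128K tokens


# Hypothetical lookup table; the 2.0 models list a single text rate, so both tiers match.
PRICES = {
    "gemini-1.5-pro": Price(1.25, 5.00, 2.50, 10.00),
    "gemini-1.5-flash": Price(0.075, 0.30, 0.15, 0.60),
    "gemini-2.0-flash": Price(0.10, 0.40, 0.10, 0.40),
    "gemini-2.0-flash-lite": Price(0.075, 0.30, 0.075, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the base dollar cost of one text request (no caching or batching)."""
    price = PRICES[model]
    # The prompt (input) length picks the tier for both input and output tokens.
    long_prompt = input_tokens > LONG_PROMPT_THRESHOLD
    input_rate = price.input_long if long_prompt else price.input_short
    output_rate = price.output_long if long_prompt else price.output_short
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # A 200K-token prompt crosses the 128K threshold, so 1.5 Pro bills the higher rates.
    print(f"${estimate_cost('gemini-1.5-pro', 200_000, 10_000):.2f}")
```

Under these assumptions, a 100,000-token prompt to Gemini 1.5 Pro that yields 10,000 output tokens comes to about $0.175, while a 200,000-token prompt with the same output costs about $0.60, because both the input and output rates double above the threshold.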
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. Input refers to tokens fed into the model, while output refers to tokens that the model generates.
2.0 Pro pricing has yet to be announced, and Nano is still in early access.
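The rule of thumb that 1 million tokens is about 700,000 words can be turned into a quick back-of-the-envelope converter, sketched below in Python. It also uses the roughly 500 words per page implied by the earlier comparison (750,000 words, or 1,500 pages). Both ratios are averages; real token counts depend on the tokenizer, the language, and the content, so treat the results as rough estimates. The function names are hypothetical and not part of any Google API.

```python
# Rough ratios taken from the article; real tokenizers vary by language and content.
WORDS_PER_MILLION_TOKENS = 700_000
WORDS_PER_PAGE = 500  # implied by "750,000 words (or 1,500 pages)"


def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a token count (rounded down)."""
    return tokens * WORDS_PER_MILLION_TOKENS // 1_000_000


def words_to_tokens(words: int) -> int:
    """Approximate token count for a word count (rounded up, to stay conservative)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-words * 1_000_000 // WORDS_PER_MILLION_TOKENS)


def words_to_pages(words: int) -> int:
    """Approximate page count for a word count (rounded up)."""
    return -(-words // WORDS_PER_PAGE)


if __name__ == "__main__":
    # Gemini 2.0 Pro's 1.4 million-word figure works out to roughly 2 million tokens.
    print(words_to_tokens(1_400_000))
```

By this estimate, the 1.4 million words Gemini 2.0 Pro can take in correspond to about 2 million tokens.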
What’s the latest on Project Astra?
Project Astra is Google DeepMind’s effort to create AI-powered apps and “agents” for real-time, multimodal understanding. In demos, Google has shown how the AI model can simultaneously process live video and audio. Google released an app version of Project Astra to a small number of trusted testers in December but has no plans for a broader release right now.
The company would like to put Project Astra in a pair of smart glasses. Google also gave a prototype of some glasses with Project Astra and augmented reality capabilities to a few trusted testers in December. However, there’s no clear product at this time, and it’s unclear when Google would actually release something like this.
Project Astra is still just that, a project, and not a product. However, the demos of Astra reveal what Google would like its AI products to do in the future.
Is Gemini coming to the iPhone?
It may.
Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models, including Gemini, but he didn’t divulge any additional details.
This post was originally published February 16, 2024, and is updated regularly.