How Enterprises That Are Serious About Their ESG Goals Should Approach GenAI Adoption

Dec 24, 2024 | By Aharsh MS

Environmental, Social, and Governance (ESG) goals have become a top priority for most large enterprises in recent years. Stakeholders, regulators, and consumers alike expect organizations to not only pursue profits but also demonstrate ethical, environmental, and social responsibility. In response, many companies have introduced a variety of initiatives—such as going paperless, planting trees, and implementing energy-efficient processes—to showcase their commitment to sustainability. These measures are critical, as they highlight a broader shift toward building resilient, socially responsible organizations.

However, a new technological wave is now capturing the attention of business leaders everywhere: Generative AI (GenAI). With capabilities that stretch from automating content creation to revolutionizing customer engagement, GenAI seems poised to reshape entire industries. Yet an important question emerges—how does GenAI’s adoption affect an enterprise’s carbon footprint, and what implications does this have for meeting ESG objectives? 

Powered by large language models, diffusion models, and other computationally heavy engines, GenAI can consume massive amounts of energy, potentially running counter to existing sustainability commitments. In this article, we will delve into the carbon footprint of GenAI, examine how it intersects with ESG priorities, and outline key strategies enterprises can adopt to balance cutting-edge innovation with responsible, sustainable practices.

Growing concern around GenAI’s carbon footprint

Most large AI models are trained, fine-tuned, and run for inference on Graphics Processing Units (GPUs), which require enormous amounts of electricity. Whether an enterprise builds its own GPU-powered infrastructure or relies on third-party services such as OpenAI's GPT models or Anthropic's Claude, the underlying reality remains the same: GenAI models, often boasting billions of parameters, must churn through staggering volumes of data, frequently drawing energy from sources that are neither renewable nor carbon-neutral.

When companies host their own GenAI systems, setting up dedicated GPU clusters and maintaining them at scale involves substantial resource commitments. Aside from the direct electricity costs, ancillary demands like cooling systems and ongoing hardware refreshes can lead to an escalating carbon footprint that may offset any environmental gains the organization has achieved elsewhere. At the other end of the spectrum, relying on cloud-based AI services does not absolve a firm of sustainability responsibilities. Even a simple user query—such as generating an email subject line—can trigger a high-powered GPU inference process. The cumulative effect of thousands or millions of these queries can be profound, leading to indirect but significant carbon emissions that may clash with carefully laid-out ESG targets.

Most existing tools and studies have focused on the electricity required to power hardware (dynamic power consumption) and the associated emissions. However, there have been increasing calls to account for the embodied emissions of ML models—those generated by manufacturing the computing equipment—in carbon footprint estimates. This effort is hampered by limited transparency from designers of common computing hardware, such as GPUs. Nonetheless, recent estimates suggest that the embodied carbon footprint of an LLM trained and deployed on Meta’s compute cluster could represent as much as half of its total carbon emissions.

According to a study from Stanford University, OpenAI's GPT-4 required an estimated $78 million worth of compute to train, while Google's Gemini Ultra cost $191 million. Moreover, each successive generation of models, such as the GPT series, demands exponentially more energy. The consulting firm Gartner estimates that, at this pace, AI could account for up to 3.5% of global electricity consumption by 2030, roughly the combined electricity use of agriculture and forestry, and about twice the total consumption of France.

Such high energy usage contributes significantly to climate change. In the United States, where many AI servers are located, 16% of electricity is still generated from coal—the most carbon-intensive energy source. As a result, training GPT-3 alone produced around 502 tonnes of CO₂ emissions, equivalent to driving a car to the moon and back.
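The arithmetic behind such estimates is straightforward: multiply the energy a training run consumed by the carbon intensity of the grid that supplied it. A minimal sketch, using the commonly cited published estimates of roughly 1,287 MWh for GPT-3's training run and an approximate grid intensity of 0.39 kg CO₂e per kWh (both figures are external assumptions, not values from this article):

```python
# Estimate training emissions from energy use and grid carbon intensity.
# Both input figures are commonly cited published estimates, used here
# purely for illustration.

def training_emissions_tonnes(energy_mwh: float, kg_co2_per_kwh: float) -> float:
    """CO2e in tonnes = energy (kWh) x grid intensity (kg/kWh) / 1000."""
    kwh = energy_mwh * 1_000
    return kwh * kg_co2_per_kwh / 1_000

gpt3_energy_mwh = 1_287     # estimated energy for GPT-3's training run
us_grid_intensity = 0.39    # kg CO2e per kWh, approximate US grid mix

print(f"{training_emissions_tonnes(gpt3_energy_mwh, us_grid_intensity):.0f} tCO2e")
# ~502 tonnes, consistent with the figure quoted above
```

The same two-factor calculation explains why data-center siting matters: the identical training run on a low-carbon grid would emit a fraction of this total.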

Carbon footprint of LLMs vs SLMs

A recent study found that multi-purpose, generative architectures are orders of magnitude more expensive than task-specific systems for a variety of tasks. Task-specific models consistently use less energy than multi-purpose and generative models for discriminative tasks like sentiment analysis and question answering, sometimes by orders of magnitude. For generative tasks such as summarization, the emissions gap narrows, but larger multi-purpose models still consume more energy than their smaller, task-specific counterparts. While multi-purpose models often maintain higher accuracy, their broader output space and significantly larger parameter sizes drive up their carbon footprint.

The graph above shows findings from a recent study comparing the CO₂ emissions of task-specific SLMs and general-purpose LLMs. The authors compare the mean query emissions of three kinds of models—task-specific, sequence-to-sequence multi-purpose, and decoder-only generative—across different NLP tasks, measured in grams of CO₂-equivalent (CO₂eq) per 1,000 inferences. For discriminative tasks like sentiment analysis and extractive question answering, task-specific models emit dramatically less carbon than their multi-purpose and generative counterparts, often by several orders of magnitude. For instance, on SciQ (a QA dataset), task-specific models emit about 0.3 g of CO₂eq per 1,000 queries, whereas larger multi-purpose models can emit around 10 g for the same workload.

For the generative task of summarization, task-specific and multi-purpose models are somewhat closer in terms of emissions, though multi-purpose models still consume notably more energy. Here, the disparity stems from model size: all task-specific summarization models studied have at most 600 million parameters, whereas the multi-purpose architectures reach up to 11 billion parameters.

From the table above, it is evident that multi-purpose models differ considerably in their emissions based on both model size and architecture. Sequence-to-sequence models (Flan-T5) generally emit less carbon than decoder-only models (BLOOMz) of comparable size—e.g., Flan-T5-XL vs. BLOOMz-3B—and this trend holds even when comparing larger Flan-T5 variants to smaller BLOOMz models. Across tasks, summarization is the most emissions-intensive and text classification the least demanding, with question answering in between. Sequence-to-sequence models also show a smaller spread in emissions across tasks, while decoder-only models vary more widely.

The table above shows the amount of energy required by different models or different parameter sizes. A significant insight from these findings is that, even though training (and fine-tuning) a model consumes a large amount of energy, the cumulative energy from inferences can quickly match or exceed that initial cost—especially when many users or multiple copies of the model are deployed. For smaller models like BLOOMz-560M, about 200 million inferences are needed to reach “energy parity” with training, while for the largest models (e.g., BLOOMz-7B), that figure climbs to over 590 million. Real-world usage scenarios, such as ChatGPT’s millions of daily users, show how this threshold can be crossed in just weeks or months, emphasizing the substantial and ongoing energy impact of large-scale model deployment.
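The "energy parity" threshold described above reduces to a simple ratio: the number of inferences at which cumulative inference energy equals the one-time training cost. A sketch with hypothetical energy figures chosen to reproduce the ~200 million-inference threshold cited for the smaller model; the real per-inference numbers come from the study, not from this example:

```python
# Inferences needed for cumulative inference energy to match training energy.
# The training and per-inference energies below are hypothetical, picked only
# to reproduce the ~200M-inference parity figure cited for BLOOMz-560M.

def break_even_inferences(training_kwh: float, kwh_per_inference: float) -> int:
    """Number of inferences at which inference energy equals training energy."""
    return round(training_kwh / kwh_per_inference)

training_kwh = 10_000.0         # hypothetical training energy
kwh_per_inference = 0.00005     # hypothetical per-query energy

n = break_even_inferences(training_kwh, kwh_per_inference)
print(f"break-even after {n:,} inferences")

# At an assumed 10M queries/day, the threshold is crossed within weeks:
days = n / 10_000_000
print(f"~{days:.0f} days at 10M queries/day")
```

The takeaway is that inference, not training, dominates lifetime emissions for any widely deployed model, so efficiency work should target the serving path first.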

Strategies for ESG Goals-Friendly GenAI Adoption

Implementing GenAI in a way that aligns with ESG commitments demands a careful selection of architectures, hardware, and deployment methods. By optimizing for efficiency at every stage—development, training, and inference—companies can substantially shrink their carbon footprint while still harnessing the benefits of advanced AI.

One of the most impactful shifts is to adopt CPU-based infrastructure for GenAI workloads. CPUs typically consume less power than the high-performance GPUs or TPUs commonly used for large model training and inference. By leveraging modern CPUs with efficient libraries, enterprises can reduce energy consumption without necessarily compromising on performance, especially for tasks that do not require real-time responses.
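When comparing hardware options, the relevant metric is energy per inference (average power draw multiplied by inference time) rather than peak power alone: a lower-power chip only comes out ahead if its longer latency does not cancel the advantage. A sketch with made-up power and latency numbers, purely to illustrate the trade-off:

```python
# Energy per inference = average power draw (W) x inference time (s), in joules.
# The power and latency figures below are illustrative assumptions, not benchmarks.

def energy_per_inference_j(power_watts: float, seconds: float) -> float:
    return power_watts * seconds

cpu = energy_per_inference_j(power_watts=100, seconds=1.5)  # slower, lower power
gpu = energy_per_inference_j(power_watts=400, seconds=0.5)  # faster, higher power

print(f"CPU: {cpu:.0f} J/inference, GPU: {gpu:.0f} J/inference")
# With these assumed numbers the CPU path uses less energy (150 J vs 200 J)
```

The comparison can flip under heavy batching, where a GPU amortizes its power draw across many queries at once, so the calculation is worth redoing per workload rather than assumed either way.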

Another way to trim energy use is to employ domain-specific Small Language Models (SLMs) rather than massive, general-purpose LLMs. These compact models focus on specialized tasks, cutting down on the extensive data requirements and computational overhead associated with larger architectures. As the AI research community continues to innovate, more efficient model architectures like liquid models, SSM models, Bitnet models, Hymba, and Jamba are emerging. These next-generation techniques optimize compute, memory, power, and bandwidth utilization, making them ideal for ESG-minded organizations seeking to reduce their environmental impact.

Inference-time optimizations can further improve efficiency and performance. Techniques such as Model Order Adaptation (MOA), Swarm of Models, or TextGrad can help systems built on SLMs approach LLM-level accuracy. By fine-tuning how and when inference occurs, companies minimize unnecessary processing cycles, aligning more closely with their sustainability targets.

In addition, enterprises can adopt a hybrid approach that combines cloud-based LLMs with client-side or edge-based SLMs. This federation of inference tasks balances workloads across multiple devices—some queries may be handled locally on efficient, specialized models, while more complex requests can be routed to cloud servers when necessary. The end result is reduced bandwidth usage, lower energy costs, and enhanced resilience, all of which directly support ESG objectives.
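One way to realize this federation is a simple router that answers short, routine queries with a local SLM and escalates the rest to a cloud LLM. The sketch below uses a naive length-and-keyword heuristic as a stand-in for a real complexity classifier, and `local_slm` / `cloud_llm` are hypothetical stubs, not real APIs:

```python
# Route queries between a local SLM and a cloud LLM based on a crude
# complexity heuristic. Both model calls are hypothetical stubs.

def local_slm(query: str) -> str:
    return f"[slm] answer to: {query}"   # stand-in for an on-device model

def cloud_llm(query: str) -> str:
    return f"[llm] answer to: {query}"   # stand-in for a hosted model

COMPLEX_HINTS = ("explain", "compare", "summarize", "analyze")

def route(query: str, max_local_words: int = 12) -> str:
    """Send short, routine queries locally; escalate complex ones to the cloud."""
    words = query.lower().split()
    is_complex = len(words) > max_local_words or any(
        hint in words for hint in COMPLEX_HINTS
    )
    return cloud_llm(query) if is_complex else local_slm(query)

print(route("What time does the store open?"))                # handled locally
print(route("Compare these two quarterly reports in depth"))  # escalated
```

In production, the heuristic would be replaced by a learned complexity scorer, but the energy argument is the same: every query the local model absorbs is a cloud GPU invocation avoided.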

Companies seeking to refine or personalize AI solutions may also benefit from recent advancements in model merging, which can sometimes match or surpass the performance of full fine-tuning. These merging techniques can be executed on consumer-grade CPUs, allowing for targeted, low-power modifications that align with specific business needs and reduce the overhead of repeatedly retraining entire models.

Finally, organizations should reevaluate whether all GenAI applications require real-time interaction. A substantial portion of AI tasks—such as large-scale data analysis or content generation—can be handled through batch processing, which can be scheduled during off-peak hours and run on Xeon or equivalent CPUs. By batching requests, companies can drastically improve their total cost of ownership and return on investment, minimizing their carbon footprint while meeting customer needs in a timely but more sustainable manner.
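Batch scheduling can be as simple as queuing non-urgent requests and flushing them only inside an off-peak window. A minimal sketch; the 22:00–06:00 window and the "processed" stub stand in for real scheduling policy and real inference:

```python
# Queue non-urgent GenAI requests and flush them only during off-peak hours.
# The off-peak window and the processing stub are illustrative assumptions.
from datetime import time

OFF_PEAK_START = time(22, 0)   # assumed off-peak window: 22:00-06:00
OFF_PEAK_END = time(6, 0)

def is_off_peak(now: time) -> bool:
    """True inside a window that wraps past midnight."""
    return now >= OFF_PEAK_START or now < OFF_PEAK_END

class BatchQueue:
    def __init__(self):
        self.pending: list[str] = []

    def submit(self, request: str) -> None:
        self.pending.append(request)

    def flush(self, now: time) -> list[str]:
        """Process everything queued, but only during off-peak hours."""
        if not is_off_peak(now):
            return []
        done, self.pending = self.pending, []
        return [f"processed: {r}" for r in done]  # stand-in for real inference

q = BatchQueue()
q.submit("summarize Q3 report")
q.submit("draft newsletter")
print(q.flush(time(14, 0)))   # peak hours: nothing runs
print(q.flush(time(23, 30)))  # off-peak: batch is processed
```

Deferring work this way shifts load toward hours when grids typically run cleaner and cheaper, which is where the TCO and carbon benefits described above come from.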

Collectively, these strategies offer a roadmap for ESG-focused enterprises to adopt GenAI responsibly. From choosing power-conscious hardware and leaning on smaller, specialized models, to exploring modern architectures and thoughtful deployment methods, each step helps ensure that the pursuit of AI-driven innovation does not come at the expense of the environment or broader societal goals.

Aharsh MS
Head of Brand and Partnerships @ Bud Ecosystem

Tech entrepreneur, Marketing & Growth strategist. Independent researcher in my free time. Keynote speaker at tech conferences.
