Amazon Web Services (AWS) bolstered its artificial intelligence (AI) offerings last week, seeking to articulate its enterprise capabilities amid high-profile but consumer-focussed announcements from Microsoft and Google, whose services run on Azure and Google Cloud Platform respectively.
Amazon Bedrock brings a range of foundation models to AWS, debuting with models from AI21 Labs, Anthropic and Stability AI, as well as ones developed by Amazon. The service enables practitioners to customize foundation models with additional data and integrate them into existing AWS workflows.
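For illustration, here's a minimal sketch of what invoking a Bedrock-hosted model from Python might look like. Since Bedrock is still in limited preview, the client name, model ID and request schema below are assumptions based on how boto3 typically exposes AWS services, not a documented API.

```python
import json

import boto3  # AWS SDK for Python

# Hypothetical Bedrock runtime client; service and model names are
# assumptions for illustration while the API remains in preview.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed Titan model ID
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": "Summarize this week's AWS announcements.",
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.5},
    }),
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```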
Newly announced models from Amazon include Titan Text, a large language model focussed on text generation, summarization and classification. There's also Titan Embeddings, which translates text into numerical embeddings, aiding in search and personalization. AWS touts the Titan models as being focussed on rejecting inappropriate input from users and proactively eliminating harmful output. Amazon Bedrock and Amazon Titan are available today in limited preview form. It isn't clear how large the underlying Titan models are.
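To make the search use case concrete, the sketch below ranks documents by cosine similarity between embedding vectors. The vectors are random stand-ins for Titan Embeddings output, and the 1,536-dimension size is an assumption for illustration; only the ranking logic is the point.

```python
import numpy as np

# Random stand-ins for vectors that would come from Titan Embeddings;
# the 1536-dimension size is an assumption for illustration.
rng = np.random.default_rng(0)
docs = {f"doc-{i}": rng.standard_normal(1536) for i in range(3)}
query_vec = rng.standard_normal(1536)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by semantic similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
print(ranked)
```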
Custom Silicon Is a Big Differentiator for AI on AWS
Model training is a particularly expensive endeavour for those building their own models. The comparatively power-hungry Nvidia GPUs typically used for model training are expensive to operate, even leaving aside procurement costs. As with its Graviton series of server-grade Arm CPUs, AWS designs its custom Trainium and Inferentia chips specifically for model training and inference respectively.
The new, network-optimized Trn1n instance doubles network bandwidth to 1,600 Gbps, and AWS claims this will deliver 20% higher performance than the current Trn1 instance. Trainium natively supports PyTorch and TensorFlow through the AWS Neuron SDK, which minimizes the porting needed to adopt the alternative silicon. Despite this ease of adoption, AWS still has precious few reference customers for Trainium, citing only the Amazon Search team and two start-ups.
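Because Neuron plugs into PyTorch via the PyTorch/XLA backend, a training step on Trainium looks close to stock PyTorch. The sketch below shows the general shape; treat it as a simplified illustration rather than a verified recipe for Trn1.

```python
import torch
import torch_xla.core.xla_model as xm  # XLA backend used by AWS' torch-neuronx

# On a Trn1/Trn1n instance with the Neuron SDK installed, the XLA device
# resolves to a NeuronCore rather than a GPU.
device = xm.xla_device()

model = torch.nn.Linear(784, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

inputs = torch.randn(64, 784).to(device)
labels = torch.randint(0, 10, (64,)).to(device)

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
xm.optimizer_step(optimizer)  # steps the optimizer and flushes the XLA graph
```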
The latest Inferentia chip, introduced at AWS re:Invent 2022, is now generally available to customers in Inf2 instances. The new instance supports large language models with up to 175 billion parameters. AWS boasts up to 40% better performance per watt than GPU-based instances, claiming the lowest cost per inference on Amazon EC2.
Inferentia2 also supports distributed inference, parallelizing workloads across multiple chips over direct 192GB-per-second connectivity between accelerators. AWS offers clusters of up to 12 Inferentia2 accelerators, paired with a total of up to 384GB of HBM2e memory. The strategic bet is that as AI adoption grows among enterprises, dedicated inference silicon will drive savings on routine tasks such as processing inbound queries to Alexa.
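For a sense of the developer workflow, the sketch below compiles a small PyTorch model ahead of time for NeuronCores using torch_neuronx; distributing a 100-billion-plus-parameter model across several accelerators layers Neuron's tensor-parallelism tooling on top of this. Consider it a simplified single-accelerator illustration.

```python
import torch
import torch_neuronx  # part of the AWS Neuron SDK

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example = torch.randn(1, 128)

# Ahead-of-time compilation for NeuronCores; the result behaves like a
# TorchScript module and can be saved and reloaded on an Inf2 host.
neuron_model = torch_neuronx.trace(model, example)
neuron_model.save("model_neuron.pt")

loaded = torch.jit.load("model_neuron.pt")
print(loaded(example).shape)  # torch.Size([1, 64])
```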
Amazon CodeWhisperer Brings Competition to GitHub Copilot
AI assistance for programmers has received significant attention since the announcement of GitHub Copilot in June 2021. Amazon CodeWhisperer, AWS' answer to it, is now generally available. Although the professional tier matches Copilot for Business' pricing at $19 per user per month, Amazon offers a free tier for individuals that doesn't require an AWS account. The tool is available for VS Code, IntelliJ IDEA and PyCharm, among others, and adds support for languages including Python, JavaScript, C++, SQL and TypeScript.
As a differentiator, Amazon CodeWhisperer features automated security scanning, finding and suggesting remediations for vulnerabilities in code or configurations that don't follow best practices for implementing cryptography, helping enforce the truism of "Don't roll your own crypto". Although security is an evergreen concern, the more interesting attribute is a feature that, in AWS' words, "can filter and flag code suggestions that resemble open-source code that customers may want to reference or license for use".
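As a hypothetical illustration of the kind of finding such a scan surfaces, consider password hashing: a fast, unsalted hash is a classic flagged pattern, with a salted key-derivation function as the standard remediation. This example is ours, not output from CodeWhisperer.

```python
import hashlib
import secrets

# The kind of pattern a security scan would typically flag: a fast,
# unsalted hash used for password storage.
def store_password_insecure(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()  # weak and unsalted

# Remediation in line with best practice: a salted, deliberately slow
# key-derivation function.
def store_password(password: str) -> tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest
```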
The licensing implications of training models on open-source code, which is then used to generate source code that isn't necessarily open-source, are frequently raised in discussions of AI-assisted programming. The potential for AI-assisted programming to become the centre of litigation over source code provenance, such as AI being used to launder open-source code into a closed-source program, is relatively high. However, this feature in CodeWhisperer could serve as a defence against such claims.
Cloud Is the Domain of Model Training, But Inference Is Everywhere
The rapid pace of progress in model development, and the generational improvement of the hardware used in model training, make cloud platforms a sensible destination for training workloads. This is evident from Nvidia's increasing overtures toward offering its hardware as a cloud service. Although inference also demands additional computing power, particularly when adding new functionality to existing devices, on-device AI is a long-term reality, something we've explored extensively in a recent blog post.
And of course, we cover the latest AI developments in detail for our clients, so if you’d like to know more, please do get in touch.