Connecting Kong AI Gateway to Amazon Bedrock: Authentication and Inference Profiles

Kong AI Gateway gives you a lot of room when you connect it to Amazon Bedrock. There is no single way to authenticate and no single way to address a model: the ai-proxy-advanced plugin exposes several options for each, so you can match whatever your AWS environment and security posture require. That flexibility is the strength, but it also means you have to know which option goes where. The authentication options in particular cover a wide range: at one end Kong stores a static secret you rotate by hand; at the other it holds no Bedrock credentials at all and signs with an identity your platform supplies at runtime.

The options sort into two independent groups. The first is authentication: how Kong proves to AWS that it is allowed to call Bedrock. There are three credential paths here. The second is model selection: which model, in which region, billed to whose budget. On Bedrock that increasingly means choosing an inference profile, which is not an auth mechanism. The two groups live in different parts of the config, and keeping them separate is what makes the range of options manageable instead of confusing.

This post walks the authentication options first, then the inference-profile options, and ends with the config that combines them for a common production setup: an inference profile, on EKS, with no AWS keys anywhere.

Where each decision lives

Everything you configure for Bedrock falls into one of three places in an ai-proxy-advanced target:

  • auth holds your credentials, and only two AWS fields exist here: aws_access_key_id and aws_secret_access_key. Both are referenceable and encrypted, so they can come from a vault rather than plain config. If you leave this block out entirely, Kong falls back to the AWS credential chain (more on that shortly).

  • model.options.bedrock holds everything about where and as whom: aws_region, and the role-assumption fields aws_assume_role_arn, aws_role_session_name, and aws_sts_endpoint_url.

  • model.name (paired with llm_format) is where you select the inference profile, by writing either a region-prefixed model ID or a profile ARN.

One more field sits at the target level: route_type, the operation the model implements. It spans chat, completions, and embeddings through to image, audio, and video operations; every example here uses llm/v1/chat.

The thing to hold onto: authentication is one corner of the config, and the inference profile is a different corner. No matter which credential path you take, the end result is the same: a request signed with AWS Signature Version 4, a per-request signature that AWS verifies rather than a static token it looks up. The only variable is where the signing credentials come from.
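As a rough map of the places described above, a single target can be annotated like this. It is a sketch rather than a working config: the vault references are reused from the examples in this post, every value in angle brackets is a placeholder, and in practice you would set only the fields your chosen credential path needs.

```yaml
plugins:
  - name: ai-proxy-advanced
    config:
      llm_format: anthropic          # paired with model.name when selecting a profile
      targets:
        - route_type: llm/v1/chat    # target level: the operation the model implements
          auth:                      # credentials; omit the whole block to use the AWS credential chain
            aws_access_key_id: "{vault://hcv/aws_access_key_id}"
            aws_secret_access_key: "{vault://hcv/aws_secret_access_key}"
          model:
            provider: bedrock
            # model ID, geo-prefixed profile ID, or profile ARN (placeholder)
            name: "<model_or_profile>"
            options:
              bedrock:               # where and as whom
                aws_region: us-east-1
                aws_assume_role_arn: "arn:aws:iam::<account>:role/<role_name>"   # placeholder
                aws_role_session_name: "<session_name>"                          # placeholder
                aws_sts_endpoint_url: "<sts_endpoint_url>"                       # placeholder
```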

Authentication: three credential paths

Kong can get the credentials it signs with in three ways. They differ only in where the secret lives and how long it lasts: a secret you store and rotate yourself, a secret you store but let AWS rotate, or no secret in Kong at all.

Path 1: static keys. Put referenceable values in the auth block and Kong signs Bedrock requests directly with a long-lived IAM user's access key and secret.

plugins:
  - name: ai-proxy-advanced
    config:
      targets:
        - route_type: llm/v1/chat
          auth:
            aws_access_key_id: "{vault://hcv/aws_access_key_id}"
            aws_secret_access_key: "{vault://hcv/aws_secret_access_key}"
          model:
            provider: bedrock
            name: meta.llama3-70b-instruct-v1:0
            options:
              bedrock:
                aws_region: us-east-1

This is the simplest path and the one to avoid in production. A static key never expires on its own, so it is the credential most likely to leak and the one you have to rotate by hand.

Path 2: static keys plus assume-role. Add aws_assume_role_arn under model.options.bedrock. Kong authenticates with the static keys, calls AWS STS to assume the named role, and gets back temporary credentials that include a session token. It then signs Bedrock requests with those. This is how you let Kong reach a dedicated Bedrock role from a different base identity, and the temporary credentials rotate on their own.
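A minimal sketch of path 2, assuming the same vault references and model as the path 1 example. The role ARN and session name are placeholders for illustration, not values from a real account:

```yaml
plugins:
  - name: ai-proxy-advanced
    config:
      targets:
        - route_type: llm/v1/chat
          auth:
            # base identity: static keys, used only to call STS
            aws_access_key_id: "{vault://hcv/aws_access_key_id}"
            aws_secret_access_key: "{vault://hcv/aws_secret_access_key}"
          model:
            provider: bedrock
            name: meta.llama3-70b-instruct-v1:0
            options:
              bedrock:
                aws_region: us-east-1
                # role Kong assumes before signing Bedrock requests (placeholder ARN)
                aws_assume_role_arn: "arn:aws:iam::<account>:role/<bedrock_role>"
                # optional label for the STS session (hypothetical value)
                aws_role_session_name: kong-bedrock
```

The base IAM user then needs little more than permission to assume the target role, while the Bedrock permissions live on the role itself.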

Path 3: no auth block at all. Leave auth out and Kong's RemoteCredentials provider falls through to the standard AWS credential chain. On EKS this is the one to reach for. If you run with EKS Pod Identity, the Pod Identity Agent injects AWS_CONTAINER_CREDENTIALS_FULL_URI and AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE into the pod; Kong calls that endpoint, receives temporary credentials with a session token, and signs. No keys in config, no secret in a vault, nothing to rotate. The same chain also covers IRSA and the EC2 instance role.
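Path 3 needs nothing in Kong, but the identity still has to exist on the AWS side. One way to express that is a CloudFormation fragment along these lines. It is a sketch under assumptions: the cluster name is a placeholder, the kong namespace and service account names are hypothetical, and the Bedrock permissions for the role are omitted here.

```yaml
Resources:
  KongBedrockRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: pods.eks.amazonaws.com   # EKS Pod Identity service principal
            Action:
              - sts:AssumeRole
              - sts:TagSession

  KongPodIdentity:
    Type: AWS::EKS::PodIdentityAssociation
    Properties:
      ClusterName: "<cluster_name>"   # placeholder
      Namespace: kong                 # hypothetical namespace
      ServiceAccount: kong            # hypothetical service account used by the Kong pods
      RoleArn: !GetAtt KongBedrockRole.Arn
```

With an association like this in place, pods running under that service account receive the injected credential endpoint, and the Kong target can omit the auth block.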

| Path | Where the secret lives | Expires on its own | Best for |
| --- | --- | --- | --- |
| Static keys | Vault or config | No, manual rotation | Quick tests, no workload identity available |
| Keys + assume-role | Vault, plus a target role | Yes, STS rotates | Crossing accounts or into a scoped Bedrock role |
| No auth block | Nowhere (injected at runtime) | Yes, the platform refreshes | Production on EKS, ECS, or EC2 |

One nuance worth calling out, because it trips people up. Kong produces session tokens itself: path 2 mints them through assume-role, path 3 gets them from the credential chain, and both sign correctly. What Kong has no field for is ingesting a session token you already hold. If your platform mints STS triples externally (for example, a sidecar that writes an access key, secret, and session token to a file), there is nowhere in the auth block to feed that third value, because as of Kong Gateway 3.14.0.6 it accepts only the key and secret. First-class session-token support is being worked on. In the meantime, if a workload identity is available, the cleaner answer is path 3: let Kong obtain its own temporary credentials.

Last, a permissions reminder that bites people the first time they use a profile. Invoking through an inference profile touches two resources, the profile and the underlying model, so whatever identity Kong ends up signing as (the IAM user, the assumed role, or the Pod Identity role) needs bedrock:InvokeModel allowed on both the profile ARN and the foundation-model ARN.
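The two-resource rule can be sketched as a policy document, written here in the YAML form that CloudFormation accepts (IAM itself stores the JSON equivalent). All angle-bracket values are placeholders. The region wildcard on the foundation-model ARN is an assumption that suits a cross-region profile, which may route to any region in its set; narrow it if you know the exact regions. For an application profile, the first resource would use the application-inference-profile ARN instead.

```yaml
PolicyDocument:
  Version: "2012-10-17"
  Statement:
    - Sid: InvokeThroughProfile
      Effect: Allow
      Action:
        - bedrock:InvokeModel
      Resource:
        # the inference profile Kong names in model.name
        - "arn:aws:bedrock:<region>:<account>:inference-profile/<profile_id>"
        # the underlying foundation model in the regions the profile can route to
        - "arn:aws:bedrock:*::foundation-model/<model_id>"
```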

Inference profiles: a separate axis

Once Kong can authenticate, the second question is which model it actually calls, and on Bedrock that increasingly means choosing an inference profile rather than a bare model ID. An inference profile is not a credential. It is an alias over a model and a set of regions that does three jobs: it spreads requests across regions for throughput, it is the required entry point for newer models that AWS only publishes through a profile, and it carries tags so you can split the Bedrock bill by team or project. In Kong, none of that touches the auth block. You select a profile entirely through model.name and llm_format. There are two kinds, and they are configured differently.

System-defined (cross-region) profiles are the ones AWS creates and you cannot. Each carries a geographic prefix on its ID (for example us., eu., apac., au., or global.). To use one, put the prefixed ID in model.name and set llm_format to match the model's wire shape (anthropic for Claude, openai for an OpenAI-compatible schema):

config:
  llm_format: anthropic
  targets:
    - route_type: llm/v1/chat
      # no auth block -> Pod Identity via the credential chain
      model:
        provider: bedrock
        name: au.anthropic.claude-sonnet-4-6   # geo prefix, e.g. us./eu./apac./au./global.
        options:
          bedrock:
            aws_region: ap-southeast-2

Application profiles are the ones you create in your own account to attach cost-allocation tags. They are identified by an ARN, and the combination that works on current GA is the native bedrock format, the bare profile ARN in model.name, and the operation in the request path:

config:
  llm_format: bedrock
  targets:
    - route_type: llm/v1/chat
      model:
        provider: bedrock
        name: arn:aws:bedrock:us-east-1:<account>:application-inference-profile/<profile_id>
        options:
          bedrock:
            aws_region: us-east-1

The request then names the operation in its path:

POST /bedrock/model/<ARN>/converse

The reason llm_format differs between the two is the model identifier. A geo-prefixed ID still resolves to a known model whose schema Kong can map (anthropic, openai), so you keep the format of the underlying model. A bare profile ARN carries no such model information, so you use the native bedrock format and address the operation directly in the path.

Putting it together

The most common production ask pulls both axes into one small block. A platform engineer recently described exactly this: they wanted to call a model through an inference profile, on EKS, with no AWS credentials configured in Kong at all. That is two decisions, one per axis. The credential question is path 3: no auth block, with Pod Identity supplying temporary credentials at runtime. The model question is a system-defined profile: a geo-prefixed ID in model.name. Combined, the entire target is this:

plugins:
  - name: ai-proxy-advanced
    config:
      llm_format: anthropic
      targets:
        - route_type: llm/v1/chat
          # no auth block: EKS Pod Identity supplies credentials
          model:
            provider: bedrock
            name: au.anthropic.claude-sonnet-4-6
            options:
              bedrock:
                aws_region: ap-southeast-2

No keys, no secret, no vault reference. The pod's identity authenticates, the geo-prefixed name selects a cross-region profile, and Bedrock routes the request. The only AWS-side work is making sure the Pod Identity role is allowed bedrock:InvokeModel on both the profile and the underlying model.

That is the whole shape of a Bedrock integration in Kong AI Gateway. Decide how Kong authenticates (static keys, an assumed role, or a runtime identity), and decide which profile it calls (system-defined or application). Keep the two decisions separate, put each in its own corner of the config, and the integration is straightforward to reason about. For production on AWS, the target above is a good default: nothing to leak, nothing to rotate, and cross-region routing without extra config.