Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed, highly available, and secure Apache Kafka service. Amazon MSK reduces the work needed to set up, scale, and manage Apache Kafka in production. With Amazon MSK, you can create a cluster in minutes and start sending data.
With Amazon MSK Serverless, you can run Apache Kafka without having to manage the underlying infrastructure. Amazon MSK will automatically provision, scale, and manage your Apache Kafka clusters, so you can focus on your applications without worrying about the operational overhead. Additionally, MSK Serverless offers fine-grained, pay-as-you-go pricing, making it a cost-effective option for organizations with unpredictable workloads.
Setting up MSK Serverless is straightforward. You can create a serverless cluster using the API or AWS Management Console in minutes. MSK Serverless provides bootstrap information as a private DNS endpoint, which clients use to connect to the serverless Apache Kafka cluster. A common use case for MSK Serverless is an on-premises client that needs to process real-time data streams. However, the private DNS endpoint is only accessible from virtual private clouds (VPCs) that have been configured to connect, and it isn’t directly resolvable from an on-premises network. This makes it difficult for on-premises clients to discover and connect to the MSK Serverless cluster.
In this post, we guide you through a step-by-step process to connect your on-premises client to MSK Serverless, overcoming this challenge.
The following diagram illustrates the solution architecture.
The flow of the solution is as follows:
- The DNS query for your MSK endpoint is routed to a locally configured on-premises DNS server.
- The on-premises DNS server is configured to perform conditional forwarding for kafka-serverless.REPLACE-MSK-SERVERLESS-REGION.amazonaws.com to an Amazon Route 53 inbound resolver endpoint IP address.
- The inbound resolver endpoint performs DNS resolution by forwarding the query to the private hosted zone that was created along with the MSK Serverless cluster.
- The IP addresses returned by the DNS query are the private IP addresses of the interface VPC endpoint, which allow your on-premises host to establish private connectivity over AWS VPN or AWS Direct Connect.
- The interface endpoint is a collection of one or more elastic network interfaces with a private IP address in your account that serves as an entry point for traffic destined for the MSK Serverless service.
Note that at this time, this solution works only for MSK Serverless clusters with a single VPC.
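The conditional forwarding decision in the flow above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not how any particular DNS server implements it. The Region, the second endpoint IP address, and the default resolver address are hypothetical placeholders.

```python
from __future__ import annotations

# Hypothetical values for illustration; replace with your own Region and IPs.
FORWARD_ZONE = "kafka-serverless.ap-southeast-2.amazonaws.com"
INBOUND_ENDPOINT_IPS = ["10.1.0.133", "10.1.1.27"]
DEFAULT_RESOLVERS = ["192.168.0.2"]


def pick_resolvers(query_name: str) -> list[str]:
    """Return the resolver IPs a conditional forwarder would use for a name."""
    name = query_name.rstrip(".").lower()
    zone = FORWARD_ZONE.lower()
    # Match the zone itself or any name beneath it (label-aligned suffix match).
    if name == zone or name.endswith("." + zone):
        return INBOUND_ENDPOINT_IPS
    return DEFAULT_RESOLVERS


if __name__ == "__main__":
    print(pick_resolvers("boot-abcdc9.c3.kafka-serverless.ap-southeast-2.amazonaws.com"))
    print(pick_resolvers("example.com"))
```

The suffix match is aligned on a label boundary, so a name that merely ends with the same characters without a preceding dot is not forwarded.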
In this section, we discuss the prerequisite steps to complete in order to implement this solution.
Establish network connectivity between on premises and the AWS Cloud
To use MSK Serverless from your on-premises network, you need to establish a network connection between your on-premises environment and the VPC that you have set up for MSK Serverless. Various secure methods are available to connect your on-premises network to the AWS Cloud. Refer to Network-to-Amazon VPC connectivity options for more information.
Create a security group for allowing inbound TCP/UDP connections from your on-premises network
Create a security group with the following configurations on the same VPC that you configured for MSK Serverless:
- Inbound rule source: [On-premises CIDR range]
- Inbound rule protocol: TCP/UDP
- Inbound rule port range: 53
- Outbound rule: Leave as default
For more information, refer to Work with security groups.
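If you prefer to script this step, the following is a minimal sketch that builds the same inbound rules for the boto3 EC2 call `authorize_security_group_ingress`. It assumes an IPv4 on-premises CIDR; the function names, the rule description, and the example CIDR are illustrative and not part of the original walkthrough.

```python
import ipaddress


def dns_ingress_permissions(on_prem_cidr: str) -> list:
    """Build IpPermissions entries allowing DNS (TCP and UDP 53) from one CIDR."""
    # Raises ValueError if the CIDR is malformed or has host bits set.
    network = ipaddress.IPv4Network(on_prem_cidr)
    return [
        {
            "IpProtocol": protocol,
            "FromPort": 53,
            "ToPort": 53,
            "IpRanges": [{"CidrIp": str(network), "Description": "On-premises DNS"}],
        }
        for protocol in ("tcp", "udp")
    ]


def authorize_dns_ingress(group_id: str, on_prem_cidr: str) -> None:
    """Apply the rules with boto3 (needs the boto3 package and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=dns_ingress_permissions(on_prem_cidr),
    )


if __name__ == "__main__":
    # Hypothetical on-premises range; prints the rules without calling AWS.
    for rule in dns_ingress_permissions("192.168.0.0/16"):
        print(rule)
```

Keeping the rule builder separate from the AWS call lets you review the rules before applying them.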
Update the MSK security group for inbound connections from your on-premises network
To ensure that your MSK Serverless cluster can be accessed from your on-premises network, you need to adjust the cluster’s security group settings to allow incoming traffic from your network on TCP port 9098. Complete the following steps:
- On the Amazon MSK console, choose Clusters in the navigation pane.
- Navigate to your serverless MSK cluster’s properties.
- Choose the security group associated with your MSK cluster.
Because MSK Serverless supports configuring multiple VPCs, make sure to choose the security group associated with the VPC that you configured for connecting from your on-premises network.
- To enable connections from your on-premises CIDR block to MSK Serverless, add an inbound rule that allows traffic on TCP port 9098 from your on-premises CIDR.
This ensures that your on-premises network can communicate with MSK Serverless on the specified port.
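To double-check the rule afterwards, you could inspect the security group's inbound rules programmatically. The sketch below assumes the rules are in the `IpPermissions` shape returned by the boto3 EC2 call `describe_security_groups`, and considers IPv4 ranges only. The function name and sample CIDR values are hypothetical.

```python
import ipaddress


def allows_tcp_port(ip_permissions: list, source_cidr: str, port: int = 9098) -> bool:
    """Return True if any rule permits TCP `port` from all of `source_cidr`."""
    source = ipaddress.ip_network(source_cidr)
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue  # UDP, ICMP, and so on do not help a Kafka client
        if not port_ok:
            continue
        for ip_range in rule.get("IpRanges", []):
            allowed = ipaddress.ip_network(ip_range["CidrIp"])
            # subnet_of requires both networks to be the same IP version.
            if source.version == allowed.version and source.subnet_of(allowed):
                return True
    return False


if __name__ == "__main__":
    sample_rules = [
        {
            "IpProtocol": "tcp",
            "FromPort": 9098,
            "ToPort": 9098,
            "IpRanges": [{"CidrIp": "192.168.0.0/16"}],
        }
    ]
    print(allows_tcp_port(sample_rules, "192.168.10.0/24"))
```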
Configure a Route 53 inbound resolver endpoint
MSK Serverless provides a DNS endpoint that serves as the starting point for an Apache Kafka client to connect to the cluster. However, this endpoint isn’t publicly discoverable and can only be accessed from within the configured VPC. To resolve the serverless DNS endpoint outside of your VPC, you can set up a Route 53 resolver endpoint. This allows you to access the endpoint securely by creating a hybrid cloud setup over VPN or Direct Connect.
To configure the Route 53 resolver using the console, complete the following steps:
- On the Route 53 console, under Resolver in the navigation pane, choose Inbound endpoints.
- Choose Create inbound endpoint.
- For Endpoint name, enter the endpoint name.
- For VPC in the Region, choose the VPC where you configured MSK Serverless.
- For Security group for this endpoint, choose the security group that you created as a prerequisite for inbound TCP/UDP connections.
The security group of the inbound resolver endpoint should allow traffic from the on-premises DNS Server IP address on TCP/UDP port 53.
In the next step, you add your IP addresses, ensuring that the number of IP addresses matches the number of subnets in your MSK cluster.
- Choose the Availability Zones and subnets that are the same as your MSK Serverless network configuration.
- Select Use an IP address that is selected automatically.
- Choose Create inbound endpoint.
- Copy the inbound endpoint IP addresses.
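The console steps above have a rough programmatic equivalent in the boto3 `route53resolver` client. The following sketch is one possible way to do it, not the method used in this post. The helper names and the example IDs are hypothetical, and the subnets and security group are assumed to be the ones described above.

```python
import uuid


def inbound_endpoint_request(name: str, security_group_id: str, subnet_ids: list) -> dict:
    """Build keyword arguments for route53resolver.create_resolver_endpoint."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return {
        "CreatorRequestId": str(uuid.uuid4()),  # idempotency token
        "Name": name,
        "Direction": "INBOUND",
        "SecurityGroupIds": [security_group_id],
        # Omitting "Ip" lets the service pick an address in each subnet,
        # matching the "selected automatically" option in the console.
        "IpAddresses": [{"SubnetId": subnet_id} for subnet_id in subnet_ids],
    }


def create_inbound_endpoint(name: str, security_group_id: str, subnet_ids: list) -> list:
    """Create the endpoint and return whatever IP addresses are already listed."""
    import boto3  # imported lazily so the builder above works without boto3

    resolver = boto3.client("route53resolver")
    response = resolver.create_resolver_endpoint(
        **inbound_endpoint_request(name, security_group_id, subnet_ids)
    )
    endpoint_id = response["ResolverEndpoint"]["Id"]
    # Addresses may not be listed until the endpoint finishes creating.
    listing = resolver.list_resolver_endpoint_ip_addresses(ResolverEndpointId=endpoint_id)
    return [entry["Ip"] for entry in listing["IpAddresses"] if "Ip" in entry]


if __name__ == "__main__":
    # Hypothetical IDs; prints the request without calling AWS.
    print(inbound_endpoint_request("msk-inbound", "sg-0123456789abcdef0", ["subnet-aaa", "subnet-bbb"]))
```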
Configure the on-premises DNS server
In this example, we use a Microsoft DNS server. To configure a conditional forwarder, complete the following steps:
- Open DNS Manager.
- Run the following command in the Run command window:
- Choose (right-click) Conditional Forwarders under the server of your choosing, then choose New Conditional Forwarder.
In the next step, you enter kafka-serverless.REPLACE-MSK-SERVERLESS-REGION.amazonaws.com as the DNS domain and the IP addresses of the Route 53 inbound resolver endpoint that you created earlier. You can find the MSK endpoint information by accessing the cluster’s client information. To learn more about getting client information, refer to Getting the bootstrap brokers for an Amazon MSK cluster.
- For DNS Domain, enter the Regional domain suffix of your endpoint, for example kafka-serverless.ap-southeast-2.amazonaws.com. Do not enter the entire bootstrap endpoint name.
- Choose OK.
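The domain to enter in the conditional forwarder can be derived from the bootstrap string. The helper below is a small illustrative sketch. The function name is hypothetical, and it assumes the bootstrap string has the form shown in this post, with an optional port suffix.

```python
def forwarder_domain(bootstrap: str) -> str:
    """Return the kafka-serverless.<region>.amazonaws.com suffix of a bootstrap string."""
    # Drop an optional ":port" suffix and a trailing dot, then normalize case.
    host = bootstrap.strip().split(":")[0].rstrip(".").lower()
    labels = host.split(".")
    if "kafka-serverless" not in labels:
        raise ValueError(f"not an MSK Serverless endpoint: {bootstrap!r}")
    # Keep everything from the "kafka-serverless" label onward.
    return ".".join(labels[labels.index("kafka-serverless"):])


if __name__ == "__main__":
    print(forwarder_domain("boot-abcdc9.c3.kafka-serverless.ap-southeast-2.amazonaws.com:9098"))
    # prints kafka-serverless.ap-southeast-2.amazonaws.com
```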
Test the DNS resolution
DNS (Domain Name System) uses TCP/UDP port 53. To test whether you can connect to any of the Route 53 inbound resolver endpoint IP addresses, run the following command from your on-premises client:
telnet 10.1.0.133 53
The following is a sample output:
Run the following command to check whether you can connect with the MSK Serverless endpoint from your on-premises client. To get the MSK Serverless endpoint information, refer to Create an MSK Serverless cluster.
dig boot-abcdc9.c3.kafka-serverless.ap-southeast-2.amazonaws.com +short
The following is a sample output:
If the DNS resolution fails, check your network connectivity from on premises. For more information about troubleshooting connectivity issues, refer to How do I troubleshoot VPN tunnel connectivity to an Amazon VPC or Troubleshooting AWS Direct Connect.
After you create a serverless MSK cluster, the service automatically creates an interface VPC endpoint for the cluster. You can use the dig command as shown above to retrieve the VPC endpoint ID and its associated IP address, which confirms that you are now able to connect to the MSK Serverless cluster from your on-premises environment.
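On hosts where telnet or dig isn't installed, both checks can be approximated with the Python standard library. This is a minimal sketch: the helper names are hypothetical, it uses whatever resolver the operating system is configured with, and it tests DNS over TCP only. The IP address and hostname are the sample values used earlier in this post.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_ipv4(hostname: str) -> list:
    """Return the sorted, de-duplicated IPv4 addresses from the system resolver."""
    try:
        results = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []  # name could not be resolved
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    print("Resolver endpoint reachable:", tcp_port_open("10.1.0.133", 53))
    print(
        "Bootstrap resolves to:",
        resolve_ipv4("boot-abcdc9.c3.kafka-serverless.ap-southeast-2.amazonaws.com"),
    )
```

An empty list from `resolve_ipv4` suggests the conditional forwarder or the network path to the inbound endpoint needs another look.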
Test your Kafka client
Once you complete the configuration of the Route 53 inbound resolver endpoint and on-premises DNS server, you can test your Kafka client from an on-premises network. For instructions, refer to Create a client machine. This documentation guides you through the necessary steps to set up your client machine and verify that it can successfully connect to your MSK cluster from your on-premises network.
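As a starting point for that test, the following sketch writes a client.properties file for a Java-based Kafka client. It assumes the client uses IAM access control (the mechanism served on port 9098) and has the aws-msk-iam-auth library on its classpath. The function name and output file name are illustrative, so check the referenced documentation for the settings that apply to your client.

```python
def iam_client_properties() -> str:
    """Render Kafka client settings for IAM authentication as properties text."""
    props = {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "AWS_MSK_IAM",
        "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
        "sasl.client.callback.handler.class": (
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler"
        ),
    }
    # One key=value pair per line, in the order defined above.
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Assumed file name; pass it to the Kafka tools you run on the client.
    with open("client.properties", "w", encoding="utf-8") as handle:
        handle.write(iam_client_properties())
    print(iam_client_properties(), end="")
```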
MSK Serverless makes it easy for you to manage your data. You don’t have to set up and run your own Kafka cluster, which saves time and effort. In this post, we showed how to connect an on-premises client to MSK Serverless by combining a Route 53 inbound resolver endpoint with conditional forwarding on your on-premises DNS server. With this connection in place, your on-premises applications can take part in real-time analytics use cases built on MSK Serverless.
We encourage you to try on-premises connectivity with MSK Serverless.
About the Authors
Masudur Rahaman Sayem is a Streaming Data Architect at AWS. He works with AWS customers globally to design and build data streaming architectures to solve real-world business problems. He specializes in optimizing solutions that use streaming data services and NoSQL. Sayem is very passionate about distributed computing.
Akeef Khan is a Solutions Architect at Amazon Web Services. He helps SMB Greenfield customers adopt the cloud. Whilst being a generalist SA, Akeef is passionate about networking.