
I'm having issues with an Azure Container Instance (ACI) timing out when trying to connect to an Azure SQL database through a Private Endpoint. My setup, all provisioned with Terraform, is as follows:

Database

resource "azurerm_sql_server" "arc_azure_sql_server" {
  name                         = "azure-sql-server"
  resource_group_name          = data.azurerm_resource_group.arc_resource_group.name
  location                     = data.azurerm_resource_group.arc_resource_group.location
  administrator_login          = var.administrator_login
  administrator_login_password = var.administrator_login_password
  version                      = "12.0"
}

resource "azurerm_sql_database" "azure_sql_database" {
  name                = "database"
  server_name         = azurerm_sql_server.arc_azure_sql_server.name
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
  location            = data.azurerm_resource_group.arc_resource_group.location
}

Container Instance

resource "azurerm_container_group" "nginx_container" {
  name                = "nginx-container-group"
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
  location            = data.azurerm_resource_group.arc_resource_group.location
  ip_address_type     = "Private"
  os_type             = "Linux"
  subnet_ids          = [azurerm_subnet.subnet_b.id]

  container {
    name   = "nginx-container"
    image  = "${var.acr_repository}.azurecr.io/docker_image:latest"
    cpu    = "0.5"
    memory = "1.5"

    environment_variables = {
        "DB_HOST" = format("%s.database.windows.net", azurerm_sql_server.arc_azure_sql_server.name)
    }
    
    ports {
      port     = 80
      protocol = "TCP"
    }
  }
}

Virtual network

data "azurerm_virtual_network" "existing_vnet" {
  name                = "vnet-uks-dev"
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
}

resource "azurerm_subnet" "subnet_a" {
  name                 = "sql-sn"
  resource_group_name  = data.azurerm_resource_group.arc_resource_group.name
  virtual_network_name = data.azurerm_virtual_network.existing_vnet.name
  address_prefixes     = ["10.2.1.0/24"]
}

resource "azurerm_subnet" "subnet_b" {
  name                 = "aci-sn"
  resource_group_name  = data.azurerm_resource_group.arc_resource_group.name
  virtual_network_name = data.azurerm_virtual_network.existing_vnet.name
  address_prefixes     = ["10.2.2.0/24"]
  service_endpoints    = ["Microsoft.Sql"]
  delegation {
    name = "aci"
    service_delegation {
      name    = "Microsoft.ContainerInstance/containerGroups"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_subnet" "subnet_c" {
  name                 = "private-link-sn"
  resource_group_name  = data.azurerm_resource_group.arc_resource_group.name
  virtual_network_name = data.azurerm_virtual_network.existing_vnet.name
  address_prefixes     = ["10.2.3.0/24"]
}

DNS and private endpoint

resource "azurerm_private_dns_zone" "sql_private_dns_zone" {
  name                = "privatelink.database.windows.net"
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
}

resource "azurerm_private_dns_zone_virtual_network_link" "vnet_link" {
  name                  = "vnet-link-to-private-dns"
  resource_group_name   = data.azurerm_resource_group.arc_resource_group.name
  private_dns_zone_name = azurerm_private_dns_zone.sql_private_dns_zone.name
  virtual_network_id    = data.azurerm_virtual_network.existing_vnet.id
}

resource "azurerm_sql_virtual_network_rule" "allow_aci_subnet" {
  name                = "AllowACISubnet"
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
  server_name         = azurerm_sql_server.arc_azure_sql_server.name
  subnet_id           = azurerm_subnet.subnet_b.id
}

resource "azurerm_private_endpoint" "database_private_endpoint" {
  name                = "pe-database"
  location            = data.azurerm_resource_group.arc_resource_group.location
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
  subnet_id           = azurerm_subnet.subnet_c.id

  private_service_connection {
    name                           = "psc-sql"
    private_connection_resource_id = azurerm_sql_server.arc_azure_sql_server.id
    subresource_names              = ["sqlServer"]
  }

  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [azurerm_private_dns_zone.sql_private_dns_zone.id]
  }
}

Given this setup, my application within the ACI is experiencing timeouts when attempting to write to the Azure SQL database. What could be causing this, and how can I successfully establish the connection between the ACI and SQL database over the private endpoint?

I have built this with three subnets because private endpoints and container instances must each have a dedicated subnet that will not accommodate additional resources.

I have confirmed the database connection strings are correct. I have confirmed the server is deploying with azure-sql-server.database.windows.net as its server name.

  • There are various reasons why this may not be working. I would triage with the steps below: 1) Connect to the ACI console and run a verbose curl against the SQL endpoint; this shows whether the name resolves to a private or public IP and whether the port is open. 2) Check the NSGs on the SQL subnet for inbound rules that actively block the port. 3) Do you have a custom DNS resolver? If so, the resolver must have the SQL private DNS zone configured, otherwise the A records won't resolve. Commented Oct 26, 2023 at 7:20
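
To support step 1 of the comment above, it can help to know which address the server name should resolve to from inside the container. A minimal sketch, assuming the azurerm provider exposes `private_ip_address` on the endpoint's `private_service_connection` block; the output name is hypothetical:

```hcl
# Hypothetical output: prints the IP assigned to the private endpoint.
# It should fall inside the private-link-sn range (10.2.3.0/24).
output "sql_private_endpoint_ip" {
  value = azurerm_private_endpoint.database_private_endpoint.private_service_connection[0].private_ip_address
}
```

If the lookup from the ACI console returns a public address instead of this one, the private DNS zone is probably not being consulted.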

2 Answers


OK, after literally days of support from a Microsoft partner, we arrived at a solution. We deployed a NAT Gateway, which finally gave us a much more stable connection without the intermittent errors. For more information, see the Azure NAT Gateway documentation.
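
The answer does not include its configuration, so the following is only a sketch of what attaching a NAT Gateway to the ACI subnet could look like in Terraform. It assumes the question's `subnet_b` and resource group data source; all other resource and display names are hypothetical:

```hcl
# Static public IP used by the NAT Gateway for outbound traffic (hypothetical names).
resource "azurerm_public_ip" "nat_public_ip" {
  name                = "pip-nat-aci"
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
  location            = data.azurerm_resource_group.arc_resource_group.location
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_nat_gateway" "aci_nat_gateway" {
  name                = "nat-aci"
  resource_group_name = data.azurerm_resource_group.arc_resource_group.name
  location            = data.azurerm_resource_group.arc_resource_group.location
  sku_name            = "Standard"
}

# Bind the public IP to the gateway.
resource "azurerm_nat_gateway_public_ip_association" "nat_ip" {
  nat_gateway_id       = azurerm_nat_gateway.aci_nat_gateway.id
  public_ip_address_id = azurerm_public_ip.nat_public_ip.id
}

# Attach the gateway to the delegated ACI subnet from the question (assumed target).
resource "azurerm_subnet_nat_gateway_association" "aci_subnet_nat" {
  subnet_id      = azurerm_subnet.subnet_b.id
  nat_gateway_id = azurerm_nat_gateway.aci_nat_gateway.id
}
```

This only changes how outbound internet traffic leaves the subnet; traffic to the private endpoint stays inside the virtual network.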


2 Comments

But how did you get this working through a NAT gateway? A NAT gateway handles connections to the internet, not to internal destinations such as a private endpoint. You can't even route traffic through a NAT gateway, because it does not have a private IP.
You’re right. However, the situation was slightly different than it may sound at first glance. The intermittent connection errors weren’t directly caused by the private endpoint path itself but by how outbound connections were being handled from our container environment. What actually helped was assigning the ACI to a subnet integrated with a NAT Gateway, which ensured: 1. Stable outbound SNAT behavior (fixed outbound IPs, no ephemeral port exhaustion), 2. Proper routing for other external dependencies (like public DNS or telemetry endpoints), 3. And overall, fewer transient

See the solution above

Azure Container transient connection error

We encounter a similar transient error: every now and then, the connection to the SQL server through a private endpoint fails. In our case, we tried deploying the container with a different restart policy. The default is 'Always'; we changed it to '--restart-policy OnFailure', as it looked like an internal error in the container could be causing the connection timeout. However, it looks like the problem lies at Microsoft Azure's end. Have you resolved this issue by now?

We will open a ticket with Microsoft for this issue, as the errors occur on a daily basis. When we have a definite solution, we will post it here.
