2025年09月13日 mlops

使用 Azure SDK for Python 构建可扩展异步云集成应用

利用 Azure SDK for Python 的异步客户端、AAD 认证和批量操作，集成存储、计算和 AI 服务，实现高效云原生开发，提供代码示例与最佳实践。

内容加载中...

在云原生开发中，Python 应用需要高效处理高并发和大规模数据操作。Azure SDK for Python 提供了强大的异步支持，通过 .aio 模块实现非阻塞 I/O，这有助于构建可扩展的应用。特别是在集成 Azure 存储、计算和 AI 服务时，异步模式可以显著提升性能，避免线程阻塞问题。本文将聚焦于使用异步客户端、Azure Active Directory (AAD) 认证以及批量操作的实践，帮助开发者快速上手云集成。

AAD 认证的异步配置

认证是云集成的基础。Azure SDK 使用 azure.identity 库支持 AAD 认证，推荐采用 DefaultAzureCredential，它自动尝试多种凭据类型，如环境变量、CLI 登录或托管身份。在异步环境中，确保凭据对象支持异步关闭方法，以避免资源泄漏。

首先，安装必要包：

pip install azure-identity azure-storage-blob aio azure-mgmt-compute azure-ai-openai

配置环境变量（生产环境建议使用托管身份）：

AZURE_CLIENT_ID
AZURE_TENANT_ID
AZURE_CLIENT_SECRET（服务主体密钥）

示例代码展示异步客户端的认证初始化：

import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient

async def authenticate_and_create_client():
    credential = DefaultAzureCredential()
    blob_service_client = BlobServiceClient(
        account_url="https://yourstorageaccount.blob.core.windows.net",
        credential=credential
    )
    # 使用后关闭
    async with blob_service_client:
        # 操作示例
        pass
    await credential.close()

asyncio.run(authenticate_and_create_client())

此配置确保了安全的身份验证，支持角色-based 访问控制 (RBAC)。在计算服务中，类似地使用 azure-mgmt-compute 的异步客户端进行资源管理。注意，角色分配需至少 1-2 分钟传播时间，若遇认证失败，可稍后重试。

异步集成 Azure 存储服务

Azure Blob 存储是数据湖场景的首选。异步客户端 azure.storage.blob.aio 支持并发上传/下载，适用于大规模数据处理。

关键参数：

max_concurrency: 控制并发请求数，默认 1，建议设为 CPU 核心数的 2-4 倍，避免过载。
retry_total: 重试次数，默认 3，针对网络波动。
timeout: 请求超时，单位秒，推荐 300 秒用于大文件。

批量操作示例：并发上传多个 Blob。

import asyncio
from azure.storage.blob.aio import BlobServiceClient
from azure.identity.aio import DefaultAzureCredential

async def batch_upload_blobs():
    credential = DefaultAzureCredential()
    client = BlobServiceClient(
        account_url="https://yourstorageaccount.blob.core.windows.net",
        credential=credential
    )
    container_client = client.get_container_client("mycontainer")
    
    async def upload_blob(blob_name, data):
        blob_client = container_client.get_blob_client(blob_name)
        await blob_client.upload_blob(data, overwrite=True)
    
    tasks = [
        upload_blob(f"file_{i}.txt", f"data_{i}".encode()) for i in range(10)
    ]
    await asyncio.gather(*tasks)
    
    await credential.close()

asyncio.run(batch_upload_blobs())

此示例使用 asyncio.gather 实现并行上传，提高吞吐量。监控要点：使用 Azure Monitor 跟踪 Blob 操作延迟，若超过 5 秒，考虑增加重试或优化数据分片。回滚策略：上传前检查容器存在，若失败，删除部分文件。

异步管理 Azure 计算资源

对于计算服务，如虚拟机 (VM) 管理，使用 azure-mgmt-compute.aio 库的异步接口。长时间运行操作 (LRO) 返回 Poller 对象，支持 begin_create_or_update 等方法。

参数配置：

location: 资源位置，如 "eastus2"，选择低延迟区域。
vm_size: 实例规格，如 "Standard_D2s_v3"，根据负载调整。
polling_interval: 轮询间隔，默认 30 秒，生产环境可调至 10 秒加速反馈。

示例：异步创建 VM。

import asyncio
from azure.mgmt.compute.aio import ComputeManagementClient
from azure.identity.aio import DefaultAzureCredential

async def create_vm():
    credential = DefaultAzureCredential()
    compute_client = ComputeManagementClient(credential, subscription_id="your-sub-id")
    
    poller = await compute_client.virtual_machines.begin_create_or_update(
        resource_group_name="mygroup",
        vm_name="myvm",
        parameters={
            "location": "eastus",
            "hardware_profile": {"vm_size": "Standard_D2s_v3"},
            "storage_profile": {"image_reference": {"publisher": "Canonical", "offer": "UbuntuServer", "sku": "18.04-LTS", "version": "latest"}},
            "os_profile": {"computer_name": "myvm", "admin_username": "azureuser", "admin_password": "Password123!"},
            "network_profile": {"network_interfaces": [{"id": "/subscriptions/.../networkInterfaces/myNIC"}]}
        }
    )
    vm = await poller.result()
    print(f"VM created: {vm.id}")
    
    await credential.close()

asyncio.run(create_vm())

此操作非阻塞，允许应用继续处理其他任务。风险：VM 创建可能需 5-10 分钟，建议设置超时阈值 600 秒。清单：预创建网络接口、安全组；生产中启用自动缩放组 (ASG) 以动态调整实例。

集成 Azure AI 服务异步调用

AI 服务如 Azure OpenAI 支持异步客户端，适用于实时推理场景。使用 azure-ai-openai.aio 库处理批量提示生成。

参数优化：

max_tokens: 响应长度上限，设为 1000 以平衡成本和质量。
temperature: 创造性参数，0.2-0.5 用于确定性输出。
batch_size: 批量大小，推荐 50-100，根据 API 限额调整 (默认 1000 RPM)。

批量推理示例：

import asyncio
from openai.aio import AsyncOpenAI  # Azure OpenAI 兼容
from azure.identity.aio import DefaultAzureCredential
import os

async def batch_ai_inference():
    # 使用 AAD 或 API Key
    client = AsyncOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_KEY"),  # 或使用 credential
        api_version="2023-05-15"
    )
    
    async def generate(prompt):
        response = await client.chat.completions.create(
            model="gpt-35-turbo",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100,
            temperature=0.3
        )
        return response.choices[0].message.content
    
    prompts = [f"Explain async in Python {i}" for i in range(5)]
    tasks = [generate(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    
    for res in results:
        print(res)
    
    await client.close()

asyncio.run(batch_ai_inference())

此模式适合微服务架构中 AI 驱动的决策。引用官方文档："Azure OpenAI 支持异步客户端以实现高效的并发请求处理。" 监控：跟踪令牌使用率，若超 80%，实施限流。回滚：缓存常见响应，失败时回退到本地模型。

最佳实践与监控

构建可扩展应用时，结合以上服务：

事件驱动架构：使用 Azure Service Bus 异步消息队列解耦存储与 AI 处理。
资源管理：所有异步客户端使用 context manager (async with) 确保清理。
错误处理：捕获 azure.core.exceptions 异常，重试策略使用 exponential backoff (初始 1 秒，最大 60 秒)。
性能调优：在 asyncio.run 中设置 loop policy 为 WindowsEventLoopPolicy (Windows 上)；测试并发上限，避免 API 节流。
安全清单：启用 RBAC 最小权限；密钥轮换每 90 天；日志使用 azure-monitor-opentelemetry。

潜在风险：异步代码调试复杂，建议使用 logging 和 tracing。限额：存储批量上传上限 5000 Blob/批，AI RPM 视模型而定。

通过这些实践，开发者可高效集成 Azure 服务，实现云原生 Python 应用。实际部署时，从小规模 POC 开始，逐步扩展至生产环境，确保监控与 CI/CD 管道集成。

（字数：1256）