Given that you have already created an ECS cluster, AWS provides instructions on Scaling cluster instances with CloudWatch Alarms.
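Assuming you want to scale the cluster based on memory reservation, at a high level you need to do the following:
- Create a launch configuration for your Auto Scaling group.
- Create an Auto Scaling group so that the cluster can scale out and in.
- Create a CloudWatch alarm to scale the cluster up if memory reservation is over 70%.
- Create a CloudWatch alarm to scale the cluster down if memory reservation is under 30%.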
Because it's more my specialty, I wrote an example CloudFormation template that should get you started on most of this:
Parameters:
  MinInstances:
    Type: Number
  MaxInstances:
    Type: Number
  InstanceType:
    Type: String
    AllowedValues:
      - t2.nano
      - t2.micro
      - t2.small
      - t2.medium
      - t2.large
  VpcSubnetIds:
    Type: String
Mappings:
  EcsInstanceAmis:
    us-east-2:
      Ami: ami-1c002379
    us-east-1:
      Ami: ami-9eb4b1e5
    us-west-2:
      Ami: ami-1d668865
    us-west-1:
      Ami: ami-4a2c192a
    eu-west-2:
      Ami: ami-cb1101af
    eu-west-1:
      Ami: ami-8fcc32f6
    eu-central-1:
      Ami: ami-0460cb6b
    ap-northeast-1:
      Ami: ami-b743bed1
    ap-southeast-2:
      Ami: ami-c1a6bda2
    ap-southeast-1:
      Ami: ami-9d1f7efe
    ca-central-1:
      Ami: ami-b677c9d2
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
  # Role assumed by the container instances so the ECS agent can register with the cluster
  Role:
    Type: AWS::IAM::Role
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sts:AssumeRole
            Principal:
              Service:
                - ec2.amazonaws.com
  InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref Role
  LaunchConfiguration:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: !FindInMap [EcsInstanceAmis, !Ref "AWS::Region", Ami]
      InstanceType: !Ref InstanceType
      IamInstanceProfile: !Ref InstanceProfile
      # Point the ECS agent on each new instance at the cluster created above
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          echo ECS_CLUSTER=${Cluster} >> /etc/ecs/ecs.config
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: !Ref MinInstances
      MaxSize: !Ref MaxInstances
      LaunchConfigurationName: !Ref LaunchConfiguration
      HealthCheckGracePeriod: 300
      HealthCheckType: EC2
      # VpcSubnetIds is a comma-separated list of subnet IDs
      VPCZoneIdentifier: !Split [",", !Ref VpcSubnetIds]
  ScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref AutoScalingGroup
      # Cooldown is in seconds; '1' is very aggressive, so tune it for your workload
      Cooldown: '1'
      ScalingAdjustment: '1'
  MemoryReservationAlarmHigh:
    Type: AWS::CloudWatch::Alarm
    Properties:
      EvaluationPeriods: '2'
      Statistic: Average
      Threshold: '70'
      AlarmDescription: Alarm if Cluster Memory Reservation is too high
      Period: '60'
      AlarmActions:
        - Ref: ScaleUpPolicy
      Namespace: AWS/ECS
      Dimensions:
        - Name: ClusterName
          Value: !Ref Cluster
      ComparisonOperator: GreaterThanThreshold
      MetricName: MemoryReservation
  ScaleDownPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref AutoScalingGroup
      Cooldown: '1'
      ScalingAdjustment: '-1'
  MemoryReservationAlarmLow:
    Type: AWS::CloudWatch::Alarm
    Properties:
      EvaluationPeriods: '2'
      Statistic: Average
      Threshold: '30'
      AlarmDescription: Alarm if Cluster Memory Reservation is too low
      Period: '60'
      AlarmActions:
        - Ref: ScaleDownPolicy
      Namespace: AWS/ECS
      Dimensions:
        - Name: ClusterName
          Value: !Ref Cluster
      ComparisonOperator: LessThanThreshold
      MetricName: MemoryReservation
This creates an ECS cluster, a launch configuration, an Auto Scaling group, and alarms based on the ECS memory reservation.
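The AMI IDs in the Mappings section are the ECS-optimized AMIs that were current when this template was written, and they will go stale. One possible alternative, sketched here as an assumption rather than as part of the original template, is to drop the Mappings section and let CloudFormation resolve the recommended ECS-optimized Amazon Linux 2 AMI from AWS's public SSM parameters at deploy time. The parameter name EcsAmiId is hypothetical:

```yaml
Parameters:
  # Hypothetical parameter name; CloudFormation resolves its value at deploy time
  # from AWS's public SSM parameter for the recommended ECS-optimized
  # Amazon Linux 2 AMI in the stack's region
  EcsAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id
Resources:
  LaunchConfiguration:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      # Replaces !FindInMap [EcsInstanceAmis, !Ref "AWS::Region", Ami]
      ImageId: !Ref EcsAmiId
      # ...remaining properties unchanged
```

With this in place, the stack looks up the current recommended AMI for whichever region it is deployed in, each time it is created or updated.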
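Now we can get into the fun discussion.
Why can't we scale on both CPU utilization and memory reservation?
The short answer is that you totally can, but you're likely to pay a lot for it. At the time of writing, EC2 billed partial instance-hours as full hours, so every instance you launched cost at least one hour. Why that matters: imagine you have multiple alarms. Say you have a bunch of services that are currently idling, and you fill the cluster. Either the CPU alarm scales the cluster down, or the memory alarm scales it up. One of these will likely scale the cluster to the point where its own alarm is no longer triggered. After the cooldown period, the other alarm will undo that last action, and after the next cooldown, the action will likely be redone. Thus instances are created and then destroyed repeatedly, on every other cooldown.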
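After thinking about this for a while, the strategy I came up with was to use Application Auto Scaling for ECS services based on CPU utilization, and to scale the cluster based on memory reservation. So if a service is running hot, an extra task is added to share the load. This slowly fills up the cluster's memory reservation capacity. When the memory gets full, the cluster scales up. When a service cools down, it starts shutting down tasks. As the memory reservation on the cluster drops, the cluster is scaled down.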
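The service side of this strategy can be expressed in the same CloudFormation style. The resources below are a minimal sketch to add under the template's Resources section, not the original setup: they assume a hypothetical AWS::ECS::Service resource named Service running on Cluster, and they use a target-tracking policy on the predefined ECSServiceAverageCPUUtilization metric, which may differ from the alarm-driven scaling originally used. The capacities, target value, cooldowns, and the service-linked role ARN are assumptions to adjust for your account.

```yaml
  ServiceScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      ServiceNamespace: ecs
      ScalableDimension: ecs:service:DesiredCount
      # Format is service/<cluster name>/<service name>; "Service" is a
      # hypothetical AWS::ECS::Service resource, not defined in the template above
      ResourceId: !Sub service/${Cluster}/${Service.Name}
      MinCapacity: 1
      MaxCapacity: 10
      # Assumed: the service-linked role Application Auto Scaling uses for ECS
      RoleARN: !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService
  ServiceCpuScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: ServiceCpuTargetTracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ServiceScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        # Adds tasks when average service CPU is above the target, removes them when below
        TargetValue: 60
        # Both cooldowns are in seconds
        ScaleOutCooldown: 60
        ScaleInCooldown: 300
```

Target tracking creates and manages its own CloudWatch alarms on the service, so tasks scale on CPU while the cluster alarms in the template keep scaling instances on memory reservation. Only the memory alarms act on the Auto Scaling group, which avoids two alarms fighting over it.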
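The thresholds for the CloudWatch alarms might need some experimentation, depending on your task definitions. The reason is that if you set the scale-up threshold too high, the cluster might not scale up as memory is consumed, and then when autoscaling goes to place another task, it will find that no instance in the cluster has enough memory available, so the task cannot be placed.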
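A rough way to sanity-check the scale-up threshold is to work backwards from your largest task. Let R be the cluster's MemoryReservation (in percent), M the memory each container instance registers with ECS, and m the memory reserved by your largest task definition (both in MiB). This assumes all instances are the same type, as they are in the Auto Scaling group above. Since at least one instance always has at least the cluster's average free memory, the task is guaranteed to fit, memory-wise, as long as:

$$
\left(1 - \frac{R}{100}\right) M \ge m
\quad\Longleftrightarrow\quad
R \le 100\left(1 - \frac{m}{M}\right)
$$

For example, with hypothetical values M = 4096 and m = 1024, the bound is R ≤ 75, so a 70% scale-up threshold leaves a little headroom. The memory an instance registers with ECS is somewhat less than the instance type's physical memory, and tasks can still be placed while the alarm waits out its evaluation periods, so it is worth setting the threshold somewhat below the bound. CPU reservation and port conflicts can also block placement independently of memory.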
Which is more important, CPU or memory? What should happen if CPU says scale up but memory says scale down? –
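Thanks Jamie. I'd say memory, in my case. In ECS there is a dashboard that shows how many CPU units and how much memory are allocated vs. unallocated. I believe this is based on the containers' task definitions. I'd like to scale based on these ECS metrics – codeshark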