Inspecting a resource type
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes", or run kubectl proxy and browse the endpoint in a browser
Controller logic (JobController as an example)
JobController's logic is fairly simple, which makes it a good example of how a Controller is implemented
serviceaccount_controller and tokens_controller
- serviceaccount_controller: ensures every namespace has a default ServiceAccount, e.g. the configured "default"
- tokens_controller: ensures each ServiceAccount has a corresponding token Secret
Kubernetes feature gates
Whether each feature is on by default, and how mature it currently is
pkg/features/kube_features.go
Where the kubectl code lives
In kubectl, auth/convert/cp/get live under k8s.io/kubernetes/pkg, while the rest of the code lives under k8s.io/kubectl. This is because everything used to live in k8s.io/kubernetes/pkg and is gradually being moved to the staging directory; the move is not finished yet
How kubectl is implemented
The core of kubectl is vendor/k8s.io/cli-runtime; the most important piece is vendor/k8s.io/cli-runtime/pkg/resource/builder.go
Build the builder -> set builder parameters -> Do() sets up the visitors -> Infos() retrieves and decorates the results
type RESTClientGetter interface {
// ToRESTConfig returns restconfig
ToRESTConfig() (*rest.Config, error)
// ToDiscoveryClient returns discovery client
// DiscoveryInterface holds the methods that discover server-supported API groups,
// versions and resources.
ToDiscoveryClient() (discovery.CachedDiscoveryInterface, error)
// ToRESTMapper returns a restmapper
// RESTMapper allows clients to map resources to kind, and map kind and version
// to interfaces for manipulating those objects. It is primarily intended for
// consumers of Kubernetes compatible REST APIs as defined in docs/devel/api-conventions.md.
ToRESTMapper() (meta.RESTMapper, error)
// ToRawKubeConfigLoader returns the kubeconfig loader as-is
ToRawKubeConfigLoader() clientcmd.ClientConfig
}
// Result contains helper methods for dealing with the outcome of a Builder.
type Result struct {
err error
visitor Visitor
sources []Visitor
singleItemImplied bool
targetsSingleItems bool
mapper *mapper
ignoreErrors []utilerrors.Matcher
// populated by a call to Infos
info []*Info
}
kubelet core components
Figure from https://feisky.gitbooks.io/kubernetes/components/kubelet.html (in the figure, after kubelet there is also a ContainerManager (an easily confused name) that sets up cgroup and device resource information before genericRuntimeManager is invoked)
- PodWorkers: podWorkers handle syncing Pods in response to events.
- kubepod.Manager: podManager is a facade that abstracts away the various sources of pods this Kubelet services.
- eviction.Manager: Needed to observe and respond to situations that could impact node stability
- kubecontainer.ContainerCommandRunner: runs a command in a container, i.e. exec in container
- cadvisor: resource monitoring
- dnsConfigurer: setting up DNS resolver configuration when launching pods
- VolumePluginMgr: Volume plugins.
- probeManager/livenessManager: Handles container probing/ Manages container health check results.
- kubecontainer.ContainerGC: Policy for handling garbage collection of dead containers.
- images.ImageGCManager: Manager for image garbage collection.
- logs.ContainerLogManager: Manager for container logs.
- secret.Manager: Secret manager
- configmap.Manager: ConfigMap manager.
- certificate.Manager: Handles certificate rotations.
- status.Manager: Syncs pods statuses with apiserver; also used as a cache of statuses.
- volumemanager.VolumeManager: attach/mount/unmount/detach volumes for pods
- cloudprovider.Interface
- cloudresource.SyncManager
- kubecontainer.Runtime: Container runtime, GetPods/SyncPod/KillPod/GetPodStatus/ImageService....
- kubecontainer.StreamingRuntime: GetExec/GetAttach/GetPortForward
- RuntimeService:
- ContainerManager(Create/Start/Stop/List/Exec...Container)
- PodSandboxManager(Run/Stop/Remove..PodSandbox)
- ContainerStatsManager
- PodLifecycleEventGenerator: Generates pod events.
- oomwatcher.Watcher
- cm.ContainerManager: Start/SystemCgroupsLimit/GetNodeConfig/GetMountedSubsystems/GetQOSContainersInfo...
- pluginmanager.PluginManager
kubelet entry goroutines
kubelet.go
- ListenAndServe/ListenAndServeReadOnly: serve ports 10250/10255
- ListenAndServePodResources: a gRPC server to serve the PodResources service
- For serviceIndexer/nodeIndexer: get local cache for service and node
- containerGC/imageManager.GarbageCollection: periodic GarbageCollect; calls kubeGenericRuntimeManager.containerGC evictContainers/evictSandboxes/evictPodLogsDirectories and realImageGCManager.GarbageCollect
- pluginManager.Run: CSIPlugin/DevicePlugin
- cloudResourceSyncManager: sync node address
- volumeManager: runs a set of asynchronous loops that figure out which volumes need to be attached/mounted/unmounted/detached based on the pods scheduled on this node and makes it so.
- syncNodeStatus/fastStatusUpdateOnce/nodeLeaseController: updateNodeStatus; two reporting mechanisms; the lease is lightweight and less likely to fail when the cluster's data volume is large
- updateRuntimeUp: every 5s , initializing the runtime dependent modules when the container runtime first comes up
- podKiller: every 1s, Start a goroutine responsible for killing pods (that are not properly handled by pod workers).
syncLoopIteration arguments:
- configCh: a channel to read config events from (from the file/http/apiserver sources)
- handler: the SyncHandler to dispatch pods to (syncs state)
- syncCh: a channel to read periodic sync events from
- housekeepingCh: a channel to read housekeeping events from
- plegCh: a channel to read PLEG updates from (container state changes: ContainerStarted/Died/Removed/...)
cgroup layout
https://zhuanlan.zhihu.com/p/38359775
# ubuntu 16.04; kubernetes v1.10.5
ubuntu@VM-0-12-ubuntu:~$ systemd-cgls
Control group /:
-.slice
├─init.scope
│ └─1 /sbin/init
├─system.slice
│ ├─avahi-daemon.service
│ │ ├─1268 avahi-daemon: running [VM-0-12-ubuntu.local
│ │ └─1283 avahi-daemon: chroot helpe
| | -- (omitted)
│ ├─dockerd.service
│ │ ├─ 5134 /usr/bin/dockerd --config-file=/etc/docker/daemon.json
│ │ ├─ 5143 docker-containerd --config /var/run/docker/containerd/containerd.toml
│ │ └─29537 docker-containerd-shim -namespace moby -workdir /data/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/303a0718c84995350d835f6e2d17036
| | | (omitted)
│ ├─accounts-daemon.service
│ │ └─1262 /usr/lib/accountsservice/accounts-daemon
| | -- (omitted)
│ ├─NetworkManager.service
│ │ └─1287 /usr/sbin/NetworkManager --no-daemon
│ ├─kubelet.service
│ │ └─5239 /usr/bin/kubelet --cluster-dns=10.15.255.254 --network-plugin=cni --kube-reserved=cpu=80m,memory=1319Mi --cloud-config=/etc/kubernetes/qcloud.conf
│ ├─rsyslog.service
│ │ └─1251 /usr/sbin/rsyslogd -n
| | (omitted)
│ └─acpid.service
│ └─1293 /usr/sbin/acpid
├─user.slice
│ └─user-500.slice
│ ├─session-129315.scope
│ │ ├─27862 sshd: ubuntu [priv]
│ └─user@500.service
│ └─init.scope
│ ├─27870 /lib/systemd/systemd --user
│ └─27871 (sd-pam)
└─kubepods
├─burstable
│ ├─pod5645ed58-e98f-11e9-8443-52540087514c
│ │ ├─1f8f76dacb8334bd8d8ab2a7432d2cc250286ca6b5b73ab6dca9a845b77a3a09
│ │ │ └─8958 /configmap-reload --webhook-url=http://localhost:9090/-/reload --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
└─besteffort
├─pod3cf3ae0d-b7f4-11e9-8443-52540087514c
│ ├─fde2178c5fa634206c2c86756c107c3de2828d2f90e2ea4c6a3b57f50c25267c
│ │ └─5435 /pause
│ └─5b4082efeb73ad102cc3fea33ff4c931c042a7120f0cd5277d46660aedffffde
│ ├─ 5663 sh /install-cni.sh
│ └─20347 sleep 3600
APIserver structure
A good reference: https://note.youdao.com/ynoteshare1/index.html?id=63f58c5e98634c8b3df9da2b024aacd5&type=note
Key flow
- CreateKubeAPIServer
- completedConfig.InstallLegacyAPI: the api/all and api/legacy flags control all APIs and legacy APIs respectively
- completedConfig.InstallAPIs
- apiGroupInfo = restStorageBuilder.NewRESTStorage: the key element is VersionedResourcesStorageMap map[string]map[string]rest.Storage: {"v1beta1": {"deployments": deploymentStorage.Deployment}}
- Taking "apps" as an example: if v1 is enabled: storageMap = RESTStorageProvider(storage_app).v1Storage
- deploymentStorage = deploymentstore.NewStorage; storage["deployments"] = deploymentStorage.Deployment; deploymentStorage consists of XXXREST elements, explained below
- GenericAPIServer.InstallAPIGroups
- s.installAPIResources: the core method that installs APIs, wiring the API paths to their storage
- apiGroupVersion.InstallREST
- installer.Install()
- registerResourceHandlers: associates every path in the storage map with its storage
- e.g. actions = appendIf(actions, action{"GET", itemPath, nameParams, namer, false}, isGetter)
- handler = restfulGetResource(getter, exporter, reqScope)
- route := ws.GET(action.Path).To(handler).Doc(doc)....
- s.DiscoveryGroupManager.AddGroup
- s.Handler.GoRestfulContainer.Add(discovery.NewAPIGroupHandler(s.Serializer, apiGroup).WebService())
// NewREST returns a RESTStorage that will work against deployments.
func NewREST(optsGetter generic.RESTOptionsGetter) (*REST, *StatusREST, *RollbackREST, error) {
store := &genericregistry.Store{
NewFunc: func() runtime.Object { return &apps.Deployment{} },
NewListFunc: func() runtime.Object { return &apps.DeploymentList{} },
DefaultQualifiedResource: apps.Resource("deployments"),
CreateStrategy: deployment.Strategy,
UpdateStrategy: deployment.Strategy,
DeleteStrategy: deployment.Strategy,
TableConvertor: printerstorage.TableConvertor{TableGenerator: printers.NewTableGenerator().With(printersinternal.AddHandlers)},
}
options := &generic.StoreOptions{RESTOptions: optsGetter}
if err := store.CompleteWithOptions(options); err != nil {
return nil, nil, nil, err
}
statusStore := *store
statusStore.UpdateStrategy = deployment.StatusStrategy
return &REST{store, []string{"all"}}, &StatusREST{store: &statusStore}, &RollbackREST{store: store}, nil
}
type REST struct {
*genericregistry.Store
categories []string
}
genericregistry.Store defines NewList, New, CreateStrategy, UpdateStrategy
The core is DryRunnableStorage: the storage.Interface inside DryRunnableStorage is the actual CRUD entry point to the backing store
type DryRunnableStorage struct {
Storage storage.Interface
Codec runtime.Codec
}
Storage is the Cacher struct { real storage -> etcd3/store }
generic.StoreOptions.RESTOptions determines the backend store. It is part of completedConfig (genericapiserver.CompletedConfig) and is passed down layer by layer from the top: buildGenericConfig <- createAggregatorConfig, master.Config -> completedConfig
In the end you find that generic.RESTOptions.Decorator = genericregistry.StorageWithCacher(cacheSize), i.e. an etcd backend with a cache (when EnableWatchCache is on, which defaults to true)
The cache implementation is in vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go
Below we look at this cache implementation in detail.
The cache implementation inside the apiserver
Take watch as an example; the consumer is vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go
| Action | Handling |
|---|---|
| Create | etcd3/store: Create |
| Delete | etcd3/store: Delete |
| Watch | registers a watcher with etcd3 to receive events; served from the cache |
| Get | when resourceVersion == "", go straight to the store; otherwise serve from the cache (waiting until the cache reaches that resourceVersion) |
| List | similar to Get |
Debug Etcd
# download etcd
ETCD_VER=v3.4.0
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz && mkdir -p /tmp/etcd-download-test && tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1 && rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
# set up the environment
export ETCDCTL_CERT=/etc/kubernetes/certs/kube-apiserver-etcd-client.crt
export ETCDCTL_KEY=/etc/kubernetes/certs/kube-apiserver-etcd-client.key
export ETCDCTL_CACERT=/etc/kubernetes/certs/kube-apiserver-etcd-ca.crt
export ETCDCTL_ENDPOINTS=https://etcd.cls-4lr4c4wx.ccs.tencent-cloud.com:2379
etcdctl get "" --prefix=true --limit=1 # get a key and its value
etcdctl get "" --prefix=true --keys-only --limit=100 # get keys only
etcdctl get "/cls-4lr4c4wx/pods" --prefix=true --keys-only --limit=10 # get pod keys; here cls-4lr4c4wx is the etcd prefix
etcdctl get "/cls-4lr4c3wx/configmaps" --prefix=true --limit 1 --write-out="json" # output as JSON
What the watch bookmark event in 1.16 means
For example, a client watching pods:
GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245&allowWatchBookmarks=true
---
200 OK
Transfer-Encoding: chunked
Content-Type: application/json
{
"type": "ADDED",
"object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10596", ...}, ...}
}
{
"type": "BOOKMARK",
"object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"} }
}
If the watcher then restarts, a watcher that received the BOOKMARK can resume watching from resourceVersion=12746, while one that did not can only resume from resourceVersion=10596, even though between 10596 and 12746 there were no events it cared about anyway.
How the apiserver implements the Aggregator
The aggregator itself is also a controller