从 BGP 到街道:一个 IP 归属地地理库到底是怎么建出来的

作者: forthxu
时间: June 11, 2026
分类: 默认分类

很多人对 IP 定位有个朴素的想象:既然每个 IP 都唯一,那查一下它在哪儿,应该像查身份证一样简单。等真要动手建一个 IP 归属地地理库,才会发现这件事远比想象的拧巴——而拧巴的根源,是把两件本质不同的东西混为了一谈。

这篇文章想把这件事讲透:一个能用的 IP 地理库,到底由哪些部分组成、每一层用什么方法、精度的天花板在哪里,以及哪些"街道级"其实是虚假精度。

第一性原理:路由身份 ≠ 物理位置

先立一个贯穿全文的认知:IP 的"路由身份"和"物理位置"是两个维度,必须解耦。

BGP 路由表告诉你的是"这个前缀由哪个 AS 宣告、经哪些 AS 可达"——这是逻辑的、三层的。一个 AS(自治系统)是一个统一路由策略下的管理实体,它跟地理没有必然关系:一个 AS 可以横跨几十个国家,一栋机房里也可以挤着好几个 AS。所以"这个 IP 属于哪个 AS"回答的是"谁在路由它",而不是"它在哪"。

更要命的是,一个被宣告的 CIDR 前缀不等于一个物理位置。运营商常把地理分散的小块聚合成一个大前缀宣告;一个 /16 可能服务一整个省、成千上万个互相隔离的局域网。哪怕是一个 /24(256 个地址),里面的地址也可能散在不同城市——这不是理论,而是经典算法 GeoCluster 真实遇到的情况:一个 /24 里 10 个 IP 在西雅图、1 个在波士顿。

所以物理位置是一份单独的事实,必须靠地理库或网络测量得到,而不能从"哪个 AS 宣告了它"反推。理解了这一点,整个地理库的架构就顺了:BGP 给骨架,别的方法给血肉。

地基:BGP/RIB 能做什么,不能做什么

虽然 BGP 给不了精确位置,但它是地理库不可或缺的地基。从公共采集点(RouteViews、RIPE RIS)的 RIB dump 里,你能拿到全球真实在路由的前缀全集 + 每个前缀的源 AS。这有几个关键作用:

提供"行"的单元。地理库本质是一张"前缀 → 位置"的表;用 BGP 里实际宣告的前缀当行,而不是凭空切块,保证和真实网络一致、能按最长前缀匹配查询。
源 AS → 机构和类型。把源 AS 映射到运营商名和类型(家宽 / IDC / 移动 / 云 / CDN),这决定了"该用什么策略去定位"——家宽和机房的定位逻辑完全不同。同时 AS 的注册国家给了一个国家级锚点。
anycast 检测。一个前缀若被多个源 AS、从分散位置宣告,就该标记为 anycast——这种前缀没有"一个位置",任何定位尝试都是错的。
过滤与保鲜。BGP 告诉你哪些地址实际在用,变更 feed 还能在前缀拆分、源 AS 改变时提示你"该重新核对位置了"。
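A minimal sketch of the "prefix as row" idea with longest-prefix matching, using only Python's standard ipaddress module. The prefixes and AS numbers below are made-up documentation values, not real RIB data, and the linear scan is for clarity only; a production table would use a trie or sorted intervals.

```python
import ipaddress

# Hypothetical rows (prefix, origin ASN); real rows would come from a RIB dump.
ROWS = [
    ("203.0.113.0/24", 64501),
    ("203.0.0.0/16", 64500),
    ("198.51.100.0/24", 64502),
]

def build_table(rows):
    """Parse each prefix once and sort most-specific (longest) first."""
    table = [(ipaddress.ip_network(prefix), asn) for prefix, asn in rows]
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table

def longest_prefix_match(table, ip):
    """Return (prefix, asn) for the most specific covering prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for net, asn in table:
        if addr.version == net.version and addr in net:
            return str(net), asn
    return None

TABLE = build_table(ROWS)
print(longest_prefix_match(TABLE, "203.0.113.5"))
```

Because the table is sorted by prefix length, the first hit is the longest match, so an address inside the /24 never falls through to the covering /16.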

但 BGP 的天花板,大致就是国家级。想到城市、街道,得换方法。

方法分层:从国家到街道

真正决定精度的,是叠在前缀之上的地理信号。按精度从低到高,方法是这样的:

国家级:免费的兜底

RIR 委派数据和 whois 注册信息,免费,只能到国家级(whois 偶尔到省)。它们的价值是兜底——保证再差也有个国家。注意这是"注册/分配"国家,不一定等于实际使用地,所以连国家级都可能错,只能当下限。

城市级:务实的主战场

这一层有四类方法,组合起来能稳到城市:

1. 商业地理库(务实主力)。 MaxMind、IP2Location,以及国内城市级最准的 ipdb 等。它们已经替你做完了测量和挖掘,直接给"前缀 → 城市"。对绝大多数自建者,这应该是城市级基线——买,而不是重复造轮子。

2. rDNS 主机名挖掘。 很多运营商基础设施的反向 DNS 名里藏着城市码或机场码,比如路由器名里的 sjc 指向圣何塞。批量解析这些 PTR 记录,对 IDC 和骨干设施很有效;但对家宽用户 IP 基本无用。

3. 接入边缘 / BRAS 锚定(家宽的正解)。 这是个聪明的反转。家宽用户 IP 又动又散,直接定位很难;但用户前面的接入设施(BRAS、城域网核心、OLT 局端)是物理固定在某个城市机房里的,而且用户挂在本地城域网下,所以接入边缘的城市 ≈ 用户的城市。做法是:先把"接入设施 IP → 城市"这层地标建准(traceroute 采中间跳 + rDNS + 测量 + 商业库交叉),之后给海量用户定位,只是查"它在哪个接入边缘后面"。这把"定位百万移动目标"降维成"定位少数稳定接入点再传导",能稳到城市级。

唯一要警惕的是 BRAS 集中化:运营商把接入设备集中云化后,接入边缘可能远离用户(整省几个点),这时要识别出来、降级到省级,别硬标城市。

4. 时延测量 + 基准点。 也就是 CBG / TBG 这类算法,适合做区域定位和交叉验证。这一类的原理值得单独说,因为它常被误解。

测量的真相:延迟给的是"上界",不是距离

有人会问:网络速度不固定,延迟会被排队、绕路、抖动污染,怎么可能从延迟测出距离?

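Fair point: you cannot measure an exact distance. But measurement-based geolocation is not after an exact distance; it is after an upper bound on distance. Every source of contamination in a delay can only make it larger; none can make it smaller than the time light needs to travel a straight line through fiber. Turn that around: a measured RTT implies a maximum possible distance. Light in fiber covers roughly 200 km per millisecond, and an RTT counts the path twice, so each millisecond of RTT bounds the radius at about 100 km. The target cannot be farther than that.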

有了上界,再用几招收紧:多个位置已知的基准点各给一个上界圆,目标在所有圆的交集里(基准点越多交集越小);就近基准点约束最紧;多次测量取最小值滤掉瞬时拥塞;概率模型(Spotter 等)直接从数据里学"延迟-距离"的真实分布,而不假设线性。最终得到的是一个带概率的区域,不是一个确定的距离值。精度由基准点的密度和邻近度决定——这也是为什么城区准、偏远差。
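The upper-bound reasoning can be sketched in a few lines. This is a simplified illustration of the constraint idea, not the full CBG algorithm: each landmark's minimum RTT gives a circle, and a candidate location is feasible only if it lies inside every circle. The function names and the landmark coordinates used in the example are assumptions made for illustration.

```python
import math

FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber, as stated above

def max_distance_km(rtt_ms):
    """Upper bound on one-way distance implied by a round-trip time."""
    return rtt_ms / 2.0 * FIBER_KM_PER_MS

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_feasible(candidate, measurements):
    """candidate: (lat, lon); measurements: list of (lat, lon, [rtt_ms, ...])."""
    lat, lon = candidate
    for lm_lat, lm_lon, rtts in measurements:
        bound = max_distance_km(min(rtts))  # minimum RTT filters transient congestion
        if haversine_km(lat, lon, lm_lat, lm_lon) > bound:
            return False
    return True

# Hypothetical landmark near Beijing, candidate near Shanghai (about 1070 km apart).
shanghai = (31.2, 121.5)
print(is_feasible(shanghai, [(39.9, 116.4, [30.0, 12.0, 15.0])]))
```

Adding more landmarks only ever shrinks the feasible region, which is why landmark density drives accuracy.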

街道级:本质是地面真值工程

这是天花板,也是虚假精度的重灾区。要说一句实话:纯网络测量基本到不了家宽的街道级——末跳接入延迟的噪声比街道尺度的信号还大,基准点也不可能密到每条街。能到街道级的,几乎都靠地面真值数据:

众包 GPS-IP 配对:在用户授权下,通过 App 采集「精确 GPS + 当前公网 IP」,汇聚成质心。这是端用户街道级唯一可规模化的途径,也是准确商业库的真正底牌。需要 App 触达和严格合规。
应用层地址数据:自有业务里"收货地址 + 当时 IP"的配对,很强,但仅自有数据可用,且隐私敏感。
机房设施识别:对服务器类 IP,通过 rDNS、已知 DC 地址段、PeeringDB 里登记的设施地址,把服务器映射到具体机房 → 楼/街道级。这是对 IDC 很实用的一条正经路径。

对家宽,如果拿不到真值,就老实停在城市级,不要堆一个看着像样、实则是猜测的"街道地址"。

位置会"静默漂移"

还有一个容易被忽略的坑:即便前缀和源 AS 都不变,地址实际落在哪个机房也会变——地址重分配、VM 迁移、运营商换服务机房、地址租赁转让……这些里最危险的一种,是"运营商仍用同一个 AS 宣告同一前缀,只是把服务机房换了":RIB 行纹丝不动,地理数据却悄悄过期了,你从路由上完全察觉不到。

结论是:地理层必须独立、持续地刷新,不能假设"RIB 没变 = 位置没变"。靠 rDNS 变化、时延突变、whois 更新等信号触发重核。

数据模型:把"稳"和"流动"分开存

这些认知直接决定了表结构。核心是把路由身份(稳)和物理位置(流动、可切分、带时间戳)分两层:

prefix:路由锚点(前缀 → 源 AS),来自 RIB,变化慢,当行索引。
asn:分类器(机构 + 类型 + 注册国),决定定位策略和国家兜底。
geo_segment:物理位置层。一个前缀下可挂多条——这就天然支持"一个前缀跨多机房",每条带 source / confidence / measured_at,可独立刷新、可被取代。
geo_resolved:把 geo_segment 按精度和置信度合并成不重叠区间,运行时只查它,单条命中最快。
change_log:记录 RIB 变更和探测异常,驱动地理层重核。

实现上,IP 统一按 16 字节存(IPv4 映射成 ::ffff: 形式),用区间做最长匹配。每条地理判定都带置信度和时间戳——位置是会变的事实,不是和前缀绑死的常量。
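A small sketch of the storage idea, under stated assumptions: every address becomes a 16-byte key (IPv4 stored as IPv4-mapped IPv6), and a geo_resolved-style table of non-overlapping intervals is searched with bisect. The two rows, the city names, and the confidence values are invented for illustration.

```python
import bisect
import ipaddress

def to_key(ip):
    """Return a 16-byte key; IPv4 is stored as ::ffff:a.b.c.d."""
    addr = ipaddress.ip_address(str(ip))
    if addr.version == 4:
        addr = ipaddress.IPv6Address("::ffff:" + str(addr))
    return addr.packed

def range_of(prefix):
    """Return (start_key, end_key) covering a CIDR prefix."""
    net = ipaddress.ip_network(prefix)
    return to_key(net.network_address), to_key(net.broadcast_address)

# Hypothetical geo_resolved rows: (start, end, city, confidence), non-overlapping.
RESOLVED = sorted([
    (*range_of("203.0.113.0/25"), "CityA", 0.9),
    (*range_of("203.0.113.128/25"), "CityB", 0.6),
])
STARTS = [row[0] for row in RESOLVED]

def lookup(ip):
    """Find the single interval containing ip, or None."""
    key = to_key(ip)
    i = bisect.bisect_right(STARTS, key) - 1
    if i >= 0 and key <= RESOLVED[i][1]:
        return RESOLVED[i][2], RESOLVED[i][3]
    return None

print(lookup("203.0.113.200"))
```

Note that one /24 is split into two rows here, which is the "one prefix, several locations" case the geo_segment layer is meant to support.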

务实的建库蓝图

把方法串成一条可执行的路径:

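1. Foundation: RIB → prefix + asn (build the prefix → AS mapping with a tool such as pyasn).
2. Classify: determine each AS's type; different types take different strategies.
3. Floor: fill in RIR country-level data for everything, so there is always a lower bound.
4. Baseline: lay down a city-level baseline from a commercial database (buy, don't rebuild).
5. Refine residential broadband: rDNS + access-edge anchoring to lift residential IPs to city level.
6. Verify: use active measurement to cross-check the commercial database, refine urban areas, and catch silent drift along the way.
7. Push to street level: only where ground truth is available (own-app crowdsourcing, own business addresses, server facility identification).
8. Merge + keep fresh: compact into geo_resolved, carrying confidence and timestamps throughout, and refresh continuously.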

你会发现,BGP/AS 主要出现在 1、2 和切分、anycast、刷新这些结构性步骤;真正决定精度的地理信号层,BGP 完全帮不上,得靠商业库、挖掘、测量和真值。

诚实的天花板

最后留几句最该记住的:

精度现实:国家级准确率约 99.9%,城市级约 70%,街道级靠地面真值,且市场充斥虚假精度。
一个前缀 ≠ 一个位置:要按地理把前缀切开;anycast 不定位,只标记。
反对假精度:没有真值支撑的"街道级"宁可降级到城市,也不要堆一个唬人的门牌号。
分工铭记:BGP 给骨架(谁路由、国家级),四类方法给血肉(城市→街道),测量是"带不确定性的区域",真值才是街道级的来源。

一个好的 IP 地理库,不是某个神奇算法的产物,而是用 BGP 当稳定骨架、用多源信号当血肉、用置信度和时间戳兜住诚实的一套工程。把路由和位置分清楚,把"测得到的"和"猜出来的"分清楚——这是这件事最难、也最重要的地方。

使用cloudflared代理内网

作者: forthxu
时间: December 11, 2025
分类: 默认分类

安装

curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64 -o cloudflared
chmod +x cloudflared
mv cloudflared /usr/bin/

Log in

cloudflared login

创建通道

cloudflared tunnel create mytunnel

增加通道绑定域名

cloudflared tunnel route dns mytunnel dy.xiamen.dev

配置通道服务

mkdir /etc/cloudflared/
vim /etc/cloudflared/config.yml

tunnel: mytunnel
credentials-file: /root/.cloudflared/155ed863-891d-45e3-b436-da57cde7f8f8.json

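ingress:
  # Web service: HTTPS is terminated at Cloudflare's edge, then forwarded to this local HTTP origin
  - hostname: dy.xiamen.dev
    service: http://127.0.0.1:3001
    originRequest:
      noTLSVerify: true  # only matters for https:// origins; no effect on this http:// origin
  # Expose SSH; this hostname also needs its own DNS route bound to the tunnel
  - hostname: ssh.xiamen.dev
    service: ssh://127.0.0.1:22
  # Default fallback for requests that match no rule above
  - service: http_status:404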

启用通道服务

cloudflared tunnel run mytunnel

后台启用通道服务

vim /etc/init.d/cloudflared

#!/bin/sh /etc/rc.common
START=99
STOP=10

start() {
    echo "Starting cloudflared..."
    /usr/bin/cloudflared --config /etc/cloudflared/config.yml tunnel run > /var/log/cloudflared.log 2>&1 &
}

stop() {
    echo "Stopping cloudflared..."
    killall cloudflared
}

chmod +x /etc/init.d/cloudflared
/etc/init.d/cloudflared enable
/etc/init.d/cloudflared start

IP 风险评分（IP Reputation / IP Risk Score）来源和系统自建

作者: forthxu
时间: November 28, 2025
分类: 默认分类

https://chatgpt.com/share/69291b65-c5b8-8001-acf2-781c9b25c14c

这些来源都是各大代理商、风控平台、广告平台、支付公司、反欺诈系统实际使用的。

一、国际权威风险评分服务商（最核心来源）

这些是行业标准，代理商都会订阅：

1. Spamhaus

用途：反垃圾邮件、恶意活动
特征库：SBL、XBL、PBL
评分内容：

垃圾邮件来源
僵尸网络
恶意流量
动态 IP 池标记（PBL 是家庭宽带大杀器）

所有代理商一定参考 Spamhaus

2. IPQualityScore (IPQS)

用途：反欺诈、广告防作弊
提供：

代理/VPN 检测
异常活动评分
TOR / 数据中心标记
高风险国家标记
BOT、点击作弊评分

世界上使用最广的 IP 风险评分库之一

3. MaxMind GeoIP2 + MaxMind minFraud (risk score)

地区库 + 风险引擎
包含：

住宅/数据中心识别
VPN / Proxy 检测
匿名网络检测
短时间内大量请求行为检测

几乎所有反爬虫系统都会用 MaxMind

4. FraudScore、FraudLogix

广告防作弊常用，识别：

虚假广告点击
模拟设备
程序化流量

5. Cisco Talos Intelligence

威胁情报
检测：

恶意软件感染
BOT
被攻击的受害者 IP

二、反爬虫 / 反欺诈平台的数据源

这些平台有超大规模客户数据，覆盖面广：

1. Cloudflare Threat Score

Cloudflare 的威胁情报基于：

数千万人站点访问数据
Bot 值（BOT Score）
爬虫评估
代理标记
IP 行为历史

2. Akamai Threat Labs

提供：

TOR
爬虫行为
DDoS 参与
被入侵终端设备标记

3. PerimeterX / Human Security

首选广告、APP 防作弊平台
能检测：

浏览器指纹伪造
自动化工具
Selenium/Headless
住宅代理节点滥用

4. Datadome、Kasada

反爬虫重量级
基于：

全网指纹
IP 行为画像
再识别技术

三、网络安全威胁情报（多个国家与机构）

这些机构提供恶意行为数据库：

1. AbuseIPDB

非常常用，众包 + 安全厂商联合数据
评分基于：

DDoS
扫描器
登录暴力破解
恶意脚本行为

2. FireHOL IP Blacklist

整合多个威胁情报源

3. AlienVault OTX

全球最有名的威胁社区
标记：

恶意软件感染
僵尸网络
钓鱼站
C2 控制节点

4. TOR Project Exit Node List

标记 TOR 出口节点

四、商业数据库（付费订阅）

代理商会购买这些数据库以提升检测能力：

1. IP2Proxy / IP2Location

提供：

是否为代理 / VPN
是否为 Residential IP
是否为 Hosting Provider
ASN 信息
区域一致性检查

2. Neustar IP Intelligence

用于广告投放风控
包含：

住宅/商业/IDC 分类
代理识别
风险评分

3. ThreatMetrix

全球最大数字身份图库之一
基于：

全球设备指纹
行为特征
匿名化网络检测

五、浏览器指纹数据（高级代理商才有）

顶级代理商（例如 BrightData、Oxylabs）会结合浏览器指纹评分：

IP + 浏览器指纹匹配度：

浏览器时区与 IP 所属时区是否一致
浏览器语言与 IP 所在地区是否一致
Canvas / WebGL 指纹是否可疑
User-Agent 与 IP 地区是否匹配（国别差异很明显）

六、代理商自有风控数据（核心资产）

这是每个大代理商最重要的“黑箱技术”：

自研评分包含：

① 节点成功率

访问常见网站是否被封
403/429 频率
目标站点 ban 率

② 用户行为异常

同一 IP 同时被多个用户使用
大量请求频率不正常
非浏览器模式行为
重复 Cookie、重复指纹行为

③ 节点画像

国内外访问比例
同 ASN 下是否存在反爬高频行为
IP 是否曾用于爬虫/抓取工具

④ 节点稳定性

漏洞感染风险（如 Windows SMB）
NAT 类型
带宽稳定性
端口开放情况（恶意软件特征端口）

综合后形成的 IP Risk Score（0-100 或 0-1）

大代理商通常是：

权威数据库 + 行业库 + 自有流量特征 → 综合评分

举例（简化版公式）：

RiskScore = 
 0.35 * (IPQS score)
+0.20 * (MaxMind risk)
+0.15 * (Spamhaus SBL/PBL/XBL)
+0.10 * (AbuseIPDB)
+0.20 * (自研行为数据)

这样就能稳定识别出：

数据中心代理
被滥用的住宅代理
安全隐患 IP
VPN / 路由器代理
僵尸网络节点
TOR 出口节点

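The rest of this post outlines a practical plan for building an IP quality scoring system (IP Risk / IP Reputation) in-house: architecture, data sources, features, the scoring engine (rules + ML), offline and online implementation, metrics and thresholds, validation and monitoring, plus sample code and SQL. The target use cases are residential proxy pools, anti-fraud, and traffic tiering.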

一、总体思路（两层架构）

把系统拆成两层，简单有效、便于迭代：

规则引擎（Rule-based） — 快速拦截已知高风险（黑名单、TOR、数据中心、Spamhaus 等）。低延迟、易解释，作为第一道防线。
评分引擎（Model-based） — 将多个信号合并为综合评分（0-100），可用简单的加权规则或机器学习（逻辑回归、GBDT）。

两者结合可取得高召回和高可解释性：先用规则剔除绝大多数明显风险，再用模型细化评分和分类。

二、数据源（必须/可选）

必须（立即接入）：

GeoIP（MaxMind GeoIP2 免费/付费） → 国家、ASN、城市、ISP、是否数据中心标记
Spamhaus / Abuse lists（SBL, XBL, PBL, AbuseIPDB） → 黑名单标记
公共威胁情报（AlienVault OTX, CiscoTalos）
TOR exit node list
自有流量日志（最关键）：HTTP 状态、响应码分布、失败率、请求频率、会话时长、目标站点列表、user-agent、cookie、TLS 指纹等

可选/增强：

IPQS/IP2Proxy、Neustar、IP2Location 等商业库（VPN/Proxy/Hosting 判定）
Browser fingerprint signals（时区/语言/UA 与 IP 是否匹配、Canvas 指纹）
被封/被ban历史（目标站点返回 403/429 的记录）
节点稳定性指标（在线时长、平均带宽、丢包率）
WHOIS/ASN 历史（是否属于家宽 ASN）
设备级信号（若有家庭节点 agent，可上报 CPU、OS、connected SSIDs）

三、关键特征（Feature）——按优先级

黑名单/标记类（binary）：Spamhaus, AbuseIPDB, TOR, commercial proxy flag, data-center flag.
地理/网络一致性：IP.country == claimed country? ASN type (ISP vs Hosting). ASN age, prefix size.
行为统计（从日志统计，窗口：1m/10m/1h/24h）：
- 请求频率（req/s）
- 成功率（2xx / 总请求）
- 错误率（4xx、5xx 比例）
- 403/429 出现次数与频率（针对目标站点）
- 新会话比例 vs 长会话比例
- 用户代理多样性（同IP下UA数量）
历史信誉：
- 过去 7/30/90 天 ban 率
- 下线/丢失率（节点不稳定）
指纹一致性（若可得）：
- 浏览器时区/语言 vs IP 地理是否匹配
- TLS 指纹异常
网络特征：
- TTL/TTL 分布（可能被 NAT/CGNAT 修改）
- 端口扫描 / 非 HTTP 协议异常
经济/商业信号：是否通过 SDK 得到（已知渠道），是否来自付费路由器合作等（正向信号）

四、评分策略（示例：混合规则 + ML）

设计思路：先用规则打低风险/高风险标签，然后用模型输出 0-100 分。

示例简化流程：

规则判定 HIGH_RISK（立即 block / score=100）：
- 在 Spamhaus SBL/XBL 列表或 AbuseIPDB 分超过阈值
- 在 TOR exit list
- 商业库标注为 DATA_CENTER 且 ASN 属于云厂商
规则判定 LOW_RISK（score=0）：
- ASN 属于大型 ISP（residential ASN），且历史 30 天内 ban_rate < 0.1% 且稳定性好
其余进入模型：features -> 标准化 -> 模型（LR / GBDT） -> 输出概率 p -> map 到 RiskScore = round(p * 100)

简单线性加权（规则替代模型）：

score = clamp(
  40*blacklist_flag + 
  25*data_center_flag + 
  15*ban_rate(0-1) + 
  10*error_rate(0-1) + 
  10*(1 - stability(0-1)),
  0, 100)
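The weighted formula above translates directly into Python. This is a sketch under the assumption that the two flags are booleans and the three rates are already normalized to 0-1; the function and parameter names are illustrative.

```python
def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def linear_risk_score(blacklist_flag, data_center_flag, ban_rate, error_rate, stability):
    """Weighted rule score in 0-100; weights mirror the formula in the text."""
    raw = (
        40 * bool(blacklist_flag)
        + 25 * bool(data_center_flag)
        + 15 * clamp(ban_rate, 0.0, 1.0)
        + 10 * clamp(error_rate, 0.0, 1.0)
        + 10 * (1.0 - clamp(stability, 0.0, 1.0))
    )
    return round(clamp(raw, 0, 100))

print(linear_risk_score(True, False, 0.2, 0.1, 0.9))
```

The weights sum to 100, so the outer clamp only matters if an input arrives outside its expected range.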

五、离线训练与在线服务（架构）

架构要点（生产实践）：

Offline data warehouse: ClickHouse or BigQuery stores the access logs and serves batch feature computation and model training.
特征流水线：Airflow 或 Prefect 调度的 ETL，按窗口（1m/10m/1h/1d）生成特征表。
模型训练：使用 scikit-learn / xgboost / lightgbm。保留验证集和回测历史（A/B test）。
实时评分服务：FastAPI / gRPC microservice，加载模型与规则库，提供 low-latency scoring。
缓存层：Redis 缓存 IP 最近评分（TTL 1-10 min），减轻实时服务压力。
控制台 & 可视化：Grafana + Prometheus 或 Kibana，用于显示 IP 分布、拒绝率、误判率。
黑名单同步：评分到阈值后，自动写入防火墙 / WAF / 路由器 / CDN 黑名单。

六、阈值与策略示例（生产建议）

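score >= 90 → very high risk: block, or require a captcha / trigger manual review.
70 <= score < 90 → high risk: down-rank, rate limit, or verify.
40 <= score < 70 → medium risk: rate limit, observe, secondary verification.
score < 40 → low risk: allow.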
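The score bands can be expressed as a small lookup function. The action labels returned here are placeholder names chosen for this sketch; the cut-offs are the ones listed above and would be tuned per business.

```python
def action_for(score):
    """Map a 0-100 risk score to a coarse action label."""
    if score >= 90:
        return "block"      # or captcha / manual review
    if score >= 70:
        return "throttle"   # down-rank, rate limit, verify
    if score >= 40:
        return "observe"    # rate limit, watch, secondary verification
    return "allow"

print([action_for(s) for s in (95, 75, 50, 10)])
```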

阈值应基于业务容忍度与 A/B 实验不断调优。

七、模型与训练细节

标签（supervised）：用历史日志标注 bad（经常被目标站点封禁/大量 403/429/事务异常）与 good（长期稳定低错误率）。
特征工程：滑动窗口聚合（count, rate, unique_count, pct_403等），one-hot 国家/ASN，embedding（若ASN很多）。
训练流程：分时间切片做回测（time-based CV），避免数据泄漏。
模型选择：先用逻辑回归做 baseline（解释性好），再用 LGBM 提升效果。
输出校准：用 Platt scaling 或 isotonic regression 校准概率（便于映射到 0-100 分）。

八、示例表结构（ClickHouse / Postgres）

示例 access_logs（原始）：

CREATE TABLE access_logs (
  ts DateTime,
  ip String,
  target_host String,
  http_status UInt16,
  user_agent String,
  bytes_sent UInt32,
  latency_ms UInt32,
  session_id String
) ENGINE = MergeTree ORDER BY (ip, ts);

示例 ip_features_hourly（汇总特征）：

CREATE TABLE ip_features_hourly (
  hour DateTime,
  ip String,
  req_count UInt32,
  pct_2xx Float32,
  pct_4xx Float32,
  pct_403 Float32,
  unique_ua UInt16,
  avg_latency Float32,
  ban_count UInt16,
  stability_score Float32,
  is_blacklist UInt8,
  country String,
  asn UInt32
) ENGINE = MergeTree ORDER BY (ip, hour);

九、关键 SQL 示例（ClickHouse 风格）

计算一小时内请求数与 403 比率：

SELECT
  ip,
  toStartOfHour(ts) as hour,
  count() as req_count,
  sum(http_status >= 200 AND http_status < 300)/count() as pct_2xx,
  sum(http_status = 403)/count() as pct_403,
  uniqExact(user_agent) as unique_ua
FROM access_logs
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY ip, hour;

十、示例 Python 实现（极简版评分函数）

下面代码示例展示如何把规则 + 模型结合（假设你已经训练好了 model）：

# requirements: scikit-learn, lightgbm, pandas, joblib
import joblib

# Load the trained model (example); it must expose predict_proba
model = joblib.load("ip_risk_model.pkl")

def rule_check(ip_meta):
    """Return a forced score when a hard rule matches, otherwise None."""
    # ip_meta holds boolean flags such as is_spamhaus, is_tor, is_datacenter
    if ip_meta.get("is_spamhaus") or ip_meta.get("is_tor"):
        return 100  # force the highest risk
    if ip_meta.get("is_datacenter") and ip_meta.get("is_proxy_flag"):
        return 95
    return None

def compute_score(feature_vector, ip_meta):
    # Rules take priority over the model
    r = rule_check(ip_meta)
    if r is not None:
        return r

    # feature_vector is assumed to already be in the layout the model expects
    p = model.predict_proba([feature_vector])[0][1]  # probability of the positive class
    # Map the probability to a 0-100 score
    score = int(round(p * 100))
    # Add a small rule-based penalty (example); asn_risk_score is assumed to be in 0-1
    if ip_meta.get("asn_risk_score"):
        score = min(100, score + int(ip_meta["asn_risk_score"] * 10))
    return score

十一、评估与上线策略

回测（backtest）：用历史数据比对不同阈值下的 TP/FP/TPR/FPR。
灰度发布：先对 1% 流量启用评分并记录决策影响，再扩大到 10%/50%/100%。
A/B 测试：一组继续按旧策略，一组用新评分，比较成功率与误封率。
常态化监控：监控每天被 block IP 数量、误封申诉数量、平均响应延迟、模型漂移（数据分布变化）。
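One common way to implement the staged rollout is to hash each IP into a stable bucket, so the same IP always gets the same decision and raising the percentage only adds IPs. The sketch below is an assumption about how this could be done, not a description of any specific product; the salt string is a made-up value.

```python
import hashlib

def in_rollout(ip, percent, salt="ip-score-v1"):
    """Return True if this IP falls inside the first `percent` of buckets."""
    digest = hashlib.sha256(f"{salt}:{ip}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000  # stable bucket in 0..9999
    return bucket < percent * 100

ips = [f"198.51.100.{i}" for i in range(256)]
print(sum(in_rollout(ip, 10) for ip in ips), "of", len(ips), "IPs in the 10% stage")
```

Changing the salt reshuffles the buckets, which is useful when a new experiment should not reuse the previous 1% population.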

十二、运维与自动化

定期（每日/小时）重新训练模型并自动回测，若性能提升则自动部署。
黑名单同步到 WAF、CDN（Fastly/Cloudflare API）与防火墙。
提供人工复核控制台（查看 IP 的历史行为、评分依据、可手动白/黑名单）。
自动报警：当误报率、拒绝率或 ban 率异常提升时告警。

十三、隐私与合规注意

避免保存过度个人数据（如完整 cookie、个人信息）。
若使用商业威胁情报服务，确保合约允许你用于风控目的。
在某些地区（GDPR 等）使用指纹/行为数据需注意合规与用户同意。

十四、扩展高级方向（可选）

在线学习：使用 streaming 更新特征与模型（例如 River 库）以快速适应新态势。
图谱关联：把 IP、ASN、设备指纹、session 做图谱分析，识别协同行为群体（Graph ML）。
可解释性：用 SHAP 展示模型对某 IP 的贡献特征，方便人工复核与合规。
反向验证：对“低风险”IP随机抽样发起深度探测，验证其真实性（地理连通性、延时、traceroute 等）。

十五、快速落地清单（最小可行产品 MVP）

把原始 access_logs 收集到 ClickHouse（或 Postgres）。
接入 GeoIP + Spamhaus + TOR 列表。
写一个小时级的特征聚合任务（Airflow）。
写一个简单规则引擎（黑名单、数据中心直接标注）。
用历史标签训练逻辑回归模型，部署到 FastAPI。
在生产用 Redis 缓存评分、并在 Nginx/WAF 接入评分决策。
建 Grafana 仪表盘监控关键指标（block rate、false positive 报告等）。

各种软件和操作系统中内置的根证书名单

作者: forthxu
时间: June 26, 2025
分类: 默认分类

各种软件和操作系统中内置的根证书名单（Trusted Root CA List），以下是一些主流渠道和资源：

1. 操作系统内置根证书名单

| Operating system | How to obtain & notes |
| --- | --- |
| Windows | View the "Trusted Root Certification Authorities" list in Certificate Manager (certmgr.msc); official documentation: Microsoft Trusted Root Program |
| macOS / iOS | View "System Roots" in the Keychain Access app; Apple's official root list: Apple Root Certificate Program |
| Linux | Varies by distribution, usually under /etc/ssl/certs or /usr/share/ca-certificates; for example the ca-certificates package on Debian/Ubuntu |
| Android | Root certificates live in the system's built-in trust store; official list: Android CAs |

2. 浏览器内置根证书名单

| Browser | Root certificate source & how to obtain |
| --- | --- |
| Mozilla Firefox | Ships its own root store through the NSS (Network Security Services) library; list: Mozilla Included CA Certificates |
| Google Chrome | Historically used the operating system's root store (Windows, macOS, Linux); recent versions ship their own Chrome Root Store on most platforms |
| Microsoft Edge | Relies on the Windows root store |
| Safari | Relies on the macOS/iOS root store |

3. 各大CA官方根证书库

4. 公共根证书数据库

CA/Browser Forum Root Store Policy：https://cabforum.org/
Mozilla CA Certificate Program：https://wiki.mozilla.org/CA
Microsoft Trusted Root Program Participants：https://learn.microsoft.com/en-us/security/trusted-root/participants-list
Google Root Program（Chromium）：https://chromium.googlesource.com/chromium/src/+/refs/heads/main/net/data/ssl/certificates/

5. 实用工具和命令

Windows: 打开命令行执行
```
certutil -viewstore root
```
Linux (Debian/Ubuntu):
```
ls /etc/ssl/certs/
```
macOS:

打开“钥匙串访问”，选择“系统根证书”查看。
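As a cross-platform complement to the commands above, Python's standard ssl module can list the root certificates its default context has loaded. This is a rough sketch: the helper names are made up, and on systems where OpenSSL reads roots lazily from a certificate directory the list may be empty or incomplete until a connection has been made.

```python
import ssl

def subject_name(cert):
    """Pick commonName (or organizationName) from a get_ca_certs() entry."""
    fields = {key: value for rdn in cert.get("subject", ()) for (key, value) in rdn}
    return fields.get("commonName") or fields.get("organizationName") or "<unnamed>"

def trusted_roots():
    """Return sorted names of CA certificates loaded into the default context."""
    ctx = ssl.create_default_context()
    return sorted({subject_name(cert) for cert in ctx.get_ca_certs()})

if __name__ == "__main__":
    for name in trusted_roots():
        print(name)
```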

总结

| 平台/软件 | 根证书名单获取途径 |
| ------------ | ------------------------- |
| Windows 系统 | certmgr.msc、certutil、微软官网 |
| macOS/iOS 系统 | 钥匙串访问、苹果官网 |
| Linux 发行版 | ca-certificates 包，系统目录 |
| Firefox 浏览器 | Mozilla CA DB 官网 |
| Chrome 浏览器 | 依赖操作系统，Chromium源码 |
| 各大CA | 各CA官方证书下载页 |


币圈U卡最新评测

作者: forthxu
时间: June 18, 2025
分类: 默认分类

一图学会挑U卡？
币圈人手一张的资金护城河

1 U 卡是什么？
U卡就是将 USDT（稳定币）绑定到一张实体卡或虚拟卡，你可以像用普通银行卡一样，刷卡消费、绑定 Apple Pay、提现法币。很多是由交易所或加密支付公司发行的。
2 为什么币圈需要 U 卡？
• 避开黑钱风险：卡来自官方渠道，绑定实名信息，资金流清晰，降低合规风险。
• 提升出入金自由度：很多U卡支持多币种自动兑换，甚至可以刷卡消费，无需先转到银行账户。
• 全球可用：支持 Visa/Master 网络，海外吃饭购物直接刷，再也不用到处找OTC。
• 对接Web2生活场景：可绑定微信、支付宝、Apple Pay，落地性强。

3 到底选哪张U卡？一图看懂

币圈U卡最新评测

| 平台 | 办理条件 | 开卡费用 | 充值手续费 | 外汇手续费 | 加密货币兑换费用 | ATM取现手续费 | 支持支付 | 备注 |
|-------------|---------------------------------------|------------|----------------|------------|--------------------------------------|--------------------------|----------------|------------------|
| Redotpay | 海外身份 | 100U | 免费 | 2% | / | 1% | 支付宝 / 微信 | |
| Bybit | 身份认证+地址证明（地址建议意大利） | 免费 | 免费 | 1% | 加密货币兑换费用：0.90% | 2%（超100U） | 不支持 | |
| Crydit | 身份认证 | 199U | 1.5% ~ 12U | 2% | / | 2% | 支付宝 / 微信 | |
| Bitget | 身份认证+地址证明（仅限VIP或要求高） | 免费 | 免费 | 1.6%（新增账号以外） | / | 2%（超126.65U） | 支付宝 / 微信 | |
| Coinbase | 身份认证（Coinbase One会员） | 499U/年 | / | / | / | / | 2025年底提现（重金4%返现） | |
| Solayer | 身份认证 | 75U | 1% | 1% | / | / | 支付宝 / 微信 | |
| Pokepay | 身份认证 | 5U | 免费 | / | 2% | / | 支付宝 / 微信 | |
| Metamask | 海外身份 | 免费 | 免费 | / | usdt兑换0.02U / #usdt兑换0.8% | / | 支付宝（有消息返现） | |
| Safepal | 身份认证 | 免费 | 0.6% ~ 1% | / | / | / | 支付宝 / 微信 | |
| imToken | 身份认证 | 免费 | 1% | 0 | / | / | 支付宝 / 微信 | |

U卡最新评测

https://x.com/onchainsea/status/1934846556089536705?t=nMjVVbe4UY0xf6x7-NqvAw&s=19

项目规划

作者: forthxu
时间: April 22, 2025
分类: 默认分类

原文链接：https://aping-dev.com/index.php/archives/13/

1. 主体流程

1.1. 前期调研

体验竞品功能，形成对竞品的初步认知。
结合可用资源（如已有的 API）确定产品需实现的功能。
列出功能的利弊，形成功能调研文档。

1.2. 概要设计

列出功能点，描述总体流程。
使用流程图、时序图、架构图、交互图进行说明。
实现基础框架和各模块之间的交互接口。

1.3. 详细设计

将各模块功能分配给负责人，必要时进一步调研并给出详细设计。
细化各功能点，列出多种实现方案及其利弊。
涉及产品决策时，需及时与产品沟通。
调用第三方接口时，需充分调研其行为。
设计数据库表时，根据 SQL 语句设计索引，并使用 EXPLAIN、SHOW PROFILES 等进行验证。
涉及状态转换时，绘制状态图并考虑各种状态冲突。
考虑各种异常情况，如并发、程序崩溃、数据库错误、消息队列消息丢失等，并设计相应的重试机制和幂等处理。
在操作资源前，判断资源是否存在、状态是否正常等。
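The retry and idempotency point can be illustrated with a small sketch. Everything here is hypothetical (the OrderStore class, the TransientError type, the key format); the idea it shows is that a retried request carrying the same idempotency key must not create a second record.

```python
import time

class TransientError(Exception):
    """A failure that is worth retrying (timeout, lost response, ...)."""

def retry(func, attempts=3, base_delay=0.01):
    """Call func until it succeeds, with exponential backoff between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

class OrderStore:
    """In-memory stand-in for a table with a unique idempotency-key column."""
    def __init__(self):
        self._results = {}

    def create(self, idempotency_key, payload):
        # A repeated key returns the stored result instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = {"order_id": len(self._results) + 1, **payload}
        self._results[idempotency_key] = result
        return result

store = OrderStore()
first = retry(lambda: store.create("req-1", {"sku": "A"}))
again = retry(lambda: store.create("req-1", {"sku": "A"}))
print(first == again)
```

In a real system the key check and the insert would need to be atomic, for example through a unique index, so that concurrent retries cannot both pass the check.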

1.4. 技术评审

各模块负责人完成详细设计后，进行技术评审，讨论实现方案，及时发现方案的不足。
模块负责人需提前准备方案的利弊分析，并参考评审意见形成最终决策。
交流时，需从基本信息开始讲述，不假设参与者已知相关内容。

1.5. 开发

开发过程中注意抽象相似的函数或模块，避免重复改动，便于后续开发。
遵循约定的编程规范。
设计变动时，及时同步更新文档，方便协作。

1.7. 代码 Review

代码 Review 是一项辛苦的工作，需感谢提出意见的同事。

1.8. 测试

单元测试：使用 Python 的 unittest.TestCase 对各个单元进行测试，保持测试代码的整洁。
模块自测：包括接口参数、功能、内部逻辑测试，可使用 Postman 进行 API 测试，保持测试代码的整洁。
交叉测试：
1. 了解对方模块的功能，通过文档或面谈。
2. Design your own test cases, then compare them with the module owner's; reviewing what each side missed helps both improve.
3. 模块负责人修复交叉测试的 Bug 后，需对涉及的接口进行回归测试，防止引入新的 Bug。
集成测试：对整个系统进行测试，确保各模块协同工作正常。
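For the unit-test step, a minimal unittest.TestCase might look like the sketch below. The parse_port function is a made-up unit under test, included only so the example is self-contained.

```python
import unittest

def parse_port(value):
    """Hypothetical unit under test: parse and validate a TCP port."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

class ParsePortTest(unittest.TestCase):
    def test_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_out_of_range(self):
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_not_a_number(self):
        with self.assertRaises(ValueError):
            parse_port("http")

def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePortTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

if __name__ == "__main__":
    run_tests()
```

Each test covers one behaviour (normal input, boundary violation, malformed input), which keeps failures easy to read.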

1.9. 上线

项目打包、部署，进行灰度上线。

1.10. 内测

专门的测试人员对上线项目进行测试，及时发现问题。

1.11. 公测 & 运营

向用户提供优惠活动，吸引用户体验并反馈问题。
产品经理撰写文案，介绍产品特点、优势，为产品造势。

1.12. 项目回顾

总结项目中的良好实践，形成经验积累，在未来项目中推广。

2. 并行流程

2.1. 人力预估

对投入的人力进行大致估计，随着项目进展，可能需要调整。

2.2. 行为规范

提前告知开发人员需遵循的规范，包括：
- 编程规范，如编码风格约定。
- 各类文档撰写规范，包括需求文档、实现文档、接口文档、测试文档等。

2.3. 进度预估

包括总体项目完成时间、各阶段完成时间、各模块负责人对各自模块完成时间的预估，及时发现风险点。

2.4. 资源申请

申请所需资源，如域名、运维机器、配额等。

2.5. 进度对齐

在约定的时间点进行进度对齐，及时发现风险点。

监控系统评估和推荐

作者: forthxu
时间: April 9, 2025
分类: 默认分类

主流监控系统
├── 老牌监控系统
│   ├── Zabbix         ← 仍然活跃，适合传统架构
│   ├── Nagios         ← 已过时，扩展性差
│   ├── Cacti          ← 图形展示强，功能有限
│   ├── Ganglia        ← 偏 HPC，基本没人用了
│   ├── Grafana        ← 可视化工具，不是监控系统本身
│
├── 新一代监控系统
│   ├── Open-Falcon    ← 国产经典，可替代但不再活跃开发
│   ├── Nightingale    ← Falcon 的升级版，支持 PromQL，国产强推荐
│   ├── Prometheus     ← cloud-native mainstream, active ecosystem, de facto standard
│   ├── VictoriaMetrics← 高性能时序数据库，Prometheus 替代或后端
│   ├── Thanos         ← Prometheus 的水平扩展方案
│   ├── Cortex         ← 类似 Thanos，多租户 Prometheus 集群
│   ├── Loki           ← 日志聚合系统，与 Prometheus/Grafana 配合使用
│   ├── Alertmanager   ← Prometheus 告警模块
│   ├── Telegraf       ← 数据采集工具，支持输出 Prometheus/InfluxDB
│   ├── Vector         ← 数据管道，采集/转换/转发日志和指标
│   ├── Datadog        ← 商业化 SaaS，开箱即用，价格不低
│   ├── Grafana Cloud  ← Grafana 官方 SaaS，集成 Prom/Loki/Tempo 等

了解主流网络数据采集技术和框架工具，如Selenium/Puppeteer/PhantomJS/Scrapy等

作者: forthxu
时间: April 3, 2025
分类: 默认分类

主流的网络数据采集（Web Scraping）技术和框架工具主要用于从网页自动提取数据。以下是几种常见的工具及其特点：

1. Selenium

适用场景：动态页面抓取（JS渲染）、自动化测试
特点：

主要用于自动化浏览器操作，如模拟用户输入、点击、滚动等。
支持多种浏览器（Chrome、Firefox、Edge等）。
适用于爬取依赖 JavaScript 渲染的网页。
可结合 WebDriver 进行无头浏览（Headless 模式）。

缺点：

运行效率较低，比直接解析 HTML 或使用异步框架慢。
需要安装浏览器驱动（如 ChromeDriver）。

适用语言：Python、Java、C#、JavaScript 等

2. Puppeteer

适用场景：无头浏览器抓取、截图、网页自动化
特点：

由 Google 开发，专为控制 Chrome/Chromium 设计。
支持无头模式，执行速度比 Selenium 快。
适用于爬取动态加载数据的网站。
可以用来生成 PDF、截取网页截图、自动填写表单等。

缺点：

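Primarily targets Chrome/Chromium; recent versions can also drive Firefox, but other browsers such as Safari are not supported.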
需要 Node.js 环境。

适用语言：JavaScript/TypeScript

3. PhantomJS（已停止维护）

适用场景：早期的无头浏览器爬虫
特点：

早期的无界面 WebKit 浏览器，支持 JavaScript 渲染。
可以用于网页截图、测试、抓取动态网页内容。
由于 Puppeteer 和 Headless Chrome 出现，PhantomJS 已被淘汰。

适用语言：JavaScript

4. Scrapy

适用场景：大规模数据采集、高效爬虫
特点：

Python 爬虫框架，异步高效，适用于大规模数据抓取。
内置强大的爬取调度、数据清洗、持久化存储功能。
可扩展性强，支持中间件、自定义代理、Cookies 处理等。
适用于结构化数据采集，如电商、新闻网站等。

缺点：

不支持 JavaScript 渲染（需结合 Selenium 或 Splash）。
学习曲线较高，适合有一定爬虫经验的开发者。

适用语言：Python

对比总结

| 工具/框架 | 适用场景 | 语言 | 是否支持动态页面 | 运行效率 | 适用规模 |
|---|---|---|---|---|---|
| Selenium | 自动化测试、动态网页抓取 | 多种 | ✅ | 较慢 | 小规模 |
| Puppeteer | 无头浏览器爬取、自动化 | JS/TS | ✅ | 较快 | 中等 |
| PhantomJS（淘汰） | 早期无头浏览器爬取 | JS | ✅ | 较慢 | 小规模 |
| Scrapy | 高效爬取静态网页 | Python | ❌（需扩展） | 高 | 大规模 |

如果需要爬取动态网页（如 AJAX、Vue、React 生成的内容），建议使用 Puppeteer 或 Selenium。
如果目标网站是静态 HTML，Scrapy 更高效，适合大规模数据采集。

下面是 Scrapy、Puppeteer 和 Selenium 的安装及简单使用示例。

1. Scrapy

安装

Scrapy 需要 Python 环境（推荐 Python 3.8+）。安装方式：

pip install scrapy

创建 Scrapy 爬虫

创建 Scrapy 项目

scrapy startproject my_scraper
cd my_scraper

生成爬虫

scrapy genspider example example.com

编辑 spiders/example.py

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    def parse(self, response):
        title = response.css("title::text").get()
        yield {"title": title}

运行爬虫

scrapy crawl example

2. Puppeteer

安装

Puppeteer 需要 Node.js 环境，安装方法：

npm install puppeteer

使用 Puppeteer 采集网页数据

创建 puppeteer_scraper.js：

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');

    // 获取标题
    const title = await page.title();
    console.log(`Title: ${title}`);

    await browser.close();
})();

运行 Puppeteer 爬虫

node puppeteer_scraper.js

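Note: the puppeteer package downloads a compatible browser by default. To drive a Chrome that is already installed, install puppeteer-core instead, which ships without a bundled browser:

npm install puppeteer-core

Then require puppeteer-core and pass the executable path to launch():

const puppeteer = require('puppeteer-core');
const browser = await puppeteer.launch({ executablePath: '/path/to/chrome' });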

3. Selenium

安装

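Selenium drives the browser through a WebDriver such as ChromeDriver. Install the Python bindings:

pip install selenium

Selenium 4.6 and later include Selenium Manager, which locates or downloads a matching driver automatically. With older versions, download ChromeDriver manually and make sure it matches the installed Chrome version.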

使用 Selenium 采集网页数据

创建 selenium_scraper.py：

from selenium import webdriver

# 配置 WebDriver（确保 chromedriver 在 PATH 中）
driver = webdriver.Chrome()

# 访问网页
driver.get("https://example.com")

# 获取标题
title = driver.title
print("Title:", title)

# 关闭浏览器
driver.quit()

运行 Selenium 爬虫

python selenium_scraper.py

无头模式（不弹出浏览器）：

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # headless mode
driver = webdriver.Chrome(options=options)

工具对比

| 工具 | 适用场景 | 语言 | 动态内容支持 | 运行速度 | 适用规模 |
|---|---|---|---|---|---|
| Scrapy | 静态网页爬取 | Python | ❌ | 高 | 大规模 |
| Puppeteer | 动态网页爬取 | JavaScript/Node.js | ✅ | 中 | 中等 |
| Selenium | 自动化测试、动态爬取 | Python/Java | ✅ | 低 | 小规模 |

推荐选择：

Scrapy：适用于 大规模数据采集（如新闻、电商）。
Puppeteer：适用于 前端渲染的网站（如 Vue/React）。
Selenium：适用于 需要模拟用户行为 的情况（如登录、点击）。

Prometheus

作者: forthxu
时间: March 17, 2025
分类: 默认分类

Prometheus 是一个开源的监控和告警系统，最初由 SoundCloud 于 2012 年开发，后来捐赠给了 CNCF（云原生计算基金会），成为 Kubernetes 生态中事实上的标准监控工具。它以高效的时间序列数据库、强大的查询语言和灵活的架构著称，特别适合动态云环境和微服务架构的监控。

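Core features

Time series data:
- Prometheus stores all data as time series: timestamped metrics, each identified by a name and a set of key-value labels.
- Example: http_requests_total{method="GET", endpoint="/api"} 1234.
Pull model:
- Periodically pulls metrics over HTTP from configured targets (exporters or applications) rather than relying on push.
- Targets expose a /metrics endpoint that serves data in the Prometheus format.
PromQL:
- A powerful query language (Prometheus Query Language) for analyzing and aggregating time series data.
- Example: rate(http_requests_total[5m]) computes the per-second request rate.
Multi-dimensional data model:
- Labels enable multi-dimensional analysis, such as filtering by service, instance, or environment.
Built-in alerting:
- The Alertmanager component handles alerts and supports many notification channels (email, Slack, webhook).
Service discovery:
- Supports static configuration and dynamic service discovery (DNS, Consul, Kubernetes, and others), which suits cloud-native environments.

Architecture components

- Prometheus Server: the core service; scrapes, stores, and queries data.
- Exporters: helper tools (such as node_exporter and blackbox_exporter) that expose metrics of target systems.
- Pushgateway: optional component that lets short-lived jobs push metrics.
- Alertmanager: handles alerts; groups, inhibits, and sends notifications.
- Client libraries: available for many languages (Go, Java, Python, and others) so applications can define custom metrics.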
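To make the rate() example concrete, the sketch below computes a per-second rate from raw counter samples, including the counter-reset case. It is a deliberately simplified model: the real PromQL rate() also extrapolates to the window boundaries, which this function does not attempt.

```python
def simple_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), sorted by time.

    Returns the average per-second increase, or None if it cannot be computed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            increase += value          # counter reset: it restarted from zero
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None

# One request per second on average over two minutes.
print(simple_rate([(0, 100), (60, 160), (120, 220)]))
```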

监控节点指标

https://github.com/prometheus/node_exporter

监控http服务器访问情况

https://github.com/prometheus/blackbox_exporter

监控服务器网络状况
https://github.com/prometheus/snmp_exporter

其他监控
https://prometheus.io/docs/instrumenting/exporters/

These exporters can each run standalone and be queried with curl, so they can serve as the basis for a probing service similar to itdog.com, 17ce.com, or ping.chinaz.com.
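If the exporter output is fetched with curl or any HTTP client, turning it into usable values takes little code. The parser below is a simplified sketch of the text exposition format: it skips comment lines, keeps the label block as part of the key, and does not handle edge cases such as a closing brace inside a label value or trailing timestamps.

```python
import re

_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')

def parse_metrics(text):
    """Parse exposition-format text into {metric_with_labels: float_value}."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or # HELP / # TYPE comment
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            out[name + (labels or "")] = float(value)
    return out

EXAMPLE = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_icmp_duration_seconds{phase="rtt"} 0.008315333
"""
print(parse_metrics(EXAMPLE))
```

A probing service would call this on each exporter response and store probe_success and the rtt phase per target.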

详细比较下 Prometheus Zabbix Nagios Datadog

以下是对 Prometheus、Zabbix、Nagios 和 Datadog 的详细比较，从功能、架构、使用场景、优缺点等多个维度进行分析，帮助你理解它们的异同并选择合适的监控工具。

1. 概述

| 工具 | 类型 | 开源性 | 初创时间 | 核心目标 |
|-------------|--------------|--------|----------|------------------------------|
| Prometheus | 时间序列监控 | 开源 | 2012 | 云原生、微服务监控 |
| Zabbix | 综合监控 | 开源 | 1998 | 传统 IT 基础设施监控 |
| Nagios | 状态监控 | 开源 | 1999 | 服务和网络可用性检查 |
| Datadog | 云监控平台 | 商业 | 2010 | 企业级云和应用性能监控 |

2. 架构与数据模型

| 维度 | Prometheus | Zabbix | Nagios | Datadog |
|-------------|--------------------------------|-------------------------------|------------------------------|------------------------------|
| 数据模型| 时间序列（指标+标签） | 通用（数值、状态、日志） | 状态检查（up/down） | 时间序列+日志+追踪 |
| 采集方式| Pull（拉取） | Push（推送）+ Pull | Pull（插件执行） | Push（Agent）+ Pull |
| 存储 | 内置 TSDB（本地时间序列数据库）| 关系型数据库（如 MySQL） | 无内置存储，需扩展 | 云端存储（SaaS） |
| 查询语言| PromQL（强大且灵活） | 无（依赖 GUI 或 SQL） | 无（脚本输出） | 无（UI 和 API） |
| 扩展性 | 单机为主，可用 Thanos 集群 | 支持分布式（Proxy + Server） | 依赖插件，扩展性有限 | 云原生分布式，无需管理 |

Prometheus：强调时间序列数据，标签模型（如 cpu{host="server1"}）便于多维度分析。
Zabbix：数据类型多样，但时间序列支持不如 Prometheus 优化。
Nagios：专注状态检查，结果为布尔值或简单数值，无时间序列概念。
Datadog：综合模型，支持指标、日志和追踪，数据存储在云端。

3. 功能对比

| 功能 | Prometheus | Zabbix | Nagios | Datadog |
|--------------|------------------------------|-----------------------------|----------------------------|----------------------------|
| 指标采集 | Exporter 暴露 /metrics | Agent 推送或 SNMP/Zabbix Trap | 插件执行检查 | Agent 推送或集成 API |
| 服务探测 | 用 blackbox_exporter | 内置网络发现和探测 | 用插件（如 check_http） | 内置服务检查 |
| 告警 | Alertmanager（灵活路由） | 内置告警（邮件、短信等） | 插件触发通知 | 高级告警（AI 驱动） |
| 可视化 | 内置 UI + Grafana | 内置仪表盘 | 无内置 UI，需 Nagios XI | 强大内置仪表盘 |
| 服务发现| DNS、Consul、Kubernetes | 网络发现、LLD（低级发现） | 手动配置或脚本 | 自动发现（云服务集成） |
| 应用监控| 需客户端库（如 PHP、Java） | 内置模板或自定义脚本 | 需插件 | 强大 APM（代码级追踪） |

Prometheus：依赖 Exporter 生态，适合动态环境。
Zabbix：功能全面，内置多种采集方式。
Nagios：插件驱动，灵活但配置繁琐。
Datadog：一体化解决方案，强调 APM 和云集成。

4. 部署与维护

| 维度 | Prometheus | Zabbix | Nagios | Datadog |
|--------------|------------------------------|-----------------------------|----------------------------|----------------------------|
| 安装 | 二进制运行，配置 YAML | Server + Agent + DB | 核心 + 插件，需编译或包管理| Agent 安装，SaaS 无需部署 |
| 配置 | 文本文件（prometheus.yml） | GUI + 配置文件 | 配置文件 + 插件脚本 | UI 或 API 配置 |
| 维护成本 | 中等（需管理 Exporter） | 较高（数据库、代理维护） | 高（插件依赖） | 低（云端托管） |
| 依赖 | 无（单机运行） | 数据库（MySQL/PostgreSQL） | 无（但需扩展存储） | 云服务订阅 |

Prometheus：轻量但需手动配置 Exporter。
Zabbix：部署复杂，适合已有数据库环境。
Nagios：基础版简单，但扩展功能需额外工作。
Datadog：开箱即用，但需订阅费用。

5. 性能与扩展性

| 维度 | Prometheus | Zabbix | Nagios | Datadog |
|--------------|------------------------------|-----------------------------|----------------------------|----------------------------|
| 性能 | 高（TSDB 优化查询） | 中等（数据库瓶颈） | 低（插件执行效率低） | 高（云端优化） |
| 规模 | 单机数万指标，集群需扩展 | 支持数千设备（分布式部署） | 小规模（插件限制） | 大规模（云端无上限） |
| 高可用 | 需 Thanos/Cortex | 内置 HA（Server + Proxy） | 需 Nagios XI 或手动集群 | 默认高可用 |

Prometheus：单机性能优秀，大规模需额外工具。
Zabbix：分布式支持较好，但数据库可能成瓶颈。
Nagios：不适合大规模监控。
Datadog：云端扩展性强，无需操心。

6. 生态与集成

| 维度 | Prometheus | Zabbix | Nagios | Datadog |
|--------------|------------------------------|-----------------------------|----------------------------|----------------------------|
| 生态 | Exporter 丰富（100+） | 模板和插件 | 插件社区庞大 | 400+ 集成（AWS、Docker等） |
| 云原生 | 强（Kubernetes 首选） | 中等 | 弱 | 强（云服务优化） |
| 可视化 | Grafana（主流） | 内置或 Grafana | Nagios XI 或外部工具 | 内置强大仪表盘 |
| API | 强大（支持 PromQL 查询） | 支持 REST API | 有限 | 丰富 API |

Prometheus：云原生生态最佳。
Zabbix：传统 IT 生态丰富。
Nagios：插件生态强大但老旧。
Datadog：商业集成最全面。

7. 优缺点

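| Tool | Pros | Cons |
|-------------|--------------------------------|-------------------------------|
| Prometheus | Strong cloud-native support; flexible PromQL; free and open source | Limited single-node storage; complex configuration |
| Zabbix | Full-featured; built-in discovery and alerting; open source | Complex deployment; performance depends on the database |
| Nagios | Simple to get started; flexible plugins; open source | No time series; poor scalability |
| Datadog | Works out of the box; strong APM; good cloud integration | Commercial pricing; data is not under your control |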

8. 使用场景

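| Tool | Best suited for | Not suited for |
|-------------|--------------------------------|-----------------------------|
| Prometheus | Kubernetes / microservices; dynamic environments | Very large scale on a single node; simple status checks |
| Zabbix | Traditional servers; network device monitoring | Dynamic cloud-native environments; lightweight needs |
| Nagios | Small-scale status monitoring; simple networks | Time series analysis; large systems |
| Datadog | Enterprise cloud monitoring; APM needs | Limited budgets; self-hosting requirements |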

9. 实际案例对比

监控服务器是否正常：
- Prometheus：用 node_exporter，查询 CPU、内存。
- Zabbix：Agent 推送资源数据，GUI 配置告警。
- Nagios：用 check_cpu 插件检查状态。
- Datadog：Agent 自动采集，仪表盘展示。
网站可用性：
- Prometheus：用 blackbox_exporter 探测 HTTP。
- Zabbix：内置 Web 监控模板。
- Nagios：用 check_http 插件。
- Datadog：内置 Synthetic Monitoring。

10. 选择建议

预算有限 + 云原生：选 Prometheus，搭配 Grafana 和 Exporter。
传统 IT + 全面功能：选 Zabbix，适合网络和服务器监控。
简单状态检查：选 Nagios，适合小型环境。
企业级 + 云服务：选 Datadog，省心但需付费。

For newcomers to operations, Prometheus is a reasonable starting point: it is widely used in modern environments (Docker, Kubernetes), has an active community, and has plenty of learning material.

Zabbix 主要是推，Prometheus 主要是拉，Prometheus性能和可以支持的节点是否比较不足

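Saying that Zabbix mainly pushes and Prometheus mainly pulls captures the core architectural difference between the two, with the caveat that Zabbix supports both directions. The sections below look at working modes, performance, and scalability to answer whether Prometheus falls short in performance and supported node count.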

1. Zabbix 主要是推，Prometheus 主要是拉

Zabbix：推模式（Push）为主

机制：
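- Zabbix Agent supports both passive checks (the server polls the agent, which listens on port 10050) and active checks. In active mode, configured through ServerActive, the agent pushes data to the Zabbix Server (default port 10051).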
- Agent 定期采集数据（如 CPU、内存），然后异步发送到 Server。
优势：
- Server 无需轮询大量主机，负载较低，适合分布式环境。
- 在防火墙或 NAT 场景下，Agent 只需能访问 Server，网络配置简单。
劣势：
- 数据推送频率受限于 Agent 配置（如 1 秒到数分钟），实时性稍弱。
- Server 端依赖数据库（如 MySQL），高并发推送可能导致瓶颈。

Prometheus：拉模式（Pull）为主

机制：
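- Prometheus uses scrape_configs to pull data periodically from each target's /metrics endpoint over HTTP. The Prometheus server itself listens on port 9090 by default; targets expose their own ports, such as 9100 for node_exporter.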
- 依赖 Exporter（如 node_exporter）暴露指标，Prometheus 主动抓取。
优势：
- Good timeliness: the server controls the scrape interval (the built-in default is 1 minute; 15 seconds is a common setting).
- 动态服务发现（如 Kubernetes SD）适应云原生环境，目标变更无需手动调整。
劣势：
- Server 需访问所有目标的端口（如 9100），在复杂网络中配置防火墙较麻烦。
- 高频拉取可能增加 Server 负载，尤其在目标数量庞大时。

小结

Zabbix 的推模式更适合传统静态环境或网络受限场景，减轻 Server 负担。
Prometheus 的拉模式更适合动态、云原生环境，强调实时性和灵活性。

2. Prometheus 的性能表现

Prometheus 的性能特点

时间序列数据库（TSDB）：
- Prometheus 使用内置的高效 TSDB，专为时间序列数据设计，查询和存储性能优异。
- 数据压缩和索引优化，支持高频采集（如每秒采样）。
单机性能：
- 单实例 Prometheus 可处理数万指标（metrics），每秒处理百万级样本（samples）。
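- Reported single-instance ingestion rates range from hundreds of thousands up to about a million samples/sec, depending on hardware and configuration; treat such figures as rough guidance rather than a guarantee.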
查询能力：
- PromQL 提供强大的查询功能（如 rate()、increase()），支持复杂分析，且响应快。
局限性：
- 默认存储为本地磁盘，默认保留 15 天，数据量大时需清理或扩展。
- 单机内存占用较高（因数据驻留内存以加速查询）。

Zabbix 的性能特点

关系型数据库：
- Zabbix 依赖外部数据库（如 MySQL、PostgreSQL），性能受限于数据库优化。
- 数据写入和查询效率不如专用 TSDB，尤其在高并发下。
单机性能：
- 单机 Zabbix Server 可支持约 1 万个设备（10,000 nodes），采集频率 1 分钟时表现良好。
- 高频采集（如 1 秒）或大规模部署时，数据库可能成为瓶颈。
查询能力：
- 无专用查询语言，依赖 GUI 或 SQL，复杂分析能力较弱。

性能对比

采集效率：
- Prometheus 的拉模式更高效，支持高频采样（秒级），适合实时监控。
- Zabbix 的推模式效率稍低，频率受 Agent 配置限制。
查询速度：
- Prometheus 的 TSDB 和 PromQL 远超 Zabbix 的数据库查询性能。
负载分布：
- Zabbix 推模式将负载分散到 Agent，Server 压力小。
- Prometheus 拉模式集中负载在 Server，单机能力强但扩展复杂。

小结

Prometheus 在单机性能上更强，尤其在高频采集和查询方面优于 Zabbix。
Zabbix 的性能受数据库限制，但在小规模、低频场景下表现稳定。

3. Prometheus 支持的节点数是否不足

Prometheus 的扩展性

单机限制：
- 单实例支持约 1000-2000 个目标（targets，如节点或服务），具体取决于指标数量和采集频率。
- 超过此规模，内存和 CPU 可能饱和。
扩展方案：
1. Federation（联邦）：
  - 多实例 Prometheus 分担目标，高级实例聚合数据。
  - 适合分区域或分层监控。
2. Thanos/Cortex：
  - Thanos 提供全局视图和长期存储，支持无限节点。
  - Cortex 增强水平扩展，可处理百万级指标。
- 服务发现：
  - 动态发现（如 Kubernetes SD）支持节点自动增减，无需手动配置。
实际案例：
- Kubernetes 集群中，Prometheus 配合 Operator 和 Thanos，可监控数千节点和数十万 Pod。

Zabbix 的扩展性

单机限制：
- 单 Server 支持约 1 万个设备（10,000 nodes），但高负载下需优化数据库。
扩展方案：
- Zabbix Proxy：
  - 分布式代理收集数据并推送给 Server，支持大规模部署。
  - 部署简单，适合传统分布式环境。
- 数据库优化：
  - 使用分区表或 TimescaleDB 提升性能。
实际案例：
- 大型企业用 Proxy 架构监控数万设备（如服务器、网络设备）。

节点支持对比

单机能力：
- Prometheus 单机支持的目标数（1000-2000）少于 Zabbix（约 1 万），但指标密度更高。
扩展性：
- Prometheus 通过 Thanos/Cortex 可扩展到数万节点甚至更多，潜力更大。
- Zabbix 通过 Proxy 也能支持数万设备，但数据库仍是瓶颈。
动态性：
- Prometheus 的服务发现更适合动态环境（如 Kubernetes）。
- Zabbix 的 LLD（低级发现）功能较静态，适应性稍逊。

小结

单机状态下，Prometheus 支持的节点数可能少于 Zabbix，但性能更优。
扩展后，Prometheus（借助 Thanos/Cortex）支持的节点数和灵活性超过 Zabbix。

4. 综合评价

Prometheus 性能和节点支持是否不足？

性能：
- Prometheus 的 TSDB 和拉模式使其在高频采集和查询上表现卓越，不足之处在于单机内存占用较高。
- Zabbix 的推模式性能稳定，但数据库限制了高负载场景。
节点支持：
- 单机下，Prometheus 支持的节点数不如 Zabbix，但通过扩展工具（如 Thanos），其能力远超 Zabbix。
- Zabbix 的 Proxy 架构扩展性强，但受限于数据库和传统设计。

适用场景

Prometheus：
- 适合云原生、动态环境（如 Kubernetes），追求高性能和实时性。
- 节点数多时需额外扩展组件，但潜力巨大。
Zabbix：
- 适合传统 IT（如服务器、网络设备），节点数中等时开箱即用。
- 高负载下需数据库优化，扩展性稍逊。

5. 新手建议

如果你是新手运维：
- 小规模尝试：Zabbix 的推模式和 GUI 配置更简单，适合快速上手。
- 云原生环境：Prometheus 的拉模式和生态（如 Grafana）更现代化，值得学习。
性能和节点测试：
- 小规模（<100 节点）：两者差别不大，Zabbix 可能更省心。
- 大规模（>1000 节点）：Prometheus 配合扩展工具更有优势。


blackbox_exporter 返回示例

http://127.0.0.1:9115/probe?target=api.cn&module=icmp&debug=false

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.180650917
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.189979708
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.180650917
probe_icmp_duration_seconds{phase="rtt"} 0.008315333
probe_icmp_duration_seconds{phase="setup"} 0.000723375
# HELP probe_icmp_reply_hop_limit Replied packet hop limit (TTL for ipv4)
# TYPE probe_icmp_reply_hop_limit gauge
probe_icmp_reply_hop_limit 56
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.849020578e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1

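This Prometheus output shows the metrics of one ICMP (i.e. ping) probe and reflects the result of probing a single target. Each item is explained below:

1. DNS lookup time

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.180650917

Explanation: The DNS lookup took 0.180650917 s, about 181 ms. This is the time needed to resolve the target domain name to an IP address. It is on the high side, possibly because of network latency or a slow DNS server.

2. Total probe duration

# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.189979708

Explanation: The whole probe took 0.189979708 s, about 190 ms. This covers all phases, including the DNS lookup and the ICMP request.

3. ICMP request duration by phase

# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.180650917
probe_icmp_duration_seconds{phase="rtt"} 0.008315333
probe_icmp_duration_seconds{phase="setup"} 0.000723375

Explanation: The ICMP request is split into three phases:
- resolve (DNS resolution): 0.180650917 s, about 181 ms, identical to probe_dns_lookup_time_seconds and the bulk of the total.
- rtt (Round-Trip Time): 0.008315333 s, about 8.3 ms, the time from sending the ICMP request to receiving the reply; network latency is low.
- setup: 0.000723375 s, about 0.7 ms, the time spent preparing to send the ICMP request; very short.
Sum: 181 + 8.3 + 0.7 ≈ 190 ms, consistent with probe_duration_seconds.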
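
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the values are copied by hand from the sample output, and the variable names are illustrative.

```python
# Phase values copied from probe_icmp_duration_seconds in the sample output.
phases = {"resolve": 0.180650917, "rtt": 0.008315333, "setup": 0.000723375}
probe_duration = 0.189979708  # probe_duration_seconds

phase_sum = sum(phases.values())
overhead = probe_duration - phase_sum          # time not attributed to any phase
dns_share = phases["resolve"] / probe_duration  # fraction of the probe spent on DNS

print(f"phase sum: {phase_sum * 1000:.1f} ms")
print(f"unattributed overhead: {overhead * 1000:.2f} ms")
print(f"DNS share of total: {dns_share:.1%}")
```

The small gap between the phase sum and the total is expected, since the total also includes work outside the three measured phases.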

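4. ICMP reply hop limit

# HELP probe_icmp_reply_hop_limit Replied packet hop limit (TTL for ipv4)
# TYPE probe_icmp_reply_hop_limit gauge
probe_icmp_reply_hop_limit 56

Explanation: The reply packet arrived with a TTL (Time To Live, the hop limit) of 56. The TTL is the number of router hops a packet may still traverse, and each router on the path decrements it by one. Initial TTLs are usually 64, 128 or 255, so a value of 56 suggests the reply crossed about 8 routers (assuming an initial TTL of 64).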
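
The hop estimate can be written as a small helper. This is a heuristic sketch, not something blackbox_exporter does itself: the function name `estimate_hops` is hypothetical, and it assumes the sender used the smallest common initial TTL that is not below the observed value.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> tuple[int, int]:
    """Guess (initial_ttl, hops) from the TTL seen on a reply packet."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError(f"TTL out of range: {observed_ttl}")


initial, hops = estimate_hops(56)
print(f"assumed initial TTL {initial}, about {hops} hops")
```

The result is only a guess: a host that starts at 128 and sits 72 hops away would look the same, although paths that long are rare in practice.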

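5. IP address hash

# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.849020578e+09

Explanation: The hash of the target IP address is 1,849,020,578. It is used to track whether the target IP changes (for example when the DNS result is updated). The IP address itself cannot be read back directly from the hash.

6. IP protocol

# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4

Explanation: 4 means the probe used IPv4 rather than IPv6.

7. Probe success

# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1

Explanation: 1 means the probe succeeded and the target answered the ICMP request.

Summary

Probe target: an ICMP (ping) probe against a target over IPv4.
Timing: about 190 ms in total, of which DNS resolution took 181 ms (about 95%) and the actual network round trip only 8.3 ms, so the network path is fast but DNS resolution is slow.
Network path: TTL of 56, so probably 8 routers on the path (assuming an initial TTL of 64).
Result: the probe succeeded and the target is reachable.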


http://127.0.0.1:9115/probe?target=api.cn&module=http_2xx&debug=false

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.099875208
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.236132625
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.016199790999999998
probe_http_duration_seconds{phase="processing"} 0.074317625
probe_http_duration_seconds{phase="resolve"} 0.10107029199999999
probe_http_duration_seconds{phase="tls"} 0.0264935
probe_http_duration_seconds{phase="transfer"} 0.016835709
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 17502
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.849020578e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.74643561e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.74643561e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="6434662aa222147a3c959e44bdb412372ead9c8a48f7bae1341b0e86a5e1bb77",issuer="CN=GlobalSign GCC R6 AlphaSSL CA 2023,O=GlobalSign nv-sa,C=BE",serialnumber="1c76b2f48eabe234db9348b7",subject="CN=*.api.cn",subjectalternative="*.api.cn,api.cn"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_cipher_info Returns the TLS cipher negotiated during handshake
# TYPE probe_tls_cipher_info gauge
probe_tls_cipher_info{cipher="TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"} 1
# HELP probe_tls_version_info Returns the TLS version used or NaN when unknown
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.2"} 1

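This Prometheus output comes from a probe checking a target service over HTTP and covers DNS resolution, the HTTP request, SSL/TLS and more. Each item is explained below:

1. DNS resolution

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.099875208

Explanation: The time the probe spent on DNS resolution (turning the domain name into an IP address). The value is 0.099875208 s, about 100 ms, which is a normal DNS resolution speed.

2. Total probe duration

# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.236132625

Explanation: The total time from start to finish of the probe. The value is 0.236132625 s, about 236 ms, covering all phases such as DNS resolution, connection setup and data transfer.

3. Regex failure

# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0

Explanation: If the probe is configured with regular expressions to check the response content, 0 means the probe did not fail because of a regex mismatch, so the check passed.

4. HTTP content length

# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1

Explanation: The length of the HTTP response content in bytes. A value of -1 usually means the server did not send a Content-Length header or the length is unknown (possibly chunked transfer encoding).

5. HTTP request duration by phase

# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.016199790999999998
probe_http_duration_seconds{phase="processing"} 0.074317625
probe_http_duration_seconds{phase="resolve"} 0.10107029199999999
probe_http_duration_seconds{phase="tls"} 0.0264935
probe_http_duration_seconds{phase="transfer"} 0.016835709

Explanation: The HTTP request time broken down by phase:
- resolve (name resolution): 0.101070292 s, about 101 ms, close to probe_dns_lookup_time_seconds (about 100 ms); the per-phase values are summed over all redirects, so the two need not match exactly.
- connect (TCP connection setup): 0.016199791 s, about 16 ms.
- tls (TLS handshake): 0.0264935 s, about 26 ms.
- processing (server processing): 0.074317625 s, about 74 ms.
- transfer (data transfer): 0.016835709 s, about 17 ms.
Total: the five phases add up to about 235 ms, consistent with probe_duration_seconds (about 236 ms).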
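
To do this kind of check programmatically, the text format can be parsed with a couple of regular expressions. The sketch below is a simplified parser written for this article, not the official Prometheus client: `parse_metrics` is a hypothetical helper that ignores comment lines and does not handle optional trailing timestamps, and `SAMPLE` is an excerpt of the output above.

```python
import re

SAMPLE = '''\
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.016199790999999998
probe_http_duration_seconds{phase="processing"} 0.074317625
probe_http_duration_seconds{phase="resolve"} 0.10107029199999999
probe_http_duration_seconds{phase="tls"} 0.0264935
probe_http_duration_seconds{phase="transfer"} 0.016835709
probe_duration_seconds 0.236132625
'''

# name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE)
phases = {labels["phase"]: value
          for name, labels, value in samples
          if name == "probe_http_duration_seconds"}
total = next(value for name, _, value in samples if name == "probe_duration_seconds")

print(f"phase sum {sum(phases.values()) * 1000:.1f} ms, total {total * 1000:.1f} ms")
```

For anything beyond a quick check, a maintained parser such as the one shipped with the official Python client library is the safer choice.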

6. Number of HTTP redirects

# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1

Explanation: The probe followed 1 HTTP redirect (for example from HTTP to HTTPS).

7. Whether the final connection used SSL

# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1

Explanation: 1 means the final connection used SSL/TLS (i.e. HTTPS).

8. HTTP status code

# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200

Explanation: The returned status code is 200, meaning the request succeeded.

9. Uncompressed response body length

# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 17502

Explanation: The uncompressed response body is 17,502 bytes (about 17 KB).

10. HTTP version

# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1

Explanation: The response used HTTP/1.1.

11. IP address hash

# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.849020578e+09

Explanation: The hash of the IP address is 1,849,020,578, used to track whether the target IP changes.

12. IP protocol

# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4

Explanation: 4 means IPv4 was used.

13. SSL certificate expiry

# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.74643561e+09

Explanation: 1,746,435,610 seconds corresponds to May 5, 2025 09:00:10 UTC, the earliest expiry time in the certificate chain. As of March 17, 2025, about 49 days remain.

# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.74643561e+09

Explanation: Same value as above; the last expiry time of the certificate chain is also May 5, 2025.
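
Converting the expiry timestamp by hand is error-prone, so a short script helps. A minimal sketch using only the standard library; the helper name `days_until_expiry` is illustrative, and the reference date is the one stated in the text with the time of day assumed to be 00:00 UTC.

```python
from datetime import datetime, timezone


def days_until_expiry(expiry_unix: float, now: datetime) -> float:
    """Days (possibly fractional) between `now` and a Unix expiry timestamp."""
    expiry = datetime.fromtimestamp(expiry_unix, tz=timezone.utc)
    return (expiry - now).total_seconds() / 86400


expiry_unix = 1.74643561e+09  # probe_ssl_earliest_cert_expiry
expiry = datetime.fromtimestamp(expiry_unix, tz=timezone.utc)
now = datetime(2025, 3, 17, tzinfo=timezone.utc)  # assumed midnight UTC

remaining = days_until_expiry(expiry_unix, now)
print(expiry.isoformat(), f"{remaining:.1f} days remaining")
```

In a live setup the same comparison is normally done in an alerting rule against the current time rather than in a script.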

14. SSL certificate information

# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="6434662aa222147a3c959e44bdb412372ead9c8a48f7bae1341b0e86a5e1bb77",issuer="CN=GlobalSign GCC R6 AlphaSSL CA 2023,O=GlobalSign nv-sa,C=BE",serialnumber="1c76b2f48eabe234db9348b7",subject="CN=*.api.cn",subjectalternative="*.api.cn,api.cn"} 1

Explanation: Details of the leaf certificate:
- Fingerprint (SHA256): 6434662aa...
- Issuer: GlobalSign GCC R6 AlphaSSL CA 2023 (Belgium).
- Serial number: 1c76b2f48eabe234db9348b7.
- Subject: CN=*.api.cn (a wildcard certificate).
- Subject alternative names: *.api.cn, api.cn.

15. Probe success

# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1

Explanation: 1 means the probe succeeded.

16. TLS cipher suite

# HELP probe_tls_cipher_info Returns the TLS cipher negotiated during handshake
# TYPE probe_tls_cipher_info gauge
probe_tls_cipher_info{cipher="TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"} 1

Explanation: The TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 cipher suite was negotiated, which offers good security.

17. TLS version

# HELP probe_tls_version_info Returns the TLS version used or NaN when unknown
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.2"} 1

Explanation: TLS 1.2 was used.

Summary

The probe checked an HTTPS service (certificate issued for *.api.cn). The whole run took about 236 ms, of which DNS resolution was about 100 ms; the status code was 200; the certificate expires on May 5, 2025 (about 49 days after March 17, 2025). The probe succeeded and found no problems, using IPv4, HTTP/1.1 and TLS 1.2.

node_exporter sample output

http://127.0.0.1:9100/metrics?target=api.cn&module=http_2xx&debug=false

HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.

TYPE go_gc_duration_seconds summary

go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0

HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. Sourced from /gc/gogc:percent

TYPE go_gc_gogc_percent gauge

go_gc_gogc_percent 100

HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. Sourced from /gc/gomemlimit:bytes

TYPE go_gc_gomemlimit_bytes gauge

go_gc_gomemlimit_bytes 9.223372036854776e+18

HELP go_goroutines Number of goroutines that currently exist.

TYPE go_goroutines gauge

go_goroutines 7

HELP go_info Information about the Go environment.

TYPE go_info gauge

go_info{version="go1.24.1"} 1

HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.

TYPE go_memstats_alloc_bytes gauge

go_memstats_alloc_bytes 657056

HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.

TYPE go_memstats_alloc_bytes_total counter

go_memstats_alloc_bytes_total 657056

HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table. Equals to /memory/classes/profiling/buckets:bytes.

TYPE go_memstats_buck_hash_sys_bytes gauge

go_memstats_buck_hash_sys_bytes 1.445589e+06

HELP go_memstats_frees_total Total number of heap objects frees. Equals to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects.

TYPE go_memstats_frees_total counter

go_memstats_frees_total 224

HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata. Equals to /memory/classes/metadata/other:bytes.

TYPE go_memstats_gc_sys_bytes gauge

go_memstats_gc_sys_bytes 1.940696e+06

HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use, same as go_memstats_alloc_bytes. Equals to /memory/classes/heap/objects:bytes.

TYPE go_memstats_heap_alloc_bytes gauge

go_memstats_heap_alloc_bytes 657056

HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used. Equals to /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.

TYPE go_memstats_heap_idle_bytes gauge

go_memstats_heap_idle_bytes 1.900544e+06

HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes

TYPE go_memstats_heap_inuse_bytes gauge

go_memstats_heap_inuse_bytes 1.900544e+06

HELP go_memstats_heap_objects Number of currently allocated objects. Equals to /gc/heap/objects:objects.

TYPE go_memstats_heap_objects gauge

go_memstats_heap_objects 3923

HELP go_memstats_heap_released_bytes Number of heap bytes released to OS. Equals to /memory/classes/heap/released:bytes.

TYPE go_memstats_heap_released_bytes gauge

go_memstats_heap_released_bytes 1.900544e+06

HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes + /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.

TYPE go_memstats_heap_sys_bytes gauge

go_memstats_heap_sys_bytes 3.801088e+06

HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.

TYPE go_memstats_last_gc_time_seconds gauge

go_memstats_last_gc_time_seconds 0

HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge. Equals to /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects.

TYPE go_memstats_mallocs_total counter

go_memstats_mallocs_total 4147

HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures. Equals to /memory/classes/metadata/mcache/inuse:bytes.

TYPE go_memstats_mcache_inuse_bytes gauge

go_memstats_mcache_inuse_bytes 1208

HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system. Equals to /memory/classes/metadata/mcache/inuse:bytes + /memory/classes/metadata/mcache/free:bytes.

TYPE go_memstats_mcache_sys_bytes gauge

go_memstats_mcache_sys_bytes 31408

HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures. Equals to /memory/classes/metadata/mspan/inuse:bytes.

TYPE go_memstats_mspan_inuse_bytes gauge

go_memstats_mspan_inuse_bytes 32960

HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system. Equals to /memory/classes/metadata/mspan/inuse:bytes + /memory/classes/metadata/mspan/free:bytes.

TYPE go_memstats_mspan_sys_bytes gauge

go_memstats_mspan_sys_bytes 48960

HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place. Equals to /gc/heap/goal:bytes.

TYPE go_memstats_next_gc_bytes gauge

go_memstats_next_gc_bytes 4.194304e+06

HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations. Equals to /memory/classes/other:bytes.

TYPE go_memstats_other_sys_bytes gauge

go_memstats_other_sys_bytes 745323

HELP go_memstats_stack_inuse_bytes Number of bytes obtained from system for stack allocator in non-CGO environments. Equals to /memory/classes/heap/stacks:bytes.

TYPE go_memstats_stack_inuse_bytes gauge

go_memstats_stack_inuse_bytes 393216

HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator. Equals to /memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes.

TYPE go_memstats_stack_sys_bytes gauge

go_memstats_stack_sys_bytes 393216

HELP go_memstats_sys_bytes Number of bytes obtained from system. Equals to /memory/classes/total:byte.

TYPE go_memstats_sys_bytes gauge

go_memstats_sys_bytes 8.40628e+06

HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. Sourced from /sched/gomaxprocs:threads

TYPE go_sched_gomaxprocs_threads gauge

go_sched_gomaxprocs_threads 1

HELP go_threads Number of OS threads created.

TYPE go_threads gauge

go_threads 7

HELP node_boot_time_seconds Unix time of last boot, including microseconds.

TYPE node_boot_time_seconds gauge

node_boot_time_seconds 1.737388468861951e+09

HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.

TYPE node_cpu_seconds_total counter

node_cpu_seconds_total{cpu="0",mode="idle"} 1.6048261e+06
node_cpu_seconds_total{cpu="0",mode="nice"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 194986.66
node_cpu_seconds_total{cpu="0",mode="user"} 174748.74
node_cpu_seconds_total{cpu="1",mode="idle"} 1.66515699e+06
node_cpu_seconds_total{cpu="1",mode="nice"} 0
node_cpu_seconds_total{cpu="1",mode="system"} 165075.32
node_cpu_seconds_total{cpu="1",mode="user"} 154451.79
node_cpu_seconds_total{cpu="10",mode="idle"} 2.00943864e+06
node_cpu_seconds_total{cpu="10",mode="nice"} 0
node_cpu_seconds_total{cpu="10",mode="system"} 3149.53
node_cpu_seconds_total{cpu="10",mode="user"} 9955.19
node_cpu_seconds_total{cpu="11",mode="idle"} 2.00944043e+06
node_cpu_seconds_total{cpu="11",mode="nice"} 0
node_cpu_seconds_total{cpu="11",mode="system"} 3167.68
node_cpu_seconds_total{cpu="11",mode="user"} 9948.72
node_cpu_seconds_total{cpu="12",mode="idle"} 2.00996266e+06
node_cpu_seconds_total{cpu="12",mode="nice"} 0
node_cpu_seconds_total{cpu="12",mode="system"} 3047.07
node_cpu_seconds_total{cpu="12",mode="user"} 9580.09
node_cpu_seconds_total{cpu="13",mode="idle"} 2.0103041e+06
node_cpu_seconds_total{cpu="13",mode="nice"} 0
node_cpu_seconds_total{cpu="13",mode="system"} 3004.28
node_cpu_seconds_total{cpu="13",mode="user"} 9294.98
node_cpu_seconds_total{cpu="2",mode="idle"} 1.76961929e+06
node_cpu_seconds_total{cpu="2",mode="nice"} 0
node_cpu_seconds_total{cpu="2",mode="system"} 114183.01
node_cpu_seconds_total{cpu="2",mode="user"} 113225.68
node_cpu_seconds_total{cpu="3",mode="idle"} 1.83257525e+06
node_cpu_seconds_total{cpu="3",mode="nice"} 0
node_cpu_seconds_total{cpu="3",mode="system"} 81655.16
node_cpu_seconds_total{cpu="3",mode="user"} 91278.43
node_cpu_seconds_total{cpu="4",mode="idle"} 2.00880497e+06
node_cpu_seconds_total{cpu="4",mode="nice"} 0
node_cpu_seconds_total{cpu="4",mode="system"} 3332.93
node_cpu_seconds_total{cpu="4",mode="user"} 10303.43
node_cpu_seconds_total{cpu="5",mode="idle"} 2.00908416e+06
node_cpu_seconds_total{cpu="5",mode="nice"} 0
node_cpu_seconds_total{cpu="5",mode="system"} 3265.66
node_cpu_seconds_total{cpu="5",mode="user"} 10247.04
node_cpu_seconds_total{cpu="6",mode="idle"} 2.0092584e+06
node_cpu_seconds_total{cpu="6",mode="nice"} 0
node_cpu_seconds_total{cpu="6",mode="system"} 3252.1
node_cpu_seconds_total{cpu="6",mode="user"} 10107.97
node_cpu_seconds_total{cpu="7",mode="idle"} 2.00939525e+06
node_cpu_seconds_total{cpu="7",mode="nice"} 0
node_cpu_seconds_total{cpu="7",mode="system"} 3216.94
node_cpu_seconds_total{cpu="7",mode="user"} 10023.3
node_cpu_seconds_total{cpu="8",mode="idle"} 2.00916048e+06
node_cpu_seconds_total{cpu="8",mode="nice"} 0
node_cpu_seconds_total{cpu="8",mode="system"} 3290.98
node_cpu_seconds_total{cpu="8",mode="user"} 10179.82
node_cpu_seconds_total{cpu="9",mode="idle"} 2.0094225e+06
node_cpu_seconds_total{cpu="9",mode="nice"} 0
node_cpu_seconds_total{cpu="9",mode="system"} 3157.41
node_cpu_seconds_total{cpu="9",mode="user"} 9952.5

HELP node_disk_read_bytes_total The total number of bytes read successfully.

TYPE node_disk_read_bytes_total counter

node_disk_read_bytes_total{device="disk0"} 1.374778241024e+12
node_disk_read_bytes_total{device="disk4"} 7.962624e+06
node_disk_read_bytes_total{device="disk5"} 2.9613056e+07

HELP node_disk_read_errors_total The total number of read errors.

TYPE node_disk_read_errors_total counter

node_disk_read_errors_total{device="disk0"} 0
node_disk_read_errors_total{device="disk4"} 0
node_disk_read_errors_total{device="disk5"} 0

HELP node_disk_read_retries_total The total number of read retries.

TYPE node_disk_read_retries_total counter

node_disk_read_retries_total{device="disk0"} 0
node_disk_read_retries_total{device="disk4"} 0
node_disk_read_retries_total{device="disk5"} 0

HELP node_disk_read_sectors_total The total number of sectors read successfully.

TYPE node_disk_read_sectors_total counter

node_disk_read_sectors_total{device="disk0"} 19617.591064453125
node_disk_read_sectors_total{device="disk4"} 0.181640625
node_disk_read_sectors_total{device="disk5"} 3.826171875

HELP node_disk_read_time_seconds_total The total number of seconds spent by all reads.

TYPE node_disk_read_time_seconds_total counter

node_disk_read_time_seconds_total{device="disk0"} 16492.82938385
node_disk_read_time_seconds_total{device="disk4"} 0.298451998
node_disk_read_time_seconds_total{device="disk5"} 1.4305535520000001

HELP node_disk_reads_completed_total The total number of reads completed successfully.

TYPE node_disk_reads_completed_total counter

node_disk_reads_completed_total{device="disk0"} 8.0353653e+07
node_disk_reads_completed_total{device="disk4"} 372
node_disk_reads_completed_total{device="disk5"} 1959

HELP node_disk_write_errors_total The total number of write errors.

TYPE node_disk_write_errors_total counter

node_disk_write_errors_total{device="disk0"} 0
node_disk_write_errors_total{device="disk4"} 0
node_disk_write_errors_total{device="disk5"} 0

HELP node_disk_write_retries_total The total number of write retries.

TYPE node_disk_write_retries_total counter

node_disk_write_retries_total{device="disk0"} 0
node_disk_write_retries_total{device="disk4"} 0
node_disk_write_retries_total{device="disk5"} 0

HELP node_disk_write_time_seconds_total This is the total number of seconds spent by all writes.

TYPE node_disk_write_time_seconds_total counter

node_disk_write_time_seconds_total{device="disk0"} 8720.636515421
node_disk_write_time_seconds_total{device="disk4"} 0
node_disk_write_time_seconds_total{device="disk5"} 1.78473225

HELP node_disk_writes_completed_total The total number of writes completed successfully.

TYPE node_disk_writes_completed_total counter

node_disk_writes_completed_total{device="disk0"} 9.6940981e+07
node_disk_writes_completed_total{device="disk4"} 0
node_disk_writes_completed_total{device="disk5"} 2184

HELP node_disk_written_bytes_total The total number of bytes written successfully.

TYPE node_disk_written_bytes_total counter

node_disk_written_bytes_total{device="disk0"} 2.269443596288e+12
node_disk_written_bytes_total{device="disk4"} 0
node_disk_written_bytes_total{device="disk5"} 1.77536e+07

HELP node_disk_written_sectors_total The total number of sectors written successfully.

TYPE node_disk_written_sectors_total counter

node_disk_written_sectors_total{device="disk0"} 23667.231689453125
node_disk_written_sectors_total{device="disk4"} 0
node_disk_written_sectors_total{device="disk5"} 4.265625

HELP node_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which node_exporter was built, and the goos and goarch for the build.

TYPE node_exporter_build_info gauge

node_exporter_build_info{branch="master",goarch="arm64",goos="darwin",goversion="go1.24.1",revision="0c10545e8dfb983f4c013281c4904be27f645de2",tags="unknown",version="1.9.0"} 1

HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.

TYPE node_filesystem_avail_bytes gauge

node_filesystem_avail_bytes{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 5.03803904e+08
node_filesystem_avail_bytes{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 5.03803904e+08
node_filesystem_avail_bytes{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 5.03803904e+08
node_filesystem_avail_bytes{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 3.475095552e+09
node_filesystem_avail_bytes{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 7.17410553856e+11
node_filesystem_avail_bytes{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 7.1741108224e+11
node_filesystem_avail_bytes{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 7.17410553856e+11
node_filesystem_avail_bytes{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 7.17410553856e+11
node_filesystem_avail_bytes{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 7.1741108224e+11
node_filesystem_avail_bytes{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 7.17410553856e+11
node_filesystem_avail_bytes{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 6.796980224e+09
node_filesystem_avail_bytes{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 6.7862841344e+11
node_filesystem_avail_bytes{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

HELP node_filesystem_device_error Whether an error occurred while getting statistics for the given device.

TYPE node_filesystem_device_error gauge

node_filesystem_device_error{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 0
node_filesystem_device_error{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 0
node_filesystem_device_error{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 0
node_filesystem_device_error{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 0
node_filesystem_device_error{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 0
node_filesystem_device_error{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 0
node_filesystem_device_error{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 0
node_filesystem_device_error{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 0
node_filesystem_device_error{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 0
node_filesystem_device_error{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 0
node_filesystem_device_error{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 0
node_filesystem_device_error{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 0
node_filesystem_device_error{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

HELP node_filesystem_files Filesystem total file nodes.

TYPE node_filesystem_files gauge

node_filesystem_files{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 4.919992e+06
node_filesystem_files{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 4.919961e+06
node_filesystem_files{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 4.920051e+06
node_filesystem_files{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 3.3936542e+07
node_filesystem_files{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 4.293979786e+09
node_filesystem_files{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 4.294488888e+09
node_filesystem_files{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 7.005963924e+09
node_filesystem_files{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 7.005962762e+09
node_filesystem_files{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 7.010848968e+09
node_filesystem_files{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 7.005962442e+09
node_filesystem_files{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 1
node_filesystem_files{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 0
node_filesystem_files{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

HELP node_filesystem_files_free Filesystem total free file nodes.

TYPE node_filesystem_files_free gauge

node_filesystem_files_free{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 4.91996e+06
node_filesystem_files_free{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 4.91996e+06
node_filesystem_files_free{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 4.91996e+06
node_filesystem_files_free{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 3.393648e+07
node_filesystem_files_free{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 4.293575294e+09
node_filesystem_files_free{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 4.294084721e+09
node_filesystem_files_free{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 7.00596244e+09
node_filesystem_files_free{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 7.00596244e+09
node_filesystem_files_free{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 7.0059676e+09
node_filesystem_files_free{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 7.00596244e+09
node_filesystem_files_free{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 0
node_filesystem_files_free{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 0
node_filesystem_files_free{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

# HELP node_filesystem_free_bytes Filesystem free space in bytes.
# TYPE node_filesystem_free_bytes gauge
node_filesystem_free_bytes{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 5.03803904e+08
node_filesystem_free_bytes{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 5.03803904e+08
node_filesystem_free_bytes{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 5.03803904e+08
node_filesystem_free_bytes{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 3.475095552e+09
node_filesystem_free_bytes{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 7.17410553856e+11
node_filesystem_free_bytes{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 7.1741108224e+11
node_filesystem_free_bytes{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 7.17410553856e+11
node_filesystem_free_bytes{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 7.17410553856e+11
node_filesystem_free_bytes{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 7.1741108224e+11
node_filesystem_free_bytes{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 7.17410553856e+11
node_filesystem_free_bytes{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 6.796980224e+09
node_filesystem_free_bytes{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 6.7862841344e+11
node_filesystem_free_bytes{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

# HELP node_filesystem_mount_info Filesystem mount information.
# TYPE node_filesystem_mount_info gauge
node_filesystem_mount_info{device="/dev/disk1s1",major="",minor="",mountpoint="/System/Volumes/iSCPreboot"} 1
node_filesystem_mount_info{device="/dev/disk1s2",major="",minor="",mountpoint="/System/Volumes/xarts"} 1
node_filesystem_mount_info{device="/dev/disk1s3",major="",minor="",mountpoint="/System/Volumes/Hardware"} 1
node_filesystem_mount_info{device="/dev/disk2s1",major="",minor="",mountpoint="/System/Volumes/Update/SFR/mnt1"} 1
node_filesystem_mount_info{device="/dev/disk3s1",major="",minor="",mountpoint="/System/Volumes/Update/mnt1"} 1
node_filesystem_mount_info{device="/dev/disk3s1s1",major="",minor="",mountpoint="/"} 1
node_filesystem_mount_info{device="/dev/disk3s2",major="",minor="",mountpoint="/System/Volumes/Preboot"} 1
node_filesystem_mount_info{device="/dev/disk3s4",major="",minor="",mountpoint="/System/Volumes/Update"} 1
node_filesystem_mount_info{device="/dev/disk3s5",major="",minor="",mountpoint="/System/Volumes/Data"} 1
node_filesystem_mount_info{device="/dev/disk3s6",major="",minor="",mountpoint="/System/Volumes/VM"} 1
node_filesystem_mount_info{device="/dev/disk5s1",major="",minor="",mountpoint="/Volumes/KINGSTON"} 1
node_filesystem_mount_info{device="OrbStack:/OrbStack",major="",minor="",mountpoint="/Users/forthxu/OrbStack"} 1
node_filesystem_mount_info{device="map auto_home",major="",minor="",mountpoint="/System/Volumes/Data/home"} 1

# HELP node_filesystem_purgeable_bytes Filesystem space available including purgeable space (MacOS specific).
# TYPE node_filesystem_purgeable_bytes gauge
node_filesystem_purgeable_bytes{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 0
node_filesystem_purgeable_bytes{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 0
node_filesystem_purgeable_bytes{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 0
node_filesystem_purgeable_bytes{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 0
node_filesystem_purgeable_bytes{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 0
node_filesystem_purgeable_bytes{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 7.24023851138e+11
node_filesystem_purgeable_bytes{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 0
node_filesystem_purgeable_bytes{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 7.24023851138e+11
node_filesystem_purgeable_bytes{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 7.24023851138e+11
node_filesystem_purgeable_bytes{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 0
node_filesystem_purgeable_bytes{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 0
node_filesystem_purgeable_bytes{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 0
node_filesystem_purgeable_bytes{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

# HELP node_filesystem_readonly Filesystem read-only status.
# TYPE node_filesystem_readonly gauge
node_filesystem_readonly{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 0
node_filesystem_readonly{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 0
node_filesystem_readonly{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 0
node_filesystem_readonly{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 0
node_filesystem_readonly{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 0
node_filesystem_readonly{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 1
node_filesystem_readonly{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 0
node_filesystem_readonly{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 0
node_filesystem_readonly{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 0
node_filesystem_readonly{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 0
node_filesystem_readonly{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 0
node_filesystem_readonly{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 0
node_filesystem_readonly{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

# HELP node_filesystem_size_bytes Filesystem size in bytes.
# TYPE node_filesystem_size_bytes gauge
node_filesystem_size_bytes{device="/dev/disk1s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/iSCPreboot"} 5.24288e+08
node_filesystem_size_bytes{device="/dev/disk1s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/xarts"} 5.24288e+08
node_filesystem_size_bytes{device="/dev/disk1s3",device_error="",fstype="apfs",mountpoint="/System/Volumes/Hardware"} 5.24288e+08
node_filesystem_size_bytes{device="/dev/disk2s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/SFR/mnt1"} 5.368664064e+09
node_filesystem_size_bytes{device="/dev/disk3s1",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update/mnt1"} 9.9466258432e+11
node_filesystem_size_bytes{device="/dev/disk3s1s1",device_error="",fstype="apfs",mountpoint="/"} 9.9466258432e+11
node_filesystem_size_bytes{device="/dev/disk3s2",device_error="",fstype="apfs",mountpoint="/System/Volumes/Preboot"} 9.9466258432e+11
node_filesystem_size_bytes{device="/dev/disk3s4",device_error="",fstype="apfs",mountpoint="/System/Volumes/Update"} 9.9466258432e+11
node_filesystem_size_bytes{device="/dev/disk3s5",device_error="",fstype="apfs",mountpoint="/System/Volumes/Data"} 9.9466258432e+11
node_filesystem_size_bytes{device="/dev/disk3s6",device_error="",fstype="apfs",mountpoint="/System/Volumes/VM"} 9.9466258432e+11
node_filesystem_size_bytes{device="/dev/disk5s1",device_error="",fstype="msdos",mountpoint="/Volumes/KINGSTON"} 8.54386688e+09
node_filesystem_size_bytes{device="OrbStack:/OrbStack",device_error="",fstype="nfs",mountpoint="/Users/forthxu/OrbStack"} 7.12966750208e+11
node_filesystem_size_bytes{device="map auto_home",device_error="",fstype="autofs",mountpoint="/System/Volumes/Data/home"} 0

# HELP node_load1 1m

Vector Databases

Author: forthxu
Date: December 20, 2024
Category: Uncategorized

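A vector database is a database built to store, manage, and retrieve high-dimensional vectors. It is used with large datasets, particularly in machine learning and deep learning, to hold embeddings, feature vectors, and other high-dimensional data. Because it is optimized for similarity search, typically approximate nearest neighbor (ANN) search, it is widely used in image search, recommender systems, and natural language processing (NLP).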

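1. What a vector database is

A vector database is dedicated to storing and querying vector data. A vector here is a fixed-length, ordered list of numbers (128, 256, or 512 dimensions, for example), usually produced by a machine learning model such as a convolutional neural network or a language model, that represents an image, a piece of text, an audio clip, or some other complex object.

Users can store large numbers of vectors and run queries against them, for example:

Similarity search: find the vectors that are similar to a given vector under a chosen metric, such as cosine similarity or Euclidean distance.
Nearest neighbor search: find the k vectors closest to the query vector.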
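
To make the two query types concrete, here is a minimal sketch of exact (brute-force) nearest neighbor search using cosine similarity. The names `cosine_similarity`, `top_k`, and the toy `store` are illustrative assumptions, not any particular database's API. A real vector database replaces the linear scan with an index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          vectors: Dict[str, Sequence[float]],
          k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings", made up for illustration.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print([key for key, _ in results])  # ['cat', 'kitten']
```

The scan costs one comparison per stored vector, which is fine for thousands of vectors and becomes the bottleneck at millions. That cost is what the indexing techniques in section 3 are meant to cut.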

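2. Use cases

Vector databases are used across many fields, mainly:

Natural language processing (NLP): store word embeddings or sentence embeddings and use them to retrieve similar text quickly.
Computer vision: store image feature vectors and retrieve similar images.
Recommender systems: store vector representations of user behavior and make personalized recommendations based on similarity.
Audio and video analysis: store vector representations of audio or video clips and retrieve similar clips.
Bioinformatics: store vectorized representations of DNA sequences and protein structures and compare them by similarity.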

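3. Key techniques

The core job of a vector database is efficient similarity search, especially over high-dimensional vectors. Common techniques for speeding up queries are:

Approximate nearest neighbor (ANN) search: on very large datasets, exact nearest neighbor search can be very slow, so approximate algorithms trade a little accuracy for speed. Common ANN algorithms and libraries include:
- HNSW (Hierarchical Navigable Small World): a graph-based ANN algorithm built on layered small-world graphs, offering high recall and fast queries.
- FAISS (Facebook AI Similarity Search): an open-source library from Facebook for efficient vector retrieval. It is a library rather than a single algorithm, and it implements several ANN index types.
- LSH (Locality-Sensitive Hashing): a hashing technique that maps similar objects to the same bucket with high probability, suited to approximate search over high-dimensional data.
- IVF (Inverted File Index): an inverted-index method that partitions the vectors into groups and, at query time, searches only the groups closest to the query.
Vector quantization: maps high-dimensional vectors to a finite set of clusters or discrete codes to reduce storage and speed up search. Common methods include:
- PQ (Product Quantization): splits each vector into several sub-vectors and quantizes each one separately, reducing storage.
- OPQ (Optimized Product Quantization): extends PQ by learning a rotation of the vector space before quantizing, which lowers quantization error and improves search accuracy.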
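
As a rough illustration of product quantization, the sketch below encodes a 4-dimensional vector as two small integer codes and computes a query distance from per-subspace lookup tables (asymmetric distance computation). The codebooks are hand-written assumptions for the example. In practice they are learned, typically with k-means on each subspace, and all function names here are hypothetical.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # the centroids of one subspace


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec: Sequence[float], m: int) -> List[Sequence[float]]:
    """Split vec into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def encode(vec: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Replace each sub-vector by the index of its nearest centroid."""
    codes = []
    for sub, codebook in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(codebook)), key=lambda j: sq_dist(sub, codebook[j])))
    return codes


def decode(codes: List[int], codebooks: List[Codebook]) -> List[float]:
    """Rebuild the approximate vector by concatenating the chosen centroids."""
    out: List[float] = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out


def adc_distance(query: Sequence[float], codes: List[int],
                 codebooks: List[Codebook]) -> float:
    """Squared distance between an exact query and a quantized vector."""
    subs = split(query, len(codebooks))
    # One table per subspace: distance from the query sub-vector to every centroid.
    tables = [[sq_dist(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]
    return sum(table[code] for table, code in zip(tables, codes))


# Assumed codebooks: m = 2 subspaces, 2 centroids each (1 bit per subspace).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
vec = [0.9, 0.8, 0.1, 0.9]
codes = encode(vec, codebooks)           # [1, 0]
approx = decode(codes, codebooks)        # [1.0, 1.0, 0.0, 1.0]
query = [1.0, 1.0, 0.0, 0.0]
print(codes, adc_distance(query, codes, codebooks))
```

The stored form is the list of codes, not the original floats, which is where the space saving comes from. The lookup tables are built once per query and reused for every stored vector.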

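4. Common tools and platforms

Several open-source and commercial vector databases are available. Popular ones include:

4.1 FAISS (Facebook AI Similarity Search)

Overview: an open-source library developed by Facebook, widely used for efficient retrieval and similarity search over high-dimensional vectors. Strictly speaking it is an indexing library rather than a full database.
Features: supports several index types (IVF, PQ, HNSW, and others) and can run efficient approximate nearest neighbor search over millions or even hundreds of millions of vectors.
Highlights: ships in CPU and GPU versions and supports fast indexing and querying of large datasets.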
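
To show what an IVF-style index does, here is a library-free sketch. It is not the FAISS API. The class name `IVFIndex`, the fixed centroids, and the `nprobe` parameter name are assumptions for illustration, and a real index would learn its centroids by clustering the data.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], vec: Sequence[float], n: int) -> List[int]:
    """Indices of the n centroids closest to vec."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
    return order[:n]


class IVFIndex:
    """Toy inverted-file index: one list of vectors per centroid (cell)."""

    def __init__(self, centroids: List[List[float]]) -> None:
        self.centroids = centroids
        self.lists: List[List[Tuple[str, Sequence[float]]]] = [[] for _ in centroids]

    def add(self, item_id: str, vec: Sequence[float]) -> None:
        # Store the vector in the list of its nearest centroid.
        cell = nearest(self.centroids, vec, 1)[0]
        self.lists[cell].append((item_id, vec))

    def search(self, query: Sequence[float], k: int, nprobe: int = 1) -> List[str]:
        # Scan only the nprobe cells closest to the query, not the whole dataset.
        candidates: List[Tuple[str, Sequence[float]]] = []
        for cell in nearest(self.centroids, query, nprobe):
            candidates.extend(self.lists[cell])
        candidates.sort(key=lambda item: sq_dist(query, item[1]))
        return [item_id for item_id, _ in candidates[:k]]


index = IVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])  # assumed, not learned
index.add("a", [1.0, 1.0])
index.add("b", [4.0, 4.0])
index.add("c", [6.0, 6.0])
index.add("d", [9.0, 9.0])

print(index.search([5.2, 5.2], k=2, nprobe=1))  # ['c', 'd']
print(index.search([5.2, 5.2], k=2, nprobe=2))  # ['c', 'b']
```

With `nprobe=1` the search misses "b", the true second-nearest vector, because it sits in the other cell. Raising `nprobe` recovers it at the cost of scanning more vectors, which is the accuracy-versus-speed trade-off that makes the search approximate.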

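4.2 Milvus

Overview: Milvus is an open-source vector database built specifically for storing, managing, and querying vector data.
Features: supports multiple index types, including HNSW, IVF, and Annoy, for efficient vector retrieval.
Highlights: highly scalable, supports distributed deployment, and suits large-scale vector workloads.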

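4.3 Weaviate

Overview: Weaviate is an open-source vector search engine focused on combining structured data with unstructured data (text, images, video) for recommendation and retrieval.
Features: supports vector retrieval over images, text, and video, and integrates with external machine learning models and deep learning frameworks.
Highlights: built-in vector index, a flexible data model, and automatic vectorization of incoming data through its model integrations.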

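4.4 Pinecone

Overview: Pinecone is a commercial vector database platform focused on efficient, scalable vector storage and search as a service.
Features: fast similarity search, with automated index updates and online querying.
Highlights: developer-oriented, easy to integrate, and suited to real-time data processing and retrieval.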

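4.5 Elasticsearch with Vector Search

Overview: Elasticsearch is a widely used search engine. Recent versions added support for vector data and provide vector search.
Features: supports vector retrieval by cosine similarity, Euclidean distance, and other metrics through the built-in dense_vector field type and kNN search.
Highlights: handles hybrid workloads that mix traditional text search with vector search, and fits scenarios such as log analysis and full-text search.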
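
The choice between cosine similarity and Euclidean distance changes which vector counts as "nearest". The sketch below, with made-up vectors and helper names, shows the two metrics disagreeing on raw vectors and agreeing once the vectors are normalized to unit length.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (v must be non-zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice as long
c = [4.0, 3.0]   # same length as a, slightly different direction

# Cosine ignores length, so b is the closer match to a.
print(cosine(a, b), cosine(a, c))        # 1.0 0.96
# Euclidean distance is sensitive to length, so c is the closer match to a.
print(euclidean(a, b), euclidean(a, c))  # 5.0 and about 1.414

# On unit-length vectors the two metrics rank identically, because
# squared Euclidean distance equals 2 - 2 * cosine similarity.
na, nc = normalize(a), normalize(c)
lhs = euclidean(na, nc) ** 2
rhs = 2.0 - 2.0 * cosine(a, c)
```

This is why many embedding pipelines normalize vectors before indexing: the metric choice then stops affecting the ranking.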

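5. Advantages of vector databases

Efficient similarity retrieval: on large, high-dimensional datasets, a vector database offers faster similarity search than a traditional database and supports fast nearest neighbor retrieval.
Built for high-dimensional data: traditional databases perform poorly on high-dimensional data such as text, image, and audio embeddings, whereas vector databases are optimized for this kind of data.
Support for machine learning and deep learning models: vector databases store the embeddings those models produce, which can be used to build smarter recommender systems, search engines, and similar applications.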
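
The efficiency in the first point comes from indexes that avoid comparing the query with every stored vector. One simple way to see the idea is random-hyperplane LSH, sketched below. The data, the seed, and the function names are assumptions for illustration, and production systems use several hash tables and tuned parameters rather than a single table.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


planes = make_hyperplanes(num_bits=8, dim=3)

# Hypothetical stored vectors.
store = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],    # same direction as "a"
    "opposite": [-1.0, -2.0, -3.0], # points the other way
}

# Index: bucket the ids by signature.
buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
for item_id, vec in store.items():
    buckets[signature(vec, planes)].append(item_id)


def candidates(query: Sequence[float]) -> List[str]:
    """Only vectors in the query's bucket are compared further."""
    return buckets.get(signature(query, planes), [])


print(candidates([0.5, 1.0, 1.5]))  # same direction as "a", so it shares a's bucket
```

Vectors pointing in the same direction always share a signature, and vectors at a small angle share one with high probability, so a lookup returns a short candidate list instead of the whole collection. The candidates would then be re-ranked with an exact metric.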

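6. Summary

Vector databases are a strong tool for high-dimensional data, especially where fast retrieval and similarity comparison matter: recommender systems, image and text retrieval, natural language processing, and similar fields. As data volumes grow and AI techniques spread, their use keeps widening, because they let developers store data efficiently and query it quickly at scale.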