The need: parsing HTML in PHP to extract the data I want

When writing a web crawler in PHP, you have to parse the pages you fetch and extract the data you want; this process turns raw page HTML into structured data.
Many people know phpQuery, which lets you process pages and extract data with a jQuery-like syntax.
But in more complex scenarios phpQuery does not hold up well; put simply, it gets unwieldy.
Is there a better way? Let's look at how commercial crawler software does it.

Learning from others: how commercial crawlers do it

How do the commercial crawlers on the market parse HTML and extract structured field data?

Looking at the commercial crawler tools on the market - GooSeeker, Shenjianshou (神箭手), and Bazhuayu (八爪鱼) - all of them extract the field data they need with XPath expressions.
So it is fair to conclude that using XPath to locate and extract data from parsed HTML is one of the prevailing best approaches.

XPath support in PHP

XPath is a path expression language for locating nodes and elements in HTML documents.
So learn the XPath basics first; an hour or two is enough to get started, and it is time well spent: XPath is one of the core skills behind page data extraction.
Since XPath is the prevailing way to extract page data, which PHP package supports it?
To save you some time: Symfony DomCrawler is one of the best XPath packages in PHP, so just use it. Symfony's quality speaks for itself; even Laravel, the most popular PHP framework, is built on Symfony packages.
The official Symfony DomCrawler documentation covers only a limited set of examples; when you need more, read its source code to learn the additional usage.
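
As a quick taste before the full example later in this article, here is a minimal sketch of DomCrawler's XPath API against an inline HTML snippet (the HTML and names here are made up for illustration):

<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<div id="list"><span class="title">Item A</span><span class="title">Item B</span></div>';

$crawler = new Crawler();
$crawler->addHtmlContent($html);

//text() returns the text of the first matched node
echo $crawler->filterXPath('//*[@id="list"]/span[1]')->text(), "\n"; //Item A

//each() walks every matched node and collects the return values
$titles = $crawler->filterXPath('//span[contains(@class,"title")]')
    ->each(function (Crawler $node) {
        return $node->text();
    });
print_r($titles); //["Item A", "Item B"]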

Rolling up our sleeves: XPath HTML parsing and structured data extraction with DomCrawler

Basic approach

Install the "XPath Helper" extension in Chrome (see the references for how to use XPath Helper)
Open the page you want to parse and write and test your XPath expressions there
In your PHP code, feed those XPath expressions to DomCrawler to extract the field data you want

Worked example

We will parse the movie details from the Douban page for Despicable Me 3 (《神偷奶爸3》).
First write and test the XPath expressions in Chrome.

(Screenshot: testing an XPath expression with XPath Helper)


In your project, install guzzlehttp/guzzle (HTTP client) and symfony/dom-crawler (Symfony DomCrawler) with Composer:

composer require guzzlehttp/guzzle
composer require symfony/dom-crawler

The code follows, with explanations in the comments. Run it with:
php Douban.php

<?php
/**
 * Created by PhpStorm.
 * User: wwek
 * Date: 2017/7/9
 * Time: 21:41
 */

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use Symfony\Component\DomCrawler\Crawler;

print_r(json_encode(Spider(), JSON_UNESCAPED_UNICODE));
//print_r(Spider());

function Spider()
{
    //the page to crawl
    $url = 'https://movie.douban.com/subject/25812712/?from=showing';

    //download the page content
    $client   = new Client([
        'timeout' => 10,
        'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)',
        ],
    ]);
    $response = $client->request('GET', $url)->getBody()->getContents();

    //extract the page data with XPath
    $data    = []; //the structured data is collected in this array
    $crawler = new Crawler();
    $crawler->addHtmlContent($response);

    try {
        //movie title
        //nodes that carry an id attribute make for the easiest XPath expressions
        $data['name'] = $crawler->filterXPath('//*[@id="content"]/h1/span[1]')->text();
        //movie poster
        $data['cover'] = $crawler->filterXPath('//*[@id="mainpic"]/a/img/@src')->text();
        //director(s)
        $data['director'] = $crawler->filterXPath('//*[@id="info"]/span[1]/span[2]')->text();
        //split multiple directors into an array
        $data['director'] = explode('/', $data['director']);
        //trim surrounding whitespace
        $data['director'] = array_map('trim', $data['director']);

        //screenwriter
        $data['screenwriter'] = $crawler->filterXPath('//*[@id="info"]/span[2]/span[2]/a')->text();
        //main actors
        $data['mactor'] = $crawler->filterXPath('//*[@id="info"]/span[contains(@class,"actor")]/span[contains(@class,"attrs")]')->text();
        //split multiple actors into an array
        $data['mactor'] = explode('/', $data['mactor']);
        //trim surrounding whitespace
        $data['mactor'] = array_map('trim', $data['mactor']);

        //release dates
        $data['rdate'] = $crawler->filterXPath('//*[@id="info"]')->text();
        //extract them with a regex
        preg_match_all("/(\d{4})-(\d{2})-(\d{2})\(.*?\)/", $data['rdate'], $rdate); //e.g. 2017-07-07(中国大陆) / 2017-06-14(安锡动画电影节) / 2017-06-30(美国)
        $data['rdate'] = $rdate[0];
        //synopsis
        //demonstrates matching on a class attribute
        $data['introduction'] = trim($crawler->filterXPath('//div[contains(@class,"indent")]/span')->text());

        //cast
        //this XPath expression matches multiple nodes, so iterate over them with each()
        //each() takes a closure; outer variables are captured with use, here by reference
        $crawler->filterXPath('//ul[contains(@class,"celebrities-list from-subject")]/li')->each(function (Crawler $node, $i) use (&$data) {
            $actor['name']   = $node->filterXPath('//div[contains(@class,"info")]/span[contains(@class,"name")]/a')->text(); //name
            $actor['role']   = $node->filterXPath('//div[contains(@class,"info")]/span[contains(@class,"role")]')->text(); //role
            $actor['avatar'] = $node->filterXPath('//a/div[contains(@class,"avatar")]/@style')->text(); //avatar style attribute
            //background-image: url(https://img3.doubanio.com/img/celebrity/medium/5253.jpg) - pull the avatar image URL out with a regex
            preg_match_all("/((https|http|ftp|rtsp|mms)?:\/\/)[^\s]+\.(jpg|jpeg|gif|png)/", $actor['avatar'], $avatar);
            $actor['avatar'] = $avatar[0][0];
            //print_r($actor);
            $data['actor'][] = $actor;
        });
        });

    } catch (\Exception $e) {
        //fields missing from the page are simply skipped
    }

    return $data;

}

Execution result

{
    "name": "神偷奶爸3 Despicable Me 3",
    "cover": "https://img3.doubanio.com/view/movie_poster_cover/lpst/public/p2469070974.webp",
    "director": [
        "凯尔·巴尔达",
        "皮艾尔·柯芬"
    ],
    "mactor": [
        "史蒂夫·卡瑞尔",
        "克里斯汀·韦格",
        "崔·帕克",
        "米兰达·卡斯格拉夫",
        "拉塞尔·布兰德",
        "迈克尔·贝亚蒂",
        "达纳·盖尔",
        "皮艾尔·柯芬",
        "安迪·尼曼"
    ],
    "rdate": [
        "2017-07-07(中国大陆)",
        "2017-06-14(安锡动画电影节)",
        "2017-06-30(美国)"
    ],
    "introduction": "  《神偷奶爸3》将延续前两部的温馨、搞笑风格,聚焦格鲁和露西的婚后生活,继续讲述格鲁和三个女儿的爆笑故事。“恶棍”奶爸格鲁将会如何对付大反派巴萨扎·布莱德,调皮可爱的小黄人们又会如何耍贱卖萌,无疑让全球观众万分期待。该片配音也最大程度沿用前作阵容,史蒂夫·卡瑞尔继续为男主角格鲁配音,皮埃尔·柯芬也将继续为经典角色小黄人配音,而新角色巴萨扎·布莱德则由《南方公园》主创元老崔·帕克为其配音。",
    "actor": [
        {
            "name": "皮艾尔·柯芬 ",
            "role": "导演",
            "avatar": "https://img3.doubanio.com/img/celebrity/medium/1389806916.36.jpg"
        },
        {
            "name": "凯尔·巴尔达 ",
            "role": "导演",
            "avatar": "https://img3.doubanio.com/img/celebrity/medium/51602.jpg"
        },
        {
            "name": "史蒂夫·卡瑞尔 ",
            "role": "饰 Gru / Dru",
            "avatar": "https://img3.doubanio.com/img/celebrity/medium/15731.jpg"
        },
        {
            "name": "克里斯汀·韦格 ",
            "role": "饰 Lucy Wilde",
            "avatar": "https://img3.doubanio.com/img/celebrity/medium/24543.jpg"
        },
        {
            "name": "崔·帕克 ",
            "role": "饰 Balthazar Bratt",
            "avatar": "https://img3.doubanio.com/img/celebrity/medium/5253.jpg"
        },
        {
            "name": "米兰达·卡斯格拉夫 ",
            "role": "饰 Margo",
            "avatar": "https://img1.doubanio.com/img/celebrity/medium/1410165824.37.jpg"
        }
    ]
}

Lessons learned

Data on a page changes over time; wrap field extraction in try/catch so that a field that fails to extract does not crash the program
Practice makes perfect: XPath expressions can handle the vast majority of extraction needs
For data that XPath cannot reach cleanly, or that needs further processing once extracted, fall back to regular expressions: preg_match_all to extract, preg_replace to substitute
Use strip_tags() to remove HTML, XML, and PHP tags; a second argument whitelists tags to keep, e.g. strip_tags($str, "<p><img><strong>") keeps only the listed tags, useful when cleaning article content
Some commonly used regular expressions follow:


$str = preg_replace("/\s+/", " ", $str); //collapse extra whitespace and newlines
$str = preg_replace("/<[ ]+/si", "<", $str); //strip "< " ("<" followed by spaces)
$str = preg_replace("/<\!--.*?-->/si", "", $str); //comments
$str = preg_replace("/<(\!.*?)>/si", "", $str); //DOCTYPE
$str = preg_replace("/<(\/?html.*?)>/si", "", $str); //html tags
$str = preg_replace("/<(\/?head.*?)>/si", "", $str); //head tags
$str = preg_replace("/<(\/?meta.*?)>/si", "", $str); //meta tags
$str = preg_replace("/<(\/?body.*?)>/si", "", $str); //body tags
$str = preg_replace("/<(\/?link.*?)>/si", "", $str); //link tags
$str = preg_replace("/<(\/?form.*?)>/si", "", $str); //form tags
$str = preg_replace("/cookie/si", "COOKIE", $str); //neutralize cookie
$str = preg_replace("/<(applet.*?)>(.*?)<(\/applet.*?)>/si", "", $str); //applet blocks
$str = preg_replace("/<(\/?applet.*?)>/si", "", $str); //applet tags
$str = preg_replace("/<(style.*?)>(.*?)<(\/style.*?)>/si", "", $str); //style blocks
$str = preg_replace("/<(\/?style.*?)>/si", "", $str); //style tags
$str = preg_replace("/<(title.*?)>(.*?)<(\/title.*?)>/si", "", $str); //title blocks
$str = preg_replace("/<(\/?title.*?)>/si", "", $str); //title tags
$str = preg_replace("/<(object.*?)>(.*?)<(\/object.*?)>/si", "", $str); //object blocks
$str = preg_replace("/<(\/?objec.*?)>/si", "", $str); //object tags
$str = preg_replace("/<(noframes.*?)>(.*?)<(\/noframes.*?)>/si", "", $str); //noframes blocks
$str = preg_replace("/<(\/?noframes.*?)>/si", "", $str); //noframes tags
$str = preg_replace("/<(i?frame.*?)>(.*?)<(\/i?frame.*?)>/si", "", $str); //frame/iframe blocks
$str = preg_replace("/<(\/?i?frame.*?)>/si", "", $str); //frame/iframe tags
$str = preg_replace("/<(script.*?)>(.*?)<(\/script.*?)>/si", "", $str); //script blocks
$str = preg_replace("/<(\/?script.*?)>/si", "", $str); //script tags
$str = preg_replace("/javascript/si", "Javascript", $str); //neutralize javascript
$str = preg_replace("/vbscript/si", "Vbscript", $str); //neutralize vbscript
$str = preg_replace("/on([a-z]+)\s*=/si", "On\\1=", $str); //neutralize on* event handlers
$str = preg_replace("/&#/si", "&#", $str); //neutralize entity tricks such as javAsCript:alert(

References

Official documentation: The DomCrawler Component
Bazhuayu (八爪鱼): understanding common XPath terms and expressions, an easy ten-minute introduction
GooSeeker: XPath basics training
Shenjianshou (神箭手): commonly used development helper tools
XPath Helper: a Chrome extension for parsing pages when crawling, illustrated tutorial
How do you parse and process HTML/XML in PHP?

Further reading

On anti-crawling measures, this one article is enough
The best language PHP + the best front-end testing framework Selenium = the best crawler (part 1)
QueryList: a simple, flexible, powerful PHP scraping tool that makes collection a little easier

Hiding the nginx / openresty version number

What is the point of hiding the nginx or openresty version number?
Imagine this scenario: a 0-day vulnerability is disclosed affecting nginx versions 0.9-1.5.
Attackers will first scan at scale for matching nginx versions, then attack them.
Hiding the version in advance lowers the risk of being hit in the first wave.

Add inside the http {} block:

server_tokens off;

The Server header in HTTP responses changes from
Server: nginx/1.0.15 to Server: nginx
Server: openresty/1.11.2.3 to Server: openresty
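
Note that server_tokens off; only hides the version; the header still says nginx or openresty. If you want to rewrite the Server header entirely, the third-party headers-more-nginx-module (bundled with openresty) can do it; a sketch, with an arbitrary name:

more_set_headers 'Server: webserver';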

Complete enhanced nginx log format

This enhanced format records, among other things, the backend execution time, which backend handled the request, which server IP a centralized log entry came from, and the client IP passed along by a CDN.

On top of the default nginx log format it adds the proxy chain from the HTTP headers ($http_x_forwarded_for),
the real client IP a CDN stores in the headers ($http_x_real_ip), the server IP ($server_addr), the Host header ($host),
the request time ($request_time), the backend response time ($upstream_response_time), the backend address ($upstream_addr),
the URI ($uri), and the ISO 8601 timestamp ($time_iso8601).

#log format
log_format access '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$http_x_real_ip" "$server_addr" "$host" '
'$request_time $upstream_response_time "$upstream_addr" '
'"$time_iso8601"';
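
To put the format to use, reference it by name in an access_log directive (the log path here is just an example; adjust to your own layout):

access_log /data/wwwlogs/access_www.iamle.com.log access;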

Nginx log rotation

A busy nginx server produces a large volume of web logs every day, so they need to be cut daily.
The cut logs are kept for a while and older ones deleted; this is called log rotation, and it works much like video surveillance retention:
keep a fixed window of logs and delete anything older.

Many people like to hand-roll nginx log cutting and rotation with bash shell scripts run from cron.
The system's own logrotate does the job better.
Create the file

/etc/logrotate.d/nginx

with the contents:

/data/wwwlogs/*.log {
    #user/group to rotate as
    su root www
    #rotate on a daily cycle
    daily
    #minimum size: even on the daily schedule, don't rotate below 1024M
    minsize 1024M
    #maximum size: once a log exceeds 2048M it is rotated even before the next cycle
    maxsize 2048M
    #keep seven rotations
    rotate 7
    #ignore missing files
    missingok
    #do not rotate empty files
    notifempty
    #date-stamped suffixes
    dateext
    #rotate by size; size overrides the cycle setting (1024, 1024k, 1024M, 1024G)
    size 1024M
    #run the post-rotation script once for all logs; nginx holds its log files open,
    #so it must be signaled before it will write to the new files
    sharedscripts
    postrotate
    if [ -f /usr/local/nginx/logs/nginx.pid ]; then
    #tell nginx to reopen its log files
    kill -USR1 `cat /usr/local/nginx/logs/nginx.pid`
    fi
    endscript
}
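
To verify the rules without waiting a day, logrotate can be run by hand: -d does a dry run that only prints what would happen, and -f forces an immediate rotation:

logrotate -d /etc/logrotate.d/nginx
logrotate -f /etc/logrotate.d/nginx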

Elastic Stack (ELK) log system

Collected logs need to be parsed into fields, either on the collection side or on ingestion into Elasticsearch.
Configuring nginx to emit JSON logs directly removes the CPU cost of parsing:
with filebeat or logstash as the shipper, no parsing work is needed at all;
just collect the JSON text as-is.

log_format logstash_json '{"@timestamp":"$time_iso8601",'
'"host":"$server_addr",'
'"clientip":"$remote_addr",'
'"remote_addr":"$remote_addr",'
'"http_x_forwarded_for":"$http_x_forwarded_for",'
'"http_x_real_ip":"$http_x_real_ip",'
'"http_cf_connecting_ip":"$http_cf_connecting_ip",'
'"size":$body_bytes_sent,'
'"responsetime":$request_time,'
'"upstreamtime":"$upstream_response_time",'
'"upstreamhost":"$upstream_addr",'
'"http_host":"$host",'
'"request":"$request",'
'"url":"$uri",'
'"referer":"$http_referer",'
'"agent":"$http_user_agent",'
'"status":"$status"}';
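
On the collection side, a minimal filebeat prospector for this JSON format might look like the sketch below (filebeat 5.x syntax; the json.* options decode each line into fields at collection time, so nothing downstream has to parse):

filebeat.prospectors:
- input_type: log
  paths:
    - /data/wwwlogs/*.log
  json.keys_under_root: true
  json.add_error_key: true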

Nginx reverse proxy

listen 80;
#listen [::]:80;
server_name proxy.iamle.com;

location / {
#auth_basic "Password please";
#auth_basic_user_file /usr/local/nginx/conf/htpasswd;
proxy_pass http://127.0.0.1:5601/;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

Nginx reverse proxy with a DDNS dynamic-domain backend

server_name proxy.iamle.com;
resolver 1.1.1.1 valid=3s;
set $HassHost "http://backend.iamle.com:999";
location / {
    #auth_basic "Password please";
    #auth_basic_user_file /usr/local/nginx/conf/htpasswd;
    proxy_pass $HassHost;
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-NginX-Proxy true;
}

Nginx reverse proxy for WebSocket

location /api/ {
        proxy_pass http://webscoket:80/;
        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        #proxy_set_header Origin xxx;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
}
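
One refinement worth considering: hard-coding Connection "upgrade" keeps every proxied request in upgrade mode. The pattern from the nginx documentation uses a map at http {} level, so ordinary requests get Connection close while real WebSocket handshakes get upgrade:

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

#then, inside the location:
#proxy_set_header Connection $connection_upgrade;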

Configure nginx to expose php-fpm's pm.status

Set in php-fpm.conf:

pm.status_path = /phpfpm-status-www

phpfpm.conf:

server
{
listen 80;
server_name localhost;
location ~ ^/(phpfpm-status-www|phpstatuswww)$
{
fastcgi_pass unix:/tmp/php-cgi.sock;
include fastcgi.conf;
fastcgi_param SCRIPT_FILENAME $fastcgi_script_name;
}
}

Nginx domain SEO: 301 to the canonical host

Redirect iamle.com (and any other host) to www.iamle.com by default:

listen 80;
server_name www.iamle.com iamle.com;
if ($host != 'www.iamle.com' ) {
rewrite ^/(.*)$ https://www.iamle.com/$1 permanent;
}
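
The same redirect can also be written without if, by giving the non-canonical host its own server block and using return, the form the nginx documentation recommends; a sketch:

server {
    listen 80;
    server_name iamle.com;
    return 301 https://www.iamle.com$request_uri;
}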

Nginx site-wide HTTP to HTTPS redirect

server{
listen 80;
server_name www.iamle.com;
return 301 https://www.iamle.com$request_uri;
}

server {

listen 443 ssl http2;
ssl_certificate         /usr/local/nginx/conf/ssl/www.iamle.com.crt;
ssl_certificate_key     /usr/local/nginx/conf/ssl/www.iamle.com.key;
ssl_session_cache           shared:SSL:10m;
ssl_session_timeout         10m;
ssl_session_tickets         off;

# intermediate configuration. tweak to your needs.
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS';
    ssl_prefer_server_ciphers on;

# HSTS (ngx_http_headers_module is required) (15768000 seconds = 6 months)
#add_header Strict-Transport-Security max-age=15768000;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";

# OCSP Stapling ---
# fetch OCSP records from URL in ssl_certificate and cache them
ssl_stapling on;
ssl_stapling_verify on;

server_name www.iamle.com;
#  ....
}

XSS and iframe protection headers

add_header X-Frame-Options SAMEORIGIN;                                                                                                                        
add_header X-Content-Type-Options nosniff;                          
add_header X-Xss-Protection "1; mode=block";

Nginx HTTP/2 and OpenSSL support

Supporting HTTP/2 for Website Visitors

Generating a solid SSL/HTTPS configuration

Mozilla SSL Configuration Generator

Nginx + PHP: enabling OPcache while deploying via symlinks

Publishing new PHP code by switching a symlink can sometimes lead to 502 errors, because OPcache caches by resolved path and is not refreshed in time.
There are two fixes: method 1 has nginx always pass the fully resolved real path; method 2 flushes php-fpm's OPcache.
Method 1 is below:

        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param DOCUMENT_ROOT $realpath_root;
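
Method 2, flushing php-fpm's OPcache, can be as simple as a tiny script requested through the web server after each deploy. A sketch (it must be called through php-fpm, not the CLI, because the CLI keeps a separate OPcache; and the URL should be access-restricted):

<?php
//opcache-reset.php - opcache_reset() clears the whole cache of the pool serving this request
if (function_exists('opcache_reset') && opcache_reset()) {
    echo "opcache cleared\n";
} else {
    echo "opcache unavailable or disabled\n";
}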

Cache invalidation for scripts in symlinked folders
Symfony Configuring a Web Server

apache ab

The old standby: simple, convenient load testing with ab.

yum install httpd-tools
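
Typical usage, with a placeholder URL: 1000 requests total at a concurrency of 100.

ab -n 1000 -c 100 http://www.example.com/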

webbench

HTTP load testing.

wget http://blog.zyan.cc/soft/linux/webbench/webbench-1.5.tar.gz
tar zxvf webbench-1.5.tar.gz
cd webbench-1.5
make && make install
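
Typical usage, with a placeholder URL: 500 concurrent clients for 30 seconds.

webbench -c 500 -t 30 http://www.example.com/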

wrk

wrk official site
Modern HTTP benchmarking tool
HTTP load testing.
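
Typical usage, with a placeholder URL: 4 threads, 100 connections, for 30 seconds.

wrk -t4 -c100 -d30s http://www.example.com/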

Note:
wrk's latency results may not be all that accurate; for why, see
what openresty's author agentzh said on Weibo: https://weibo.com/1834459124/G9ew2d5Ky?type=repost

One of our paying customers reported seeing high-latency long-tail requests when load testing with wrk.
After careful analysis with systemtap and packet-capture tools,
we found that the latencies wrk reported were wildly wrong (possibly an order of magnitude above the real latency),
and that different numbers in wrk's own report even contradicted one another.
It turned out wrk has an internal stats_correct function that deliberately injects noise into the measured results;
according to its author, this is to simulate the latency of real internet conditions.
I was floored... we removed that C function call from wrk ourselves, and the results came out right.

go-wrk

go-wrk official site
go-wrk – a HTTP benchmarking tool based in spirit on the excellent wrk tool (https://github.com/wg/wrk)

Gatling Pea

Gatling official site
Gatling open-source repository
Async Scala-Akka-Netty based Load Test Tool http://gatling.io
Gatling is a high-performance load testing tool built on Netty and Akka.

About Pea
Because a single machine is limited by its hardware resources and by network protocols, high-load tests need multiple machines generating load together. Pea is a multi-node load testing tool with Gatling as its engine. Its features:

Manages and monitors multiple worker nodes; depends on Zookeeper
Live view of each node's execution state during a run
After all nodes finish, logs are collected automatically and merged into a single report
Supports native Gatling scripts and the native HTTP protocol
Extended support for the Dubbo and gRPC protocols
Scripts and resource files are managed in a Git repository
A built-in Scala incremental compiler lets scripts compile quickly online
Unlike other implementations, all of this runs in a single process; credit to the Gatling author's high-quality code
Runs on bare metal, in VMs, or in Docker containers

sniper

sniper official site
A powerful & high-performance http load tester

hey

hey official site
HTTP load generator, ApacheBench (ab) replacement, formerly known as rakyll/boom

Siege

Siege official site
Siege is an http load tester and benchmarking utility

http_load

http_load official site
http_load runs multiple http fetches in parallel, to test the throughput of a web server.

vegeta

vegeta official site
HTTP load testing tool and library. It’s over 9000!

t50

t50 official site
mixed packet injector tool

GoReplay

GoReplay official site
GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes.
https://github.com/buger/goreplay

tcpcopy

tcpcopy official site
An online request replication tool, also a tcp stream replay tool, fit for real testing, performance testing, stability testing, stress testing, load testing, smoke testing, etc

gryphon

Gryphon official site
Gryphon is developed in-house by NetEase and can simulate tens of millions of concurrent users; its goal is to generate massive concurrency with modest resources and to make load tests more realistic, addressing both push-notification load testing and the shortcomings of traditional load testing. Gryphon consists of two programs: gryphon, which simulates users, and intercept, which captures response packets and feeds them back to gryphon. Gryphon simulates each user with one connection, so the number of connections equals the number of users, and the user traffic is taken from pcap capture files. Notably, Gryphon's architecture resembles tcpcopy's, and it likewise supports both a basic and an advanced mode of use.

locust.io

Locust official site
An open source load testing tool.
Define user behaviour with Python code, and swarm your system with millions of simultaneous users.
Written in Python; load scenarios are defined with Python scripts; distributed; has a web UI.
Recommended.

Jmeter

Apache JMeter official site
Apache JMeter is a Java-based load testing tool developed under the Apache project. Originally designed for testing web applications, it has since expanded into other testing areas.
Fairly lightweight; a favorite among many QA engineers.

Tsung

Tsung official site
Tsung is an open-source multi-protocol distributed load testing tool.
It can be used to stress HTTP, WebDAV, SOAP, PostgreSQL, MySQL, LDAP, MQTT and Jabber/XMPP servers. Tsung is free software released under the GPLv2 license.

LoadRunner

LoadRunner official site
The veteran load testing tool: the installer is enormous and the feature set very complete. It is now owned by Micro Focus.

nGrinder

nGrinder official site
nGrinder is a platform for stress tests that enables you to execute script creation, test execution, monitoring, and result report generator simultaneously. The open-source nGrinder offers easy ways to conduct stress tests by eliminating inconveniences and providing integrated environments.
nGrinder is based on the open-source Grinder project, redesigned and polished by NHN's development team. It is very easy to use, with a clean, friendly UI and a powerful controller-agent distributed architecture.
Tests are driven by Python scripts (Groovy works too): you write a test script following the conventions, the controller distributes the script and the resources it needs to the agents, and the script runs under Jython. During the run it collects execution state, response times, and the condition of the target servers, then saves the data and generates a test report for review.
A major strength of this framework is how simple it is to use; installation is easy too, essentially out of the box.

Using nGrinder

nGrinder: a lightweight performance testing platform for ops engineers https://www.jianshu.com/p/f336180806cc

BuoyantIO/slow_cooker

BuoyantIO/slow_cooker official site
A load tester focused on lifecycle issues and long-running tests.
Most load testers work by sending as much traffic as possible at a backend.
slow_cooker takes a different approach: it tests a service with a predictable load and concurrency level over a long period of time,
and reports qps and latency at regular intervals instead of only at the end.

twitter/iago

twitter/iago official site
A load generator, built for engineers.
Iago is a load generation tool that replays production or synthetic traffic against a given target.
Beyond that, what sets it apart from other load generators is that it tries to hold the transaction rate constant:
for example, if you want to test your service at 100K requests per minute, Iago tries to hit exactly that rate.
Because Iago replays traffic, you must specify its source:
a transaction log serves as the traffic source, where each transaction generates a request for your service to handle.
Replaying transactions at a fixed rate lets you study the behaviour of your service under the expected load.
Iago also helps you identify bottlenecks or other problems that are hard to observe in production, where your maximum expected load occurs only rarely.

fortio

fortio official site
Fortio load testing library, command line tool, advanced echo server and web UI in go (golang).
Allows to specify a set query-per-second load and record latency histograms and other useful stats.

autocannon

autocannon official site
A HTTP/1.1 benchmarking tool written in node, greatly inspired by wrk and wrk2, with support for HTTP pipelining and HTTPS. On my box, autocannon can produce more load than wrk and wrk2.

k6

k6 official site
k6 open-source repository
A modern load testing tool, using Go and JavaScript - https://k6.io
Scripting in ES6 JS: support for modules to aid code reusability across an organization
Everything as code: test logic and configuration options are both in JS for version control friendliness
Automation-friendly: checks (like asserts) and thresholds for easy and flexible CI configuration!
HTTP/1.1, HTTP/2 and WebSocket protocol support
TLS features: client certificates, configurable SSL/TLS versions and ciphers
Batteries included: Cookies, Crypto, Custom metrics, Encodings, Environment variables, JSON, HTML forms, files, flexible execution control, and more.
Built-in HAR converter: record browser sessions as .har files and directly convert them to k6 scripts
Flexible metrics storage and visualization: InfluxDB (+Grafana), JSON or Load Impact Insights
Cloud execution and distributed tests (currently only on infrastructure managed by Load Impact, with native distributed execution in k6 planned for the near future!)

Almost every web load testing tool: the big list (original source)

loader.io (online service)

loader.io official site
Simple Cloud-based
LOAD TESTING
Loader.io is a FREE load testing service that allows you to stress test
your web-apps & apis with thousands of concurrent connections.

Tencent GAPS (压测大师) (online service)

GAPS official site
Configure a test case in one minute; load testing made simple.
Supports the mainstream targets: HTTP and HTTPS APIs, websites, WeChat official-account pages, and more.

Alibaba Cloud PTS (online service)

Alibaba Cloud PTS official site
Performance Testing Service (PTS) is a cloud performance testing tool aimed at anyone with a technical background, incubated from Alibaba's internal platform. Unlike the complexity of traditional tools, PTS offers internet-style interaction and a design built for distributed, cloud-based systems, better suited to today's mainstream architectures. Whether with home-grown features or adaptations of open-source ones, PTS can easily simulate large numbers of users accessing a service, launch tasks at any time, and spare you setup and maintenance costs. It also integrates tightly with monitoring products to add one-stop monitoring and diagnosis, efficiently validating and managing business performance.

The need

Are the current php-fpm workers running short? Firing off a volley of commands only shows the current moment.
As the saying goes: without monitoring and measurement there is no diagnosis.
There is a mature stack for collecting and displaying php-fpm status: the Elastic Stack family.
The goal: collect and analyze php-fpm status data.

Choice of stack

The Elastic family: Beats (the PHP-FPM module of Metricbeat) + Elasticsearch + Kibana

Metricbeat is the metrics collector of the Beats family
The PHP-FPM module is one of Metricbeat's collection modules

Overall data flow

Option 1: Beats (Metricbeat) > Elasticsearch > Kibana
Option 2: Beats (Metricbeat) > Logstash > [direct, redis queue, or Kafka queue] > Elasticsearch > Kibana

Option 1 first.

Configure php-fpm

php-fpm.conf

[www]
...
pm.status_path = /phpfpm-status-www

[u]
...
pm.status_path = /phpfpm-status-u

[www] and [u] are separate php-fpm pools; pm.status_path must be set for each pool individually.

Test that the configuration file is correct:

/usr/local/php/sbin/php-fpm -t
[29-Mar-2017 16:15:08] NOTICE: configuration file /usr/local/php/etc/php-fpm.conf test is successful

Reload php-fpm:

/etc/init.d/php-fpm reload
Reload service php-fpm done

Configure nginx

phpfpm.conf

server
{
  listen 80;
  server_name localhost;
  location ~ ^/(phpfpm-status-www|phpstatuswww)$
  {
    fastcgi_pass unix:/tmp/php-cgi.sock;
    include fastcgi.conf;
    fastcgi_param SCRIPT_FILENAME $fastcgi_script_name;
  }
  location ~ ^/(phpfpm-status-u|phpstatusu)$
  {
    fastcgi_pass unix:/tmp/u-php-cgi.sock;
    include fastcgi.conf;
    fastcgi_param SCRIPT_FILENAME $fastcgi_script_name;
  }
}

Access is restricted to local requests by listening on localhost; nginx allow/deny rules would work too.

Test that the configuration file is correct:

/usr/local/nginx/sbin/nginx -t
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful

Reload nginx:

/etc/init.d/nginx reload
Reloading nginx daemon configuration....

Verify with curl that the php-fpm status page can be fetched:

curl http://localhost/phpfpm-status-www
pool: www
process manager: static
start time: 29/Mar/2017:16:19:09 +0800
start since: 75
accepted conn: 900
listen queue: 0
max listen queue: 0
listen queue len: 0
idle processes: 696
active processes: 4
total processes: 700
max active processes: 60
max children reached: 0
slow requests: 1
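
The status page also accepts query parameters: json, xml, or html switch the output format, and full adds per-process detail, which is handy for machine consumption:

curl "http://localhost/phpfpm-status-www?json"
curl "http://localhost/phpfpm-status-www?full"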

Configure Metricbeat

Metricbeat Reference [5.3] » Modules » PHP-FPM Module

/etc/metricbeat/metricbeat.yml

#========================== Modules configuration ============================
metricbeat.modules:

#------------------------------- PHP-FPM Module -------------------------------
- module: php_fpm
  metricsets: ["pool"]
  enabled: true
  period: 10s
  status_path: "/phpfpm-status-www"
  hosts: ["localhost:80"]

- module: php_fpm
  metricsets: ["pool"]
  enabled: true
  period: 10s
  status_path: "/phpfpm-status-u"
  hosts: ["localhost:80"]

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
  hosts: ["localhost:9200"]

Test that the configuration file is correct:

/usr/share/metricbeat/bin/metricbeat -configtest
Config OK

Create the Elasticsearch index template for metricbeat
Loading the Index Template in Elasticsearch

curl -XPUT 'http://localhost:9200/_template/metricbeat' -d@/etc/metricbeat/metricbeat.template.json

Create the Kibana dashboards for metricbeat
Loading Sample Kibana Dashboards

./scripts/import_dashboards -es http://localhost:9200

Metricbeat's php_fpm module has no official sample Kibana dashboard,
so create your own visualizations in Kibana for the data you want to analyze and assemble them into a dashboard.

Start metricbeat:

/etc/init.d/metricbeat start

Configure Kibana

Management / Kibana / Indices > Add New
Enter "metricbeat-*" for Index name or pattern
Choose "@timestamp" for Time-field name

Sharing my metricbeat php-fpm Kibana dashboard (incomplete):
export.json

[
  {
    "_id": "PHP-FPM导航",
    "_type": "visualization",
    "_source": {
      "title": "PHPFPM导航",
      "visState": "{\n  \"title\": \"PHP-FPM导航\",\n  \"type\": \"markdown\",\n  \"params\": {\n    \"markdown\": \"- [Overview](#/dashboard/Metricbeat-phpfpm-overview)\\n\\n\\n```\\npool:php-fpm池的名称,一般都是应该是www\\nprocess manage:进程的管理方法,php-fpm支持三种管理方法,分别是static,dynamic和ondemand,一般情况下都是dynamic\\nstart time:php-fpm启动时候的时间,不管是restart或者reload都会更新这里的时间\\nstart since:php-fpm自启动起来经过的时间,默认为秒\\naccepted conn:当前接收的连接数\\nlisten queue:在队列中等待连接的请求个数,如果这个数字为非0,那么最好增加进程的fpm个数\\nmax listen queue:从fpm启动以来,在队列中等待连接请求的最大值\\nlisten queue len:等待连接的套接字队列大小\\nidle processes:空闲的进程个数\\nactive processes:活动的进程个数\\ntotal processes:总共的进程个数\\nmax active processes:从fpm启动以来,活动进程的最大个数,如果这个值小于当前的max_children,可以调小此值\\nmax children reached:当pm尝试启动更多的进程,却因为max_children的限制,没有启动更多进程的次数。如果这个值非0,那么可以适当增加fpm的进程数\\nslow requests:慢请求的次数,一般如果这个值未非0,那么可能会有慢的php进程,一般一个不好的mysql查询是最大的祸首。\\n```\"\n  },\n  \"aggs\": [],\n  \"listeners\": {}\n}",
      "uiStateJSON": "{}",
      "description": "",
      "version": 1,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\n  \"query\": {\n    \"query_string\": {\n      \"query\": \"*\",\n      \"analyze_wildcard\": true\n    }\n  },\n  \"filter\": []\n}"
      }
    }
  },
  {
    "_id": "PHPFPM-processes-active",
    "_type": "visualization",
    "_source": {
      "title": "PHPFPM processes active",
      "visState": "{\"title\":\"PHPFPM processes active\",\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"avg\",\"schema\":\"metric\",\"params\":{\"field\":\"php_fpm.pool.processes.active\",\"customLabel\":\"活动\"}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"customInterval\":\"2h\",\"min_doc_count\":1,\"extended_bounds\":{},\"customLabel\":\"时间\"}},{\"id\":\"3\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"group\",\"params\":{\"field\":\"php_fpm.pool.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"pool池\"}},{\"id\":\"4\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"split\",\"params\":{\"field\":\"beat.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"主机\",\"row\":true}}],\"listeners\":{}}",
      "uiStateJSON": "{}",
      "description": "",
      "version": 1,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"index\":\"metricbeat-*\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
      }
    }
  },
  {
    "_id": "PHPFPM-connections-queued",
    "_type": "visualization",
    "_source": {
      "title": "PHPFPM connections queued",
      "visState": "{\"title\":\"PHPFPM connections queued\",\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"avg\",\"schema\":\"metric\",\"params\":{\"field\":\"php_fpm.pool.connections.queued\",\"customLabel\":\"队列\"}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"customInterval\":\"2h\",\"min_doc_count\":1,\"extended_bounds\":{},\"customLabel\":\"时间\"}},{\"id\":\"3\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"group\",\"params\":{\"field\":\"php_fpm.pool.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"pool池\"}},{\"id\":\"4\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"split\",\"params\":{\"field\":\"beat.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"主机\",\"row\":true}}],\"listeners\":{}}",
      "uiStateJSON": "{}",
      "description": "",
      "version": 1,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"index\":\"metricbeat-*\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
      }
    }
  },
  {
    "_id": "PHPFPM-processes-idle",
    "_type": "visualization",
    "_source": {
      "title": "PHPFPM processes idle",
      "visState": "{\"title\":\"PHPFPM processes idle\",\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"avg\",\"schema\":\"metric\",\"params\":{\"field\":\"php_fpm.pool.processes.idle\",\"customLabel\":\"空闲\"}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"customInterval\":\"2h\",\"min_doc_count\":1,\"extended_bounds\":{},\"customLabel\":\"时间\"}},{\"id\":\"4\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"group\",\"params\":{\"field\":\"php_fpm.pool.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"pool池\"}},{\"id\":\"3\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"split\",\"params\":{\"field\":\"beat.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"主机\",\"row\":true}}],\"listeners\":{}}",
      "uiStateJSON": "{}",
      "description": "",
      "version": 1,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"index\":\"metricbeat-*\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
      }
    }
  },
  {
    "_id": "PHPFPM-slow_requests",
    "_type": "visualization",
    "_source": {
      "title": "PHPFPM slow_requests",
      "visState": "{\"title\":\"PHPFPM slow_requests\",\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"avg\",\"schema\":\"metric\",\"params\":{\"field\":\"php_fpm.pool.slow_requests\"}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"customInterval\":\"2h\",\"min_doc_count\":1,\"extended_bounds\":{},\"customLabel\":\"时间\"}},{\"id\":\"4\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"group\",\"params\":{\"field\":\"php_fpm.pool.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"pool池\"}},{\"id\":\"3\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"split\",\"params\":{\"field\":\"beat.name\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"customLabel\":\"主机\",\"row\":true}}],\"listeners\":{}}",
      "uiStateJSON": "{}",
      "description": "",
      "version": 1,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"index\":\"metricbeat-*\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
      }
    }
  }
]

Supplement: collecting through logstash (option 2)

Configure Metricbeat

/etc/metricbeat/metricbeat.yml

#================================ Outputs =====================================
# The Logstash hosts
output.logstash:

  hosts: ["localhost:5044"]

The output goes to logstash's beats input on TCP 5044;
for the input side (the modules), see the Metricbeat configuration in option 1 above.

Configure logstash 2.x

Logstash Reference [2.4] » Input plugins » beats

/etc/logstash/conf.d/beats.conf

input {
    beats {
        host => "127.0.0.1"
        port => 5044
    }
}
filter {
}
output {
    # ... output to kafka, redis, elasticsearch, etc.; see the official docs for the exact configuration
}

The beats input listens on TCP 5044 on the local machine;
configure the output to suit your environment.

Test that the configuration file is correct:

/opt/logstash/bin/logstash -t
Configuration OK

Start logstash:

/etc/init.d/logstash start

Tips

  • Use the config-test option to verify that configuration files are written correctly
  • Track down errors with tail -f on the logs
  • Mind the security of TCP listeners; never expose them on a public IP

One-click ad removal for WPS Office 2016 Personal Edition 10.1.0.6135

@echo off
title "WPS Office 2016 10.1.0.6135 个人版一键去广告"
set wps_vsion=10.1.0.6135
set wps_addr=%localappdata%\kingsoft\WPS Office\%wps_vsion%\
set desktoptip=desktoptip.exe
set wpsnotify=wpsnotify.exe
set wpscloudsvr=wpscloudsvr.exe
set wpsupdate=wpsupdate.exe
set wpsrenderer=wpsrenderer.exe

set desktoptip_addr=%wps_addr%wtoolex\%desktoptip%
set wpsnotify_addr=%wps_addr%wtoolex\%wpsnotify%
set wpsupdate_addr=%wps_addr%wtoolex\%wpsupdate%
set wpsrenderer_addr=%wps_addr%office6\%wpsrenderer%

rem kill WPS's ad, notification, and update background processes
taskkill /IM %desktoptip%
taskkill /IM %wpsnotify%
taskkill /IM %wpscloudsvr%
taskkill /IM %wpsupdate%
taskkill /IM %wpsrenderer%

rem back up the original executables
copy "%desktoptip_addr%" "%wps_addr%wtoolex\%desktoptip%.bak"
copy "%wpsnotify_addr%" "%wps_addr%wtoolex\%wpsnotify%.bak"
copy "%wpsupdate_addr%" "%wps_addr%wtoolex\%wpsupdate%.bak"
copy "%wpsrenderer_addr%" "%wps_addr%office6\%wpsrenderer%.bak"

rem overwrite the executables with a stub so they can no longer start
echo fwps > "%desktoptip_addr%"
echo fwps > "%wpsnotify_addr%"
echo fwps > "%wpsupdate_addr%"
echo fwps > "%wpsrenderer_addr%"

rem stop and disable the WPS cloud service
sc stop wpscloudsvr
sc config wpscloudsvr start= disabled

echo "处理完成!! 可以关闭本窗口了"
pause

Save this as "WPS Office 2016 10.1.0.6135 个人版一键去广告.bat" and run the batch file as administrator (right-click, Run as administrator).

Introduction

Prometheus

Prometheus is an open-source service monitoring system and time-series database, and a popular choice for monitoring in the container world.
Prometheus collects metrics by PULLing them; targets that are not on the same network can sit behind a Push gateway, which acts as an intermediary to be pulled from.

(Architecture diagram)

Prometheus officially provides the commonly used exporters.
blackbox_exporter is one of the official exporters; it collects monitoring data over http, dns, tcp, and icmp (ping).
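
blackbox_exporter itself needs an icmp module defined in its own configuration file (blackbox.yml); a minimal sketch:

modules:
  icmp:
    prober: icmp
    timeout: 5s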

Grafana

Grafana is an open-source, full-featured metrics dashboard and graph editor supporting Graphite, OpenTSDB, Elasticsearch, Cloudwatch, Prometheus, InfluxDB, Xiaomi's monitoring system, and more.

Installation

Install grafana, prometheus, and blackbox_exporter; see each project's official site for installation docs.

Configuration

Configuring and running prometheus

Add a new job to the prometheus configuration file prometheus.yml:

  - job_name: 'ping_all'
    scrape_interval: 5s
    metrics_path: /probe
    params:
      module: [icmp]  #ping
    static_configs:
      - targets: ['219.150.32.132', '219.148.204.66', '183.59.4.178', '202.96.209.5', '202.100.4.15', '219.146.0.253', '222.173.113.3', '61.139.2.69', '61.128.114.133', '202.98.234.35', '202.100.64.68', '202.100.100.1', '221.232.129.35', '222.88.19.1', '222.190.127.1', '202.96.96.68', '202.101.103.54', '61.140.99.11', '202.96.161.1', '202.103.224.74']
        labels:
          group: '一线城市-电信网络监控'
      - targets: ['218.8.251.163', '218.107.51.1', '221.7.34.1', '112.65.244.1', '218.25.9.38', '124.89.76.214', '202.102.152.3', '202.102.128.68', '124.161.132.1', '221.7.1.20', '27.98.234.1', '218.104.111.122', '218.28.199.1', '221.6.4.66', '221.12.31.58', '218.104.136.149', '210.21.4.130', '210.21.196.6', '221.7.128.68', '221.199.15.5']
        labels:
          group: '一线城市-联通网络监控'
      - targets: ['211.137.241.34', '211.136.192.1', '218.203.160.194', '117.131.0.22', '211.137.32.178', '117.136.25.1', '211.137.191.26', '211.137.180.1', '211.137.106.1', '218.202.152.130', '211.139.73.34', '223.77.0.1', '211.138.16.1', '221.131.143.1', '211.140.10.2', '211.143.195.255', '183.238.55.1', '218.204.216.65', '211.138.245.180', '211.138.56.1']
        labels:
          group: '一线城市-移动网络监控'
      - targets: ['61.233.168.1', '61.234.113.1', '211.98.224.87', '222.44.44.1', '222.33.8.1', '61.232.201.76', '211.98.19.1', '61.232.49.1', '211.98.208.1', '61.233.199.129', '222.34.16.1', '61.232.206.1', '61.233.79.1', '211.98.43.1', '61.232.78.1', '61.232.93.193', '211.98.161.1', '61.235.99.1', '222.52.72.66', '211.138.56.1']
        labels:
          group: '一线城市-铁通网络监控'
      - targets: ['219.150.32.132', '219.153.32.1', '219.149.6.99', '220.189.218.45', '123.172.77.1', '202.103.124.34', '222.172.200.68', '202.100.64.68', '222.93.201.1', '222.191.255.1', '202.101.224.68', '222.85.179.1', '220.178.79.2', '59.49.12.97', '222.223.255.1', '219.148.172.97', '219.132.199.49', '14.120.0.1', '222.223.29.5', '222.173.222.1', '202.101.107.55', '222.74.58.29']
        labels:
          group: '二线城市-电信网络监控'
      - targets: ['60.24.4.1', '221.5.203.86', '202.96.86.18', '221.12.33.227', '202.99.96.68', '58.20.127.238', '221.3.131.11', '124.152.0.1', '221.3.131.11', '221.6.96.1', '220.248.192.13', '221.13.28.234', '211.91.88.129', '202.99.192.66', '202.99.160.68', '202.99.224.68', '58.252.65.1', '221.4.141.254', '60.2.17.49', '202.102.154.3', '36.249.127.25', '202.99.224.8']
        labels:
          group: '二线城市-联通网络监控'
      - targets: ['211.137.160.5', '218.201.17.2', '211.140.192.1', '211.140.35.1', '211.141.66.209', '211.138.225.2', '111.17.191.25', '218.203.160.195', '211.103.55.51', '211.103.114.1', '183.217.255.254', '211.139.2.18', '211.138.180.2', '211.142.12.2', '111.11.85.1', '211.138.91.2', '218.204.173.197', '183.233.223.25', '211.143.60.1', '120.192.167.2', '218.207.133.3', '120.193.138.1']
        labels:
          group: '二线城市-移动网络监控'
      - targets: ['61.234.70.2', '211.98.114.1', '61.232.157.1', '61.234.186.1', '61.235.225.1', '211.98.71.1', '211.98.72.7', '123.81.2.9', '211.98.44.65', '222.45.86.1', '222.49.82.1', '211.98.108.1', '222.48.116.17', '211.98.149.117', '61.235.145.1', '222.39.142.1', '110.211.253.9', '222.50.104.1', '61.234.76.1', '61.232.53.113', '61.234.196.185', '222.39.156.1']
        labels:
          group: '二线城市-铁通网络监控'
      - targets: ['166.111.8.28', '202.112.26.34', '202.114.240.6', '202.116.160.33', '211.65.207.170', '218.26.226.100', '222.221.252.162', '202.38.64.1', '202.201.48.1', '202.118.176.1', '121.193.138.1', '222.23.245.1', '58.195.136.3', '202.118.1.28', '59.77.229.1', '121.48.177.1', '121.250.214.1', '59.75.177.1', '59.75.112.254']
        labels:
          group: '教育网络监控'
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: ping
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 127.0.0.1:9115  # Blackbox exporter.

Start blackbox_exporter first, then prometheus.
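
Before wiring up grafana it is worth probing the exporter by hand; prometheus scrapes exactly this endpoint (the target IP is just one of the examples above):

curl 'http://127.0.0.1:9115/probe?module=icmp&target=219.150.32.132'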

Configuring and running grafana

Start grafana.
In grafana, add a Data Source and choose prometheus.
Then build a new dashboard following the grafana documentation.
For the graph's metric, select probe_duration_seconds{job="ping_all"}.

(Screenshot: grafana dashboard for prometheus ping monitoring)