Python Elasticsearch client

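elasticsearch-py is the official Elasticsearch client library for Python, maintained by Elastic. It is a thin wrapper around the Elasticsearch REST API. The examples below use the 7.x-style call signatures (for example, the `body=` keyword argument).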

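Basic concepts and architecture

Index: a logical container that stores a set of related documents. Each document is made up of fields, which can hold text, numbers, dates, and other types.
Type: in Elasticsearch 6.x and earlier, every document belonged to a mapping type that defined its structure and fields. Mapping types were deprecated in 6.x and removed in later versions, so a current index effectively has a single implicit type (`_doc`).
Node: a single running instance of the Elasticsearch process. A cluster can consist of many nodes, and any node can accept indexing and search requests.
Cluster: a group of nodes that work together to provide high availability and scalability.
Shard: an index is split into one or more primary shards. Each shard is a self-contained Lucene index that holds a subset of the index's documents, not a copy of the whole index. Shards are distributed across nodes so the cluster can balance load and scale horizontally.
Replica: a copy of a primary shard. A primary can have several replicas, which improves reliability and availability. A replica is never allocated to the same node as its primary, and replicas can also serve search requests.
Mapping: defines the structure of documents and the type of each field. It is expressed as JSON that lists the fields and their types.
Query DSL (Domain Specific Language): the JSON-based language for writing Elasticsearch queries. A search request built with it can combine full-text queries, filters, aggregations, sorting, and more.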
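
The relationship between primary shards and replicas can be made concrete with a little arithmetic. The sketch below is illustrative only: `total_shard_copies` is a hypothetical helper, not part of the client library.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies an index needs: every primary plus its replicas."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least one primary shard and no negative replicas")
    return number_of_shards * (1 + number_of_replicas)


# 5 primaries with 1 replica each means 10 shard copies spread over the cluster.
print(total_shard_copies(5, 1))  # 10
# With no replicas, only the 5 primaries exist.
print(total_shard_copies(5, 0))  # 5
```

Because a replica cannot live on the same node as its primary, a single-node cluster can only allocate the primaries, and the replica copies stay unassigned.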

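Connecting to Elasticsearch

from elasticsearch import Elasticsearch

# Note: `ignore` is a per-request option (e.g. es.indices.create(..., ignore=400)),
# not a connection setting, so it is not passed to the constructor here.
es = Elasticsearch(['localhost:9200'],
                   sniff_on_start=True,            # discover the cluster's nodes before the first request
                   sniff_on_connection_fail=True,  # refresh the node list when a node stops responding
                   sniff_timeout=60)               # timeout in seconds for each sniff request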
print(es)
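
Printing the client does not prove that the cluster is reachable. A minimal sketch of a readiness check follows, assuming only that the client exposes `ping()`, which returns True or False rather than raising. `wait_for_cluster` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_for_cluster(es, attempts: int = 5, delay: float = 1.0) -> bool:
    """Return True as soon as es.ping() succeeds, False after all attempts fail."""
    for attempt in range(attempts):
        if es.ping():  # ping() reports reachability as a boolean
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next attempt
    return False
```

With a real client this could be used as `if not wait_for_cluster(es): raise SystemExit("Elasticsearch is not reachable")`.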

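Common es operations

es.index adds a document to the given index, or overwrites the existing document if one with the same id is already there. If the index does not exist, it is created automatically.

print(es.index(index='student', id=3, body={"name": "小侯", "age": 24}))  # id is optional; one is generated if omitted

es.create adds a new document with the given id, creating the index automatically if it does not exist. Unlike es.index, it never overwrites: if a document with that id already exists, the call fails with a 409 conflict, so running it twice raises an error.

print(es.create(index='student',id=2, body={"name": '一一', "age": 24}))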

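es.search searches for documents that match a query. Its main parameters are:

index: comma-separated list of index names to search; use _all or an empty string to search all indices
doc_type: comma-separated list of document types to search; leave empty to search all types (only relevant to older versions, since mapping types have been removed)
body: the search definition, written in the Query DSL
_source: true or false to control whether the _source field is returned, or a list of fields to return
_source_excludes: list of fields to exclude from the returned _source
_source_includes: list of fields to include in the returned _source
filter_path: list of response paths to keep, which trims the response returned by Elasticsearch

print(es.search(index='student', body={"query": {"match": {"age": 24}}}))  # basic query
print(es.search(index='student', body={"query": {"match": {"age": 24}}}, _source=['name']))  # return only the name field
print(es.search(index='student', body={"query": {"match": {"age": 24}}}, _source_excludes=['age']))  # exclude the age field
print(es.search(index='student', body={"query": {"match": {"age": 24}}}, _source_includes=['age']))  # include only the age field
print(es.search(index='student', filter_path=['hits.total.value', 'hits.hits._source']))  # keep only the listed parts of the response
print(es.search(index='test-index', filter_path=['hits.to*']))  # wildcards are supported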
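
The search calls above print the whole response. A small sketch of pulling out the total and the documents is shown here, assuming the 7.x response shape in which `hits.total` is an object with a `value` key. `extract_hits` and `sample_response` are hypothetical; the sample is hand-written, not real output.

```python
def extract_hits(response: dict) -> tuple:
    """Return (total, sources) from a 7.x-style search response."""
    hits = response.get("hits", {})
    total = hits.get("total", {}).get("value", 0)
    sources = [hit.get("_source", {}) for hit in hits.get("hits", [])]
    return total, sources


# Hypothetical response, shaped like the result of es.search(index='student', ...)
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "student", "_id": "3", "_score": 1.0,
             "_source": {"name": "小侯", "age": 24}},
            {"_index": "student", "_id": "2", "_score": 1.0,
             "_source": {"name": "一一", "age": 24}},
        ],
    }
}

total, docs = extract_hits(sample_response)
print(total)                      # 2
print([d["name"] for d in docs])  # ['小侯', '一一']
```

The `.get` defaults keep the helper usable when `filter_path` has removed parts of the response.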

es.get retrieves a single document from an index by id

print(es.get(index='student', id=1))

es.get_source retrieves only the _source of a document, by index and id

print(es.get_source(index='student', id=2))

es.count returns the number of documents in an index that match a query

body = {
    "query": {
        "match": {
            "name": '小侯'
        }
    }
}
print(es.count(index='student', body=body)['count'])

Count all documents in an index

print(es.count(index='student')['count'])

es.delete deletes a single document by id

print(es.delete(index='student', id=1))

es.delete_by_query deletes every document that matches a query

print(es.delete_by_query(index='student', body={"query": {"match": {"name": "小侯"}}}))

es.exists checks whether a document exists; returns True or False

print(es.exists(index='student', id=2))

es.info returns basic information about the cluster

print(es.info())

es.indices operations

es.indices.create creates an index (commonly used)

# Example: create an index named blog with strict dynamic mapping and three
# text fields (title, name, content); title is analyzed with ik_max_word
body = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ik_max_word"
            },
            "name": {
                "type": "text"
            },
            "content": {
                "type": "text"
            }
        }
    }
}
es.indices.create(index='blog', body=body)
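
Creating an index that already exists raises an error, so scripts that may run more than once often check first. A minimal sketch follows, assuming the 7.x-style `index=` and `body=` keywords used in this post. `ensure_index` is a hypothetical helper name.

```python
def ensure_index(es, name: str, body: dict) -> bool:
    """Create the index if it is missing. Return True if it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name, body=body)
    return True
```

With a live client this would be called as `ensure_index(es, 'blog', body)`. The check and the create are two separate requests, so another process could still create the index in between; treat this as a convenience, not a lock.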

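es.indices.analyze returns the tokens an analyzer produces for a piece of text

# Note: ik_max_word requires the IK analysis plugin to be installed on the Elasticsearch nodes
print(es.indices.analyze(body={'analyzer': "ik_max_word", "text": "轻舟已过万重山"}))

es.indices.exists checks whether an index exists; returns True or False

print(es.indices.exists(index="student"))

es.indices.delete deletes an index

es.indices.delete(index="blog")

es.indices.get returns the settings, mappings, and aliases of an index; it raises an error if the index does not exist

print(es.indices.get(index="blog"))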

es.indices.put_alias creates an alias for one or more indices; the alias can then be used to query several indices at once

print(es.indices.put_alias(index=['news', 'student'], name='new_alias'))
print(es.search(index='new_alias'))

es.indices.delete_alias deletes one or more aliases

es.indices.get_alias retrieves one or more aliases

print(es.indices.get_alias(name='new_alias'))

es.indices.get_mapping returns the mapping of an index

print(es.indices.get_mapping(index='blog'))

es.indices.get_field_mapping returns the mapping of specific fields

print(es.indices.get_field_mapping(fields=['title', 'content'], index='blog'))

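es.indices.get_settings returns the settings of an index
es.indices.exists_type checks whether a type exists in one or more indices (only relevant to older versions, since mapping types have been removed)
es.indices.flush explicitly flushes one or more indices
es.indices.get_template retrieves an index template by name
es.indices.open opens a closed index so that it can be searched again
es.indices.close closes an index to remove its overhead from the cluster; a closed index is blocked for read and write operations
es.indices.clear_cache clears all caches, or specific caches, associated with one or more indices
es.indices.get_upgrade reports how far one or more indices have been upgraded (a legacy API that is no longer available in recent versions)
es.indices.put_mapping registers a mapping definition on an index, for example to add new fields
es.indices.put_settings changes index-level settings on a live index
es.indices.put_template creates an index template that is applied automatically to newly created indices that match it
es.indices.segments returns low-level segment information for the Lucene indices (shard level) that make up an index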

es.cat operations

es.cat.aliases returns alias information

print(es.cat.aliases(name="new_alias", format='json'))

es.cat.count returns the total number of documents in an index

print(es.cat.count(index='student'))

es.cat.health returns health information for the cluster

print(es.cat.health(format='json'))

es.cat.indices returns information about indices

print(es.cat.indices(format='json'))
print(len(es.cat.indices(format='json')))  # number of indices in the cluster

es.cat.master returns basic information about the master node, such as its IP address and node name

print(es.cat.master(format='json'))

es.cat.plugins returns information about installed plugins

print(es.cat.plugins(format='json'))

es.cat.shards returns shard information for an index

print(es.cat.shards(index='blog', format='json'))

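Cluster operations

es.cluster.get_settings returns the cluster settings
es.cluster.health returns the health status of the cluster
es.cluster.state returns the comprehensive state of the cluster
es.cluster.stats returns cluster-wide statistics

Node operations

es.nodes.stats returns node statistics
es.nodes.info returns node information
es.nodes.hot_threads returns the hot threads on each node
es.nodes.usage returns feature usage information for each node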

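Queries

Match all documents

query = {"query": {"match_all": {}}}

Find all documents whose name is exactly 一一

# term queries are not analyzed, so this targets the keyword sub-field that
# default dynamic mapping creates for string fields (an assumption about the index)
query = {'query': {'term': {'name.keyword': '一一'}}}

Find all documents with age greater than 11

query = {'query': {'range': {'age': {'gt': 11}}}}

match_phrase: phrase query

body = {
    "query": {
        "match_phrase": {
            "name": "一一"
        }
    }
}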

must: logical AND

must_body = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "一一"
                    }
                },
                {
                    "match": {
                        "age": 25
                    }
                }
            ]
        }
    }
}

should: logical OR

should_body = {
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "name": "一一"
                    }
                },
                {
                    "match": {
                        "age": 25
                    }
                }
            ]
        }
    }
}

must_not: logical NOT

must_not_body = {
    "query": {
        "bool": {
            "must_not": [
                {
                    "match": {
                        "name": "二"
                    }
                },
                {
                    "match": {
                        "age": 22
                    }
                }
            ]
        }
    }
}

filter: filtering

filter_body = {
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "name": "一一"
                    }
                },
                {
                    "match": {
                        "age": 22
                    }
                }
            ],
            "filter": {
                "range": {  # gt: greater than, gte: greater than or equal, lt: less than, lte: less than or equal
                    "age": {
                        "gte": 23,
                        "lte": 26
                    }
                }
            }
        }
    }
}
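
The four bool clauses can be combined in one query. One detail worth knowing: when a bool query contains a `must` or `filter` clause, its `should` clauses become optional (`minimum_should_match` defaults to 0), so in a body like `filter_body` they only influence scoring unless `minimum_should_match` is set. The builder below is a sketch; `bool_query` and its parameter names are hypothetical.

```python
def bool_query(must=None, should=None, must_not=None, filters=None,
               minimum_should_match=None) -> dict:
    """Build a search body with a bool query from lists of clauses."""
    clauses = {"must": must, "should": should,
               "must_not": must_not, "filter": filters}
    # Keep only the clause types that were actually supplied
    bool_body = {name: list(items) for name, items in clauses.items() if items}
    if minimum_should_match is not None:
        bool_body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_body}}


# Same clauses as filter_body, but at least one should clause must now match
strict_body = bool_query(
    should=[{"match": {"name": "一一"}}, {"match": {"age": 22}}],
    filters=[{"range": {"age": {"gte": 23, "lte": 26}}}],
    minimum_should_match=1,
)
print(sorted(strict_body["query"]["bool"]))
# ['filter', 'minimum_should_match', 'should']
```

Filter clauses are evaluated without scoring, which makes them a good fit for exact conditions such as ranges.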

sort: sorting

sort_body = {
    "sort": [
        {
            "age": {
                "order": "desc"
            }
        }
    ]
}

Filtering returned fields

source_body = {"query": {"match": {"age": 24}}, "_source": ['name']}

Pagination

limit_body = {
    "from": 0,
    "size": 2
}
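
Page numbers translate to `from` and `size` with simple arithmetic. The sketch below assumes 1-based page numbers and the default `index.max_result_window` of 10,000, beyond which Elasticsearch rejects `from + size` requests. `page_body` is a hypothetical helper.

```python
from typing import Optional

MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_body(page: int, size: int, query: Optional[dict] = None) -> dict:
    """Build a from/size search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default max_result_window")
    return {"query": query or {"match_all": {}}, "from": offset, "size": size}


print(page_body(1, 2))  # {'query': {'match_all': {}}, 'from': 0, 'size': 2}
print(page_body(3, 2))  # {'query': {'match_all': {}}, 'from': 4, 'size': 2}
```

For deeper paging than the window allows, `search_after` or the scroll API is the usual alternative.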

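Aggregation functions: sum, avg, min, max

sum_body = {
    "aggs": {
        "sum_age": {  # name of the aggregation (your choice)
            "sum": {  # aggregation function
                "field": "age"  # field whose values are summed
            }
        }
    },
    "_source": False  # do not return the _source of each hit ("size": 0 would skip the hits entirely)
}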
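
All four metric functions can be requested at once, and each result is read from `aggregations.<name>.value` in the response. This is a sketch under those assumptions: `metrics_body` and `read_metrics` are hypothetical helpers, and `sample` is a hand-written response, not real output.

```python
def metrics_body(field: str, functions=("sum", "avg", "min", "max")) -> dict:
    """Build one request that computes several metric aggregations on a field."""
    aggs = {f"{fn}_{field}": {fn: {"field": field}} for fn in functions}
    return {"size": 0, "aggs": aggs}  # size 0: return aggregations only, no hits


def read_metrics(response: dict) -> dict:
    """Map each aggregation name to its single numeric value."""
    return {name: agg["value"]
            for name, agg in response.get("aggregations", {}).items()}


# Hypothetical response for three students aged 22, 24 and 26
sample = {"aggregations": {
    "sum_age": {"value": 72.0}, "avg_age": {"value": 24.0},
    "min_age": {"value": 22.0}, "max_age": {"value": 26.0},
}}
print(metrics_body("age")["aggs"]["sum_age"])  # {'sum': {'field': 'age'}}
print(read_metrics(sample)["avg_age"])         # 24.0
```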

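Group by range and return the number of documents in each group

group_body = {
    "aggs": {
        "myGroup": {  # name of the aggregation
            "range": {
                "field": "age",  # field to group on
                "ranges": [
                    {
                        "from": 20,  # each range includes "from" and excludes "to"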
                        "to": 22
                    },
                    {
                        "from": 22,
                        "to": 25
                    },
                    {
                        "from": 25,
                        "to": 30
                    }
                ]
            }
        }
    },
    "_source": False
}
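
A range aggregation returns one bucket per range, each with a `key` and a `doc_count`. The sketch below reads those counts and mirrors the boundary rule (lower bound included, upper bound excluded). `bucket_counts` and `in_range` are hypothetical helpers, and the counts in `sample` are made up for illustration.

```python
def bucket_counts(response: dict, agg_name: str) -> dict:
    """Map each bucket key to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def in_range(value: float, lower: float, upper: float) -> bool:
    """Mirror the range aggregation rule: lower bound included, upper excluded."""
    return lower <= value < upper


# Hypothetical response for group_body
sample = {"aggregations": {"myGroup": {"buckets": [
    {"key": "20.0-22.0", "from": 20.0, "to": 22.0, "doc_count": 1},
    {"key": "22.0-25.0", "from": 22.0, "to": 25.0, "doc_count": 3},
    {"key": "25.0-30.0", "from": 25.0, "to": 30.0, "doc_count": 2},
]}}}
print(bucket_counts(sample, "myGroup"))
# {'20.0-22.0': 1, '22.0-25.0': 3, '25.0-30.0': 2}
print(in_range(22, 20, 22), in_range(22, 22, 25))  # False True
```

A student aged exactly 22 is therefore counted in the 22 to 25 group, not in 20 to 22.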

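Setting the number of primary and replica shards

shard_body = {
    "mappings": {
        # no type name (such as "doc") under "mappings": Elasticsearch 7.x and later reject it
        "properties": {
            "name": {
                "type": "keyword"
            }
        }
    },
    "settings": {
        "number_of_replicas": 1,  # replica copies per primary shard
        "number_of_shards": 5     # primary shards; cannot be changed on an existing index
    }
}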
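
Unlike the number of primary shards, the replica count can be changed on a live index through `es.indices.put_settings`. A minimal sketch follows, assuming the 7.x-style `body=` keyword. `set_replicas` is a hypothetical helper name.

```python
def set_replicas(es, index: str, replicas: int):
    """Change the replica count of an existing index."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    settings = {"index": {"number_of_replicas": replicas}}
    return es.indices.put_settings(index=index, body=settings)
```

For example, `set_replicas(es, 'blog', 0)` is a common choice on a single-node development cluster, where replicas could not be allocated anyway.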

Miscellaneous

Get the names of all indices

indices = es.cat.indices(format='json')
index_list = [entry.get('index') for entry in indices]
print(index_list)
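
The list returned this way usually also contains internal indices, whose names start with a dot by convention (for example those created by Kibana). A sketch of filtering them out is shown below. `user_index_names` is a hypothetical helper and `sample_rows` is hand-written data shaped like the JSON output of `es.cat.indices`.

```python
def user_index_names(cat_rows) -> list:
    """Sorted index names, skipping indices whose name starts with a dot."""
    return sorted(row["index"] for row in cat_rows
                  if not row["index"].startswith("."))


# Hypothetical rows, shaped like es.cat.indices(format='json')
sample_rows = [
    {"index": "student", "health": "yellow"},
    {"index": ".kibana_1", "health": "green"},
    {"index": "blog", "health": "yellow"},
]
print(user_index_names(sample_rows))  # ['blog', 'student']
```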

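Bulk indexing

from elasticsearch import helpers

# Generator of bulk actions: each action names the target index and carries the document in _source
actions = ({
    "_index": "s2",
    "_source": {
        "title": i
    }
} for i in range(100000))
helpers.bulk(es, actions)  # sends the actions in batches through the bulk API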
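
Bulk actions can also carry an explicit `_id`, so that re-running an import overwrites documents instead of duplicating them. The generator below is a sketch: `build_actions`, `id_field`, and the `rows` data are hypothetical, and the call to `helpers.bulk` is left as a comment because it needs a live cluster.

```python
def build_actions(index: str, docs, id_field=None):
    """Yield one bulk action per document, lazily."""
    for doc in docs:
        action = {"_index": index, "_source": doc}
        if id_field is not None:
            # An explicit id makes repeated runs overwrite rather than duplicate
            action["_id"] = doc[id_field]
        yield action


# Hypothetical input rows
rows = [{"sid": 1, "name": "小侯"}, {"sid": 2, "name": "一一"}]
actions = list(build_actions("student", rows, id_field="sid"))
print(len(actions))       # 2
print(actions[0]["_id"])  # 1

# With a live client:
# helpers.bulk(es, build_actions("student", rows, id_field="sid"))
```

Because the actions are generated lazily, large inputs do not have to be held in memory all at once.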