Elasticsearch

[Python] Elasticsearch bulk insert with a custom _id

Pydole 2020. 3. 4. 09:14

 

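When you insert data into Elasticsearch, the _id is generated automatically (as a UUID), but when the index is kept in sync with a real RDB, there are cases where you want the _id to match the table's primary key (PK) value.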

 


 

Sample DataFrame

 

import numpy as np
import pandas as pd
from datetime import datetime
from elasticsearch import Elasticsearch
from elasticsearch import helpers



productId = ['AAA001', 'AAA002', 'AAA003' , 'AAA004', 'AAA005']
price = [15000, 11000, 21000, 25000, 14500]
soldDay = [datetime(2020, 1, 2, 0, 0),
           datetime(2020, 1, 3, 0, 0),
           datetime(2020, 1, 5, 0, 0),
           datetime(2020, 1, 7, 0, 0),
           datetime(2020, 1, 9, 0, 0)]

df = pd.DataFrame([ x for x in zip(productId, price, soldDay)], columns=['productId','price','soldDay'])
df

 

 

 

 

bulk insert

 

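# Fill in the host and port of your Elasticsearch node
es = Elasticsearch(host=' ', port=' ')

# One bulk action per row; productId becomes the document _id
data = [
  {
    "_index": "product",
    "_type": "product",  # mapping types are deprecated in Elasticsearch 7 and removed in 8
    "_id": x[0],
    "_source": {
        "price": x[1],
        "soldDay": x[2],
        "timestamp": datetime.today().date()}
  }
    for x in zip(df['productId'],df['price'],df['soldDay'])
]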

helpers.bulk(es, data)
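
The action list can also be produced by a generator, which avoids holding every action in memory when the source table is large. This is a minimal sketch, assuming the rows are plain dicts (for example from df.to_dict("records")); build_actions and id_field are hypothetical names for illustration and are not part of the elasticsearch library. The _type key is left out here on the assumption of a newer cluster.

```python
from datetime import datetime


def build_actions(records, index, id_field):
    """Yield one bulk action per record, using record[id_field] as the _id."""
    for record in records:
        # Everything except the ID column goes into _source
        source = {k: v for k, v in record.items() if k != id_field}
        yield {
            "_index": index,
            "_id": record[id_field],
            "_source": source,
        }


# Hypothetical rows, shaped like df.to_dict("records") for the sample DataFrame
records = [
    {"productId": "AAA001", "price": 15000, "soldDay": datetime(2020, 1, 2)},
    {"productId": "AAA002", "price": 11000, "soldDay": datetime(2020, 1, 3)},
]

actions = list(build_actions(records, index="product", id_field="productId"))
print([a["_id"] for a in actions])  # ['AAA001', 'AAA002']
```

Since helpers.bulk accepts any iterable of actions, the generator could be passed to it directly instead of being materialized with list().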

 

 

 

 

 

The productId column is registered as the _id.

 

 

 

 

Look up the document by _id 'AAA001'

 

 

res = es.get(index="product", doc_type='product', id='AAA001')
print(res['_source'])



{'price': 15000, 'soldDay': '2020-01-02T00:00:00', 'timestamp': '2020-03-04'}
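
The soldDay and timestamp values come back as strings because the Python client serializes datetime and date objects to ISO 8601 text before sending them. A minimal sketch of that conversion using only the standard library; it mirrors the format seen in the output above and is not the client's actual serializer code.

```python
from datetime import date, datetime

sold_day = datetime(2020, 1, 2, 0, 0)  # same value as the first soldDay row
timestamp = date(2020, 3, 4)           # the day the example above was run

print(sold_day.isoformat())   # 2020-01-02T00:00:00
print(timestamp.isoformat())  # 2020-03-04
```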