I want to index a pandas DataFrame into an Elasticsearch server. One of my columns is a timestamp, some are numbers, and some are strings. How can I import this type of DataFrame into Elasticsearch? I know that I can use the _bulk API, but I don't know exactly how.

import pandas as pd
df = pd.read_csv('week1_features.csv',index_col=0)
df.head()

<html>
<div>
  <table border="1" class="dataframe">
    <thead>
      <tr style="text-align: right;">
        <th></th>
        <th>srcIp</th>
        <th>collectionTimestamp</th>
        <th>destinationBytes</th>
        <th>destinationPackets</th>
        <th>sourceBytes</th>
        <th>sourcePackets</th>
        <th>hour</th>
        <th>WeekDay</th>
        <th>FlowNumber</th>
        <th>dstPort</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th>0</th>
        <td>1.180.189.18</td>
        <td>2017-04-12 12:08:00</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>60.0</td>
        <td>1.0</td>
        <td>12</td>
        <td>3</td>
        <td>1</td>
        <td>1</td>
      </tr>
      <tr>
        <th>1</th>
        <td>1.180.189.18</td>
        <td>2017-04-12 12:08:30</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>12</td>
        <td>3</td>
        <td>1</td>
        <td>1</td>
      </tr>
      <tr>
        <th>2</th>
        <td>1.186.141.30</td>
        <td>2017-04-12 07:26:00</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>60.0</td>
        <td>1.0</td>
        <td>7</td>
        <td>3</td>
        <td>1</td>
        <td>1</td>
      </tr>
      <tr>
        <th>3</th>
        <td>1.191.82.68</td>
        <td>2017-04-13 03:05:00</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>60.0</td>
        <td>1.0</td>
        <td>3</td>
        <td>4</td>
        <td>1</td>
        <td>1</td>
      </tr>
      <tr>
        <th>4</th>
        <td>1.214.141.149</td>
        <td>2017-04-10 04:19:30</td>
        <td>0.0</td>
        <td>0.0</td>
        <td>136.0</td>
        <td>1.0</td>
        <td>4</td>
        <td>1</td>
        <td>1</td>
        <td>1</td>
      </tr>
    </tbody>
  </table>
</div>

</html>

1 Answer

With the function below you can insert a pandas DataFrame into Elasticsearch through the _bulk API. For the time column, however, you have to apply a map to the time field before inserting the DataFrame.

import json
import requests

def insertDataframeIntoElastic(dataFrame, index='index', typ='test',
                               server='http://192.168.11.148:9200', chunk_size=2000):
    headers = {'content-type': 'application/x-ndjson', 'Accept-Charset': 'UTF-8'}
    records = dataFrame.to_dict(orient='records')
    # Each document is an action line followed by its JSON source line.
    # Note: "_type" is deprecated in Elasticsearch 7 and removed in 8; drop it there.
    action = json.dumps({'index': {'_index': index, '_type': typ}})
    actions = [action + '\n' + json.dumps(record) for record in records]
    serverAPI = server + '/_bulk'
    # Send the actions in chunks; the bulk body must end with a newline.
    for i in range(0, len(actions), chunk_size):
        data = '\n'.join(actions[i:i + chunk_size]) + '\n'
        r = requests.post(serverAPI, data=data, headers=headers)
        print(r.content)
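
The note about the time column can be read in more than one way (it may also refer to defining a date mapping on the Elasticsearch side). One possible reading, sketched here as an assumption, is to convert the timestamp column to ISO 8601 strings in pandas so that json.dumps can serialize every record and Elasticsearch's default date detection has a good chance of recognizing the field. The helper name prepare_for_bulk and the toy frame are hypothetical and only mirror a few columns from the question.

```python
import json

import pandas as pd


def prepare_for_bulk(df, time_col):
    """Return a copy of df whose time_col holds ISO 8601 strings (hypothetical helper)."""
    out = df.copy()
    # Parse first, in case read_csv left the column as plain strings.
    out[time_col] = pd.to_datetime(out[time_col]).dt.strftime("%Y-%m-%dT%H:%M:%S")
    return out


# Toy frame mirroring three columns of the question's data.
df = pd.DataFrame({
    "srcIp": ["1.180.189.18", "1.186.141.30"],
    "collectionTimestamp": pd.to_datetime(
        ["2017-04-12 12:08:00", "2017-04-12 07:26:00"]
    ),
    "sourceBytes": [60.0, 60.0],
})

# After the conversion every value is a plain str or float, so json.dumps works.
records = prepare_for_bulk(df, "collectionTimestamp").to_dict(orient="records")
first_line = json.dumps(records[0])
```

The converted frame could then be passed to insertDataframeIntoElastic in place of the original one. Missing values are not handled in this sketch: json.dumps writes NaN for them, which is not valid JSON, so they would need to be dropped or replaced first.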