I have a Telegram bot @fengstatsbot which can record the group activities and generate reports. Recently the main db collection has reached around 1 million records. I can easily notice the waiting time before receiving the response from the bot. It can take up to 8 seconds for the bots to return result. When I looked into the restheart log, I can see that even inserting a new record is very slow, sometimes need more than 1 seconds.
After some research, I understand that adding index can improve reading speed. As my query is mainly use date and chat id to do the filtering, I first added a date index.
>db.messages.createIndex({"message.date": -1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
surprisingly, although my query selects a range of date for the data, adding date index didn’t improve much.
Then I added date and chat id as index
> db.messages.createIndex({"message.date": -1, "message.chat.id": -1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }
Cannot feel anything either. At last, I added chat id itself as a index
> db.messages.createIndex({"message.chat.id": -1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }
Then magic happened. I can feel that the bot is replying much faster, at least 3-5 times faster. By looking into the restheart log, I found that the aggregation query is perform much better, although insertion is as slow as before (it should be worse in theory, as now insertion need to update index too)
I am not sure weather the last index is the only thing I need to add, or all three of them are actually working together to make it work.
5/July/2019 updated: All this indexes helped, tested. How index is usefully depends on how the query is written.
db.messages.getIndexes() can be used to check all the indexes