Real-Time Log Collection with Fluentd and MongoDB
About
This post shows how to use Fluentd’s MongoDB plugin to aggregate semi-structured logs in real time. (See here for a more detailed version.)
Background
Fluentd is an advanced open-source log collector developed at Treasure Data, Inc (see previous post). Because Fluentd handles logs as semi-structured data streams, the ideal database should have strong support for semi-structured data. There are several databases that meet this criterion, but we believe MongoDB is the market leader.
For those of you who do not know what MongoDB is, it is an open-source, document-oriented database developed at 10gen, Inc. It is schema-free and uses a JSON-like format to manage semi-structured data.
This post shows how to import Apache logs into MongoDB with Fluentd, using only a small amount of configuration.
Mechanism
The figure below shows how the pieces fit together.

Fluentd does 3 things:
- It continuously “tails” the access log.
- It parses the incoming log entries into meaningful fields (such as ip, path, etc.) and buffers them.
- It writes the buffered data to MongoDB periodically.
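To make the buffer-then-write step concrete, here is a minimal Python sketch of the idea. It is not Fluentd's implementation: the `BufferedWriter` class, its `sink` callback, and the injectable `clock` are hypothetical names chosen for illustration, and the flush check here runs only when a record arrives, whereas a real collector would typically flush on a timer.

```python
import time
from typing import Callable, Dict, List

Record = Dict[str, str]


class BufferedWriter:
    """Collects records in memory and hands them to a sink in batches."""

    def __init__(self, sink: Callable[[List[Record]], None],
                 flush_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink                      # stands in for "write to MongoDB"
        self.flush_interval = flush_interval  # seconds between writes
        self.clock = clock                    # injectable so the demo is deterministic
        self.buffer: List[Record] = []
        self.last_flush = clock()

    def emit(self, record: Record) -> None:
        """Buffer one record, flushing if the interval has elapsed."""
        self.buffer.append(record)
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        """Write everything buffered so far as a single batch."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
        self.last_flush = self.clock()


def demo() -> List[List[Record]]:
    """Emit three records over ten simulated seconds and return the batches written."""
    batches: List[List[Record]] = []
    now = [0.0]
    writer = BufferedWriter(batches.append, flush_interval=10.0,
                            clock=lambda: now[0])
    writer.emit({"path": "/a"})  # t=0: buffered
    now[0] = 5.0
    writer.emit({"path": "/b"})  # t=5: buffered
    now[0] = 10.0
    writer.emit({"path": "/c"})  # t=10: interval elapsed, all three are flushed
    return batches
```

The point of the sketch is that the database sees one batched write per interval instead of one write per log line.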
Install
For simplicity, this post shows the one-node configuration. You should have the following software installed on the same node.
- Fluentd with MongoDB Plugin
- MongoDB
- Apache (with the Combined Log Format)
The most recent deb/rpm packages of Fluentd include the MongoDB plugin. If you prefer to install the plugin with RubyGems, gem install fluent-plugin-mongo does the job.
For MongoDB, please refer to the downloads page.
Configuration
Let’s start on the actual configuration. If you installed from the deb/rpm package, Fluentd’s config file is located at /etc/td-agent/td-agent.conf. Otherwise, it is located at /etc/fluentd/fluentd.conf.
Tail Input
For input, let’s set up Fluentd to track the recent Apache logs (usually at /var/log/apache2/access_log). This is what the Fluentd configuration looks like.
<source>
  type tail
  format apache
  path /var/log/apache2/access_log
  tag mongo.apache
</source>
Let’s go through the configuration line by line.
- type tail: The tail plugin continuously tracks the log file. This handy plugin is part of Fluentd’s core plugins.
- format apache: Use Fluentd’s built-in Apache log parser, which splits each log entry into meaningful fields.
- path /var/log/apache2/access_log: The location of the Apache log, assumed here to be /var/log/apache2/access_log.
- tag mongo.apache: mongo.apache is the tag attached to each parsed log entry. Fluentd uses it to route the entry to the matching output.
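To get a feel for what the Apache parser produces, here is a rough Python sketch that turns one Combined Log Format line into a dictionary of fields. The regular expression is a simplified stand-in written for this post, not the one Fluentd ships, and the `referer` and `agent` field names are assumptions; the other field names follow the stored documents shown in the Test section.

```python
import re
from typing import Dict, Optional

# Simplified pattern for the Apache Combined Log Format (illustrative only).
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<code>\S+) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named fields of one access-log line, or None if it does not match."""
    m = COMBINED.match(line.strip())
    return m.groupdict() if m else None


sample = ('127.0.0.1 - - [27/Nov/2011:07:56:27 +0000] '
          '"GET / HTTP/1.1" 200 44 "-" "ApacheBench/2.3"')
record = parse_line(sample)
```

Note that every value comes out as a string, which is why code and size appear quoted in the stored documents.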
That’s it. You should be able to output a JSON-formatted data stream for MongoDB to consume.
MongoDB Output
The output configuration should look like this:
<match mongo.**>
  # plugin type
  type mongo
  # mongodb db + collection
  database apache
  collection access
  # mongodb host + port
  host localhost
  port 27017
  # interval
  flush_interval 10s
</match>
The match section specifies the pattern used to match tags. If an event’s tag matches, the config inside <match>...</match> is applied to it. In this example, every event carrying the mongo.apache tag (generated by tail) is matched.
The ** in a pattern such as mongo.** matches zero or more period-delimited tag elements (e.g. mongo, mongo.a, and mongo.a.b). flush_interval indicates how often the data is written to the database (MongoDB in this case). The other options specify MongoDB’s host, port, database, and collection.
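The wildcard rule is easier to see in code. The Python sketch below handles only literal elements, * (exactly one tag element), and ** (zero or more tag elements); the function names are hypothetical, and Fluentd's real matcher supports more syntax than this.

```python
from typing import List


def match_parts(pattern: List[str], tag: List[str]) -> bool:
    """Match a list of pattern elements against a list of tag elements."""
    if not pattern:
        return not tag  # both exhausted means a full match
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # Let ** consume 0, 1, 2, ... tag elements and try the rest each time.
        return any(match_parts(rest, tag[i:]) for i in range(len(tag) + 1))
    if not tag:
        return False
    if head == "*" or head == tag[0]:
        return match_parts(rest, tag[1:])
    return False


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a period-delimited tag matches the pattern."""
    return match_parts(pattern.split("."), tag.split("."))
```

With this sketch, mongo.** accepts mongo, mongo.apache, and mongo.a.b, while mongo.* accepts only mongo.apache of those three.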
Test
To test the configuration, send some requests to the Apache server in whatever way you like. This example uses the ab (Apache Bench) program.
$ ab -n 100 -c 10 http://localhost/
Then, let’s access MongoDB and see the stored data.
$ mongo
> use apache
> db.access.find()
{ "_id" : ObjectId("4ed1ed3a340765ce73000001"), "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44", "time" : ISODate("2011-11-27T07:56:27Z") }
{ "_id" : ObjectId("4ed1ed3a340765ce73000002"), "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44", "time" : ISODate("2011-11-27T07:56:34Z") }
{ "_id" : ObjectId("4ed1ed3a340765ce73000003"), "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44", "time" : ISODate("2011-11-27T07:56:34Z") }
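Once the documents are in this shape, simple analysis is straightforward. As a small illustration, the Python sketch below summarizes records shaped like the three shown above (copied by hand here rather than fetched from MongoDB). The `summarize` helper is a hypothetical name; it converts size to an integer because the values are stored as strings.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hand-copied subset of the fields from the documents shown above.
records = [
    {"host": "127.0.0.1", "method": "GET", "path": "/", "code": "200", "size": "44"},
    {"host": "127.0.0.1", "method": "GET", "path": "/", "code": "200", "size": "44"},
    {"host": "127.0.0.1", "method": "GET", "path": "/", "code": "200", "size": "44"},
]


def summarize(rows: Iterable[Dict[str, str]]) -> Tuple[Counter, int]:
    """Count requests per status code and total the response sizes in bytes."""
    rows = list(rows)
    codes = Counter(r["code"] for r in rows)
    # Apache logs "-" when no body was sent, so skip non-numeric sizes.
    total_bytes = sum(int(r["size"]) for r in rows if r["size"].isdigit())
    return codes, total_bytes


codes, total_bytes = summarize(records)
```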
Conclusion
Fluentd + MongoDB make real-time log collection simple, easy and robust.
And we’re hiring!
At Treasure Data, we are writing powerful software that makes Big Data accessible. All of your time should go into data analysis, not data management. We are here to help you do that.
We have a number of technical challenges ahead of us. We are small (a team of five) and actively looking for hackers and product managers who want to transform how people analyze Big Data. If you think you are a fit, please let us know. We’d love to talk to you!
Further Reading
- Store Apache Logs into MongoDB
- Fluentd MongoDB Plugin
- Fluentd Documentation
- Fluentd Plugins List
- Fluentd Source Code
Acknowledgement
Masahiro Nakagawa contributed the MongoDB plugin for Fluentd. Thanks Masahiro!