
Routing OSQuery Events via Apache Pulsar

Hubert Dulay

OSQuery - streaming OS events in SQL

OSQuery is an open source tool that lets you query operating system events using SQL. These events can be fed into a streaming platform, in this case Apache Pulsar, for subsequent transformation and routing on the stream using Decodable.

OSQuery is unique in letting you use SQL to capture events as a stream. For example, you can install OSQuery on an EC2 instance and it will capture events happening at the operating system level and forward them to a streaming platform. All of the events are exposed as tables in OSQuery, so you can use SQL to query them.

OSQuery runs in the background as a daemon in the operating system, behind your running applications. You can schedule queries to be executed against the tables of your choice. OSQuery can also run in the foreground for interactive queries, useful for debugging as well as quick ad hoc investigation.
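For example, the interactive shell that ships with OSQuery (osqueryi) lets you explore these tables directly. The query below runs against the standard processes table and returns currently running processes:

```sql
-- Run inside osqueryi: list a few running processes
SELECT pid, name, cmdline FROM processes LIMIT 5;
```

The same query, placed in a schedule, is what the daemon would execute repeatedly in the background.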

Some popular security use cases for OSQuery are:
  • File Integrity Monitoring
  • Incident Investigation
  • Vulnerability Detection
  • Audit and Compliance

For all of these use cases, the events need to be forwarded, routed, and monitored. This blog will demonstrate how to do this easily using Decodable to route and aggregate the events, and ultimately sink them into a data store.

OSQuery Extensions

OSQuery extensions are plugins that process the events OSQuery has been configured to capture. Extensions can be written in C++, Python, Go, or any other language that supports Thrift. The OSQuery daemon executes the extension and communicates with it over Thrift, sending it events to process.

We will write an extension to OSQuery to process events and send them to Apache Pulsar, an open source, high-throughput streaming platform for real-time workloads.

The Python Extension

Below is an example of an OSQuery Python extension that forwards events to Pulsar. There are two important parts to this code: ConfigPlugin and LoggerPlugin.

ConfigPlugin configures OSQuery by providing the queries to execute and the interval at which to execute them. You can provide multiple entries in the schedule so that you're pushing multiple types of events to Pulsar. You can find more example configurations in OSQuery query packs, which are pre-built queries for popular use cases.

LoggerPlugin performs the forwarding of events to Pulsar. The line load_dotenv('/home/ubuntu/.env') needs to be updated to point to your own .env file. Details about the .env file are covered in the next section.
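A minimal sketch of such an extension is shown below, using the osquery-python SDK and the Pulsar Python client. The environment variable names (PULSAR_URL, PULSAR_TOPIC), the plugin names, and the scheduled query are illustrative assumptions; the full working version is in the examples repository linked at the end of this post.

```python
import json
import os

import osquery  # osquery-python SDK
import pulsar   # Apache Pulsar Python client
from dotenv import load_dotenv

# Update this path to point to your own .env file
load_dotenv('/home/ubuntu/.env')

# PULSAR_URL and PULSAR_TOPIC are assumed variable names for this sketch
client = pulsar.Client(os.environ['PULSAR_URL'])
producer = client.create_producer(os.environ['PULSAR_TOPIC'])


@osquery.register_plugin
class PulsarConfigPlugin(osquery.ConfigPlugin):
    """Supplies osqueryd with the queries to run and how often to run them."""

    def name(self):
        return "pulsar_config"

    def content(self):
        schedule = {
            "schedule": {
                "processes": {
                    "query": "SELECT pid, name, cmdline, start_time FROM processes;",
                    "interval": 10,
                },
            },
        }
        return [{"pulsar": json.dumps(schedule)}]


@osquery.register_plugin
class PulsarLoggerPlugin(osquery.LoggerPlugin):
    """Receives each scheduled-query result and forwards it to Pulsar."""

    def name(self):
        return "pulsar_logger"

    def log_string(self, value):
        producer.send(value.encode("utf-8"))
        return osquery.extensions.ttypes.ExtensionStatus(code=0, message="OK")


if __name__ == "__main__":
    # osqueryd launches this script and talks to it over Thrift
    osquery.start_extension(name="pulsar_extension", version="1.0.0")
```

Note that the extension does not run standalone: osqueryd starts it, feeds it a Thrift socket, and calls log_string for every batch of results produced by the scheduled queries.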

Routing Logs

In this demonstration, we will build out the solution illustrated in the diagram below. We will use Decodable to pull these logs, then filter and route them to different endpoints. The endpoints serve different purposes: one for monitoring and aggregation, the other for immediate alerting and action.

We will create a set of SQL statements that route the OSQuery logs to these endpoints so that each can be consumed by its intended users.

Filtering Out OSQuery Logs

The statement below filters out the OSQuery-related logs, which only create noise for the consuming applications. If the intention is to train an ML model, this noise will degrade the performance of the model, causing many false positives or, worse, false negatives (a false negative means a suspicious act went undetected).

The output of this statement creates a cleaner version of the activities happening in the operating system.
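A sketch of what that filter might look like as a Decodable SQL pipeline is shown below. The stream names (process_events, cleaned_process_events) and field names are illustrative assumptions, not the exact schema from the demo:

```sql
-- Drop OSQuery's own activity so downstream consumers see only real OS events
INSERT INTO cleaned_process_events
SELECT *
FROM process_events
WHERE name NOT IN ('osqueryd', 'osqueryi')
  AND cmdline NOT LIKE '%osquery%'
```

Everything that survives the WHERE clause flows into the cleaned stream that the next two statements build on.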

Identifying Suspicious Events

The SQL statement below searches for suspicious events that are not part of the normal activities of the operating system. In this case, we search the cmdline field for processes that are not expected to run, and for processes that have been running for more than a day.

These events go into a suspicious_processes stream in Decodable to be consumed by a threat hunter.
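A sketch of such a statement is below. The specific cmdline patterns and the input stream name are hypothetical examples of "unexpected" processes; a real deployment would encode its own allow/deny criteria:

```sql
-- Flag unexpected binaries and long-running processes (patterns are examples)
INSERT INTO suspicious_processes
SELECT pid, name, cmdline, start_time
FROM cleaned_process_events
WHERE cmdline LIKE '%nc -l%'                      -- a listening netcat
   OR cmdline LIKE '%/tmp/%'                      -- binaries launched from /tmp
   OR UNIX_TIMESTAMP() - start_time > 86400       -- running for more than a day
```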


The SQL below takes all of the filtered processes and cleanses them so that aggregations and statistics can be applied to the data. This is necessary because the data will be consumed by a dashboard where ad hoc queries can be run.

We perform cleansing only on the path to the monitoring dashboard and not on the alerting path, because the cleansing process can take time. The threat hunter doesn't care about clean data, only that she is alerted quickly.
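A sketch of the cleansing step might look like the following, again with illustrative stream and field names. The idea is to normalize free-form fields so that grouping and counting on the dashboard behave predictably:

```sql
-- Normalize fields so the dashboard can group and count reliably
INSERT INTO process_stats
SELECT
  LOWER(TRIM(name))                    AS process_name,
  REGEXP_REPLACE(cmdline, '\s+', ' ')  AS cmdline,
  CAST(start_time AS BIGINT)           AS start_time
FROM cleaned_process_events
```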

Try it for Yourself

Get the code from Decodable’s examples repository.

Threat hunters and data analysts don't tend to program in Python, C, or Java, but they more than likely know SQL (or can easily learn it). Decodable enables these roles to ask harder questions of their data and react to events faster.

Watch the demo:

You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.

Learn more:

Join the community Slack
