OSQuery - streaming OS events in SQL
OSQuery is an open source tool that lets you query operating system events using SQL. The events can be fed into a streaming platform, in this case Pulsar, for subsequent transformation and routing on the stream using Decodable.
OSQuery is unique in letting you use SQL to capture events as a stream. For example, you can install OSQuery on your EC2 instance and it will capture events happening at the operating system level and forward them to a streaming platform. All the events are exposed as tables in OSQuery so that you can use SQL to query them.
OSQuery runs in the background as a daemon in the operating system, behind your running applications. You can schedule queries to be executed on the tables of your choice. OSQuery can also run in the foreground as an interactive shell, which is useful for debugging and for quick ad hoc lookups.
For all use cases, these events need to be forwarded, routed, and monitored. This blog will demonstrate how to do this easily using Decodable to route and aggregate the events, and ultimately sink them into a data store.
OSQuery extensions are plugins that allow you to process the events that OSQuery has been configured to listen to. Extensions can be written in C++, Python, Go, or any language that supports Thrift. The OSQuery daemon executes the extension and communicates with it over Thrift, sending it events to process.
We will write an extension to OSQuery to process events and send them to Apache Pulsar, an open source, high throughput streaming platform for real-time workloads.
The Python Extension
Below is an example of an OSQuery Python extension that forwards events to Pulsar. There are two important parts to this code: ConfigPlugin and LoggerPlugin.
ConfigPlugin configures OSQuery by providing the query to execute and the interval at which to execute it. You can provide multiple entries in the schedule so that you’re sending multiple types of events to Pulsar. You can find more example configurations in the OSQuery query packs, which are pre-built sets of popular queries organized by use case.
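As a rough sketch of what such a schedule can look like, the snippet below builds the configuration payload in plain Python. In osquery-python, a ConfigPlugin subclass returns this kind of structure (a list of source-name-to-JSON-string mappings) from its content() method. The source name pulsar_config, the query names, and the intervals are illustrative assumptions, not the values used in the original extension.

```python
import json


def build_config_content():
    """Build config sources as a list of {source_name: json_string} dicts."""
    schedule = {
        # Hypothetical entry: capture running processes every 10 seconds.
        "processes_snapshot": {
            "query": "SELECT pid, name, path, cmdline, start_time FROM processes;",
            "interval": 10,
        },
        # Hypothetical second entry: capture listening sockets every minute.
        "listening_ports_snapshot": {
            "query": "SELECT pid, port, protocol, address FROM listening_ports;",
            "interval": 60,
        },
    }
    # "pulsar_config" is an arbitrary source name chosen for this sketch.
    return [{"pulsar_config": json.dumps({"schedule": schedule})}]


if __name__ == "__main__":
    for source in build_config_content():
        for source_name, body in source.items():
            print(source_name, sorted(json.loads(body)["schedule"]))
```

Each key under schedule names one scheduled query, so adding another event type is a matter of adding another entry.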
LoggerPlugin forwards the events to Pulsar. The line load_dotenv('/home/ubuntu/.env') needs to be updated to point to your own .env file. Details about the .env file are covered in the next section.
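The forwarding step can be sketched independently of OSQuery and Pulsar. The class below is a minimal illustration, not the original extension: it accepts any producer object with a send(bytes) method, so it runs locally with a stand-in, and a logger plugin's log_string(value) hook could delegate to forward(). The names EventForwarder and PrintProducer are hypothetical, and skipping lines that are not valid JSON is a design assumption of this sketch.

```python
import json


class EventForwarder:
    """Forwards osquery log lines to any producer exposing send(bytes)."""

    def __init__(self, producer):
        # In a real extension this would likely be a Pulsar producer, e.g.
        #   client = pulsar.Client(service_url)
        #   producer = client.create_producer(topic)
        # where service_url and topic are placeholders read from the .env file.
        self.producer = producer

    def forward(self, value):
        """Send each JSON line in `value`; return the number of lines sent."""
        sent = 0
        for line in value.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)  # validate only; the raw line is forwarded
            except json.JSONDecodeError:
                continue  # assumption: drop malformed lines instead of failing
            self.producer.send(line.encode("utf-8"))
            sent += 1
        return sent


class PrintProducer:
    """Stand-in producer for local runs; prints instead of publishing."""

    def send(self, payload):
        print(payload.decode("utf-8"))


if __name__ == "__main__":
    forwarder = EventForwarder(PrintProducer())
    forwarder.forward('{"name": "processes_snapshot", "action": "added"}')
```

Injecting the producer keeps the forwarding logic testable without a running Pulsar cluster.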
In this demonstration, we will build out the solution illustrated in the diagram below. We will use Decodable to pull these logs, filter them, and route them to different endpoints. The endpoints serve different purposes: one is for monitoring and aggregation, and the other is for immediate alerting and action.
We will create a set of SQL statements that route the OSQuery logs to these endpoints so that each group of users receives only the data relevant to them.
Filtering Out OSQuery Logs
The statement below filters out the OSQuery-related logs. These logs only create noise for the consuming applications. If the intention is to train an ML model, this noise will degrade the model's performance, causing many false positives or, worse, false negatives (a false negative means a suspicious act was not detected).
The output of this statement creates a cleaner version of the activities happening in the operating system.
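To make the filtering idea concrete, here is a small local sketch that uses Python's built-in sqlite3 module as a stand-in for the stream. The table name os_events, the three columns, and the NOT LIKE '%osquery%' predicate are assumptions made for illustration; the Decodable pipeline reads from a stream, and its SQL dialect and field names may differ.

```python
import sqlite3

# Keep only rows whose process name and command line do not mention osquery.
FILTER_SQL = """
SELECT pid, name, cmdline
FROM os_events
WHERE lower(name) NOT LIKE '%osquery%'
  AND lower(cmdline) NOT LIKE '%osquery%'
ORDER BY pid
"""


def filter_osquery_noise(rows):
    """Load (pid, name, cmdline) rows into memory and apply the filter."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE os_events (pid INTEGER, name TEXT, cmdline TEXT)"
        )
        conn.executemany("INSERT INTO os_events VALUES (?, ?, ?)", rows)
        return conn.execute(FILTER_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (1, "osqueryd", "/usr/bin/osqueryd --flagfile=/etc/osquery/osquery.flags"),
        (2, "nginx", "nginx: master process"),
    ]
    for row in filter_osquery_noise(sample):
        print(row)
```

Checking both name and cmdline also removes helper processes, such as an extension script, that are launched on OSQuery's behalf.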
Identifying Suspicious Events
The SQL statement below searches for suspicious events that are not part of the normal activities of the operating system. In this case, we search the cmdline field for processes not expected to run. We also search for processes that have been running for more than a day.
These events go into a suspicious_processes stream in Decodable to be consumed by a threat hunter.
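The two checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original SQL: the watchlist in UNEXPECTED_EXECUTABLES is hypothetical, the executable is taken from the first token of cmdline, and start_time is assumed to be a Unix timestamp in seconds delivered as a string.

```python
import os
import time

# Hypothetical watchlist of executables not expected on this host.
UNEXPECTED_EXECUTABLES = {"nc", "nmap", "tcpdump"}
ONE_DAY_SECONDS = 24 * 60 * 60


def suspicion_reasons(event, now=None):
    """Return the list of reasons an event looks suspicious (empty if none)."""
    now = time.time() if now is None else now
    reasons = []

    # Check 1: the executable named in cmdline is on the watchlist.
    tokens = event.get("cmdline", "").split()
    if tokens and os.path.basename(tokens[0]).lower() in UNEXPECTED_EXECUTABLES:
        reasons.append("unexpected_command")

    # Check 2: the process has been running for more than a day.
    try:
        start_time = int(event.get("start_time", -1))
    except (TypeError, ValueError):
        start_time = -1  # treat unparseable values as unknown
    if start_time >= 0 and now - start_time > ONE_DAY_SECONDS:
        reasons.append("long_running")

    return reasons


if __name__ == "__main__":
    event = {"cmdline": "/usr/bin/nmap -sS 10.0.0.0/24", "start_time": "0"}
    print(suspicion_reasons(event))
```

Matching on the executable name rather than on a raw substring avoids flagging unrelated commands that merely contain a watchlist entry, such as sync for nc.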
The SQL below takes all of the filtered processes and cleanses them so that aggregation and statistics can be applied to the data. This is necessary because the data will be consumed by a dashboard where ad hoc queries can be run.
We perform cleansing only as part of the path to the monitoring dashboard and not for the alerting. This is because the cleansing process can take time. The threat hunter doesn’t care about clean data, only that she is alerted.
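As a sketch of what cleansing might involve: OSQuery result logs carry column values as strings, so numeric fields generally need casting before they can be aggregated. The function below is a Python illustration rather than the original SQL; the choice of pid and start_time as numeric fields, and the decision to map unparseable values to None, are assumptions.

```python
# Assumed numeric columns; adjust to the fields your queries select.
INT_FIELDS = ("pid", "start_time")


def cleanse(event):
    """Return a copy of `event` with numeric strings cast and text trimmed."""
    clean = {}
    for key, value in event.items():
        if key in INT_FIELDS:
            try:
                clean[key] = int(value)
            except (TypeError, ValueError):
                clean[key] = None  # keep the row, mark the value as unknown
        elif isinstance(value, str):
            clean[key] = value.strip()
        else:
            clean[key] = value
    return clean


if __name__ == "__main__":
    print(cleanse({"pid": "42", "start_time": "1700000000", "name": " nginx "}))
```

Mapping bad values to None rather than dropping the row keeps counts intact for the dashboard while still excluding the value from numeric aggregates.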
Try it for Yourself
Get the code from Decodable’s examples repository.
Threat hunters and data analysts don’t tend to know how to program in Python, C, or Java, but they more than likely know SQL (or can at least learn it easily). Decodable enables these roles to ask harder questions of their data and react to events faster.
You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.
Join the community Slack