Get Excited About The Splunk Cloud ACS CLI

Splunk Cloud Admins rejoice! The Splunk Cloud ACS Command Line Interface is here! The Splunk Cloud Admin Config Service (ACS) was originally released in January 2021 to provide various self-service features for Splunk Cloud Admins. It was released as an API-based service that can be used for configuring IP allow lists, configuring outbound ports, managing HEC tokens, and much more, all of which is detailed in the Splunk ACS Documentation.

To our excitement, Splunk has recently released a CLI version of ACS. The ACS CLI is much easier to use and less error-prone than the complex curl commands or Postman setup one has had to deal with to date. One big advantage we see with the ACS CLI is that it can be used in a scripted approach, or within a deployment CI/CD pipeline, to handle application management and index management.

We would recommend that you first refer to the ACS Compatibility Matrix to understand what features are available to the Classic and Victoria experience Splunk Cloud platforms.

ACS CLI Setup Requirements

Before you get started with the ACS CLI there are a few requirements to be aware of:

  • You must have the sc_admin role to be able to leverage the ACS CLI.
  • You must be running a Mac or Linux operating system. However, if you are a Windows user you can use the Windows Subsystem for Linux (WSL), or any Linux VM running on Windows, to install and use the ACS CLI.
  • The Splunk Cloud version you are interacting with must be above 8.2.2109 to use the ACS CLI. To use Application Management functions, your Splunk Cloud version must be 8.2.2112 or greater.

Please refer to the Splunk ACS CLI documentation for further information regarding the requirements and the setup process.

ACS CLI Logging

At the time of authoring this blog, logging and auditing of interactions through the Splunk Cloud ACS is not readily available to customers. However, the ACS CLI creates a local log on the system where it is used. It is recommended that any administrators given access to work with the ACS CLI have the log file listed below collected and forwarded to their Splunk Cloud stack. This log file can be collected using the Splunk Universal Forwarder, or another mechanism, to create an audit trail of activities (see the example monitor stanza after the list below).

  • Linux: $HOME/.acs/logs/acs.log
  • Mac: $HOME/Library/Logs/acs/acs.log
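As an illustrative sketch, a simple Splunk Universal Forwarder monitor stanza for the Linux path could look like the following (the index and sourcetype names are assumptions; use whatever fits your environment):

# inputs.conf on the admin workstation's Universal Forwarder (illustrative sketch)
# the index and sourcetype names below are hypothetical
[monitor:///home/<admin-user>/.acs/logs/acs.log]
index = acs_audit
sourcetype = acs:cli:log
disabled = false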

The acs.log allows an administrator to understand what operations were run, along with their request IDs, status codes and much more. We will keep an eye out for Splunk adding to the logging and auditing functionality, not just in the ACS CLI but in ACS as a whole, and will provide a future blog post on the topic when available.

Interacting With The ACS CLI

Below are examples of common interactions an administrator might have with Splunk Cloud, now done by leveraging the Splunk Cloud ACS CLI. There are many more self-service features supported by the ACS CLI; details of the supported features and CLI operations are available in the Splunk Cloud ACS CLI documentation.

Application Management

One of the most exciting features of the ACS CLI is the ability to control all aspects of application management. That means, using the ACS CLI you can install both private applications and Splunkbase applications.

The commands are straightforward and easy to understand. For both private and Splunkbase applications, the CLI supports commands to install, uninstall and describe applications within your environment, as well as a list command to return a complete list of all installed applications with their configurations. Specific to Splunkbase applications, there is an update command which allows you to, you guessed it, update the application to the latest version published and available.

For both private and Splunkbase apps, running a command will prompt you to enter your splunk.com credentials. You can pass the --username and --password parameters along with the command to avoid being prompted for credentials. For private apps these credentials are used to authenticate to AppInspect for application vetting.
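For instance, the install command covered in the next section could be run non-interactively by appending those parameters (placeholder values shown):

acs apps install private --acs-legal-ack Y --app-package <path-to-app.tgz> --username <splunk.com-username> --password <splunk.com-password>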

Application Management: Installing a Private App

Let’s look at how we use the ACS CLI to install a private application. The following command will install a private app named company_test_app:

acs apps install private --acs-legal-ack Y --app-package /tmp/company_test_app.tgz

Now when a private app is installed using the ACS CLI it will automatically be submitted to AppInspect for vetting. A successful execution of the command will result in the following response, which you will note includes the AppInspect summary:

Submitted app for inspection (requestId='*******-****-****-****-************')
Waiting for inspection to finish...
processing..
success
Vetting completed, summary:
{
    "error": 0,
    "failure": 0,
    "skipped": 0,
    "manual_check": 0,
    "not_applicable": 56,
    "warning": 1,
    "success": 161
}
Vetting successful
Installing the app...
{
    "appID": "company_test_app",
    "label": "Company Test App",
    "name": "company_test_app",
    "status": "installed",
    "version": "1.0.0"
}
Application Management: Installing a Splunkbase Application

Let’s now look at an example of installing a Splunkbase application by running a command to install the Config Quest application:

acs apps install splunkbase --splunkbase-id 3696 --acs-licensing-ack http://creativecommons.org/licenses/by/3.0/

The licensing URL passed as a parameter in the command above can be found in the application details on Splunkbase. Additionally, by running a curl command the licensing URL can be retrieved from the Splunkbase API:

curl -s --location --request GET 'https://splunkbase.splunk.com/api/v1/app/3696' --header 'Content-Type: text/plain' | jq .license_url

Finally, a successful execution of the command will result in the following response:

Installing the app...
{
    "appID": "config_quest",
    "label": "Config Quest",
    "name": "config_quest",
    "splunkbaseID": "3696",
    "status": "installed",
    "version": "3.0.2"
}
Index Management

Index management using the ACS CLI supports a wide range of functionality. The supported commands allow you to create, update, delete and describe an index within your environment as well as a list command to return a list of all of the existing indexes, with their configurations.

Let’s now look at how we run one of these commands by creating a metrics index with a 90-day searchable retention period. Note that ACS supports creating either event or metrics indexes; however, it does not yet support configuring DDAA or DDSS.

acs indexes create --name scratch_01 --data-type metric --searchable-days 90

Finally, a successful execution of the command will return the following JSON response:

{
    "name": "scratch_01",

    "datatype": "metric",
    "searchableDays": 90,
    "maxDataSizeMB": 0,
    "totalEventCount": "0",
    "totalRawSizeMB": "0"
}
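The remaining index operations mentioned earlier follow the same pattern. As a sketch, and assuming the describe operation accepts the same --name parameter used above, reviewing the configuration of the new index would look something like:

acs indexes describe --name scratch_01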
HEC Token Management

Managing HTTP Event Collector (HEC) tokens just got really easy. The ACS CLI supports commands to create, update, delete and describe a HEC token within your environment, as well as a list command to return a list of all of the existing HEC tokens with their configurations.

Let’s now look at how we run one of these commands by running a command to create a HEC token in Splunk Cloud quickly and easily:

acs hec-token create --name test_token --default-index main --default-source-type test

A successful execution of the command provides the token value in the JSON response:

{
    "http-event-collector": {
        "spec": {
            "allowedIndexes": null,
            "defaultHost": "************.splunkcloud.com",
            "defaultIndex": "main",
            "defaultSource": "",
            "defaultSourcetype": "test",
            "disabled": false,
            "name": "test_token",
            "useAck": false        },
        "token": "**********************"
    }
}
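With the token value in hand, a quick way to confirm it works is to send a test event to the HEC endpoint. The sketch below uses the standard /services/collector/event endpoint and Splunk authorization header; the exact HEC hostname and port for your Splunk Cloud stack vary, so substitute the values documented for your environment:

curl -k "https://<your-splunk-cloud-hec-host>:443/services/collector/event" \
  -H "Authorization: Splunk <token-value-from-the-response-above>" \
  -d '{"index": "main", "sourcetype": "test", "event": "hello from test_token"}'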

Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2022. Unauthorized use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Moving bits around: Automate Deployment Server Administration with GitHub Actions

Planning a sequel to the blog – Moving bits around: Deploying Splunk Apps with Github Actions – led me to an interesting experiment. What if we could manage and automate the deployment server the same way, without having to log on to the server at all? After all, the deployment server is just a bunch of app directories and a serverclass.conf file.

Read more

Moving bits around: Deploying Splunk Apps with Github Actions

It would be reasonable to argue that, no matter the size of the deployment, there aren’t many Splunk deployments out there that have not leveraged the Deployment Server to manage and distribute Splunk apps to other components. Just put everything in the $SPLUNK_HOME/etc/deployment-apps directory of the Deployment Server and create server classes connecting the relevant apps to the appropriate clients that are phoning home. Easy, right? But the big catch is this — what if we overwrite a working app with modifications that then have to be rolled back, or multiple Splunk admins edit the same configurations, or we accidentally delete one or more apps within the directory and don’t know which ones? Of course, restoring a full backup of that directory might solve all these problems, provided a full backup is regularly taken at a short enough interval, but this isn’t a great way of managing a dynamic environment where changes are constantly being pushed to the apps. It turns out that these are exactly the problems that a version control tool is designed to solve.

Now for most folks, when you hear about version control or source code control, Git is the first and perhaps the only word that comes to mind. And the second word will likely be GitHub, which is arguably the most popular source code hosting tool out there that’s based on Git. But is it enough to use Git and GitHub for version-controlling and hosting Splunk apps for deployment? In a functional sense yes, but not so much from an admin perspective. You must still manage deploying these apps to the Splunk Deployment Server. This is an example of “toil” according to Google’s SRE principles, and it can and should be eliminated by simply having a CI/CD setup. By the end of 2019, GitHub introduced its own CI/CD capability native to the GitHub platform, called GitHub Actions. GitHub Actions is a workflow orchestration and automation tool that can trigger actions based on events such as changes in the GitHub repository. In our case, GitHub Actions can help automate the task of deploying apps to the Deployment Server staging directory.

Automate Splunk App Deployment with GitHub Actions

So we have hosted our Splunk apps in a GitHub repository, properly source-controlled. Now let’s explore how we can automate deploying them to the Deployment Server using GitHub Actions.

Note: What this article covers is not a production-ready, prescriptive solution. The use of GitHub Actions here is solely because of its relatively simple, one-stop-shop approach to realizing the benefits of version-controlled hosting as well as continuous deployment of Splunk apps.

The setup consists of three parts – the source (GitHub Repository), the intermediary (runner) and the destination (Deployment Server). GitHub Actions invokes a runner instance as an intermediary to run the actions from. This instance is what will connect to the target server. This can either be a self-hosted runner that you must provision in your infrastructure or a GitHub-hosted runner.

Let me highlight a couple of important factors at play in choosing the runner instance type.

1. Security Considerations

Self-hosted runners and GitHub-hosted runners have some common as well as some unique security implications. While network connectivity requirements are unique to each approach, SSH authentication is common to both. You may not want to allow external connections directly to the Deployment Server, or your repository may be public. GitHub recommends that you only use self-hosted runners with private repositories. This is because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow. This is not an issue with GitHub-hosted runners because each GitHub-hosted runner is always a clean, isolated virtual machine that is destroyed at the end of the job execution.

2. Usage limits and Billing

Usage limits are primarily based on storage and free minutes. Self-hosted runners are free to use but come with some usage limits. For GitHub-hosted runners, different usage limits apply.

I have linked the documentation in the appendix for further reading on this topic.

For demonstration purposes, I am going to use a self-hosted runner.

Destination:

Let’s configure the destination first which is the Deployment Server.

At a high level, the steps involve:

  1. Creating an SSH key-pair
  2. Creating a user specific for the task in the Deployment Server
  3. Making the Deployment Server accessible using the above created SSH key-pair for the created user
  4. Setting proper permissions on the target staging directory

First off, we create an SSH key locally like so:

ssh-keygen -t ed25519 -C "your_email@example.com"

Enter the file name to save the keys and leave the passphrase field empty.

Then we log in to the Deployment Server and create a user there, say, ghuser.

Make the host accessible to the user over SSH by adding the public key created above to /home/ghuser/.ssh/authorized_keys. I have linked a page in the appendix that covers step-by-step instructions on how to do this on a Linux instance.
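As a rough sketch on a typical Linux host (commands and paths may differ for your distribution and security standards), the user creation and key installation could look like this:

# on the Deployment Server, as root or via sudo
useradd -m ghuser
mkdir -p /home/ghuser/.ssh
chmod 700 /home/ghuser/.ssh
# append the public key generated in the ssh-keygen step (file name will match what you chose)
cat id_ed25519.pub >> /home/ghuser/.ssh/authorized_keys
chmod 600 /home/ghuser/.ssh/authorized_keys
chown -R ghuser:ghuser /home/ghuser/.ssh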

Next, we need to give this user full access to $SPLUNK_HOME/etc/deployment-apps directory. For instance, if Splunk is installed under /opt, then:

setfacl -R -m u:ghuser:rwx /opt/splunk/etc/deployment-apps

Now, if Splunk is run as a non-root user, commonly named splunk, then that user can be leveraged for this purpose, in which case you do not need to grant any additional directory permissions as above.

Once this is completed, we now have a user that can SSH to the deployment server and modify the deployment-apps directory. We will be using this user in our GitHub Actions.

Intermediary:

Once the runner instance is provisioned, we need to install the client application on the host to poll the repository. Go to Settings -> Actions -> Runners in the GitHub repository.

When you click on the Add runner button as shown above and select the OS and CPU architecture, you are presented with the instructions to set up the client application. For the client application to successfully do HTTPS long polls to the GitHub repository, you must ensure that the host has the appropriate network access to communicate with specific GitHub URLs. The appendix has a link that points to those URLs.

Next, the self-hosted runner needs to be set up with Docker for the specific GitHub Action that we are going to use in the next step. This is also straightforward. Here I am using an Amazon Linux 2 EC2 instance, and these are the installation steps for it:

  1. Update your system
    $ sudo yum update -y
  2. Install Docker
    $ sudo yum install docker -y
  3. Start Docker
    $ sudo service docker start
  4. Add your user to the docker group
    $ sudo usermod -a -G docker USERNAME
  5. Log out and log back in.
  6. Verify Docker runs without sudo
    $ docker run hello-world

I have linked a document in the appendix that covers Docker installation on different Linux flavors.

Source:

GitHub Actions has a marketplace where we can look for off-the-shelf solutions, which in our case is to push the apps out from the repository to the deployment server. In this example, I have used two actions: 1) checkout, a standard GitHub-provided action to check out the repository, and 2) rsync-deployments, which essentially spins up a Docker container on the runner to rsync the specified directory from the checked-out repository to the destination directory on the target host.

First, we create a repository with a sub-directory that contains all the Splunk apps to be copied to the deployment server’s deployment-apps directory. In this example the repository I have used is test-deploy-ds, and all the Splunk apps reside within a subdirectory that I have named deployment-apps to match the target directory, but this can be any name you want. See below:
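For illustration, the repository layout looks roughly like this (the app directory names are simply examples matching the listing shown later; the workflow file under .github/workflows is created in the next step):

test-deploy-ds/
├── .github/
│   └── workflows/
│       └── push2ds.yml
└── deployment-apps/
    ├── TA-org_splunk/
    ├── org_all_indexes/
    ├── org_all_forwarder_outputs/
    └── org_APP_TEMPLATE/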

Then we create a simple workflow from the Actions tab of the repository like so:

Give the yml file that opens on the next screen a suitable name, such as push2ds.yml.

Modify the file as below.

# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the action will run. 
on:
  # Triggers the workflow on push or pull request events but only for the main branch
  push:
    branches: [ main ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: self-hosted

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
 
      # Runs the Rsync Deployment action
      - name: Rsync Deployments Action
        uses: Burnett01/rsync-deployments@4.1
        with:
          switches: -avzr --delete --omit-dir-times --no-perms --no-owner
          path: deployment-apps/
          remote_path: /opt/splunk/etc/deployment-apps
          remote_host: ${{ secrets.DEPLOY_HOST }}
          remote_user: ${{ secrets.DEPLOY_USER }}
          remote_key: ${{ secrets.DEPLOY_KEY }}

Explanation:

1) This workflow is triggered upon a push to main branch

2) The build specifies the job that will be run on a self-hosted runner

3) The steps in the build job include checking out the repository using the checkout action, followed by the rsync execution using the rsync-deployments action

Let’s dissect the rsync-deployments step, as this is the custom configuration I had to write for the use case:

  • the name attribute is a briefly descriptive name of what the Action does
  • the uses attribute then includes the marketplace action rsync-deployments to be referenced
  • the with attribute has several attributes inside as below
    • switches attribute has the parameters required to be passed with the rsync command. Check out the link in the appendix for what each of them does.
    • path represents the source directory name within the repository, which in this case has been named deployment-apps
    • remote_path is the deployment server $SPLUNK_HOME/etc/deployment-apps directory
    • remote_host is the deployment server public IP or hostname
    • remote_user is the username we created on the deployment server, which is ghuser
    • remote_key is the SSH private key created earlier, used to authenticate into the deployment server

Note the use of GitHub Secrets in the last few attributes. This is a simple yet secure way of storing and accessing sensitive data that would otherwise be susceptible to misuse by a threat actor. The image below shows where to set them.
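If you prefer the command line over the repository settings page, the same secrets can also be created with the GitHub CLI, assuming gh is installed and authenticated against the repository (the values shown are placeholders):

gh secret set DEPLOY_HOST --body "<deployment-server-hostname-or-ip>"
gh secret set DEPLOY_USER --body "ghuser"
gh secret set DEPLOY_KEY < <path-to-the-private-key-created-earlier>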

PS: remote_port is an accepted attribute that has been skipped here as it defaults to 22. You can specify a port number if the default port 22 is not used for SSH.

As soon as the above yml file is committed or a new app is committed, the workflow job kicks off. The job status can be verified as seen in the below images.

Go to Actions tab:

Click on the latest run Workflow at the top – here ‘trigger GHA only on push to main’ which is the commit message:

Click on the job – build. You can expand each step in the build job to see the detailed execution of that step. The build status page also highlights any failed step in red; expand that step to check the failure reasons.

Once it is verified that the job has successfully completed, we can log in to the deployment server and confirm that the Splunk apps are pushed to the $SPLUNK_HOME/etc/deployment-apps directory.

$ ls -lart /opt/splunk/etc/deployment-apps/
total 8
drwxr-xr-x  16 splunk splunk 4096 Jun 24 18:11 ..
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 TA-org_splunk
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_APP_TEMPLATE
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_indexer_base
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_forwarder_outputs
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_deploymentclient
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_app_props
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_search_volume_indexes
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_indexer_volume_indexes
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_full_license_server
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_dept_app_inputs
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_cluster_forwarder_outputs
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_search_base
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_indexes
drwxrwxr-x+ 16 splunk splunk 4096 Jun 29 05:00 .
drwxrwxr-x   3 ghuser ghuser   37 Jun 29 15:21 000_all_forwarder_outputs_route_onprem_and_cloud

A word of caution though: if we are pushing the apps using a user other than the splunk user that owns $SPLUNK_HOME, then such apps, when pushed to the deployment clients, will not preserve their ownership or permissions and will instead have a permission mode of 700. Let’s look at how one of these apps, org_APP_TEMPLATE, will appear on a target forwarder of a serverclass.

$ ls -lart /opt/splunkforwarder/etc/apps/ | grep org
drwx------  4 splunk splunk   35 Jun 29 18:37 org_APP_TEMPLATE

Now if you’re wondering – wait, do I need to provision an extra server? – be aware that there is also the option of using a GitHub-hosted runner. This needs an update to the push2ds.yml runs-on: attribute; for example, if you simply want a Linux-flavored host as the intermediary, just update the attribute like so – runs-on: ubuntu-latest. But keep in mind that this will require opening the SSH port of the deployment server to external IPs, and there are some cost implications.

Conclusion

In this article we touched upon the benefits of version control for Splunk apps managed and distributed via a Deployment Server. We then explored a simple, practical approach to this using GitHub Actions, along with the main considerations if we’re going down this path, and applied it to a working example. If you are not using GitHub in your organization, depending on your CI/CD pipeline, you could possibly re-engineer the solution to fit your purpose. If you found this useful, please watch this space for a sequel about how this opens up further possibilities in end-to-end Splunk app management in a distributed clustered deployment.


Appendix:

Communication between self-hosted runners and GitHub
About Github-hosted runners – IP Addresses allow-list
About billing for GitHub Actions
Self-hosted runners – Usage limits
Github-hosted runners – Usage limits
How to create a new user that can SSH into a Linux host
Install Docker on Linux
Github Action for Rsync – rsync deployments
Rsync Parameters
Customizing Github-hosted runners


Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2021. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Solving Roaming Users: HTTP Out for the Splunk Universal Forwarder

The release of version 8.1.0 of the Splunk Universal Forwarder introduced a brand new feature to support sending data over HTTP. Traditionally, a Splunk Universal Forwarder uses the proprietary Splunk-to-Splunk (S2S) protocol for communicating with the Indexers. Using the ‘HTTP Out Sender for Universal Forwarder’, it can now send data to a Splunk Indexer using HTTP. This feature effectively encapsulates the S2S message within an HTTP payload. Additionally, it enables the use of a 3rd party load balancer between Universal Forwarders and Splunk Receivers. To date, this is a practice which has not been recommended, or supported, for traditional S2S based data forwarding.

Where the new HTTP Out feature is especially useful is in scenarios such as collecting data from systems in an edge location or collecting data from a roaming user’s device. Typically in these situations it would require more complex network configuration, or network traffic exceptions, to support traditional S2S for the connection from the Universal Forwarder to the Indexers. HTTP Out now allows the Universal Forwarder to make use of a standard protocol and port (443), which is generally open and trusted, for outgoing traffic.

Use Case: The Roaming User

Let’s take a look at how we can use the HTTP Out feature of the Splunk Universal Forwarder to transmit data from the laptop of a roaming user, or generally a device outside of our corporate perimeter, which is an occurrence that has become more and more common with the shift to work from home during the pandemic.

For the purpose of this demonstration, we will be working with the following environment configuration:

  1. Splunk environment in AWS with 2 Indexers and 1 Search Head
  2. Internet-facing AWS Load Balancer
  3. Laptop with the Splunk Universal Forwarder (8.1.0)

Step 1: Configure The Receiver

On our Splunk Indexers we have already configured the HTTP Event Collector (HEC) and created a token for receiving data from the Universal Forwarder. Detailed steps for enabling HEC and creating a token can be found on the Splunk Documentation site here.

Step 2: Configure The Load Balancer

The next thing we need is a Load Balancer which is Internet facing. HTTP Out on the Splunk Universal Forwarder supports Network Load Balancers and Application Load Balancers.  For this use case, we have created an Application Load Balancer in AWS. The Load Balancer has a listener created for receiving connection requests on port 443 and forwards them to the Splunk Indexer on port 8088 (the default port used for HEC). The AWS Application Load Balancer provides a DNS A record which we will be using in the Universal Forwarder outputs configuration.

Step 3: Configure The Universal Forwarder

The last step is to install Splunk Universal Forwarder on the roaming user’s laptop and configure HTTP Out using the new httpout stanza in outputs.conf.

We have installed the Universal Forwarder on one of our laptops and created the following configuration within the outputs.conf file. For ease of deployment, the outputs.conf configuration file is packaged in a Splunk application and deployed to the laptop to enable data forwarding via HTTP.

[httpout]
httpEventCollectorToken = 65d65045-302c-4cfc-909a-ad70b7d4e593
uri = https://splunk-s2s-over-http-312409306.us-west-2.elb.amazonaws.com:443 

The URI address within this configuration is the Load Balancer DNS address which will handle the connection requests to Splunk HTTP Event Collector endpoints on the Indexers. 

The Splunk Universal Forwarder HTTP Out feature also supports batching to reduce the number of transactions used for sending out the data. Additionally, a new configuration, LB_CHUNK_BREAKER, is introduced in props.conf. Use this configuration on the Universal Forwarder to break events properly before sending the data out. When the HTTP Out feature is used with a 3rd party load balancer, LB_CHUNK_BREAKER prevents partial breaking of data and ensures complete events are sent to a Splunk Indexer. Please refer to the Splunk Documentation site here for detailed information on the available parameters.
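As a sketch, the setting goes in props.conf on the Universal Forwarder under the relevant sourcetype stanza. The stanza name and regular expression below are assumptions for illustration (the regex shown is simply a newline-based breaker); adjust both to match your data:

# props.conf on the Universal Forwarder (illustrative sketch)
[your_sourcetype]
LB_CHUNK_BREAKER = ([\r\n]+)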

Test and Verify Connectivity

Now that we have our configuration in place we need to restart the Splunk Universal Forwarder service. After this restart occurs we can immediately see that the internal logs are being received by our Splunk Indexers in AWS. This is a clear indicator that the HTTP Out connection is working as expected and data is flowing from the Universal Forwarder to the Load Balancer and through to our Splunk Indexers.

To demonstrate the roaming use case, we have written a small PowerShell script that will run on the laptop. The PowerShell script will generate events printing the current IP address, user, location, city, etc. The Splunk Universal Forwarder will execute this PowerShell script as a scripted input and read the events generated by it. Now, when we search within our Splunk environment we can see that the events being generated by the PowerShell script are flowing correctly, and continuously, to our Splunk Indexers. The laptop connects to the Load Balancer via a home network with no special requirements for network routing or rules.

Let’s now move to a different network by tethering the laptop through a mobile phone for Internet connectivity. This is something that may be common for people while on the road or in areas with minimal wifi access. What we will now observe is that data forwarding to our Splunk Indexers continues without any interruption even though we are on a completely new network with its own infrastructure, connectivity rules, etc. The screenshot below shows that the location and IP address of the laptop has changed however the flow of events from the laptop has not been interrupted.

This configuration could now be deployed to an entire fleet of roaming user devices to ensure that no matter where they are or what network they are on, there is continuous delivery of events using an Internet-facing Load Balancer. This will help IT and Security teams make sure they have the necessary information at all times to support, and protect, their corporate devices.


Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2021. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Interesting Splunk MLTK Features for Machine Learning (ML) Development

The Splunk Machine Learning Toolkit is packed with machine learning algorithms, new visualizations, a web assistant and much more. This blog sheds light on some features and commands in the Splunk Machine Learning Toolkit (MLTK) or Core Splunk Enterprise that are lesser known and will assist you in various steps of your model creation or development. With each new release of Splunk or the Splunk MLTK, a catalog of new commands becomes available. In this blog I attempt to highlight commands that have helped in some data science or analytical use cases.

Read more

Becoming a Dad: A Data Story using Splunk

This blog post is not technical in nature whatsoever, it’s simply a fun story about two passions; Becoming a first time father and Splunk. So sit back, grab your beverage of choice and let me tell you a little data story about how I Splunk’d myself Becoming a Dad.

I never really expected to share this story but earlier this summer I had the opportunity to speak in front (well, virtually of course) of my fellow SplunkTrust members at the SplunkTrust Summit so I thought why not share the story about how I decided to use Splunk to understand what it was like becoming a dad for the first time. Ok, but why am I only blogging about this now, especially since my son is now 2 years old? Well Splunk .conf20 has just come to a close and I was inspired by a lot of the fun use cases and stories being shared in the Data Playground and I thought it was time to sit down and write about my data story.

Now if you’d simply prefer to watch my presentation from the SplunkTrust Summit rather than read about it here, you can certainly do so by navigating to the video on our YouTube channel.

Before I begin, I must also give a shout out to my amazing wife, Rachel, who without her strength and positivity this data story simply would not be possible.

The “News”

In early January 2018 my wife shared with me the exciting news that we were going to be first time parents. Naturally in the days, weeks, heck even months to follow a lot of questions entered my mind; What is it like? How to prepare? What to expect? and so on. Well of course, everywhere you turn everyone has their own experience to share but it’s exactly that, their own experience. So I thought, when someone asks me about my experience in the future I want to have the best understanding possible. I want to be able to look back at it, better understand what it was like and have the ability to ask questions of the experience after the fact so, of course, why not Splunk becoming a dad? And that’s exactly what I did.

The Questions

To know where to begin I started out with asking myself, what questions would I want to answer about the process of becoming a dad? When talking to other parents about their experience no one ever shied away from sharing the ups and downs, “I slept only one hour a night” … “The experience was beautiful, no stress” … “It all happened real fast” and so on.

So I thought what if for myself I could answer some of these basic questions:

  • Am I even getting any sleep?
  • Is my sleep consistent or just completely broken?
  • Am I getting any exercise walking and carrying our new baby around?
  • What was my heart rate like leading up, during the birth and after?
  • Did I experience a high heart rate (stress) during certain milestones of the process?

The Data

To answer the questions, knowing what was ahead of me, I really wanted to focus on keeping the data collection simple and not have to worry about instrumenting much, or having something break and losing out on that data collection. To achieve that simplicity I bought myself a Fitbit Alta HR. The Fitbit was capable of tracking heart rate, sleep, calories burned and of course steps, all while having good battery life and a small, comfortable design.

I collected the Fitbit data by writing a simple python script to call the Fitbit API and collect all of the stats I was looking for.

Admittedly, I did have one other data set to work with after my son was born that I did not foresee at all and that was his diaper change log written out as a Splunk lookup. Yup, I went that far.

The New Daddy Dashboard

After my son was born and all of the data had been collected and indexed in Splunk (don’t worry, I waited a month to do this, it wasn’t immediately after he was born :)) it was time to begin to answer those questions that I started out with and build myself a dashboard.

Enter The New Daddy Dashboard, a visual representation of the questions I wanted to originally answer and more questions I realized I had after I had the data. The data spans the three weeks before my son was born up to six weeks after.

custom splunk dashboard

There is certainly a lot going on within this dashboard so let’s break it down.

First, I started off with averaging out my heart rate and sleep. This is broken out into three time ranges: the time leading up to my son being born (“Pre-Baby”), the 48 hours when he was born (“During Birth”) and the two weeks following (“Post Birth”).

splunk for health care data

The questions we are answering here were: am I getting any sleep, and how was my heart rate? Well, it went pretty much as I expected; during the 48 hour period around the birth my average heart rate shot up and my sleep was almost non-existent each day. No real surprises there; however, you will notice that my average minutes of sleep post-birth dropped quite a bit from where it was pre-baby. Needless to say, I did not really take the advice of “sleep when your baby sleeps” and it cost me.

Next, I was interested to see how I handled what I call “Go Time”, which you guessed it, labour is happening, baby is coming and ultimately the baby is here. What I created was a visualization charting my heart rate over this time frame; green representing a normal heart rate range, yellow a moderately high heart rate and red being a very high heart rate. Additionally on this chart I added annotations to mark each major “milestone” of the experience.

heart rate monitor "go time"

What I could see here is what I kind of expected of myself, my heart rate would quickly rise as the stress grew with each major milestone encountered; labour starting, driving to the birth center, the trip to the hospital, my son’s birth, my first ever diaper change and our first few hours after leaving the hospital.

If we zoom in on this timeline you can see the correlation of high heart rate to milestone much clearer and how after leaving our home for the birth center my heart rate rarely ever went down until after I changed my first ever diaper and thought ‘ok, I got this’.

time zoomed

Now that I knew the answer to the question “Did I experience stress during this process?” it was time to move on to figure out if I actually got any exercise. Again, an answer that probably is not going to be shocking at all.

Looking at these charts, what I could see was that exercise came and went. During the couple of days that we spent at the hospital after my son was born there was a good bit of walking, mainly from the desire to be out of the hospital room. After that, exercise went out the window; funny enough, the only consecutive spikes in the chart were from when I attended Splunk .conf18 in Orlando.

To round out The New Daddy Dashboard I decided to have a little bit of fun with that unexpected data set I mentioned earlier. I created a set of charts which I called the Baby Change Tracker, for you guessed it, diaper changes. Even writing this now it still makes me laugh that my wife and I tracked this.

splunk for activity tracking

I’m pretty sure the colors are a dead giveaway about the data so I won’t elaborate on that, but what I was trying to see here was twofold: were there any consistencies or patterns to the types of activity driving the diaper change, and could we start to predict and forecast the diaper changes?

Where I landed with this analysis was that, although we could start to see patterns in the data, I still had a very unpredictable baby, as I’m sure we all do. But in the end (no pun intended) it was still fun to visualize this data, and this blog post is something I’m sure my son will look back on one day and go ‘why dad, why?!‘.

What I Discovered Becoming a Dad

In the end what did I discover and what did Splunk show me about becoming a dad?

  1. Get exercise before hand; your heart will thank you.
  2. Expect the unexpected; there’s very little predictability.
  3. You’ll get little-to-no Sleep; but take whatever you can get!
  4. Get up and walk; not just at the hospital.
  5. It’s one heck of a ride, enjoy it!

Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2020. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Oops! You Indexed Sensitive Data in Splunk

Every organization deals with sensitive data like Personally Identifiable Information (PII), Customer Information, Proprietary Business Information, etc. It is important to protect access to sensitive data in Splunk to avoid any unnecessary exposure of it. Splunk provides ways to anonymize sensitive information prior to indexing through manual configuration and pattern matching. This way data is accessible to its users without risking the exposure of sensitive information. However, even in the best managed environments, and those that already leverage this Splunk functionality, you might at one point discover that some sensitive data has been indexed in Splunk unknowingly. For instance, a customer-facing application log file which is actively being monitored by Splunk may one day begin to contain sensitive information due to a new application feature or a change in the application logging.

This post provides you with two options for handling sensitive data which has already been indexed in Splunk.

Option 1: Selectively delete events with sensitive data

The first, and simplest, option is to find the events with sensitive information and delete them. This is a suitable choice when you do not mind deleting the entire event from Splunk. Data is marked as deleted using the Splunk ‘delete’ command. Once the events are marked as deleted, they are no longer searchable.

As a pre-requisite, ensure that the account used for running the delete command has the ‘delete_by_keyword’ capability. By default, this capability is not provided to any user. Splunk has a default role, ‘can_delete’, with this capability selected; you can add this role to a user or another role (based on your RBAC model) to enable the access.

Steps:

  1. Stop the data feed so that it no longer sends the events with sensitive information.
  2. Search for the events which need to be deleted.
  3. Confirm that the right events are showing up in the results. Pipe the results of the search to the delete command.
  4. Verify that the deleted events are no longer searchable.

Note: The delete command does not remove the data from disk and reclaim the disk space; instead, it hides the events from being searchable.

Option 2: Mask the sensitive data and retain the events

This option is suitable when you want to hide the sensitive information but do not want to delete the events. In this method we use the rex command to replace the sensitive information in the events and write them to a different index.

The summary of steps in this method are as follows:

  1. Stop the data feed so that it no longer sends the events with sensitive information.
  2. Search for the intended data.
  3. Use rex in sed mode to substitute the sensitive information.
  4. Create a new index or use an existing index for masked data.
  5. With the collect command, save the results to a different index with the same sourcetype name.
  6. Delete the original (unmasked) data using the steps listed in Option 1 above.

As mentioned in Option 1 above, ensure that the account has the ‘delete_by_keyword’ capability before proceeding with the final step of deleting the original data.

Let’s walk through this procedure using a fictitious situation. Take the example of an Apache access log monitored by Splunk. Due to a misconfiguration in the application logging, the events in the log file started registering customers’ credit card information as part of the URI.

Steps:

  1. Disable the data feed which sends sensitive information.
  2. Search for the events which contain the sensitive information. As you can see in the screenshot, the events have the customer’s credit card information printed.

3. Use the rex command with ‘sed’ mode to mask the CC value at search time.

index="main" sourcetype="apache_access" action=purchase CC=* 
| rex field=_raw mode=sed "s/CC=(\S+)/CC=xxxx-xxxx-xxxx-xxxx/g"

The highlighted regular expression matches the credit card number and replaces it with its new masked value of ‘xxxx’.

4. Verify that the sensitive information is replaced with the characters provided in rex command.

5. Pipe the results of the search to ‘collect’ command to send the masked data to a different index with same sourcetype.

index="main" sourcetype="apache_access" action=purchase CC=* 
| rex field=_raw mode=sed "s/CC=(\S+)/CC=xxxx-xxxx-xxxx-xxxx/g" 
| collect index=masked sourcetype=apache_access

6. Verify the masked data has been properly indexed using the collect command and is now searchable.

Note: Adjust the access control settings so that the users can access the masked data in the new/different index.

7. Once all events have been moved over to the new index, we need to delete the original data from the old index by running the delete command.

As mentioned earlier, ensure that you have the capability to run the ‘delete’ command.
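Using the same base search from the earlier steps, a sketch of that deletion is below; double-check the filters and time range first, because once events are marked as deleted they can no longer be searched:

index="main" sourcetype="apache_access" action=purchase CC=* 
| delete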

8. Verify that data has been deleted by searching for it, as noted in Step 2 above.

9. Remove the ‘delete_by_keyword’ capability from the user/role now that the task is completed.

What Next?

Enable Masking at Index Time

It is always ideal to configure application logging in such a way that it does not log any sensitive information. However, there are exceptions where you cannot control that behavior. Splunk provides two ways to anonymize/mask the data before indexing it. Details regarding the methods available can be found within the Splunk documentation accessible through the URL below:

https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata

Additionally, products such as Cribl LogStream (free up to 1TB/day ingest) provide a more complete, feature-rich, solution for masking data before indexing it in Splunk.

Audit Sensitive Data Access

Finally, if you have unintentionally indexed sensitive data before it was masked, then it is always good to know whether that data was accessed during the time it was indexed. To audit whether the data was accessed through Splunk, the following search can shed some light on just that. You can adjust the search to filter the results based on your needs by changing the filter_string text to the index, sourcetype, etc., associated with the sensitive data.

index=_audit action=search info=granted search=* NOT "search_id='scheduler" NOT "search='|history" NOT "user=splunk-system-user" NOT "search='typeahead" NOT "search='| metadata type=sourcetypes | search totalCount > 0" 
| search search="*filter_string*" 
| stats count by _time, user, search, savedsearch_name

Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2020. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Make Your Splunk Dashboards Smile! 😀

Recently a customer was reviewing asset information in Aura Asset Intelligence, our premium application for Splunk, and some interesting data showed up. Users had mobile devices with emojis in the name of their device.

splunk dashboard

It was a bit surprising at first as it’s not what you would normally expect in a corporate IT environment, but after thinking about it, it’s perfectly normal to see – especially with companies fully adopting BYOD programs these days.

If you weren’t already aware, Splunk can handle different character sets. You can work with non-ASCII characters in various different ways – including emojis! – from indexing data to searches, alerts, and dashboards. Once you get into the world of non-ASCII, you are dealing with Unicode. Unicode is a complex topic; there are many different concepts and terminology to keep straight. But that’s not really the point of this blog 😉 . For more information on Unicode you can start here.

It certainly gets you thinking 🤔 : where could emojis be used in Splunk to inject a bit of fun? Why not give your searches and Splunk dashboards a little ❤️ ?

To start, you can use them in searches:

index=main sourcetype=access_combined | eval alt_status = if(status==200,"👍","👎") | stats count by alt_status


You can use them in dashboards:

Response Time single-value panel:
index=main sourcetype=access_combined | stats avg(response) as avg_response | eval avg_response=round(avg_response,1) | eval avg_response = avg_response." ".if(avg_response < 30," 👍  "," 👎 ")

Errors single-value panel:
index=main sourcetype=access_combined | stats count(eval(status >= 500)) as errors count as total | eval error_rate=round((errors/total)*100,1) | eval alt_status = if(error_rate >= 3, "😕","😄")| fields alt_status

Status Codes table panel:
index=main sourcetype=access_combined | stats count by status | eval alt_status = case(status >= 500, "😠",status >=400, "😕", status >= 200, "😄", 1==1,"❓")


Or even using them in alerts (results will vary depending if the target of the alert can handle Unicode). Here’s an email example with the results embedded inline:


Maybe you can live on the wild side and even ask your developers to start using emojis in their logs….


Ok, that’s fun and all, but is there a practical use for emojis in Splunk? Sure! Why not give your dashboards some more visual eye candy when it comes to location data? You can easily create a lookup that maps country names to their emoji flags.
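As a sketch of what that lookup could contain (it is referenced as emoji_flags in the searches below, with name and emoji columns; the rows here are just examples):

name,emoji
Canada,🇨🇦
United Kingdom,🇬🇧
United States,🇺🇸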

Top Country single-value panel:
index=main sourcetype="access_combined" | top limit=1 clientip | iplocation clientip | eval Country = if(Country=="", "Unknown", Country) | lookup emoji_flags name as Country OUTPUT emoji | fillnull value="❓" emoji | eval top_country= Country." ".emoji | fields top_country

Requests By Country table panel:
index=main sourcetype="access_combined" | stats count by clientip | iplocation clientip | eval Country = if(Country=="", "Unknown", Country) | stats sum(count) as total by Country | lookup emoji_flags name as Country OUTPUT emoji | fillnull value="❓" emoji | sort - total

You can download the flag to emoji lookup CSV here to use in your own searches.

The possibilities are endless! So have some fun with emojis in your dashboards, and let’s just hope that at no point do your dashboards or data go to 💩 …


Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2020. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Quick Guide to Outlier Detection in Splunk

There are multiple (almost infinite) methods of outlier detection. In this blog I will highlight a few common and simple methods that do not require the Splunk MLTK (Machine Learning Toolkit), and discuss visuals (that do require the MLTK) that will complement the presentation of outliers in any scenario. This blog will cover the widely accepted method of using averages and standard deviation for outlier detection. The visual aspect of detecting outliers, using averages and standard deviation as a basis, will be elevated by comparing the timeline visual against the custom Outliers Chart and the custom Splunk Punchcard Visual.

Some Key Concepts

Understanding some key concepts is essential to any outlier detection framework. Before we jump into Splunk SPL (Search Processing Language), there are some basic ‘need-to-know’ math terminologies and definitions we need to highlight:

  • Outlier Detection Definition:  Outlier detection is a method of finding events or data that are different from the norm.
  • Average: Central value in set of data.
  • Standard Deviation: Measure of the spread of data. The higher the standard deviation, the larger the difference between data points. We will use the concept of standard deviation substantially in today’s blog. To view the manual method of standard deviation calculation click here.
  • Time Series: Data ingested in regular intervals of time. Data ingested in Splunk with a timestamp and with the correct ‘props.conf’ can be considered “Time Series” data

Additionally, we will leverage aggregate and statistic Splunk commands in this blog. The 4 important commands to remember are:

  • Bin: The ‘bin’ command puts numeric values (including time) into buckets. The ‘timechart’ and ‘chart’ commands use the bin command under the hood
  • Eventstats: Generates statistics (such as avg, max, etc.) and adds them in a new field. It is great for generating statistics on ALL events
  • Streamstats: Similar to ‘stats’, streamstats calculates statistics at the time the event is seen (as the name implies). This feature is undoubtedly useful to calculate a ‘Moving Average’ in addition to ordering events
  • Stats: Calculates Aggregate Statistics such as count, distinct count, sum, avg over all the data points in a particular field(s)

Data Requirements

The data used in this blog is Splunk’s open-sourced “Bots 2.0” dataset from 2017. To gain access to this data please click here. Downloading this data set is not essential; any sample time series data that we would like to measure for outliers is valid for the purposes of this blog. For instance, we could measure outliers in megabytes going out of a network OR the number of logins in an application using the same type of Splunk query. The logic used to determine the outliers is highly reusable.

Using SPL

There are four commonly seen methods applied in the industry for basic outlier detection. They are described in the sections below:

1. Using Static Values

The first commonly used method of determining an outlier is by constructing a flat threshold line. This is achieved by creating a static value and then using logic to determine if the value is above or below the threshold. The Splunk query to create this threshold is below :

<your spl base search> … | timechart span=6h sum(mb_out) as mb_out
| eval threshold=100 
| eval isOutlier=if('mb_out' > threshold, 1, 0)
Static threshold timeline visual

2. Average with Static Multiplier

In addition to using an arbitrary static value, another commonly used method of determining outliers is a multiplier of the average. We calculate this by first calculating the average of your data, followed by selecting a multiplier. This creates an upper boundary for your data. The Splunk query to create this threshold is below:

<your spl base search> …  
| timechart span=12h sum(mb_out) as mb_out 
| eventstats avg("mb_out") as average 
| eval threshold=average*2 
| eval isOutlier=if('mb_out' > threshold, 1, 0)
Average + Static threshold timeline visual

3. Average with Standard Deviation

Similar to the previous methods, now we use a multiplier of standard deviation to calculate outliers. This will result in a fixed upper and lower boundary for the duration of the timespan selected. The Splunk query to create this threshold is below:

<your spl base search> ... | timechart span=12h sum(mb_out) as mb_out 
 | eventstats avg("mb_out") as avg stdev("mb_out") as stdev 
 | eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2))
 | eval isOutlier=if('mb_out' < lowerBound OR 'mb_out' > upperBound, 1, 0) 
2*Standard Deviation timeline visual

Notice that with the addition of the lower and upper boundary lines the timeline chart becomes cluttered.

4. Moving Averages with Standard Deviation

In contrast to the previous methods, the fourth most common method is calculating a moving average. In short, we calculate the average of data points in groups and move in increments to calculate an average for the next group. Therefore, the resulting boundaries will be dynamic. The Splunk search to calculate this is below:

<your spl base search> ... | timechart span=12h sum(mb_out) as mb_out 
 | streamstats window=5 current=true avg("mb_out") as avg stdev("mb_out") as stdev
 | eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2)) 
 | eval isOutlier=if('mb_out' < lowerBound OR 'mb_out' > upperBound, 1, 0) 
Moving Average with Standard Deviation timeline chart

Tip: Notice the “isOutlier” line in the timeline chart; in order to make smaller values more visible, format the visual by changing the scale from linear to log.

Using the MLTK Outlier Visualization

Splunk’s Machine Learning Toolkit (MLTK) contains many custom visualizations that we can use to represent data in a meaningful way. Information on all MLTK visuals is detailed in Splunk Docs. We will look specifically at the ‘Outliers Chart’. At a minimum, the outliers chart requires three additional fields on top of your ‘_time’ and field value. First, you need to create a binary field ‘isOutlier’, which carries a value of 1 or 0 indicating whether the data point is an outlier. The second and third fields are ‘lowerBound’ and ‘upperBound’, indicating the lower and upper thresholds of your data. Because the Outliers Chart trims the display down to only the data point values and your thresholds, it is clearer and easier to understand. As a recommendation, it should be incorporated into your outlier detection analytics and visuals whenever available.
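
As a minimal sketch (reusing the standard-deviation search from method 3 above), the field set the Outliers Chart expects could be produced like this:

<your spl base search> ... | timechart span=12h sum(mb_out) as mb_out
| eventstats avg("mb_out") as avg stdev("mb_out") as stdev
| eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2))
| eval isOutlier=if('mb_out' < lowerBound OR 'mb_out' > upperBound, 1, 0)
| table _time mb_out lowerBound upperBound isOutlier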

Now take a look at the snippets below to see the impact of the Outliers Chart in comparison to the timeline chart. We re-created the same SPL as before but, instead of applying the timeline visual, applied the ‘Outliers Chart’, in the same order:

Static threshold with outliers chart
Average with static multiplier outliers chart
2*Standard Deviation outliers chart
Moving Average with Standard Deviation outliers chart
Advantages:
  • Cleaner presentation and less clutter
  • Easier to understand, as determining the boundaries becomes intuitive vs figuring out which line is the upper or lower threshold

Disadvantages:
  • You need to install Splunk MLTK (and its pre-requisites) to take advantage of the outliers chart
  • Unable to append additional fields in the Outliers chart

Adding Depth to your Outlier Detection

Determining the best technique for outlier detection can become a cumbersome task, so having the right tools and knowledge will free up time for a Splunk Engineer to focus on other activities. Creating static thresholds over the past 24 hours, 7 days, or 30 days may not be the best approach to finding outliers. A different way to measure outliers could be to look at the trend on every Monday for the past month, or at 12 noon every day for the past 30 days. We accomplish this by using two simple and useful eval functions:

| eval HourOfDay=strftime(_time, "%H") 
| eval DayOfWeek=strftime(_time, "%A") 

Using Eval Functions in SPL

Continuing from the previous section, we incorporate the two eval functions above into our SPL to calculate the average ‘mb_out’. However, this time the average is based on the day of the week and the hour of the day (a sketch of such a search follows the figure below). There are a handful of advantages to this method:

  • Extra depth of analysis by adding 2 additional fields you can split the data by
  • Intuitive method of understanding trends

Some use cases of using the eval functions are as follows:

  • Network activity analysis
  • User behaviour analysis
Tables representing averages by DayOfWeek & HourOfDay
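
As a sketch (using the same mb_out field as before), a table of averages by day of week and hour of day could be produced like this:

<your spl base search> ... | timechart span=1h sum(mb_out) as mb_out
| eval HourOfDay=strftime(_time, "%H"), DayOfWeek=strftime(_time, "%A")
| stats avg(mb_out) as avg_mb_out by DayOfWeek HourOfDay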

Visualizing the Data!

We will focus on two visualizations to complement our analysis when utilizing the eval functions. The first visual, discussed above, is the ‘Outliers Chart’, which is a custom visualization in Splunk MLTK. The second visual is another custom visualization, ‘PunchCard’, which can be downloaded from Splunkbase here (https://splunkbase.splunk.com/app/3129/).

The Outliers Chart has a feature which results in a ‘swim lane’ view of a selected field/dimension and your data points, while highlighting the points that are outliers. To take advantage of this feature, we will use a macro, “splitby”, which creates hidden field(s) “_<Field(s) you want data to split by>”. The rest of the SPL is shown below:

< your base SPL search >  ...  | eventstats avg("mb_out") as avg stdev("mb_out") as stdev  by "HourOfDay" 
| eval avg=round(avg,2) 
| eval stdev=round(stdev,2)
| eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2)) 
| eval isOutlier=if('mb_out' < lowerBound OR 'mb_out' > upperBound, 1, 0) 
| `splitby("HourOfDay")` 
| fields _time, "mb_out", lowerBound, upperBound, isOutlier, * 
| fields - _raw source kb* byt* 
| table _time "mb_out" lowerBound upperBound isOutlier *

This search results in an Outlier Chart that looks like this:

Outliers Chart split by hour of day

The Outliers Chart has the capability to split by multiple fields; however, in our example splitting by a single dimension, “HourOfDay”, is sufficient to show its usefulness.

The PunchCard visual is the second feature we will use to visualize outliers. It displays cyclical trends in the data by representing aggregated values of your data points over two dimensions or fields. In our example, I’ve calculated the sum of outliers over a month, with “DayOfWeek” as the first dimension and “HourOfDay” as the second dimension. I’ve summed the outliers across these two fields and displayed the result using the PunchCard visual. The SPL and image for this visual are shown below:

< your base SPL search > ... | streamstats window=10 current=true avg("mb_out") as avg stdev("mb_out") as stdev by "DayOfWeek" "HourOfDay"
| eval avg=round(avg,2)
| eval stdev=round(stdev,4)
| eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2))
| eval isOutlier=if('mb_out' < lowerBound OR 'mb_out' > upperBound, 1, 0)
| `splitby("DayOfWeek","HourOfDay")`
| stats sum(isOutlier) as mb_out by DayOfWeek HourOfDay
| table HourOfDay DayOfWeek mb_out
PunchCard Visual

Summary and Wrap Up

Trying to find outliers using machine learning techniques can be a daunting task. However, I hope that this blog gives an introduction on how you can accomplish outlier detection without using advanced algorithms. Using basic SPL and built-in statistical functions can result in visuals and analysis that are easier for stakeholders to understand and for the analyst to explain. To summarize what we have learnt so far:

  1. One solution does not fit all. There are multiple methods of visualizing your analysis, and exploring your results through different visual features should be encouraged
  2. Use eval functions to calculate “DayOfWeek” and “HourOfDay” wherever and whenever possible. Adding these two functions provides a simple yet powerful tool for the analyst to explore the data with additional depth
  3. Trim or minimize the noise in your outliers visual by using the Outliers Chart. The chart is beneficial in displaying only the boundaries and outliers in your data while removing all other unnecessary lines
  4. Use a “log” scale over a “linear” scale when displaying data with extremely large ranges


Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2020. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Harnessing Ingest-Time Eval Fields

Anyone who is familiar with writing search queries in Splunk would admit that eval is one of the most regularly used commands in their SPL toolkit. It’s up there in the league of stats, timechart, and table.

For the uninitiated, eval, just like in any other programming context, evaluates an expression and returns the result. In Splunk, especially when searching, it holds the same meaning. It is arguably the Swiss Army knife among SPL commands, as it lets you use an array of operations: mathematical, statistical, conditional, cryptographic, and text formatting, to name a few.
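
For instance, a few typical search-time eval expressions might look like this (a sketch only; the bytes, failed_logins and user fields are hypothetical stand-ins):

<your spl base search> ...
| eval mb=round(bytes/1024/1024, 2)
| eval risk=if(failed_logins > 5, "high", "low")
| eval user_lower=lower(user), raw_hash=md5(_raw)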

Read more about eval here and eval functions here.

What is an Ingest-time Eval?

Until Splunk v7.1, the eval command was limited to search-time operations. Since the release of 7.2, eval has also been made available at index time. What this means is that all the eval functions can now be used to create fields while the data is being indexed – otherwise known as indexed fields. Indexed fields have always been around in Splunk, but until now there wasn’t this breadth of capability for populating them.

Ingest-time eval doesn’t overlap with other common index-time configurations, such as data filtering and routing, but complements them. It lets you enrich the event with fields that can be derived by applying the eval functions on existing data/fields in the event.

One key thing to note is that it doesn’t let you apply any transformation to the raw event data, like masking.

When to use Ingest-time eval

Ingest-time eval can be used in many different ways, such as:

  • Adding data enrichment, such as a data center field based on a host naming convention
  • Normalizing fields, such as adding a field with an FQDN when the data only contains a hostname
  • Adding fields that can be used for filtering data before indexing
  • Performing common calculations, such as adding a GB field when there is only an MB field, or the length of a string field (see the sketch after this list)
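
As a minimal sketch of that last use case (the sourcetype name, field name and raw-event format here are hypothetical, and a fields.conf entry would also be needed, as described later in this post):

props.conf:
[my_sourcetype]
TRANSFORMS = add_gb_field

transforms.conf:
[add_gb_field]
INGEST_EVAL = gb=round(tonumber(replace(_raw, ".+mb=([\d\.]+).*", "\1"))/1024, 3)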

Ingest-time eval can also be used with metrics. Read more here.

When not to use Ingest-time eval

Ingest-time eval, like index-time field extractions, adds a performance overhead on the indexers or heavy forwarders (whichever is handling the parsing of data based on your architecture), as the evaluations run on every event of the sourcetypes you define them for. Since the new fields are going to be permanently added to the data as it is indexed, the increase in disk space utilization needs to be accounted for as well. There is also no reverting these new fields, as they are persisted in the index. To remove the data, the ingest-time eval configurations would need to be disabled or deleted and the affected data allowed to age out.

When using Ingest-time eval also consider the following:

  • Validate if the requirement is something that can be met by having an eval function at search time – usually this should be yes!
  • Always use a new field name that’s not part of the event data. There should be no conflict with the field name that Splunk automatically extracts with the `KV_MODE=auto` extraction.
  • Always ensure you are applying eval on _raw data unless you have some index time field extraction that’s configured ahead of it in the transforms.conf.

Always ensure that your indexers or heavy forwarders have adequate hardware provisioned to handle the extra load. If they are already performing at full throttle, adding an extra step of processing might be that final straw. Evaluate and upgrade your indexing tier specs first if needed.

Now, let’s see it in action!

Here is an Example…

Let’s assume for a brief moment you are working in Hollywood, with the tiny exception that you don’t get to have coffee with the stars but just work with their “PCI data”. Here’s a sample of the data we are working with: purchase details that some of my favorite stars made overseas (Disclaimer: the PCI data is fake, in case you get any ideas 😉):

2019-12-09 23:46:44,283 - name=Tom Hardy, amount=2620.08063223, currency=USD, dest_country=Tanzania, cc=8888192373782645, cvc=151
2019-12-09 23:46:45,284 - name=Ryan Reynolds, amount=4229.66241228, currency=USD, dest_country=Canada, cc=9999047123456789, cvc=101
2019-12-09 23:46:48,288 - name=Frances McDormund, amount=6033.83328530, currency=USD, dest_country=Budapest, cc=9999513562353615, cvc=856
2019-12-09 23:47:11,320 - name=Daniel Day-Lewis, amount=5603.00466255, currency=USD, dest_country=Iceland, cc=9999463984323578, cvc=029
2019-12-09 23:47:21,333 - name=Clint Eastwood, amount=8321.50139290, currency=USD, dest_country=Sri Lanka, cc=8888847290573791, cvc=347
2019-12-09 23:47:22,335 - name=Tom Hardy, amount=3773.86328145, currency=USD, dest_country=Tanzania, cc=8888192373782645, cvc=151
2019-12-09 23:47:23,336 - name=Jeff Goldblum, amount=9475.63602049, currency=USD, dest_country=Sri Lanka, cc=8888485176493782, cvc=730

Now we are going to create some ingest-time fields:

  1. Converting the name to all upper case (just for the sake of it)
  2. Rounding off the amount to two decimal places
  3. Deriving a bank field based on the first four digits of the card number
  4. Applying md5 hashing to the card number
  5. Applying a mask to the card number

First things first, let’s set up our props.conf for the data with all the recommended attributes defined. What really matters in our case here is the TRANSFORMS attribute.

[finlog]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
TRUNCATE=10000
TIME_FORMAT=%Y-%m-%d %H:%M:%S,%f
MAX_TIMESTAMP_LOOKAHEAD=25
TIME_PREFIX=^
# the order of values for TRANSFORMS matters
TRANSFORMS = fineval1, fldext1, fineval2

Now let’s define how the transforms.conf should look like. This essentially is the place where we define all our eval expressions. Each expression is comma separated.

[fineval1]
INGEST_EVAL= uname=upper(replace(_raw, ".+name=([\w\s'-]+),\samount.*","\1")), purchase_amount=round(tonumber(replace(_raw, ".+amount=([\d\.]+),\scurrency.*","\1")),2)
# notice how in each case we have to operate on _raw, as the name and amount fields are not index-time extracted.

[fldext1]
REGEX = .+cc=(\d{15,16})
FORMAT = cc::"$1"
WRITE_META = true

[fineval2]
# INGEST_EVAL= cc=md5(replace(_raw, ".+cc=(\d{15,16})","\1"))
# have commented above as we need not apply the eval to the _raw data. fldext1 here does index time field extraction so we can apply directly on the extracted field as below...
INGEST_EVAL= cc1=md5(cc), bank=case(substr(cc,1,4)=="9999","BNC",substr(cc,1,4)=="8888","XBS",1=1,"Others"), cc2=replace(cc, "(\d{4})\d{11,12}","\1xxxxxxxxxxxx")

All the above settings should be deployed to the indexer tier or heavy forwarders if that’s where the data is originating from.

A couple of things to note – you can define your ingest-time evals in separate stanzas if you choose to reference them separately in the props.conf. The configuration above is a use case for that. There, I defined an index-time field extraction to extract the value of the card number, then used a separate ingest-time eval stanza to process that extracted field. This is a good example of regex reusability (instead of applying it on _raw repeatedly) in case you need to perform more than one operation on a specific set of fields.

Now we need to do a little extra work that’s not common with a search-time transforms setting. We have to add all the new fields created above to fields.conf with the attribute INDEXED=true, denoting that these are index-time fields. This should be done on the Search Head tier.

[cc1]
INDEXED=true

[cc2]
INDEXED=true

[uname]
INDEXED=true

[purchase_amount]
INDEXED=true

[bank]
INDEXED=true

The result: at search time, the new indexed fields (uname, purchase_amount, bank, cc1, cc2) appear alongside the original events.
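
As a quick check (the index name below is only a placeholder), a simple search should show the new fields without any search-time extraction:

index=<your index> sourcetype=finlog
| table _time uname purchase_amount bank cc1 cc2

Because the fields are indexed, they can also be used with tstats, for example: | tstats count where index=<your index> sourcetype=finlog by bank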

One important note about implementing Ingest-time eval configurations is that they require manual edits to .conf files, as there is no Splunk Web option for it. If you are a Splunk Cloud customer, you will need to work with Splunk support to deploy them to the correct locations depending on your architecture.

OK, so that’s a quick overview of Ingest-time eval. Hope you now have a pretty fair understanding of how to use it.

Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2020. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.