How can I log from my Python application to Splunk if I use Celery as my task scheduler?

I have a Python script running on a server that should be executed once a day by the Celery scheduler. I want to send my logs directly from the script to Splunk, using the splunk_handler library. If I run the splunk_handler without Celery locally, it seems to work. But if I run it together with Celery, no logs seem to reach the splunk_handler. Console log:

[SplunkHandler DEBUG] Timer thread executed but no payload was available to send

How do I set up the loggers correctly so that all the logs go to the splunk_handler?

Apparently, Celery sets up its own loggers and overwrites Python's root logger. I tried several things, including connecting to Celery's setup_logging signal to prevent it from overwriting the loggers, and setting up my logger inside that signal handler.

import logging
import os

from celery import Celery
from celery.signals import setup_logging
from splunk_handler import SplunkHandler

This is how I set up the logger at the beginning of the file

logger = logging.getLogger(__name__)
splunk_handler = SplunkHandler(
    host=os.getenv('SPLUNK_HTTP_COLLECTOR_URL'),
    port=os.getenv('SPLUNK_HTTP_COLLECTOR_PORT'),
    token=os.getenv('SPLUNK_TOKEN'),
    index=os.getenv('SPLUNK_INDEX'),
    debug=True)

# setFormatter expects a Formatter instance, not a plain format string
splunk_handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
splunk_handler.setLevel(os.getenv('LOGGING_LEVEL', 'DEBUG'))
logger.addHandler(splunk_handler)

Celery initialisation (not sure if worker_hijack_root_logger needs to be set to False...)

app = Celery('name_of_the_application', broker=CELERY_BROKER_URL)
app.conf.timezone = 'Europe/Berlin'
app.conf.update({
    'worker_hijack_root_logger': False,
})

Here I connect to the setup_logging signal from celery

@setup_logging.connect()
def config_loggers(*args, **kwargs):
    pass
    # logger = logging.getLogger(__name__)
    # splunk_handler = SplunkHandler(
    #     host=os.getenv('SPLUNK_HTTP_COLLECTOR_URL'),
    #     port=os.getenv('SPLUNK_HTTP_COLLECTOR_PORT'),
    #     token=os.getenv('SPLUNK_TOKEN'),
    #     index=os.getenv('SPLUNK_INDEX'),
    #     debug=True)
    #
    # splunk_handler.setFormatter(logging.BASIC_FORMAT)
    # splunk_handler.setLevel(os.getenv('LOGGING_LEVEL', 'DEBUG'))
    # logger.addHandler(splunk_handler)

Log statement

logger.info("ARBITRARY LOG MESSAGE")

When debug is activated on the Splunk handler (set to True), the handler logs that there is no payload available, as already posted above. Does anybody have an idea what's wrong with my code?

Asked By: sidi7

Answer #1:

After hours of figuring out what could be wrong with my code, I now have a result that satisfies me. First, I created a file loggingsetup.py where I configured my Python loggers with dictConfig:

import logging
import os

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {  # format of the logging output
        'simple': {
            'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            'datefmt': '%y %b %d, %H:%M:%S',
        },
    },
    'filters': {
        'filterForSplunk': {  # custom logging filter so that records with "celery" in the logger name are not sent to Splunk
            '()': 'loggingsetup.RemoveCeleryLogs',  # class defined at the top of this file
            'logsToSkip': 'celery',  # word that is filtered for
        },
    },
    'handlers': {
        'splunk': {  # Splunk handler at WARNING level, so that not too many logs are sent to Splunk
            'level': 'WARNING',
            'class': 'splunk_logging_handler.SplunkLoggingHandler',
            'url': os.getenv('SPLUNK_HTTP_COLLECTOR_URL'),
            'splunk_key': os.getenv('SPLUNK_TOKEN'),
            'splunk_index': os.getenv('SPLUNK_INDEX'),
            'formatter': 'simple',
            'filters': ['filterForSplunk'],
        },
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'formatter': 'simple',
        },
    },
    'loggers': {
        '': {  # the root logger is used
            'handlers': ['console', 'splunk'],
            'level': 'DEBUG',
            'propagate': False,  # a boolean, not the string 'False'; does not pass records on to other loggers
        },
    },
}

For the logging filter, I had to create a class that inherits from logging.Filter. The class also lives in the file loggingsetup.py:

class RemoveCeleryLogs(logging.Filter):  # custom filter that keeps Celery's own logs out of Splunk
    def __init__(self, logsToSkip=None):
        super().__init__()
        self.logsToSkip = logsToSkip

    def filter(self, record):
        if self.logsToSkip is None:
            allow = True
        else:
            allow = self.logsToSkip not in record.name
        return allow
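
A quick standalone check of the filter's behaviour (an illustrative snippet, not part of the original answer; the record names and messages are made up):

import logging

from loggingsetup import RemoveCeleryLogs

f = RemoveCeleryLogs(logsToSkip='celery')

app_record = logging.LogRecord('myapp.tasks', logging.WARNING, __file__, 0, 'app log', None, None)
celery_record = logging.LogRecord('celery.worker', logging.WARNING, __file__, 0, 'worker log', None, None)

assert f.filter(app_record)        # no 'celery' in the name: allowed through
assert not f.filter(celery_record) # 'celery' in the name: filtered out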

After that, you can configure the loggers like this:

import logging.config

import loggingsetup

logging.config.dictConfig(loggingsetup.LOGGING)
logger = logging.getLogger('')
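
With the config applied, a quick sanity check of the routing could look like this (illustrative messages, not part of the original answer):

logger.debug("appears on the console only (below the Splunk handler's WARNING threshold)")
logger.warning("appears on the console and is sent to Splunk")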

And because Celery redirected its logs and logs were duplicated, I had to update app.conf:

app.conf.update({
    'worker_hijack_root_logger': False, # so celery does not set up its loggers
    'worker_redirect_stdouts': False, # so celery does not redirect its logs
})
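
Putting the pieces together, the Celery side can look roughly like this (a minimal sketch; the original answer does not show this wiring explicitly, and reading the broker URL from an environment variable is an assumption made here to keep the snippet self-contained):

import logging.config
import os

from celery import Celery
from celery.signals import setup_logging

import loggingsetup

app = Celery('name_of_the_application', broker=os.getenv('CELERY_BROKER_URL'))
app.conf.update({
    'worker_hijack_root_logger': False,  # so Celery does not set up its own loggers
    'worker_redirect_stdouts': False,    # so Celery does not redirect stdout/stderr
})

@setup_logging.connect()
def config_loggers(*args, **kwargs):
    # apply the dictConfig instead of Celery's own logging setup
    logging.config.dictConfig(loggingsetup.LOGGING)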

The next problem I was facing was that my chosen Splunk logging library mixed something up with the URL, so I had to create my own Splunk handler class that inherits from logging.Handler. The important lines are the following:

auth_header = {'Authorization': 'Splunk {0}'.format(self.splunk_key)}
json_message = {"index": str(self.splunk_index), "event": data}
r = requests.post(self.url, headers=auth_header, json=json_message)
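
For reference, a minimal sketch of such a handler built around those three lines might look like this (the class name, constructor signature, and request timeout are assumptions; the answer does not show the full class):

import logging

import requests

class SplunkHttpHandler(logging.Handler):  # hypothetical name
    """Posts each log record to the Splunk HTTP Event Collector."""

    def __init__(self, url, splunk_key, splunk_index):
        super().__init__()
        self.url = url                   # full HEC endpoint URL
        self.splunk_key = splunk_key     # HEC token
        self.splunk_index = splunk_index

    def emit(self, record):
        try:
            data = self.format(record)
            auth_header = {'Authorization': 'Splunk {0}'.format(self.splunk_key)}
            json_message = {"index": str(self.splunk_index), "event": data}
            requests.post(self.url, headers=auth_header, json=json_message, timeout=5)
        except Exception:
            self.handleError(record)  # standard logging convention: never raise from emit()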

I hope this answer can help someone who is facing similar problems with Python, Splunk, and Celery logging! :)

Answered By: sidi7