Sunday, 19 September 2010

Python Logging Functionality - Facts vs. Myths

tl; dr: Some inaccuracies and misunderstandings about how stdlib logging works have been expressed in the documentation and marketing presentations of a suggested alternative. On closer examination, certain statements which imply a lack of functionality or other shortcomings in stdlib logging have been shown to be inaccurate. You should feel confident that in using the stdlib logging package you are very unlikely to find it wanting, and that, if it seems too hard or not possible to achieve some result that you want to achieve, you should raise the issue on comp.lang.python or bugs.python.org and be assured of prompt responses and resolutions. Now, you can read on if you want more details :-).

Recently Armin Ronacher, developer of Werkzeug and Jinja among other things, released a library for logging which, in his opinion, is preferable to the logging package provided as part of Python since the 2.3 release. Now preferences are a very personal thing, and since Armin has made worthy software contributions to the Python community, there will no doubt be many people who trust his judgement and follow where he leads. In general, choice in software is good, and people should be free to make up their own minds about the software they want to use.   However, in order to make an informed decision, people need accurate information on which to base that decision. For example, I chose to use argparse over the stdlib's optparse even before argparse landed in the stdlib; in making that choice, I looked at Steven Bethard's rationale as described here and was fortunate to have the time to be able to examine and evaluate each of his points for myself.

In the case of choosing whether to use Python logging or something else, people may or may not have the time to investigate in any depth the relative merits of the alternatives on offer. Armin has seemingly made it easier for busy people by including, in his library's documentation, reasons why you should use his offering in preference to the stdlib's logging package, and what he perceives to be problems with the code in the stdlib. This is what Steven did for argparse, but in the case of logging Armin has made a number of statements which are inaccurate or misleading. No doubt he has not meant to deliberately mislead anyone, but some of the things he has said have led me to conclude that perhaps Armin doesn't really understand some of the aspects of how stdlib logging works, and how you can use it in Web applications, desktop applications, casual utility scripts, and of course libraries which support all of these application development constituencies. And if Armin - a clearly talented developer - is making some egregious mistakes in his thinking about how stdlib logging works and what you can or can't do with it, then it's possible that the lack of understanding which he demonstrates is more widespread; as maintainer of the stdlib logging package, I feel there's a need to do something to rectify this. Hence, this post.

Isn't that what the stdlib documentation is for? Of course, at least partly, and while the stdlib logging documentation undergoes continual additions and revisions (which hopefully can be viewed as continuous improvement), it is a difficult medium in which to address certain points. Of necessity, the stdlib documentation for most modules is presented largely in a reference style, with only limited examples which might serve as tutorials. And even for sections where much effort has been expended in presenting using a tutorial style, there is always room for expositions which do not fit easily in the stdlib documentation: for example, a "cookbook" style presentation which tries to show how to solve common and less common problems which arise in considering how best to use a package. And then, who reads documentation, anyway? ;-)

Of course no software is perfect, nor can any design or implementation hope to please all of its users all of the time. I have always viewed stdlib logging as capable of being improved, not because there was anything especially wrong with it to start with, but rather that through the philosophy of continuous improvement you have the best chance of providing something which remains relevant and useful to as many users as possible. That, it seems to me, is a core Python philosophy, enshrined in the procedures, standards and practices of the PEP process.

You might be thinking that this post is a bit of an over-reaction on my part. Perhaps, but as well as the concern I expressed about there being a lack of understanding out there about how logging works, I have another concern. This is based on a strong view that for certain types of functionality, too much choice is not a good thing. For example, imagine if there were several competing implementations of regular expressions in Python, or implementations of pickling functionality, in common use. Some library developers would use one particular version, others a different one. So an application developer using several such libraries would not only have to pull in additional dependencies but also spend some time and thought on how to get the different libraries to work together. And so it is with logging, in my view; it's an infrastructure function and a lot of libraries out there already use it. Of course if it fell short of meeting developer requirements, then it should not be immune from being supplanted by a better alternative (as with optparse and argparse). However, I believe that stdlib logging still remains fit for purpose, and am willing to stand up and say so when the criticisms made of it are without merit or overstated.

So, let's review some of the statements in Armin's documentation and presentations which merit closer examination. It's a long list, hence the tl;dr at the top of this post, but there are even more points that I could have made if not for the lack of time :-(

Statements What they imply ("Myths") How it really is ("Facts") Conclusions
In the opening paragraph in the "Why you should use Logbook" section
Despite the existence of stdlib logging, you should use Logbook. Logbook is better than stdlib logging now, and is ready to be used. As stated in the same paragraph, Logbook is in the alpha stage of development and should be treated as a developer preview, and only supports Python 2.6 and 2.7. Logbook is at present suitable for evaluation use only.
In the "Advantages over Logging" section
If properly configured, Logbook’s logging calls will be very cheap and provide a great performance improvement over the standard library’s logging module. While we are not there yet, there will be some performance improvements in the upcoming versions when we implement certain critical code paths in C. stdlib logging is slow. You can evaluate these test scripts yourself: time_logging.py and time_logbook.py. In the output of these scripts, you need to look at the "log_simple" line on each for a meaningful comparison. It's not a scientific benchmark, but on my machine, with these scripts, the Logbook code runs around 30% slower than the stdlib code. Logbook is at present a fair bit slower than stdlib logging, so it is somewhat premature to boast about "great performance improvements".
While it's possible for Logbook to improve performance by writing performance-critical code in C, it's also possible for stdlib code to improve using the same approach. (Note, however, that building C extensions of third-party libraries from source under Windows can be a pain).
Anyway, according to the numbers shown by these test scripts, logging calls take of the order of tens of microseconds - not a real concern in most scenarios.
If you are having performance problems which you think are due to logging overhead, the recommendation is to profile and see where the bottlenecks really are. If they are found to be in logging, please post your findings on comp.lang.python or bugs.python.org if you think that's appropriate.
Logbook also supports the ability to inject additional information for all logging calls happening in a specific thread. For example, this makes it possible for a web application to add request-specific information to each log record such as remote address, request URL, HTTP method and more. stdlib logging can't do this. I have already explained to Armin how to do this - see this post. stdlib logging CAN do this fairly easily.
The Logbook system is (besides the stack) stateless and unit testing it is very simple. If context managers are used, it is impossible to corrupt the stack, so each test can easily hook in custom log handlers. There's something inherently good about a shared stack, but having a shared dict is somehow worse. Unit testing with stdlib logging is not simple. Unit tests with stdlib logging can't hook in custom handlers. There's no convincing argument that the type of shared state (execution context) offered by Logbook is better than how stdlib works. You just have to take it on trust. Unit testing with stdlib logging is not hard, since you can add/remove handlers to/from loggers, close handlers in tearDown which you opened in setUp, and disable loggers you don't need. Unit tests with stdlib logging CAN hook in any handlers you want; in fact, Python's own regression tests use unittest to do testing of the logging package itself. From the evidence presented, there's no proof of any actual functional advantage of Logbook over logging. You just have to take it on trust.
In the "Design Principles" section
Logbook [...] supports paradigms we think are more suitable for modern applications than the traditional Java inspired logging system that can also be found in the Python standard library and many more programming languages. Having a design that was inspired by a Java library is somehow bad, just because of Java. Tha fact that it appears in many more programming languages damns it even further. Although the Java log4j system was an influence on stdlib logging, this is because it had some good abstract ideas, well worth using. Though we have a lot of very clever people in the Python community, not every clever person is a Pythonista; it's very parochial to think that good ideas can only come from the Python community. The ideas that emerged in log4j and other packages are the ideas of "what happened? where did it happen? how important is it? who wants to know?" and if you think about it, these ideas are hardly Java-specific. In fact, they are pretty central to the problem domain addressed by logging. So - "what happened?" is the details of the logging call, "where did it happen?" is the logger name, "how important is it?" is the level, and "who wants to know?" is the handler. Hardly that complicated, and pretty much a minimum requirement for any logging package that aspires to the name.
Also, anyone who bothers to look at log4j in detail will see that Python logging is not a mindless translation - it's very Pythonic. Beyond the basic abstractions of "What? Where? How Important? Who Needs To Know?", there's no real correspondence between the Java artefacts and the Python ones. Using David A. Wheeler's SLOCCount, log4j 1.2.15 = 168 source files, around 16K SLOC; Python 2.6 logging = 3 source files, < 1.5K SLOC. Almost every class in stdlib logging maps to a core concept of logging, other than the handler classes, which are geared towards specific types of audience for logging messages.
There's no indication that a Java influence has been detrimental to stdlib logging. All of the core concepts from stdlib logging are also present in Logbook: loggers ("where?"), levels ("how important?"), handlers ("who wants to know?") and the details passed to logging calls ("what happened?"). So if stdlib logging "suffers" from Java influences, so equally does Logbook.
It's also worth bearing in mind that Python logging works in Python 2.3 -> 2.7 and 3.0 -> 3.2, without the need to depend on specific features present only in the more recent versions of Python. And while a case can be made for certain programming styles like the with statement to have better support in stdlib logging, these are by no means essential to the basic business of logging; in any case support for new features can always be added as and when it's needed.
Logbook is unique in that it has the concept of logging channels but that it does not keep a global registry of them. In the standard library’s logging module a logger is attached to a tree of loggers that are stored in the logging module itself as global state. The shared state embodied in Logbook's process or thread context is somehow good, but the shared tree of loggers stored as "global" state in stdlib logging is somehow bad. One reason for the shared state in stdlib logging is that application developers often need explicit access to loggers used by libraries, in order to control the level of verbosity of logging from those libraries, in the context of the application being developed. This verbosity varies over the life of an application - for example, when a bug is discovered in an application in production, it's common to selectively turn the verbosity of different library loggers up and down, while diagnosing the problem. You can't do that when the logger used by a library is hidden completely from a user of that library.
This is a fundamental point which may not bite developers working in limited scenarios, but I've had reason to rely on this functionality time and again. For example: I sometimes need to turn the verbosity of SQLAlchemy's logging so that I can examine SQL statements being generated from the ORM. If I couldn't access SQLAlchemy's top-level logger, using the name "sqlalchemy" and the fact that because of the "global" registry, the SQLAlchemy code and my code are both referring to the same object when they say logging.getLogger('sqlalchemy')), I couldn't turn this verbosity up and down, as and when needed.
Strictly speaking, the tree of loggers in stdlib logging does not need to be global. As a look at logging's source code will show, the tree of loggers is held as an instance variable of an instance of a Manager class. A binding to this Manager instance is currently stored in the Logger class, and through this binding you can say that the tree of loggers is global. This state of affairs has been the situation since the first release of logging in the stdlib.
It would be well within the bounds of possibility to either expand the Manager class to be a context manager, or derive a context manager class from it. If this was done, then the tree of loggers need not be global: it would just be an attribute of the Manager instance, and there could be more than one Manager (or derived context manager) instance in a Python process. This improved context manager may well appear in the future, but at present there has not been a strong enough case made for the need for doing this. If such a case is made (for example, by someone generating a PEP or proto-PEP making the justification) then providing such a feature can be considerered, and it would not be a major undertaking (but not a trivial one, either).
There is nothing wrong with logging's design just because it uses a shared registry of loggers. In fact, the shared registry confers some positive, practical benefits in real-life use cases.
In logbook a logger is just an opaque object [...] the lifetime and availability of that object is controlled by the person creating that logger. The registry is necessary for the logging library to give the user the ability to configure these loggers. It's better for a library developer to keep their loggers completely encapsulated in the library, so application developers using that library have no access to it. The need to access the loggers in libraries comes up again and again in practice, particularly in production environments and long-running service processes where you can't just fire up a debugger, when a problem arises, to see what's causing it. With stdlib logging, features exist to turn the verbosity of library logging up and down, for different libraries at different times, without needing to stop the long-running processes. The stated advantage of Logbook over stdlib logging seems illusory. The logger registry actually solves practical problems in real-world environments where multiple libraries from multiple vendors are integrated into an application. It's not clear, from Logbook's documentation, whether you can tune the verbosity of logging in a library component which utilised Logbook for its logging functionality.
Logbook has a completely different concept of dispatching from loggers to the actual handlers which removes the requirement and usefulness of such a registry. The advantage of the logbook system is that it’s a cheap operation to create a logger and that a logger can easily be garbage collected to remove all traces of it. There's no utility in a shared registry between different components of an application. Creating a logger in stdlib logging is expensive. It's important to be able to remove all traces of a logger that you created. The preceding discussions try to show why a shared registry is useful. There's no example given in the Logbook documentation of how you might (as an application developer) tune the logging verbosity of a component you use. It's not clear that this is even possible in Logbook.
Creating loggers in the stdlib is not particularly expensive, and is a one-off operation; so amortized over many uses, the creation should actually be cheaper in the stdlib.
It's true that loggers can't be garbage collected to remove all traces of it, but not clear why that is really important, as the memory used by loggers is not significant in normal usage.
stdlib logging is designed the way it is for good reasons, not because its designer lacks imagination. The stated disadvantages of stdlib logging over Logbook are not proven, just stated as if they were.
Logbook moves the burden of delivering a log record from the log channel’s attached log to an independent entity that looks at the context of the execution to figure out where to deliver it. The loggers in stdlib logging are suffering from some kind of burden when delivering log records. In both stdlib logging and Logbook, records are delivered using handlers. Loggers become a way for an application developer to finely control the logging verbosity of their application, so that they are in control rather than library developers who, of necessity, cannot anticipate every circumstance about how their libraries are used and deployed. The context of execution is insufficient to determine how records generated in libraries should be delivered. In practice, you need a combination of the execution context and the wishes of the application developer regarding how library logging is to be configured.
In the "Context sensitive handler stack" section
Let’s take a GUI application [which] might fail or log messages. The typical default behaviour here would be to log into a logfile. Fair enough, that’s how these applications work. But what’s the point in logging if not even a single warning happened? The traditional solution with the logging library from Python is to set the level high (like ERROR or WARNING) and log into a file. When things break, you have a look at the file and hope it contains enough information. When you are in full control [...] with a stack based system like Logbook has, there is a lot more you can do. For example you could immediately after your application boots up instanciate a FingersCrossedHandler. This handler buffers all log records in memory and does not emit them at all. What’s the point? That handler activates when a certain threshold is reached. For example, when the first warning occurs you can write the buffered messages as well as the warning that just happened into a logfile and continue logging from that point. Because there is no point in logging when you will never look at that file anyways. This kind of functionality is only possible with a context sensitive handler stack, which only Logbook has. You can't do this with stdlib logging. You can do exactly this with stdlib logging, where from the earliest release there's been a MemoryHandler class which buffers up records in memory until a message with a specified threshold level is seen, when all the messages buffered so far are then forwarded to a target handler for processing. The supposed advantage of a context-sensitive handler stack in this scenario isn't any particular advantage at all.
But that alone is not the killer feature of a stack. In a GUI application there is the point where we are still initializing the windowing system. So a file is the best place to log messages. But once we have the GUI initialized, it would be very helpful to show error messages to a user in a console window or a dialog. So what we can do is to initialize at that point a new handler that logs into a dialog. When then a long running tasks in the GUI starts we can move that into a separate thread and intercept all the log calls for that thread into a separate window until the task succeeded. This is somehow a killer feature of a stack, which you can't do any other way. You can do this kind of processing perfectly easily with stdlib logging. There are examples of dialog- or window-based handlers (for Qt, say) in answers to questions on Stack Overflow. You can combine the provided stdlib logging handlers such as MemoryHandler and FileHandler together with your own window-specific handlers (dependent on which windowing system you are using) to achieve the sort of effect described. Another supposed advantage of a context sensitive handler stack, which isn't.
In Armin's Logbook presentation
In the "Why not logging?" slide
stdlib logging is not really suited for Web applications. Don't even bother trying to use stdlib logging with Web applications, it's not worth it. It's perfectly possible to use stdlib logging with Web application, as I showed Armin how to meet his requirements, in this post. This statement is just plain wrong and illustrates a lack of understanding of how to use the functionality of stdlib logging, rather than any lack of functionality in stdlib logging.
Logging's central registry of loggers makes unittesting a pain. It's too much trouble to use stdlib logging in your unittests. It doesn't need to be a pain. A future post on this blog will describe scenarios showing how to use logging in your unit tests.
It's probably possible to provide additional utility functions in stdlib logging to make some aspects of unit testing easier, but there are no real showstoppers; if Armin describes his problem scenario in as much detail as he described his Web application configuration problem, I will (hopefully) be able to suggest some ways of easing his pain.
Unless specific details are given as to why unit testing with stdlib logging is considered to be a pain (i.e. what you'd like to do but what you can't do) then this statement must be taken with a pinch of salt.
The same registry also causes issues when libraries want to start logging The registry is central to the issues caused by libraries doing logging There are documented things that library authors need to do when setting up logging for libraries. If library developers do not follow the recommendations then this might in fact cause problems, but that has nothing to do with the existence of a central registry (or at least, the connection has not been demonstrated). There's no evidence that the existence of a central registry is somehow responsible for how third-party libraries misuse logging.
You can't delete loggers once created. It's important that you be able to delete loggers once created. The reason why loggers are not deleted once created is that multiple threads (other than the one that wants to delete the logger) can have references to the logger. This is facilitated by having a central registry, but even in a system without a central registry, loggers could be passed between threads. In that scenario, you can't actually guarantee to delete the logger since there could be references to it in another thread.
It's not demonstrated that it's important to expunge all loggers completely from memory. The number of logger instances is related to the number of logical "areas" in an application - the idea of "where" in a component a particular event occurred. This is not so large that the amount of memory used by loggers becomes significant. And stdlib logging allows loggers to be disabled, which means that they cease to perform any logging (until re-enabled) and are as good as gone.
It's true that you can't delete loggers, but they can be easily disabled; they do not, under normal usage patterns, occupy large amounts of memory.
In the "Why does nobody like logging?" slide
Why does nobody like logging? Nobody likes logging. This is an opinion, not a fact. There are many people who have expressed their thanks to me for the existence of the logging package and feedback saying that it has been very useful to them.
Even Armin has told me that he liked logging "a lot", which is hard to reconcile with what he's now saying - since logging's basic design has been the same from the outset, and it seems to meet his functional needs even if it doesn't press his aesthetic hot buttons.
There's no basis in fact for this statement. It's marketing FUD.
The default configuration of stdlib logging for applications is bad. The statement speaks for itself. On the contrary, the default configuration for applications is consistent with the principle of least surprise and the Zen of Python's "explicit is better than implicit". In common with the Unix philosophy, software should not be excessively verbose except when it needs to be, or is asked to be. By default, stdlib logging will only output messages of severity WARNING or higher. (It's easy to change this default.)
In fact, in the default configuration of logbook 0.2, every DEBUG logging call in every library which uses Logbook in your application will print debug messages to stderr when your application runs, even if you specify that you want to see only messages of severity WARNING or greater. Perhaps you won't care if your application is a GUI application or a Web application, since you won't have a console, and perhaps logbook is expressly designed only for these types of application. For a console application, it's very likely you will get Too Much Information. Here is an example.
The default configuration of stdlib logging is quite sane; there's nothing wrong with it.
Useless default configuration for libraries The statement speaks for itself. Library developers are supposed to add a NullHandler to their top-level logger and to the level and propagation flag as needed, but to not add any other handlers to their logging configuration because that configuration is the prerogative of the application developer. However, it's not possible to prevent library authors from breaking these rules. By default, libraries aren't supposed to handle messages - only to log them - and the application developer is expected to configure any logging (though they don't have to; if they don't, no logging output should be generated). This is consistent with the principle of least surprise and Zen of Python point mentioned earlier.
Who sets up the logging config? Libraries sometimes call basicConfig. If library developers don't follow the guidelines set down for logging from libraries, that's somehow the fault of stdlib logging. There is documentation about how to configure logging in libraries, and while this may be lacking in some way, it's not immediately clear what that lack is. Perhaps more or better documentation is needed, but that won't eliminate completely the possibility that a library developer will misguidedly add handlers to the logging configuration, say by calling basicConfig().
logging.warn() and friends are not thread-safe. Stay away from stdlib logging if you want to use threads. If logging is correctly configured in an application, there should be no untoward behaviour by stdlib logging. In versions of Python earlier than 2.7.1 and 3.2, the code in basicConfig() omitted acquiring and releasing an internal lock, and you could demonstrate incorrect behaviour in stdlib logging by calling basicConfig() directly or indirectly from multiple threads; this has been fixed in 2.7.1 and 3.2, but nevertheless, it is wrong to do any logging from multiple threads before configuring handlers (if you want any logging output, that is) - because if no handlers have been configured before threads start running, any messages they log could be lost (because there are no handlers to deliver them).
So, good practice would mean that this problem shouldn't occur in practice, and even if it does, the worst consequence is that handlers are added multiple times, resulting in multiple messages output for the same event.
It's disingenuous to say that these functions are not thread-safe. Although the statement is technically correct, the unexpected behaviour only occurs if logging is not configured before multiple threads are started, and more than one of these threads calls basicConfig() either directly or via logging.warn() and friends.
Even then, the behaviour occurs only rarely, due to thread timing uncertainties.
In any case, the incorrect behaviour will not occur in Python versions >= 2.7.1 or >= 3.2, because the requisite lock acquisition/release code has now been added.

22 comments:

  1. It's true that the library docs are written in a reference style, but you could add a logging HOWTO in:
    http://docs.python.org/dev/howto/index.html

    As for the "thread-safety" problem, I think it would be good if logging had a "strict" behaviour which prevented implicit calls to basicConfig(), and instead raised an error. It's a common problem that library loggers don't get the right settings, because they get created before you call basicConfig().

    ReplyDelete
  2. You seem to agree with Armin on some issues, yet it appears you are trying to cover up some valid criticisms with an absurd amount of words. It would be better if you just got to the point.

    ReplyDelete
  3. Currently in one project, I instantiate a logging.Logger object, and manually add to it a handler. I allow application users to define the logging level, and so forth, but not via the configuration knobs exposed by "python-logging-the-application", but via my own application's knobs. This is most appropriate for this particular application.

    I'm happy to be able to do this, but it's harder than it needs to be. Your package just "does too much" here; it is a lot more an application framework than it is a small library that just has some useful stuff in it that developers can reach in and use, due to the omnipresence of user-defined-configuration.

    The idea that each logger has a "manager", and that each manager is configured by the user of an application indirectly via getLogger is just plain inappropriate for some environments. As the developer, on some projects, *I* want to be in charge of that integration point.

    The "well-you-cant-please-everyone" argument is often lost on the folks for whom the pattern really *is* inappropriate, so you can't really blame them much for seeking other options. As you've said before, you're constrained by backwards compatibility and you can't make many design changes. If I were you, I'd either ignore current criticisms related to configuration (because there's nothing you can do about it, you're constrained by bw compat) or truly try to address them (by not even trying to do configuration) in another logging-derived system.

    ReplyDelete
  4. @pitrou: You are right, Antoine, that a HOWTO could be added, but in the case of logging I think there are a fair number of "how to get started" sections in the docs already. For example,

    http://docs.python.org/library/logging.html#logging-tutorial

    If anyone feels this is not adequate, they are welcome to suggest specific areas where improvements are needed, and/or what those improvements are to be, and I'll certainly try to improve the docs in that area.

    I don't know if the stdlib docs are the best place for putting in detailed recipes for a lot of application scenarios in a "cookbook" style. Typically, for example, this is done outside the Python docs, e.g. Alex Martelli's Python Cookbook, Dave Beazley's Python reference etc. I'm not even sure whether experienced developers like Armin even read the HOWTOs.

    As for thread safety of basicConfig, the missing lock code has been added to basicConfig, and the docs have been updated to warn about the consequences of calling basicConfig from threads in particular ways. I don't know how you can be more strict than this, but if you have a more specific suggestion, I'm listening.

    The design, as in most designs, is a trade-off between various concerns. The reason why logging.warning() etc. call basicConfig() is to make using logging as easy as possible for simple uses, e.g. in utility scripts. There is no need for logging.warning() to be called from an application which actually cares about setting up a real, non-trivial logging configuration: it's equivalent to basicConfig() + getLogger().warning(). Apart from in these convenience functions for simple use, there are no implicit calls to basicConfig().

    If you had a "strict" flag *just* to decide whether logging.warning() etc. call basicConfig(), that seems a bit like overkill, as the documentation should be sufficient warning for most people. And it's a non-issue for 2.7.1 and 3.2 anyway.

    Re. library loggers getting the right settings, there's only three things they need to do, two of which are optional:

    1. Add a NullHandler to their top-level logger.
    2. If they want to force their users to add handlers explicitly to their loggers, set their top-level logger's propagate flag to False.
    3. If they want to have a specific verbosity on their loggers, which is different from the default (WARNING), they need to set that level on the top-level logger, or on individual loggers for which they want that specific, non-default verbosity.

    Of course if library authors don't follow guidelines then there's only so much you can do; While Armin keeps saying that he's found "tons" of misuses of logging in libraries, he has never actually named even one single library which does this.

    It doesn't matter that library loggers get created before basicConfig() or more elaborate configuration gets done in the application; if they follow the steps above, there should be no surprises.

    ReplyDelete
  5. @cbm: I did get to the point right at the top of the post: that's what the tl; dr paragraph is for. However, you can't rebut certain criticisms without getting into details. If you don't want the details, you're welcome to stick to the tl; dr.

    I didn't say that NONE of Armin's criticisms were worth looking at; it's not a knee-jerk on my part. However, the main criticisms (such as unsuitability for Web applications) are baseless.

    ReplyDelete
  6. @chrism: If you make a vague criticism such as "it's harder than it needs to be", it's hard to get a handle on it in terms of "making it easier" - you could be referring to a number of possible paint points for you. When Armin gave me a precise problem statement about configuring logging for Web applications, I gave him a very specific answer which solved his problem in a way he acknowledged was quite neat.

    I don't get the "logging does too much here" either: please give more details. Logging is certainly meant to support logging from applications as well as libraries, and aims to let the application developer control how logging is done for each of the libraries used. You don't necessarily need to press every button provided; are you saying that too much happens when some particular buttons are pressed? Which buttons, and what's too much?

    There's no omnipresence of user-defined-configuration if you mean end users of an application: the arbiter of logging configuration is the application developer, who decides when developing the application how much control to cede to end users, or application support teams, or system administrators.

    Regarding managers, I think you misread my post. Perhaps there's some ambiguity about terminology; by "user" I usually mean "user of logging" which is a library or application developer. I don't mean the "user" is the end user of an application, though it's not uncommon for the developer of a command-line script to allow logging level setting via a command-line parameter. That's at their discretion, though, and not mandated by stdlib logging in any way.

    Library developers who use logging get to decide the "what happened?", "where did it happen?", "how important is it?" parts, but not the "who wants to know" because that will be application-specific.

    Application developers also get the decide the three things above, plus the "who wants to know?" part when they configure handlers for the application.

    End users only get the control that the application developers allow them to have.

    I state "you can't please everyone" in terms of their aesthetic preferences. I can make some design changes but backward compatibility can be constraining; however, logging has had steady improvements over the time it has been available, so it's by no means moribund.

    If you have configuration difficulties with logging, please give me more specific information about them, as Armin did. I will try to provide advice which might help. You can post on comp.lang.python, or email me on my_name_with_underscores@yahoo.co.uk.

    To all of you, thanks for the feedback.

    ReplyDelete
  7. @vinay: I already have actually... http://plope.com/Members/chrism/logging_blues .. but really it doesn't matter. You explain there that "the process" guided the decisions I complained there about, and suggest that changes would be hard due to bw compat. So I'm not asking for any changes, just trying to explain why I don't mind people trying to create alternatives.

    ReplyDelete
  8. @chrism: I see. Actually, the post you link to is a whole set of aesthetic "Never, ever ..." and "Should not ...", and not a specific problem statement about something you can't do with logging, or about how you get some unexpected results when adhering to the documented way of using it.

    To that, I can certainly reiterate "you can't please everyone" because your points are aesthetic rather than to do with function.

    In my reply to your post, I stated that I thought your "Never, ever ..." stance is too dogmatic for my taste, and I stand by that. And while I mentioned backward compatibility as a possible constraint, it has yet to be seen whether it is really an issue for any specific problem of function you might be facing.

    If logging just offends your aesthetic sensibilities, I can't do much to help you. If you are facing some specific practical problem relating to logging, I might be able to help. Which is it?

    BTW I don't mind people trying to create alternatives either. But they should be promoted without misrepresenting the capabilities of stdlib logging or stating opinions as if they were fact. If the alternative truly is worthy, don't you agree that it should be possible to promote it with honesty and integrity, rather than resorting to cheap shots?

    ReplyDelete
  9. @vinay I find your tl;dr misleading with what is in the actual text of the article. And nowhere does it mention that you find some of Armin's criticisms (in particular) accurate. Again, you just seem to be covering up some of the point's Armin has made with long drawn out sentences that never actually answer the question. It's frankly, very exhausting trying to decipher the technical truths from your ego.

    ReplyDelete
  10. @cbm: I'm sorry you find it misleading - it's not meant to be. Perhaps you didn't understand that the purpose of the post is to highlight mistakes in the statements made in Logbook documentation and presentation materials. In order to do that properly, I'm afraid I've had to descend into details, the reading of which may well be exhausting for you. Anyway, it appears you have read the article now; feel free to not re-read it. If you don't understand the points in it I'm not sure what I can do to help you reach that understanding, as obviously I tried to be as clear as I could be when I first posted it.

    When you say "seem to be covering up", please be a bit more specific. If you find the whole subject matter confusing or difficult to grasp then I can see how you might reach that conclusion. Which points am I covering up exactly? How is the cover-up being done? Did I make some mistakes in my analysis?

    You will see that when discussing for example (a) performance and (b) suitability for Web applications, I have provided links to actual working scripts which demonstrate the truth of the assertions I made, which is more than you can say for the Logbook documentation and materials.

    Frankly, I don't understand your statement about ego. Obviously, as maintainer of the stdlib logging package, I'm defending it against misinformation about it, and perhaps that's why you bring ego into it.

    I don't know what your values are, but I believe in standing up for my work, and that's what I'm doing.

    ReplyDelete
  11. Logging was one stdlib package that I avoided for years because it seemed to be big and scary and complicated, compared to most other standard library modules (e.g. csv). It could very well be that the mere mention of "logging was inspired by the Java log4j library" invokes a knee-jerk reaction that reinforces the big/scary/complicated impression.

    Now that I've finally learned enough, I appreciate and like it. Seeing you proactively respond to logging-related questions and complaints in various places around the Internet made me appreciate it even more, and I would like to add my own thanks to the ones you've already received.

    A question, though: why put the burden on all the libraries to copy-and-paste the definition of NullHandler (assuming they want to support Pythons older than 2.7) and to add it to their logger? Even the short version isn't very aesthetically pleasing to me:

    import logging
    # sacrifice a dead chicken to the logging gods to prevent spurious "No handlers configured" warnings
    logging.getLogger(__name__).addHandler(logging.NullHandler())

    Why can't the root logger have a NullHandler installed by default, if the desired standard configuration is "nothing gets logged unless you ask for something"?

    ReplyDelete
  12. @vinay, I've been following your discussion with Armin on Reddit, so I understand the purpose of your post.

    In your tl;dr section you claim Armin's documentation is "inaccurate" and that using stdlib logging one would be "very unlikely to find wanting". This is quite absurd considering the 4800 words you follow it up with.

    Example 1:
    In your conclusion to Armin's presentation that "logging.warn() and friends are not thread-safe" you suggest it is disingenuous to say these are not thread-safe, yet it's technically correct to state that they are? You can't have it both ways. You even mention that this issue is fixed in recent releases. Yet the majority of Python users use 2.4-2.6, so we still struggle with this.

    Example 2:
    You mention that Logbook's documentation states that in stdlib logging you can't delete loggers once they are created. Your conclusion is that yes this is true, but you can disable them. Why even bring up this point? It's a feature that is lacking in stdlib logging, it makes you look silly.

    Lets see if blogspot eats this post...

    ReplyDelete
  13. @vinay, I have struggled with logging in the past, and I really do not prefer it's global state. I find it is easy to shoot yourself in the foot mixing logging in different libraries and frankly, I try to avoid using it, since it's not what I would call intuitive. Perhaps it is to others, and that's fine, but I for one feel logging should be dead simple to use.

    I think Armin's statement that people don't like logging is accurate, this is the reason you don't see as many people using it. I would prefer it used everywhere, and I am sure you would also. So lets drop the ego, don't take someone else writing competing code as an attack, but instead look at it as perhaps a way to learn something about what it is you are doing. Even if there are no faults, there should always be something you can learn.

    Finally I'd like to mention that I am a fan of Armin's work. He releases high quality code and libraries and the time in which he was able to produce a viable alternative to logging is commendable. I wish you two would see this as an opportunity to improve Python rather than the verbal war it's turned into.

    ReplyDelete
  14. @Marius: Perhaps I should never have mentioned the Java connection, and maybe no-one would have noticed :-) log4j and Python logging are very different in their implementations, after all. However, I was only giving credit where it was due, and it would have been a point of familiarity for anyone fleeing the choppy waters of Java for the decidedly more pleasant haven of Python.

    I haven't added a NullHandler to the root logger because with an older release of Python, you couldn't rely on NullHandler being in the stdlib, and so there could be no reliance by a library of an instance of it attached to the root logger.

    It would also mean making a change to basicConfig(), because that bails when it sees any handler attached to the root logger - so the previously working code

    import logging; logging.warning('Watch out')

    would stop working unless basicConfig() made a more specialised check specifically for a NullHandler.

    As for that line ... it may not be pretty, but at least it's just one line.

    ReplyDelete
  15. @cbm: The 4800 (or whatever) words are proportional to the discussion of the points which Armin made which I felt were inaccurate or misleading, so I don't see why you think it's absurd that I should go into detail about his criticisms which I thought were inaccurate.

    To take Example 1 - about thread-safety, I explained in the post that the problem occurs, rarely, only if you do something that you are not supposed to do: call basicConfig() from multiple concurrent threads without having configured logging first. The reason I put the fix into 2.7.1 and 3.2 is that should anyone *incorrectly* call basicConfig() from multiple threads, they will not get a surprise. But even if you don't understand this point, Armin does, and that's why I said it was disingenuous. In that other Reddit thread that I presume you were following, in this comment:

    http://www.reddit.com/r/Python/comments/ddkal/django_vs_web2py_what_do_you_use_and_why/c0znsmb

    Armin acknowledged that he "Might not have been 100% honest, because it's only happening in rare circumstances and can be countered, true nonetheless" and "I will not bring up that threading issue as an argument against logging".

    Notice that I said he was being disingenuous, not that he was being dishonest. There is a distinction, and in his comment he has acknowledged it. If you don't understand the distinction between "the letter" and "the spirit" of the law, then we probably don't need to discuss this point further as we are unlikely to arrive at a common understanding.

    Furthermore, I don't understand how you will "struggle" with the thread-safety issue in Python 2.4 - 2.6, unless you insist on calling basicConfig() from multiple threads without having configured logging before starting the threads, and even though you now know, having followed the discussion, that you shouldn't be doing it that way.

    Taking Example 2 - I acknowledged that you can't delete loggers once created, but that it's not so important a thing that you have to stop using stdlib logging because of it. I explained that in threaded environments where there are multiple references to a logger (or any other object) then you can't really be sure that you are deleting it, since CPython uses reference counting to determine object lifetimes. So there is no guaranteed way in general of removing a logger, but you will definitely not want it to log any more (since you would like to delete it) so the next best thing is offered: disable the logger so that it stops logging, and it's as good as gone. I made the further point that you don't need to fret that the loggers haven't been expunged from memory, because they are unlikely to take up so much memory that you would notice, unless you are doing something wrong.

    I hope that explains how I can say that Armin's point here was true but that it was not important in real-world scenarios. I'm not sure what's silly about that, but perhaps some nuance of my point eluded you, or I didn't explain it clearly enough for you. I hope that this response comment has done the trick now.

    ReplyDelete
  16. @cbm: I'm sorry that you've struggled with logging in the past, but I'm not convinced that it's because of global state. That's because I don't yet see how that prevented you from doing some practical, specific thing relating to logging - but if you can explain the scenario in some more detail, maybe I'll get it and be able to give you a better response. When you found you shot yourself in the foot mixing logging in different libraries, what sources of help did you seek? People have that feeling all the time on comp.lang.python, and there's a friendly crowd there who try to help solve the sort of problems people come across (not just to do with logging, of course). I regularly monitor comp.lang.python for posts which contain the text "logging", using Google Groups.

    As I've no idea what you expect in terms of simplicity/ease of use, you'd have to tell me. If you're really saying that the tutotial examples at

    http://docs.python.org/library/logging.html#simple-examples

    are not simple enough, then I'm not sure I know what you mean by "dead simple". Please compare and contrast the simplest logbook script and the simplest logging script side by side so I can understand why you think logging is hard to use. Or, explain with an intermediate scenario in more detail so I can see what you're driving at. When Armin gave me a well-defined problem statement, I was able to give him an appropriate solution, and I like to think I could do the same for you.

    If the statement had been that "some people don't like logging", I would have no quarrel with that. But his statement implies that "nobody likes logging", and even if he didn't mean it literally, he meant something close to it. If that is among a group of like-minded friends or colleagues, I can see how that would work - but I can't see it extrapolating to the Python community in general. As I mentioned, I have had positive feedback in the past (some evidence for which you can see on the comments here and on Reddit) and so I have to disagree with your, and Armin's, view. And as to not seeing AS many people using it, I suppose that's fair in that if it pleased more people aesthetically, it would have more people using it. Unfortunately, as Marius pointed out, many people have knee-jerk reactions to things and so don't give things a fair shake. That's a shame, but I can't do much if people's objections are based not on rationality and practicality but on aesthetics.

    Perhaps you missed the bit in my original post when I said: "I have always viewed stdlib logging as capable of being improved [...]" I am not sure that statement demonstrates any particular amount of ego; certainly not as much as Armin referring to his own work as "awesome", see logbook's setup.py ;-) Of course he would say it was done in jest, and perhaps it was.

    Perhaps you've missed something - it wasn't the writing of competitive code I was objecting to, rather it was the use of inaccurate or misleading statements about stdlib logging. That is why this post has "Facts" vs. "Myths" in the title.

    I understand that you are a fan of Armin's work - I am too, and I am particularly grateful for his contributions of Werkzeug and Jinja. However, I am not a fanboi (and not saying that you are, either). While I appreciate that he is very clever, I do not automatically take whatever he says on trust just because he is clever - after all, clever people can be wrong sometimes.

    Because of his cleverness and his excellent contributions, he is - rightly - highly regarded in the Python community. In my view, that also places on his shoulders a certain duty to use the influence that he enjoys in a responsible way. A lot of people will take what he says on trust, because of his status - so he needs to say things about other people's work after exercising due care and consideration. I don't believe he has done so in this case.

    ReplyDelete
  17. @Vinay

    I love your defense, especially when pointing out inaccuracies. When I saw logbook marketing, I found it compelling, so am glad that you pointed out the noise.

    As a sidenote, I noticed that the initial author of log4j has developed logback, considering it an improvement. What are your thoughts on that?

    ReplyDelete
  18. @tshepang: One problem with the original log4j design was that it was not (Java) interface based, which meant that new implementations couldn't be swapped in for old ones. This caused no end of problems with integrating multiple logging implementations - log4j, java.util.logging, commons-logging. So a new approach was tried with slf4j, which uses Java interfaces so that user code never has to worry about which implementation is in use. A number of jar files are now provided which implement the slf4j interface over multiple back-ends, one of which is logback. You just pick one of the slf4j implementations + the slf4j jar containing the interfaces, and you're set.

    Saying that, I haven't really used logback on any real projects, nor done an in-depth evaluation, because most of my customers are already committed to earlier implementations. (I also do less in Java than I used to, and more in Python - yay!) Logback is supposed to be faster and adds things which are already in Python logging, such as Filters. The log4j implementation was very class heavy (e.g. instead of using integer levels, it used instances of a Level class) so it wouldn't be surprising if the logback developers managed to improve on log4j's performance, as they claim to have done.

    That's my relatively uninformed 2 cents :-)

    ReplyDelete
  19. Thank you for taking the time to write all this. I comes off as a reasonable and comprehensive rebuttal of insinuations that logging is unusable. I am now positively inclined to use logging in future projects (I have previously used ad-hoc logging)

    ReplyDelete
  20. @Vinay

    Thanks a lot for the log4j/logback info, which was by no means "relatively uninformed" :-)

    ReplyDelete
  21. Vinay, I have a strange issue: one script running in crontab
    "steals" a logger instance from another script running at the same time.
    Logger names are unique to every script run.
    How can this be explaned?
    more detailes are here http://stackoverflow.com/questions/3744751/python-script-in-crontab-gets-input-arguments-of-another-process-running-at-the-s
    thank you

    ReplyDelete
  22. @yb: See my answer to this on Stack Overflow.

    ReplyDelete