Apply machine learning to your life
by Ole Breulmann, 01/21/2025
This second issue of my newsletter / blog is a bit nerdy. It covers a thinking framework that I use every day, across many aspects of life. Everybody applies this framework subconsciously and automatically, but I found that when it is applied more consciously, it is much easier to master.
This framework is not only at the core of human thinking. It is also the foundation of machine learning. I find that applying terminology and abstraction from machine learning to this human framework is really helpful in becoming more aware of it and in applying it more effectively.
You can apply this to building companies, investing in companies, cooking, baking, building products, skiing, skateboarding, interior design, writing songs and planning holidays. It does not matter. All we do in life is search spaces of opportunity, trying to maximize some kind of reward function.
The reward functions are defined by us, by clients, by society, etc., and they implicitly define the goal we try to achieve:
The food should taste good
The investment should return a 1000% yield in 5 years
The skateboard ollie should be at least 40cm high
The living room should be less cluttered and more comfy
The song should make me happy
Spaces and rewards
When talking about spaces of opportunity in life and connecting them to machine learning abstractions, you can think of a machine learning model as an entity that tries to adapt its action (the calculation) by changing parameters (the weights) so that it maximizes a reward (technically it minimizes the error in the form of a cost function, but that is the same as maximizing a reward). This is exactly what we humans do - all the time.
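To see that sign flip in code, here is a minimal Python sketch with a made-up reward function (nothing from the examples below): minimizing the cost and maximizing the reward are the same search.

```python
# Minimal sketch: the reward function here is invented for illustration.
def reward(params):
    # "How good is this attempt?" - higher is better, best at params = (1, 1)
    return -sum((p - 1) ** 2 for p in params)

def cost(params):
    # What ML training typically minimizes: the reward with the sign flipped
    return -reward(params)

print(reward([0, 2]), cost([0, 2]))  # -2 2: low reward means high cost
print(reward([1, 1]), cost([1, 1]))  # 0 0: the best parameters maximize reward and minimize cost
```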
To make this more tangible, let’s look at two real-life optimization problems: building an AI chat that users like, and learning to ollie high on a skateboard. Two very different tasks, it seems. By using machine learning abstractions we can show how similar these two tasks really are, identify how they differ, and see which aspects of the optimization process need extra attention to tackle task-specific challenges.
The darker the blue of the dots, the higher the reward function value. The optimum has a cyan border.
For both problems we are looking at two parameters for our action. For the action of “AI chat response” we use (1) the length of the chat response in characters as parameter dimension x1, and (2) the relative amount of domain terms used in the chat response as parameter dimension x2. For the action of “trying to do an ollie” we use (1) the backfoot pressure on the tail as parameter dimension x1, and (2) the forefoot drag on the nose as parameter dimension x2. Both actions have a lot more parameters, but for the sake of simplicity in this context, we only pick two.
In the above diagram the two parameter spaces are visualized. The blue dots represent parameter combinations. The reward value is represented by the color intensity (strong blue = high reward, weak blue = low reward). In both cases you want to find the optimum parameter combination with the highest reward (dark blue dot with cyan border).
This video shows the two ollie parameters in action.
Brute force sampling
Theoretically, you can sample the parameter space densely by trying hundreds of different combinations of x1 and x2. This would give you a really good understanding of the reward function distribution in that space and you could pick the parameter combination with the highest reward. With a high sampling density, this would probably be very close to the global optimum—the very best that is possible.
Brute force sampling is a nice idea but not feasible
Unfortunately, in most cases this kind of sampling is not feasible, for many different reasons. In the AI chat case we have to run user experiments in order to measure how much users like the responses. The more parameter combinations we want to try, the more user experiments we have to run. User experiments are expensive and slow.
In the ollie case we have to perform an attempt at the ollie to sample the height. This is very time-consuming. Another major reason why in this case parameter space sampling is not feasible is that we need to be able to recreate the parameter combination. If you randomly try different things with your body without analyzing what is going on, you will have a very hard time recreating the optimal parameter combination after two hours of jumping around.
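To make the cost of brute force concrete, here is a small Python sketch under made-up assumptions: a toy stand-in for the user reward and a coarse 20 x 20 grid over the two chat parameters. Even this coarse grid already means 400 “user experiments”.

```python
import numpy as np

# Toy stand-in for a user experiment; in reality every call is a slow, expensive study.
def measure_user_reward(length, domain_share):
    return -((length - 300) / 300) ** 2 - (domain_share - 0.2) ** 2

lengths = np.linspace(50, 1000, 20)        # x1: response length in characters
domain_shares = np.linspace(0.0, 1.0, 20)  # x2: share of domain terms

# Brute force: try every combination and keep the best one.
best = max(
    ((l, d) for l in lengths for d in domain_shares),
    key=lambda p: measure_user_reward(*p),
)
print(best)  # 20 x 20 = 400 experiments just to cover two parameters coarsely
```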
Gradients
So what humans and machine learning algorithms do is this: start at some position in parameter space, get the reward (user feedback; ollie height) and then analyze in which direction you should change your parameters. Should the AI chat responses be shorter? Should they contain fewer domain terms? Should the backfoot pressure be higher? Should the forefoot drag be weaker? You can try to answer these questions by looking at the data (interview protocols and chat interaction data; iPhone videos of you doing an ollie). This way you approximate the so-called gradient of the reward with respect to your parameters. The gradient points in the direction of higher reward. Next time you try, you change your parameters a little bit in that direction. You change how the AI chat responds and you change how you do the ollie. That is gradient ascent, the mirrored variant of gradient descent, which is the standard learning algorithm in ML.
Gradient ascent (or descent) is the ML algorithm and it is also how we humans optimize.
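Here is a hedged Python sketch of that loop. The ollie_height function is a made-up stand-in for the measured reward, and the gradient is approximated with finite differences, i.e. by nudging each parameter slightly and checking how the reward changes.

```python
import numpy as np

def estimate_gradient(reward, params, eps=1e-2):
    """Approximate the gradient by nudging each parameter a little
    and measuring how the reward changes (finite differences)."""
    base = reward(params)
    grad = np.zeros_like(params)
    for i in range(len(params)):
        nudged = params.copy()
        nudged[i] += eps
        grad[i] = (reward(nudged) - base) / eps
    return grad

def gradient_ascent(reward, params, learning_rate=0.005, steps=50):
    """Repeatedly step the parameters in the direction of higher reward."""
    for _ in range(steps):
        params = params + learning_rate * estimate_gradient(reward, params)
    return params

# Made-up stand-in for the measured ollie height as a function of
# backfoot pressure (x1) and forefoot drag (x2); peak near (0.7, 0.4).
def ollie_height(p):
    x1, x2 = p
    return 40 - 50 * (x1 - 0.7) ** 2 - 50 * (x2 - 0.4) ** 2

print(gradient_ascent(ollie_height, np.array([0.3, 0.8])))  # drifts toward (0.7, 0.4)
```

In real life you rarely get such a clean gradient: user interviews and slow-motion videos are noisy estimates of the same thing.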
Learning Rate
In both cases, the amount of change you apply between attempts—known as the “learning rate”—is crucial. However, each case calls for a different approach to step size.
In the AI chat case, because you need to distinguish meaningful differences in user feedback, your parameter changes should not be too small. If two parameter sets are almost identical, it’s harder to measure which one users prefer. By making larger adjustments, you can gather clearer data and validate your gradient approximation more effectively.
In contrast, in the skateboard ollie case the step size is typically very small. When learning a physical skill, you rely on tactile feedback and don’t immediately know how the “optimal” movement should feel. So you experiment with small tweaks—foot position, timing, pressure—over many attempts. This incremental process explains why new skateboarders need countless tries to land a trick. You often see them fail repeatedly, but each attempt refines their technique bit by bit until they approach the optimum.
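A toy one-parameter example (not from either case above, purely illustrative) shows how much the step size matters: too small and you barely move, roughly right and you converge, too large and you overshoot and diverge.

```python
def reward(x):
    return -(x - 1.0) ** 2  # a made-up reward with its peak at x = 1.0

def ascend(x, learning_rate, steps=20, eps=1e-3):
    for _ in range(steps):
        grad = (reward(x + eps) - reward(x)) / eps  # one-sided finite difference
        x += learning_rate * grad
    return x

for lr in (0.01, 0.3, 1.1):
    print(lr, round(ascend(0.0, lr), 3))
# 0.01 -> after 20 steps still far from the peak (steps too small)
# 0.3  -> close to 1.0 (a workable step size)
# 1.1  -> overshoots the peak every step and diverges (steps too large)
```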
Reward Function Estimation
How you estimate the reward function is also crucial to effective optimization. In the skateboard ollie scenario, using slow-motion iPhone videos can significantly improve reward measurement. By recording each attempt from a consistent angle and environment, you can more accurately compare ollie heights or identify technique changes that lead to better performance. In the AI chatbot scenario, refining the reward function might involve combining user feedback (e.g., ratings, satisfaction surveys) with objective usage metrics (e.g., conversation length, re-engagement rates). By triangulating multiple data sources, you can get a clearer sense of how well the chatbot is meeting user needs, ultimately leading to more informed parameter adjustments and better outcomes.
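As a sketch of what triangulating multiple data sources could look like for the chatbot (the signals and weights below are invented for illustration), you might fold several measurements into one scalar reward:

```python
def chat_reward(rating, conversation_turns, re_engaged):
    """Combine several hypothetical signals into one reward in [0, 1]."""
    rating_score = (rating - 1) / 4                    # 1-5 stars mapped to 0..1
    length_score = min(conversation_turns / 20, 1.0)   # cap the length signal at 20 turns
    return 0.6 * rating_score + 0.2 * length_score + 0.2 * float(re_engaged)

print(chat_reward(rating=4, conversation_turns=12, re_engaged=True))  # ≈ 0.77
```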
How to apply this to your life
Once you are aware of the general nature of all optimization processes, it is easier to optimize the optimization process itself. To be more effective with it. And to be faster with it.
So next time you find yourself trying to improve something or learn something in life, consciously ask yourself whether the parameter space you are searching in is the ideal one, whether your reward definition and measurement could be improved, whether your step size is too small or too big, and whether your gradient approximation is good enough. Any of these hyper-parameters can make a huge difference in how well you can optimize whatever you care about. If you do this, you are not just learning or optimizing something, you are learning how to learn better; you are optimizing your optimization. This is a key step on the path to mastery.
Best, Ole
Hyper-parameter tuning is the key to learning effectively