Rethinking Biases: Concatenation and String Builder

Everybody knows that string builder classes are more efficient than concatenation, right? Statements like this are passed between generations of developers, quickly becoming common wisdom. But as with all things in technology, rapid language evolution can render dated information irrelevant. So, in the context of Apex development, does this piece of common wisdom hold up? I was recently tasked with a project that required assembling a massive amount of string data into a large JSON payload, which presented the perfect opportunity to put the claim to the test. The answer? It depends, but not in the way I expected. To control as many variables as possible, I created a short code snippet that concatenates two identical strings of a precise size using each technique. As anticipated, the string builder technique is faster and uses fewer CPU resources with large strings; however, basic concatenation wins in both speed and efficiency for smaller tasks.
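Here is a minimal sketch of the kind of benchmark I ran; the chunk size and iteration count shown are illustrative. Since Apex has no built-in StringBuilder class, the "string builder" technique accumulates parts in a List<String> and joins them once at the end:

```apex
// A minimal benchmark sketch; chunk size and iteration count are illustrative.
String chunk = 'abcdefghij'.repeat(10); // 100-character test string
Integer iterations = 5000;

// Technique 1: basic concatenation.
Integer startCpu = Limits.getCpuTime();
String concatenated = '';
for (Integer i = 0; i < iterations; i++) {
    concatenated += chunk;
}
System.debug('Concatenation: ' + (Limits.getCpuTime() - startCpu) + ' ms');

// Technique 2: list-and-join ("string builder").
startCpu = Limits.getCpuTime();
List<String> parts = new List<String>();
for (Integer i = 0; i < iterations; i++) {
    parts.add(chunk);
}
String joined = String.join(parts, '');
System.debug('String builder: ' + (Limits.getCpuTime() - startCpu) + ' ms');
```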

                  5,000 Iterations    50,000 Iterations
Concatenation           241 ms              4,715 ms
String Builder          445 ms              4,614 ms

In fact, concatenation maintains its speed advantage up to a surprisingly high number of iterations: in my tests, it held the lead until just after 55,000! So, what's the verdict? Basic concatenation is faster under most circumstances; only extremely large strings benefit from the string builder technique.

Interpretability in Machine Learning

Since OpenAI released ChatGPT, its large language model (LLM) chatbot, machine learning and artificial intelligence have entered mainstream discourse. The reaction has been a mix of skepticism, trepidation, and panic as the public comes to terms with how this technology will shape our future. Many fail to realize that machine learning already shapes the present, and developers have been grappling with introducing this technology into products and services for years. Machine learning models are used to make increasingly important decisions – from aiding physicians in diagnosing serious health issues to making financial decisions for customers.

How It Works

I strongly dislike the term "artificial intelligence" because what the phrase describes is a mirage. There is no complex thought process at work – the model doesn't even understand the information it is processing. In a nutshell, the OpenAI model powering ChatGPT calculates the statistically most probable next word given the immediately surrounding context, based on the enormous amount of information its developers used to train it.

A Model?

Let's say we compiled an accurate dataset containing the time it takes for an object to fall from specific heights:

Height    Time
100 m     4.51 sec
200 m     6.39 sec
300 m     7.82 sec
400 m     9.03 sec
500 m     10.10 sec

What if we need to determine the time it takes for that object to fall from a distance we don't have data for? We build a model representing our data and either interpolate or extrapolate to find the answer:

$$ t = \sqrt{\frac{2h}{g}} $$
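Here $h$ is the drop height and $g$ is the acceleration due to gravity (about $9.81\,\mathrm{m/s^2}$ on Earth). To interpolate a height missing from our table, say 250 m, we simply plug it in:

$$ t = \sqrt{\frac{2 \times 250}{9.81}} \approx 7.14\ \text{sec} $$

The result lands between the measured times for 200 m and 300 m, exactly as we'd expect.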

Models for more complex calculations are often created with neural networks, mathematical systems that learn skills by analyzing vast amounts of data. Each node in a vast collection evaluates a specific function and passes the result to the next node. Simple neural networks can be expressed as mathematical functions, but as the number of variables and nodes increases, the model can become opaque to human comprehension.
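To make that concrete, here is a minimal, hypothetical sketch of a single node (neuron) in Apex; the weights, bias, and sigmoid activation are illustrative choices, not the internals of any particular model:

```apex
// A hypothetical single node: weights and activation are illustrative.
public class Neuron {
    private List<Double> weights;
    private Double bias;

    public Neuron(List<Double> weights, Double bias) {
        this.weights = weights;
        this.bias = bias;
    }

    // Multiply each input by its coefficient (weight), sum the results,
    // and squash the total into (0, 1) with a sigmoid activation.
    public Double activate(List<Double> inputs) {
        Double total = bias;
        for (Integer i = 0; i < inputs.size(); i++) {
            total += inputs[i] * weights[i];
        }
        return 1 / (1 + Math.exp(-total));
    }
}

// Usage: Neuron n = new Neuron(new List<Double>{ 0.8, -1.2 }, 0.5);
//        n.activate(new List<Double>{ 1.0, 2.0 }); // ≈ 0.25
```

A real network chains thousands or millions of these nodes together, which is exactly where human comprehension starts to slip.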

The Interpretability Problem

Unfortunately, for many complex models, it is impossible to open the box and provide a precise mathematical explanation for a given decision. In other words, models often lack human interpretability and accountability. We often can't say, mathematically speaking, exactly how the network makes the distinctions it does; we only know that its decisions align with those of a human. It doesn't require a keen imagination to see how this presents a problem in regulated, high-stakes decision-making.

Let's say John visits a lender and applies for a $37,000 small business loan. The lender needs to determine the probability that John will default on the loan, so they feed John's information into an algorithm, which computes a low score, causing a denial. By law, the lender must provide John with a statement of the specific reasons for the denial. In this scenario, what do we tell John? Today, we can reverse-engineer the model and provide a detailed answer, but as computing resources become more powerful and less expensive, even the simple models of tomorrow will quickly test the limits of human understanding. So how do we design accountable, transparent systems in the face of exponentially growing complexity?

Solutions?

Proponents of interpretable models suggest limiting the number of variables used in a model. The problem with this approach becomes apparent after considering how neural networks weigh variables. Models multiply results by coefficients that determine the relative importance of each variable or calculation before passing them to the next node. These coefficients and variables often run 20 to 50 decimal places long and mix positive and negative values. While understanding the data underpinning a decision is essential, that alone is not enough to produce a clear explanation. We can partially solve this problem by building tooling that abstracts away implementation details and provides a more intelligible overview of the model; however, this still only approximates the decision-making process.

Other thought leaders in machine learning argue that the most viable long-term solutions may not involve futile attempts to explain the model but should instead focus on auditing and regulating performance. Do large volumes of test data reveal statistical trends of bias? Does analyzing the training data show any gaps or irregularities that could result in harm? Unfortunately, this does not solve the issue in my hypothetical scenario above. I can't conclusively prove that my current decision was correct by pointing to past performance.

Technology is simply moving too rapidly to rely on regulations, which are, at best, a lagging remedy. We must pre-emptively work to build explainability into our models, but doing this in an understandable and actionable way will require rethinking our current AI architectures. We need forward-looking solutions that address bias at every stage of the development lifecycle with strong internal governance. Existing systems should undergo regular audits to ensure small changes haven't caused disparate impacts.

I can't help but feel very lucky to live in this transformative sliver of time, from the birth of the personal computer to the beginning of the internet age and the machine learning revolution. Today's developers and system architects have a massive responsibility to consider the impact of the technology they create. The future adoption of AI heavily depends on the trust we build in our systems today.

Increase Efficiency with Platform Cache

Platform Cache is a memory layer that stores your application's session and environment data for later access. Applications run faster because they store reusable data instead of retrieving it whenever needed. Note that Platform Cache is visible and mutable by default and should never be used as a database replacement. Developers should use cache only for static data that is either frequently needed or computationally expensive to acquire. Let's explore the use of cache in a simple Apex class.
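Here is a minimal sketch of the uncached pattern; the class and method names are illustrative:

```apex
// A minimal sketch; class and method names are illustrative.
public class SchemaService {
    // Returns a map of every sObject name to its sObject token.
    // Expensive: the full describe runs on every call.
    public static Map<String, Schema.SObjectType> getSchemaMap() {
        return Schema.getGlobalDescribe();
    }
}
```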

In the example above, we describe the objects in the environment to build a schema map. The Schema.getGlobalDescribe() function returns a map of all sObject names (keys) to sObject tokens (values) for the standard and custom objects defined in the environment in which we're executing the code. Unfortunately, we're not caching the data, so every call repeats this expensive process. This code consumes 1,307 ms of CPU time with a heap size of 80,000 bytes. Let's improve it by using a cache partition.
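A sketch of the cached version follows; the partition name 'local.SchemaCache' and the cache key are assumptions, and this presumes the describe map serializes cleanly into your partition:

```apex
// A sketch of the cached version. The partition name and key are
// assumptions; adjust them to match the partition defined in Setup.
public class SchemaService {
    public static Map<String, Schema.SObjectType> getSchemaMap() {
        Cache.OrgPartition partition = Cache.Org.getPartition('local.SchemaCache');

        // Check the cache first; get() returns null on a miss.
        Map<String, Schema.SObjectType> schemaMap =
            (Map<String, Schema.SObjectType>) partition.get('schemaMap');

        if (schemaMap == null) {
            // Cache miss: run the expensive describe once...
            schemaMap = Schema.getGlobalDescribe();
            // ...and place the results in the cache for later use.
            partition.put('schemaMap', schemaMap);
        }
        return schemaMap;
    }
}
```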

This code performs the same operation but caches the result. We instantiate a cache partition and run the same function to build our schema map; this time, however, we place the results in the cache for later use. Our processing requirements diminished significantly, consuming only 20 ms of CPU time.

Despite the breathtaking advances in processing power, developers should always ensure they are writing efficient code that has a minimal processing footprint and scales with increasing volume.

Further Reading

Salesforce Developer Guide - Platform Cache

Ivory

On January 12, 2023, Twitter revoked third-party access to its API without warning. As a developer, this is one of the most repulsive actions I've seen a social media company take in recent memory, and that's saying something. Deprecating an API is a process typically measured in months or years, giving developers time to create alternative solutions and shift their business models. Instead, these independent development shops found their apps (and primary sources of revenue) destroyed overnight. The move is particularly egregious considering the integral role these third-party apps have played in improving the platform. Features we now take for granted, such as pull to refresh, were created by these developers to improve the platform's user experience (UX). For many users, these apps were the face of Twitter.

Luckily, Tapbots, the small two-person developer team responsible for the iconic Twitter app named Tweetbot, had been working on a Mastodon app called Ivory. The unexpected destruction of Tweetbot accelerated the development of Ivory, and Tapbots decided to release an early access version of the app.

The Rise of a Phoenix (Make That an Elephant) from the Ashes

The pricing model of Ivory has proven to be divisive. Many prospective customers feel it is inappropriate to charge for what is essentially an incomplete app. Others find the $1.99 per month/$14.99 per year subscription fee excessive. While I typically feel annoyed at the prospect of paying for an unfinished product, I'm more than willing to give Tapbots a pass, given the circumstances. Software subscriptions are a reality of the current market. Customers are unwilling to pay large sums of money for software, and most app developers incur an ongoing cost per customer due to cloud sync features and data storage. It is economically unsustainable for a company to provide a lifetime of updates and support for no additional cost.

If I had to distill the interface of Ivory down to one word, it would be sophisticated. The minimalist interface possesses wonderful clarity, and the iconography lends a unique flair to the app. It follows Apple's Human Interface Guidelines while still feeling distinctive. With Ivory, simple doesn't mean primitive, and less doesn't feel sparse. The app displays remarkable attention to detail. The trumpet icon is movable, and flicking it toward a corner causes it to bounce into place. Tapping a button activates subtle animations that make the app feel alive. A separate dark theme, explicitly designed to take advantage of the pure blacks offered by OLED screens, provides eye-popping contrast. Ivory performs as well as it looks: it has some of the smoothest scrolling animations I've seen in a third-party app and remains responsive when quickly navigating through the interface.

Ivory's beautiful iconography

As good as Ivory is, it could be better. There are several features I missed over the course of using the app. The ability to edit my posts and quote those of others would greatly augment the app's utility. Ivory's ability to sync your place in the timeline over iCloud is welcome, but I'd love to see other customizations, such as filters, sync as well. Luckily, most of the features I want are on the official roadmap. Given that the developers at Tapbots have a long history of delivering flawlessly executed features, I have no doubt Ivory has a very bright future ahead.