Yes, that is a valid concern. As for how to mitigate it, you have several (not mutually exclusive) options:
One mitigation you've already implemented is not doing slow key derivation on every request, but rather having an authentication endpoint that takes a passphrase (or equivalent), applies a slow key-stretching KDF, verifies the correctness of the result, and returns a (typically time-limited) token that can be used for fast authentication of subsequent requests.
What such a token should contain depends on your backend implementation. If you can easily and securely store session data on the backend, probably the easiest solution is to simply generate a (cryptographically) random token of, say, 128 or 256 bits and return it to the client. You can then store any sensitive information needed for backend processing — possibly including the master pseudorandom key output by the KDF, or one or more subkeys derived from it — in the backend session storage keyed by the random token.
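The random-token approach above can be sketched as follows. This is a minimal illustration, assuming an in-memory dict as a stand-in for real backend session storage (e.g. Redis); the function names and the 1-day TTL are illustrative, not a fixed API:

```python
import secrets
import time

# In-memory stand-in for real backend session storage (e.g. Redis).
SESSIONS = {}
SESSION_TTL = 24 * 60 * 60  # e.g. 1 day, matching long-lived sessions

def create_session(user_id: str, master_key: bytes) -> str:
    """Issue a 256-bit random token; keep the sensitive data server-side."""
    token = secrets.token_urlsafe(32)  # 256 bits of CSPRNG output
    SESSIONS[token] = {
        "user_id": user_id,
        "master_key": master_key,  # or one or more subkeys derived from it
        "expires": time.time() + SESSION_TTL,
    }
    return token

def lookup_session(token: str):
    """Fast authentication of subsequent requests: a dict lookup plus expiry check."""
    entry = SESSIONS.get(token)
    if entry is None or entry["expires"] < time.time():
        SESSIONS.pop(token, None)  # drop expired entries lazily
        return None
    return entry
```

Note that the token itself carries no information; everything sensitive stays in the server-side store, keyed by the token.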
If you want your backend to be stateless, things get more complicated. One option, if you can arrange for your backend to have access to a secret encryption key, is to use something like a JWE token encrypted directly with the secret key (the "dir" key management mode, https://datatracker.ietf.org/doc/html/rfc7518#section-4.5) using an authenticated encryption algorithm; fortunately, all the content encryption algorithms supported by JWE (https://datatracker.ietf.org/doc/html/rfc7518#section-5.1) are authenticated. The token can contain any information the backend needs for fast authentication. Depending on what you're doing in the backend, that might include one or more keys derived from the KDF output, but for applications that don't need to do any per-user encryption or decryption on the server, even just the ID of the authenticated account may be sufficient.
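For a real deployment you'd use a JWE library, but the core idea (authenticated encryption of a claims payload under a server-held key) can be sketched like this. This is an assumption-laden illustration using AES-GCM from the `cryptography` package rather than actual JWE framing; the claim names mimic JWT conventions:

```python
import json
import os
import time

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Long-term server secret; in practice, load from secure configuration.
SERVER_KEY = AESGCM.generate_key(bit_length=256)

def issue_token(user_id: str) -> bytes:
    """Encrypt-and-authenticate the claims under the server key."""
    claims = {"sub": user_id, "exp": time.time() + 86400}
    nonce = os.urandom(12)  # 96-bit nonce, unique per token
    ct = AESGCM(SERVER_KEY).encrypt(nonce, json.dumps(claims).encode(), None)
    return nonce + ct  # in practice you'd base64url-encode this

def verify_token(token: bytes):
    """Decrypt; any forgery or tampering fails GCM tag verification."""
    nonce, ct = token[:12], token[12:]
    try:
        claims = json.loads(AESGCM(SERVER_KEY).decrypt(nonce, ct, None))
    except Exception:  # InvalidTag: forged or corrupted token
        return None
    if claims["exp"] < time.time():
        return None
    return claims
```

Because the ciphertext is authenticated, the server can trust the decrypted claims without any session storage; the trade-off is that tokens can't be individually revoked without reintroducing server-side state.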
Now, obviously, just restricting slow key derivation to a single endpoint won't prevent that endpoint from being DoSed. But it does reduce the server load from key derivation in normal usage, and it also paves the way for further DoS countermeasures, such as:
- Rate limit your authentication endpoint. With appropriate rate limits in place, a DoS attack on the authentication endpoint should only be able to deny access to that endpoint, not interfere with clients that have already authenticated. That's still not ideal, but it's a significant improvement, especially if you allow your clients to establish fairly long-lived sessions (say, 1 day).
- For some simple types of DoS attack, finer-grained rate limiting based on e.g. source IP address can be even more effective than a single global rate limit. Yes, distributed attacks via botnets can circumvent such rate limiting, but the reason server-side KDFs are tempting DoS targets in the first place is that they allow easy DoS without the massive bandwidth of a botnet. If your attacker has a botnet, they probably don't need to target your KDF.
- If you cannot or don't want to implement explicit rate limiting, a "soft" alternative is to run your authentication endpoint on a separate server, or at least in a separate resource-limited container. This also prevents a DoS attack on the authentication code from taking down the rest of the service. Of course, for this to work, the authentication server and the rest of the endpoints need shared access to the same session storage and/or the same token encryption keys.
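The per-IP rate limiting mentioned above is often implemented as a token bucket. A minimal sketch, assuming an in-process store (a real deployment behind multiple workers would use something shared, like Redis); the `rate` and `burst` values are illustrative:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: allow `rate` auth attempts per second on
    average, with bursts of up to `burst` attempts."""

    def __init__(self, rate: float = 0.2, burst: int = 5):
        self.rate, self.burst = rate, burst
        # Each client starts with a full bucket.
        self.state = defaultdict(lambda: (float(burst), time.monotonic()))

    def allow(self, client_ip: str) -> bool:
        tokens, last = self.state[client_ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.state[client_ip] = (tokens, now)
            return False  # reject; the client should retry later (HTTP 429)
        self.state[client_ip] = (tokens - 1, now)
        return True
```

One bucket per source IP gives the fine-grained behavior described above; replacing the key with a constant gives the global limit instead.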
As noted in https://security.stackexchange.com/a/257285 , you could also require the client to submit a proof of work as part of the authentication request. Essentially, this forces the client to spend as much effort on the request as the server will spend on computing the KDF, or at least some reasonable fraction thereof, thus eliminating or reducing the attacker's leverage. However, I would not actually recommend this approach, since if you can do this, there's an even better alternative:
IMO the absolute best way to avoid DoS via slow KDFs is to offload the slow key derivation to the client. Basically, instead of having the client send a passphrase to the server, which then uses a slow KDF (such as PBKDF2 or Argon2) to derive a pseudorandom master key from it, just have the client run the slow KDF and send its output as part of the request.
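On the client, this might look like the following sketch. PBKDF2-HMAC-SHA256 is used here only because it's in the Python standard library; Argon2id would be a stronger choice where a library for it is available, and the minimum iteration count shown is an illustrative assumption:

```python
import hashlib

MIN_ITERATIONS = 600_000  # client-enforced floor (illustrative value); a
                          # malicious server must not be able to weaken the KDF

def client_derive_master_key(passphrase: str, salt: bytes,
                             iterations: int) -> bytes:
    """Run on the CLIENT: the slow key stretching happens here, and only
    the stretched output is sent to the server."""
    if iterations < MIN_ITERATIONS:
        raise ValueError("server requested too few iterations")
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                               iterations, dklen=32)
```

The server now does only fast operations per request, so its cost per (possibly bogus) authentication attempt is negligible.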
This does require somehow ensuring that the client knows which salt, iteration count and other KDF parameters it needs to use. Probably the easiest way to handle this is simply to have the client request these parameters from the server in a separate request. For most parameters this is no problem (although the client should definitely at least enforce a minimum iteration count!), but the salt does require some extra consideration:
- If you don't want your pre-authentication endpoint to disclose which user IDs exist on your system, you'll have to generate fake salts for nonexistent usernames, e.g. by hashing the username together with a server-side secret. (Of course, preventing the leakage of user IDs is not always desirable or practical anyway.)
- In any case, you'll leak the user's salt, which means an attacker can observe any changes to it. If you follow the standard procedure of changing the salt whenever the user changes their passphrase, this allows an attacker to confirm both that a user ID exists (assuming your fake salts don't change) and that the user has, or has not, changed their passphrase since the attacker's last query. In general, this leak seems more or less unavoidable, short of using a fixed salt for each user (which has its own issues).
- You may also want to have the client augment the salt sent by the server, e.g. by appending the user ID and possibly a server- or application-specific string to it. This prevents a MitM attacker from tricking the client into using a salt and KDF parameters belonging to the same user on another service, which could be an issue if the user used the same passphrase for both services.
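Both salt-related points above can be sketched briefly. The function names, the HMAC-based fake-salt construction, and the application string are illustrative assumptions:

```python
import hashlib
import hmac

# Long-term server secret; in practice, load from secure configuration.
SERVER_SECRET = b"example-secret: replace with a real, securely stored key"

# Toy user database mapping usernames to their stored salts.
USER_SALTS = {"alice": bytes.fromhex("00112233445566778899aabbccddeeff")}

def salt_for(username: str) -> bytes:
    """SERVER side: return the real salt, or a stable fake one derived from
    the server secret, so the endpoint doesn't reveal which users exist."""
    real = USER_SALTS.get(username)
    if real is not None:
        return real
    return hmac.new(SERVER_SECRET, b"fake-salt:" + username.encode(),
                    hashlib.sha256).digest()[:16]

def augment_salt(server_salt: bytes, username: str) -> bytes:
    """CLIENT side: bind the salt to this user and this service, so that
    salt/KDF parameters from another service can't be substituted by a MitM."""
    return server_salt + b"|" + username.encode() + b"|example-app-v1"
```

Because the fake salt is a deterministic function of the username, repeated queries for a nonexistent user always return the same value, just as they would for a real one.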
Also, you'll probably still want to run the KDF output sent by the client through a second KDF on the server — but this second KDF can be a fast KBKDF such as HKDF (https://en.wikipedia.org/wiki/HKDF, specified in https://datatracker.ietf.org/doc/html/rfc5869). Depending on your application, this may not be strictly required, but it doesn't hurt and can have various advantages. In particular:
- it allows you to derive multiple subkeys and/or check values of any desired length from the KDF output, without the client needing to be aware of this;
- if you're using (part of) the KDF output for user authentication, by comparing it with a "password hash" string stored in your user database, having a server-side KDF step prevents an attacker who compromises your database from using the stored hash directly to authenticate;
- it protects you against potential attacks using malformed input, by ensuring that whatever the client sends you goes through HKDF before it touches any other cryptographic code on your server.
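HKDF is simple enough to sketch directly from RFC 5869 using only the standard library (a production system would normally use a vetted implementation instead). The "auth"/"enc" info strings are illustrative:

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: condense the client-supplied KDF output
    (the input keying material) into a fixed-length pseudorandom key."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: derive `length` bytes of output, with `info`
    naming the purpose of each subkey (domain separation)."""
    okm, block = b"", b""
    for i in range((length + 31) // 32):  # SHA-256 blocks of 32 bytes
        block = hmac.new(prk, block + info + bytes([i + 1]),
                         hashlib.sha256).digest()
        okm += block
    return okm[:length]

# e.g. derive a verifier to store in the user database, plus a separate
# server-side encryption key, from a single client-supplied KDF output:
prk = hkdf_extract(b"per-user-or-app-salt", b"client-supplied-kdf-output")
auth_verifier = hkdf_expand(prk, b"auth", 32)
enc_key = hkdf_expand(prk, b"enc", 32)
```

Deriving the stored verifier through HKDF (rather than storing the client's KDF output directly) is exactly what gives the second advantage above: a stolen verifier can't simply be replayed as the client's credential.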