toad.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mastodon server operated by David Troy, a tech pioneer and investigative journalist addressing threats to democracy. Thoughtful participation and discussion welcome.

Administered by:

Server stats:

227
active users

#swad

0 posts0 participants0 posts today

Trying to further improve #swad, and as I'm still unhappy with the amount of memory needed ....

Well, this little and special custom #allocator (only dealing with equally sized objects on a single thread and actively preventing #fragmentation) reduces the resident set in my tests by 5 to 10 MiB, compared to "pooling" with linked lists in whatever #FreeBSD's #jemalloc serves me. At least something. 🙈

github.com/Zirias/poser/blob/m

The resident set now stabilizes at 79MiB after many runs of my somewhat heavy jmeter test simulating 1000 distinct clients.

GitHubposer/src/lib/core/objectpool.c at master · Zirias/poserPOsix SERvices framework for C. Contribute to Zirias/poser development by creating an account on GitHub.
Continued thread

Related question, only for people who *have* some need for either authentication or proof-of-work added to their #nginx:

Would you consider #swad if there was a pre-built #package (or port) for your OS? IOW, is building and installing it manually from source an issue?

Please help me spread the link to #swad 😎

github.com/Zirias/swad

I really need some users by now, for those two reasons:

* I'm at a point where I fully covered my own needs (the reasons I started coding this), and getting some users is the only way to learn about what other people might need
* The complexity "exploded" after supporting so many OS-specific APIs (like #kqueue, #epoll, #eventfd, #signalfd, #timerfd, #eventports) and several #lockfree implementations based on #atomics while still providing fallbacks for everything that *should* work on any #POSIX systems ... I'm definitely unable at this point to think of every possible edge case and test it. If there are #bugs left (which is somewhat likely), I really need people reporting these to me

Thanks! 🙃

GitHubGitHub - Zirias/swad: Simple Web Authentication DaemonSimple Web Authentication Daemon. Contribute to Zirias/swad development by creating an account on GitHub.

Just released: #swad 0.12 🥂

swad is the "Simple Web Authentication Daemon". It basically offers adding form + #cookie #authentication to your reverse proxy (designed for and tested with #nginx "auth_request"). I created it mainly to defend against #malicious_bots, so among other credential checker modules for "real" logins, it offers a proof-of-work mechanism for guest logins doing the same #crypto #challenge known from #Anubis.

swad is written in pure #C with minimal dependencies (#zlib, #OpenSSL or compatible, and optionally #PAM), and designed to work on any #POSIX system. It compiles to a small binary (200 - 300 kiB depending on compiler and target platform).

This release brings (among a few bugfixes) improvements to make swad fit for "heavy load" scenarios: There's a new option to balance the load across multiple service worker threads, so all cores can be fully utilized if necessary, and it now keeps lots of transient objects in pools for reuse, which helps to avoid memory fragmentation and ultimately results in lower overall memory consumption.

Read more about it, download the .tar.xz, build and install it .... here:

github.com/Zirias/swad

GitHubGitHub - Zirias/swad: Simple Web Authentication DaemonSimple Web Authentication Daemon. Contribute to Zirias/swad development by creating an account on GitHub.
Replied in thread

Analyzed and THIS is the core part of the fix:

github.com/Zirias/poser/commit

I'll have to revisit that later, it's probably wasteful as it is right now, I should come up with some better idea to implement async logging later. But it DOES fix the bug I finally identified that could make #swad crash.

Now, just running it, observing it, stress-testing it from time to time ... wish me luck that THIS was indeed the only reason left for crashing in "normal operation".

I already know there's more work to do regarding reload (SIGHUP) with the multi-threaded approach, it's currently unsafe unfortunately, must be fixed before the next release.

GitHubLog: Fix async logging with service workers · Zirias/poser@b1aa978Fix async logging with multiple service workers. Enqueueing a thread job from within a pool thread is not safe any more, because the job must know its "owning" service worker thread. So, ...
Continued thread

Oh boy, I have a lead! And it's NOT related to #TLS. I finally noticed another pattern: #swad only #crashed when running as a #daemon. The daemonizing wasn't the problem, but the default logging configuration attached to it: "fake async", by letting a #threadpool job do the logging.

Forcing THAT even when running in foreground, I can finally reproduce a crash. And I wouldn't be surprised if that was actually the reason for crashing "pretty quickly" with #LibreSSL (and only rarely with #OpenSSL), I mean, something going rogue in your address space can have the weirdest effects.

Continued thread

For two days straight, I just can't reproduce #swad #crashing with *anything* in place (#clang #sanitizer instrumentation, attached #debugger like #lldb) that could give me the slightest hint what's going wrong. 😡

But it *does* crash when "unobserved". And it looks like this is happening a lot sooner (or, more often?) when using #LibreSSL ... but I also suspect this could be a red herring in the end.

Situation reminds me of my physics teacher back at school, who used to say something in german I just can't ever forget:

"Wer misst, misst Mist."

Feeble attempt in english would be "the one who measures measures crap", it was his humorous way to bring one consequence of #Heisenberg's indeterminacy principle to the point. And indeed, #debugging computer programs always suffers from similar problems...

I need help. First the question: On #FreeBSD, with all ports built with #LibreSSL, can I somehow use the #clang #thread #sanitizer on a binary actually using LibreSSL and get sane output?

What I now observe debugging #swad:

- A version built with #OpenSSL (from base) doesn't crash. At least I tried very hard, really stressing it with #jmeter, to no avail. Built with LibreSSL, it does crash.
- Less relevant: the OpenSSL version also performs slightly better, but needs almost twice the RAM
- The thread sanitizer finds nothing to complain when built with OpenSSL
- It complains a lot with LibreSSL, but the reports look "fishy", e.g. it seems to intercept some OpenSSL API functions (like SHA384_Final)
- It even complains when running with a single-thread event loop.
- I use a single SSL_CTX per listening socket, creating SSL objects from it per connection ... also with multithreading; according to a few sources, this should be supported and safe.
- I can't imagine doing that on a *single* thread could break with LibreSSL, I mean, this would make SSL_CTX pretty much pointless
- I *could* imagine sharing the SSL_CTX with multiple threads to create their SSL objects from *might* not be safe with LibreSSL, but no idea how to verify as long as the thread sanitizer gives me "delusional" output 😳

This redesign of #poser (for #swad) to offer a "multi-reactor" (with multiple #threads running each their own event loop) starts to give me severe headaches.

There is *still* a very rare data #race in the #lockfree #queue. I *think* I can spot it in the pseudo code from the paper I used[1], see screenshot. Have a look at lines E7 and E8. Suppose the thread executing this is suspended after E7 for a "very long time". Now, some dequeue operation from some other thread will eventually dequeue whatever "Q->Tail" was pointing to, and then free it after consumption. Our poor thread resumes, checks the pointer already read in E6 for NULL successfully, and then tries a CAS on tail->next in E9, which is unfortunately inside an object that doesn't exist any more .... If the CAS succeeds because at this memory location happens to be "zero" bytes, we corrupted some random other object that might now reside there. 🤯

Please tell me whether I have an error in my thinking here. Can it be ....? 🤔

Meanwhile, after fixing and improving lots of things, I checked the alternative implementation using #mutexes again, and surprise: Although it's still a bit slower, the difference is now very very small. And it has the clear advantage that it never crashes. 🙈 I'm seriously considering to drop all the lock-free #atomics stuff again and just go with mutexes.

[1] dl.acm.org/doi/10.1145/248052.

I just stress-tested the current dev state of #swad on #Linux. The first attempt failed miserably, got a lot of errors accepting a connection. Well, this lead to another little improvement, I added another static method to my logging interface that mimics #perror: Also print the description of the system errno. With that in place, I could see the issue was "too many open files". Checking #ulimit -n gave me 1024. Seriously? 🤯 On my #FreeBSD machine, as a regular user, it's 226755. Ok, bumped that up to 8192 and then the stress test ran through without issues.

On a side note, this also made creating new timers (using #timerfd on Linux) fail, which ultimately made swad crash. I have to redesign my timer interface so that creating a timer may explicitly fail and I can react on that, aborting whatever would need that timer.

Anyways, the same test gave somewhat acceptable results: throughput of roughly 3000 req/s, response times around 500ms. Not great, but okayish, and not directly comparable because this test ran in a #bhyve vm and the requests had to pass the virtual networking.

One major issue is still the #RAM consumption. The test left swad with a resident set of > 540 MiB. I have no idea what to do about that. 😞 The code makes heavy use of "allocated objects" (every connection object with metadata and buffers, every event handler registered, every timer, and so on), so, uses the #heap a lot, but according to #valgrind, correctly frees everything. Still the resident set just keeps growing. I guess it's the classic #fragmentation issue...

Replied in thread

@fanf Sure that does make sense. I'll try to verify jmeter indeed doesn't reuse connections (I already have debug logging in place that should tell me).

If that's really the reason, I guess the sane thing to do is to add a hint to the docs to just disable TLS for very busy sites. The intended usecase for #swad is operation behind #nginx to serve its "auth_request". I don't intend to implement HTTP/2 or beyond, but it would be pretty pointless here anyways, nginx defaults to HTTP/1.0 for proxy requests and can be configured to use HTTP/1.1 instead, but *still* doesn't reuse connections by default, and my experiments so far to enable it weren't successful, maybe I didn't fully understand it yet. Using TLS behind nginx would make sense from a "defense in depth" point of view, but it's probably impractical once your load exceeds a certain threshold.

For background how I arrived there, I observed stupid #AI #scraper #bots clog my DSL connection by downloading gigabytes of build logs produced by my #poudriere. They're not secret in any way and having a simple way to share them is great for community bug hunting, but this had to stop. I had a simple C library doing a fully portable reactor event loop on top of select (so, not really scalable), and some very limited HTTP/1.1 server code from experiments with TOR hidden services ... so I put that together to add some web-form + cookies auth to my private nginx to lock out the bots. Later, I added a "guest login" doing the same "proof of work" stuff known from #anubis, and then I suddenly had the idea in mind to make my little service (that already solved the problem perfectly for myself) suitable for large-scale installations. So, added kqueue, epoll etc support, added a "multi-reactor with acceptor-connector" design, etc .... and now I'm a bit frustrated enabling TLS spoils all the performance 🙈

More interesting progress trying to make #swad suitable for very busy sites!

I realized that #TLS (both with #OpenSSL and #LibreSSL) is a *major* bottleneck. With TLS enabled, I couldn't cross 3000 requests per second, with somewhat acceptable response times (most below 500ms). Disabling TLS, I could really see the impact of a #lockfree queue as opposed to one protected by a #mutex. With the mutex, up to around 8000 req/s could be reached on the same hardware. And with a lockfree design, that quickly went beyond 10k req/s, but crashed. 😆

So I read some scientific papers 🙈 ... and redesigned a lot (*). And now it finally seems to work. My latest test reached a throughput of almost 25k req/s, with response times below 10ms for most requests! I really didn't expect to see *this* happen. 🤩 Maybe it could do even more, didn't try yet.

Open issue: Can I do something about TLS? There *must* be some way to make it perform at least a *bit* better...

(*) edit: Here's the design I finally used, with a much simplified "dequeue" because the queues in question are guaranteed to have only a single consumer: dl.acm.org/doi/10.1145/248052.

Finally getting somewhere working on the next evolution step for #swad. I have a first version that (normally 🙈) doesn't crash quickly (so, no release yet, but it's available on the master branch).

The good news: It's indeed an improvement to have *multiple* parallel #reactor (event-loop) threads. It now handles 3000 requests per second on the same hardware, with overall good response times and without any errors. I uploaded the results of the stress test here:

zirias.github.io/swad/stress/

The bad news ... well, there are multiple.

1. It got even more memory hungry. The new stress test still simulates 1000 distinct clients (trying to do more fails on my machine as #jmeter can't create new threads any more...), but with delays reduced to 1/3 and doing 100 iterations each. This now leaves it with a resident set of almost 270 MiB ... tuning #jemalloc on #FreeBSD to return memory more promptly reduces this to 187 MiB (which is still a lot) and reduces performance a bit (some requests run into 429, overall response times are worse). I have no idea yet where to start trying to improve *this*.

2. It requires tuning to manage that load without errors, mainly using more threads for the thread pool, although *these* threads stay almost idle ... which probably means I have to find ways to make putting work on and off these threads more efficient. At least I have some ideas.

3. I've seen a crash which only happened once so far, no idea as of now how to reproduce. *sigh*. Massively parallel code in C really is a PITA.

Seems the more I improve here, the more I find that *should* also be improved. 🤪

zirias.github.ioApache JMeter Dashboard

Just released: #swad 0.11 -- the session-less swad is done!

Swad is the "Simple Web Authentication Daemon", it adds cookie/form #authentication to your reverse #proxy, designed to work with #nginx' "auth_request". Several modules for checking credentials are included, one of which requires solving a crypto challenge like #Anubis does, to allow "bot-safe" guest logins. Swad is written in pure #C, compiles to a small (200-300kiB) binary, has minimal dependencies (zlib, OpenSSL/LibreSSL and optionally libpam) and *should* work on many #POSIX-alike systems (#FreeBSD tested a lot, #Linux and #illumos also tested)

This release is the first one not to require a server-side session (which consumes a significant amount of RAM on really busy sites), instead signed Json Web Tokens are now implemented. For now, they are signed using HMAC-SHA256 with a random key generated at startup. A future direction could be support for asymmetric keys (RSA, ED25519), which could open up new possibilities like having your reverse proxy pass the signed token to a backend application, which could then verify it, but still not forge it.

Read more, grab the latest .tar.xz, build and install it ... here: 😎

github.com/Zirias/swad

GitHubGitHub - Zirias/swad: Simple Web Authentication DaemonSimple Web Authentication Daemon. Contribute to Zirias/swad development by creating an account on GitHub.

Just released: #swad 0.10

github.com/Zirias/swad/release

Swad is the "Simple Web Authentication Daemon". If you're looking for a way to add #authentication (and/or proof-of-work access as known from #anubis) to your #nginx reverse proxy -- without adding yet another reverse proxy -- swad could be for you! It's written in pure #C, has few external dependencies (just zlib, and optionally OpenSSL/Libressl and/or libpam) and compiles to a pretty small binary. It's designed for usage with nginx' 'auth_request'.

Swad is tested on #FreeBSD, some basic functionality tests were also done on #Linux and #illumos (descendant from #solaris). It *should* build and work on most #POSIX-alike systems.

This release mainly brings performance improvements and a few bugfixes. It's now stress-tested with Apache jmeter, verifying it can deal with at least 1000 requests per second on my personal (somewhat limited) FreeBSD host machine.

GitHubRelease swad 0.10 · Zirias/swadBreaking changes: Correct integration with nginx now requires using an internal redirect, see the updated nginx config snippet in README.md! Improvements: Don't use CSRF protection where it isn'...
Continued thread

Got somewhere:

github.com/Zirias/swad/commit/

Now, no bot ever causes #swad to create a server-side session, at least from what I can observe in my logs -- these bots don't attempt any login!

I also disabled usage of CSRF tokens for the login form, which I forgot to mention in the commit message. They strictly require a session and are pointless on a login form anyways.

GitHubSession: Reduce usage, make explicit · Zirias/swad@1bbd1e9Instead of always creating a session directly from the middleware, only look for an existing session and make creation explicit with Session_start(). Use this to only use a session when it is real...
Continued thread

I now decided I'll at least aim for some middle grounds: Rework #swad so it only needs a (server-side) #session once a user is #authenticated!

This does have some implications, e.g. passing a redirect argument to the authentication endpoint won't work any more. But experimentation shows a workaround would be to use an "internal redirect" to the login endpoint in #nginx.

We'll see where I end up. Having sessions only for authenticated users should reduce the need for server-side RAM significantly, so I hope 😉

Good morning! ☕

Now that I can't find any other bugs in #swad any more, I'm thinking again about how I could improve it.

Would anyone consider deploying it on a busy site right now? Either as a replacement for #Anubis (proof-of-work against bots), or for simple non-federated #authentication, or maybe even both?

I'm currently not sure how well it would scale. The reason is the design with server-side sessions, which is simple and very light-weight "on the wire", but needs server-side RAM for each and every client. It's hard to guess how this would turn out on very busy sites.

So, I'm thinking about moving to a stateless design. The obvious technical choice for that would be to issue a signed #JWT (Json Web Token), just like Anubis does it as well. This would have a few consequences though:

* OpenSSL/LibreSSL would be a hard build dependency. Right now, it's only needed if the proof-of-work checker and/or TLS support is enabled.
* You'd need an X509 certificate in any case to operate swad, even without TLS, just for signing the JWTs.
* My current CSRF-protection would stop working (it's based on random tokens stored in the session). Probably not THAT bad, the login itself doesn't need it at all, and once logged in, the only action swad supports is logout, which then COULD be spoofed, but that's more an annoyance than a security threat... 🤔
* I would *still* need some server-side RAM for each and every client to implement the rate-limits for failed logins. At least, that's not as much RAM as currently.

Any thoughts? Should I work on going (almost) "stateless"?

Continued thread

Found and fixed two more bugs affecting only #TLS with #swad, so here's yet another "bugfix release":

github.com/Zirias/swad/release

One of these bugs was always there and I never noticed (just ignoring intermediate certificates) because many clients cope well with this, but not all.

The other bug is yet another regression from earlier performance improvements. 😞

So, lots of releases these last days. I'll have to remember to do very thorough regression testing whenever "optimizing" things in existing code 🙈

In a nutshell: 0.8 was finally fine again without TLS, but if you need TLS, better use this new 0.9.

GitHubRelease swad 0.9 · Zirias/swadBugfix release, fixes: [TLS] make sure a TLS shutdown is delayed until pending data is sent [TLS] actually read and use intermediate certificates if present in the certificate file