nipos has left #haiku [Disconnected: Replaced by new connection]
nipos has joined #haiku
avanspector[m] has joined #haiku
<avanspector[m]>
waddlesplash: hi, have there been any considerations amongst devs on adding futexes to user API?
_-Caleb-_ has left #haiku [#haiku]
_-Caleb-_ has joined #haiku
OrangeBomb has quit [Remote host closed the connection]
OrangeBomb has joined #haiku
<waddlesplash>
avanspector[m]: no
<waddlesplash>
the pthread locks already use an API internally that has many of the advantages of a futex API
<waddlesplash>
I don't see much if any reason to expose anything else. the only thing that might be worthwhile is the API for having a mutex in just an int32 rather than the whole pthread data structure, but even this has very limited usecases
<waddlesplash>
why do you ask?
* OscarL
wonders if using "hlt" instead of "pause" while on KDL's kgetc() would work (and if that would allow VBox to stop eating 100% of a core while the debug console is active).
* OscarL
reads some driver code and wishes he had a better brain.
<OscarL>
that's from this side, erysdren. how about you? :-)
<erysdren>
just working on my code projects... custom 3D game engine, Rise of the Triad sourceport, Quake engine stuff....
<erysdren>
yknow, the usual :P
<erysdren>
i really wanna make the memory usage and management in the ROTT sourceport (Taradino) better, so it might have a better chance of running on other platforms
<erysdren>
it only barely runs on Haiku as-is.
<OscarL>
heh
<erysdren>
i think Linux/Windows/etc just have better hardening against applications with shoddy memory management, crazy array accesses, etc.
<erysdren>
the game is coded pretty poorly.
<Skipp_OSX>
ActionRetro has Evolve III Maestro 11.6" Laptop Computer - Micro Center... Intel Celeron N3450 1.1GHz Processor
<Skipp_OSX>
Celeron N4120, N3450... bout the same, need patch :)
<OscarL>
I guess I mixed up with some other machine. I thought the one he got for <100 USD had an N4120.
<OscarL>
The N3450 has a different device ID: "5A85"
<OscarL>
while the N4x20 sits right between KabyLake and CoffeeLake.
<OscarL>
(in regards to graphics)
<Skipp_OSX>
die shrunk I believe on mobile
<Skipp_OSX>
14nm part
<OscarL>
right. Intel code names give me a headache.
<Skipp_OSX>
take a Tylenol, we need it!
<OscarL>
feel free to add the IDs on https://review.haiku-os.org/c/haiku/+/8083. I can't test hardware I don't have, and I'd rather avoid possible KDLs/black screens at boot :-)
<OscarL>
that little netbook sounds nice, till you read about the erratum on those Goldmont CPUs: "eMMC should have a maximum of 33% active time and should be set to D3 device low power state by the operating system when not in use."
<OscarL>
paraphrasing: "also... don't use USB ports, LPC bus, SD Card, or RTC too much if you want it to last" :-D
<Skipp_OSX>
come on you're the one that knows the IDs :)
<OscarL>
but we need more than just the "UHD Graphics 500" id... we need at least two more (PCH and LPC bus one)... and also to make sure to which "group" it actually equates.
<OscarL>
Apollo Lake seems Gen9, so I would assume SkyLake group. But I assumed GeminiLake was KabyLake... turns out it is, just in parts, in others, it behaves like CoffeeLake :-D
<OscarL>
Skipp_OSX: on my GeminiLake... I looked in listdev output for the following: "Host bridge" (that id goes along with the device ID for the Graphics, in intel_gart.cpp).
talos has quit [Ping timeout: 480 seconds]
<Skipp_OSX>
I'll email him real quick
<Skipp_OSX>
(just kidding)
<OscarL>
and then for... "ISA bridge" (usually mentions LPC controller)... that ID might be needed in intel_extreme/driver.cpp's "detect_intel_pch()".
<OscarL>
and that's more or less the total extent of my understanding of things :-P
* OscarL
rebuilds 8083 on top of beta5 branch, to see if that fixes his boot issues.
<OscarL>
Sadly, can't do much with that :-). At least he can boot currently, would be bad to change that for the worse for "yolo"ing the IDs... (remember radeon :-D)
<OscarL>
"If you rename the system folder or its content"... that warning drives me nuts. I'm trying to rename a file under /var/log... stop pestering me Tracker!!!
<OscarL>
Also... that dialog is a nightmare to use with keyboard only.
<Skipp_OSX>
yeah I hate that warning too
* OscarL
is really happy with idualwifi7260 finally working *most* of the time (after 6 months where it only worked on 2 separate days :-D)
mmu_man has quit [Ping timeout: 480 seconds]
<OscarL>
yes! beta5 with working intel_extreme graphics (brightness control working).
smalltalkman__ has quit []
<Skipp_OSX>
yay
diver has joined #haiku
* phschafft
can clearly think about a few people he would like to have brightness control on...
<OscarL>
would be cool if the screen turned off when closing the lid, but... oh well.
<OscarL>
phschafft: radiant people are the worst ;-P
<phschafft>
hm....
<OscarL>
(black body radiation doesn't count)
<zdykstra>
evenin' OscarL
<OscarL>
hello zdykstra!
<zdykstra>
how are you this fine evening?
<zdykstra>
and by fine I mean hot and humid
<OscarL>
midnight already down here... less cold than the last few days so... can't complain much :-)
<botifico>
[haikuports/haikuports] Begasus 3e37060 - extra_cmake_modules, bump version (#10922)
talos has joined #haiku
<oanderso[m]>
x512 waddlesplash I'd really appreciate eyes on this change. Not enough stuff is working to really test it yet, but I'd really like to get this foundational piece solid before layering more users on top of it: https://review.haiku-os.org/c/haiku/+/8139
Forza has joined #haiku
yann64 has joined #haiku
ChaiTRex has quit [Remote host closed the connection]
<botifico>
[haikuports/haikuports] Begasus 407ea03 - karchive6, bump version (#10931)
<Begasus>
;)
<Begasus>
keeps passing by here when "booting" is mentioned :D
arraybolt3 has quit [Quit: WeeChat 4.1.1]
<phschafft>
over here we sometimes refer to reboot as 'umschuhen' (to change shoes).
<Begasus>
funny too (ps, I do know a bit of German, Aachen only about 50Km from here) :)
arraybolt3 has joined #haiku
<Begasus>
my French is more rusty though
* phschafft
nods.
<phschafft>
I know, but this is an international channel after all :)
<Begasus>
+1 :)
<gordonjcp>
I may be the only Gaelic speaker in the channel though
<phschafft>
and I think it's always kind to keep those in mind who just read. Being myself in that position most of the time.
<Begasus>
Gaelic as in?
<phschafft>
gordonjcp: then you need to install a language pack on someone else!
<dovsienko>
Begasus: the native language of Scotland (after Pictish, that is)
<gordonjcp>
phschafft: my son has mostly settled on speaking English but when he was about 2 he would just use whichever English, German, or Gaelic word he thought of first
<gordonjcp>
phschafft: poor wee guy had a hard time on holiday because the other kids he was playing with couldn't understand him speaking English but he could understand them speaking English or German
<Begasus>
ah!
<Begasus>
I still talk our local dialect also to the kids and grandchildren, sadly they didn't pick it up
<botifico>
[haikuports/haikuports] Begasus 39e4dbf - kwidgetsaddons6, bump version (#10932)
<phschafft>
gordonjcp: my parents had a fight over which my second language should be, so they decided that I would not grow up with three languages but one. made my life way more difficult in the end.
<phschafft>
(for any definition of language as by my parents)
<phschafft>
I think it is very good to /offer/ additional languages to young children as it helps their brains get used to there being more than one.
<phschafft>
what they make of that is clearly their way.
<Begasus>
I'll second that
<phschafft>
also I very much understand the problem with interacting with other people. I find myself now, years later, in that situation sometimes.
<phschafft>
it can be hard, especially for a young child. but it can also be a path to growth. for both sides. (but may also require a bit of support from the parents of both sides).
<botifico>
[haikuports/haikuports] Begasus c61b7de - kguiaddons6, bump version (#10934)
<Begasus>
need to focus a bit on those KF6 recipes, they need to be built in the right order, so not doing many side-tracks today :)
diver1 has joined #haiku
diver has quit [Read error: Connection reset by peer]
jhj has joined #haiku
<jhj>
Hi there. Somewhat recently I've seen a huge performance regression on Haiku, around 90% slowdown, with our application. (https://dps8m.gitlab.io/)
<jhj>
We support building a threaded and a non-threaded version.
_-Caleb-_ has left #haiku [#haiku]
<jhj>
The non-threaded version performs fine, but the threaded version doing work in a single thread is 80-90% slower, and this wasn't always the case.
<jhj>
It's just Haiku - the threaded version is only 7-9% slower on all other tested platforms we support (macOS, Linux, Solaris, AIX, Windows)
<botifico>
[haikuports/haikuports] Begasus e3f4034 - kcolorscheme6, bump version (#10935)
<jhj>
Also, the "pigz" package is currently broken on Haiku, it exits with an error about library mismatch.
<avanspector[m]>
<waddlesplash> "why do you ask?" <- Every other OS has them, and in that sense they are really powerful because I can reproduce most of synchronization primitives just using thin abstraction on top of futexes
<avanspector[m]>
in my libraries that I use across OSs
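For reference, a minimal sketch of the kind of thin abstraction avanspector[m] is describing: a three-state mutex built directly on the Linux futex(2) syscall, following the well-known pattern from Drepper's "Futexes Are Tricky". This is Linux-specific illustration code, not a Haiku API.

```c
/* Minimal futex-backed mutex sketch (Linux only; illustration, not a Haiku API).
 * Word states: 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters. */
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stddef.h>

static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void mutex_lock(atomic_int *m)
{
    int c = 0;
    if (atomic_compare_exchange_strong(m, &c, 1))
        return;                                      /* fast path: 0 -> 1, uncontended */
    do {
        int old = 1;
        atomic_compare_exchange_strong(m, &old, 2);  /* try to mark contended */
        if (c == 2 || old != 0)
            futex(m, FUTEX_WAIT, 2);                 /* sleep while the word equals 2 */
        c = 0;
    } while (!atomic_compare_exchange_strong(m, &c, 2));
}

static void mutex_unlock(atomic_int *m)
{
    if (atomic_exchange(m, 0) == 2)                  /* there may be sleepers */
        futex(m, FUTEX_WAKE, 1);
}
```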
<jhj>
Linux though, can't test on Haiku right now.
<dovsienko>
the web site looks broken, but let's say there is a difference that cannot be explained. I tested on Linux too.
jmairboeck has joined #haiku
<gordonjcp>
phschafft: yeah
<gordonjcp>
phschafft: I only started learning German relatively recently in my mid-40s
<gordonjcp>
phschafft: it's not that difficult but I find myself sometimes trying to explain the German word I can't remember in Gaelic, which is no bloody use to anyone
OrngBomb has joined #haiku
OrangeBomb is now known as Guest1615
OrngBomb is now known as OrangeBomb
Guest1615 has quit [Ping timeout: 480 seconds]
illwieckz has joined #haiku
<Coldfirex>
jhj: It might be worthwhile to open a bug report and see if you can find around which hrev the issue started
<jhj>
Coldfirex: I think I should. Is there any snazzy full system profiler for Haiku I should know about?
* phschafft
nods to gordonjcp.
<Coldfirex>
I think we have a profiler, but that is out of my wheelhouse
<jhj>
Coldfirex: I never benchmarked it before scientifically, but I was just testing again recently and said "wow this is SLOW".
<jhj>
And the fact that the threaded version is 80% slower than the single-threaded version on Haiku, but only the expected 7-9% everywhere else, makes me think, it's a Haiku problem.
<jhj>
Also, we only distribute threaded binary builds, so that sucks for Haiku.
<Coldfirex>
Its possible. Maybe try on a fresh beta 4, then b4+updates, then a nightly?
<jhj>
not that we are a major project or anything, but still :)
<jhj>
Actually, it's 76% slower. But still, that's terrible.
<jhj>
That's with nightly from today.
<jhj>
Coldfirex: I recently broke my leg, so I'm stuck downstairs and can't easily use my office computer, I'm stuck with a crappy laptop downstairs. Maybe next time I get up there I setup VNC.
<jhj>
makes testing harder
<dovsienko>
jhj: maybe just send someone who is in shape to fetch your stuff to you
<jhj>
I don't trust anyone to move my desktop around. I can get upstairs but it's a pain, I have to scoot up sitting on each stair and haul the walker up.
<Coldfirex>
or virtualize it :)
<jhj>
My laptop has like 8GB of RAM and a small disk, so VNC will be better. I test on Haiku by SSH'ing into it now.
<jhj>
I would highly recommend not breaking your legs to everyone.
<Coldfirex>
good advice hah
* coolcoder613_
nods
coolcoder613_ is now known as coolcoder613
<coolcoder613>
My laptop has 8GB of RAM and a small disk, and it's the best machine I have
<coolcoder613>
(256GB)
<phschafft>
$in_the_good_old_times...
<coolcoder613>
It is an Apple M1 MacBook Air, so the performance *is* decent
<coolcoder613>
Or at least it is to me
* coolcoder613
is currently instaling BeOS on his Deskpro
<phschafft>
the more powerful machines became on avg the more I throttle them.
yann64 has quit [Quit: Vision[]: i've been blurred!]
<coolcoder613>
I got a PS/2 mouse finally
<coolcoder613>
phschafft: throttle them?
yann64 has joined #haiku
<phschafft>
and keep that in mind: 20 years ago it would be very uncommon to have a desktop running at 1% CPU while doing basically anything.
PetePete has joined #haiku
<phschafft>
coolcoder613: e.g. my laptop can go up to 4x 3.8GHz, but I set the limits for the CPU to 4x 1.6GHz. keeps it cooler when under stress, but hardly noticeable to me.
<phschafft>
I mean if you do something heavy for a second or two everything spins up, throttling them keeps it down to normal at the cost of adding like a second or two to your job.
<phschafft>
but waiting for an extra second is way less annoying than having it spin up and down the fan, it running hotter, and in the end breaking down earlier.
<jhj>
Oh, does anyone know if Haiku's setjmp/longjmp save the signal mask? On BSD, there are _setjmp/_longjmp that don't. Glibc before 2.19 saved by default, but now does not.
* phschafft
wonders why jhj needs those calls.
<jhj>
Haiku has all three of setjmp/_setjmp/sigsetjmp.
<jhj>
phschafft: We use it to implement faults and interrupts in our virtual machine.
<jhj>
Essentially our own exceptions.
<phschafft>
the sig*() variants are in POSIX.
<phschafft>
so they sound like good candidates.
<phschafft>
jhj: 'we'?
<jhj>
POSIX doesn't specify signal mask behaviors, which is why some OS's invented the variant.
<jhj>
"It is unspecified whether longjmp( ) restores the signal mask, leaves the signal mask unchanged, or restores it to its value at the time setjmp( ) was called."
<dovsienko>
jhj: that's a good point, I didn't know the signal mask needs attention
<jhj>
We don't care about saving and restoring it, so we'd prefer to use the fastest variant.
<jhj>
That doesn't address the 75% speed difference between the threaded and non-threaded builds on Haiku, just an unrelated possible performance optimization.
<jhj>
That's just for the small list of OS's where it's faster to use the underscore variants.
<jhj>
I guess I'll just need to benchmark it on Haiku or read the source.
<phschafft>
they're all on the same lineage. fun.
<jhj>
Glibc <2.19 also has the slower signal mask preserving longjmp by default, but that's old enough now I don't think I need to bother special-casing it.
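For completeness, the portable way to get the "don't save the signal mask" behaviour is sigsetjmp()/siglongjmp() with a zero savesigs argument, which POSIX does specify; a small sketch of the fault-dispatch idea jhj describes (the structure and names are illustrative, not the actual dps8m code):

```c
/* Emulated fault dispatch via sigsetjmp/siglongjmp.  Passing 0 as the second
 * argument tells sigsetjmp NOT to save the signal mask, so siglongjmp will not
 * restore it either -- the cheap variant, but with POSIX-specified behaviour. */
#include <setjmp.h>
#include <stdio.h>

enum { FAULT_NONE = 0, FAULT_ILLEGAL_OP };

static sigjmp_buf fault_env;                 /* jump target for emulated faults */

static void raise_fault(int fault)
{
    siglongjmp(fault_env, fault);            /* unwind back to the dispatch loop */
}

static void execute_one_instruction(int opcode)
{
    if (opcode < 0)
        raise_fault(FAULT_ILLEGAL_OP);       /* emulated exception */
    /* ... normal execution ... */
}

int main(void)
{
    int fault = sigsetjmp(fault_env, 0);     /* 0 => don't touch the signal mask */
    if (fault != FAULT_NONE) {
        printf("handled fault %d\n", fault);
        return 0;                            /* stop after demonstrating one fault */
    }
    execute_one_instruction(-1);             /* triggers the fault path */
    return 0;
}
```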
<botifico>
[haikuports/haikuports] jmairboeck 7e43a9c - libtool: bump version (#10938)
walkingdisaster has joined #haiku
SLema has quit [Quit: Vision[]: i've been blurred!]
SLema has joined #haiku
<waddlesplash>
jhj: knowing when the regression started could be useful
<waddlesplash>
I am surprised your program isn't much faster multi threaded even on Linux though
freddietilley has quit [Quit: WeeChat 4.2.2]
<jhj>
waddlesplash: The 7-9% threading slowdown is synchronization stuff. The benchmark in both cases is testing just a single thread.
<waddlesplash>
ah
<jhj>
So in the multithreaded version, you can run, say, 12 emulated CPUs, and each could run that benchmark.
nipos has left #haiku [Error from remote client]
<waddlesplash>
but the multi threaded version with a single thread on Haiku is 75% slower, is that what you're saying?
<jhj>
If you only have one CPU, or you want to run on embedded hardware and don't care about emulating a multiprocessing system, then you could just build the single threaded version.
<jhj>
And yes, the multithreaded version with 1 CPU thread runs our benchmark 75% slower than the single threaded build.
nipos has joined #haiku
<waddlesplash>
odd
<waddlesplash>
pretty much all the multi threading primitives on Haiku should not even call the kernel if used with only one thread
<waddlesplash>
so I don't know what this could be coming from
<waddlesplash>
how long does a build take? could I spin it up here?
<jhj>
waddlesplash: The exact numbers on my host machine just now were 14.0234 MIPS on the single threaded version, and 13.0755 on the multithreaded version.
<waddlesplash>
well it's using 100% CPU on one core here
<waddlesplash>
means we aren't getting lost in lock contention at least
PetePete has quit [Ping timeout: 480 seconds]
PetePete has joined #haiku
<waddlesplash>
jhj: how long should a run take?
<jhj>
I don't understand how it can be 75% slower on Haiku. But I'm positive it wasn't always that bad or I would have noticed it when I ported it originally.
<jhj>
It depends. It takes 20-some seconds here on Linux.
<waddlesplash>
it's been going over a minute here
<jhj>
That sounds "normal" for Haiku.
<jhj>
It'll take 75-80% longer.
<jhj>
So 2 or 3 minutes, depending on your hardware.
<jhj>
On a POWER10 with AIX 7.3, it takes 17 seconds / 18 seconds
<jhj>
Much quicker after profiling.
<waddlesplash>
I would suspect there are things you can optimize in the code itself if PGO makes that much difference
<jhj>
There's not much we can really do; there is a huge number of CPU instructions.
<waddlesplash>
anyway, appears there's a profiler bug here, if I kill the program while it's in progress I don't get results from any other thread
<waddlesplash>
using the whole-system profiler works
<waddlesplash>
guess I should look into that...
<jhj>
And optimizing QEMU with PGO gives the same 18-20% performance improvement with TCG.
<waddlesplash>
it's spending most of its time getting TLS addresses
<jhj>
Yes, we make extensive use of thread local storage.
<waddlesplash>
right, but it looks like you must be fetching it VERY often
<waddlesplash>
instead of fetching it just once at the top of a function
<waddlesplash>
that will probably speed things up everywhere if you fix that actually
<jhj>
Every cycle we have to fetch it more than once, because of the way the appending unit works.
<jhj>
That's by design.
<jhj>
Remember, only Haiku is slow.
<waddlesplash>
?
<waddlesplash>
why can't you fetch it once per cycle?
<jhj>
It's not a regular von Neumann machine design
<waddlesplash>
jhj: if you are fetching it multiple times per cycle on every OS then it's almost certainly a performance issue there too, those OSes are just faster at the fetch.
<waddlesplash>
what does that have to do with anything?
<jhj>
Some instructions can take hundreds of cycles, or thousands as well
<waddlesplash>
you should only need to fetch the TLS address *once* per function call here
<waddlesplash>
no, but the compiler generates many implicit calls to it
<waddlesplash>
if you access a _thread_local variable
yann64 has quit [Quit: Vision[]: i've been blurred!]
<waddlesplash>
so if you instead access the _thread_local variable just once and cache the result, massively reduces implicit calls
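To make that concrete, here is a small sketch (names hypothetical) of what caching the thread-local fetch looks like; under the dynamic TLS model each reference to a `_Thread_local` variable can turn into an implicit call such as `__tls_get_addr`, and after an opaque call the compiler generally has to reload the value:

```c
/* Caching a _Thread_local pointer in a local (hypothetical names). */
struct cpu_state { unsigned long pc, regs[8]; };

_Thread_local struct cpu_state *cpup;        /* set once per thread at startup */

__attribute__((noinline)) static void trace_step(void)
{
    /* stand-in for a call the optimizer cannot see into */
}

/* Every 'cpup' reference may need a fresh thread-local lookup, and the value
 * must be reloaded after the opaque call (the compiler can't prove it didn't
 * change): */
static void step_slow(void)
{
    cpup->pc += 1;
    trace_step();
    cpup->regs[0] = cpup->regs[1];
}

/* Fetching the thread-local once into an ordinary local keeps the pointer in
 * a register for the whole function.  (Valid only because we know the callee
 * never changes cpup -- knowledge the compiler doesn't have.) */
static void step_fast(void)
{
    struct cpu_state *c = cpup;              /* one thread-local lookup */
    c->pc += 1;
    trace_step();
    c->regs[0] = c->regs[1];
}

int main(void)
{
    struct cpu_state state = { 0 };
    cpup = &state;                           /* per-thread initialization */
    step_slow();
    step_fast();
    return (int)state.pc;                    /* 2 */
}
```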
<jhj>
on macos for example, the TLS lookup on their profiler comes in at 1.7%
<waddlesplash>
keep in mind that there is another big difference on Haiku here which is that every executable is built as relocatable
<waddlesplash>
this means we cannot use the "static" TLS model
<waddlesplash>
and the "dynamic" TLS model needs a lot more function calls on every OS, that's just how it works
<waddlesplash>
so, if you are building a non-PIC executable on Linux and macOS, you may get the static TLS model and then it's faster because of that
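For reference, those TLS models can be selected globally with `-ftls-model=` or per variable with a GCC/Clang attribute; a short sketch (variable names hypothetical):

```c
/* The four ELF TLS access models, roughly from most to least general:
 *   global-dynamic : always valid, even for dlopen'ed objects;
 *                    accesses may go through __tls_get_addr()
 *   local-dynamic  : like global-dynamic, but for symbols local to one module
 *   initial-exec   : assumes the module is present at program start;
 *                    access becomes a thread-pointer-relative load via the GOT
 *   local-exec     : main executable only; a fixed offset from the thread pointer
 *
 * Globally:        gcc -ftls-model=initial-exec ...
 * Per variable (GCC/Clang extension): */
extern _Thread_local int gd_var __attribute__((tls_model("global-dynamic")));
extern _Thread_local int ie_var __attribute__((tls_model("initial-exec")));
static _Thread_local int le_var __attribute__((tls_model("local-exec")));

int read_le(void) { return le_var; }   /* keeps le_var referenced */
```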
<jhj>
so I don't think it's my code, really. And even if it was, Haiku is the huge outlier of the 17 OSs we support (AIX, OpenBSD, NetBSD, FreeBSD, DragonFly, Android, Windows, etc.)
<jhj>
also, we enforce the global-dynamic model on AIX
<jhj>
and we build -fPIC on Linux as well
<waddlesplash>
hm
<waddlesplash>
well, I don't know why it calls get_tls_address so often then
<waddlesplash>
maybe rebuilding with Clang might be interesting...
<jhj>
In fact, on AIX where we enforce global-dynamic for various reasons, it's one of the fastest operating systems we run on (per clock cycle)
mmu_man has quit [Ping timeout: 480 seconds]
<waddlesplash>
that's surprising and probably means you could optimize a lot on Linux more.
<jhj>
I don't think clang was different but I can check pretty quick
<jhj>
We originally wrote this on AIX and FreeBSD
<waddlesplash>
jhj: ok, so this isn't helped by the fact that I'm running a debug build of runtime_loader, which is where those TLS addresses come from
<waddlesplash>
but seeing as we have over a thousand hits of TLSBlock::IsInvalid(), this indicates that we are calling these methods a really ridiculous number of times
<waddlesplash>
because that method is literally *just a null pointer check* and nothing else
<waddlesplash>
so for us to get over 1000 hits of it when *only sampling every 1 ms*, indicates just how many times it's invoked (a really ridiculously high number)
<jhj>
I just built with Clang on Haiku, let me run the benchmark.
<phschafft>
waddlesplash: how is the static model different?
<jhj>
my $89 android tablet beats my desktop i7 running Haiku
<jhj>
multithreaded builds on both
<phschafft>
also, the function that returns the TLS address back to the compiler: might it be marked with some attributes that help the compiler? e.g. that the result is constant per thread?
<waddlesplash>
jhj: there may be some TLS caching scheme there, I don't know. but if you adjust the code just a bit to call TLS methods less, it will surely be faster on all OSes and not just Haiku.
<waddlesplash>
phschafft: I am not familiar with the precise differences tbh
<phschafft>
waddlesplash: ok. thank you.
<waddlesplash>
phschafft: I doubt it, because it's not constant per thread
<waddlesplash>
this is ELF TLS, so we get a different result based on what offset is specified
<jhj>
waddlesplash: even without optimization problems, this is definitely a Haiku issue
<phschafft>
the address of a given variable changes per thread at runtime?
<jhj>
on Android for example, the difference between threaded and non-threaded is less than 3%
<waddlesplash>
jhj: I don't know about that. It may be a difference in how GCC generates code for -fPIC -shared
<waddlesplash>
I don't really see how these methods could be optimized much more
<jhj>
waddlesplash: Well well.
<waddlesplash>
we could add a most-recently-accessed cache
<waddlesplash>
but that's about it
<jhj>
Clang build is 5 MIPS for threaded.
<jhj>
GCC is 1
<waddlesplash>
and what's Linux on the same machine?
<jhj>
Clang builds this code significantly slower as well.
<waddlesplash>
so, Clang is much better about not calling the TLS routines probably then
<jhj>
waddlesplash: Linux Clang gets about 7 without profiling
<waddlesplash>
ok
<jhj>
Linux GCC is about 10
<phschafft>
I would maybe hint that it is not a problem in the software xor Haiku but most likely both could help. ;)
<waddlesplash>
jhj: the differences really should not be so big. this really indicates you can optimize your code more
<jhj>
But this wasn't always the case.
<jhj>
I'm quite sure it wasn't this slow before; I would have noticed.
<waddlesplash>
possibly a change in GCC 13, no idea
<jhj>
And our code is very optimized now, at least the parts that do memory addressing. We've been keeping track of it over time.
<waddlesplash>
the fact that Clang gives 8.5 and GCC gives 10.3 indicates that there are probably more things you could do
<waddlesplash>
or it's worth digging into and reporting to Clang upstream
<jhj>
This is very similar to building QEMU on the same machine.
<jhj>
Clang builds are about 15-20% slower with TCG on the same machine.
<waddlesplash>
jhj: there really isn't any way to optimize the get_tls routines here, they are pretty simple. Adding a thread-local cache of the last-accessed DTV is about all I can think of. But that wouldn't speed things up by much
<jhj>
Doing a PGO build improves clang much more than it improves GCC also.
<phschafft>
can you isolate the performance problem to a small section of code?
<waddlesplash>
Not when there are tens of thousands of calls to this method
<phschafft>
maybe you could diff the generated code from gcc and clang.
<waddlesplash>
it's almost certainly just down to how often it does TLS fetches
<waddlesplash>
let's see what the code is actually doing
<waddlesplash>
jhj: actually I don't think we can really cache this. We have to recheck the "Generation" every time
<jhj>
OK, so, with Clang, the difference between threaded and non-threaded versions is about 9%
<waddlesplash>
this is for Android but it appears to detail a lot of what's going on
<jhj>
Android only gives us a 3% difference or so between threaded and non-threaded, but we use Clang from the NDK there.
<jhj>
On essentially every other system except macOS, GCC is faster, but the difference between the threaded and non-threaded benchmarks is constant.
<waddlesplash>
it really shouldn't be
<waddlesplash>
probably fixing the thread local accesses will make the threaded-but-one-thread benchmark just as fast as the regular benchmark on all OSes
<jhj>
Even so, I've done this test on z/OS USS, AIX, Solaris, illumos, OpenBSD, NetBSD, DragonFly, FreeBSD, Windows, macOS, Android, and even oddballs like SerenityOS and GNU/Hurd, and Haiku is so far the only one that has had any weird performance issue.
_-Caleb-_ has left #haiku [#haiku]
<jhj>
And I'm also extremely confident it wasn't always so, because I would have immediately noticed it when I initially did the port.
_-Caleb-_ has joined #haiku
<waddlesplash>
yes, but I guarantee this is something that can be fixed on your end
<waddlesplash>
#define cpu (* cpup)
<waddlesplash>
this is the source of the problem, I bet
<jhj>
That's the CPU state, which we do want to be local to each running thread.
<waddlesplash>
yes
<waddlesplash>
but this means you are doing a TLS access *EVERY TIME* you access the CPU structure
<waddlesplash>
you should instead read it once at the beginning
<waddlesplash>
and pass it through to all functions
<waddlesplash>
probably Clang is just better at caching this variable, but GCC is more aware that it could change
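A minimal sketch of the restructuring being suggested, keeping the `#define cpu (*cpup)` macro quoted above for existing call sites while the hot path passes the pointer explicitly (type and field names hypothetical and heavily simplified):

```c
/* Hot-path restructuring: fetch the thread-local pointer once, pass it down. */
struct cpu_state { unsigned long PPR, TPR; };     /* hypothetical, simplified */

_Thread_local struct cpu_state *cpup;
#define cpu (*cpup)            /* existing pattern: every 'cpu.' use goes
                                  through the thread-local pointer */

/* Before: each 'cpu' reference in a helper may redo the thread-local lookup. */
static void append_cycle_tls(void)
{
    cpu.TPR = cpu.PPR;
}

/* After: callees take the pointer as an argument and never touch TLS. */
static void append_cycle_arg(struct cpu_state *restrict c)
{
    c->TPR = c->PPR;                              /* plain register-based access */
}

static void run_cycles(long n)
{
    struct cpu_state *restrict c = cpup;          /* one lookup per outer call */
    while (n-- > 0)
        append_cycle_arg(c);
}

int main(void)
{
    struct cpu_state state = { .PPR = 1 };
    cpup = &state;
    append_cycle_tls();
    run_cycles(4);
    return (int)state.TPR;                        /* 1 */
}
```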
<jhj>
waddlesplash: On Linux with VTune, TLS access is <2% of the difference, and mutexes and pthread cond wait makes up the rest.
<jhj>
with both gcc and clang
<waddlesplash>
if you are using non-PIC then that may be the case
<waddlesplash>
but I don't see how this could really be much faster on PIC
<waddlesplash>
getting the compiler to cache it across function calls is about it
<waddlesplash>
jhj: let's put it this way, even if we optimize things in Haiku more, you are still doing a function call every single time this variable is accessed, in the general case.
<waddlesplash>
avoiding that will surely be faster everywhere
<phschafft>
can you maybe replace *cpup with something like *get_cpup(); static inline get_cpup(void) __attribute__ ((pure)); static inline get_cpup(void) { return cpup; }
<phschafft>
just to tell the compiler that it is safe to cache the value. and just as a test if that makes any difference.
<waddlesplash>
how would that tell the compiler it can be cached?
<jhj>
No, I just checked and built with an explicit $ env LDFLAGS=-fPIC CFLAGS=-fPIC gmake
<phschafft>
not fully sure if pure is the correct attribute.
<waddlesplash>
jhj: do you link with -shared?
<jhj>
on Linux, and the results are exactly the same
<waddlesplash>
if you don't link with -shared, then the linker may collapse all the TLS
<phschafft>
waddlesplash: it would tell the compiler that the result of get_cpup() is to be cached.
<phschafft>
e.g. in loops.
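A compilable version of phschafft's getter idea (attribute placement and the missing return type filled in; the type name is hypothetical, and whether `pure` is actually safe here is exactly the open question):

```c
struct cpu_state;                                /* opaque; hypothetical name */

extern _Thread_local struct cpu_state *cpup;

/* Promise the compiler that calls to the getter have no side effects and may
 * be merged/hoisted (e.g. out of loops).  This is only a hint for testing:
 * it leans on the fact that cpup never changes within a thread. */
static inline struct cpu_state *get_cpup(void) __attribute__((pure));

static inline struct cpu_state *get_cpup(void)
{
    return cpup;
}

#define cpu (*get_cpup())
```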
<milek7>
accessing var is surely also pure
<waddlesplash>
yes
<waddlesplash>
well.
<waddlesplash>
I'm not sure what TLS accesses count as actually
<phschafft>
milek7: I'm suspecting that gcc for some reason thinks it's not.
<jhj>
waddlesplash: linking what with -shared?
<waddlesplash>
the whole binary
<waddlesplash>
jhj: yeah, you access this variable across multiple contexts in separate files. no wonder this is slow
<waddlesplash>
if you passed it through as an argument, definitely will be way faster
<waddlesplash>
jhj: also, your incremental builds are even slower than Haiku's incremental builds. if they're even incremental at all?
<jhj>
We aren't building any shared libraries. You can't link a whole program with -shared
<waddlesplash>
which is impressive
<waddlesplash>
jhj: you can. Every program on Haiku is.
<waddlesplash>
that's probably a large part of the difference
<waddlesplash>
like I said, *every* application on Haiku is built with -shared
<waddlesplash>
it's implicit in the compiler specs
<jhj>
I can do it on AIX, Solaris, USS, etc.
<jhj>
All the BSDs
<jhj>
and can prove it if you don't believe me :)
<milek7>
I think it's just entry point problem
<jhj>
Even the docs say that's for building objects or libraries and doesn't make an executable.
<jhj>
You mean -pie?
<waddlesplash>
it makes one on Haiku
<waddlesplash>
no, again, on Haiku applications are linked with -shared too
<milek7>
$ /lib/libc.so.6
<milek7>
GNU C Library (GNU libc) stable release version 2.38.
<phschafft>
milek7++
<phschafft>
that was the example I was thinking about as well.
<waddlesplash>
yeah
<jhj>
Well, anyway, on Linux, a PIE build is also no different in speed. I built just now explicitly, without -fPIC or -pie, and the average benchmark was less than 1% faster
<jhj>
So I think this is kind of a red herring.
<waddlesplash>
PIE is still not -shared
<waddlesplash>
Haiku doesn't even support PIE, iirc
<phschafft>
I think FreeBSD was the only ELF-based system I tested that enforced ELFs to be an executable *and* have exactly the correct ABI set.
<phschafft>
while all other systems just loaded it and started INIT.
<milek7>
you could remove thread_local from that var and see if that changes anything
<waddlesplash>
milek7: it does, it's way faster
<waddlesplash>
that's what the "Single threaded" builds do
<waddlesplash>
you don't need to change it everywhere
<jhj>
Especially since, like I mentioned, Haiku is the single outlier OS out of more than a dozen.
<waddlesplash>
keep the macro as-is and pass the argument through instead of fetching it
<waddlesplash>
okay, well, it would be interesting to test with -shared on Linux
<waddlesplash>
apparently there is some way to make that work as milek7 showed above
<jhj>
Still, I can't imagine we are the only affected application either.
<waddlesplash>
I can't think of any other application I have encountered that uses thread-local variables like this
<waddlesplash>
it's not what they're designed for
<waddlesplash>
fetching them will always be slow, it's an implicit function call, unless the compiler manages to cache it
<waddlesplash>
but it *CAN'T* cache it if you call across modules
nosycat has quit [Quit: Leaving]
<waddlesplash>
so, every function call to a function that the compiler can't see into, it MUST refetch
<waddlesplash>
after that
<waddlesplash>
so no matter what this will be inefficient on all OSes, maybe the others have inline asm to cache the results and thus make this faster
<waddlesplash>
but it's still not great
<jhj>
waddlesplash: We link with LTO, so I don't understand why the compiler couldn't see things.
<jhj>
We used to have some crazy concatenation system that we stole from Chrome or something.
<waddlesplash>
webkit also did something like that yes
<waddlesplash>
still
<jhj>
Also, Clang on Haiku isn't using LTO, and it's about 5x faster than the GCC build.
<waddlesplash>
you are then depending on the compiler to do value propagation in a massive way
<jhj>
So something isn't working like it should here.
<waddlesplash>
that's not optimal
<jhj>
waddlesplash: A debug build on webkit is easily 400% slower than the release builds :)
<waddlesplash>
sure
<jhj>
Our debug builds are always massively slower, but we do all sorts of additional things in them, so it's not exactly equivalent.
frkazoid333 has quit [Ping timeout: 480 seconds]
walkingdisaster has quit [Quit: Vision[]: i've been blurred!]
<jhj>
We aren't calling any functions indirectly. We're just doing the equivalent of 'modify_dsptw(cpup->TPR.TSR);' and such, where cpup is thread local, and never changes within a thread.
<jhj>
I'm not sure how Haiku is the only OS where this doesn't work.
<waddlesplash>
a thread local variable access is an indirect function call
<waddlesplash>
that's how it works
<waddlesplash>
the compiler can sometimes cache the result of this indirect function call to avoid calling it multiple times
<waddlesplash>
but it still has to call it
<jhj>
I wonder if we can see why Clang is working fine and GCC isn't.
<waddlesplash>
probably Clang manages to cache the result more
<waddlesplash>
that's it
<jhj>
But that isn't the case, because we use GCC on every other platform as well.
<jhj>
In fact, we recommend using GCC, because it usually is somewhat faster than the alternatives.
<waddlesplash>
again, there are different TLS models here
<waddlesplash>
if you are using a static or PIE executable, the compiler can optimize more
<jhj>
waddlesplash: Even when I explicitly do not, the results don't change.
<waddlesplash>
probably because e.g. it knows this won't be unloaded like a shared library could be, I think
<jhj>
And this is with the global-dynamic model.
<waddlesplash>
but you didn't test on Linux with -shared
<waddlesplash>
there are 4 different TLS models
<waddlesplash>
well, it's possible our compiler is somehow configured differently but I don't know how. clearly Clang manages to figure something out
<jhj>
I'm aware, we explicitly set -ftls-model=global-dynamic on Linux.
<jhj>
At least, I did when I just tested.
<jhj>
I even tested the same version of GCC (13.3.0) on Linux and Haiku.
<waddlesplash>
do you set that on Haiku?
<jhj>
I didn't, but I can. Let me see.
<phschafft>
also GCC's support *for* Haiku might be less good than for Linux. So maybe it can do some more magic on Linux.
<jhj>
phschafft: Well, not just Linux, we recommend GCC everywhere except for macOS and AIX.
yann64 has joined #haiku
<jhj>
And on AIX, we recommend IBM's expensive commercial compiler if available, but GCC does just fine.
<jhj>
There is mainline LLVM support for AIX now because IBM ported Rust, I haven't benchmarked mainline Clang there yet tho.
<phschafft>
jhj: and how does that change my statement?
<jhj>
OK, I'm building now with global-dynamic forced.
<jhj>
phschafft: Oh, it doesn't change your statement, I was just generalizing that it isn't a Linux-specific improvement.
<jhj>
waddlesplash: About to test in a moment. I assumed that global-dynamic is the default, but I'm probably wrong because Haiku is weird. :)
<jhj>
It's not done yet, so it's going slowly. I'll see about forcing the other models, just to see what we get.
<jhj>
Once it finishes.
<phschafft>
jhj: My argument was more that you're comparing compiler support for Haiku with support for other systems that have tens of millions more users.
<phschafft>
so likely there is better support in gcc for those systems.
<milek7>
9,242567 MIPS
<milek7>
3,466251 MIPS
<milek7>
that's on linux gcc
<jhj>
My assumptions are that this is tied closer to ELF than anything more OS specific, but my assumptions are probably wrong.
<waddlesplash>
nope, it's all OS specific
<waddlesplash>
ELF specifies where the TLS needs to be stored and it is up to the OS to deal with that
<waddlesplash>
and up to the compiler to call the OS to get the storage location
<milek7>
second one is shared library (and main dlopened from another binary)
<waddlesplash>
jhj: well, look at that!
<waddlesplash>
milek7 proves it's not Haiku :D
<waddlesplash>
Linux is faster than we are, sure, but it's still a massive performance hit
<jhj>
Yeah, interesting. That's still about 60% slower.
PetePete has quit [Ping timeout: 480 seconds]
PetePete has joined #haiku
PetePete has quit [Read error: Connection reset by peer]
<jhj>
milek7: What happens if you build with -ftls-model=initial-exec in that case?
<jhj>
You'd need to link dps8 at link time vs dlopen though.
<jhj>
runtime_loader: Static TLS model is not supported.
<jhj>
waddlesplash: Does that apply even to a statically linked binary?
<waddlesplash>
we don't support statically linked binaries
<waddlesplash>
you must dynamically link at least to libroot.so
<jhj>
Is the local-exec model supported?
<waddlesplash>
I don't know, but I suspect not?
<waddlesplash>
depends on what exactly that implies
gouchi has joined #haiku
_-Caleb-_ has left #haiku [#haiku]
<jhj>
"/boot/system/develop/tools/bin/../lib/gcc/x86_64-unknown-haiku/13.3.0/../../../../x86_64-unknown-haiku/bin/ld: /tmp//cc0yw5HI.ltrans0.ltrans.o: relocation R_X86_64_TPOFF32 against symbol `cpup' can not be used when making a shared object"
<jhj>
Yeah, apparently not.
_-Caleb-_ has joined #haiku
<milek7>
jhj: I think it will work with LD_PRELOAD
hightower2 has joined #haiku
<milek7>
5,709061 MIPS
<jhj>
Ah, faster, but still not as fast as the regular build.
<jhj>
Thanks.
frkazoid333 has joined #haiku
frkzoid has joined #haiku
frkazoid333 has quit [Ping timeout: 480 seconds]
<jhj>
waddlesplash: Thanks, I guess I'll play with some solutions, but maybe I can just not use thread-local storage for cpup at all.
<waddlesplash>
jhj: I'm working on seeing how feasible that is
<jhj>
Well, I could do it like we do for ROUND_ROBIN
<waddlesplash>
but yes, that's what should be done
<waddlesplash>
how's that?
<jhj>
That's a debugging feature we have, it runs all the CPUs in 1 thread.
<waddlesplash>
you just need to pass a state variable all the way through to all CPU functions
<jhj>
It's there to ensure reproducibility, it runs one instruction and then goes to the next CPU.
<gordonjcp>
jhj: Honeywell DPS8?
<jhj>
yessir
<gordonjcp>
jhj: I'm probably one of the youngest to use CP6 at Robert Gordon's University in Aberdeen, it was just getting pulled out the year I started (1991)
<gordonjcp>
jhj: I have somewhere got some printouts of manuals for it, and a mate of mine from school who's about three years older than me wrote a simple mail system for it after the admins shut the built-in one off :-)
<gordonjcp>
jhj: I actually had access to it before I was at uni, although I was very very much not supposed to
<jhj>
gordonjcp: Actually, if you have any CP-6 stuff, feel free to join our Slack (ugh)
<jhj>
and let us know what you have.
<jhj>
We very much would run CP-6, if we had access to it.
<jhj>
CP-6 uses NSA (New System Architecture), which is the VU (virtual memory unit) instead of the AU (appending unit), which is the difference between the DPS-8/C and DPS-8/M spec CPUs.
<jhj>
GCOS-8 also uses VU/NSA.
<jhj>
We have the CP-6 source code, but not any binary tapes, and no access to a PL/6 compiler.
<jhj>
We don't currently support the VU, but we have more than enough documentation that it could be implemented, if we had something to run on it.
<jhj>
Until then, you are stuck using Multics (or GCOS-3) with us.
<jhj>
CP-6 ran on some additional hardware (the DPS-88 and 9000, IIRC) and by adding VU support we could support those CPU types.
<jhj>
Well, the 9000 (and the later systems as well, like the NovaScale, ACOS systems, M9600, etc.) have some additional CPU features like vector instructions.
<jhj>
But CP-6 doesn't use them.
<jhj>
gordonjcp: Anyway, I'm getting quite off the Haiku topic, but, we will have a new blog entry coming soon at https://dps8m.gitlab.io/blog/ where we'll give a status report.
<jhj>
It'll include begging for more CP-6 materials. We've already reached out to like 30 academic sites that ran CP-6 and haven't had any luck so far. :(
<jhj>
waddlesplash: A very dirty removal of thread-local storage changes the 7% speed difference to a 3% speed difference for me.
<jhj>
on Linux.
* phschafft
wonders if one could make a per thread overlay memory map that would map some CoW memory area into the area for TLS.
B2IA has quit [Ping timeout: 480 seconds]
<jhj>
The remaining difference of 3% could be made much faster for the 1-thread case, if we don't take any SCU/IOM locks until we've actually started >1 thread.
<jhj>
waddlesplash: anyway, thanks again.
<jhj>
Unfortunately, I'm not super motivated to immediately fix it for what amounts to a 3.5% speed improvement on all other platforms (but a 75% speed improvement on Haiku).
<waddlesplash>
I bet it will be more than that if the CPU field is passed in a register rather than requiring a global read
Anarchos has joined #haiku
<jhj>
phschafft: something like that could possibly be made to allow supporting the local-exec model.
<jhj>
But I have no idea what the implications are of all programs also being shared objects on that, and it would need someone way smarter than me who actually knows Haiku internals.
<milek7>
waddlesplash: it might be balanced out by spilling more arguments onto stack
<waddlesplash>
maybe, but a lot of this will probably get inlined
<waddlesplash>
so it may not matter
<phschafft>
jhj: oh, my comment wasn't about Haiku at all. just a thought how this could be done.
<Anarchos>
is it possible to compile on haiku x86_64 for haiku x86_gcc2 ?
<jhj>
Optimizations are interesting.
<phschafft>
also such a per-thread overlay could allow for other kinds of nice features. such as protection of memory between threads. I mean in an ideal world memory is only writeable if declared so. and only writeable by other threads if declared so. and only writeable by other processes if declared so, ... ;)
<milek7>
I think "per thread memory map" is also called a "process" :D
<jhj>
In the past, especially with Clang <14 and GCC <12, it made a pretty big difference to build dps8 with -march=native on AVX2 capable machines.
<jhj>
Now, it usually results in a build that's about 1-2% slower, while overall the baseline binaries (even on old checkouts before we made other optimizations) are faster.
<phschafft>
milek7: not really. there are a lot of other things you can share.
<phschafft>
such as the file descriptor table.
<jhj>
It's improved to the point where I'm not going to bother offering any AVX2 or greater optimized builds any longer.
<phschafft>
or basically any other structure the kernel has about you.
<gordonjcp>
jhj: I definitely don't have any CP6 media
<gordonjcp>
jhj: I wonder how hard a PL/6 compiler would be?
<jhj>
gordonjcp: Find someone that has CP-6 tapes hidden away and point them to us! We can recover 9-track tape data.
<jhj>
gordonjcp: A new PL/6 compiler would be non-trivial to say the least, even with our existing 6000-series PL/I backend.
_-Caleb-_ has left #haiku [#haiku]
_-Caleb-_ has joined #haiku
<jhj>
But even if we could recreate the compiler, and build the entire operating system, there isn't any guarantee that the compiler we create would generate the same or similar code as what exists on real media, with all the corresponding quirks.
<jhj>
Like, maybe the real distributed tapes have generated code for instructions we aren't emulating (or aren't emulating correctly).
Anarchos has quit [Quit: Vision[]: i've been blurred!]
B2IA has joined #haiku
Begasus_32 has joined #haiku
B2IA has quit [Quit: Vision[]: i've been blurred!]
B2IA has joined #haiku
Anarchos has joined #haiku
<jhj>
gordonjcp: Oh, for Multics, for example, only changed parts of the system were actually rebuilt, or, if there was a compiler bug that was identified, the identified miscompilations were rebuilt.
<jhj>
gordonjcp: We actually have a specific optimization for some inefficient code generated by old versions of the Multics PL/I compiler in dps8m.
<Skipp_OSX>
Multics is the primary example of why central planning does not work.
<Skipp_OSX>
Multics was a centrally planned OS, Unix the rogue decentralized OS. The centrally planned OS died an ignominious death like centrally planned nation-states, while the decentralized Unix and capitalist states thrive.
<Anarchos>
how to configure for a x86_gcc2 build ?
<Anarchos>
" ../configure --build-cross-tools x86_gcc2 --build-cross-tools x86_64 --cross-tools-source ../../buildtools/ --use-gcc-pipe -j5" did not work
<jhj>
Skipp_OSX: That's not really true. All of the features that were removed from Multics were just added back to Unix later, like volume management and such.
<jhj>
Skipp_OSX: Multics had a very long commercial history, and the 6000-series mainframes are still currently sold.
<jhj>
And it isn't like Multics is completely gone, we are releasing the next version of it probably before the year is done, and we are working on new hardware as well that implements the classic 6000-series CPU.
<Skipp_OSX>
The very name "Unix" is a play on words for "eunuchs", meaning a "cut-down" version of Multics.
<Skipp_OSX>
Sure, central planning does not go away, it languishes on, but it is not the source of real economic innovation.
<jhj>
I wouldn't say that it died an ignominious death at all either, as it was successful for its market segment. You have to remember that the PDP-7 that Unix was initially developed for was a machine that cost $72K in 1965. The 6000-series machines that Multics targeted were much larger systems, which were multimillion dollar installations.
<waddlesplash>
jhj: this compiles and runs, and with it I get 5.425542 MIPS for the benchmark on Haiku
<waddlesplash>
doubtless I missed stuff in #ifdefs
<jhj>
For example, one customer was the USGS which ran both Multics and Unix, and the Unix systems were mostly under $1M and the Mutlics systems cost about $40M.
<jhj>
This wasn't just cost for the sake of cost, these were much larger systems.
<Skipp_OSX>
I guess you're right that is too far, it languished into obscurity, but did not die an ignominious death.
<jhj>
waddlesplash: Awesome, I'll look in a bit!
<gordonjcp>
Skipp_OSX: also I'd disagree with the idea that capitalist states thrive
<waddlesplash>
jhj: it will be interesting to see what the performance difference is on Linux, if this improves things at all
<jhj>
Skipp_OSX: The main reason why it languished was management though, because your "central planners" picked another winner.
<Skipp_OSX>
only by accident, the inefficiency of the capitalist system is what allows it to thrive, engineers allowed to work on their own independent of the central planners, like Unix vs. Multics.
<jhj>
Skipp_OSX: If Honeywell would have released it under similar terms as Unix, it likely would have won in the long run.
<Skipp_OSX>
perhaps, but the deal with AT&T had already fallen through
<waddlesplash>
jhj: besides whatever I missed in disabled #ifdefs, you will have to check the logic in the main loop after setCPU goto I think... not sure if I got that right
<waddlesplash>
it works for one CPU anyway
<waddlesplash>
but with multiple CPUs, no idea
<jhj>
Skipp_OSX: The other impediment was that the hardware required to run Multics was protected by various patents or trade secrets. Those patents have expired and we were able to obtain internal documentation from Bull, but only after a very long time.
<jhj>
waddlesplash: I'm going to just benchmark it on Linux as is first, before I start fixing things further.
<gordonjcp>
jhj: Data General were even worse, almost nothing exists of any media or hardware from them
<Skipp_OSX>
Which is not a problem for a couple of rogue engineers with spare hardware and little supervision, and that's my point.
<jhj>
GE -> Honeywell -> Honeywell/Bull -> Bull -> Atos, so thankfully there is clear ownership
<jhj>
Atos still makes 6000-series systems, but they only support NSA/VU and not the AU, so they can't run Multics.
<jhj>
Skipp_OSX: Multics and the 6000 AU were designed together, as a system, not an existing system and then added software.
<Skipp_OSX>
Central planning created the B-2 Bomber and the Space Shuttle, and Multics expensive technology with little mass-market appeal.
<dovsienko>
on this note, OpenVMS is a current product that even runs on x86 (not for free), and the easiest way to try RISC OS is by running it on a Raspberry Pi
<dovsienko>
the latter being a very nice example of what Haiku could do
<jhj>
Skipp_OSX: The intention was never to make it a mass market product, it was intended to be a computing utility service, that is, something similar to telephone service or water or sewage.
<jhj>
It was cloud computing before it was popular.
<Skipp_OSX>
Sure B-2 Bomber and the Space Shuttle and Multics have their place, but I'm happy that we live in a society that allows the freedom for engineers to work outside those areas, even if it is an accident of history.
<Skipp_OSX>
Unix was never intended to be a mass-market product either, but here we are.
<dovsienko>
jhj: what do CP-6 tapes look like physically?
<jhj>
I mean, there is efficiency in having a power grid and centralized power stations, vs. everyone doing their own generation. Even with solar and the like.
<Skipp_OSX>
My point is that more change happened by accident than on purpose, and that is a feature of capitalism that is lost in a centrally planned system.
<jhj>
dovsienko: For CP-6, it is actually possible that the boot tape would have been an image on a CD-ROM at the end of its life.
<Skipp_OSX>
It's more an accident of capitalism than a feature, a bug that the capitalists would fix if they could, but can't.
<jhj>
Source code browser was distributed on CD for example.
<dovsienko>
Skipp_OSX: I have worked for genuine capitalist businesses that wiped their bottoms with valuable accidents on a regular basis, and it seems to be the rule rather than an exemption
<Skipp_OSX>
well there you go
HaikuUser has joined #haiku
HaikuUser has quit []
<dovsienko>
jhj: I will try to remember next time I am next to old hardware
<jhj>
waddlesplash: So, I didn't test multiple CPUs yet, I'm going to go through the patches, but I'm going to benchmark it now on my machine, vs. existing master (Linux)
<waddlesplash>
very good
<waddlesplash>
I expect we will at least get that 3.5% improvement, and possibly a lot more depending
<jhj>
I'll do the full (profiled) release build of each and average 3 runs.
jmairboeck has quit [Quit: Konversation terminated!]
hightower2 has quit [Remote host closed the connection]
<Skipp_OSX>
I am so close to getting these menu fields to display right: https://0x0.st/XtWU.png it is now truncating at the right spot, but why is there no drop down arrow?
<jhj>
waddlesplash: almost done, last run.
Begasus_32 has quit [Quit: Vision[]: Gone to the dogs!]
<Begasus>
k, done for today
<Begasus>
cu peeps!
<waddlesplash>
Skipp_OSX: why doesn't the layout system handle this properly?
Begasus has quit [Quit: Vision[]: i've been blurred!]
<Skipp_OSX>
because we're not using it I suppose, these are regular menu fields
<waddlesplash>
?
<waddlesplash>
the Find panel itself uses layouts
<waddlesplash>
so how or why does the layout manager size things wrongly?
<Skipp_OSX>
yeah it does that's true, I'm using SetExplicitSize() idk
<waddlesplash>
you should not need to
<waddlesplash>
something else has gone wrong if we need SetExplicitSize
<waddlesplash>
try SetExplicitMinimumSize() with something small
<Skipp_OSX>
What is the proper way to limit a menu field width?
<waddlesplash>
it's possible the view just has a too large computed minimum size and we should force a smaller one
<waddlesplash>
... oh, ResizeMenuField() is calling SetExplicitSize already
<waddlesplash>
why does it do that?
<waddlesplash>
can we get rid of that entire function?
<Skipp_OSX>
to limit the width of the menu field
<waddlesplash>
tbh I would either kill this function entirely, or call SetExplicitMaxSize rather than SetExplicitSize
<Skipp_OSX>
We could, and it would display correctly, but the menu field would be much wider than it currently is, it would be wide enough to fit the longest string
<waddlesplash>
then use SetExplicitMaxSize rather than SetExplicitSize
<waddlesplash>
the layout manager will then be able to do the right thing and pick a smaller size
<Skipp_OSX>
but then the menu field is too narrow :/
<waddlesplash>
?
<Skipp_OSX>
and then SetExplicitPreferredSize but that doesn't work either
<waddlesplash>
why not?
<Skipp_OSX>
idk
<Skipp_OSX>
I tried that, same result if I set the preferred size...
<Skipp_OSX>
let me try just setting the max content width and not SetExplicitSize() maybe that will work...
<waddlesplash>
Skipp_OSX: looks like ShowOrHideMimeTypeMenu() may be interfering
deneel has joined #haiku
<Skipp_OSX>
I don't think it is... but maybe let's see
<waddlesplash>
yes, I think it is
<waddlesplash>
because ExplicitMinSize is interfering too
deneel has quit []
<jhj>
waddlesplash: On Linux/GCC/glibc without your patch I get 12.0895 MIPS, and with it I get 12.3870 MIPS. That's a 2.46081% increase.
<waddlesplash>
any difference with Linux/Clang/glibc?
<jhj>
waddlesplash: Let me check.
<waddlesplash>
Skipp_OSX: there is definitely some bug with SetExplicitMaxSize. I see what you are talking about, it's got way too much space around it in that case
<waddlesplash>
Something looks wrong there. Not sure what
<waddlesplash>
whether it's a layout bug or elsewhere, but definitely there's a bug
<Skipp_OSX>
yeah, it's annoying
<waddlesplash>
well, it may be worth debugging that
<Skipp_OSX>
well _BMCMenuBar_ (the class inside menu field that handles the internal menu bar) is handling the insets correctly, but menu item is not once you SetMaxContentWidth
<waddlesplash>
I think the menu field isn't handling explicitly set sizes properly?
<waddlesplash>
either way there is clearly some bug
<Skipp_OSX>
yeah... I've deep dived into code trying to fix but it is elusive
<waddlesplash>
step a debugger through the layout code?
<waddlesplash>
clearly the menufield thinks it is larger than it actually is
<waddlesplash>
and renders something larger than its actual size
<waddlesplash>
so, something is not checking a value somewhere
<waddlesplash>
jhj: if this was a 2.5% performance win by itself, I wouldn't be surprised if more wins are possible with a few more minor changes. Note that there are a few places where I didn't bother passing the cpup, those could be further refactored, but they looked like "cold" areas (fault handler etc.)
<jhj>
waddlesplash: Linux/Clang/glibc without patch: 10.8738, with patch: 12.1210, so +11.4698% increase. But it's still slower than the GCC build -2.14741%.
HaikuUser has joined #haiku
HaikuUser has quit []
Anarchos has quit [Quit: Vision[]: i've been blurred!]
mmu_man has quit [Ping timeout: 480 seconds]
<jhj>
waddlesplash: I bet this will hurt some things that are register starved, but thankfully there aren't many of those platforms.
<jhj>
i586 and such, but performance on 32-bit machines sucks already.
<jhj>
It's no fun emulating a 36-bit/72-bit machine on a 32-bit machine :)
<jhj>
Also for the 32-bit machines, we have to do our own 128-bit math, which is probably slower than what the compiler provides.
<waddlesplash>
doesn't clang at least support int128 on 32bit?
<jhj>
Oh, I'd have to check. We have a define (NEED_128) which uses our code instead of the compiler's, or when it isn't provided.
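For what it's worth, GCC and Clang predefine `__SIZEOF_INT128__` on targets where `__int128` exists (typically 64-bit targets), so the kind of selection jhj describes usually looks something like this (NEED_128 is the project's own define; the rest is an illustrative sketch, not the actual dps8m code):

```c
#include <stdint.h>

/* Pick between the compiler-provided 128-bit type and a hand-rolled fallback. */
#if defined(NEED_128) || !defined(__SIZEOF_INT128__)
typedef struct { uint64_t h, l; } uint128_s;   /* fallback representation */
#  define HAVE_NATIVE_INT128 0
#else
typedef unsigned __int128 uint128_s;
#  define HAVE_NATIVE_INT128 1
#endif
```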
<waddlesplash>
jhj: if Clang has that much of an improvement on Linux it probably at least has something of an improvement elsewhere
<jhj>
I didn't before, so if it does now it must be recent.
<waddlesplash>
but yes, this patch overall should reduce the compiler-dependent variations in code performance
<waddlesplash>
jhj: if you are using LTO then it may not matter much for register starved things, we will just spill onto the stack, which will then be a memory access same as it was before
<waddlesplash>
jhj: interesting that the Clang build is now faster than the GCC one was pre-patch
mmu_man has joined #haiku
<jhj>
waddlesplash: We use LTO everywhere we can, yes.
<jhj>
waddlesplash: We did some benchmarks of the same crappy Rpi ARM SBC in 32-bit vs. 64-bit mode, and it was about half the speed in 32-bit.
<jhj>
Which is sort of expected. :)
<dovsienko>
jhj: the only time I saw an IBM 360 was in a corner of a museum
<waddlesplash>
makes sense
<jhj>
dovsienko: There's a cool book about all the IBM 360's joining together to take over.
<jhj>
> told that it has taken over almost every computer in the US (somewhat dated with 20,000 mainframes with a total of 5,800 MB), and is now fully sentient and able to converse fluently in English.
<jhj>
:)
<phschafft>
jhj: was wondering about having some tape device. just for decoration. but wow are they expensive.
<jhj>
waddlesplash: Obviously we wrote the code first for single threaded and then changed it to be multithreaded long after.
<jhj>
The memory access code is pretty finely tuned, and was hard to get right, done with proper atomics and such.
<jhj>
waddlesplash: I don't know why Clang is generally lower performance everywhere else vs. GCC.
<Skipp_OSX>
"The content area is where the item label is drawn; it excludes the margin on the left where a check mark might be placed and the margin on the right where a shortcut character or a submenu symbol might appear."
<Skipp_OSX>
ok, so that answers that question, the menu item content width should not include padding even if you call SetMaxContentWidth
<jhj>
waddlesplash: thanks for looking at it, I had no idea that Haiku was loading *everything* as a shared object, which is what kind of threw me off and I didn't quite understand what you were talking about at first :)
<waddlesplash>
yep
<jhj>
I didn't even know such a thing was in the realm of possibility :)
<waddlesplash>
we do this for a number of reasons, but one of them is that any application can have replicants
<jhj>
waddlesplash: Yes, that was with LTO+PGO also.
<waddlesplash>
which need to be loaded as shared objects into Deskbar and Tracker
<waddlesplash>
so applications may need to be loaded as shared objects
<waddlesplash>
hence, everything's built as shared
<waddlesplash>
there are other reasons I think, too
<jhj>
It is an interesting choice, but I do have to wonder if there are other applications besides ours that have a performance hit.
<jhj>
and wonder if there is anything that can be done to work around it in a more general sense.
<jhj>
waddlesplash: for our released binaries that we build for Linux on our website, we actually build musl libc with LTO too.
<jhj>
For Linux, we can actually target 2.6.x (next release we are going to target 3.2.x though) by itself and ship fully static binaries, and we get quite a bit of performance improvement by being able to inline libc functions directly into our code.
<jhj>
Especially for an emulator in general.
<jhj>
It's about a 15% improvement.
<milek7>
what are libc functions doing so much in emulator?
<jhj>
We've had a couple people ask about why our binaries are faster than what they build, and we explain we build various toolchains with crosstool-ng with LTO.
<waddlesplash>
jhj: it may be less of one now
<waddlesplash>
libc functions should get out of the hot path now after not using _thread_local everywhere
<jhj>
milek7: Well, libm stuff more so.
<jhj>
Running numeric benchmarks in the simulator for example.
<waddlesplash>
function calls are pretty fast on x86_64
<waddlesplash>
however, without full LTO, the compiler can't know that those functions don't mess with global state
<waddlesplash>
so the performance improvement may have again been _thread_local caching
Manboy has quit [Ping timeout: 480 seconds]
<jhj>
Possibly, but for numeric code we did get a pretty decent improvement bringing in a static libm alone.
deneel has joined #haiku
<waddlesplash>
hm, ok
<jhj>
We used to have builds that used Julia's Openlibm code and static linked it.
<jhj>
But even a 15% or so improvement in the math code wasn't worth the hassle/complexity to the build system.
<jhj>
(for most users)
<jhj>
also there were some things where glibc or musl libm was faster than julia openlibm
<jhj>
waddlesplash: something I need to do is figure out how to run our code through cachegrind or what not on ARM and see what we can do to improve it that way.
<waddlesplash>
yes
<waddlesplash>
well, "perf record" is probably better
<jhj>
We have a special build that's not documented well (PERF_STRIP), that builds ONLY the CPU code alone.
<jhj>
No threading, no I/O, no SCU/IOM, no devices, just the CPU.
<jhj>
Which we have profiled the heck out of.
<jhj>
But it's too slow to do anything non-trivial in valgrind.
<jhj>
waddlesplash: In fact, Multics won't even boot on a system that is considerably slower than a GE-645, because of various timeouts that exist in the code and aren't wrong.
<waddlesplash>
again, "perf record"
<waddlesplash>
doesn't slow down things nearly as much because it's not an emulator
<waddlesplash>
alternatively, "Very Sleepy" for Windows builds is a very nice profiler
<jhj>
waddlesplash: problem is that for an optimized build, a lot of stuff is going to be inlined and constant propagated and cloned tho, so I'm not sure how much it is going to reflect what the code really looks like.
<jhj>
I guess I could just try it :)
<jhj>
I should create a somewhat less optimized build for this task, between our release and testing builds.
<waddlesplash>
perf record should work just fine on ReleaseWithDebugInfo builds
<jhj>
TESTING=1 does stuff like use GCC or Clang's trivial-auto-var-init to set a pattern for all memory, etc.
<jhj>
and it's quite slow because it's built with -fno-inline too
<jhj>
waddlesplash: I already have an issue open about it but, if you have any pull with this project, or with PulkoMandy etc, maybe we can get getconf into the system for R1b5?
gouchi has quit [Remote host closed the connection]
<jhj>
It's needed for POSIX conformance anyway!
<PulkoMandy>
It's available in the depot if you need it, I think?
<jhj>
PulkoMandy: It is, yes!
<PulkoMandy>
But yes, maybe it should be part of the development feature included in the nightlies
<jhj>
But Haiku nicely includes essentially everything to build most anything I've made in the default installation with the optional components installed, sans that.
<jhj>
It becomes a problem because I *extensively* use things like: "env PATH="$(command -p getconf PATH)" grep" or "env PATH="$(command -p getconf PATH)" sed" (and awk)
<jhj>
Because there are still horrible systems out there like Solaris.
<jhj>
Returns: "/usr/xpg4/bin/awk" and "/usr/bin/awk", for example.
<jhj>
And the default awk isn't fully POSIX conforming.
<jhj>
On Solaris (but thankfully mostly not on illumos systems like OpenIndiana), the default awk, grep, find, sed, and sort, in particular, aren't POSIX conforming.
<jhj>
So every makefile I write does something like "AWK?=env PATH=$(command -p getconf PATH) awk".
<jhj>
Or more often, something worse like 'AWK?=env PATH=$(shell command -v gawk 2> /dev/null || env PATH="$(command -p getconf PATH)" awk'
<jhj>
missed a parens there, but you get the idea.
<jhj>
PulkoMandy: I need this for at least Solaris, AIX, z/OS USS, and something else I forget offhand.
<jhj>
thus all that stuff fails to build out of the box on Haiku without installing getconf first :)
<waddlesplash>
why not switch to some other build system? :P
<jhj>
waddlesplash: GNU make (with POSIX shell) is essentially universal and available on every system we need to target.
<jhj>
where cmake or whatever isn't, and other ones require python or what not.
<milek7>
mutexes in advanceG7Faults take around 9%
<milek7>
isn't that somewhat high?
<jhj>
waddlesplash: I've given in a bit, on a couple projects I maintain, I do have cmake support along side GNU make.
<jhj>
milek7: Possibly, but the G7 faults code is complicated and called often and requires locking the SCU
<jhj>
the group 7 fault processing is also tied into the appending unit
<jhj>
the TRO (timer runout) is a 512 kHz timer, presented in a dedicated 27-bit register, and it raises a TRO G7 fault on every rollover
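(Illustration: a toy sketch of the rollover behavior described above, with hypothetical names; the real coupling to the appending unit and SCU is considerably more involved.)

    #include <stdbool.h>
    #include <stdint.h>

    #define TR_MASK 0x07FFFFFFu               /* 27-bit timer register */

    /* Decrement the timer by 'ticks' 512 kHz ticks; a wrap past zero is
       what would raise the timer-runout group 7 fault. */
    static bool tr_tick (uint32_t *tr, uint32_t ticks)
    {
      uint32_t old = *tr;
      *tr = (old - ticks) & TR_MASK;
      return ticks > old;                      /* rolled over */
    }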
<jhj>
milek7: I'll look at that in a moment. I'm still going through a bazillion functions to pass the cpup address through :)
juanjo has left #haiku [Error from remote client]
<jhj>
milek7: We have a full factory test and diagnostic tape that passes (and it took years to get it to pass) which I'll have to verify too once I make sure all the configurations build.
<jhj>
after making changes.
<jhj>
The factory diagnostics, which we don't have the source code to, say things like "CHECK CPU BOARD 7 BACKSIDE" or "CALL FOR FACTORY MODU 7 ENH ERATTA" which aren't always helpful to figure out what they want.
<jhj>
In some cases it was because the documentation we had was from a 1985 revision of the CPU and the diagnostics we have are from 1987, and some fixes had been applied.
<jhj>
The fun of old machines.
<Skipp_OSX>
ok progress I got ExplicitMaxSize instead of ExplicitSize to work but still no pop-up indicator
marzzbar has joined #haiku
<dovsienko>
jhj: what exactly is the problem that using getconf solves? because it will use the default PATH with non-POSIX binaries, as far as I understand
<milek7>
certainly looks very CISCy
<milek7>
what was real-world speed of these machines?
<dovsienko>
also, POSIX comes in several editions, so old Solarises are POSIX-compliant, to a degree, but to the earlier revisions, so you don't get $() and must use ``
<dovsienko>
my usual solution to that is to set optional environment variables such as MAKE_BIN=/path/to/gmake and LEX=/path/to/flex, then the script would default via ": ${LEX:=lex}" and so on
<dovsienko>
also to export PATH as required and/or to have symlinks in the PATH, but I barely remember ever using getconf
<dovsienko>
(I maintain a few scripts that have to work on Linux, BSD, Solaris, AIX and Haiku)
<dovsienko>
waddlesplash: I ran the debug build of Haiku for two or three weeks and the SSH-related KDL never occurred again. today I upgraded to the current nightly to be able to test libpcap before the release (and to use TMPFS)
<waddlesplash>
ok
<waddlesplash>
I guess we'll see what happens
<jhj>
dovsienko: "command -p" runs the POSIX path, which is different than the default path.
<jhj>
dovsienko: And the POSIX path has a different getconf binary. On Solaris, you actually have a couple different getconfs that you can use - XPG4 and XPG7 IIRC.
<waddlesplash>
jhj: btw, why have #ifdef LOCKLESS... etc. everywhere? why not just make the lock_...() functions inline no-ops when thats' defined?
<jhj>
I don't bother with "``" vs "$()" though, because dps8m (and everything else I do) targets POSIX.1-2008 and uses some of those new functions, and anything that has that conformance supports $()
<jhj>
waddlesplash: The goal is to eventually get rid of lockless entirely and completely.
<jhj>
Err, no_lockless that is.
<waddlesplash>
sure, but in the meantime you could clean up the code a lot by just having the functions be defined to nothing and then removing the #ifdefs from everywhere
<jhj>
There are some fundamental differences that would still need an ifdef where the non-multithreading paths take shortcuts knowing there will never be more than 1 CPU.
<jhj>
I can't remember exactly where. Probably not too many places anymore, but I'm sure a few exist still.
<waddlesplash>
yeah, but they are probably small
<waddlesplash>
compared to how many ifdefs there are
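(Illustration: a minimal sketch of the cleanup being suggested, with made-up names; the lock wrappers are defined once and compile to nothing in the single-CPU build, so call sites need no #ifdefs.)

    #include <pthread.h>

    #ifdef NO_LOCKLESS                          /* multithreaded build */
    static pthread_mutex_t scu_lock = PTHREAD_MUTEX_INITIALIZER;
    static inline void lock_scu (void)   { pthread_mutex_lock (&scu_lock); }
    static inline void unlock_scu (void) { pthread_mutex_unlock (&scu_lock); }
    #else                                       /* single-CPU build: no-ops */
    static inline void lock_scu (void)   { }
    static inline void unlock_scu (void) { }
    #endif

    /* Call sites stay ifdef-free either way: */
    static void touch_scu (void)
    {
      lock_scu ();
      /* ... access shared SCU state ... */
      unlock_scu ();
    }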
<waddlesplash>
milek7: doesn't that need to be &&?
<waddlesplash>
but yes, looks like another significant performance win
<milek7>
yes
<waddlesplash>
jhj: CMake build systems are very nice for the reason that incremental builds and IDE integration "just work"
<waddlesplash>
maybe you need Makefiles for some more obscure system types where CMake isn't supported but I think that's rare
<waddlesplash>
I mean, AIX has CMake even
<dovsienko>
waddlesplash: if you try building software on bare NetBSD, your assumption will likely change
<waddlesplash>
oh?
<dovsienko>
it is possible to have CMake, git, Perl, Python, Ruby, Rust etc., but one has to compile these first
<dovsienko>
on cfarm.net AIX hosts CMake is a "freeware" package, for example, and tcpdump/libpcap have CMake problems there
<dovsienko>
similar story on OpenCSW Solaris
<dovsienko>
(of course, one can install the required build dependencies as binary packages if someone else compiled these first)
<dovsienko>
anyway, it is getting late. TTYL!
<jhj>
gnite
<jhj>
AIX provides cmake in the IBM provided toolkit now, but it's not part of bos.
<jhj>
Anyway, I'm not too concerned with a new build system. If you look at https://dps8m.gitlab.io/dps8m/Releases/, we cross-compile builds for 14 operating systems and 21 different platform architectures with what we got.
<Skipp_OSX>
no, nvm, still broken I still don't get it
<jhj>
Making a build go 300% faster might be great, but we're talking about saving 20 seconds total, from 30s to 10s.
<waddlesplash>
it would probably be more than that
<jhj>
waddlesplash: Also I didn't mention it, but if you go into the src/dps8m directory and build from there, you do get incremental builds without most of the checks that the main build performs
<waddlesplash>
I did discover that
<jhj>
That doesn't recompile anything in the other src directories, which change much less often.
<waddlesplash>
an incremental build with nothing changed still takes 25 seconds on Haiku
<waddlesplash>
most of that before the LD step
<waddlesplash>
it appears to spend most of its time spawning hundreds of shell processes
<jhj>
waddlesplash: Yeah, it's all the shell process stuff it does.
<jhj>
Much of it is because we embed the full command line as a definition in every compiled file, although not every file uses it.
<jhj>
Also, we scan all the system headers
<waddlesplash>
sounds expensive
<jhj>
And then we scan all of the source files and keep track of which definitions are set and used, for every file.
<waddlesplash>
surely there's some way to speed that up
<jhj>
We could do that just for some of the main files.
<jhj>
waddlesplash: If you run "dps8 -t
<waddlesplash>
can't you separate into a configure + make stage like most things?
<jhj>
waddlesplash: If you run "dps8 -t" (the -t just avoids creating a rather large state file you don't want or need) and run "SHOW BUILDINFO" you can see some of this information that we save.
<waddlesplash>
sure, but there's better ways to cache this, I'm sure
<waddlesplash>
anyway after adding the check milek7 suggested above, I get 6.433 MIPS
<waddlesplash>
earlier I got 5.4255 MIPS
<jhj>
waddlesplash: There are better ways, though there aren't better ways that are equally as portable and don't need anything beyond what POSIX bare minimums necessitate.
<waddlesplash>
idk, I feel like there's ways to strip this down more. but not my project :)
_-Caleb-_ has left #haiku [#haiku]
<jhj>
waddlesplash: We also have nearly 20 years of legacy in here, and we're very conservative when it comes to changing things like build systems that might break something.
qwebirc42001 has joined #haiku
<waddlesplash>
it makes sense to a degree
<jhj>
Like I forget exactly what it was, but we changed something and some random person complained they were having problems building on some obscure discontinued handheld game system :)
<waddlesplash>
maybe use CMake for a standard build and then have makefiles for a "reduced" build with some of these things not available
<waddlesplash>
and thus simpler makefiles and less to maintain
<jhj>
waddlesplash: Also, because we want to support easy profiling builds, we want to be able to natively build on all these systems, not just cross-compilation.
<waddlesplash>
sure
<jhj>
And some of those, like those crappy game consoles, don't come with C++ toolchains, just C.
qwebirc42001 has quit []
<x512[m]>
Is CMake really significantly better than Makefile?
<waddlesplash>
x512[m]: it is better than these makefiles :P
<jhj>
x512[m]: It can be significantly faster.
<x512[m]>
Meson is better.
<jhj>
If you have to make a TON of checks, it is especially faster because cmake has modules where all those things are done in compiled code instead of individual checks run as autoconf-style shell snippets.
<jhj>
Essentially our build system does a lot of what autoconf does each time you run make, behind the scenes.
<jhj>
So yeah, cmake can be significantly faster.
<jhj>
https://github.com/johnsonjh/duma is a project I maintain now, where we added cmake support. I'm sure the cmake support is a bit buggy.
<arraybolt3>
and Autoconf spits out shell scripts that look like utterances from a Balrog
<jhj>
The makefiles are gross and date back to the 90s.
<x512[m]>
Pure Makefiles are better than Craptools.
<jhj>
Eventually for this project, I plan to do something similar and offer cmake as a new option, which won't replace our GNU makefile based system, but work along side it in case it doesn't work.
<waddlesplash>
that sounds nice
<jhj>
But I don't like the idea of making more work for myself maintaining two build systems.
<waddlesplash>
would be neat to step through some of this in an IDE
<jhj>
And the biggest problem is that I don't really know cmake too well.
<waddlesplash>
I know cmake well enough for something like this. but it doesn't sound like a fun project :P
<jhj>
waddlesplash: If you look at https://github.com/aremmell/libsir/ which is a project I am a co-author on, we have a GNU make based system, and our makefiles are absolutely insane.
<jhj>
waddlesplash: But they have zero legacy garbage and they're very fast.
<jhj>
So, it can be done. :)
<waddlesplash>
oh, I know it can be done
<waddlesplash>
but it's not easy
<jhj>
DPS8M uses libsir (right now, not for much, but the next version will extensively), so the set of libsir-supported systems and compilers is a superset of what dps8 works on.
<jhj>
waddlesplash: libsir also supports msbuild. :/
<jhj>
Also, with libsir, we've extensively performance optimized and cached it as we write it, and have scripts that track performance regressions and such.
<jhj>
A large portion of it is formally verified and proven with ESBMC and CBMC, etc.
<jhj>
There's a lot you can do when you are writing from scratch
<waddlesplash>
jhj: here's the change milek7 was talking about in patch form
<waddlesplash>
around +15% or so
<waddlesplash>
at least for me on Haiku
<jhj>
waddlesplash: I need to stop chatting here, I'm still going through every use of cpup :)
<waddlesplash>
OK, very good
<jhj>
We have a full test suite that takes about 45 minutes to run.
<waddlesplash>
jhj: the above patch will probably be more of a performance improvement on systems that don't have futex-like APIs internally for their pthreads to use
<waddlesplash>
as on those systems a lock is always a syscall
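(Illustration: the general shape of a lock-avoiding change like the one discussed here, with made-up names rather than the actual patch; a cheap atomic check skips the mutex entirely when nothing is pending, which matters most where even an uncontended lock costs a syscall.)

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    static pthread_mutex_t g7_lock = PTHREAD_MUTEX_INITIALIZER;
    static _Atomic bool g7_pending;             /* set by whoever posts a fault */

    static void advance_g7_faults_sketch (void)
    {
      /* Cheap early-out: skip the lock entirely when nothing is pending. */
      if (!atomic_load_explicit (&g7_pending, memory_order_acquire))
        return;

      pthread_mutex_lock (&g7_lock);
      /* ... re-check and process pending group 7 faults under the lock ... */
      atomic_store_explicit (&g7_pending, false, memory_order_release);
      pthread_mutex_unlock (&g7_lock);
    }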
<jhj>
It builds a 5-CPU system from scratch, runs diagnostics, builds and runs a program in every single language we have on Multics (APL, COBOL, PL/I, C, BCPL, FORTRAN, and even more obscure ones), simulates activity with multiple users, etc.
<jhj>
So the real trick is to make sure that works the same after any modifications :)
<waddlesplash>
well, might as well add the above patch before the test, it may speed it up more
<jhj>
Yeah, let me fire up a run of our CI with just that patch applied, I'll see if it breaks anything first :)
<jhj>
I can do that while I do other stuff.
<jhj>
waddlesplash: LTO is a big win for us, we used to have *really* gross scripts that would amalgamate all of the source code into a single file and build it, which required massive memory and CPU and often failed obscurely :)
<waddlesplash>
I can imagine
<jhj>
Like I said we stole that from something webkit did back in the day.
<waddlesplash>
WebKit still does that actually
<jhj>
sqlite still does that, but they do it more for making sqlite easier to embed than for performance, but it still has the same effect.
<waddlesplash>
probably differently than it used to though
<jhj>
waddlesplash: The Portland Group compiler (which is now free as NVIDIA HPC SDK C/C++ Compiler) doesn't support LTO, but a build using it is faster than a Clang non-LTO build.
<jhj>
We support that because of the NVIDIA Nsight tooling and it'll ease eventually doing GPU offload for some functionality if we ever go that way.
<jhj>
But some other guys on the project (not me!) have since taken a different approach and instead of doing GPU or FPGA offload are doing a whole CPU in FPGA instead.
<waddlesplash>
that does make sense
<jhj>
DPS stands for Distributed Processing System. For example, the IO controllers are actually separate 18-bit minicomputers.
<jhj>
We don't run their firmware directly, instead, we reimplemented what they do in C for speed.
<waddlesplash>
yeah, high-level emulation
<jhj>
But we do have an emulator that can run the original firmware/OS for them.
<jhj>
Those systems are essentially half a DPS-8 main CPU, 18-bit/36-bit where the main system is 36-bit/72-bit.
<jhj>
The architecture is otherwise nearly the same.
<jhj>
waddlesplash: On the real hardware, everything is wired together on a bus called DIA, I forget how fast it was... but the original systems didn't gain much of anything in speed going beyond 3 CPUs, and even the 4th CPU wasn't that big of an improvement to a loaded system.
<jhj>
We can run 6 CPUs just fine because our bus is infinitely fast. :)
<waddlesplash>
OK, so in a Linux VM here, I get: 9.18 MIPS with master branch, 10.15 MIPS with the cpu_state_t patch, and 10.95 MIPS with the lock avoiding patch