The Battle for Open Data and Application of LLM and ChatGPT in Construction
The topic of open data in architecture and construction cannot be considered without bringing up the existence of SDKs used by CAD vendors.
Data openness in construction, and the use of ChatGPT and LLMs that is beginning to creep into the processing of data from CAD (BIM) formats, is the subject of a presentation at the BIM Cluster BW conference: "The Battle for Data and the Application of LLM and ChatGPT in Construction".
If you are interested in this topic, please post your questions or feedback 🙋🏻‍♂️
Comments
After watching, I don't know where to start. But, as I said elsewhere:

You are comparing the download counts of Pandas, a multifunctional Python analysis library used across a multitude of use cases, against the downloads of the IfcOpenShell Python library, which is used specifically for interacting with the IFC schema (22:50), and you use that to imply that an open BIM approach is less useful. Seriously?!
@doia thanks a lot for watching the video, and many thanks for your comment. It's good that you didn't have any questions about the fact that **none of the CAD vendors use IFC for interoperability purposes** and that they all use reverse engineering SDKs instead. Even Autodesk, which supposedly invented IFC (although it had nothing to do with its creation), uses a reverse engineering SDK to export data from Revit and other products into IFC.
IfcOpenShell is a great tool for working with IFC, but the problem is that IFC is not used by big companies or, without exception, by any CAD vendor, and it is not the native format of any CAD program in the world (except BlenderBIM, which is not that common yet). With the help of OpenCascade, the only open source geometry kernel (which is being developed in Russia), IfcOpenShell makes it possible to extract geometric data from IFC's EXPRESS-based markup. OpenCascade has its limitations, but it is sufficient for use in IfcOpenShell. And it is good that with IfcOpenShell you can build data converters and get data from IFC into JSON, CSV, DataFrame and other formats.
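To make the "IFC to JSON or CSV" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not IfcOpenShell's API and not a real IFC parser: the three sample lines, the placeholder GUIDs and the helper names (`parse_entities`, `to_json`, `to_csv`) are assumptions made up for illustration, and the naive comma split would break on nested lists or commas inside strings.

```python
import csv
import io
import json
import re

# Hypothetical excerpt in the STEP physical file style used by .ifc files.
# GUIDs and names are placeholders, not real project data.
SAMPLE_SPF = """\
#10=IFCWALL('GUID-0001',$,'Wall A',$,$,$,$,$,$);
#11=IFCWALL('GUID-0002',$,'Wall B',$,$,$,$,$,$);
#12=IFCDOOR('GUID-0003',$,'Door 1',$,$,$,$,$,$,$,$,$,$);
"""

# One entity instance per line: #<id>=<TYPE>(<args>);
LINE = re.compile(r"#(\d+)\s*=\s*([A-Z0-9_]+)\((.*)\);")


def parse_entities(text):
    """Turn each instance line into a flat dict (toy parser, sample input only)."""
    rows = []
    for match in LINE.finditer(text):
        step_id, ifc_type, raw_args = match.groups()
        # Naive split: good enough here, wrong for nested or quoted commas.
        args = [a.strip().strip("'") for a in raw_args.split(",")]
        rows.append({
            "id": int(step_id),
            "type": ifc_type,
            "global_id": args[0],  # first attribute in the sample lines
            "name": args[2],       # third attribute in the sample lines
        })
    return rows


def to_json(rows):
    return json.dumps(rows, indent=2)


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "type", "global_id", "name"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = parse_entities(SAMPLE_SPF)
print(to_json(rows))
print(to_csv(rows))
```

In practice the parsing step would be handled by a proper IFC library; only the flattening into rows and the JSON/CSV output are the point here.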
But why use IfcOpenShell to re-load and export data if I can get all the data from any CAD format using a reverse engineering SDK, without using the CAD program or even running it? That is, instead of first opening Revit, using the reverse engineering SDK inside the Revit->IFC export, getting a non-native IFC, and then converting it to the format I need with IfcOpenShell, I can immediately get a table or DataFrame from Revit or any other CAD format.
So you are right that IfcOpenShell cannot be directly compared to Pandas (which reads CSV directly from the source): IfcOpenShell requires a mandatory first step that suits CAD vendors, namely starting a CAD program. If someone wants to use obsolete EXPRESS markup from the 60's and write many lines of code, then of course any non-native format can be used.
@ArtemBoiko Your discourse is insightful and definitely has a role to play in certain circumstances, especially when one wants to be industry agnostic with the data.
Arguably, such an approach will also be an affront to industry standardization if everyone has to define their own semantic structure from the data export. What then will be the common language for interoperability? Not everyone wants to wrangle with raw data (even developers abhor the mundane!), and I also want to presume that just because one likes beef doesn’t mean one wants to meet the cow.
IFC is already a semantically structured data model with 30 years of continuous development that comprehensively describes the AECO industry in its totality. Practitioners are not ignorant of its challenges, but they are also not oblivious to its increasing rate of adoption. IFC struggles, I believe, not so much because of its inherent technical shortcomings as because of a lack of adequate support from vendors. But I believe when the cloud is full the rain will fall!
Yes, it’s slow, but in my humble opinion I find your proposals grandiose in design, nebulous in structure and vague in specific industry semantics, and one may doubt whether they are the messianic solution for the industry.
@Owura_qu thank you very much, I totally agree with you. What I criticize is not so much IFC itself as the openBIM marketing campaign and the lobbying organization buildingSMART. In 2021, I was contacted by an expert who was Mr. Obermeyer's right-hand man (Mr. Obermeyer initiated IFC) in the late 80's and early 90's, after which he headed the largest CAD company in Europe. This person told me about the history of IFC and of the IAI organization (later buildingSMART) and shared his insights. I then became familiar with how CAD vendors themselves develop their export modules for different formats, and concluded that none of them use the IFC format for interoperability purposes. I then met the developers of the major reverse engineering SDKs and of OpenCascade, which gave me faith that "all is not lost and I believe that when the cloud is full, it will rain", but that cloud should have nothing to do with the lobbying organization buildingSMART.
I have personally worked a lot in central Europe with different data formats for different use cases. And what I have seen is this: none of the big companies use IFC for use cases where data accuracy is required. IFC was invented and developed in Germany, but most large companies in German speaking countries use the flat CPIXML - OBJXML format (developed in cooperation with the pioneers of the IFC format) for their 4D-7D processes: ZÜBLIN, STRABAG, HOCHTIEF, Bilfinger, Buro Happold, Implenia, Peter Gross Bau, Deutsche Bahn, Firmengruppe Max Bögl, WOLFF & MÜLLER, Drees & Sommer, ZECH Bau, Kohlbecker Gesamtplan GmbH, Arcadis, Deutsche Telekom, Die Autobahn GmbH des Bundes.
There is no need to invent new data formats: popular, globally standardized formats already exist for geometry and metadata. For geometry these are glTF, DAE and OBJ, and for metadata CSV, JSON and XML. Or you can use mixed formats like CPIXML or USD.
If I can now get all the data, without running a CAD program, in any popular format for which there are already thousands of open source tools and libraries, why do I need to export a complex, parametric IFC from a CAD format?
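As a rough illustration of what "working with flat data using ordinary tools" can look like, the sketch below sums volumes per category from a CSV table with nothing but the Python standard library. The table, its column names (`element_id`, `category`, `volume_m3`) and all values are invented for this example and do not come from any real export.

```python
import csv
import io
from collections import defaultdict

# Made-up flat export: one row per element. Columns and values are illustrative only.
FLAT_EXPORT = """\
element_id,category,volume_m3
1001,Walls,12.5
1002,Walls,7.5
1003,Floors,30.0
1004,Doors,0.25
"""


def volume_by_category(csv_text):
    """Sum the volume column per category (a simple quantity take-off)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += float(row["volume_m3"])
    return dict(totals)


print(volume_by_category(FLAT_EXPORT))
```

The same grouping is a one-liner in a DataFrame library; the standard-library version is used here only to keep the sketch dependency-free.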
@JanF thanks. Please just state point by point where there is trolling or misrepresentation of facts here. Trolls avoid facts. In this thread so far we have only discussed facts that can be verified.
We agree on one (1) thing: Autodesk and a lot of other CAD vendors do not use IFC directly in their programs. And where they do support it, the support is rather underwhelming and poor, and mostly just as a data exchange format.
But why would they?! They have no, or only minimal, economic incentive to do so, apart from their customers screaming for it, which they fulfill only halfheartedly to keep them at bay. Technically they could use IFC (in whatever file or DB format) to save their program data: on opening the file, transform it to the internal program data model, work with it, and serialise it back to IFC on saving. Just like BlenderBIM does. But they won’t do it.
Apart from that, you get one thing fundamentally wrong. You confuse IFC, a data structure specification, with a storage file format like CSV, JSON or even STEP. IFC is NOT a file format and IFC is NOT STEP; it is specified in a storage-agnostic way. You could write IFC directly as CSV files if you define the proper table setup.
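A small sketch of what "storage agnostic" means in practice: the same records are written as a CSV table, as JSON, and as STEP-like lines, and the CSV round trip returns the original data. Everything here is a simplification assumed for illustration: the three-column "table setup", the placeholder GUIDs and the helper names are hypothetical, and the STEP-like output is not schema-valid IFC (a real `IfcWall` carries more attributes).

```python
import csv
import io
import json

# Hypothetical, heavily simplified table setup: one table per entity type,
# with attribute names as columns.
WALL_COLUMNS = ["GlobalId", "Name", "Description"]

walls = [
    {"GlobalId": "GUID-0001", "Name": "Wall A", "Description": "Exterior"},
    {"GlobalId": "GUID-0002", "Name": "Wall B", "Description": "Interior"},
]


def to_csv_table(records, columns):
    """Serialise the records as one CSV table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_csv_table(text):
    """Read the CSV table back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_step_like(records, columns, entity="IFCWALL", start_id=1):
    """Serialise the same records as STEP-style instance lines (illustrative only)."""
    lines = []
    for offset, rec in enumerate(records):
        args = ",".join("'{}'".format(rec[c]) for c in columns)
        lines.append("#{}={}({});".format(start_id + offset, entity, args))
    return "\n".join(lines)


csv_text = to_csv_table(walls, WALL_COLUMNS)
print(csv_text)
print(json.dumps(walls, indent=2))
print(to_step_like(walls, WALL_COLUMNS))
print(from_csv_table(csv_text) == walls)
```

The data model (which attributes an entity has) stays the same in all three outputs; only the serialisation changes.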
Your proposed solution of „noBIM“ (have you thought about the implicit meaning of this wording?) on the other hand is just another closed source black box SDK to extract data from vendor file formats. Basically just a serialiser from Revit files to CSV or JSON with a lot of information loss in-between. All fine and a totally legit way of doing things. Just don’t use it as an excuse to say open BIM or the use of IFC is a dead end.
@doia many thanks! "Autodesk and many other CAD vendors don't use IFC directly in their programs," but neither do large construction and design companies use IFC where precise volumes and quantities are involved. Central Europe invented a new flat format for exchanging information from projects to ERP, and this format was created by people who were involved in the history of the IFC format in the late 80's. If you happen to know "popular" construction ERP systems that use IFC - please post, I haven't found such products yet.
I haven't written about "noBIM" for about a year, and I now try not to use this term (it can still appear in old slides, which I will edit to remove the abbreviation). Before that, I was trying to get across the point that in construction there is only data and processes, that BIM is just a marketing invention of Autodesk, and that openBIM is a marketing invention of Graphisoft and Tekla. Now I only talk about open data, which is already used in other areas of the economy, so inventing new data formats is probably unnecessary.
"With a big loss of information in between" - where is the big loss of information if, with SDKs, we get full access to all the information in CAD formats? That's why these SDKs are used by every major company in the world without exception. The loss of information occurs when transferring data in IFC format, where data quality depends on both the export and import modules and on the professionalism of the person doing the export and import. I know SDK developers, and I believe that these products should be available to everyone for free, as happened in 1998 with the OpenDWG alliance, which, fighting Autodesk's monopoly, provided free use of SDKs to open the DWG format. The history of how the format was opened up is described in detail here: https://medium.com/@boikoartem/the-struggle-for-open-data-in-the-construction-industry-2b97200e6393
Today we need the same thing, open tools to access closed CAD formats - that's why I distribute my products for free. If we want to make such SDKs fully open source, then we must first come together and understand the importance of SDKs, similar to how this process unfolded from 1996 until the OpenDWG alliance.
"Just don't use" is my subjective opinion that a format that is not used by any CAD vendor or major company, and that is run by a lobbying organization headed by people closely associated with Autodesk (with absolutely no interest in IFC), cannot be considered a major data transfer format. Of course, it can be used for some purposes, but subjectively I think it is a dead end to work according to the rules that CAD vendors invent and manage.
Discussing geometry kernel topics, OCCT and industry requirements with Ionut Ciuntuc and one of the OpenCascade developers. If you are interested, please join the discussion:
https://t.me/datadrivenconstruction/2484