How to validate strings against EXPRESS IfcShapeRepresentationTypes function cases

cvillagrasa · September 2023

A technical question here. With IfcOpenShell, how can I programmatically check whether a given pair of strings is among the values admitted by the IFC schema for RepresentationIdentifier and RepresentationType, as listed in the tables of the IfcShapeRepresentation documentation? They are not an EXPRESS enumeration, so I'm having a hard time figuring this out.

I've found that if I build an IfcShapeRepresentation entity instance using the two attributes to be validated and then evaluate the EXPRESS function IfcShapeRepresentationTypes on it, that does the job:

ifcopenshell.express.rules.IFC4.IfcShapeRepresentationTypes(shape_representation.RepresentationType, shape_representation.Items)

However, that requires creating an entity instance. Is there a simpler way? For instance, when checking the values of ContextIdentifier and ContextType in an IfcGeometricRepresentationContext (if I want to validate them against the same tables), it seems overkill to create, validate and delete an IfcShapeRepresentation.

Moult · September 2023

Good question, ping @aothms - I'm not sure how :)

aothms · September 2023

Yeah, well, keep in mind that from the outside we don't know how 'complex' an EXPRESS function is. Also keep in mind that IfcShapeRepresentationTypes goes to considerable lengths in some cases to determine whether, e.g., a Curve is 2D or 3D, which means evaluating the Dim derived attribute, which in turn can call another function such as IfcCurveDim(). So applying the same function to a representation context isn't really possible to begin with.
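To illustrate why the label alone can't be checked in isolation, here is a heavily simplified, hypothetical model of two CASE branches. The `Item` class, the `CURVE_ENTITIES` set and `shape_representation_type_ok` are made up for illustration and are not the generated ifcopenshell code: the point is only that whether "Curve2D" is valid depends on each item's dimensionality, not on the string.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for an IfcRepresentationItem."""
    entity: str  # e.g. "IfcPolyline"
    dim: int     # stand-in for the derived Dim attribute


# Illustrative subset of curve entities, not a complete list
CURVE_ENTITIES = {"IfcPolyline", "IfcTrimmedCurve", "IfcCompositeCurve"}


def shape_representation_type_ok(rep_type: str, items: list[Item]) -> bool:
    """Toy version of two CASE branches; the real function has many more."""
    if rep_type == "Curve2D":
        # Valid only if every item is a curve AND is two-dimensional
        return all(i.entity in CURVE_ENTITIES and i.dim == 2 for i in items)
    elif rep_type == "Curve3D":
        return all(i.entity in CURVE_ENTITIES and i.dim == 3 for i in items)
    # Labels outside this toy's scope are simply rejected here
    return False
```

The same label passes or fails depending on the items: `shape_representation_type_ok("Curve2D", [Item("IfcPolyline", 2)])` is true, while the same call with a 3D polyline is false.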

If you want the string labels, maybe you can use the Python ast module (a bit messy), or you can use ifcopenshell.express.express_parser to get the EXPRESS AST. But I don't think either gets you much closer to what you want.

import ast
import inspect

import ifcopenshell.express.rules.IFC4


def collect_string_literals(code):
    """Yield every string literal in the given Python source, in source order."""
    tree = ast.parse(code)

    def visit(node):
        # ast.Str is deprecated since Python 3.8; string literals are ast.Constant nodes
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            yield node.value
        for child_node in ast.iter_child_nodes(node):
            yield from visit(child_node)

    yield from visit(tree)


print(*collect_string_literals(inspect.getsource(ifcopenshell.express.rules.IFC4.IfcShapeRepresentationTypes)))

cvillagrasa · September 2023

Thanks for the help @aothms !

I see... so there's no straightforward solution. I wonder why these two lists aren't an enum in IFC, although that doesn't seem like a quick and easy thing to change.

I got into this because I was thinking of the high-level API as a tool that could validate arguments at runtime to some useful degree of correctness (as hinted in #3774). This would complement ifcopenshell.validate for EXPRESS rules which, although extremely powerful, makes for a slower workflow when fixing simple mistakes. For instance, I'm thinking of something that catches an "Axes" identifier instead of "Axis" when creating a representation subcontext, that kind of thing. It may look like a very minor issue, but I believe good API design should be explicit about what it covers, issuing warnings or raising errors whenever an edge case falls outside what the code was designed for. That, in turn, makes any remaining technical debt more visible and thus easier to address over time.
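A minimal sketch of the kind of runtime check described above, assuming the allowed labels are already available as a plain tuple. `check_label` and `KNOWN_IDENTIFIERS` are hypothetical names, and the tuple holds only a few RepresentationIdentifier values as an illustrative, incomplete subset of the IFC table. The standard library's `difflib.get_close_matches` supplies a "did you mean" hint, and a `strict` flag chooses between raising and warning.

```python
import difflib
import warnings

# Illustrative, incomplete subset of identifiers; NOT the full IFC table
KNOWN_IDENTIFIERS = ("Annotation", "Axis", "Body", "Box", "FootPrint")


def check_label(value: str, allowed=KNOWN_IDENTIFIERS, strict: bool = True) -> str:
    """Return value if allowed; otherwise raise (strict) or warn, suggesting a close match."""
    if value in allowed:
        return value
    close = difflib.get_close_matches(value, allowed, n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    message = f"{value!r} is not a recognised identifier.{hint}"
    if strict:
        raise ValueError(message)
    # Non-strict mode: flag the problem but let the caller continue
    warnings.warn(message, stacklevel=2)
    return value
```

With this, `check_label("Axes")` raises a `ValueError` whose message suggests `'Axis'`, while `check_label("Axes", strict=False)` only emits a warning.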

A variation of the abstract syntax tree approach (Python case), returning only the values of interest, could look like the code below. But of course, 1) this is not at all handy for runtime checks, and 2) the visit method is tailored to exactly this specific EXPRESS function... which feels very fragile.

import ast
import ifcopenshell.express.rules as rules
import importlib
import inspect
from dataclasses import dataclass, field


@dataclass(slots=True)  # slots=True requires Python 3.10+
class ShapeRepresentationTypesVisitor(ast.NodeVisitor):
    """Collect the CASE labels of IfcShapeRepresentationTypes from its generated Python source."""

    version: str = 'IFC4'
    shape_representation_types: list[str] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Import the generated rules module for the requested schema, e.g. ...rules.IFC4
        express_ifc = importlib.import_module(f'{rules.__name__}.{self.version}')
        source: str = inspect.getsource(express_ifc.IfcShapeRepresentationTypes)
        tree: ast.Module = ast.parse(source)
        self.visit(tree)

    @classmethod
    def get_enum(cls, *args, **kwargs) -> list[str]:
        return cls(*args, **kwargs).shape_representation_types

    def visit_If(self, node: ast.If) -> None:
        # The CASE labels appear as string constants on the right-hand side
        # of the comparisons in each if/elif test
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.Compare):
                continue
            for constant in child.comparators:
                if not isinstance(constant, ast.Constant):
                    continue
                value = constant.value
                if not isinstance(value, str):
                    continue
                self.shape_representation_types.append(value)
        # Keep walking so nested ifs and elif branches (orelse) are visited too
        self.generic_visit(node)

And then the representation types could be obtained as (for any required IFC version):

ShapeRepresentationTypesVisitor.get_enum(version='IFC2X3')

So in summary, I guess the best option for what I'm thinking is to just use this code offline and hardcode the representation types for each schema version.
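The offline "extract once, hardcode afterwards" step could be sketched like this. To keep it self-contained, `SAMPLE_SOURCE` is a hypothetical stand-in for the generated function (in practice the source would come from `inspect.getsource` on each schema's rules module), and `case_labels` / `render_module` are illustrative names. The generated text is an ordinary Python module with one frozenset per schema.

```python
import ast

# Hypothetical stand-in for a compiled EXPRESS CASE; the real source would be
# read from the generated rules module for each schema version
SAMPLE_SOURCE = """
def IfcShapeRepresentationTypes(reptype, items):
    if reptype == 'Point':
        return True
    elif reptype == 'Curve2D':
        return True
    elif reptype == 'Brep':
        return True
    else:
        return None
"""


def case_labels(source: str) -> list[str]:
    """Collect string constants compared in if/elif tests, without duplicates."""
    labels: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If) and isinstance(node.test, ast.Compare):
            for comp in node.test.comparators:
                is_str = isinstance(comp, ast.Constant) and isinstance(comp.value, str)
                if is_str and comp.value not in labels:
                    labels.append(comp.value)
    return labels


def render_module(labels_by_schema: dict[str, list[str]]) -> str:
    """Render Python source that hardcodes the labels for each schema."""
    lines = ["# Generated offline; do not edit by hand", "SHAPE_REPRESENTATION_TYPES = {"]
    for schema, labels in sorted(labels_by_schema.items()):
        lines.append(f"    {schema!r}: frozenset({sorted(labels)!r}),")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

At runtime, the generated module can then be imported and checked with a plain membership test, e.g. `"Curve2D" in SHAPE_REPRESENTATION_TYPES["IFC4"]`, with no AST work at all.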

Moult · September 2023

I have the general impression that checking WRs is expensive, so although it's ideal for the API to check them "on the fly", it would be nice to be able to disable this for performance if it does become a concern. This is all speculative, of course.

cvillagrasa · September 2023

Yes, I agree that on top of what I'm proposing, there might need to be an extra arg validate=False around all the API calls. Then, it'd be possible to keep validation on to prototype, and turn it off for production.

Just thinking out loud now. Are you planning to migrate the high-level API to C++ at some point? That would yield real performance gains, especially if concurrency/multithreading is taken into account (although that could be hard to implement without also requiring some knowledge from the Python API user). As things stand, it's unlike the typical usage of numpy or PyTorch, in which one barely executes any pure Python code.

Moult · September 2023

there might need to be an extra arg validate=False around all the API calls

Or something like model.set_strict_validation(True) to toggle on / off. Similar to how you can enable transactions.
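A minimal sketch of such a model-level toggle. `ValidatingModel` is a hypothetical wrapper and `set_strict_validation` is the method proposed above, not an existing ifcopenshell call. The point of the design is that when strict mode is off, the check returns immediately, so production code pays essentially nothing for it.

```python
class ValidatingModel:
    """Hypothetical wrapper; set_strict_validation is a proposal, not an ifcopenshell API."""

    def __init__(self) -> None:
        self.strict = False   # off by default, e.g. for production runs
        self.checks_run = 0   # counts how many checks were actually performed

    def set_strict_validation(self, enabled: bool) -> None:
        self.strict = enabled

    def validate(self, value: str, allowed: frozenset[str]) -> None:
        # Skip the (potentially expensive) check entirely unless strict mode is on
        if not self.strict:
            return
        self.checks_run += 1
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
```

While prototyping, one would call `model.set_strict_validation(True)` once, and every subsequent API call that routes through `validate` would then check its arguments.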

Are you planning to migrate the high level API to C++ at some point? that would yield real performance

Yes, and more than that. I think we should always be critically aware of the appropriateness of different languages. Python is great for development speed, but definitely not great for performance. So sooner or later I expect more of the API to become "upstreamed". We'd probably target performance bottlenecks first (and this process has already been happening in the drawing / model loading area). Before that, the API needs a lot of work to support batch editing as opposed to single objects. Batch editing relationships would already provide a huge performance gain regardless of language (instead of setting RelatedObjects 200 times, just set it once). Similarly, batching ownership history changes per transaction is a smart thing to do and has been on my mental todo list for a long time.
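To make the "set it once" argument concrete, here is a toy stand-in, not ifcopenshell's data model. It assumes each assignment replaces the whole aggregate attribute, modelled here as an immutable tuple, and counts the writes and the elements copied. `Rel`, `add_one_by_one` and `add_batched` are hypothetical names.

```python
class Rel:
    """Toy relationship whose aggregate attribute is stored as an immutable tuple."""

    def __init__(self) -> None:
        self._related: tuple = ()
        self.writes = 0        # number of times the attribute was reassigned
        self.copied_items = 0  # total elements copied across all writes

    @property
    def RelatedObjects(self) -> tuple:
        return self._related

    @RelatedObjects.setter
    def RelatedObjects(self, value) -> None:
        self._related = tuple(value)
        self.writes += 1
        self.copied_items += len(self._related)


def add_one_by_one(rel: Rel, objects) -> None:
    for obj in objects:
        # Each call rebuilds and reassigns the whole aggregate
        rel.RelatedObjects = rel.RelatedObjects + (obj,)


def add_batched(rel: Rel, objects) -> None:
    # Collect everything first, then assign once
    rel.RelatedObjects = rel.RelatedObjects + tuple(objects)
```

For 200 objects, the one-by-one version performs 200 writes and copies 1 + 2 + ... + 200 = 20100 elements, while the batched version performs a single write copying 200, with the same final result.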

But even more than that, sooner or later I think we also need to accept that just being a Blender Add-on may not cut it for performance. We may need to consider forking Blender to add some specific functions around drawing generation, culling of large models, etc ...

There's years of work ahead of us!

aothms · September 2023

Batch editing relationships would already provide a huge performance gain regardless of language. (instead of setting RelatedObjects 200 times, just set it once). Similarly, batching ownership history changes per transaction is a smart thing to do and has been on my mental todo list for a long time

I think this batching is also a hack around the fact that the ifcopenshell data model is so closely tied to the serialization format, but it's a more fundamental problem that also involves the schema. Ultimately it needs to become much cheaper to operate on edges in the graph. And there, there is also a bit of tension with the NativeIFC concept. Can we introduce just the minimal amount of abstraction around things like objectified relationships, while still being fully native?

But of course, 1) this is not handy at all to perform runtime checks and 2) the visit method is curated for exactly this specific EXPRESS function... which feels very volatile

Nice work! But I think I politely disagree. Collecting the string cases from the function is a one-time operation, and it is not really significant with respect to the interpreter's start-up time. It is also not that specific to that single function: it applies to all functions that use an EXPRESS CASE statement, which gets compiled into a Python if-elif chain. So there could be others. One thing I can imagine is that we detect this situation in the Python code generation step. Instead of converting this EXPRESS function to a mere Python function, we convert it to a Python functor / class with a __call__ method (it technically already is) that, in addition to validating, also provides a bit more introspection into what the function is doing (e.g. in this case, a list of case label strings). It's not something I can work on in the near future time-wise, but if you want to have a go at the code generator I'd be happy to assist.
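One possible shape for such a generated functor, sketched under clear assumptions: `CaseFunction` is a hypothetical class, not what the ifcopenshell code generator emits, and it swaps the if/elif chain for a label-to-branch dictionary so that the dictionary keys double as the introspectable case labels (one source of truth). The toy branches treat items as plain strings, and an unmatched label returns `None`, standing in for an indeterminate EXPRESS result.

```python
from typing import Callable, Iterable, Optional


class CaseFunction:
    """Hypothetical codegen target: an EXPRESS CASE compiled to a label -> branch table."""

    def __init__(self, name: str, branches: dict[str, Callable[[Iterable], bool]]) -> None:
        self.__name__ = name
        self._branches = branches

    @property
    def case_labels(self) -> frozenset[str]:
        # Introspection: the CASE labels come straight from the dispatch table
        return frozenset(self._branches)

    def __call__(self, label: str, items: Iterable) -> Optional[bool]:
        branch = self._branches.get(label)
        # No matching label: return None in place of an indeterminate result
        return None if branch is None else branch(items)


# Toy instance with two branches; items are plain strings for illustration
IfcShapeRepresentationTypes = CaseFunction("IfcShapeRepresentationTypes", {
    "Point": lambda items: all(kind == "point" for kind in items),
    "Curve2D": lambda items: all(kind == "curve2d" for kind in items),
})
```

The object is still called like the original function, e.g. `IfcShapeRepresentationTypes("Point", ["point"])`, but a runtime check can also ask `"Axes" in IfcShapeRepresentationTypes.case_labels` without building any entity instance.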

cvillagrasa · September 2023

@Moult said:
But even more than that, sooner or later I think we also need to accept that just being a Blender Add-on may not cut it for performance. We may need to consider forking Blender to add some specific functions around drawing generation, culling of large models, etc ...

There's years of work ahead of us!

Cool! I never thought of culling in a desktop app, are you planning to build cities? ;)

@aothms said:
One thing I can imagine is that we detect this situation in the python code generation step. That instead of converting this Express function to a mere python function, we convert it to a Python functor / class with call function (it technically already is) that is, in addition to validating, also able to provide a bit more introspection in what the function is doing (e.g in this case provide a list of case label strings). Not something I can work on in the near future though in terms of time, but if you want to have a go with the code generator I'd be happy to assist.

I guess it's all this pyparsing sorcery under express.express_parser and express.nodes, unless there's some C++ I'm not seeing. I don't know if I'll have time for this in the near future either, but if I do, I'll come back to it!

aothms · September 2023

I guess it's all this pyparsing sorcery under express.express_parser and express.nodes

The codegen for EXPRESS function -> Python def is here:

https://github.com/IfcOpenShell/IfcOpenShell/blob/v0.7.0/src/ifcopenshell-python/ifcopenshell/express/rule_compiler.py#L565-L575
