Usage Statistics of 3D Printing File Formats

An overview of how often different file formats are used for 3D printing

During our current research on the security of 3D printers and the surrounding ecosystem, we asked ourselves how widely the different file formats are actually used. As there seems to be no clear data, only anecdotal evidence, we decided to check for ourselves.

We analyze which file formats are used most on two popular online marketplaces for 3D models, namely Thingiverse and MyMiniFactory. The data presented here is based on the publicly available files uploaded to both platforms. It says nothing about the usage of file formats outside this specific use case of private users sharing their 3D printing models with others, and in particular nothing about industrial contexts. No other data is freely available. The How to Get the Data section below provides download links and how-tos for our dataset. The data presented here was collected in June 2021.

How We Collected the Data

Both Thingiverse and MyMiniFactory provide application programming interfaces (APIs) that allow access to their data sets, specifically JSON-based HTTP REST APIs. Neither API has an endpoint to list all existing objects, but both websites identify objects by incrementing numbers. Thus, gathering a complete data set is a matter of trying every number in turn until no more objects are found. For both marketplaces, we incremented the object ID and attempted to download the JSON metadata for each ID.
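The enumeration described above can be sketched roughly as follows. This is a minimal illustration and not our actual get_data.py script: the helper names `fetch_json` and `crawl` are made up for this example, and sending the token as a Bearer header as well as treating HTTP 404 as "no such object" are assumptions about how the APIs behave.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Iterable, Iterator, Optional, Tuple


def fetch_json(url: str, token: str) -> Optional[dict]:
    """Download one JSON document; return None if the ID does not exist.

    Assumption: the API accepts the token as a Bearer header and answers
    unknown IDs with HTTP 404.
    """
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


def crawl(
    ids: Iterable[int], fetch: Callable[[int], Optional[dict]]
) -> Iterator[Tuple[int, dict]]:
    """Try every ID in turn and yield (id, metadata) for the IDs that exist."""
    for object_id in ids:
        metadata = fetch(object_id)
        if metadata is not None:
            yield object_id, metadata
```

Passing the download function in as `fetch` keeps the loop independent of the marketplace. A call could look like `crawl(range(1, max_id + 1), lambda i: fetch_json(url_template.format(id=i), token))`, where `url_template` and `max_id` are placeholders you would fill in from the respective API documentation.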

The data set for Thingiverse contains more than two million entries, whereas the set for MyMiniFactory amounts to roughly 130,000 entries. For each object, the downloaded metadata includes the names of all files uploaded to that object. To ease the analysis, we stored the uploaded file names and their upload timestamps for each object. Then, we reduced each file name to its suffix(es) (i.e. its file extension) and converted them to lower case. This means our analysis is limited to what can be derived from the file suffixes the uploaders chose. A suffix might not match the actual content of the uploaded file. Additionally, we do not analyze the content of uploaded .zip (or similar) archives.
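As a small illustration of the suffix normalization, here is a sketch that keeps only the last suffix of a file name and lower-cases it. The function name `normalized_suffix` is hypothetical, and keeping only the last suffix is an assumption made for this example.

```python
from pathlib import PurePosixPath


def normalized_suffix(file_name: str) -> str:
    """Return the last file suffix in lower case, or '' if there is none."""
    return PurePosixPath(file_name).suffix.lower()


# Examples:
# normalized_suffix("Benchy.STL")       -> ".stl"
# normalized_suffix("bracket.v2.Step")  -> ".step"
# normalized_suffix("README")           -> ""
```

Note that this looks only at the name, so a file called `model.stl` counts as STL regardless of what it actually contains.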

Overview of the Data

Table 1 lists the file formats that occur most often in our data sets, that is, all file formats that occur more than 10,000 times. The remaining files account for 8% of all files. Total Occurrences shows the sum of all files uploaded across all objects. Multiple files of the same format can be uploaded for the same object, which explains why the number of STL files can exceed the total number of objects in the data sets by about a factor of four. There are more than five times as many STL files as all other file uploads combined: 4,592,742 STL files versus 787,577 other files in total.

The Repetition Factor indicates how many files of the same format are uploaded for the same object on average. For each format, we only counted objects where the given file format was present at least once. Hence, the minimal value of the repetition factor is one. Most repetition factors are higher than 1.5, which shows that most file formats are rarely uploaded as a single file per object. This can be attributed to different variants of the same model being uploaded, for example different scalings or colors. The difference in the repetition factor between formats might be caused by limitations of the format itself or by common practices.
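To make the two columns concrete, the following sketch computes the total occurrences and the repetition factor from a mapping of object IDs to the normalized suffixes of their files. The function name `format_statistics` and the input layout are assumptions for this example, not the structure of our extraction script.

```python
from collections import Counter
from typing import Dict, List, Tuple


def format_statistics(objects: Dict[int, List[str]]) -> Dict[str, Tuple[int, float]]:
    """Map each suffix to (total occurrences, repetition factor).

    The repetition factor divides the total occurrences of a suffix by the
    number of objects that contain it at least once, so it is never below 1.
    """
    total: Counter = Counter()
    objects_with: Counter = Counter()
    for suffixes in objects.values():
        total.update(suffixes)              # every file counts
        objects_with.update(set(suffixes))  # every object counts once per suffix
    return {suffix: (total[suffix], total[suffix] / objects_with[suffix]) for suffix in total}
```

For example, three STL files on one object and a single STL file on another give four occurrences across two objects, so a repetition factor of 2.0.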


Table 1: Total number of occurrences/uploads of all file formats that occur more than 10,000 times. The suffixes were unified to their lower-case version and the following suffixes were omitted: .pdf, .zip, .0, .1, .svg. AMF is included as it is mentioned by various rankings. The repetition factor indicates how many files of this type were uploaded to a single object on average.

| Suffix | File Format Description | Total Occurrences | Repetition Factor |
|---|---|---:|---:|
| .stl | STereoLithography [a] | 4,592,742 | 2.13 |
| .scad | OpenSCAD project file | 77,585 | 1.42 |
| .obj | Wavefront Object [b] | 65,556 | 1.86 |
| .step | STandard for the Exchange of Product model data [c] | 44,920 | 1.72 |
| .sldprt | SolidWorks Part file | 43,599 | 2.00 |
| .skp | SketchUp project file | 32,522 | 1.48 |
| .f3d | Fusion 360 project file | 32,275 | 1.30 |
| .fcstd | FreeCAD project file | 21,436 | 1.52 |
| .dxf | Drawing Interchange File for AutoCAD [d] | 20,566 | 1.94 |
| .gcode | Toolpath instruction for manufacturing devices [e] | 16,713 | 1.52 |
| .ipt | Inventor project file | 14,905 | 1.96 |
| .3mf | 3D Manufacturing Format [f] | 14,823 | 1.63 |
| .blend | Blender project file | 13,720 | 1.61 |
| .123dx | 123D project file [g] | 12,146 | 1.55 |
| .amf | Additive Manufacturing Format [h] | 2,451 | 1.54 |
  a. Defined by 3D Systems in 1988. The original specification is not available, but various resources describe the format based on the original specification.
  b. First specified by Wavefront Technologies for their Advanced Visualizer software in the 1990s. “[From] a legal standpoint, the specification is probably proprietary to Autodesk” as Wavefront Technologies was eventually indirectly acquired by Autodesk.
  c. Designed as an exchange format between CAD applications. Standardized through the ISO 10303 family; part 21 defines the file format. Alternatively uses the suffix .stp.
  d. Designed as an exchange format between CAD applications. Standardized by Autodesk for their AutoCAD software.
  e. There are multiple standards defining G-codes, but most applications and/or firmwares define their own extensions and variations.
  f. The specification is created by the 3MF Consortium. The first version of the specification was published in 2015. The specification is open-source and managed in a Git repository. The specification was not uploaded to GitHub until 2018 (Version 1.2). Version 1.0 was initially uploaded to the 3MF Consortium’s website but has since been removed.
  g. Discontinued by Autodesk in 2016. Original Webpage.
  h. Initially proposed as “STL 2.0” by Hiller et al. Since the initial proposal, it has been jointly specified by ISO and ASTM.

Overall, only 3% of objects (about 56,000 in total) do not have an associated STL file. The top three file formats among objects without an STL file are .obj, .scad, and .dxf. Further, nine of the fifteen listed formats are project files for specific programs. Together, these facts suggest that the most common use case is for a user to upload an STL file together with the project file of the software they created it with. Alternatively, the model is uploaded as an OBJ file, or in popular exchange file formats for Computer Aided Design (CAD) software (i.e. .scad and .dxf).
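A check like the one above could be sketched as follows, using a mapping of object IDs to the normalized suffixes of their files. The function name `without_stl` is hypothetical, and counting each format once per object is an assumption made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def without_stl(objects: Dict[int, List[str]]) -> Tuple[float, List[Tuple[str, int]]]:
    """Return the share of objects without any .stl file and the three
    most common formats among those objects (counted once per object)."""
    missing = [suffixes for suffixes in objects.values() if ".stl" not in suffixes]
    share = len(missing) / len(objects) if objects else 0.0
    formats = Counter(suffix for suffixes in missing for suffix in set(suffixes))
    return share, formats.most_common(3)
```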

Trend of Usage over Time

To get an overview of the change in usage we plotted the uploads per month of each file format.

As some file formats support multiple models in one file and others do not, we ignore duplicate suffixes on files of the same object that were uploaded on the same day. This sanitization is required because the counts would otherwise be biased towards formats that do not support multiple models in one file: with such a format, a user has to upload multiple files for a complex model with separate parts. It also reduces the variance in the repetition factor from Table 1.
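The deduplication and the monthly aggregation can be sketched like this. The function name `uploads_per_month` and the input layout, an iterable of (object ID, suffix, upload timestamp) tuples, are assumptions for this example.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def uploads_per_month(files: Iterable[Tuple[int, str, datetime]]) -> Counter:
    """Count uploads per (suffix, 'YYYY-MM').

    Files with the same suffix that were uploaded to the same object on the
    same day are collapsed into a single upload before counting.
    """
    unique = {(object_id, suffix, uploaded.date()) for object_id, suffix, uploaded in files}
    return Counter((suffix, day.strftime("%Y-%m")) for _, suffix, day in unique)
```

Two STL files uploaded to the same object on the same day therefore count once, while an STL file uploaded to that object the next day counts again.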

As you can see in the graph below, .obj, .step, and .f3d all follow a nearly identical curve that shows a rapid increase in usage. .3mf shows less usage overall, but a rapid increase since its initial release. .sldprt, .fcstd, .dxf, .gcode, and .blend show steadier growth. .ipt and .amf both fluctuate more than the others and seem more or less stagnant. .skp and .123dx are declining in usage. In the case of .123dx this is expected, since Autodesk discontinued the 123D program suite in 2016.

How to Get the Data

Option 1

Download the data we used (collected in June 2021):

Option 2

Download the data yourself.

As of June 2021, this produces roughly 40 GB of JSON data and makes about 10 million requests. The script creates one file per available entry, containing the JSON metadata. This means there will be millions of files in a single folder. I did it this way because it was the simplest reasonably fast method that works well with threading. It will obviously be terribly slow on a slow disk. I used an NVMe SSD and had a total execution time of about 12 hours.

If you want to do something less stupid, go ahead and change the script ;) For downloading the data once this was fine.
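If you would rather avoid millions of small files, one possible variation is to append every entry as one line to a single JSON Lines file. The sketch below is not what get_data.py does: `download_to_jsonl` is a hypothetical name, `fetch` stands for whatever function downloads the metadata for one ID (returning None for missing IDs), and the worker count is an arbitrary choice.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional, TextIO


def download_to_jsonl(
    ids: Iterable[int],
    fetch: Callable[[int], Optional[dict]],
    out: TextIO,
    workers: int = 16,
) -> int:
    """Fetch all IDs with a thread pool and write one JSON line per existing
    entry to `out`. Returns the number of entries written."""
    lock = threading.Lock()  # serialize writes so lines do not interleave

    def handle(object_id: int) -> bool:
        metadata = fetch(object_id)
        if metadata is None:
            return False
        line = json.dumps({"id": object_id, "metadata": metadata})
        with lock:
            out.write(line + "\n")
        return True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(handle, ids))
```

The lines end up in completion order, not ID order, so sort afterwards if that matters. `pool.map` also submits all IDs up front, so for millions of IDs you may want to call the function on chunks of the ID range.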

  1. Get access tokens for thingiverse.com and myminifactory.com’s APIs.
    • thingiverse.com
    • myminifactory.com
      • register an app
      • the token shown after creation does not have the required access rights; you need a user-based token
      • go to: https://auth.myminifactory.com/web/authorize?client_id=XXX&redirect_uri=YYY&response_type=token&state=RANDOM_STRING where client_id should be the name of your app and redirect_uri the same redirect URI that was given during registration. I used ngrok for the callback URI, but I’m not sure you’d actually need that.
      • You will be forwarded to a URL like: YYY#access_token=TTT&expires_in=604800&state=RANDOM_STRING&token_type=Bearer
      • The token TTT is the one we need.
  2. Run the get_data.py script with these parameters:
    • The first value is the website you want to get the data from.
    • The second is the access token.
    • The third and fourth are the minimal and maximal ID. Both sites use numeric IDs for their objects, and the script simply tries all IDs between the given values, typically between 1 and the highest value you can find under “newest” on the respective site.
  3. We also recommend zipping the data after processing so your computer is not slowed down by the number of files. (The extraction script also uses the ZIP files.)
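For the MyMiniFactory token in step 1, the authorize URL and the token extraction can be sketched as follows. The endpoint and the parameter names are the ones from the URL shown above. The helper names `build_authorize_url` and `extract_token` are made up for this example, and the state check is an addition of this sketch.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_ENDPOINT = "https://auth.myminifactory.com/web/authorize"


def build_authorize_url(
    client_id: str, redirect_uri: str, state: Optional[str] = None
) -> Tuple[str, str]:
    """Return the URL to open in the browser and the state value it carries."""
    if state is None:
        state = secrets.token_urlsafe(16)  # the RANDOM_STRING from the steps above
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "response_type": "token",
            "state": state,
        }
    )
    return f"{AUTHORIZE_ENDPOINT}?{query}", state


def extract_token(redirect_url: str, expected_state: str) -> str:
    """Pull the access token out of the fragment of the URL you were forwarded to."""
    fragment = parse_qs(urlsplit(redirect_url).fragment)
    if fragment.get("state", [None])[0] != expected_state:
        raise ValueError("state in redirect URL does not match")
    return fragment["access_token"][0]
```

Open the returned URL in a browser, log in, and pass the address you end up at to `extract_token`. The result is the token TTT that is handed to the script as its second parameter.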

Citation

@online{usage-statistics-of-3d-printing-file-formats,
  author = {Rossel, Jost},
  title = {Usage Statistics of 3D Printing File Formats},
  year = 2022,
  url = {https://upb-syssec.github.io/blog/2022/3d-printing-file-format-usage/},
  urldate = {2024-04-24}
}