
Tuesday, 4 June 2024

DeTikZify: Automating High-Quality Scientific Figures with AI

Introduction

In the rapidly changing field of artificial intelligence, multimodal language models have become one of the most important breakthroughs. Approaches that add multimodal input or output to off-the-shelf large models have gained considerable ground. Even so, creating high-quality scientific figures remains a cumbersome task.

This is where DeTikZify, a new AI technique, comes in. It was developed with support from the Natural Language Learning and the Data and Web Science research groups at the University of Mannheim, Germany, and it tackles exactly this problem. DeTikZify aims to close the gap between the ease of sketching an idea on paper and the complexity of producing high-quality, semantics-preserving scientific figures.

demo video clip
source - https://github.com/potamides/DeTikZify

DeTikZify is designed to generate TikZ code from sketches of scientific figures as well as from existing figures, while preserving their semantic information. The motivation for developing it was to reduce the time spent creating scientific figures, which in most cases requires knowledge of graphics languages such as TikZ.

What is DeTikZify?

DeTikZify is an AI model that uses multimodal language models to automatically synthesize scientific figures. It targets TikZ, one of the most popular graphics languages, which can be embedded directly in LaTeX documents to produce high-quality diagrams and figures. The resulting figures are meant not only to look good but also to preserve the semantics of the input, which makes them well suited to representing complicated concepts and data in scientific literature.

Key Features of DeTikZify

DeTikZify comes with several novel features, some of which include:

  • Scientific Figure Synthesis: DeTikZify can synthesize TikZ graphics programs based on hand-drawn sketches or existing figures. 
  • Use of Datasets: It relies on three newly created datasets, DaTikZv2, SketchFig, and SciCap++, which together cover a vast collection of TikZ graphics and associated metadata.
  • MCTS-based Inference Algorithm: DeTikZify uses an inference algorithm based on Monte Carlo Tree Search (MCTS) that iteratively refines the model's initial predictions into better outputs without any further training.

Capabilities/Use Case of DeTikZify

Beyond its main features, DeTikZify supports several use cases that could help researchers and scientists:

  • Faster Publication Process: Because DeTikZify can automatically transform sketches into detailed TikZ programs, it reduces the time and resources needed to prepare figures for publication. Researchers can then spend more of their time on the research itself.
  • Data Visualization: One of DeTikZify's most significant strengths is its ability to synthesize high-quality scientific figures. Better data visualization in research papers can make even complex data simpler and more accessible, so that the work is easier to understand for peers and for the general public.
  • Collaborative Work: Turning sketches into figures can help teammates who are not skilled at graphic design to share and understand the ideas behind a sketch.
  • Educational Tool: DeTikZify can be used to teach students data visualization concepts and to show how vital clear and precise visualizations are in scientific communication.

How does DeTikZify work?/ Architecture/Design

DeTikZify uses a multimodal language model to translate both sketches and existing figures into TikZ programs without hand engineering; the generated programs can then be compiled with a LaTeX engine. The architecture has three key components that work together to turn an input image into a valid TikZ drawing: a vision encoder, a pre-trained language model, and an iterative-refinement algorithm based on Monte Carlo Tree Search.

Overview of the DeTikZify architecture
source - https://arxiv.org/pdf/2405.15306

The pipeline starts with the vision encoder, SigLIP, which accepts images of hand-drawn sketches or existing figures. The encoder converts each image into patch embedding vectors that are compatible with the input of the pre-trained language model, which is trained largely on datasets such as SciCap++ and DaTikZv2. The language model then processes these embeddings and generates an initial TikZ program representing the input image.
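To make the idea of "patch embedding vectors" more concrete, here is a minimal sketch of the first step a ViT-style vision encoder typically performs: cutting the image into fixed-size patches and flattening each one. It is only an illustration in plain Python on a toy grayscale image. The function name `patchify` and the patch size are assumptions made for this example, and a real encoder such as SigLIP additionally applies learned projections to each patch, which are omitted here.

```python
def patchify(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    # Walk over the image patch by patch, left to right and top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A toy 4x4 "image" whose pixel value encodes its position.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(toy_image, patch_size=2)
print(len(patches), "patches of length", len(patches[0]))
```

Each flattened patch plays the role of one "token" for the language model once it has been projected into the model's embedding space.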

DeTikZify's MCTS-based inference algorithm substantially increases the quality and precision of the generated TikZ programs. It lets the model iteratively refine its output by exploring different possible code paths and favoring the ones that earn the highest rewards. The rewards are computed from both compiler diagnostics and similarity metrics, so that the output is not only syntactically valid but also visually similar to the input image. This process repeats until the TikZ code reaches an acceptable level of precision and quality.
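The search loop described above can be sketched as a toy Monte Carlo Tree Search. This is not DeTikZify's implementation: the candidate lines, the stub "compiler" check, and the similarity score in `reward` are hypothetical stand-ins for sampling from the language model, compiling with LaTeX, and comparing rendered images. The sketch only shows the four classic MCTS phases (selection, expansion, simulation, backpropagation) with a reward that combines a compile check with a similarity score.

```python
import math
import random

# Hypothetical stand-ins: a real system would sample lines from the language
# model, compile with LaTeX, and compare rendered images.
CANDIDATE_LINES = ["\\draw (0,0) circle (1);", "\\draw (0,0) -- (1,1);", "\\drw oops;"]
TARGET = (CANDIDATE_LINES[0], CANDIDATE_LINES[1])  # program that "matches" the input
MAX_LINES = len(TARGET)


def reward(program):
    """-1 if the stub compiler rejects the program, else a similarity in [0, 1]."""
    if any(line.startswith("\\drw") for line in program):
        return -1.0
    matches = sum(a == b for a, b in zip(program, TARGET))
    return matches / MAX_LINES


class Node:
    def __init__(self, program, parent=None):
        self.program, self.parent = program, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def uct(self, c=1.4):
        """Upper confidence bound: average reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf
        explore = math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.total / self.visits + c * explore


def mcts(iterations, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_score, best_program = -math.inf, ()
    for _ in range(iterations):
        node = root
        # Selection: descend via UCT while the node is already expanded.
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: add one child per candidate next line.
        if len(node.program) < MAX_LINES:
            node.children = [Node(node.program + (line,), node) for line in CANDIDATE_LINES]
            node = rng.choice(node.children)
        # Simulation: finish the program with random lines and score it.
        rollout = node.program
        while len(rollout) < MAX_LINES:
            rollout += (rng.choice(CANDIDATE_LINES),)
        score = reward(rollout)
        if score > best_score:
            best_score, best_program = score, rollout
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best_score, best_program


best_score, best_program = mcts(iterations=200)
print(best_score)
print("\n".join(best_program))
```

Because the reward is computed at inference time, the search can improve on the first sampled program without changing any model weights, which is the property the article highlights.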

Evaluating DeTikZify against Other Models

The authors compared DeTikZify against Claude 3 and GPT-4V. The evaluation was comprehensive, involving both automatic and human assessments.


source - https://arxiv.org/pdf/2405.15306

In the automatic evaluation, DeTikZify consistently outperformed the commercial baselines at synthesizing TikZ programs. This advantage was evident across metrics such as CrystalBLEU (cBLEU), TeX Edit Distance (TED), DreamSim (DSim), and SelfSim. In addition, the high Mean Token Efficiency (MTE) of the DeTikZify models indicated that they generate code efficiently.

Human evaluation further confirmed DeTikZify's advantage. The best-performing model, DeTikZify-DS7b, outperformed GPT-4V in both output-driven inference (OI) and time-budgeted inference (TI). GPT-4V also had difficulty refining its own outputs, which underlines the effectiveness of DeTikZify's MCTS-based approach.

DeTikZify's performance under time-budgeted inference (TI) was also analyzed. Increasing the computational budget significantly improved its performance, and the MCTS-based refinement yielded improvements regardless of the type of input.
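The difference between the two inference modes can be illustrated with a small sketch. It rests on an assumption about what the terms mean: that output-driven inference stops at the first program that compiles, while time-budgeted inference keeps searching until a fixed budget is spent and then returns the best-scoring program. The function names, the `(score, program)` stream, and the fake clock are all hypothetical and only serve to keep the example deterministic.

```python
import itertools


def output_driven(candidates):
    """Return the first candidate that compiles (score >= 0), or None."""
    for score, program in candidates:
        if score >= 0:
            return score, program
    return None


def time_budgeted(candidates, budget, clock):
    """Consume candidates until the budget is spent; return the best one seen."""
    deadline = clock() + budget
    best = None
    for score, program in candidates:
        if best is None or score > best[0]:
            best = (score, program)
        if clock() >= deadline:
            break
    return best


# Hypothetical (score, program) stream; -1 marks a failed compile.
stream = [(-1.0, "v1"), (0.3, "v2"), (0.8, "v3"), (0.5, "v4"), (0.9, "v5")]

# Fake clock that advances one "second" per call, so the example is deterministic.
ticks = itertools.count()
first_ok = output_driven(iter(stream))
best_in_budget = time_budgeted(iter(stream), budget=4, clock=lambda: next(ticks))
print(first_ok, best_in_budget)
```

In real use, `clock` would be something like `time.monotonic`, and a larger budget lets the loop see more candidates, which matches the observation that more compute improves results.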

Accessing and Using DeTikZify

The code, models, and datasets for DeTikZify are publicly available. The model can be accessed through its GitHub repository, where users can download the Python package and use its full functionality, from compiling and rendering to saving TikZ graphics. Running the model requires a full TeX Live 2023 installation as well as Ghostscript and Poppler.

If you are interested in learning more about this AI model, all relevant links are provided under the 'source' section at the end of this article.
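Before installing the package, it can be handy to check whether the external tools are present. The following is a minimal sketch using only the Python standard library. The executable names are assumptions: `pdflatex` is shipped with TeX Live, `gs` is the usual Ghostscript command on Linux and macOS (Windows builds use a different name), and `pdftoppm` is one of the command-line tools bundled with Poppler. The script does not check the TeX Live version, and it is not part of DeTikZify itself.

```python
import shutil

# Assumed executable names: pdflatex (TeX Live), gs (Ghostscript),
# pdftoppm (a command-line tool shipped with Poppler).
REQUIRED_TOOLS = {"TeX Live": "pdflatex", "Ghostscript": "gs", "Poppler": "pdftoppm"}


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each dependency name to True if its executable is found on PATH."""
    return {name: shutil.which(exe) is not None for name, exe in tools.items()}


status = check_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Any entry reported as missing should be installed through the system's package manager before running the model.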

Limitations

  • Comparison with Proprietary Systems: The proprietary systems used as baselines disclose no information about their training data or internal processes, which limits the fairness and reproducibility of the comparison.
  • Inherited Biases and Flaws: DeTikZify may inherit biases, flaws, or limitations present in its training data, possibly leading to discrepancies between expected and generated outputs.
  • Potential Misuse: Malicious actors could misuse the model to create misinformation and fake science.
  • Licensing Restrictions: Because of licensing restrictions, some TikZ programs are not included in the public release of DaTikZv2, one of the datasets used. However, the dataset creation scripts are available, so the full version of DaTikZv2 can be reproduced independently.
  • Use Restrictions: OpenAI's usage policy restricts the use of its services to build competing products, and all artificially generated examples in DaTikZv2 may be used only for non-commercial purposes.

Conclusion

DeTikZify is a testament to the progress in AI-driven tools for scientific communication. By automating the creation of TikZ-based figures, it not only saves time for researchers but also democratizes access to high-quality scientific visualizations.


Source
Research Paper: https://arxiv.org/abs/2405.15306
Research Document (PDF): https://arxiv.org/pdf/2405.15306
GitHub Repo: https://github.com/potamides/DeTikZify
Collection: https://huggingface.co/collections/nllg/detikzify-664460c521aa7c2880095a8b
