How to migrate code to another programming language

Facebook AI's TransCoder translates between programming languages

Facebook AI's research department has presented TransCoder, a tool for translating source code. The AI-based transcompiler is designed to translate code from one programming language into another. Training is self-supervised: according to the announcement on the developer blog, the neural network does not need parallel data (the same programs written in both languages) for training. Facebook's AI team tested the tool on cross-translations between C++, Java and Python 3, and accuracy apparently varied depending on the language pair and the direction of translation.

Translating Java, C++ and Python: the AI becomes more accurate

According to the Facebook blog, the model currently performs best when translating Java functions into C++, where more than 90 percent of the results were correct. In the opposite direction (C++ to Java), about three-quarters of the translations were still correct (74.8 percent). Java to Python succeeded just under 70 percent of the time. Commercial tools on the market reportedly achieve hit rates in a similar range, but somewhat lower (around 61 percent according to the blog entry, although the authors do not cite a source for this). Open-source source code translators have so far been significantly less accurate: according to the blog, their accuracy is a little under 40 percent, though again the Facebook AI developers do not cite a source.

TransCoder uses three principles of self-supervised learning

Under the hood, the TransCoder developers built a sequence-to-sequence (seq2seq) model consisting of an encoder and a decoder, both based on the Transformer architecture. The tool uses a single shared model for all programming languages, which builds in part on Facebook AI's earlier work on XLM (cross-lingual language modeling). Unsupervised machine translation essentially rests on three principles: initialization, language modeling and back-translation.
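The language modeling step can be pictured as a denoising objective: the model receives a corrupted token sequence and learns to reconstruct the original. Below is a minimal Python sketch of such a noise function, assuming three corruptions that are common in unsupervised machine translation (dropping tokens, masking tokens, shuffling tokens locally). The function name `add_noise`, the `<mask>` token and the default probabilities are illustrative assumptions, not TransCoder's actual code or settings.

```python
import random
from typing import List, Optional

MASK = "<mask>"  # hypothetical mask token, chosen for illustration


def add_noise(tokens: List[str], p_drop: float = 0.1, p_mask: float = 0.1,
              shuffle_window: int = 3,
              rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a token sequence for a denoising auto-encoding objective.

    A model would be trained to reconstruct `tokens` from the returned list.
    The probabilities and window size are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    # 1. Randomly drop tokens (keep at least one so the input is never empty).
    kept = [t for t in tokens if rng.random() >= p_drop] or tokens[:1]
    # 2. Randomly replace some of the remaining tokens with the mask token.
    masked = [MASK if rng.random() < p_mask else t for t in kept]
    # 3. Local shuffle: sort by position plus uniform noise in [0, window),
    #    so each token moves at most `shuffle_window - 1` positions.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(masked))]
    order = sorted(range(len(masked)), key=lambda i: keys[i])
    return [masked[i] for i in order]
```

Training would then compare the decoder's output for `add_noise(tokens)` against the original `tokens`. Because a single shared model sees corrupted code in every language, this objective pushes it toward producing well-formed code in each of them.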

What sets TransCoder apart from earlier AI-based source code translators is self-supervised learning. Until now, training such a model has always required large data sets, but TransCoder apparently needs only code written in each of the programming languages involved in order to capture their patterns. The system does not have to be fed extensive translation examples in advance. Development started with three programming languages, C++, Java and Python, but the concept should also be transferable to other programming languages.
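Back-translation is one way a system can learn translation from monolingual code alone: the current model translates real code from language A into language B, and the resulting (synthetic B, real A) pairs become training data for the B-to-A direction. The sketch below only shows how such pairs could be assembled; `translate` is a hypothetical stand-in for the model, and no actual training happens here.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: translate(code, src_lang, tgt_lang) -> code.
Translator = Callable[[str, str, str], str]


def back_translation_pairs(monolingual: List[str], src_lang: str,
                           tgt_lang: str,
                           translate: Translator) -> List[Tuple[str, str]]:
    """Build synthetic parallel data for the tgt_lang -> src_lang direction.

    Each real function in `monolingual` (written in src_lang) is translated
    into tgt_lang by the current model. The possibly imperfect translation
    becomes the input and the original code becomes the target, so the
    model always learns to emit genuine code, even from noisy inputs.
    """
    pairs = []
    for code in monolingual:
        synthetic = translate(code, src_lang, tgt_lang)
        pairs.append((synthetic, code))  # (input in tgt_lang, target in src_lang)
    return pairs
```

In a full training loop, both directions would be updated in alternation, so that improvements in one direction produce better synthetic data for the other.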

Migrate or Repair: Dealing with Legacy Code

According to the TransCoder team, the people building such a model do not need any expertise in the programming languages they are teaching it. An evaluation metric designed for this purpose is apparently built into the system. One future application for TransCoder could be legacy code in older languages such as COBOL, for which specialists are scarce and which developers want to translate into more modern languages to keep it maintainable. With a working source code translator, code from different teams within companies or in open-source projects would be easier to integrate, and the barriers to migrating legacy code would drop.

As a side benefit, TransCoder could also help debug and improve older existing codebases. The TransCoder project team at Facebook AI has released a test dataset, which should allow other researchers to build on the current state of TransCoder with their own work and further advance self-supervised learning and computationally accurate translation of source code.
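"Computationally accurate" points to an execution-based check: a translated function counts as correct if it returns the same outputs as the reference function on a set of test inputs. The sketch below computes such a percentage for Python callables under that assumption. The names `passes_tests` and `computational_accuracy` are hypothetical, and a real evaluation would run the translated code in its target language against unit tests rather than comparing Python functions in-process.

```python
from typing import Any, Callable, Sequence, Tuple

Func = Callable[..., Any]
# One test case: (reference function, translated function, test inputs).
Case = Tuple[Func, Func, Sequence[Tuple[Any, ...]]]


def passes_tests(reference: Func, candidate: Func,
                 test_inputs: Sequence[Tuple[Any, ...]]) -> bool:
    """True if `candidate` matches `reference` on every test input."""
    for args in test_inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            return False  # a translation that crashes counts as incorrect
        if actual != expected:
            return False
    return True


def computational_accuracy(cases: Sequence[Case]) -> float:
    """Percentage of translated functions that pass all of their tests."""
    if not cases:
        return 0.0
    correct = sum(passes_tests(ref, cand, inputs) for ref, cand, inputs in cases)
    return 100.0 * correct / len(cases)
```

Unlike text-similarity scores, this kind of metric accepts a translation that looks different from the reference as long as it behaves the same on the tests.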

Details on the technical and mathematical background of the translation model can be found in the Facebook AI blog entry. Readers may also be interested in the use of neural networks to solve complex mathematical equations, a topic on which the Facebook development team has published blog entries as well.




Mondays and Thursdays - everything from heise Developer