AI model maps plant DNA controls linked to crop traits
Researchers say a deep learning tool trained on Arabidopsis can predict plant gene-control sites and apply those patterns to maize.
By Tom Brennan · Health & Medicine Correspondent
An international team has built an AI model that predicts where key regulatory proteins bind to plant DNA, a step that could help researchers connect genetic variation with traits in crops. Forschungszentrum Jülich and the IPK Leibniz Institute said the tool was trained on Arabidopsis thaliana data and also worked in maize, where such data are harder to obtain.
The study, published in Nature Communications, focuses on transcription factors, proteins that attach to DNA and influence whether genes are activated and how strongly they are expressed. The researchers said these regulatory regions help explain why plants with similar genes can grow differently or respond differently to stress.
Training on a model plant
According to the research team, the model learned from hundreds of experimental DNA-binding data sets from Arabidopsis, a widely used plant in genetics research. The system was designed to recognize binding patterns across 46 transcription factor families at the same time.
The team said that approach differs from earlier methods, which often trained a separate model for each transcription factor. With the multi-label design, the researchers tested whether the model could identify binding sites it had not previously seen and point to regulatory links across the genome.
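To illustrate what a multi-label design means in practice, the minimal Python sketch below uses one scorer that returns an independent binding probability for each of the 46 families, rather than one model per factor. It is not the researchers' model: the linear scoring, the random placeholder weights and the function names (one_hot, make_toy_weights, predict_families) are assumptions made purely for illustration.

```python
import math
import random

N_FAMILIES = 46  # number of transcription factor families cited in the article
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat list of 0/1 values, four per base."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def sigmoid(x):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_toy_weights(seq_len, seed=0):
    """Random placeholder weights (assumption); a real model would learn them."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.5, 0.5) for _ in range(4 * seq_len)]
        for _ in range(N_FAMILIES)
    ]


def predict_families(seq, weights):
    """Return one independent binding probability per family (multi-label)."""
    x = one_hot(seq)
    return [sigmoid(sum(w * v for w, v in zip(row, x))) for row in weights]


seq = "ACGTACGTAC"
probs = predict_families(seq, make_toy_weights(len(seq)))
print(len(probs))  # one probability per family
```

Because each output passes through its own sigmoid, the probabilities need not sum to one, so a single sequence can be flagged for several families at once.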
Fritz Forbang Peleke, first author of the study, said the results indicate that transcription factors do not act only by recognizing short, isolated DNA motifs. Peleke said the surrounding sequence and the arrangement of multiple signals also shape binding, making gene regulation dependent on context.
Using predicted binding patterns, the model grouped Arabidopsis genes by likely regulation. The researchers said thousands of genes were sorted into 14 broad regulatory clusters, several of which matched shared biological functions and coordinated gene activity.
From DNA variants to plant traits
The team also studied more than 7,000 DNA variants that previous genome-wide studies had associated with traits including flowering time, disease resistance and seedling growth. The researchers reported that about one-fifth of those variants were predicted to alter transcription factor binding.
Jędrzej Szymański, who leads research groups at IPK and Forschungszentrum Jülich, said the model gives scientists a way to estimate how a single change in regulatory DNA may affect gene activity and a plant trait. He said that estimate can help connect statistical trait associations to possible molecular mechanisms.
In one flowering-time case, the model predicted that a single base change in a regulatory DNA region would change the binding of several transcription factors at once. The researchers said they then confirmed that prediction with a high-throughput reporter assay.
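The general idea of scoring a variant can be sketched in a few lines of Python: predict binding for the reference sequence, predict again with the single base swapped, and report the families whose scores shift. This is a hedged illustration, not the study's pipeline. The toy_predict function is a hypothetical stand-in for a trained model, and the family names, motif strings, scores and threshold are all invented for the example.

```python
def apply_snv(seq, pos, alt):
    """Return a copy of seq with the base at 0-based position pos set to alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position lies outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_predict(seq):
    """Hypothetical stand-in for a trained model (assumption).

    Scores two made-up families by whether a placeholder motif is present.
    """
    return {
        "family_A": 0.9 if "CACGTG" in seq else 0.1,
        "family_B": 0.8 if "TTGAC" in seq else 0.2,
    }


def binding_changes(ref_seq, pos, alt, predict, threshold=0.3):
    """Return families whose predicted binding shifts by at least threshold."""
    ref_scores = predict(ref_seq)
    alt_scores = predict(apply_snv(ref_seq, pos, alt))
    changes = {}
    for family, ref_score in ref_scores.items():
        delta = alt_scores[family] - ref_score
        if abs(delta) >= threshold:
            changes[family] = delta
    return changes


# The two placeholder motifs overlap at position 6, so one base change
# disrupts both of them in this toy sequence.
reference = "AATTGACACGTGAA"
changes = binding_changes(reference, 6, "T", toy_predict)
print(sorted(changes))
```

In this toy case both hypothetical families lose predicted binding after one substitution, which mirrors the kind of multi-factor effect described above without reproducing the actual result.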
Testing the approach in maize
Although the model was trained only on Arabidopsis, the team said it could also be used in maize, a distantly related crop. In maize, the system helped identify transcription factors involved in heat-stress responses.
The researchers said known heat-stress regulators, including heat shock factors, appeared as important signals in the maize analysis. They said that result suggests the method could support crop studies in species where experimental transcription factor binding data remain limited.
The paper is titled “Genome-wide modelling of plant transcription factor binding captures regulatory variants associated with phenotypic traits.” It was authored by Peleke and colleagues and published in Nature Communications in 2026.
This story draws on original reporting from Phys.org.