TSSR Improves SMILES Generation Validity and Novelty
Researchers present TSSR, a two-stage, swap-reward-driven reinforcement learning framework for character-level SMILES generation, introduced in a preprint submitted Jan 8, 2026. Stage one rewards local token swaps that repair syntax; stage two uses RDKit-based chemistry diagnostics as reward signals to reduce valence, aromaticity, and connectivity errors. Evaluated on MOSES with a GRU policy and PPO, in both pure RL and fine-tuning settings, TSSR increases syntactic validity, chemical validity, and novelty while preserving drug-likeness and diversity.
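To make the stage-one idea concrete, here is a minimal sketch of a swap reward, not the preprint's implementation. It assumes a toy syntax check (balanced parentheses, closed bracket atoms, paired single-digit ring closures) and rewards a single-character swap by how many of those errors it removes. The names `syntax_errors` and `swap_reward` are hypothetical, and the check ignores `%nn` ring closures and all chemistry.

```python
def syntax_errors(smiles: str) -> int:
    """Count simple SMILES syntax problems (toy check, an assumption of this sketch).

    Counts unmatched parentheses, unclosed or stray bracket atoms, and
    single-digit ring-closure labels that are opened but never closed.
    """
    errors = 0
    depth = 0              # currently open '(' branches
    in_bracket = False     # inside a '[...]' bracket atom
    open_rings = set()     # ring-closure digits waiting for a partner

    for ch in smiles:
        if in_bracket:
            # Digits inside bracket atoms are isotopes, H counts or charges.
            if ch == "]":
                in_bracket = False
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            errors += 1                # stray closing bracket
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                errors += 1            # ')' with no matching '('
            else:
                depth -= 1
        elif ch.isdigit():
            # First occurrence opens a ring, second occurrence closes it.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)

    # Anything still open at the end of the string is an error.
    return errors + depth + len(open_rings) + (1 if in_bracket else 0)


def swap_reward(smiles: str, position: int, new_token: str) -> float:
    """Reward for replacing one character: errors removed minus errors added."""
    repaired = smiles[:position] + new_token + smiles[position + 1:]
    return float(syntax_errors(smiles) - syntax_errors(repaired))


if __name__ == "__main__":
    # Swapping the dangling '(' for the ring-closure digit '1' repairs the string.
    print(swap_reward("C1CCCCC(", 7, "1"))
    # Swapping a valid atom for '(' introduces an error and is penalized.
    print(swap_reward("CCO", 1, "("))
```

In this sketch the reward is dense: every candidate swap gets a signed score, rather than a single pass/fail signal at the end of generation. A stage-two reward would replace the toy check with chemistry diagnostics from RDKit.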
Key Points
- Introduces TSSR, a two-stage RL framework that uses token-swap rewards and RDKit diagnostics to repair SMILES
- Demonstrates substantial increases in syntactic validity, chemical validity, and novelty on the MOSES benchmark
- Enables denser, interpretable rewards for molecule generators, improving quality without reducing diversity
Scoring Rationale
Model-agnostic two-stage reward design improves validity and novelty; limited impact outside SMILES-based molecular generation applications.