Suppose I have an XYZ file containing a cluster of about a dozen molecules: water, $\ce{O2}$, $\ce{H2}$, $\ce{H2O2}$. A human can easily identify these molecules, but how can it be done automatically? Say I need to identify all water and $\ce{H2}$ molecules, and I know nothing about the other molecules ($\ce{O2}$, $\ce{H2O2}$). More generally, I need to recognize a few known small molecules in arbitrary medium-sized systems (hundreds of atoms). These systems may contain unknown molecules, or even bulk materials.

Please recommend a solution for this task. I would be happy with either existing software (such as a Python package) or a not-too-complex algorithm that I could implement myself.
Here is an example structure in XYZ format:

    20

    O -1.47655 1.21497 -0.20250
    H -0.53716 1.39872 -0.20387
    H -1.84821 1.86921 0.38917
    O -1.20797 -1.49263 -0.10100
    H -1.40154 -0.55535 -0.11741
    H -1.71280 -1.85472 -0.82920
    O 1.45295 -1.19622 0.39415
    H 0.53738 -1.39890 0.20209
    H 1.65598 -1.71658 1.17148
    O 1.23158 1.47389 -0.09064
    H 1.40132 0.55553 0.11919
    H 1.90503 1.70209 -0.73144
    O -0.76633 3.57210 -0.37181
    O 0.15413 3.64478 -0.39400
    O 2.01459 3.79209 -0.43890
    O 3.44119 3.98651 -0.48207
    H 3.86223 3.37474 0.11035
    H 1.59355 4.40386 -1.03132
    H -4.39535 2.45856 -0.19546
    H -4.82240 2.33535 -0.65595
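
To make the task concrete, here is a minimal sketch of one possible direction, not a definitive solution. It assumes two atoms are bonded when their distance is below 1.2 times the sum of their covalent radii, splits the resulting bond graph into connected fragments, and labels each fragment by its composition. The radii (H 0.31 Å, O 0.66 Å), the 1.2 tolerance, and the helper names `read_xyz`, `find_fragments`, and `formula` are all illustrative assumptions; only the Python standard library is used.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Python 3.8+

# Assumed covalent radii in angstrom and an assumed bonding tolerance.
COVALENT_RADII = {"H": 0.31, "O": 0.66}
TOLERANCE = 1.2

XYZ = """20

O -1.47655 1.21497 -0.20250
H -0.53716 1.39872 -0.20387
H -1.84821 1.86921 0.38917
O -1.20797 -1.49263 -0.10100
H -1.40154 -0.55535 -0.11741
H -1.71280 -1.85472 -0.82920
O 1.45295 -1.19622 0.39415
H 0.53738 -1.39890 0.20209
H 1.65598 -1.71658 1.17148
O 1.23158 1.47389 -0.09064
H 1.40132 0.55553 0.11919
H 1.90503 1.70209 -0.73144
O -0.76633 3.57210 -0.37181
O 0.15413 3.64478 -0.39400
O 2.01459 3.79209 -0.43890
O 3.44119 3.98651 -0.48207
H 3.86223 3.37474 0.11035
H 1.59355 4.40386 -1.03132
H -4.39535 2.45856 -0.19546
H -4.82240 2.33535 -0.65595
"""


def read_xyz(text):
    """Parse XYZ text into a list of (symbol, (x, y, z)) tuples."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:  # line 1 is the comment line
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def find_fragments(atoms):
    """Group atom indices into bonded fragments using union-find."""
    parent = list(range(len(atoms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # O(N^2) pair loop; acceptable for a few hundred atoms.
    for i, j in combinations(range(len(atoms)), 2):
        (sym_i, pos_i), (sym_j, pos_j) = atoms[i], atoms[j]
        cutoff = TOLERANCE * (COVALENT_RADII[sym_i] + COVALENT_RADII[sym_j])
        if dist(pos_i, pos_j) < cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def formula(atoms, fragment):
    """Composition string with elements in alphabetical order, e.g. 'H2O'."""
    counts = Counter(atoms[i][0] for i in fragment)
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))


atoms = read_xyz(XYZ)
fragments = find_fragments(atoms)
counts = Counter(formula(atoms, frag) for frag in fragments)

# Report only the molecules of interest; other fragments are simply ignored.
for target in ("H2O", "H2"):
    print(target, counts[target])
```

Matching by composition alone cannot tell isomers apart, so a more careful version would also compare the bond graph of each fragment against that of the known molecule. A bulk material would show up as one very large fragment that matches none of the targets.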