The detection of groups of individuals is attracting the attention of researchers in diverse fields, from surveillance to social robotics, with a growing number of approaches published every year. Surprisingly, the evaluation metrics for this problem are not consolidated: some measures are inherited from the people detection field, while others come from clustering or were designed for a particular approach. As a result, they generalize poorly and make comparisons hard to carry out. Moreover, the existing metrics are scarcely expressive, ignoring, for example, the fact that groups have different cardinalities and that, obviously, larger groups are harder to find. This work fills this gap by presenting the GROup DEtection (GRODE) metrics, which formally define precision and recall on groups and include the group cardinality as a variable. This makes it possible to investigate aspects never considered so far, such as the tendency of a method to over- or under-segment groups, or to deal better with specific group cardinalities. The GRODE metrics have been applied to all the publicly available approaches for group detection, on several datasets, revealing interesting strengths and pitfalls so far neglected by the state-of-the-art metrics.