Description
🐛 Bug in CTC forced alignment: softmax used instead of log_softmax
There is a bug in the CTC forced alignment implementation of the SenseVoice model: softmax is used where log_softmax is required, so the alignment is computed on probabilities instead of log probabilities and produces incorrect results.
To Reproduce
No runtime reproduction is needed; the bug is visible by inspecting the code below.
Code sample
branch: main @252eef8b8b29b603d10bc640bc4f0c3fe12c3604
Location: funasr/models/sense_voice/model.py, line 933
Current (incorrect) code:
logits_speech = self.ctc.softmax(encoder_out)[i, 4 : encoder_out_lens[i].item(), :]
pred = logits_speech.argmax(-1).cpu()
logits_speech[pred == self.blank_id, self.blank_id] = 0
align = ctc_forced_align(
    logits_speech.unsqueeze(0).float(),
    torch.Tensor(token_ids).unsqueeze(0).long().to(logits_speech.device),
    (encoder_out_lens[i] - 4).long(),
    torch.tensor(len(token_ids)).unsqueeze(0).long().to(logits_speech.device),
    ignore_id=self.ignore_id,
)
The issue: The ctc_forced_align function expects log probabilities, not regular probabilities.
Evidence from funasr/models/sense_voice/utils/ctc_alignment.py:
- Line 3: Parameter is named log_probs
- Line 12: Docstring states: "log_probs (Tensor): log probability of CTC emission output."
- Line 53: Uses log-space arithmetic:
best_score[:, padding_num:] = log_probs[:, t].gather(-1, _t_a_r_g_e_t_s_) + prev_max_value
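The addition on line 53 is only meaningful in log space: the probability of a path is a product of per-frame emission probabilities, and that product becomes a sum of log probabilities. A minimal pure-Python sketch (toy two-frame logits, no torch; all names here are illustrative, not from the FunASR codebase):

```python
import math

def softmax(xs):
    # Standard softmax: exponentiate (shifted for stability) and normalize.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def log_softmax(xs):
    # Numerically stable log softmax: x - logsumexp(x).
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

# Toy logits for two frames of a 3-symbol vocabulary.
logits_t0 = [2.0, 0.5, -1.0]
logits_t1 = [0.3, 1.7, -0.2]

# Probability of emitting symbol 0 at t=0 AND symbol 1 at t=1 is a product...
p_path = softmax(logits_t0)[0] * softmax(logits_t1)[1]
# ...which in log space is a sum -- exactly the "+" used on line 53.
logp_path = log_softmax(logits_t0)[0] + log_softmax(logits_t1)[1]
assert abs(math.log(p_path) - logp_path) < 1e-9

# Feeding raw softmax outputs into that "+" instead adds probabilities,
# which has no probabilistic meaning for a path score (it can even exceed 1):
wrong = softmax(logits_t0)[0] + softmax(logits_t1)[1]
print(p_path, logp_path, wrong)
```

So when `ctc_forced_align` receives softmax outputs, its accumulated "scores" are sums of probabilities rather than log-probabilities of paths, and the Viterbi-style comparison of path scores is no longer valid.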
Expected behavior
The code should use log_softmax instead of softmax:
logits_speech = self.ctc.log_softmax(encoder_out)[i, 4 : encoder_out_lens[i].item(), :]
This will provide log probabilities (range: -∞ to 0) as expected by the ctc_forced_align function, instead of regular probabilities (range: 0 to 1).
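A small pure-Python sketch of the range difference (toy logits, no torch). Note that because log is monotone, the `argmax` used to compute `pred` is unchanged by the fix:

```python
import math

logits = [1.2, -0.3, 3.1, 0.0]

# softmax: exponentiate and normalize -> values in (0, 1) summing to 1.
exps = [math.exp(x) for x in logits]
p = [e / sum(exps) for e in exps]

# log_softmax is the log of softmax -> values in (-inf, 0].
lp = [math.log(v) for v in p]

assert all(0.0 < v < 1.0 for v in p)
assert all(v <= 0.0 for v in lp)
# Monotonicity of log: the argmax is identical, so the
# `pred = logits_speech.argmax(-1)` line behaves the same after the fix.
assert max(range(len(p)), key=p.__getitem__) == max(range(len(lp)), key=lp.__getitem__)
```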
Environment
Not relevant: the bug is visible in the source code regardless of environment.