1 file changed: +11 -2 lines changed
@@ -549,8 +549,11 @@ Subtasks:
 a `tgt_mask` and a `src_mask` here. `tgt_mask` has both the causal mask and the pad mask applied
 for the English input into the Decoder. `src_mask` has the pad mask applied to it.

-You'll need to think about where to input the `src_mask` vs the `tgt_mask` (hint: the only function
-that actually deploys any masks is the `scaled_dot_product_attention` function)
+You'll need to think about where to input the `src_mask` vs the `tgt_mask` (hint: the only function
+that actually deploys any masks is the `scaled_dot_product_attention` function)
+
+Remember that our LM task will be decoder-only, so we don't want to do cross-attention in this case.
+When `enc_x` is `None`, make sure to skip the cross-attention step in your `DecoderLayer`.

 * Implement `Decoder`. This will be a `ModuleList` of your `DecoderLayer`s, just like in the `Encoder`.
 It will also need to handle the target embeddings and positional encoding.
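
The mask routing and the `enc_x is None` branch described in this hunk can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the class names `SingleHeadAttention` and `SketchDecoderLayer`, the helpers `make_src_mask` and `make_tgt_mask`, the `forward` argument order, the single-head attention, the post-norm residual layout, and the boolean "True means may attend" mask convention are all assumptions made for the example. Only `scaled_dot_product_attention`, `tgt_mask`, `src_mask`, `enc_x`, and `DecoderLayer` are names taken from the README.

```python
import math

import torch
import torch.nn as nn


def scaled_dot_product_attention(q, k, v, mask=None):
    # Assumed convention: boolean mask, True = position may be attended to,
    # broadcastable to (batch, q_len, k_len).
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


def make_src_mask(src_tokens, pad_idx):
    # Pad mask only: (batch, 1, src_len), broadcast over query positions.
    return (src_tokens != pad_idx).unsqueeze(1)


def make_tgt_mask(tgt_tokens, pad_idx):
    # Pad mask AND causal (lower-triangular) mask: (batch, tgt_len, tgt_len).
    tgt_len = tgt_tokens.size(1)
    causal = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.bool, device=tgt_tokens.device)
    )
    return (tgt_tokens != pad_idx).unsqueeze(1) & causal


class SingleHeadAttention(nn.Module):
    # Hypothetical single-head stand-in for a multi-head attention module.
    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, query, key_value, mask=None):
        return scaled_dot_product_attention(
            self.q_proj(query), self.k_proj(key_value), self.v_proj(key_value), mask
        )


class SketchDecoderLayer(nn.Module):
    def __init__(self, d_model=16, d_ff=32):
        super().__init__()
        self.self_attn = SingleHeadAttention(d_model)
        self.cross_attn = SingleHeadAttention(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, enc_x=None, tgt_mask=None, src_mask=None):
        # Self-attention over the target sequence uses tgt_mask.
        x = self.norm1(x + self.self_attn(x, x, tgt_mask))
        # Cross-attention reads encoder keys/values, so it uses src_mask.
        # In the decoder-only LM task enc_x is None and this step is skipped.
        if enc_x is not None:
            x = self.norm2(x + self.cross_attn(x, enc_x, src_mask))
        return self.norm3(x + self.ff(x))
```

Under these assumptions, the NMT path would call `layer(x, enc_x, tgt_mask, src_mask)` and the LM path `layer(x, None, tgt_mask)`. Keeping the masks as plain arguments that are only consumed inside `scaled_dot_product_attention` matches the hint in the README.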
@@ -602,6 +605,12 @@ We've implemented the LM training script for you! Just add the same line
 that you added in the NMT task in the `TODO` line in
 `scripts/train_lm.py`.

+## Tracking experiments
+
+We've added simple `wandb` logging to your training scripts.
+Make sure to fill in your entity names in both scripts to track
+your experiments!
+
 ## Start training!

 1. Set your devices to be different values (based on which GPUs