|
Post by dominic on Nov 29, 2016 7:24:00 GMT
1drv.ms/f/s!AoNIuW-GkoxOgb5w8bOeMo4S3HWbxw
Seasons 6-10 are done; check it over whenever you guys can, and Umar, you can download all the folders to your Google Drive if that makes it easier. I will do the movie scripts I found tomorrow as well, and maybe look for some more. Otherwise I will try to get started on the presentation and report so we can get those done ASAP after the quiz this Friday. Also gotta start studying for that, haha.
|
|
|
Post by Umar on Nov 29, 2016 7:39:02 GMT
jmewasiuk wrote:
"But just a quick request after glossing over what you posted. Since I don't have MATLAB, can you format the scripts so that they have some sort of entry that marks the beginning of a scene? Kind of like this...

Scene
person1: blblblblahahaha
person2: lalalalal
...
Scene
person3: lalalala
..."

Yes, I can. Do you need everything in one file in this format, i.e. both query and response? Will do that tomorrow.
|
|
|
Post by saad on Nov 29, 2016 17:16:37 GMT
Great! I have also looked at Umar's model... it works.
|
|
|
Post by jmewasiuk on Nov 29, 2016 17:36:50 GMT
Umar wrote:
"Yes, I can. Do you need everything in one file in this format, i.e. both query and response? Will do that tomorrow."

All I mean is to keep the information about where a scene starts in the sequence of lines. In the original files there is some HTML which indicates that. You can put it into one file or not; it's easy enough for me to loop through files in a directory if there's more than one file.
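To be concrete, here's roughly what I'd run on my end once the files have that marker (a Python sketch only; the directory name, the bare "Scene" marker line, and the "speaker: dialogue" layout are assumptions about whatever format you settle on):

    import os

    def load_scenes(script_dir):
        """Read every script file in a directory and split it into scenes.

        Assumes each scene starts with a line that is exactly "Scene" and
        every other non-empty line looks like "speaker: dialogue".
        """
        scenes = []
        for fname in sorted(os.listdir(script_dir)):
            with open(os.path.join(script_dir, fname)) as f:
                scene = []
                for raw in f:
                    line = raw.strip()
                    if not line:
                        continue
                    if line == "Scene":  # marker: start a new scene
                        if scene:
                            scenes.append(scene)
                        scene = []
                    elif ":" in line:  # "speaker: dialogue"
                        speaker, text = line.split(":", 1)
                        scene.append((speaker.strip(), text.strip()))
                if scene:  # flush the last scene in the file
                    scenes.append(scene)
        return scenes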
|
|
|
Post by jmewasiuk on Nov 29, 2016 18:28:49 GMT
Ok, so I've had a chance to look closer at the example Umar is replicating. I'm assuming it's still suriyadeepan.github.io/2016-06-28-easy-seq2seq/? I think seq2seq is where we want to go. This is definitely a generative model. The input parsing I described in my other post is what a seq2seq is doing: the decoder holds a "running total" state of all the previous words plus whatever words of our output sequence have been generated so far, and it generates the output word by word. I was just doing that manually, without a daisy-chained decoder LSTM, and then trying to pad and bucket the sequences by hand. If TF's seq2seq has this built in, this is way easier. So I think the input can be simplified to this (in pseudocode):

for scene in scenes {
    previous_words = []
    for line in lines_in_scene {
        if (line is not target character's line) {
            previous_words.add(tokenize(line))
        } else {
            input = previous_words
            target = tokenize(line)
            seq2seq(input, target)
            previous_words.add(target)
        }
    }
}
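Fleshed out in Python, that might look something like this (a sketch only: it assumes scenes arrive as lists of (speaker, line) tuples like the parser above produces, tokenization is just a whitespace split, and the character name is a placeholder):

    def make_training_pairs(scenes, target_character):
        """Turn parsed scenes into (input, target) token-sequence pairs.

        Everything said in the scene before one of the target character's
        lines becomes the input; that line's tokens become the target.
        """
        pairs = []
        for scene in scenes:
            previous_words = []
            for speaker, line in scene:
                tokens = line.lower().split()  # stand-in tokenizer
                if speaker != target_character:
                    previous_words.extend(tokens)
                else:
                    if previous_words:  # skip lines with no preceding context
                        pairs.append((list(previous_words), tokens))
                    previous_words.extend(tokens)
        return pairs

    # e.g., using the parser sketched earlier (character name is a placeholder):
    pairs = make_training_pairs(load_scenes("scripts"), "CHARACTER")

Each pair can then be fed to the seq2seq trainer as one (query, response) example.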
|
|
|
Post by jmewasiuk on Nov 29, 2016 18:57:25 GMT
I'll try to get the above input model working and see what training on that gives.
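In case it helps, the manual padding/bucketing I mentioned above would look roughly like this (a sketch; the bucket sizes and PAD token here are made up, not what the tutorial actually uses):

    PAD = "_PAD"
    BUCKETS = [(5, 10), (10, 15), (20, 25), (40, 50)]  # (input_len, target_len)

    def bucket_pairs(pairs):
        """Sort (input, target) token pairs into the smallest bucket they fit,
        padding both sequences out to the bucket's fixed lengths.
        Pairs longer than the largest bucket are dropped.
        """
        buckets = {b: [] for b in BUCKETS}
        for inp, tgt in pairs:
            for in_len, tgt_len in BUCKETS:
                if len(inp) <= in_len and len(tgt) <= tgt_len:
                    padded_in = inp + [PAD] * (in_len - len(inp))
                    padded_tgt = tgt + [PAD] * (tgt_len - len(tgt))
                    buckets[(in_len, tgt_len)].append((padded_in, padded_tgt))
                    break
        return buckets

Fixed-size buckets keep the graph shapes small without padding every pair out to the longest sequence in the corpus.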
|
|