saad
New Member
Posts: 12
Post by saad on Nov 25, 2016 21:43:19 GMT
This website is cool. It is exactly the thing we want: train different models with different personalities. bottr.me/
Post by saad on Nov 25, 2016 21:43:52 GMT
It uses deep learning with NLP. I am not sure if it uses Torch or TF.
Post by dominic on Nov 26, 2016 1:38:09 GMT
Here is a link again to one of my OneDrives. I put up a bunch of scripts, and I hope the format is consistent enough to transcribe. 1drv.ms/f/s!AoNIuW-GkoxOgb5w8bOeMo4S3HWbxw Here is the link I got them from, in case you guys wanna grab anything specific: www.imsdb.com
Post by Umar on Nov 28, 2016 7:58:28 GMT
Hey everyone, so I trained the seq2seq model with the Cornell movie database and it works. Not great, but it does.
Then I wrote a script in Matlab to format the Friends data from all episodes. The script does its job, but I still need to prep the data manually first, which takes some time. I have done 2 seasons so far and hopefully will do the rest quickly.
The format I need for training is 2 files: one containing the queries (input) and the other containing the responses (output), so there is an equal number of sentences in both files.
I programmed the script to look for a line said by the desired character (say Chandler) and treat it as the target, then look for the line just before that one and treat it as the input.
It goes like this:
Any person: Are you tired?
Chandler: Yes, I am.
input -> Are you tired?
target -> Yes, I am.
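The extraction step above (take each line by the target character, pair it with the line just before it) can be sketched in Python; the actual script is in Matlab, and the function and character names here are just illustrative:

```python
# Sketch of the pair-extraction logic described above. Each script line is
# assumed to look like "Character: dialogue".
def extract_pairs(lines, target_character="Chandler"):
    pairs = []
    for i in range(1, len(lines)):
        speaker, _, text = lines[i].partition(":")
        if speaker.strip().lower() == target_character.lower():
            prev_speaker, _, prev_text = lines[i - 1].partition(":")
            # Skip cases where the previous line is also the target character
            if prev_speaker.strip().lower() != target_character.lower():
                pairs.append((prev_text.strip(), text.strip()))
    return pairs

script = [
    "Joey: Are you tired?",
    "Chandler: Yes, I am.",
]
print(extract_pairs(script))  # [('Are you tired?', 'Yes, I am.')]
```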
TensorFlow's seq2seq model offers padding schemes, so I don't have to worry about different lengths of input and target vectors. We just need to define bucket sizes for it to pad automatically.
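For anyone unfamiliar with bucketing, here is a rough conceptual sketch of what it does (this is not TensorFlow's actual implementation; the bucket sizes and PAD token are made up for illustration):

```python
# Each (input, target) pair goes into the smallest bucket that fits it, and
# both sequences are padded with a PAD token up to that bucket's lengths, so
# everything in one bucket has the same shape.
PAD = "_PAD"
BUCKETS = [(5, 10), (10, 15), (20, 25)]  # example (input_len, target_len) sizes

def bucket_and_pad(input_tokens, target_tokens, buckets=BUCKETS):
    for in_len, out_len in buckets:
        if len(input_tokens) <= in_len and len(target_tokens) <= out_len:
            padded_in = input_tokens + [PAD] * (in_len - len(input_tokens))
            padded_out = target_tokens + [PAD] * (out_len - len(target_tokens))
            return (in_len, out_len), padded_in, padded_out
    raise ValueError("sequence too long for all buckets")

bucket, x, y = bucket_and_pad(["Are", "you", "tired", "?"],
                              ["Yes", ",", "I", "am", "."])
print(bucket)          # (5, 10)
print(len(x), len(y))  # 5 10
```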
As soon as I format all episodes, I will train the model.
Post by jmewasiuk on Nov 28, 2016 8:29:18 GMT
Do you have code so we can all work on it?
So just a different question: the work I did yesterday, are we going to use it? I haven't really heard any responses from anybody regarding it, and I put in a good 14 hours on Saturday to get that far.
Post by Umar on Nov 29, 2016 1:23:23 GMT
Hi all,
The Matlab script, processed data files and seq2seq model are in my Google Drive: drive.google.com/drive/folders/0B-enYXLoEzulTm1jM21tTmxzcGM?usp=sharing
Please see the readme file to proceed. I have done 5 seasons so far, but don't have more time today. Can somebody do the rest? The script I wrote needs Excel files in a particular format, which the raw scripts need to be converted into. What I need you to do is look at the html files in the html_episodes folder. Open them in the browser and copy the script text only, leaving out the 'written by' info at the top and the 'END' at the bottom. When you have copied the text, paste it into an Excel file. Then do a 'text to columns' operation with the colon delimiter ":" to separate it into 2 columns. Format the first column with the character identifiers (such as Chandler, Rachel etc.) as text and then save. You can look at the processed Excel files in the Data folder to see what I need. Thanks
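For anyone without Excel, the 'text to columns' step on the colon delimiter can also be scripted. A minimal sketch in Python (file paths and the exact script layout are assumptions; check the processed files in the Data folder for the real target format):

```python
import csv

# Split each script line on the first colon into (character, dialogue) and
# write a two-column CSV, mimicking the Excel 'text to columns' operation.
def script_to_columns(in_path, out_path):
    with open(in_path, encoding="utf-8") as f_in, \
         open(out_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)
        for line in f_in:
            line = line.strip()
            if not line or ":" not in line:
                continue  # skip blank lines and lines without a speaker
            character, _, dialogue = line.partition(":")
            writer.writerow([character.strip(), dialogue.strip()])
```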
Post by Umar on Nov 29, 2016 1:28:49 GMT
Oh, one more thing. The model I trained with Chandler had low training error and high testing error, so it's overfitting, I guess. I am looking into how to deal with it in TensorFlow. I noticed the same thing with the Cornell database. Do you guys have some info about this? I can't spend more time on it today as I have to write my research report.
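One common, framework-agnostic remedy for the low-training-error / high-test-error pattern is early stopping: halt training once validation loss stops improving. A minimal sketch (the `train_step` and `eval_loss` callables are hypothetical stand-ins, not TensorFlow's API):

```python
# Stop training when validation loss has not improved for `patience` epochs
# in a row; return the best validation loss seen.
def train_with_early_stopping(train_step, eval_loss, max_epochs=100, patience=3):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = eval_loss()
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has plateaued; likely overfitting past here
    return best_loss
```

Dropout on the RNN cells and a smaller model are the other usual levers, but early stopping is the cheapest thing to try first.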
Post by dominic on Nov 29, 2016 2:43:17 GMT
I can try to separate the scripts into Excel files for you, unless someone is already doing it. I will look into the readme file and ask you if I have any questions.
Post by Umar on Nov 29, 2016 2:55:02 GMT
Sure. Thanks.
I have also started training the model again for Chandler, with 4 seasons as training data and 1 season as test data. Let's see what I get.
Post by dominic on Nov 29, 2016 4:34:35 GMT
I am working through the scripts, but here is the link, and I will update it through the night as I progress through each script. 1drv.ms/f/s!AoNIuW-GkoxOgb5w8bOeMo4S3HWbxw
Post by Umar on Nov 29, 2016 4:46:51 GMT
Hey guys. So even though I am writing my research report, I couldn't stop myself from peeking at the model trained so far. Observation: it definitely doesn't look like a retrieval model. It is generating sentences from the script, so that makes it generative?
Sample responses:
- Hi -> Hey
- Hello -> You 're you out
- Are you a male? -> Ah, y're 're right.
- Are you a female? -> No.
- Central Perk -> You pants
Lol at 'You Pants'
It replies 'You are in' and 'You are out' a lot. I guess it does that when it can't find a proper response.
Okay, I'll let it train more for now and work on my report.
Post by jmewasiuk on Nov 29, 2016 6:45:19 GMT
I'll take a closer look tomorrow. Working on some compiler stuff right now.
Post by jmewasiuk on Nov 29, 2016 6:55:56 GMT
But just a quick request after glossing over what you posted. Since I don't have Matlab, can you format the scripts so that they have some sort of entry that marks the beginning of a scene?
Kind of like this...
Scene
person1: blblblblahahaha
person2: lalalalal
...
Scene
person3: lalalala
...
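If the scripts end up with a plain "Scene" marker line like that, splitting them back into scenes is straightforward. A sketch, assuming the marker is a line containing exactly "Scene" (that format is a proposal here, not an agreed spec):

```python
# Group script lines into scenes: a line that is exactly "Scene" starts a
# new scene; every other non-blank line belongs to the current scene.
def split_into_scenes(lines):
    scenes = []
    current = []
    for line in lines:
        if line.strip() == "Scene":
            if current:
                scenes.append(current)
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        scenes.append(current)
    return scenes

example = ["Scene", "person1: blah", "person2: la", "Scene", "person3: la"]
print(split_into_scenes(example))
# [['person1: blah', 'person2: la'], ['person3: la']]
```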