saad
New Member
Posts: 12
Post by saad on Nov 25, 2016 21:43:19 GMT
This website is cool. It is exactly the thing we want: train different models with different personalities. bottr.me/
Post by saad on Nov 25, 2016 21:43:52 GMT
It uses deep learning with NLP. I am not sure if it uses Torch or TF.
Post by dominic on Nov 26, 2016 1:38:09 GMT
Here is a link again to one of my OneDrives. I put up a bunch of scripts, and I hope the format is consistent enough to transcribe. 1drv.ms/f/s!AoNIuW-GkoxOgb5w8bOeMo4S3HWbxw Here is the link I got them from, in case you guys wanna grab anything specific: www.imsdb.com
Post by Umar on Nov 28, 2016 7:58:28 GMT
Hey everyone, so I trained the seq2seq model with the Cornell movie database and it works. Not great, but it does.
Then I wrote a script in Matlab to format the Friends data from all episodes. The script does its job, but I still need to prep the data manually first, which takes some time. I have done 2 seasons so far and hopefully will do the rest quickly.
The format I need for training is 2 files: one containing the queries (input) and the other containing the responses (output), so there is an equal number of sentences in both files.
I programmed the script to look for a line said by the desired character (say Chandler) and treat it as the target, then look for the line just before that one and treat it as the input.
It goes like this:
Any person: Are you tired?
Chandler: Yes, I am.
input -> Are you tired?
target -> Yes, I am.
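The extraction step above (take each line by the target character, pair it with the line just before it) can be sketched in Python; the actual script is in Matlab, and the function and character names here are just illustrative:

```python
# Sketch of the pair-extraction logic described above. Each script line is
# assumed to look like "Character: dialogue".
def extract_pairs(lines, target_character="Chandler"):
    pairs = []
    for i in range(1, len(lines)):
        speaker, _, text = lines[i].partition(":")
        if speaker.strip().lower() == target_character.lower():
            prev_speaker, _, prev_text = lines[i - 1].partition(":")
            # Skip cases where the previous line is also the target character
            if prev_speaker.strip().lower() != target_character.lower():
                pairs.append((prev_text.strip(), text.strip()))
    return pairs

script = [
    "Joey: Are you tired?",
    "Chandler: Yes, I am.",
]
print(extract_pairs(script))  # [('Are you tired?', 'Yes, I am.')]
```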
TensorFlow's seq2seq model offers padding schemes, so I don't have to worry about different lengths of input and target vectors. We just need to define bucket sizes for it to pad automatically.
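For anyone unfamiliar with bucketing, here is a rough conceptual sketch of what it does (this is not TensorFlow's actual implementation; the bucket sizes and PAD token are made up for illustration):

```python
# Each (input, target) pair goes into the smallest bucket that fits it, and
# both sequences are padded with a PAD token up to that bucket's lengths, so
# everything in one bucket has the same shape.
PAD = "_PAD"
BUCKETS = [(5, 10), (10, 15), (20, 25)]  # example (input_len, target_len) sizes

def bucket_and_pad(input_tokens, target_tokens, buckets=BUCKETS):
    for in_len, out_len in buckets:
        if len(input_tokens) <= in_len and len(target_tokens) <= out_len:
            padded_in = input_tokens + [PAD] * (in_len - len(input_tokens))
            padded_out = target_tokens + [PAD] * (out_len - len(target_tokens))
            return (in_len, out_len), padded_in, padded_out
    raise ValueError("sequence too long for all buckets")

bucket, x, y = bucket_and_pad(["Are", "you", "tired", "?"],
                              ["Yes", ",", "I", "am", "."])
print(bucket)          # (5, 10)
print(len(x), len(y))  # 5 10
```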
As soon as I format all episodes, I will train the model.
Post by jmewasiuk on Nov 28, 2016 8:29:18 GMT
Do you have code so we can all work on it?
So just a different question: the work I did yesterday, are we going to use it? I haven't really heard any responses from anybody regarding it, and I put in a good 14 hours on Saturday to get that far.
Post by Umar on Nov 29, 2016 1:23:23 GMT
Hi all,
The Matlab script, processed data files and seq2seq model are in my Google Drive: drive.google.com/drive/folders/0B-enYXLoEzulTm1jM21tTmxzcGM?usp=sharing
Please see the readme file to proceed. I have done 5 seasons so far, but don't have more time today. Can somebody do the rest? The script I wrote needs Excel files in a particular format, which the raw scripts need to be converted into. What I need you to do is look at the html files in the html_episodes folder. Open them in the browser and copy the script text only, leaving out the 'written by' info at the top and the 'END' at the bottom. When you have copied the text, paste it into an Excel file. Then do a 'text to columns' operation with the colon delimiter ":" to separate it into 2 columns. Format the first column with the character identifiers (such as Chandler, Rachel etc.) as text and then save. You can look at the processed Excel files in the Data folder to see what I need. Thanks
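For anyone without Excel, the 'text to columns' step on the colon delimiter can also be scripted. A minimal sketch in Python (file paths and the exact script layout are assumptions; check the processed files in the Data folder for the real target format):

```python
import csv

# Split each script line on the first colon into (character, dialogue) and
# write a two-column CSV, mimicking the Excel 'text to columns' operation.
def script_to_columns(in_path, out_path):
    with open(in_path, encoding="utf-8") as f_in, \
         open(out_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)
        for line in f_in:
            line = line.strip()
            if not line or ":" not in line:
                continue  # skip blank lines and lines without a speaker
            character, _, dialogue = line.partition(":")
            writer.writerow([character.strip(), dialogue.strip()])
```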
Post by Umar on Nov 29, 2016 1:28:49 GMT
Oh, one more thing. The model I trained with Chandler had low training error and high testing error, so it's overfitting, I guess. I am looking into how to deal with it in TensorFlow. I noticed the same thing with the Cornell database. Do you guys have some info about this? I can't spend more time on it today as I have to write my research report.
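One common, framework-agnostic remedy for the low-training-error / high-test-error pattern is early stopping: halt training once validation loss stops improving. A minimal sketch (the `train_step` and `eval_loss` callables are hypothetical stand-ins, not TensorFlow's API):

```python
# Stop training when validation loss has not improved for `patience` epochs
# in a row; return the best validation loss seen.
def train_with_early_stopping(train_step, eval_loss, max_epochs=100, patience=3):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = eval_loss()
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has plateaued; likely overfitting past here
    return best_loss
```

Dropout on the RNN cells and a smaller model are the other usual levers, but early stopping is the cheapest thing to try first.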
Post by dominic on Nov 29, 2016 2:43:17 GMT
I can try to separate the scripts into Excel files for you, unless someone is already doing it. I will look into the readme file and ask you if I have any questions.
Post by Umar on Nov 29, 2016 2:55:02 GMT
Sure. Thanks.
I have also started training the model again for Chandler, with 4 seasons as training data and 1 season as test data. Let's see what I get.
Post by dominic on Nov 29, 2016 4:34:35 GMT
I am working through the scripts, but here is the link, and I will update it through the night as I progress through each script. 1drv.ms/f/s!AoNIuW-GkoxOgb5w8bOeMo4S3HWbxw
Post by Umar on Nov 29, 2016 4:46:51 GMT
Hey guys. So even though I am writing my research report, I couldn't stop myself from peeking at the model trained so far. Observation: it definitely doesn't look like a retrieval model. It is generating sentences from the script, so that makes it generative?
Sample responses:
- Hi -> Hey
- Hello -> You 're you out
- Are you a male? -> Ah, y're 're right.
- Are you a female? -> No.
- Central Perk -> You pants
Lol at 'You Pants'
It replies 'You are in' and 'You are out' a lot. I guess it does that when it can't find a proper response.
Okay, I'll let it train more for now and work on my report.
Post by jmewasiuk on Nov 29, 2016 6:45:19 GMT
I'll take a closer look tomorrow. Working on some compiler stuff right now.
Post by jmewasiuk on Nov 29, 2016 6:55:56 GMT
But just a quick request after glossing over what you posted. Since I don't have Matlab, can you format the scripts so that they have some sort of entry that marks the beginning of a scene?
Kind of like this...
Scene
person1: blblblblahahaha
person2: lalalalal
...
Scene
person3: lalalala
...
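If the scripts end up with a plain "Scene" marker line like that, splitting them back into scenes is straightforward. A sketch, assuming the marker is a line containing exactly "Scene" (that format is a proposal here, not an agreed spec):

```python
# Group script lines into scenes: a line that is exactly "Scene" starts a
# new scene; every other non-blank line belongs to the current scene.
def split_into_scenes(lines):
    scenes = []
    current = []
    for line in lines:
        if line.strip() == "Scene":
            if current:
                scenes.append(current)
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        scenes.append(current)
    return scenes

example = ["Scene", "person1: blah", "person2: la", "Scene", "person3: la"]
print(split_into_scenes(example))
# [['person1: blah', 'person2: la'], ['person3: la']]
```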