412426 – agent.srvcon : doing successive srvcon operations leads to incomplete M3DA data sent to server

Bug 412426 - agent.srvcon : doing successive srvcon operations leads to incomplete M3DA data sent to server

Summary: agent.srvcon : doing successive srvcon operations leads to incomplete M3DA da...

Status:	RESOLVED FIXED

Alias:	None

Product:	Mihini
Classification:	IoT
Component:	Agent (show other bugs)
Version:	0.9
Hardware:	PC Linux

Importance:	P1 major (vote)
Target Milestone:	0.9 M1
Assignee:	Project Inbox
QA Contact:	Robert Jacolin

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-07-05 13:46 EDT by Laurent Barthelemy
Modified:	2013-08-02 04:50 EDT (History)
CC List:	1 user (show)

See Also:

Attachments
Lua code to reproduce the issue (633 bytes, text/plain) 2013-07-05 13:46 EDT, Laurent Barthelemy	no flags	Details
Correct M3DA message (300 bytes, text/plain) 2013-07-05 13:47 EDT, Laurent Barthelemy	no flags	Details
Incomplete M3DA message (225 bytes, text/plain) 2013-07-05 13:47 EDT, Laurent Barthelemy	no flags	Details
Server logs (34.51 KB, text/plain) 2013-07-05 13:48 EDT, Laurent Barthelemy	no flags	Details
Tcp dump (17.93 KB, application/octet-stream) 2013-07-05 13:48 EDT, Laurent Barthelemy	no flags	Details
simpler code sample reproducing the issue (539 bytes, text/plain) 2013-07-12 14:00 EDT, Fabien Fleutot	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Laurent Barthelemy

2013-07-05 13:46:43 EDT

Created attachment 233137 [details]
Lua code to reproduce the issue

agent.srvcon : doing successive srvcon operations leads to incomplete M3DA data sent to server


While executing racon unittest suite, some server connections end with "connection closed".
In m3da-server logs we see: "m3da.codec.DecoderException: this is not a map OPCODE : 97" (More server logs in attachement).

I have found a way to reproduce the issue every time: 
 -a specific succession of calls to Lua user API connectToServer, pushData, triggerPolicy make this appear
 - see reproduce_connection_close.txt in attachement.
 
I don't think we should focus on "OPCODE 97" ., it's more likely an incomplete message (see below).

I've checked EMP data coming from asset, it's seems ok to me.

Looking into the function used to reproduce the problem : I have remark
 - need to call internal_code function 2 times quickly to reproduce the problem
 - when replacing connectToServer(0) by connectToServer(1) or connectToServer(nil) seems to avoid the issue.
Those 2 remarks make me think it may be a concurrency issue.

Now looking at message sent to the server, especially those containing the asset data
(see m3da_msg1.txt and m3da_msg2.txt): As the same variable is pushed, I would have expected the M3DA payload to be identical, there are not:
in m3da_msg2, I think the M3DA msg is incomplete: the body is missing!

I was running locally m3da server from https://github.com/SierraWireless/m3da-server, rev 51a7521967e5a4fdf6d0e407f190145f3a6159e3.

I've also attached the full tcpdump.

Comment 1 Laurent Barthelemy

2013-07-05 13:47:34 EDT

Created attachment 233138 [details]
Correct M3DA message

Comment 2 Laurent Barthelemy

2013-07-05 13:47:52 EDT

Created attachment 233139 [details]
Incomplete M3DA message

Comment 3 Laurent Barthelemy

2013-07-05 13:48:12 EDT

Created attachment 233140 [details]
Server logs

Comment 4 Laurent Barthelemy

2013-07-05 13:48:31 EDT

Created attachment 233141 [details]
Tcp dump

Comment 5 Fabien Fleutot

2013-07-12 14:00:29 EDT

Created attachment 233435 [details]
simpler code sample reproducing the issue

It's a race condition between agent.srvcon.connectToServer() and the racon.triggerPolicy() operations. The first connectToServer will be acknowledged while the first triggerPolicy runs, and will purge an SDB table without properly dereferencing it from its policy.

Given that connectToServer() isn't an official API (only accessible from within the agent's process), and that I can't reproduce this without calling connectToServer calls in the middle of normal EMP-driven commands, is it really a bug?

Comment 6 Fabien Fleutot

2013-07-16 06:22:08 EDT

Fixed: the list of database-cleaning callbacks, to be executed when srvcon receives the server's "200 OK", was the wrong one. Instead of taking the callbacks present when data were sent, it was taking the callbacks present when the OK was received, thus incorrectly cleaning the data buffered accumulated between the request emission and the response reception.