Crash when using ChaiScript from plugins


#1

Hi,

We have been facing a strange problem since we started using several ChaiScript engines loaded at the same time. I’m not sure if this is related, but the engines are inside plugins (Linux shared libraries loaded dynamically with dlopen). We use gcc 4.9.1.

The second time we load the plugin, we get a crash either in the constructor of ChaiScript or later, and the call stack shows access to some Thread_Storage<> variables. When we define CHAISCRIPT_NO_THREADS (-DCHAISCRIPT_NO_THREADS), the problem goes away. This is not a viable solution, as we use the engine from multiple threads (not at the time of the crash, but later). Not defining CHAISCRIPT_HAS_THREAD_LOCAL does NOT seem to solve the problem.

At this point I was not able to isolate the problem in a small test application.

Any ideas?


#2

Multithreaded use on Linux is pretty well tested, so I’m kind of surprised that CHAISCRIPT_NO_THREADS fixes it…

  • The crash occurs in both Debug and Release builds?
  • Can you post a full crash stack trace here?
  • Have you tried either moving forward to develop or moving backward to a previous release to see if that makes any difference?
  • Have you tried compiling with clang?
  • Have you tried compiling (with either clang or gcc) with AddressSanitizer enabled? It will cause the application to crash at the exact moment the corruption occurs, then we can investigate the stack trace from there.

Ideally, if you could give me test code that reproduces the problem, I would work on fixing it and probably make it part of the normal test suite.


#3

Hi,

I have made some progress on this issue, thanks to Valgrind…

The problem is related to const_var(true) and const_var(false). When CHAISCRIPT_HAS_MAGIC_STATICS is defined, their values are held in function-local static variables. Undefining CHAISCRIPT_HAS_MAGIC_STATICS fixes the problem.
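For readers unfamiliar with the pattern, here is a minimal sketch of caching the two boolean constants in function-local statics, next to an uncached variant. It only illustrates the general technique that the stack trace below points at in const_var(bool); the names Value, cached_bool and fresh_bool are hypothetical and this is not ChaiScript's actual code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a boxed value that shares its payload
// through a reference-counted pointer.
struct Value {
    std::shared_ptr<bool> data;
};

// Cached variant: the two constants are function-local statics, built
// once on first use and copied (reference count bumped) on every call.
inline Value cached_bool(bool b) {
    static const Value t{std::make_shared<bool>(true)};
    static const Value f{std::make_shared<bool>(false)};
    return b ? t : f;
}

// Uncached variant: allocates a fresh payload on every call, so no
// static object outlives or predates the caller.
inline Value fresh_bool(bool b) {
    return Value{std::make_shared<bool>(b)};
}

// Small self-check of the difference between the two variants.
inline void check_bool_variants() {
    assert(cached_bool(true).data == cached_bool(true).data);  // same payload
    assert(fresh_bool(true).data != fresh_bool(true).data);    // distinct payloads
    assert(*cached_bool(false).data == false);
}
```

The cached variant saves an allocation per call, but copying the static touches a control block whose lifetime is tied to the static object, which is the kind of access Valgrind flags in the trace.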

As I said previously, defining CHAISCRIPT_NO_THREADS prevented the crash, but Valgrind still reports the same error.

In my setup I have two dynamically loaded .so libraries using ChaiScript. I call dlopen(so1), then dlopen(so2), dlclose(so2), then dlopen(so2) again, and at that point it crashes (or corrupts memory, and the crash occurs later).
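The load sequence described above can be sketched as a small host-side helper. This is only an assumed reconstruction: the function names are hypothetical, and each plugin is assumed to create its ChaiScript engine on its own (for example from an entry point the host calls after loading), which is omitted here.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstdio>

// Loads a shared library and prints the loader's message on failure.
inline void* open_plugin(const char* path) {
    assert(path != nullptr);
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        const char* msg = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                     msg ? msg : "unknown error");
    }
    return handle;
}

// Replays the sequence from the post: load so1, load so2, unload so2,
// load so2 again. Returns true only if every load succeeded.
inline bool replay_load_sequence(const char* so1, const char* so2) {
    void* first = open_plugin(so1);
    if (first == nullptr) return false;

    void* second = open_plugin(so2);
    if (second == nullptr) {
        dlclose(first);
        return false;
    }

    dlclose(second);            // may or may not actually unmap the library
    second = open_plugin(so2);  // the reload after which the crash was seen

    const bool ok = (second != nullptr);
    if (second != nullptr) dlclose(second);
    dlclose(first);
    return ok;
}
```

On older glibc versions the host needs to be linked with -ldl.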

It seems that static (global) objects are not handled well in dynamically loaded libraries…

The call stack when Valgrind reports the error is:

==9370== Invalid read of size 4
==9370== at 0x1520EC97: __gnu_cxx::__atomic_add(int volatile*, int) (atomicity.h:53)
==9370== by 0x1520ED54: __gnu_cxx::__atomic_add_dispatch(int*, int) (atomicity.h:96)
==9370== by 0x1527D166: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() (shared_ptr_base.h:133)
==9370== by 0x15252BD2: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:673)
==9370== by 0x15215980: std::__shared_ptr<chaiscript::Boxed_Value::Data, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<chaiscript::Boxed_Value::Data, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:912)
==9370== by 0x152159A6: std::shared_ptr<chaiscript::Boxed_Value::Data>::shared_ptr(std::shared_ptr<chaiscript::Boxed_Value::Data> const&) (shared_ptr.h:103)
==9370== by 0x152159CC: chaiscript::Boxed_Value::Boxed_Value(chaiscript::Boxed_Value const&) (boxed_value.hpp:197)
==9370== by 0x15216014: chaiscript::const_var(bool) (boxed_value.hpp:420)
==9370== by 0x152258FC: chaiscript::eval::Id_AST_Node::get_value(std::string const&) (chaiscript_eval.hpp:175)
==9370== by 0x152256C1: chaiscript::eval::Id_AST_Node::Id_AST_Node(std::string const&, chaiscript::Parse_Location) (chaiscript_eval.hpp:150)
==9370== by 0x152A0AEE: std::shared_ptr<chaiscript::AST_Node> chaiscript::make_shared<chaiscript::AST_Node, chaiscript::eval::Id_AST_Node, std::string, chaiscript::Parse_Location>(std::string&&, chaiscript::Parse_Location&&) (chaiscript_defines.hpp:72)
==9370== by 0x15261CE1: std::shared_ptr<chaiscript::AST_Node> chaiscript::parser::ChaiScript_Parser::make_node<chaiscript::eval::Id_AST_Node>(std::string, int, int) (chaiscript_parser.hpp:702)
==9370== by 0x152373FA: chaiscript::parser::ChaiScript_Parser::Id() (chaiscript_parser.hpp:831)
==9370== by 0x1523E57D: chaiscript::parser::ChaiScript_Parser::Dot_Fun_Array() (chaiscript_parser.hpp:1904)
==9370== by 0x1524063C: chaiscript::parser::ChaiScript_Parser::Value() (chaiscript_parser.hpp:2093)
==9370== by 0x152411A0: chaiscript::parser::ChaiScript_Parser::Operator(unsigned long) (chaiscript_parser.hpp:2170)
==9370== by 0x15240794: chaiscript::parser::ChaiScript_Parser::Operator(unsigned long) (chaiscript_parser.hpp:2110)
==9370== by 0x15240794: chaiscript::parser::ChaiScript_Parser::Operator(unsigned long) (chaiscript_parser.hpp:2110)

==9370== Address 0x12fa6ea8 is 8 bytes inside a block of size 80 free’d
==9370== at 0x4A06068: operator delete(void*) (in /opt/rh/devtoolset-3/root/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9370== by 0x13B3CA33: ???
==9370== by 0x13B17ABF: ???
==9370== by 0x13BC4B1E: ???
==9370== by 0x430DCF: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:166)
==9370== by 0x139E51D8: ???
==9370== by 0x139A87E3: ???
==9370== by 0x139A87FD: ???
==9370== by 0x139A89E7: ???
==9370== by 0x324C235EBC: __cxa_finalize (in /lib64/libc-2.12.so)
==9370== by 0x139A1B02: ???
==9370== by 0x324BE13E00: _dl_close_worker (in /lib64/ld-2.12.so)
==9370== by 0x324BE147FD: _dl_close (in /lib64/ld-2.12.so)
==9370== by 0x324BE0E265: _dl_catch_error (in /lib64/ld-2.12.so)
==9370== by 0x324CA0129B: _dlerror_run (in /lib64/libdl-2.12.so)
==9370== by 0x324CA0100E: dlclose (in /lib64/libdl-2.12.so)


#4

That’s unfortunate and ironic, since I’m breaking some of my own best-practice rules and they are biting me in familiar ways.

Thank you very much for figuring out the issue. You should be able to operate with no MAGIC_STATICS for now, and I’ll get a fix in for it (that maintains performance) in the next release of ChaiScript.


#5

Once I re-enabled multithread support in ChaiScript, I faced a similar problem with the Thread_Storage variables (declared as thread_local static).

static std::unordered_map<void*, T> &t()
{
    thread_local static std::unordered_map<void *, T> my_t;
    return my_t;
}

Not defining CHAISCRIPT_HAS_THREAD_LOCAL fixes the problem.

The problem detected by Valgrind occurs on the second loading of the .so library; it is as if the global variables were not created the second time. We can see in the call stack that dlclose has deleted the variables. The problem with const_var(true/false) was the same.
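To show how an accessor like t() above is typically used, here is a minimal, self-contained sketch of per-thread storage keyed by an owner address. The class name PerThreadStorage and its interface are hypothetical; it illustrates the general technique, not ChaiScript's real Thread_Storage.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical per-thread storage: each thread has its own map, and each
// owner object finds its slot in that map through its key.
template <typename T>
class PerThreadStorage {
public:
    explicit PerThreadStorage(void* key) : key_(key) {}

    // Removes the slot from the map of the thread running the destructor;
    // slots created by other threads are not touched.
    ~PerThreadStorage() { table().erase(key_); }

    // Returns this owner's value for the calling thread, default-constructed
    // on first access.
    T& get() const { return table()[key_]; }

private:
    // One map per thread, created on first use in that thread.
    static std::unordered_map<void*, T>& table() {
        static thread_local std::unordered_map<void*, T> per_thread;
        return per_thread;
    }

    void* key_;
};

// Two owners on the same thread get independent slots.
inline void check_per_thread_storage() {
    int owner_a = 0;
    int owner_b = 0;
    PerThreadStorage<int> a(&owner_a);
    PerThreadStorage<int> b(&owner_b);
    a.get() = 1;
    b.get() = 2;
    assert(a.get() == 1 && b.get() == 2);
}
```

The map itself is a function-local static inside a member of a class template, so it falls into the same category of objects as the const_var constants.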

Depending on the combination of .so libraries in my application, the libraries are not always unloaded on dlclose, which prevents the problem from occurring.

I suspect a problem in the compiler/linker (g++ 4.9.1) with function-local static variables in .so libraries that are loaded several times with dlopen. What do you think?

==8225== Invalid read of size 8
==8225==    at 0x14FD0598: std::_Hashtable<void*, std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder>, std::allocator<std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder> >, std::__detail::_Select1st, std::equal_to<void*>, std::hash<void*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node(unsigned long, void* const&, unsigned long) const (hashtable.h:1438)
==8225==    by 0x14F9E763: std::_Hashtable<void*, std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder>, std::allocator<std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder> >, std::__detail::_Select1st, std::equal_to<void*>, std::hash<void*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_node(unsigned long, void* const&, unsigned long) const (hashtable.h:625)
==8225==    by 0x14F6FCC5: std::__detail::_Map_base<void*, std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder>, std::allocator<std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder> >, std::__detail::_Select1st, std::equal_to<void*>, std::hash<void*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](void* const&) (hashtable_policy.h:597)
==8225==    by 0x14F0A3EC: std::unordered_map<void*, chaiscript::detail::Dispatch_Engine::Stack_Holder, std::hash<void*>, std::equal_to<void*>, std::allocator<std::pair<void* const, chaiscript::detail::Dispatch_Engine::Stack_Holder> > >::operator[](void* const&) (unordered_map.h:627)
==8225==    by 0x14ECFB2A: chaiscript::detail::threading::Thread_Storage<chaiscript::detail::Dispatch_Engine::Stack_Holder>::operator->() const (chaiscript_threading.hpp:90)
==8225==    by 0x14E9322D: chaiscript::detail::Dispatch_Engine::get_stack_data() const (dispatchkit.hpp:1118)
==8225==    by 0x14E90D37: chaiscript::detail::Dispatch_Engine::get_parent_locals() const (dispatchkit.hpp:699)
==8225==    by 0x14EA4747: chaiscript::eval::Method_AST_Node::eval_internal(chaiscript::detail::Dispatch_Engine&) const (chaiscript_eval.hpp:1390)
==8225==    by 0x14E9705D: chaiscript::AST_Node::eval(chaiscript::detail::Dispatch_Engine&) const (chaiscript_common.hpp:482)
==8225==    by 0x14EA1DA3: chaiscript::eval::File_AST_Node::eval_internal(chaiscript::detail::Dispatch_Engine&) const (chaiscript_eval.hpp:1130)
==8225==    by 0x14E9705D: chaiscript::AST_Node::eval(chaiscript::detail::Dispatch_Engine&) const (chaiscript_common.hpp:482)
==8225==    by 0x14EB55A1: chaiscript::ChaiScript::do_eval(std::string const&, std::string const&, bool) (chaiscript_engine.hpp:276)
==8225==    by 0x14EB9B5A: chaiscript::ChaiScript::eval(std::string const&, std::shared_ptr<chaiscript::detail::Exception_Handler_Base> const&, std::string const&) (chaiscript_engine.hpp:889)
==8225==    by 0x14F18B23: void chaiscript::Module::apply_eval<chaiscript::ChaiScript, __gnu_cxx::__normal_iterator<std::string const*, std::vector<std::string, std::allocator<std::string> > > >(__gnu_cxx::__normal_iterator<std::string const*, std::vector<std::string, std::allocator<std::string> > >, __gnu_cxx::__normal_iterator<std::string const*, std::vector<std::string, std::allocator<std::string> > >, chaiscript::ChaiScript&) (dispatchkit.hpp:255)
==8225==    by 0x14EE0298: void chaiscript::Module::apply<chaiscript::ChaiScript, chaiscript::detail::Dispatch_Engine>(chaiscript::ChaiScript&, chaiscript::detail::Dispatch_Engine&) const (dispatchkit.hpp:191)
==8225==    by 0x14EB8E90: chaiscript::ChaiScript::add(std::shared_ptr<chaiscript::Module> const&) (chaiscript_engine.hpp:731)
==8225==    by 0x14EB6452: chaiscript::ChaiScript::build_eval_system(std::shared_ptr<chaiscript::Module> const&) (chaiscript_engine.hpp:356)
==8225==    by 0x14EB88E5: chaiscript::ChaiScript::ChaiScript(std::shared_ptr<chaiscript::Module> const&, std::vector<std::string, std::allocator<std::string> >, std::vector<std::string, std::allocator<std::string> >) (chaiscript_engine.hpp:456)


==8225==  Address 0x12c85330 is 64 bytes inside a block of size 88 free'd
==8225==    at 0x4A06068: operator delete(void*) (in /opt/rh/devtoolset-3/root/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==8225==    by 0x137E35E7: ???
==8225==    by 0x137C6550: ???
==8225==    by 0x137A4457: ???
==8225==    by 0x1377956B: ???
==8225==    by 0x13746FE3: ???
==8225==    by 0x137180F7: ???
==8225==    by 0x136B2A75: ???
==8225==    by 0x1393FF05: ???
==8225==    by 0x324C235EBC: __cxa_finalize (in /lib64/libc-2.12.so)
==8225==    by 0x1362AAF2: ???
==8225==    by 0x324BE13E00: _dl_close_worker (in /lib64/ld-2.12.so)
==8225==    by 0x324BE147FD: _dl_close (in /lib64/ld-2.12.so)
==8225==    by 0x324BE0E265: _dl_catch_error (in /lib64/ld-2.12.so)
==8225==    by 0x324CA0129B: _dlerror_run (in /lib64/libdl-2.12.so)
==8225==    by 0x324CA0100E: dlclose (in /lib64/libdl-2.12.so)

#6

It does make me wonder if it’s some problem with the compiler or linker options or something.

Do you have a complete example you can post so I can test this myself? Have you tried with other compilers?


#7

I still would like to properly fix this, if possible, but as I understand, if you:

undefine CHAISCRIPT_HAS_MAGIC_STATICS
undefine CHAISCRIPT_HAS_THREAD_LOCAL

Then you can safely use it in a multithreaded environment while loading from your plugins?

On the ‘develop’ branch I’ve recently done some work that mitigates most of the expense of not having thread local variables. This was necessary for performance on MSVC2013.

I feel confident that you should get reasonable performance, plus the thread safety that you need if you move forward with those settings.


#8

Yes, by undefining these two defines, I can use ChaiScript from multiple plugins at the same time without problems.

Glad to know that you have improved performance when thread_local is not available.

Back to the problem: I realised that when we use function-local static variables (as used by const_var(true) and by Thread_Storage with thread_local), the compiler marks the symbols with the STB_GNU_UNIQUE binding, and the dynamic loader ensures that a single copy exists in memory for all plugins.
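As an illustration of the kind of symbol in question, the sketch below defines an inline function with a function-local static. The names are hypothetical, and the commands in the trailing comment assume a GNU toolchain on Linux and a source file called plugin.cpp; whether a given symbol really gets unique binding depends on the compiler version and flags.

```cpp
#include <cassert>
#include <string>

// An inline function with a function-local static. Since the function can
// be compiled into several shared objects, the toolchain has to make every
// copy agree on a single `instance`.
inline std::string& shared_name() {
    static std::string instance = "default";
    return instance;
}

// Within one program, every call yields the same object.
inline void check_single_instance() {
    std::string& a = shared_name();
    std::string& b = shared_name();
    assert(&a == &b);
}

// To inspect the binding (assumed commands, GNU binutils):
//   g++ -std=c++11 -fPIC -shared plugin.cpp -o plugin.so
//   nm -C plugin.so | grep shared_name    # type 'u' = unique global symbol
//   readelf -sW plugin.so | grep UNIQUE   # Bind column shows UNIQUE
```

The symbol only appears in the library if the function is actually used somewhere in the translation unit.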

So when a plugin is unloaded, all variables are deleted except those that are marked as unique. Could this invalidate these global unique variables, so that we crash the next time a plugin is loaded?

With this info, can you see an explanation for the crash?


#9

I’ve been thinking about this for the last several days and no - that doesn’t really help explain the issue. This is the kind of code I’ve used with success in other projects before.

Access to some source would be best. But I’m starting to think I need to entirely re-think how I manage the local stack and not use statics at all (thread_local or otherwise). Plus I just realized there’s a resource leak if you are creating and destroying threads and don’t have thread_local support enabled.

I also need to make sure I understand fully what you are doing.

  • You have a loadable module that creates a ChaiScript instance(s)?
  • The loadable module is loaded multiple times somehow?
  • The ChaiScript instance(s) are then accessed simultaneously from multiple threads?

-Jason


#10

Deep debugging analysis led me to conclude that this is a g++ bug, which I have reported (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66830).

See https://www.redhat.com/archives/posix-c++-wg/2009-August/msg00002.html for an explanation of unique symbols.

No feedback received so far, so I will probably build a small test application that reproduces it.

Philippe


#11

The bug report to gcc (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66830) has been completed with a small test application.

Can you take a look and tell me if you come to the same conclusion, please?


#12

I will definitely look into it. I’ve been wondering if (from a ChaiScript perspective) your problem could be solved by simply adding #include "chaiscript.hpp" somewhere in your core application, to give these static objects a guaranteed lifetime.

Is that an option that can be considered?

Also, I realized that I’ve never really worked on a project that unloaded shared libraries at runtime. Only one that loaded them.


#13

I just read the response you got from the G++ developers. I think this topic is worthy of being made sticky.

Saved here for anyone else who might come across this issue:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66830#c2

Jonathan Wakely 2015-07-21 17:11:58 UTC
(In reply to pleuba from comment #0)

> Is this problem already known ?

Yes, it has been known for a long time, but it’s not an ideal situation and there is currently no better solution than -fno-gnu-unique.


#14

#15

Adding #include "chaiscript.hpp" will not solve the problem, for two reasons:

  • For the static variable to be instantiated, the inline function containing it needs to be called, so in the case of ChaiScript, an engine needs to be created.
  • A test showed me that unique symbols are only shared between plugins, but not with the main process.

So for me the only working solution is the ‘-fno-gnu-unique’ compiler option.
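For anyone applying this, here is a minimal sketch of a plugin translation unit with an assumed build line in the header comment. The file name, function names and exact command are hypothetical. The GCC documentation describes -fno-gnu-unique as the option to use when a program relies on reinitializing a shared object via dlclose and dlopen.

```cpp
// plugin.cpp: hypothetical plugin translation unit.
// Assumed build line (GCC on Linux):
//   g++ -std=c++11 -fPIC -shared -fno-gnu-unique plugin.cpp -o plugin.so
#include <cassert>
#include <string>

// Function-local static inside an inline function: the kind of object
// that GCC can otherwise emit with unique binding.
inline std::string& plugin_label() {
    static std::string label = "unnamed";
    return label;
}

// Hypothetical C entry points that a host could resolve with dlsym.
extern "C" void plugin_set_label(const char* text) {
    assert(text != nullptr);
    plugin_label() = text;
}

extern "C" const char* plugin_get_label() {
    return plugin_label().c_str();
}
```

The flag has to be applied to every translation unit that ends up in the plugins; with it, each plugin keeps its own copy of such statics instead of sharing one across plugins.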