2017-08-15 70 views
1

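Storing strings in a multiprocessing sharedctypes Array

I want to share some strings between processes using the sharedctypes part of the multiprocessing module.

TL;DR: I would like to put my strings into a sharedctypes Array, like this: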

import ctypes
from multiprocessing.sharedctypes import Array

Array(ctypes.c_char, ['a string', 'another string'])

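More information:

The docs have this note:

Note that an array of ctypes.c_char has value and raw attributes which allow one to use it to store and retrieve strings.

Using c_char alone: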

import ctypes
from multiprocessing.sharedctypes import Array

Array(ctypes.c_char, ['a string', 'another string'])

I get a TypeError, which makes sense:

TypeError: one character bytes, bytearray or integer expected 

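This can (sort of) be made to work by splitting the string into bytes (which also makes sense):

import ctypes
from multiprocessing.sharedctypes import Array

Array(ctypes.c_char, [b's', b't', b'r', b'i', b'n', b'g'])

But this is not very convenient for storing a large list of strings.

However, when I try to use the value and raw attributes as shown in the docs here and mentioned above, there is still no magic: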

Array(ctypes.c_char.value, ['string']) 

This gives the following error:

TypeError: unsupported operand type(s) for *: 'getset_descriptor' and 'int' 

raw gives this:

Array(ctypes.c_char.raw, ['string']) 

AttributeError: type object 'c_char' has no attribute 'raw' 
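For reference, value and raw are attributes of the array instance that Array returns, not of the ctypes.c_char type, which is why both calls above fail. A minimal sketch, assuming a 16-byte buffer chosen purely for illustration:

```python
import ctypes
from multiprocessing.sharedctypes import Array

# value and raw live on the array instance, not on ctypes.c_char itself.
shared = Array(ctypes.c_char, 16)   # 16 zero-initialised bytes
shared.value = b'a string'          # copies the bytes; the rest stays NUL

text = shared.value                 # bytes up to the first NUL
buffer = shared.raw                 # all 16 bytes, NUL padding included
```

Here text holds only the stored string, while buffer exposes the whole fixed-size block.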

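I also tried the c_wchar_p type, which according to the table of fundamental C-compatible data types (found in the docs) corresponds directly to a string: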

Array(ctypes.c_wchar_p, ['string']) 

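This crashes Python; no error code is reported, and the process just exits with code 0.

Why can't a sharedctypes Array hold pointer types such as c_wchar_p? Any other solutions or suggestions on how to store strings in a sharedctypes Array are very welcome!

Update: this code occasionally works (most of the time Python stops working, but occasionally I get strings back, although they are mostly gibberish). However, a comment mentions that it works fine on Windows.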

from multiprocessing import Process, Lock 
from multiprocessing.sharedctypes import Value, Array 
import ctypes 


def print_strings(S): 
    """Print strings in the C array""" 
    print([a for a in S]) 

if __name__ == '__main__': 
    lock = Lock() 
    string_array = Array(ctypes.c_wchar_p, ['string']) 
    q = Process(target=print_strings, args=(string_array,)) 
    q.start() 
    q.join() 

Update 2

This is the gibberish I get:

['汣猎癞汥⁹景椠瑮搠祴数\u2e73ਊ††敓\u2065汁潳\u200a †ⴠⴭⴭⴭਭ††捳滟\u2e79灳捥慩\u2e6c癞\u202c捳滟\u2e79灳捥慩\u2e6c癞\u0a65\u200a†丠琅獥\u200a†ⴠⴭⴭ\u200a†圠\u2065猎\u2065桴\u2065污潧楲桴\u206d异汢獩敨\u2064祢䌠敬狝慨⁷ㅛ彝愠摮爠晥牥湥散\u2064祢\u200a†䄠牢浡睯莹⁺湡\u2064瑓来湵嬠崲\u2c5f映牯眠桢档琠敨映湵琐潩\u206e润慭湩椠ੳ†慰玱莹潩敮\u2064湩潴琠敨琠潷椠瑮牥庆獬嬠ⰰ崸愠摮⠠ⰸ湩⥦\u202c湡\u2064桃扥獹敨\u0a76††溃祬潮业污攠灸湡楳汤\u2073牡\u2065浥汰祯摥椠\u206e慥档椠瑮牥庆\u2e6c删汥瑡癞\u2065牥潲\u2072汤\u200a†琠敨搠浯楡\u206eせ㌬崰甠楳杮䤠䕅⁅牡莹浨瑥捩椠\u2073润畣敭瑮摥嬠崳\u205f獡栠痴湩\u2067\u0a61††数欢漠\u2066⸵攸ㄭ‶楷桴愠\u206e浲\u2073景ㄠ㐮ⵥ㘱⠠ \u206e‽ 〳〰⤰ਮ\u200a†删晥牥湥散ੳ††ⴭⴭⴭⴭⴭ\u200a†⸠\u202eㅛ⁝\u2e43圠\u202e汃湥桳睡\u202c䌢敨祢桳症猠牥敩\u2073潦\u2072慭桴浥瑡捩污映湵琐潩狝Ⱒ椠੮†††††⨠慎楴汤污倠票楳惯\u206c慌潢慲潴祲䴠瑡敨慭楴惯\u206c慔汢獥Ⱚ瘠汯\u202eⰵ䰠汤润㩮\u200a†††††效\u2072愠敪瑳❹\u2073瑓瑡潩敮祲传晦捩ⱥㄠ㘹⸲\u200a†⸠\u202e㉛⁝\u2e4d䄠牢浡睯莹⁺湡\u2064\u2e49䄠\u202e瑓来湵\u202c䠪湡扤浔\u206b景䴠瑡敨慭楴惯੬†††††䘠湵琐潩狝Ⱚㄠ琰\u2068牰湩楴杮\u202c敎⁷沩岁›漱敶Ⱳㄠ㘹ⰴ潆\u2e70㌠㤷ਮ†††††栠瑴㩰⼯睷\u2e77慭桴献畦挮⽡捾浢愯湡獤瀯条彦㜳⸹瑨੭††⸮嬠崳栠瑴㩰⼯潫敢敳牡档挮慰\u2e6e牯⽧瑨润獣䴯瑡\u2d68暋桰獥䴯瑡⽨暋桰獥栮浴੬\u200a†䔠慸灭敬ੳ††ⴭⴭⴭⴭ\u200a†㸠㸾渠\u2e70ど嬨⸰⥝\u200a†愠牲祡ㄨ〮\u0a29††㸾‾灮椮⠰せⰮㄠ\u20 2e\u202b樲⥝\u200a†愠牲祡嬨ㄠ〮〰〰〰⬰⸰\u206a†††Ⱐ†⸰㠱㠷㌵㌷〫㘮㘴㘱㐹樴⥝ਊ††', 'ਊ††敓\u2065汁潳\u200a†ⴠⴭⴭⴭਭ††捳滟\u2e79灳捥慩\u2e6c癞\u202c捳滟\u2e79灳捥慩\u2e6c癞\u0a65\u200a†丠琅獥\u200a†ⴠⴭⴭ\u200a†圠\u2065猎\ u2065桴\u2065污潧楲桴\u206d异汢獩敨\u2064祢䌠敬狝慨⁷ㅛ彝愠摮爠晥牥湥散\u2064祢\u200a†䄠牢浡睯莹⁺湡\u2064瑓来湵嬠崲\u2c5f映牯眠桢档琠敨映湵琐潩\u206e润慭湩椠ੳ††慰玱莹潩敮\u2064湩潴琠敨琠潷椠瑮牥庆獬嬠ⰰ崸愠摮⠠ⰸ湩⥦ \u202c湡\u2064桃扥獹敨\u0a76††溃祬潮业污攠灸湡楳汤\u2073牡\u2065浥汰祯摥椠\u206e慥档椠瑮牥庆\u2e6c删汥瑡癞\u2065牥潲\u2072汤\u200a†琠敨搠浯楡\u206eせ㌬崰甠楳杮䤠䕅⁅牡莹浨瑥捩椠\u2073润畣敭瑮摥嬠崳\u205f獡栠痴湩\u2067\u0a61††数欢漠\u2066⸵攸ㄭ‶楷桴愠\u206e浲\u2073景ㄠ㐮ⵥ㘱⠠\u206e‽ 〳〰⤰ਮ\u200a†删晥牥湥散ੳ††ⴭⴭⴭⴭⴭ\u200a†⸠\u202eㅛ⁝\u2e43圠\u202e汃湥桳睡\u202c䌢敨祢桳症猠牥敩\u2073潦\u2072慭桴浥瑡捩污映湵琐潩狝Ⱒ椠੮†††††⨠慎楴汤污倠票楳惯\u206c慌潢慲潴祲䴠瑡敨慭楴惯\u206c慔汢獥Ⱚ瘠汯\u202eⰵ䰠汤润㩮\u200a†††††效\u2072愠敪瑳❹\u2073瑓瑡潩敮祲传晦捩ⱥㄠ㘹⸲\u200a†⸠\u202e㉛⁝\u2e4d䄠牢浡睯莹⁺湡\u2064\u2e49䄠\u202e瑓来湵\u202c䠪湡扤浔\u206b景䴠瑡敨慭楴惯੬†††††䘠湵琐潩狝Ⱚㄠ琰\u2068牰湩楴杮\u202c敎⁷沩岁›漱敶Ⱳㄠ㘹ⰴ潆\u2e70㌠㤷ਮ†††††栠瑴㩰⼯睷\u2e77慭桴献畦挮⽡捾浢愯湡獤瀯条彦㜳⸹瑨੭††⸮嬠崳栠瑴㩰⼯潫敢敳牡档挮慰\u2e6e牯⽧瑨润獣䴯瑡\u2d68暋桰獥䴯瑡⽨暋桰獥栮浴੬\u200a†䔠慸灭敬ੳ††ⴭⴭⴭⴭ\u200a†㸠㸾渠\u2e70ど嬨⸰⥝\u200a†愠牲祡ㄨ〮\u0a29††㸾‾灮椮⠰せⰮㄠ\u20 2e\u202b樲⥝\u200a†愠牲祡嬨ㄠ〮〰〰〰⬰⸰\u206a†††Ⱐ†⸰㠱㠷㌵㌷〫㘮㘴㘱㐹樴⥝ਊ††']

(yes that apparently all came from 'string', don't ask me how)

+1

I just tried `Array(ctypes.c_wchar_p, ['string'])` and it seems to work on Windows with Python 3.5.3 ... – jdehesa

+0

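@jdehesa Yes, I was just about to update the question with more information. My output is inconsistent: occasionally I get strings back (but they are gibberish), but most of the time I just get a pop-up window reporting a crash –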

+0

@jdehesa I have added an update with a complete program; does that work for you? (I am also on Windows) –

Answers

2

The problem you are having is mentioned in the documentation:

Note: Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.

This means that storing pointers (like strings) is not going to work, because only the address will get to the child process, and that address will not be valid anymore there (hence the segmentation fault). Consider, for example, this alternative, where all the strings are concatenated into one array and another array with the lengths is passed too (you can tweak it to your convenience):

from multiprocessing import Process, Lock 
from multiprocessing.sharedctypes import Value, Array 
import ctypes 

def print_strings(S, S_len): 
    """Print strings in the C array""" 
    received_strings = [] 
    start = 0 
    for length in S_len: 
        received_strings.append(S[start:start + length])
        start += length
    print("received strings:", received_strings) 

if __name__ == '__main__': 
    lock = Lock() 
    my_strings = ['string1', 'str2'] 
    my_strings_len = [len(s) for s in my_strings] 
    string_array = Array(ctypes.c_wchar, ''.join(my_strings)) 
    string_len_array = Array(ctypes.c_uint, my_strings_len) 
    q = Process(target=print_strings, args=(string_array, string_len_array)) 
    q.start() 
    q.join() 

Output:

received strings: ['string1', 'str2'] 

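About addresses in subprocesses:

This is a bit off the topic of the question, but it was too long to put into a comment. Honestly, this starts to be out of my depth (take a look at eryksun's comments below for more informed insights), but here's my understanding anyway. On Unix(-like) systems, a new process created through fork has the same memory and (virtual) addresses as the parent process, but if you then exec some program that's not the case anymore. I don't know whether Python's multiprocessing runs an exec on Unix (note: see eryksun's comment for more on this and set_start_method), but in any case I wouldn't assume there is any guarantee that an address in the Python-managed memory pool stays the same. On Windows, CreateProcess creates a new process from an executable file, and in principle that process has nothing in common with the parent. I don't think that even the shared libraries (.so/.dll) used by several processes have to sit at the same address on either platform. I also don't think sharing (virtual) addresses between processes makes sense even when using shared memory because, if I remember correctly (and I may not), a shared memory block is mapped at an arbitrary virtual address in each process. So my impression is that there is no good reason (or at least no "good and obvious" one) to share addresses with a subprocess (although, of course, pointer types are still useful within a single process for talking to native libraries).

As I said, I am not one hundred percent confident about this, but I think that is the general idea.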
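To make the pointer issue concrete, here is a small single-process sketch (the variable names are only illustrative). It shows that a c_wchar_p object holds nothing but an address, so placing it in shared memory shares the address and not the characters:

```python
import ctypes

pointer = ctypes.c_wchar_p('string')

# The c_wchar_p object is only as large as a pointer,
# no matter how long the string is ...
pointer_size = ctypes.sizeof(pointer)

# ... because what it stores is the address of the characters.
address = ctypes.cast(pointer, ctypes.c_void_p).value

# Inside the same process that address can be read back safely.
same_process_text = ctypes.wstring_at(address)
```

Reading the same address from another process is what is not guaranteed to work, since nothing ensures the characters are mapped there in that process.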

+0

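Good answer. I would have imagined that shared memory between processes would mean valid pointers. Should it make much difference whether the memory address is passed directly or pointed to, as long as the memory is still accessible? Or do memory addresses differ between processes? (Clearly there is something I am missing) –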

+0

@Harry de Winton I was going to write a comment, but it kept getting longer, so I have extended the answer instead. – jdehesa

+2

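Starting with Python 3.4, the multiprocessing start method (set via `set_start_method`) defaults to "fork" on POSIX systems, which is problematic when combined with threads. You can change the start method to "spawn" to get behavior similar to Windows, and there is also a third option, "forkserver", that tries to offer the best of both worlds. – eryksun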
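As a rough sketch of the start methods mentioned in this comment, the snippet below picks "spawn" through a context object instead of changing the global default. It only inspects the configuration and starts no process:

```python
import multiprocessing as mp

# 'spawn' is available on every platform; 'fork' and 'forkserver' are POSIX-only.
available = mp.get_all_start_methods()

# A context pins the start method for everything created through it,
# without changing the global default the way set_start_method does.
ctx = mp.get_context('spawn')
chosen = ctx.get_start_method()
```

Processes and shared arrays would then be created with ctx.Process(...) and ctx.Array(...), under the usual if __name__ == '__main__': guard.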

1

Here is another example showing how .raw and .value can be used. Per the documentation, this only works with Array(ctypes.c_char, ...)

from multiprocessing import Process 
from multiprocessing.sharedctypes import Value, Array 
import ctypes 

def print_strings(s): 
    """Print strings in the C array""" 
    print(s.value) 
    print(len(s)) 
    s[len(s)-1]=b'x' 

if __name__ == '__main__': 
    string_array = Array(ctypes.c_char, b'string') 
    q = Process(target=print_strings, args=(string_array,)) 
    q.start() 
    q.join() 
    print(string_array.raw) 

The output shows that the shared buffer was modified:

b'string' 
6 
b'strinx' 
+0

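So `c_char` only works for storing a *single* string as a byte array? (More convenient than manually splitting the string into bytes, but it cannot store multiple strings?) (Also, good answer) –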
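One possible way to keep several strings in a single c_char Array is to join them with a separator byte and split them again on the receiving side. This is only a sketch: pack_strings and unpack_strings are hypothetical helper names, and it assumes a non-empty list whose strings never contain a NUL character.

```python
import ctypes
from multiprocessing.sharedctypes import Array

SEPARATOR = b'\0'  # assumption: no stored string contains a NUL character


def pack_strings(strings):
    """Encode the strings as UTF-8 and store them NUL-separated in one shared Array."""
    payload = SEPARATOR.join(s.encode('utf-8') for s in strings)
    return Array(ctypes.c_char, payload)


def unpack_strings(shared):
    """Rebuild the list of strings from the shared buffer."""
    # .raw returns every byte; .value would stop at the first NUL.
    return [chunk.decode('utf-8') for chunk in shared.raw.split(SEPARATOR)]
```

The array returned by pack_strings can be passed to a Process in args just like string_array in the examples above, and the child can call unpack_strings on it.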